Author Topic: Level of False Positives on Ancestry and MyHeritage  (Read 2054 times)

Offline ikas

  • RootsChat Senior
  • ****
  • Posts: 309
  • Census information Crown Copyright, from www.nationalarchives.gov.uk
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #18 on: Friday 19 July 24 11:39 BST (UK) »
Thanks for the link, Southsea Steel. A really interesting article. Very surprised at the figure for 10cM matches, i.e. a 15% chance of being false. That means an 85% chance of being IBD.

Offline melba_schmelba

  • RootsChat Aristocrat
  • ******
  • Posts: 1,789
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #19 on: Monday 29 July 24 13:06 BST (UK) »
MyHeritage has a very high level of false positives. I posted a link here some time ago to someone who had gone through their own and their parents' matches: a very large percentage of the people who matched him did not match either of his parents. I think it might have been over 40% of matches under 40cM or so. MyHeritage has the same problem as GEDMATCH and other sites that accept uploads from different providers, which have used different chips at different times; these test parts of the DNA which in some cases overlap by only a small amount. So GEDMATCH and MyHeritage use algorithms to guess, which unfortunately mostly seem to produce nonsense matches. Ironically, I think you may be more likely to get accurate matches on MH by uploading Ancestry DNA, as that better matches the original FTDNA/MH SNPs than the new GSA chip that MH and FTDNA have used since 2019 (23andMe since 2017). The same applies to GEDMATCH, whose database still uses the SNPs of the original FTDNA kits, which produces wildly nonsensical matches for most GSA kit uploads.

Offline melba_schmelba

  • RootsChat Aristocrat
  • ******
  • Posts: 1,789
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #20 on: Monday 29 July 24 13:15 BST (UK) »
Here is my post where I summarized this person's findings (unfortunately the website is defunct):

https://www.rootschat.com/forum/index.php?topic=850885.msg7190496#msg7190496

"This person did a detailed analysis of their matches compared to the parents, and concluded that any match that has segments of below 26cM in size may be IBS, and that 1/3 of those with segments under 17.6cM may simply be false matches possibly due to the MyHeritage algorithm not working correctly. Note though, that these were all LivingDNA uploads which test different but overlapping parts of the genome which may make the problem worse. So it might be that MyHeritage-MyHeritage matching is more accurate, but unfortunately I don't think the matches are marked with their origins (unlike GEDMATCH) so there is no way of judging this."
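
The parent comparison described above can be sketched in a few lines. This is only an illustration, assuming you have exported each kit's match list yourself: the function name, the match ids and the cM figures below are all made up, and the 17.6cM and 26cM bin edges are simply the thresholds quoted in the summary. A match that appears in the child's list but in neither parent's list is counted as unconfirmed (a likely false positive).

```python
from collections import defaultdict

# Bin edges taken from the thresholds quoted in the post (largest segment, in cM).
DEFAULT_BINS = ((0, 17.6), (17.6, 26.0), (26.0, float("inf")))


def unconfirmed_rate_by_bin(child_matches, parent_match_ids, bins=DEFAULT_BINS):
    """Return, per size bin, the share of the child's matches seen in neither parent.

    child_matches:    dict of match id -> largest shared segment in cM
    parent_match_ids: set of match ids found in either parent's match list
    """
    totals = defaultdict(int)
    unconfirmed = defaultdict(int)
    for match_id, largest_cm in child_matches.items():
        for low, high in bins:
            if low <= largest_cm < high:
                totals[(low, high)] += 1
                if match_id not in parent_match_ids:
                    unconfirmed[(low, high)] += 1
                break
    return {b: unconfirmed[b] / totals[b] for b in totals}


# Hypothetical data, not taken from any real kit.
child = {"A": 9.0, "B": 12.5, "C": 15.0, "D": 20.0, "E": 30.0, "F": 45.0}
parents = {"B", "C", "D", "E", "F"}
rates = unconfirmed_rate_by_bin(child, parents)
```

With the invented data, one of the three matches under 17.6cM is missing from both parents, so that bin comes out at one third; the larger bins come out at zero. Real exports would need the match ids to be comparable across the three kits, which this sketch simply assumes.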

Offline melba_schmelba

  • RootsChat Aristocrat
  • ******
  • Posts: 1,789
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #21 on: Monday 29 July 24 17:19 BST (UK) »


Offline 4b2

  • RootsChat Member
  • ***
  • Posts: 123
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #22 on: Monday 29 July 24 17:50 BST (UK) »
In case you were wondering!
https://isogg.org/wiki/Identical_by_state

Do you have any insight into whether false positives tend not to form themselves into clusters? For example, on Ancestry you will have a cluster of matches where many members overlap with each other. If you research the trees you will often find a common ancestor, even if you can't find your own link to that ancestor. I'd say that in my best clusters I can find a common ancestor for up to about 40% of matches down to 8cM. Obviously we can't identify all of them, and another 20% or so will be via NPEs. So I think that, over all generations, we can assume at least two thirds of clusterable Ancestry matches down to 8cM are real relatives.

My guess is that when the overlap is via coincidence rather than ancestry, it would be much less likely to show up as a shared match in a cluster. Maybe these coincidental matches are formed of DNA from more than one line of descent that just happens to be equal to someone else's. It would make sense if Ancestry are able to strip out many false positives.

Offline melba_schmelba

  • RootsChat Aristocrat
  • ******
  • Posts: 1,789
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #23 on: Monday 29 July 24 18:25 BST (UK) »
I (and I assume most people) have many clusters I can't place (yet) :). In fact, with the new Ancestry Pro Tools it is very much easier to find them. Previously you were limited to searching for the same names and places on linked trees and hoping for the best! Most of them are in America, which likely reflects two things: (i) a much higher percentage of the US population has done DNA tests than anywhere else, and (ii) it can be very difficult for Americans to trace back to the correct line in the UK or elsewhere, since US censuses don't list exact birthplaces, either in the US or abroad (although they do list the state for US-born people), and many earlier baptisms or births have left no record. A few states have death certificates that help by naming both parents, but in most cases these only came in at the end of the 19th century or later, so they won't help with earlier immigrants. In fact, DNA may be the only way in some cases to work out a link. But it depends on whether someone has already done the in-depth investigation and made the link back to the UK that might then fit into your tree. Hopefully these Pro Tools will make it possible for more of our cousins over the Atlantic to do so!

Offline 4b2

  • RootsChat Member
  • ***
  • Posts: 123
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #24 on: Tuesday 30 July 24 00:28 BST (UK) »
Yes. Pro Tools is proving invaluable. I have one match who I think represents one half of a relationship via infidelity. The match is about 100cM, so I'm expecting around a 3rd cousin. The only thing I have to go on is their name, which is a common forename plus the surname Evans. But with Pro Tools I can see that two common matches are closely related. So I can pad out their trees downwards and probably find out who that key match is, with a very good idea of who the common ancestor is. Plus, I've already found about 10 matches with common ancestry in that cluster, so there is a good amount to work with.

With the US, most of my ancestry is Welsh, and many relatives began moving to the Welsh areas in places like Pennsylvania and Ohio from around 1800. So I do find quite a few matches where I can see where the connection comes from, where people have US dead-end ancestors such as Owen Davies born 14 Mar 1831 in Wales and his wife Grace Thomas born 18 May 1834 in Wales. As you note, the census doesn't list the place of birth, and even if the death certificate does list the place of death and parents, many don't know that or don't know where to find it.

As noted in another thread, I download all my matches from my now 23 tests into a database and then compare related ones against each other. Pre-Pro Tools, this allowed me to pick out some much deeper matches that you couldn't get a handle on using the Shared Matches tab. If you have ten related people, each will obviously have inherited portions of DNA that reveal links the others do not. This is useful, as you end up with various unknown clusters from each match. Comparing them is a quick way to show which of those clusters overlap with other matches and are thus probably relevant.
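
For anyone wanting to try the cross-test comparison, here is a rough sketch of the idea. It is not the actual database setup described above: it assumes each test's match list has already been loaded into a plain dictionary, and the function name, test names, match ids and cM values are invented for illustration.

```python
from collections import defaultdict


def matches_shared_across_tests(matches_by_test, min_tests=2):
    """Find matches that turn up in at least `min_tests` of the managed tests.

    matches_by_test: dict of test name -> dict of match id -> total shared cM
    Returns a dict of match id -> {test name: cM} for the qualifying matches.
    """
    seen = defaultdict(dict)
    for test_name, matches in matches_by_test.items():
        for match_id, cm in matches.items():
            seen[match_id][test_name] = cm
    return {m: tests for m, tests in seen.items() if len(tests) >= min_tests}


# Hypothetical match lists for three related testers.
tests = {
    "me": {"X": 30.0, "Y": 12.0},
    "aunt": {"X": 14.0, "Z": 22.0},
    "cousin": {"X": 9.0, "Y": 18.0},
}
shared = matches_shared_across_tests(tests)
```

In this made-up example, match X appears in all three tests and Y in two, so both are returned along with the cM shared with each tester, while Z (seen in one test only) is dropped. A match that is small in one test but larger in a relative's test is the sort of deeper link the comparison is meant to surface.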

As an example, one test has a cluster of about 200 matches, with 30cM being the highest. Some of those matches appeared in three other tests, but all below the 20cM threshold. And now with Pro Tools you can see that it's the same people clustering with all of them. If I look at the shared matches, it's mostly people assigned to the same cluster. Conversely, when I now look at teen-level (13-19cM) matches that previously showed nothing under Shared Matches, I can see that many of them have no discernible common group. You can get a match who has common matches from four or so groups. In such cases I am inclined to think they are more likely a false positive. If a match overlaps with lots of people from the same group, and you can see many people in that group with shared ancestry, I would have thought it's probably not a false positive. With incoherent shared matches, or none at all, I'd be inclined to believe it's a false positive.
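
The rule of thumb above could be expressed roughly as follows. This is a sketch under stated assumptions, not anything Ancestry or Pro Tools provides: it assumes you have already assigned your known matches to groups yourself, and the function names, group labels and the 0.6 cut-off are arbitrary choices made for illustration.

```python
from collections import Counter


def cluster_coherence(shared_match_ids, cluster_of):
    """Return (dominant group, share of shared matches that fall in it).

    shared_match_ids: ids of the people a given match shares with you
    cluster_of:       dict of match id -> group label you assigned by hand
    """
    labels = [cluster_of[m] for m in shared_match_ids if m in cluster_of]
    if not labels:
        return None, 0.0
    top_cluster, top_count = Counter(labels).most_common(1)[0]
    return top_cluster, top_count / len(labels)


def looks_suspect(shared_match_ids, cluster_of, min_share=0.6):
    """Flag a match whose shared matches are absent or scattered across groups.

    min_share is an arbitrary illustrative cut-off, not an established value.
    """
    _, share = cluster_coherence(shared_match_ids, cluster_of)
    return share < min_share


# Hypothetical group assignments.
cluster_of = {"a": "group1", "b": "group1", "c": "group1",
              "d": "group2", "e": "group3", "f": "group4"}
coherent = ["a", "b", "c"]        # all shared matches sit in one group
scattered = ["a", "d", "e", "f"]  # shared matches spread over four groups
```

Here the coherent match scores 1.0 and is not flagged, while the scattered one scores 0.25 and is flagged, as is a match with no shared matches at all. It only ranks matches for a closer look; it cannot prove that a match is false.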

I base that also on using MyHeritage, where there are people in the 40-50cM range who have no shared matches with overlapping segments and a tree where you can see no link.

Offline 4b2

  • RootsChat Member
  • ***
  • Posts: 123
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #25 on: Tuesday 30 July 24 00:33 BST (UK) »
To add to that, £7.50/month is a lot for Pro Tools. As a programmer, I can say the cost of offering Pro Tools is around £0. For this price (£90/year) they really should add a tool, like ThruLines, that shows any ancestor in match trees who occurs more than once, extending the trees using their grouping of suspected duplicate individuals in member trees. Invaluable. Huge time saver. Auto-clustering of some sort would also be useful. With Pro Tools I can see that there are clusters of common matches below 20cM that you can assign as reliably as ones above 20cM. Not sure if a whole cluster could be a false positive.
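
To make the wished-for tool concrete, here is a very small sketch of the core idea: list every ancestor who appears in more than one match's tree. It is purely hypothetical and does not reflect how Ancestry works internally. It assumes each tree has been reduced to (name, birth year) pairs and treats two people as the same only when both agree exactly, which skips the hard part (recognising duplicates with differing spellings or dates). All names and ids are made up.

```python
from collections import defaultdict


def repeated_ancestors(trees, min_trees=2):
    """Return ancestors who appear in at least `min_trees` different match trees.

    trees: dict of match id -> iterable of (name, birth_year) tuples
    Returns a dict of (name, birth_year) -> sorted list of match ids.
    """
    owners = defaultdict(set)
    for match_id, ancestors in trees.items():
        for person in ancestors:
            owners[person].add(match_id)
    return {p: sorted(ids) for p, ids in owners.items() if len(ids) >= min_trees}


# Hypothetical trees for three matches in one cluster.
trees = {
    "m1": [("Owen Davies", 1831), ("Grace Thomas", 1834)],
    "m2": [("Owen Davies", 1831), ("John Evans", 1800)],
    "m3": [("Mary Jones", 1810)],
}
common = repeated_ancestors(trees)
```

With these invented trees, only the 1831 Owen Davies entry is shared (by m1 and m2), so he is the one candidate common ancestor returned for that cluster.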

Offline Biggles50

  • RootsChat Aristocrat
  • ******
  • Posts: 1,279
Re: Level of False Positives on Ancestry and MyHeritage
« Reply #26 on: Tuesday 30 July 24 11:16 BST (UK) »
Pro Tools is a dip-in-and-exit add-on.

Subscribe then cancel and just use it for a month.

Wait a few more months and subscribe again.

As it happens, it has just proved useful.

My wife has a DNA match of 171cM. We know that he is the grandson of her maternal cousin, but he also matches paternal DNA matches that my wife has, and as the two sides were separated by 300 miles there is no pedigree collapse in recent times.

Now we know that the guesswork we made in his tree is in error, and we can look at the shared matches' branches to see if we can find the elusive paternal links.

So this month’s subs were worth it.