Author Topic: Size of DNA data bases  (Read 938 times)

Offline Eric Hatfield

  • RootsChat Member
  • ***
  • Posts: 143
  • Sydney, Australia
Size of DNA data bases
« on: Tuesday 06 March 18 00:00 GMT (UK) »
The DNA Geek website has published its latest estimates of the database sizes for the major DNA vendors. The graphs show an enormous increase in the number of people testing their autosomal DNA, from about 2 million 3 years ago to something like 15 million today. This must increase the number of potential matches for all of us.

The current estimates are:

Ancestry - greater than 7 m
23 and me - greater than 5 m
My Heritage - greater than 1 m
FTDNA - greater than 0.7 m
Living DNA - unknown (matching not yet available though coming soon)

Two interesting thoughts.

(1) My Heritage has grown from nothing to more than a million in just over a year, which is amazing. I think their strength might be in Europe.

(2) FTDNA was one of the first in the business, and has generally provided the best tools and the full array of tests, and yet has grown at a much slower rate than the others. I think they don't advertise as much, while Ancestry advertises prolifically, and pushes ethnicity testing which is attractive to many people not all that interested in genealogical research.

But the interesting thing is, I tested with both Ancestry and FTDNA, and although the Ancestry database is perhaps ten times bigger, I had twice as many matches with FTDNA. [Clarification: this was matches at 4th cousin or better. Ancestry has thousands of matches at 5th-8th cousin, but I don't think these are very useful.] I think that is due to location - I live in Australia so most of my closest matches are Australian, and Ancestry has only entered the Australian market relatively recently. So raw size of database isn't the only factor.

RootsChat is the busiest, largest free family history forum site in the country. It is completely free to use. Register now.
Also register instantly with Facebook or Twitter (and other social networks). Start your genealogy search now.


Offline sugarfizzle

  • RootsChat Veteran
  • *****
  • Posts: 593
Re: Size of DNA data bases
« Reply #1 on: Tuesday 06 March 18 05:27 GMT (UK) »
Very interesting, Eric, thanks for sharing.

I find it hard to believe that myheritageDNA has outstripped familytreeDNA.  Does this include uploads from other companies? People who upload to other sites are likely to upload to more than one, so ftDNA should benefit as much as myheritage.

Your experience of ftDNA compared to ancestryDNA also surprises me. I have many thousands of matches at ancestry, most of whom I have not even looked at, whilst at ftDNA I have only a few hundred - is this because I have uploaded there rather than been tested there?

Whatever the reasons, the sheer number of people being tested is amazing. If everyone who put an online tree up at ancestry took a DNA test, how many mistakes would be rectified? Mind you, I have found a second cousin with an incorrect tree, and in spite of her DNA matching mine she is reluctant to change her tree. The proof is there for her to see.

Thanks for posting.

Regards Margaret
STEER, mainly Surrey, Kent; PINNOCKS/HAINES, Gosport, Hants; BARKER, mainly Broadwater, Sussex; Gosport, Hampshire; LAVERSUCH, Micheldever, Hampshire; WESTALL, London, Reading, Berks; HYDE, Croydon, Surrey; BRIGDEN, Hadlow, Kent and London; TUTHILL/STEPHENS, London
WILKINSON, Leeds, Yorkshire and Liverpool; WILLIAMSON, Liverpool; BEARE, Yeovil, Somerset; ALLEN, Kent and London; GORST, Liverpool; HOYLE, mainly Leeds, Yorkshire

Census Information is Crown Copyright, from www.nationalarchives.go



Offline Eric Hatfield

  • RootsChat Member
  • ***
  • Posts: 143
  • Sydney, Australia
Re: Size of DNA data bases
« Reply #2 on: Tuesday 06 March 18 07:05 GMT (UK) »
Hi Margaret,

In answer to your questions .....

1. The graph says it is the number of testers, so if that is accurate, the databases of My Heritage and FTDNA could be significantly larger once uploads are included. But since the graph includes Gedmatch, which doesn't test, that makes me wonder if it is actually database size. So it isn't clear. But, yes, I was surprised that My Heritage had grown so fast.

2. I should have been more specific about matches and I have corrected my previous post. I was referring to matches at 4th cousin or better, because I think they are the main ones, perhaps the only ones, likely to give me any useful information. For them, I have currently 226 matches on FTDNA and 115 on Ancestry. If I counted all the matches listed, and this really depends on their respective cutoffs, I have 2210 on FTDNA and 17,900 on Ancestry, which is obviously much much more.

3. My cousin tested at Ancestry and we uploaded to FTDNA, and he has only 369 matches, but they are all 4th cousin (listed as 3rd-5th cousin) or better, which is better than I have. So it must be that FTDNA only shows uploaded results to that level - perhaps if you pay to unlock the tools you'll get extra matches as well?

4. Yes, it is frustrating finding people who have no trees, or who don't seem to be interested. But even so, the sheer number testing is such an asset.

Offline sugarfizzle

  • RootsChat Veteran
  • *****
  • Posts: 593
Re: Size of DNA data bases
« Reply #3 on: Tuesday 06 March 18 07:35 GMT (UK) »
Eric, Thanks for that.

The graph says database size (4)

Clicking on (4) it does say number of people tested. But have I been tested at ftDNA? My DNA has been uploaded there, and has been tested there in one sense, but the test kit wasn't purchased from them.

As for matches, I currently have 130 4th to 6th cousins or closer at ancestry, compared to 138 3rd to 5th cousins or closer at ftDNA, so broadly similar. The number at ancestry increases almost daily, the number at ftDNA only very slowly.

At ancestry I have made definite connections to 48 testers, half of them 4th to 6th cousins, half of them 5th to 8th cousins.
By contrast I have made no definite connections at all at ftDNA, not even from 2nd to 4th cousins.

Don't ignore your 5 to 8 cousins at ancestry - use different methods for searching them, such as surname or place searches.

As you say, the country may have something to do with the number of matches at different sites; I am from the UK. Also, fewer people seem to have trees at ftDNA than at ancestry.

Regards Margaret

Online familydar

  • RootsChat Veteran
  • *****
  • Posts: 630
Re: Size of DNA data bases
« Reply #4 on: Tuesday 06 March 18 08:18 GMT (UK) »
Somewhere in the ftdna settings you can tweak the closeness of match. I can't remember their exact terminology offhand, but you can limit matches to the equivalent of very close, medium or distant. The loosest match setting will probably give you thousands, but the majority of them will be at such low cM values they're probably spurious.

It is possible these settings are only accessible to people who tested with them rather than uploaded their data. I fall into the former category, so I don't know for certain.

Jane :-)
ALLEN
BARR, BARRATT, BERRY, BRADLEY,BRAMLEY,BRISTOW,BROWN,BUGBIRD,BUTLER
CAIN,CARR,CHAPMAN,CHARLES,CH*LTON,CHESTER,COCKETT
COLLASON,COLLYER,CORKERY
DARLING, DENYER,DICKERSON,DOLLING,DURBAN
FARMER,FURNELL
GIBSON,GILES,GROOMBRIDGE
HALL,HAMBIDGE,HARMES,HART,HICKS,HILL,HOLLOWAY
JACKSON
K*AT*S
LANCASTER,LINTON
MCDONALD,MCFADEN,MEARS,MILLARD
NICOLAS,NOAK,NORTH
PARFIT,PORTER
RIPPINGALE,ROBINS
SEARLE,SPENCER,STEDHAM
TYLER,TILLY,TUCKWELL
WADE,WAGER,WALKER,WATSON,WEBB,WITHRINGTON,WOOD

Online Guy Etchells

  • RootsChat Marquessate
  • *******
  • Posts: 3,481
    • Anguline Research Archives
Re: Size of DNA data bases
« Reply #5 on: Tuesday 06 March 18 08:21 GMT (UK) »
The above figures show why DNA testing is not really worthwhile for genealogy yet.
If we ignore for the moment that data may be duplicated (for instance, MyHeritage will allow users to import their raw data from FTDNA, AncestryDNA and 23andMe) or shared between the various companies by individuals, and imagine the above figures all represent unique tests, how does their size compare to populations?

For example, the populations of various cities in 2016 were: London 8.7 million; Paris 9.7 million; and New York City 8.5 million.
The fact that any two of the above cities contain populations in excess of the combined datasets of all the DNA companies puts the claims of those companies into proportion.

If we talk about populations of countries in 2016 rather than cities, we can see that DNA datasets are still only a pinprick in the population figures and in real terms are proportionally insignificant:
UK 65.6 million; France 66.9 million; and USA 323.1 million.
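
For anyone who wants to put rough numbers on that comparison, here is a minimal Python sketch. It uses only the figures quoted in this thread and assumes, as above, that every test is unique. The vendor figures are all "greater than" values and Living DNA is unknown, so the total is a floor rather than an exact count. The names `databases_m`, `populations_m` and `share_percent` are made up for the illustration.

```python
# Database sizes in millions, as quoted earlier in the thread (all lower bounds).
databases_m = {"Ancestry": 7.0, "23andMe": 5.0, "MyHeritage": 1.0, "FTDNA": 0.7}

# 2016 populations in millions, as quoted in this post.
populations_m = {"UK": 65.6, "France": 66.9, "USA": 323.1}


def share_percent(tested_m: float, population_m: float) -> float:
    """Return tested_m as a percentage of population_m."""
    return 100.0 * tested_m / population_m


# Assumption: no duplicated or shared kits, so the sizes simply add up.
combined_m = sum(databases_m.values())
combined_population_m = sum(populations_m.values())

print(f"Combined databases: {combined_m:.1f} million")
for country, population_m in populations_m.items():
    # Ratio only: this does not claim the testers all live in that country.
    print(f"  relative to {country}: {share_percent(combined_m, population_m):.1f}%")
print(f"  relative to all three together: "
      f"{share_percent(combined_m, combined_population_m):.1f}%")
```

These are plain ratios of the quoted totals, not estimates of how many people in each country have actually tested.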

Cheers
Guy
http://anguline.co.uk/Framland/index.htm   The site that gives you facts not promises!
http://burial-inscriptions.co.uk Tombstones & Monumental Inscriptions.

As we have gained from the past, we owe the future a debt, which we pay by sharing today.

Offline Eric Hatfield

  • RootsChat Member
  • ***
  • Posts: 143
  • Sydney, Australia
Re: Size of DNA data bases
« Reply #6 on: Tuesday 06 March 18 10:21 GMT (UK) »
Thanks for your thoughts, Margaret, I am interested to explore a little more, please, some of the matters you mention.

Quote
The number at ancestry increases almost daily, the number at ftDNA only very slowly.
Have you paid to "unlock" your results and the tools at FTDNA? If not (which is the case with my cousin), then I think that is why you aren't getting many new ones, because you only see up to 3rd-5th cousins. If you have paid, then that kills that hypothesis.

Using both my kits as a comparison, I received my Ancestry results in June last year, and in the 9 months since then, my 4th cousin or better matches have grown from about 80 to 115 (i.e. +35). In the same period, my FTDNA 3rd-5th cousin matches have grown from 162 to 226 (i.e. +64), so FTDNA is doing almost twice as well.
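
The comparison above can be checked with a few lines of Python. This is only a sketch using the counts quoted in this post (the Ancestry starting figure of "about 80" is approximate), and the helper name `growth` is made up for the illustration. It reports both the absolute gain, which is what "almost twice as well" refers to, and the percentage gain on each starting count.

```python
def growth(start: int, end: int) -> tuple:
    """Return (absolute gain, percentage gain) between two match counts."""
    gain = end - start
    return gain, 100.0 * gain / start


# Counts quoted in the post; the Ancestry start of 80 is an approximation.
ancestry_gain, ancestry_pct = growth(80, 115)
ftdna_gain, ftdna_pct = growth(162, 226)

print(f"Ancestry: +{ancestry_gain} matches ({ancestry_pct:.1f}% growth)")
print(f"FTDNA:    +{ftdna_gain} matches ({ftdna_pct:.1f}% growth)")
print(f"FTDNA gain / Ancestry gain: {ftdna_gain / ancestry_gain:.2f}")
```

The two measures answer different questions: the absolute gain counts new matches to look at, while the percentage shows how fast each list is growing relative to where it started.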

Quote
At ancestry I have made definite connections to 48 testers, half of them 4th to 6th cousins, half of them 5th to 8th cousins.
By contrast I have made no definite connections at all at ftDNA, not even from 2nd to 4th cousins.
Again, my experience is different (so it is good to compare notes). My most useful match was on Ancestry, but I have had many more matches I have found helpful on FTDNA. And I find the analysis tools far better on FTDNA, but of course the trees on Ancestry are very useful - except that most matches don't have trees, at least not yet.

Quote
Don't ignore your 5 to 8 cousins at ancestry - use different methods for searching them, such as surname or place searches.
How do you use the surname or place search, and what have you learned from them? (I haven't tried this much.)

Offline Eric Hatfield

  • RootsChat Member
  • ***
  • Posts: 143
  • Sydney, Australia
Re: Size of DNA data bases
« Reply #7 on: Tuesday 06 March 18 10:24 GMT (UK) »
Hi Jane,

Quote
Somewhere in the ftdna settings you can tweak the closeness of match.  I can't remember their exact terminology offhand, but you can limit matches to the equivalent of very close, medium or distant. 
You can use that searching in the chromosome browser (available only to paying customers), though I have never bothered. In the main match list, I don't think we need that because we can sort on relationship and so see all the closest matches first.

Offline Eric Hatfield

  • RootsChat Member
  • ***
  • Posts: 143
  • Sydney, Australia
Re: Size of DNA data bases
« Reply #8 on: Tuesday 06 March 18 11:18 GMT (UK) »
Hi Guy,

Quote
The above figures show why DNA testing is not really worthwhile for genealogy yet.
You have said this before, but it simply isn't true, for most people at any rate.

Yes, the number of people who have tested is small, but consider:

1. If we consider just 4th cousins or better, which is 5 generations to the common ancestor, each tester will have 63 ancestors (except in endogamous populations). If there are 10 million testers (assuming 5 million are doubled up), then there are potentially 630 million ancestors. Now of course many of those will be multiples also, which is exactly what we want, say half of them = 315m. Virtually everyone I am connected to is in USA (330 m), UK (67m), Canada (38 m) and Australia (25m), a total of 460m. So 630m or 315m ancestors is looking pretty reasonable.

2. If we consider any one of our pairs of ancestors, the ones 5 generations back could easily have several thousands of descendants today. If we make some assumptions, for the purpose of the exercise, of how many children each couple had and how many of them had children, it is possible to make a calculation for each pair.  Assuming only 7, 6, 5, 4 and 3 children for the generations, we'd get roughly 2400 descendants for each 5th generation couple, 360 for 4th, 60 for each 3rd, 12 for each 2nd and 3 for our parents' generation. Multiply that by the number of couples in each generation and the total number of present day descendants of all our ancestors = 16 x 2400 + 8 x 360 + 4 x 60 + 2 x 12 + 1 x 3 = 41,500. If I have done the calculation correctly, that is a very approximate estimation of the total number of possible 4th cousin or better matches any of us have.  If we included out to 8th cousins as Ancestry does (which I think is not generally very useful), then the number would be absolutely enormous. So there is no shortage of potential matches. Of course I don't pretend that these figure are any more than notional, but they are illustrative.

3. And so it is no surprise that I have several hundred (only a few are repeats) 4th cousin matches on Ancestry and FTDNA out of the possible 40 thousand, or whatever the figure is, and 17 thousand Ancestry matches overall.

4. But the real proof, which you seem to have not considered, is that people are finding relatives they couldn't find any other way - adoptees, people with uncertain parentage, people whose ancestors' paper records are lost, etc. If you check out adoptee websites, you'll find plenty of success stories - and a few disappointments too!

5. In my own situation, both my maternal grandparents were of unknown origin due to an adoption, possible false names, no record of father's name, etc. DNA has enabled me to solve one of the mysteries and I have high hopes of resolving the other one day.

So a rough estimate of numbers and the real experience of many people shows that DNA is a great boon to genealogy. It doesn't solve everything of course, and it generally requires a lot of work to be done, but for many of us it is absolutely essential.
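
For anyone who wants to check or vary the notional figures in points 1 and 2, here is a minimal Python sketch. The family sizes (7, 6, 5, 4 and 3 children) are the assumptions from point 2, and the function names are made up for the illustration. Two small differences from the figures in the post are worth flagging. Counting ancestors alone over five generations gives 62; 63 is 2^6 - 1, which also counts the tester. And the exact product 7 x 6 x 5 x 4 x 3 is 2520 rather than "roughly 2400", so the total comes out nearer 43,500 than 41,500. Neither changes the order of magnitude. Like the post, the sketch makes no attempt to remove overlap between generations.

```python
from math import prod

# Children per couple, oldest generation first (the notional figures from point 2).
FAMILY_SIZES = [7, 6, 5, 4, 3]


def ancestors_within(generations: int) -> int:
    """Direct ancestors from parents (generation 1) back to `generations`."""
    return sum(2 ** g for g in range(1, generations + 1))


def descendants_per_couple(family_sizes: list) -> list:
    """Present-day descendants of one couple in each generation, oldest first.

    A couple's present-day descendants are taken to be the product of the
    family sizes from their own generation down to the parents' generation.
    """
    return [prod(family_sizes[i:]) for i in range(len(family_sizes))]


def potential_matches(family_sizes: list) -> int:
    """Sum couples x descendants over every generation, as in point 2."""
    depth = len(family_sizes)
    total = 0
    for i, descendants in enumerate(descendants_per_couple(family_sizes)):
        generations_back = depth - i            # 5 for the oldest couples, 1 for the parents
        couples = 2 ** (generations_back - 1)   # 16, 8, 4, 2, 1
        total += couples * descendants
    return total


print("Ancestors within 5 generations:", ancestors_within(5))
print("Descendants per couple:", descendants_per_couple(FAMILY_SIZES))
print("Potential 4th cousin or better matches:", potential_matches(FAMILY_SIZES))
```

Changing `FAMILY_SIZES` shows how sensitive the total is to the assumed number of children in the oldest generations.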