Thursday, June 16, 2011

Top 10 things to do with your FTDNA raw data

So in the latest installment of "Lindsay needs to get a life and to stop playing with her FF results" (aka my series on the Family Finder DNA test)...

I want to share some cool tools and programs I've discovered that are not only FREE but can give you even more insight into both your DNA and that of your biological father.

So this is my Top 10 list of tools, programs, and manipulations that can enhance your DNA testing experience.

But first you need to download your raw data. I've created a very simple tutorial here on how to do this.


1. Eurogenes Biogeographic Ancestry Project
This project is one of several ADMIXTURE projects out there, run by individuals who take raw FTDNA/23andMe results and attempt to place them (in relation to the other individuals in the project) with a breakdown of their deep ancestry.

If you are interested in submitting your raw samples to this project, email the unzipped (.csv) files of both your autosomal and X-DNA raw data to eurogenesblog [at] hotmail [dot] com, and include your known ancestry (at least the ethnic groups of your 4 grandparents - if able).  They will respond with an ID # that you will use later to determine where your results lie on the BGA charts.

I just joined this project and I am "US238" so hopefully I will show up in some upcoming BGA charts.

2. Doug McDonald's BGA Analysis
Doug McDonald is a Chemistry professor at University of Illinois.  In his spare time he takes raw data from FF and 23andMe and does "chromosome painting".  This is a process (using a software program he developed) that can identify regions of your DNA that correlate to specific ethnic groups.  This is a great addition to FF's Population Finder, because McDonald's analysis can help to narrow down specific ethnic groups where the PF has vague results.  It can be especially helpful if you have a tiny bit of, say, African, Asian, or Middle Eastern in your PF results and want to narrow down exactly what ethnic group it comes from.  It can also be helpful because the PF has difficulty differentiating between individuals whose ancestors were from Southern Europe, the Middle East, and North Africa.

If you are interested in having a free BGA analysis, Doug's contact information can be found here.  He only responds to email requests and often has a backlog and may not be taking on new people, so be prepared to wait for your results.

UPDATE [6/22/2011]: I received my BGA analysis from Dr. McDonald in about 36 hours from when I sent him my files, so it appears that currently he has no backlog.


3. Dienekes Ponikos' Dodecad Ancestry Project
The Dodecad Project is another ADMIXTURE program like Eurogenes.  The focus of this project is on under-represented European ethnic groups (mostly southern European and Middle Eastern, and some Scandinavian).

This project currently has an open-ended submission opportunity for 23andMe and FF results.  However, it is limited to individuals of European, Asian, or North African ancestry whose 4 grandparents are all from the same European, Asian, or North African ethnic group or country.

Dienekes has an older program called EURO-DNA-CALC that he himself does not support anymore (it was superseded by the Dodecad Project), but it's there for anyone interested.  It's an old program and has some very notable errors.  The main one is that it is unable to distinguish many Southern Europeans from Ashkenazi Jewish populations, so many individuals get over-inflated or downright wrong estimates of Jewish heritage.  For example, my results on this test came out to:
65% Northwestern European
18% Southeastern European
17% Ashkenazi Jewish
So it's reading my Armenian DNA as Jewish because it's unable to differentiate between Mediterranean/Middle Eastern and Jewish ancestry.


4. Y-Search
I know that I have discussed the Y-Search database previously.  This (along with its mtDNA counterpart, MitoSearch) is the one raw data tool here that is not for autosomal results.  It is a free public database where individuals who have tested their Y-DNA with different companies can upload their raw data and find matches.

5. GEDMatch
GEDMatch is the autosomal DNA equivalent of Y-Search and MitoSearch.  It is a free program that you upload your results to, and it provides your "matches" and their email addresses.  The only caution with this program is that its thresholds are much lower than those of FTDNA and 23andMe, so take results with a grain of salt.  If there is no number under the "Gen" column, it's probably not a viable match.

This is a great opportunity to see if there are potential matches who may have tested with 23andMe.  There is also a triangulation function, where after uploading your FTDNA matches you can see what matches overlap.  The other plus is that you can click on a match and see what their results are, so it's simple to identify matches in common.

6. Converting FF data to 23andMe format
From some of the extremely helpful members of the FTDNA Forums, here's how you convert your FF raw data to the 23andMe format:
# Your autosomal Family Finder data has a csv.gz extension, i.e., it is a comma-delimited, GZIP-compressed file. Use a suitable program (e.g., WinRAR or WinZip) to extract the csv file into the same directory as in step #1.
# Open the csv file in any text editor (Word or Wordpad should work fine).
# Remove the header (RSID,CHROMOSOME,POSITION,RESULT) at the top of the file.
# Replace all quotes (") with nothing.
# Replace all commas (,) with tabs.
# Replace all missing value characters (-) with the character m.
# Save the file in the same directory as 23andme.txt. You've just converted your Family Finder data into a format that mimics that of 23andMe.
This is necessary for some of the programs, such as Enlis and Genomera, that do not work in the FTDNA format.
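
For anyone who would rather not do the find-and-replace by hand, the steps above can be sketched as a short Python script. This is only a rough sketch and not something from the FTDNA Forums: the function name convert_ff_to_23andme and the file names are made up for illustration. It assumes the file has the four columns named in the header (RSID, CHROMOSOME, POSITION, RESULT) and that missing values only show up in the RESULT column.

```python
import csv
import gzip


def convert_ff_to_23andme(src_path, dest_path):
    """Rewrite a Family Finder csv.gz file as tab-separated text.

    Mirrors the manual steps: drop the header, strip quotes, turn commas
    into tabs, and replace each missing-value character (-) with m.
    Returns the number of SNP rows written.
    """
    rows_written = 0
    # gzip.open in text mode decompresses on the fly, so no separate unzip step.
    with gzip.open(src_path, "rt", newline="") as src, \
            open(dest_path, "w", newline="") as dest:
        # csv.reader removes the surrounding quotes from each field.
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            if row[0].strip().upper() == "RSID":
                continue  # skip the header line
            # Assumption: only the last column (RESULT) holds missing values.
            row[-1] = row[-1].replace("-", "m")
            dest.write("\t".join(row) + "\n")
            rows_written += 1
    return rows_written
```

You would call it with something like convert_ff_to_23andme("my_ff_data.csv.gz", "23andme.txt"), where the first file name is a placeholder for whatever your download is called. It is worth opening the output and spot-checking a few lines before feeding it to another program.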

7. Interpretome
Interpretome is the newest tool listed here, literally just released in the past few weeks.  It's another program that was designed for 23andMe customers, but some aspects will work with FTDNA raw data (notably the "Ancestry" section).

There are two tools within Interpretome, under the Ancestry heading, that are really great.  You just upload your raw data and select the population group that most likely resembles you.

First is chromosome painting, though with the limited sample sets most people are going to come up as 100% European.

The second tool, and the one I found the most useful (though I'm still learning how to use it), is the PCA - Principal Component Analysis.  You can select different sample sets (World, Asian, African, European, and Middle Eastern/Jewish) and it will map your data onto the data of that sample set.  You can also select how many SNPs are looked at.  1,000 will give you the fastest results, but 100,000 is the most detailed and likely the most accurate.  The best "dimensions" to use for Europeans are:
X-axis: PC4
Y-axis: PC1

This will give results that are very geographically representative of Europe.  But I'm still not 100% sure what the dimensions are for and what they do.

It seems that the best sample set is the POPRES: European set.  But it's far from a perfect program.  My world results peg me in-between Near Eastern (okay, that one makes sense) and African (woah?!?).  Not sure where the 75% European part of my DNA went.

8. Promethease
If you're contemplating doing the 23andMe test solely for the medical information, let me let you in on a little secret.  Promethease is a free downloadable program through SNPedia.  You upload your raw data to it and it provides you with your very own website listing many of your SNPs that have been linked in various studies to medical conditions and other traits (some of those studies are not very reliable b/c of small sample size or the wrong population group).  Some I've found are dead on.  Others I wonder as to their accuracy.  But it's a fun tool to play around with.  For $2 the processing time goes from 2-4 hours to 10 minutes and some special features are unlocked.

The Mac version of the program worked just fine for me, but I don't know how the PC version is.

FTDNA has fixed their autosomal raw data so that it works fine in Promethease, but the X-DNA raw data is still in the wrong format and will display incorrect results.  Just FYI.

9. David Pike's Runs of Homozygosity (ROHs) calculator
Wonder if your mom and biological father might be distantly related?  Looking at Runs of Homozygosity can give you an idea as to how much of your DNA is the same from both your maternal and paternal side.

This program must be run on Firefox.

NOTE: Do not look at the % homozygosity.  Most individuals are around 70% homozygous.  Instead, look at the top of your report, which lists chromosomes with the # of SNPs and Mb size of "runs" of homozygous SNPs in a row.  This is what is important.  If you have runs of SNPs that are 5, 10, 15Mb (megabases) long, THAT indicates that your parents perhaps are from similar ethnic backgrounds or similar ancestral makeups and may be very distantly related.

My longest "run" was only 5Mb, and all the rest were under 2, so that suggests (as I already pretty much knew) that my mother and biological father are from very different ethnic backgrounds.
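
To make the idea of a "run" concrete, here is a minimal Python sketch that counts stretches of consecutive homozygous SNPs. It is not David Pike's code and will not reproduce his report. The function name find_runs, the min_mb and min_snps cutoffs, and the choice to skip no-calls (rather than let them end a run) are all assumptions made for illustration.

```python
def find_runs(snps, min_mb=1.0, min_snps=2):
    """Find runs of consecutive homozygous SNPs.

    snps: iterable of (chromosome, position, genotype) tuples, already
          sorted by chromosome and then by position.
    Returns a list of (chromosome, start, end, snp_count, length_mb).
    """
    runs = []
    current = None  # [chromosome, start_position, end_position, snp_count]

    def close(run):
        # Keep a finished run only if it passes both cutoffs.
        if run is None:
            return
        length_mb = (run[2] - run[1]) / 1_000_000
        if run[3] >= min_snps and length_mb >= min_mb:
            runs.append((run[0], run[1], run[2], run[3], length_mb))

    for chrom, pos, genotype in snps:
        # A run never continues across chromosomes.
        if current is not None and chrom != current[0]:
            close(current)
            current = None
        # Assumption: no-calls (e.g. "--") are skipped and do not end a run.
        if len(genotype) != 2 or "-" in genotype:
            continue
        if genotype[0] == genotype[1]:
            # Homozygous call: start a new run or extend the current one.
            if current is None:
                current = [chrom, pos, pos, 1]
            else:
                current[2] = pos
                current[3] += 1
        else:
            # Heterozygous call: the run (if any) ends here.
            close(current)
            current = None
    close(current)
    return runs
```

Given rows read from a raw data file, something like find_runs(rows, min_mb=5) would list only the runs of 5Mb or longer, which are the ones the note above says to pay attention to.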

10. Enlis and Genomera
And lastly rounding out the top 10, I confess, I am adding this because I needed 10 for my top 10.  These two programs are the only ones I have not either researched or participated in, simply because I have not had the time to format my FTDNA raw data to 23andMe format.  Also, I'm not clear if they are only free demos and then you have to pay.  I've heard mixed reviews about these programs, so at least for now these are my least recommended (along with the EURO-DNA-CALC program).

6 comments:

Anonymous said...

Lindsay,

The New York Times today had an op/ed you may be interested in reading, "A Father’s Day Plea to Sperm Donors."

I'm surprised the mainstream media would run something like this, since it's usually only the mother's perspective they let through. Things are changing.

- Days of Broken Arrows

Anonymous said...

Fantastic. I just submitted my Family Finder + mtDNA kit, and I'm in batch 417. Great tips for when I get my results. I'm an adoptee, not DC.

Anonymous said...

Best wishes in your search. I'm adopted to my father and have been searching for years. It's a hack of an emotional rollercoaster ride.

MEJ said...

interesting finding you, i hope you find your roots!

MEJ said...

good luck finding your roots, i hope it works out, your situation reminds me of the who song, trick of the light

C Miller said...

I used the ROH tool to analyze my raw data from Ancestry. I got several very long runs - 6.07 MB (on Ch20), 13.75 MB, 34.78 MB, 14.95 MB, 6.19 MB, 28.43 MB, 11.55 MB, 9.36 MB, 8.74 MB (all on ChX), and 56.23 MB (on ChY). Any idea what this means?