Population Genetic Statistics for the DIPnet dataset

Added April 20, 2015 by Eric Crandall

These scripts are intended for use with the DIPnet dataset, which is scheduled to be finished on May 1, 2015. We plan to eventually convert them into an R package that talks to the DIPnet database, as stored in the FIMS framework, to calculate multispecies population genetic statistics. Currently, all of the functions are stored in DIPnet_Stats_Functions.R, and a script that drives those functions is in DIPnet_Stats_Script.R.

Change the paths in config_example.R so that they point to your local copies of the DIPnet DB and spatial DB, then save that file as config.R. Similarly, you can create an empty directory "/output" and save whatever results you get to it. These files won't be synced (pushed or pulled) with GitHub because they are included in the .gitignore file.
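The setup steps can be sketched in R roughly as follows. This is only an illustration: it assumes the working directory is the top level of the repository and that "/output" refers to a directory named output inside it. The paths inside config.R still need to be edited by hand afterwards.

```r
# Sketch of the one-time setup, assuming the working directory is the
# repository root (an assumption, not something the scripts enforce).

# Copy the example config to config.R without overwriting an existing copy.
if (file.exists("config_example.R") && !file.exists("config.R")) {
  file.copy("config_example.R", "config.R")
}

# Create the (git-ignored) output directory if it is not there yet.
if (!dir.exists("output")) {
  dir.create("output")
}
```

After this, open config.R and point its paths at your local DIPnet DB and spatial DB.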

There are major functions for looping over the whole database and calculating:

  1. Genetic diversity statistics

    1. Unique Haplotype Number
    2. Haplotype Diversity
    3. Shannon-Wiener Diversity
    4. Effective number of haplotypes
    5. Local Fst
    6. Nucleotide Diversity (per site and per locus)
    7. Watterson's ThetaS
    8. Tajima's D
    9. Coverage (Chao and Jost 2012)
    10. Coverage standardized haplotype diversity
  2. Genetic structure statistics

    1. Nei's Fst
    2. Nei's Gst
    3. Hedrick's G'st
    4. Jost's D
    5. Weir & Cockerham's Theta
    6. Excoffier et al. PhiST
    7. Chi-Square Test of Population Differentiation
    8. Nei's Da
  3. Hierarchical Structure using AMOVA (up to 3 levels currently supported)
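
To make a few of the statistics listed above concrete, here is a minimal base-R sketch. It is not taken from DIPnet_Stats_Functions.R: the function names `haplotype_stats` and `differentiation_stats` are hypothetical, the effective number of haplotypes is assumed to be the inverse of the summed squared frequencies, and Gst and Jost's D are computed in their simple uncorrected forms with populations weighted equally. The actual scripts may use different estimators.

```r
# Hypothetical helpers for illustration; not part of DIPnet_Stats_Functions.R.

# Diversity statistics for one population, given a vector of haplotype labels.
haplotype_stats <- function(haps) {
  n <- length(haps)
  p <- as.numeric(table(haps)) / n            # observed haplotype frequencies
  list(
    unique_haplotypes    = length(p),
    # Haplotype diversity with the n / (n - 1) sample-size correction
    haplotype_diversity  = (n / (n - 1)) * (1 - sum(p^2)),
    shannon_wiener       = -sum(p * log(p)),
    # Assumption: effective number = inverse of summed squared frequencies
    effective_haplotypes = 1 / sum(p^2)
  )
}

# Uncorrected Gst and Jost's D from a list of haplotype vectors,
# one vector per population, with populations weighted equally.
differentiation_stats <- function(pops) {
  k <- length(pops)
  all_haps <- sort(unique(unlist(pops)))
  freqs <- sapply(pops, function(h) {
    as.numeric(table(factor(h, levels = all_haps))) / length(h)
  })
  # Force a haplotype-by-population matrix even if only one haplotype exists
  freqs <- matrix(freqs, nrow = length(all_haps))
  hs <- mean(1 - colSums(freqs^2))            # mean within-population diversity
  ht <- 1 - sum(rowMeans(freqs)^2)            # diversity of the pooled frequencies
  list(
    gst    = (ht - hs) / ht,
    jost_d = ((ht - hs) / (1 - hs)) * (k / (k - 1))
  )
}
```

For example, `haplotype_stats(c("A", "A", "B", "C"))` summarizes a sample of four sequences with three haplotypes, and `differentiation_stats(list(c("A", "A"), c("B", "B")))` compares two populations that share no haplotypes.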