Includes R scripts and bash commands (mostly based on plink2) with instructions for processing real datasets in the usual ways: downloading, reformatting, merging, filtering (for biallelic autosomal loci and by MAF), LD pruning, and subsetting by subpopulation.
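As a rough illustration of the filtering, LD pruning, and subsetting steps mentioned above, here is a minimal sketch using plink2. It is not copied from any file in this repository: the function names, file prefixes, and default thresholds (MAF 0.01, a 1000 kb window, r² 0.3) are placeholders chosen for illustration, and binary plink (bed/bim/fam) input is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: prefixes and thresholds below are hypothetical placeholders,
# not values taken from this repository's instruction files.
set -euo pipefail

# Path to the plink2 binary; can be overridden, e.g. for a dry run.
PLINK2="${PLINK2:-plink2}"

# Keep biallelic autosomal loci that pass a MAF threshold.
# Args: input prefix, output prefix, MAF threshold (default 0.01, an assumption).
filter_basic() {
    local in="$1" out="$2" maf="${3:-0.01}"
    "$PLINK2" --bfile "$in" --autosome --max-alleles 2 --maf "$maf" --make-bed --out "$out"
}

# LD-prune in two passes: list the variants to keep, then extract them.
# Args: input prefix, output prefix, r^2 threshold (default 0.3, an assumption).
ld_prune() {
    local in="$1" out="$2" r2="${3:-0.3}"
    "$PLINK2" --bfile "$in" --indep-pairwise 1000kb "$r2" --out "$out"
    "$PLINK2" --bfile "$in" --extract "$out.prune.in" --make-bed --out "$out"
}

# Subset to the samples listed in a keep file (e.g. one subpopulation).
# Args: input prefix, keep file, output prefix.
subset_samples() {
    local in="$1" keep="$2" out="$3"
    "$PLINK2" --bfile "$in" --keep "$keep" --make-bed --out "$out"
}
```

A typical sequence would be `filter_basic data/raw data/filt 0.05`, then `ld_prune data/filt data/pruned`, then `subset_samples data/pruned ids.txt data/subset`, where all paths are hypothetical. Setting `PLINK2=echo` prints the arguments that would be passed instead of running plink2.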
Noteworthy instruction files:
- 1000 Genomes high coverage (NYGC) version, n = 2504
- 1000 Genomes high coverage (NYGC) version plus trios, n = 3202
- Human Genome Diversity Panel, whole genome sequencing version
- Human Origins and Pacific merged
- Allen Ancient DNA Resource
- HCHS/SOL V2 from dbGaP
Some of the remaining files are a bit of a dump and may be obsolete (e.g. other 1000 Genomes versions), but they are retained because the commands remain useful and/or serve as a reference.
This repository doesn't contain data, just code to process data.