Dear All
I am wondering where I can download the hg19 reference files. I need to map my Illumina reads to hg19 using BWA.
Any help would be appreciated.
Thanks, Thaley, I just found that page too. Here is a question: how do I use twoBitToFa to convert hg19.2bit to hg19.fa?
I just tried
./twoBitToFa hg19.2bit hg19.fa
but it said "Floating point exception".
Hmm... Did you follow the directions on UCSC for the tool (build the source, etc.)?
Honestly, I got my references in .fa format before they started using this 2bit format. Sorry I can't be more help.
Offhand, I would double-check that the downloaded file is not truncated and that the twoBitToFa source built successfully.
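One quick sanity check on the download is to look at the first four bytes of the file. A .2bit file starts with the signature 0x1A412743, written in the byte order of the machine that created it. The sketch below is only a minimal check under that assumption; the function name twobit_signature is made up for illustration, and it inspects the header only. A file cut off at the end would still pass, so comparing the file size or a checksum against whatever the download site publishes is still worthwhile.

```shell
# Report whether a file starts with the .2bit signature 0x1A412743.
# Prints "little-endian", "big-endian", or "not-2bit".
# twobit_signature is a hypothetical helper name, not part of the UCSC tools.
twobit_signature() {
    # Read the first 4 bytes as hex and strip spaces and newlines.
    magic=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    case "$magic" in
        4327411a) echo "little-endian" ;;
        1a412743) echo "big-endian" ;;
        *)        echo "not-2bit" ;;
    esac
}

# Example use (file name as in the post above):
#   twobit_signature hg19.2bit
```

If this prints "not-2bit", the download is probably damaged or is not a .2bit file at all (for example, an HTML error page saved under that name).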
...or if someone knows of an alternate location to get the .fa files, that would be the easiest.
I used the 1000 genomes hg19 reference sequence from:
ftp://ftp.sanger.ac.uk/pub/1000genom...k_v37.fasta.gz
It already has the haplotype chromosomes removed.
mard:
Thanks for your response! Is this 1000 Genomes hg19 reference sequence different from the one from UCSC? All the files I have been using were downloaded from UCSC, and I hope there are no discrepancies between these different versions of hg19.
Thanks
Hi cliff,
according to
ftp://ftp.1000genomes.ebi.ac.uk/vol1...k_v37.fasta.gz
the 1000 genomes hg19 reference was built as follows:
Quote:
10th October 2009
Here are the steps used to produce this version of the human reference sequence to be used for the
main production project of the 1000 Genomes.
1. Download individual chrs from ensembl ftp
ftp://ftp.ensembl.org/pub/current_fa...o_sapiens/dna/
2. Download the newer version of the MT (NC_012920) from:
http://www.ncbi.nlm.nih.gov/nuccore/251831106
3. Create a reference with chrs1-22, X, Y, NC_012920 MT, and include the non-chromosomal supercontigs. The new single fasta is posted:
ftp://ftp.sanger.ac.uk/pub/1000genom...ect_reference/
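(Not part of the quoted README: the three steps above amount to concatenating per-sequence FASTA files in a fixed order. A rough sketch follows, assuming a hypothetical local layout with one file per sequence named 1.fa through 22.fa, X.fa, Y.fa, MT.fa and supercontigs.fa. The file names and the function name build_reference are illustrative assumptions, not what the 1000 Genomes project actually ran.)

```shell
# Concatenate per-sequence FASTA files into one reference, in the order
# chromosomes 1-22, X, Y, MT, then the non-chromosomal supercontigs.
# The directory layout and file names are assumptions for illustration.
build_reference() {
    dir=$1
    out=$2
    : > "$out"    # start with an empty output file
    for name in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 \
                X Y MT supercontigs; do
        # Stop at the first missing or unreadable input file.
        cat "$dir/$name.fa" >> "$out" || return 1
    done
}

# Example use (paths are hypothetical):
#   build_reference ./per_chromosome combined_reference.fa
```

Keeping the order fixed matters because the index and every downstream alignment file inherit the sequence order from the FASTA.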
Note on chrM
Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS) in the next human assembly release.
Apart from UCSC's older version of the mitochondrial sequence and the haplotype chromosomes that UCSC includes, the 1000 Genomes reference should be identical to the UCSC one.
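If you want to check this yourself, one simple approach is to list the name and length of every sequence in each FASTA file and compare the two lists. Below is a minimal sketch; the function name fasta_lengths and the file names in the usage comment are placeholders. Note that matching lengths only suggest the sequences are the same, they do not prove it, and the sequence names themselves may differ between the two sources (for example, with or without a "chr" prefix).

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
# The name is the first whitespace-delimited word after ">".
# fasta_lengths is a hypothetical helper name.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }      # add up bases on sequence lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example use (file names are placeholders):
#   fasta_lengths ucsc_hg19.fa > ucsc.lengths
#   fasta_lengths g1k_reference.fa > g1k.lengths
#   diff ucsc.lengths g1k.lengths
```

Any chromosome whose length differs between the two lists (the mitochondrion should be one) is worth a closer look.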
Cheers