Reference Genomes
In this module, we will learn:
- what a reference genome is and what it contains
- details about the FASTA and GTF formats
- to appreciate the differences in gene identifiers
- how to download a reference genome
Differential Expression Workflow
Here we will set the stage for the next steps by discussing reference
genomes, which are integral to genome alignments and gene/isoform
quantification. Along the way we will touch on some things to be aware
of.
Reference Genomes
A reference genome consists of the reference
sequence and, optionally, any number of genomic
annotations that describe attributes of that sequence.
Examples of annotations include:
- Gene models consisting of the location and other information about
genes.
- Variants consisting of the location of common or rare genetic
variants, their alleles, and frequencies.
- Small RNAs consisting of the location and other information about
various types of small RNAs.
Of particular relevance to us for this workshop are the reference
sequence and gene models.
Reference Sequence
Reference sequence is stored in FASTA files. They
are similar to FASTQ files in their storage of sequence information, but
the format differs in a couple of ways:
- Each record begins with a header line starting with > instead of @.
- Only the sequence is stored in a FASTA file; there is no notion of
quality attached to the nucleotides.
>chrM
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAAAATTTCCACCA
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
TTTATCTTTAGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACA
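As a rough illustration of the record structure described above, here is a minimal Python sketch that reads FASTA text into name/sequence pairs. The function name `read_fasta` is our own, not part of any library, and the example uses only the first two sequence lines of the chrM record shown above.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

example = io.StringIO(
    ">chrM\n"
    "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT\n"
    "TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG\n"
)
records = dict(read_fasta(example))
for name, seq in records.items():
    print(name, len(seq))
```

Sequence lines are joined without the line breaks, so the wrapping width of the file does not affect the stored sequence.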
Gene Models
Well-characterized organisms (e.g. human, mouse, zebrafish) have
fairly mature gene models. These are stored in GTF
format, which gives location and other information about each gene
feature. Below are two examples:
chr1 unknown exon 11874 12227 . + . gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS16932";
chr1 unknown exon 12613 12721 . + . gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS16932";
chr1 unknown exon 13221 14409 . + . gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS16932";
chr1 unknown exon 14362 14829 . - . gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS8568";
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; tag "basic"; transcript_support_level "1";
1 havana exon 11869 12227 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
The GTF format stores specific information in each column:
| Column | Description |
| :----: | ----------- |
| 1 | Chromosome |
| 2 | Source, e.g. ensembl, havana |
| 3 | Gene feature, e.g. exon, intron, mRNA, transcript |
| 4 | Start location, 1-based |
| 5 | End location, 1-based |
| 6 | Score |
| 7 | Strand |
| 8 | Frame, relating to codons |
| 9 | Attribute, a semicolon separated list of key/value pairs giving additional information about the feature. |
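To make the column layout concrete, the following Python sketch splits one GTF line into its nine tab-separated columns and turns the attribute column into a dictionary. The function name `parse_gtf_line` and the dictionary keys are illustrative choices of ours. The attribute handling is deliberately simple: it assumes no quoted value contains a semicolon.

```python
def parse_gtf_line(line):
    """Split one GTF line into its nine columns plus a dict of attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    chrom, source, feature, start, end, score, strand, frame, attr_text = fields

    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # A repeated key (e.g. several "tag" entries) keeps only the last value here.
        attributes[key] = value.strip().strip('"')

    return {
        "chrom": chrom, "source": source, "feature": feature,
        "start": int(start), "end": int(end),  # both 1-based, inclusive
        "score": score, "strand": strand, "frame": frame,
        "attributes": attributes,
    }

# First exon line from the first GTF example above, rebuilt with explicit tabs.
example_line = "\t".join([
    "chr1", "unknown", "exon", "11874", "12227", ".", "+", ".",
    'gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS16932";',
])
exon = parse_gtf_line(example_line)
print(exon["chrom"], exon["start"], exon["end"], exon["attributes"]["transcript_id"])
```

Because start and end are both 1-based and inclusive, the feature length is `end - start + 1`.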
Minutiae, Very Briefly
Bioinformatics is a relatively new, fast-changing field, and its data
standards and formats are no different. Consequently, there are some
oddities and tedious items of note which we would like to touch on only
briefly here.
Genome Builds
On occasion new reference genomes are released, and the genome build
number changes. You may be familiar with the UCSC manner of naming human
genome builds: hg18, hg19, hg38. ENSEMBL, naturally, has its own way
of referring to the same builds: NCBI36, GRCh37, and GRCh38. Notice that
with the most recent human reference, the numbering now aligns between
UCSC and ENSEMBL.
Different organisms have their own versioning.
Gene IDs
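The two GTF examples above highlight different ways of referring to
the same gene. In the first GTF we see:
- DDX11L1, the gene symbol, controlled by the HUGO Gene Nomenclature
Committee (HGNC)
- NR_046018, the RefSeq transcript ID
And in the second GTF we see: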
- ENSG00000223972, the ENSEMBL gene ID
- DDX11L1, the gene symbol, thankfully the same
- ENST00000456328, the ENSEMBL transcript ID
Translating between different gene IDs is possible, as we will see in
Day Two with biomaRt. But in terms of best
practice it is generally a good idea to avoid using the gene
symbol as the primary gene identifier because not everyone refers to the
same gene by the same symbol.
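As a small illustration of the identifier differences above, this Python sketch collects a gene_id to gene_name lookup from GTF lines and then groups the IDs by symbol. It is a simplified stand-in, not the biomaRt approach used on Day Two. The helper name `gene_id_to_symbol` is hypothetical, and the two input lines are shortened versions of the GTF examples above.

```python
import re
from collections import defaultdict

# Matches attribute pairs written as: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')

def gene_id_to_symbol(gtf_lines):
    """Map each gene_id to its gene_name (empty string if absent)."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip anything that is not a 9-column GTF record
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "gene_id" in attrs:
            mapping[attrs["gene_id"]] = attrs.get("gene_name", "")
    return mapping

lines = [
    'chr1\tunknown\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018";',
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1";',
]
id_to_symbol = gene_id_to_symbol(lines)

# Group the primary IDs by symbol to see how the two sources differ.
symbol_to_ids = defaultdict(set)
for gene_id, symbol in id_to_symbol.items():
    symbol_to_ids[symbol].add(gene_id)
print(dict(symbol_to_ids))
```

Here the first source uses the symbol itself as its gene_id while the second uses a stable ENSEMBL ID, so tables keyed on gene_id from the two sources would not join without a translation step.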
Getting a Reference Genome
The Illumina
iGenomes resource is one of the easiest and most comprehensive
ways to download a reference genome. iGenomes includes both the
reference sequence and gene models.
Reference genomes can be very large, depending on
the organism, and so we will not download one to the Amazon instance we
are using for this workshop. We’ve included instructions for downloading
them in case you want a copy on the server where you intend
to later do a similar RNA-seq analysis (e.g. on a high-performance
computing cluster such as GreatLakes).
How would I download references with iGenomes?
As noted, it’s not recommended to download the iGenomes references to
the AWS instance. However, if you wanted to know in general how you
would do that, the process is described here.
First go to the iGenomes
page, find the build you want from the source you want, right-click the
genome build you want to download, and select “Copy link location”.
Then on the remote server you would go to the directory you’d like to
download the genome to and type (that URL is what we copied):
$ wget http://igenomes.illumina.com.s3-website-us-east-1.amazonaws.com/Homo_sapiens/NCBI/GRCh38/Homo_sapiens_NCBI_GRCh38.tar.gz
After the download finishes (it may take a while, as the archive is tens
of GB in size), you can unpack it with:
$ tar -xf Homo_sapiens_NCBI_GRCh38.tar.gz
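Once an archive like this is unpacked, the next practical step is usually finding the FASTA and GTF files inside it. The Python sketch below searches a directory tree for them by file extension. The function name `find_reference_files` is our own, and the demo directory and file names are invented for illustration, not taken from the actual archive.

```python
import tempfile
from pathlib import Path

def find_reference_files(root):
    """Return (fasta_paths, gtf_paths) found anywhere beneath root."""
    root = Path(root)
    fastas = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".fa", ".fasta"}
    )
    gtfs = sorted(p for p in root.rglob("*.gtf") if p.is_file())
    return fastas, gtfs

# Demo on a throwaway directory; the folder and file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "demo_reference"
    (base / "Sequence").mkdir(parents=True)
    (base / "Annotation").mkdir()
    (base / "Sequence" / "genome.fa").write_text(">chrM\nGATCACAGGT\n")
    (base / "Annotation" / "genes.gtf").write_text("")
    fastas, gtfs = find_reference_files(base)
    found_fastas = [p.relative_to(base).as_posix() for p in fastas]
    found_gtfs = [p.relative_to(base).as_posix() for p in gtfs]

print(found_fastas, found_gtfs)
```

On a real download you would pass the path of the unpacked directory instead of the temporary one.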
Which Reference is Right for Me?
The key is to be consistent in your research. Switching from ENSEMBL
to UCSC will create many headaches because of the change in gene
identifiers and differences in the gene models themselves. People often
choose the source they’re most comfortable with, which is frequently a
matter of historical accident. Try not to overthink it.
Another important note is not to mix the sources. If you download
reference sequence from UCSC, don’t use an ENSEMBL GTF (and vice versa).
One of the quirky differences between the two databases is that ENSEMBL
refers to chromosomes only by their number, e.g. 1, whereas
UCSC refers to chromosomes as chr1. This makes reference
FASTAs from one source incompatible with gene models from another.
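A quick way to catch this kind of mismatch before running an aligner is to compare the sequence names in the FASTA with the first column of the GTF. The following Python sketch does that on small in-memory examples. The function names are hypothetical, and real files would be read line by line from disk instead.

```python
def fasta_sequence_names(lines):
    """Collect the first word of every FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names

def gtf_sequence_names(lines):
    """Collect column 1 of every non-comment, non-empty GTF line."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names

def compare_naming(fasta_lines, gtf_lines):
    """Return (names shared by both files, GTF names missing from the FASTA)."""
    fasta = fasta_sequence_names(fasta_lines)
    gtf = gtf_sequence_names(gtf_lines)
    return fasta & gtf, gtf - fasta

# UCSC-style FASTA headers paired with an ENSEMBL-style GTF line.
ucsc_fasta = [">chr1", "NNNNNNNNNN", ">chrM", "GATCACAGGT"]
ensembl_gtf = ['1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972";']

shared, missing = compare_naming(ucsc_fasta, ensembl_gtf)
if not shared:
    print("No chromosome names in common; GTF names not in FASTA:", sorted(missing))
```

An empty set of shared names is a strong hint that the sequence and the gene models came from different sources.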
These materials have been adapted and extended from materials created
by the Harvard Chan
Bioinformatics Core (HBC). These are open access materials
distributed under the terms of the Creative Commons
Attribution license (CC BY 4.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original
author and source are credited.