
MapSNPs annotation summary report explained

This appendix describes the MapSNPs annotation summary report. MapSNPs, a genomic SNP annotation tool, is part of the PolyPhen-2 Batch query web service. Whenever you submit genomic SNPs as chromosome coordinates and alleles, a report in the format described below appears under the SNPs link on the Batch query results web page. The report is a plain-text, tab-separated file in which each line annotates the protein sequence variant (amino acid residue substitution) corresponding to a missense allelic variant found in the input.

Columns 27-40 will contain “?” placeholders for SNPs annotated as non-coding; columns 41-45 will have values only for SNPs annotated in dbSNP build 138.
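As a minimal sketch of reading one report line under the conventions above, the Python snippet below splits a line on tabs and converts the “?” placeholders in columns 27-40 to `None`. The function name `parse_line` and the stripping of surrounding whitespace are assumptions made for illustration; they are not part of MapSNPs. The conversion is deliberately limited to columns 27-40, because some other columns use “?” as a meaningful value.

```python
# Hypothetical helper for illustration; not part of MapSNPs or PolyPhen-2.
CODING_COLUMNS = range(27, 41)  # columns 27-40, numbered from 1 as in the report


def parse_line(line):
    """Split one tab-separated report line into a list of field values.

    "?" placeholders in columns 27-40 become None; every other field is
    returned as a string with surrounding whitespace removed (an assumption).
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    for col in CODING_COLUMNS:
        idx = col - 1  # convert the 1-based column number to a 0-based index
        if idx < len(fields) and fields[idx] == "?":
            fields[idx] = None
    return fields
```

A caller would typically apply `parse_line` to each line of a saved pph2-snps.txt file and then test `fields[26] is None` to detect a row with no coding annotation.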

Note: MapSNPs, when run as part of the PolyPhen-2 Batch query web service, filters the SNP annotations in its output according to the SNP functional categories selected in the Annotations menu, under the Advanced Options section of the input form. Selecting All disables filtering, so that annotations for all SNP categories are reported in the pph2-snps.txt file. However, PolyPhen-2 predictions (reported in the pph2-short.txt and pph2-full.txt files) are produced for missense SNPs only, regardless of the Annotations option selected.
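The category filter described in this note can be approximated offline on a report that was generated with All selected. The sketch below keeps only the rows whose functional category (the `type` field, column 12 in the table that follows) belongs to a chosen set. The function name `filter_by_type` and the list-of-lists row representation are assumptions for illustration; this is not how the web service itself implements the filter.

```python
# Hypothetical helper for illustration; not part of MapSNPs or PolyPhen-2.
TYPE_COLUMN = 12  # 1-based column number of the SNP functional category


def filter_by_type(rows, wanted):
    """Return the rows whose functional category is in `wanted`.

    `rows` is an iterable of field lists, one list per report line.
    `wanted` is any iterable of category names such as "missense".
    """
    wanted = set(wanted)
    return [row for row in rows if row[TYPE_COLUMN - 1].strip() in wanted]
```

For example, `filter_by_type(rows, {"missense", "nonsense"})` would retain only missense and nonsense annotations from previously parsed rows.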

Column No.  Column Name  Description
1 query_no input query ordinal
2 snp_pos input SNP chromosome:position (chromosome coordinates are 1-based)
3 str transcript strand (“+” or “-”)
4 gene gene symbol
5 transcript UCSC transcript name (unique identifier)
6 canon UCSC knownCanonical representative transcript flag: 1  -  canonical, 0  -  alternative
7 cid UCSC knownCanonical cluster identifier (number)
8 txcov transcript coverage: the number of transcripts in the UCSC cluster overlapping the mutation position / total number of transcripts in the cluster
9 ccds CCDS cluster identifier
10 cciden CCDS CDS similarity level by genomic overlap with the corresponding UCSC knownGene transcript
11 refa reference allele / variant allele (“+” strand)
12 type SNP functional category (“coding-synon”, “intron”, “stop-loss”, “nonsense”, “missense”, “splice-5”, “splice-3”, “utr-5”, “utr-3”)
13 ntlen full transcript length (number of nucleotides)
14 ntpos mutation position in the full transcript nucleotide sequence (in the direction of transcription)
15 nt1 reference nucleotide (transcript strand)
16 nt2 variant nucleotide (transcript strand)
17 PtGgPaNl orthologous alleles in chimp (Pt), gorilla (Gg), orangutan (Pa) and gibbon (Nl) if different from the human reference allele, ?  -  otherwise; .  -  data not available
18 dref putative derived allele found in human reference, score: 0  -  no evidence, 1  -  variant allele matches orthologous ancestral allele, 2  -  dbSNP minor allele matches reference allele, 3  -  both dbSNP and orthologous evidence present, ?  -  not enough evidence to score
19 gerprs Genomic Evolutionary Rate Profiling (GERP++) position-specific conservation score, RS; 0  -  when alignment coverage is insufficient
20 phylop conservation scoring by phyloP (phylogenetic p-values) from the PHAST package for multiple alignments of 99 vertebrate genomes to the human genome
21 flanks nucleotides flanking the mutation position in the transcript sequence, enumerated in the direction of transcription (5'→3')
22 trv transversion mutation flag: 0  -  transition, 1  -  transversion
23 CpG CpG context: 0  -  non-CpG context retained, 1  -  mutation removes CpG site, 2  -  mutation creates new CpG site, 3  -  CpG context retained: C(C/G)G substitution
24 JXdon distance from mutation position to the nearest donor exon / intron junction (“-” for upstream, “+” for downstream)
25 JXacc distance from mutation position to the nearest acceptor intron / exon junction (“-” for upstream, “+” for downstream)
26 JXc mutation in a codon that is split across two exons: ?  -  no, 1  -  yes
27 exon number of the exon containing the mutation / total number of exons (exons are enumerated in the direction of transcription)
28 cexon same as above but only coding (CDS) exons are being enumerated
29 cdnpos number of the mutated codon within the transcript's CDS (1-based)
30 frame mutation position offset within the codon (0..2)
31 dgn degeneracy index for mutated codon position, by Nei & Kumar (2000) “Molecular Evolution and Phylogenetics”, page 64: 0  -  non-degenerate, 2  -  simple 2-fold degenerate, 3  -  complex 2-fold degenerate, 4  -  4-fold degenerate
32 cdn1 reference codon
33 cdn2 mutated codon
34 aapos position of the amino acid substitution in the protein sequence (1-based)
35 aa1 wild type (reference) amino acid residue
36 aa2 mutant (substitution) amino acid residue
37 spmap CDS protein sequence similarity to known UniProtKB protein (?  -  no match)
38 spacc UniProtKB protein accession
39 spname UniProtKB protein entry name
40 refs_acc RefSeq protein accession
41 dbrsid dbSNP SNP rsID
42 dbobsrvd dbSNP observed alleles (transcript strand)
43 dbminor dbSNP minor allele nucleotide (transcript strand)
44 dbmaf dbSNP minor allele frequency
45 dbPtPaRm dbSNP orthologous alleles in chimp  (Pt), orangutan  (Pa) and macaque  (Rm)
46 Comments optional user comments, copied from input
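
The table above can be turned into named fields for easier downstream use. The following Python sketch pairs each field of an already split report line with its column name and, as a sanity check on column 22 (`trv`), recomputes the transversion flag from `nt1` and `nt2`. The names `to_record` and `is_transversion` are hypothetical, and padding a missing trailing Comments field with an empty string is an assumption about how optional comments appear in the file.

```python
# Hypothetical helpers for illustration; not part of MapSNPs or PolyPhen-2.
# Column names are copied from the table above, in column order (1-46).
COLUMN_NAMES = [
    "query_no", "snp_pos", "str", "gene", "transcript", "canon", "cid",
    "txcov", "ccds", "cciden", "refa", "type", "ntlen", "ntpos", "nt1",
    "nt2", "PtGgPaNl", "dref", "gerprs", "phylop", "flanks", "trv", "CpG",
    "JXdon", "JXacc", "JXc", "exon", "cexon", "cdnpos", "frame", "dgn",
    "cdn1", "cdn2", "aapos", "aa1", "aa2", "spmap", "spacc", "spname",
    "refs_acc", "dbrsid", "dbobsrvd", "dbminor", "dbmaf", "dbPtPaRm",
    "Comments",
]

PURINES = {"A", "G"}


def to_record(fields):
    """Pair each field of one report line with its column name."""
    fields = list(fields)
    # Assumption: the optional Comments column may be absent from a line.
    if len(fields) == len(COLUMN_NAMES) - 1:
        fields.append("")
    if len(fields) != len(COLUMN_NAMES):
        raise ValueError(
            f"expected {len(COLUMN_NAMES)} fields, got {len(fields)}"
        )
    return dict(zip(COLUMN_NAMES, fields))


def is_transversion(nt1, nt2):
    """Return True if nt1 -> nt2 exchanges a purine and a pyrimidine."""
    return (nt1.upper() in PURINES) != (nt2.upper() in PURINES)
```

With a record in hand, comparing `is_transversion(rec["nt1"], rec["nt2"])` against `rec["trv"] == "1"` gives a quick consistency check on a parsed line.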
appendix_b.txt · Last modified: 2020/04/29 22:44 by 127.0.0.1
