PolyPhen-2 annotation summary report explained

The following is a description of the PolyPhen-2 annotation summary report. Reports in this format are produced by both the PolyPhen-2 Batch query web service and the standalone PolyPhen-2 software. The report is a plain-text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).
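
As a rough illustration of the one-variant-per-line, tab-separated layout, the sketch below reads such a report into dictionaries keyed by column name. It rests on assumptions not stated on this page: that the first non-blank line is a header row of column names (possibly prefixed with `#`), that fields may be padded with spaces, and that later lines starting with `#` are comments. The function name `read_report` and the sample line are made up for the example.

```python
import io

def read_report(handle):
    """Yield one dict per variant line, keyed by column name."""
    header = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cells = [cell.strip() for cell in line.split("\t")]
        if header is None:
            # Assumption: the first non-blank line is the header row,
            # possibly prefixed with '#'.
            cells[0] = cells[0].lstrip("#").strip()
            header = cells
            continue
        if cells[0].startswith("#"):
            continue  # assumption: later '#' lines are comments
        yield dict(zip(header, cells))

# Made-up two-line report carrying only a few of the documented columns.
sample = (
    "#o_acc\to_pos\to_aa1\to_aa2\tprediction\tpph2_prob\n"
    "EXAMPLE1\t  42\tR\tW\tprobably damaging\t0.99\n"
)
rows = list(read_report(io.StringIO(sample)))
print(rows[0]["prediction"], rows[0]["pph2_prob"])
```

All values are kept as strings here; converting numeric columns such as pph2_prob is left to the caller, since missing values may not parse as numbers.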

The fourteen columns numbered 1-2, 6-11, 41, 45-48 and 91 below are the ones included in the Short version of the report available via the Batch query web page. They are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns of the Full report version are mostly useful if you want to investigate in detail all the features supporting the prediction.
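
The column numbers quoted above can be expanded programmatically, for example to cut a Full report line down to its Short-report fields. This is only a sketch: it assumes the fields of a Full report line appear in exactly the numbered order of the table below, and the helper names `parse_column_spec` and `short_view` are hypothetical.

```python
def parse_column_spec(spec):
    """Expand a spec such as '1-2, 6-11, 41' into 1-based column numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(part))
    return numbers

# Column numbers of the Short report, as listed in the text.
SHORT_COLUMNS = parse_column_spec("1-2, 6-11, 41, 45-48, 91")

def short_view(full_row):
    """Pick the Short-report fields from a list of 91 fields.

    Assumption: full_row holds the fields in the numbered order of the table.
    """
    return [full_row[n - 1] for n in SHORT_COLUMNS]

print(len(SHORT_COLUMNS))
```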

Column No.  Column Name  Description
Original query (as copied from user input):
1 query_no input query ordinal
2 o_acc original protein identifier
3 o_pos original substitution position in the protein sequence
4 o_aa1 original wild type (reference) amino acid residue
5 o_aa2 original mutant (substitution) amino acid residue
Annotated query:
6 rsid dbSNP reference SNP identifier (rsID) if available
7 acc UniProtKB accession if known protein, otherwise same as o_acc
8 length length of the protein sequence
9 pos substitution position in UniProtKB protein sequence, otherwise same as o_pos
10 aa1 wild type amino acid residue in relation to UniProtKB sequence
11 aa2 mutant amino acid residue in relation to UniProtKB sequence
Nucleotide sequence context annotations:
12 chr_pos SNP chromosome:position (chromosome coordinates are 1-based)
13 str transcript strand (“+” or “-”)
14 gene gene symbol
15 transcript UCSC transcript name (unique identifier)
16 canon UCSC knownCanonical representative transcript flag: 1  -  canonical, 0  -  alternative
17 cid UCSC knownCanonical cluster identifier (number)
18 txcov transcript coverage, the number of transcripts in UCSC cluster overlapping the mutation position / total number of transcripts in the cluster
19 ntlen full transcript length (number of nucleotides)
20 ntpos mutation position in the full transcript nucleotide sequence (in the direction of transcription)
21 nt1 reference nucleotide (transcript strand)
22 nt2 variant nucleotide (transcript strand)
23 PtGgPaNl orthologous alleles in chimp (Pt), gorilla (Gg), orangutan (Pa) and gibbon (Nl) if different from human reference allele, ?  -  otherwise; .  -  data not available
24 dref putative derived allele found in human reference, score: 0  -  no evidence, 1  -  variant allele matches orthologous ancestral allele, 2  -  dbSNP minor allele matches reference allele, 3  -  both dbSNP and orthologous evidence present, ?  -  not enough evidence to score
25 gerprs Genomic Evolutionary Rate Profiling (GERP++) position-specific conservation score, RS; 0  -  when alignment coverage is insufficient
26 phylop conservation scoring by phyloP (phylogenetic p-values) from the PHAST package for multiple alignments of 99 vertebrate genomes to the human genome
27 trv transversion mutation flag: 0  -  transition, 1  -  transversion
28 CpG CpG context: 0  -  non-CpG context retained, 1  -  mutation removes CpG site, 2  -  mutation creates new CpG site, 3  -  CpG context retained: C(C/G)G substitution
29 JXmin distance from mutation position to the nearest exon / intron junction (“-” for upstream, “+” for downstream)
30 JXc mutation in a codon that is split across two exons: ?  -  no, 1  -  yes
31 exon mutation in exon # / of total exons (exons are enumerated in the direction of transcription)
32 cexon same as above but only coding (CDS) exons are being enumerated
33 cdnpos number of the mutated codon within transcript's CDS (1-based)
34 frame mutation position offset within the codon (0..2)
35 dgn degeneracy index for mutated codon position, by Nei & Kumar (2000) “Molecular Evolution and Phylogenetics”, page 64: 0  -  non-degenerate, 2  -  simple 2-fold degenerate, 3  -  complex 2-fold degenerate, 4  -  4-fold degenerate
36 cdn1 reference codon
37 cdn2 mutated codon
dbSNP annotations:
38 dbrsid dbSNP SNP rsID
39 dbminor dbSNP minor allele nucleotide (transcript strand)
40 dbmaf dbSNP minor allele frequency
PolyPhen-2 prediction outcome:
41 prediction qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/15% (HumVar) False Discovery Rate (FDR) thresholds: “benign”, “possibly damaging”, “probably damaging”
PolyPhen-1 prediction description (obsolete, please ignore):
42 based_on prediction basis
43 effect predicted substitution effect on the protein structure or function
PolyPhen-2 classifier outcome and scores:
44 pph2_class probabilistic binary classifier outcome: “damaging” or “neutral”
45 pph2_prob classifier probability of the variation being damaging
46 pph2_FPR classifier model False Positive Rate (1 - specificity) at the above probability
47 pph2_TPR classifier model True Positive Rate (sensitivity) at the above probability
48 pph2_FDR classifier model False Discovery Rate at the above probability
UniProtKB/Swiss-Prot/Pfam protein annotations:
49 PfamHit Pfam identifier of the protein family or domain to which substitution maps
50 site substitution SITE annotation
51 region substitution REGION annotation
52 PHAT PHAT matrix element for substitutions in the TRANSMEM region
Multiple sequence alignment scores:
53 dScore difference of PSIC scores for two amino acid residue variants (Score1-Score2)
54 Score1 PSIC score for wild type amino acid residue (aa1)
55 Score2 PSIC score for mutant amino acid residue (aa2)
56 MSAv version of the multiple sequence alignment used in conservation scores calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS
57 Nobs number of residues observed at the substitution position in multiple alignment (without gaps)
58 Nseqs number of sequences observed at the substitution position in multiple alignment (including gaps)
59 Nsubs number of residues different from reference residue (aa1) observed at the substitution position in multiple alignment
60 Nvars number of residues same as substitution residue (aa2) observed at the substitution position in multiple alignment (without gaps)
61 Nres number of unique residues observed at the substitution position in multiple alignment (without gaps)
Substitution scores:
62 IdPmax maximum congruency of the substitution amino acid residue across all sequences with a substitution at the substitution position in multiple alignment
63 IdPSNP maximum congruency of the substitution amino acid residue to the sequences in multiple alignment with the substitution residue at the substitution position in multiple alignment
64 IdQmax query sequence identity with the closest homologue deviating from the wild type amino acid residue (aa1)
Phylogenetic tree based scores:
65 DistPmin minimum normalized distance along the phylogenetic tree across all substitution types encountered at the substitution position
66 DistPSNP minimum normalized distance along the phylogenetic tree for substitution residues (aa2) encountered at the substitution position
67 DistQmin minimum distance (sum of branch lengths) along the phylogenetic tree across all substitution types encountered at the substitution position
68 BaRE Bayesian Rate Estimator for scoring evolutionary conservation, D.M. Jordan (2015)
Gene-based scores:
69 RVISraw Residual Variation Intolerance Score (raw), Petrovski et al. (2013)
70 RVISranked Residual Variation Intolerance Score (normalized by rank), Petrovski et al. (2013)
RCSB PDB annotations:
71 Nstruct initial number of BLAST hits to similar proteins with 3D structures in PDB
72 Nfilt number of 3D BLAST hits after identity threshold filtering
73 PDB_id PDB protein structure identifier
74 PDB_ch PDB polypeptide chain identifier
75 PDB_len PDB sequence alignment length
76 PDB_pos position of substitution in the PDB protein sequence
77 PDB_idn sequence identity between query sequence and the aligned PDB sequence
Amino acid residues structural features:
78 dVol change in residue side chain volume
79 dProp change in solvent accessible surface propensity resulting from the substitution
Protein 3D structure features:
80 SecStr DSSP secondary structure assignment
81 MapReg region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles
82 NormASA normalized accessible surface
83 B-fact normalized B-factor (temperature factor) for the residue
84 H-bonds number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue
85 AveNHet number of residue contacts with heteroatoms, average per homologous PDB chain
86 MinDHet closest residue contact with a heteroatom, Å
87 AveNInt number of residue contacts with other chains, average per homologous PDB chain
88 MinDInt closest residue contact with other chain, Å
89 AveNSit number of residue contacts with critical sites, average per homologous PDB chain
90 MinDSit closest residue contact with a critical site, Å
Comments:
91 Comments optional user comments, copied from input
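
Several of the columns above hold small integer codes (canon, trv, CpG, dgn, MSAv). A lookup along the following lines can turn them into the labels given in the table. It is a sketch only: the labels are copied from the descriptions above, while the names `CODE_LABELS` and `decode` are hypothetical, and values are assumed to arrive as strings that may carry padding.

```python
# Labels copied from the column descriptions in the table above.
CODE_LABELS = {
    "canon": {"1": "canonical", "0": "alternative"},
    "trv": {"0": "transition", "1": "transversion"},
    "CpG": {
        "0": "non-CpG context retained",
        "1": "mutation removes CpG site",
        "2": "mutation creates new CpG site",
        "3": "CpG context retained: C(C/G)G substitution",
    },
    "dgn": {
        "0": "non-degenerate",
        "2": "simple 2-fold degenerate",
        "3": "complex 2-fold degenerate",
        "4": "4-fold degenerate",
    },
    "MSAv": {
        "1": "pairwise BLAST HSP (obsolete)",
        "2": "MAFFT-Leon-Cluspack (default)",
        "3": "MultiZ CDS",
    },
}

def decode(column, value):
    """Return the label for a coded value, or the value itself if unknown."""
    return CODE_LABELS.get(column, {}).get(value.strip(), value)

print(decode("CpG", "2"))
print(decode("gene", "TP53"))  # uncoded columns pass through unchanged
```

Unknown columns and unrecognised codes are returned unchanged, so the helper can be applied to every field of a row without special cases.
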
appendix_a.txt · Last modified: 2021/12/03 23:49 by 127.0.0.1
