PolyPhen-2 annotation summary report explained

The following describes the PolyPhen-2 annotation summary report. Reports in this format are produced by both the PolyPhen-2 Batch query web service and the standalone PolyPhen-2 software. The report is a plain-text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).

Eleven columns (1, 5-9, 12, 16-18, 56) are included in the Short version of the report available via the Batch query web page. These are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns of the Full report version are mostly useful if you want to investigate in detail all the features supporting the prediction.
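As an illustration of working with the Short column set, here is a minimal Python sketch that reads a report and keeps only columns 1, 5-9, 12, 16-18 and 56. It assumes that the first line is a header row (possibly starting with `#`) and that fields may be padded with spaces; both are assumptions about the file layout, not something this page specifies. The function name `read_short_columns` and the file name in the usage note are hypothetical.

```python
import csv

# 1-based column numbers of the Short report, as given in the text.
SHORT_COLUMN_NUMBERS = [1, 5, 6, 7, 8, 9, 12, 16, 17, 18, 56]


def read_short_columns(lines):
    """Yield one dict per variant, keeping only the Short-report columns.

    Assumption: the first line is a header row, possibly starting with '#',
    and fields may be padded with spaces.
    """
    # QUOTE_NONE keeps quote characters in free-text comments untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip().lstrip("#").strip() for name in next(reader)]
    for row in reader:
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        fields = [field.strip() for field in row]
        yield {
            header[n - 1]: fields[n - 1]
            for n in SHORT_COLUMN_NUMBERS
            if n - 1 < len(header) and n - 1 < len(fields)
        }
```

For example, `with open("pph2-full.txt", newline="") as fh: rows = list(read_short_columns(fh))` would load every variant as a dictionary keyed by column name. Selecting by column number rather than by name keeps the sketch tied to the numbering used on this page.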

Column No. Column Name Description
Original query (as copied from user input):
1 o_acc original protein identifier
2 o_pos original substitution position in the protein sequence
3 o_aa1 original wild type (reference) amino acid residue
4 o_aa2 original mutant (substitution) amino acid residue
Annotated query:
5 rsid dbSNP reference SNP identifier (rsID) if available
6 acc UniProtKB accession if the protein is known, otherwise same as o_acc
7 pos substitution position in UniProtKB protein sequence, otherwise same as o_pos
8 aa1 wild type amino acid residue in relation to UniProtKB sequence
9 aa2 mutant amino acid residue in relation to UniProtKB sequence
10 nt1 wild type (reference) allele nucleotide
11 nt2 mutant allele nucleotide
PolyPhen-2 prediction outcome:
12 prediction qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/20% (HumVar) FPR thresholds (“benign”, “possibly damaging”, “probably damaging”)
PolyPhen-1 prediction description (obsolete, please ignore):
13 based_on prediction basis
14 effect predicted substitution effect on the protein structure or function
PolyPhen-2 classifier outcome and scores:
15 pph2_class probabilistic binary classifier outcome (“damaging” or “neutral”)
16 pph2_prob classifier probability of the variation being damaging
17 pph2_FPR classifier model False Positive Rate (1 - specificity) at the above probability
18 pph2_TPR classifier model True Positive Rate (sensitivity) at the above probability
19 pph2_FDR classifier model False Discovery Rate at the above probability
UniProtKB/Swiss-Prot derived protein sequence annotations:
20 site substitution SITE annotation
21 region substitution REGION annotation
22 PHAT PHAT matrix element for substitutions in the TRANSMEM region
Multiple sequence alignment scores:
23 dScore difference of PSIC scores for two amino acid residue variants (Score1-Score2)
24 Score1 PSIC score for wild type amino acid residue (aa1)
25 Score2 PSIC score for mutant amino acid residue (aa2)
26 MSAv version of the multiple sequence alignment used in conservation score calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS
27 Nobs number of residues observed at the substitution position in multiple alignment (without gaps)
Protein 3D structure features:
28 Nstruct initial number of BLAST hits to similar proteins with 3D structures in PDB
29 Nfilt number of 3D BLAST hits after identity threshold filtering
30 PDB_id PDB protein structure identifier
31 PDB_pos position of substitution in PDB protein sequence
32 PDB_ch PDB polypeptide chain identifier
33 ident sequence identity between query sequence and aligned PDB sequence
34 length PDB sequence alignment length
35 NormASA normalized accessible surface area
36 SecStr DSSP secondary structure assignment
37 MapReg region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles
38 dVol change in residue side chain volume
39 dProp change in solvent accessible surface propensity resulting from the substitution
40 B-fact normalized B-factor (temperature factor) for the residue
41 H-bonds number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue
42 AveNHet number of residue contacts with heteroatoms, average per homologous PDB chain
43 MinDHet closest residue contact with a heteroatom, Å
44 AveNInt number of residue contacts with other chains, average per homologous PDB chain
45 MinDInt closest residue contact with other chain, Å
46 AveNSit number of residue contacts with critical sites, average per homologous PDB chain
47 MinDSit closest residue contact with a critical site, Å
Nucleotide sequence context features:
48 Transv whether substitution is a transversion
49 CodPos position of the substitution within a codon
50 CpG whether substitution changes CpG context: 0 - non-CpG context retained, 1 - removes CpG site, 2 - creates new CpG site, 3 - CpG context retained
51 MinDJnc distance of the substitution from the closest exon/intron junction
Pfam protein family:
52 PfamHit Pfam identifier of the query protein
Substitution scores:
53 IdPmax maximum congruency of the mutant amino acid residue to all sequences in multiple alignment
54 IdPSNP maximum congruency of the mutant amino acid residue to the sequences in multiple alignment with the mutant residue
55 IdQmin query sequence identity with the closest homologue deviating from the wild type amino acid residue
Comments:
56 Comments optional user comments, copied from input
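Two of the columns above (MSAv and CpG) hold numeric codes, and dScore is defined as Score1-Score2. The following Python sketch shows one way to turn the codes into labels and to sanity-check dScore for a single row. It assumes the row has already been loaded as a dictionary of strings keyed by column name; the helper names `to_float` and `decode_row` and the 0.01 rounding tolerance are illustrative choices, not part of PolyPhen-2.

```python
import math

# Code tables copied from the MSAv (26) and CpG (50) descriptions above.
MSAV_LABELS = {
    "1": "pairwise BLAST HSP (obsolete)",
    "2": "MAFFT-Leon-Cluspack (default)",
    "3": "MultiZ CDS",
}
CPG_LABELS = {
    "0": "non-CpG context retained",
    "1": "removes CpG site",
    "2": "creates new CpG site",
    "3": "CpG context retained",
}


def to_float(text):
    """Return float(text), or None when the field is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def decode_row(row):
    """Return code labels and a dScore consistency flag for one report row.

    Assumption: `row` is a dict keyed by column name with string values.
    """
    d_score = to_float(row.get("dScore"))
    score1 = to_float(row.get("Score1"))
    score2 = to_float(row.get("Score2"))
    if None in (d_score, score1, score2):
        consistent = None  # not enough data to check
    else:
        # Reported values are rounded, so allow a small (assumed) tolerance.
        consistent = math.isclose(d_score, score1 - score2, abs_tol=0.01)
    return {
        "MSAv": MSAV_LABELS.get((row.get("MSAv") or "").strip()),
        "CpG": CPG_LABELS.get((row.get("CpG") or "").strip()),
        "dScore_consistent": consistent,
    }
```

Unknown or empty codes map to `None` rather than raising an error, since structural and alignment features may be absent for some variants.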
Last modified: 2016/10/14 22:49
   
 