PolyPhen-2 annotation summary report explained
The following describes the PolyPhen-2 annotation summary report. Reports in this format are produced both by the PolyPhen-2 Batch query web service and by the standalone PolyPhen-2 software. The report is a plain-text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).
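Because each report line is tab-separated, the report can be loaded with a few lines of Python. The sketch below is a minimal reader and is not part of PolyPhen-2. The function name `parse_pph2_report` is hypothetical. The sketch assumes that the column-name header line starts with `#` and that fields may be padded with spaces, so it strips every name and value.

```python
def parse_pph2_report(text: str) -> list[dict[str, str]]:
    """Parse a PolyPhen-2 summary report into one dict per variant row.

    Assumption: the column-name header line starts with '#', and any later
    '#' lines are comments. Fields may be space-padded, so all are stripped.
    """
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("\t")]
        if line.startswith("#"):
            if not header:
                # First '#' line holds the column names; drop the leading '#'
                header = [fields[0].lstrip("#").strip()] + fields[1:]
            continue  # ignore any further comment lines
        if not header:
            raise ValueError("data line found before the '#' header line")
        rows.append(dict(zip(header, fields)))
    return rows
```

Each returned dict is keyed by column name, so for example `row["prediction"]` or `row["pph2_prob"]` gives the values described in the table.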
Eleven of the columns listed below (1, 5-9, 12, 16-18 and 56) make up the Short version of the report, available from the Batch query web page. These are sufficient if you only need the PolyPhen-2 prediction outcome and its confidence scores. The remaining columns appear only in the Full version of the report. They are mostly useful if you want to investigate all the features supporting the prediction in detail.
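The Short report is a subset of the Full report's columns. As an illustration, the hypothetical helper below picks those eleven columns from one Full-report row by their 1-based column numbers. It assumes the row has already been split into all 56 tab-separated fields.

```python
# 1-based column numbers kept in the Short report (1, 5-9, 12, 16-18, 56)
SHORT_COLUMNS = (1, 5, 6, 7, 8, 9, 12, 16, 17, 18, 56)


def to_short_report(fields: list[str]) -> list[str]:
    """Reduce one Full-report row (56 fields) to the 11 Short-report fields."""
    if len(fields) < 56:
        raise ValueError(f"expected 56 fields, got {len(fields)}")
    # Convert each 1-based column number to a 0-based list index
    return [fields[n - 1] for n in SHORT_COLUMNS]
```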
Column No. | Column Name | Description |
---|---|---|
Original query (as copied from user input): | ||
1 | o_acc | original protein identifier |
2 | o_pos | original substitution position in the protein sequence |
3 | o_aa1 | original wild type (reference) amino acid residue |
4 | o_aa2 | original mutant (substitution) amino acid residue |
Annotated query: | ||
5 | rsid | dbSNP reference SNP identifier (rsID) if available |
6 | acc | UniProtKB accession if the protein is known, otherwise same as o_acc |
7 | pos | substitution position in the UniProtKB protein sequence, otherwise same as o_pos |
8 | aa1 | wild type amino acid residue in relation to the UniProtKB sequence |
9 | aa2 | mutant amino acid residue in relation to the UniProtKB sequence |
10 | nt1 | wild type (reference) allele nucleotide |
11 | nt2 | mutant allele nucleotide |
PolyPhen-2 prediction outcome: | ||
12 | prediction | qualitative ternary classification (“benign”, “possibly damaging”, “probably damaging”) based on the 5%/10% (HumDiv) or 10%/20% (HumVar) FPR thresholds |
PolyPhen-1 prediction description (obsolete, please ignore): | ||
13 | based_on | prediction basis |
14 | effect | predicted substitution effect on the protein structure or function |
PolyPhen-2 classifier outcome and scores: | ||
15 | pph2_class | probabilistic binary classifier outcome (“damaging” or “neutral”) |
16 | pph2_prob | classifier probability of the variation being damaging |
17 | pph2_FPR | classifier model False Positive Rate (1 - specificity) at the above probability |
18 | pph2_TPR | classifier model True Positive Rate (sensitivity) at the above probability |
19 | pph2_FDR | classifier model False Discovery Rate at the above probability |
UniProtKB/Swiss-Prot derived protein sequence annotations: | ||
20 | site | substitution SITE annotation |
21 | region | substitution REGION annotation |
22 | PHAT | PHAT matrix element for substitutions in the TRANSMEM region |
Multiple sequence alignment scores: | ||
23 | dScore | difference of PSIC scores for two amino acid residue variants (Score1-Score2) |
24 | Score1 | PSIC score for wild type amino acid residue (aa1) |
25 | Score2 | PSIC score for mutant amino acid residue (aa2) |
26 | MSAv | version of the multiple sequence alignment used in conservation scores calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS |
27 | Nobs | number of residues observed at the substitution position in the multiple alignment (without gaps) |
Protein 3D structure features: | ||
28 | Nstruct | initial number of BLAST hits to similar proteins with 3D structures in PDB |
29 | Nfilt | number of 3D BLAST hits after identity threshold filtering |
30 | PDB_id | PDB protein structure identifier |
31 | PDB_ch | PDB polypeptide chain identifier |
32 | length | PDB sequence alignment length |
33 | PDB_pos | position of the substitution in the PDB protein sequence |
34 | ident | sequence identity between the query sequence and the aligned PDB sequence |
35 | dVol | change in residue side chain volume |
36 | dProp | change in solvent accessible surface propensity resulting from the substitution |
37 | SecStr | DSSP secondary structure assignment |
38 | MapReg | region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles |
39 | NormASA | normalized accessible surface area |
40 | B-fact | normalized B-factor (temperature factor) for the residue |
41 | H-bonds | number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue |
42 | AveNHet | number of residue contacts with heteroatoms, average per homologous PDB chain |
43 | MinDHet | distance of the closest residue contact with a heteroatom, Å |
44 | AveNInt | number of residue contacts with other chains, average per homologous PDB chain |
45 | MinDInt | distance of the closest residue contact with another chain, Å |
46 | AveNSit | number of residue contacts with critical sites, average per homologous PDB chain |
47 | MinDSit | distance of the closest residue contact with a critical site, Å |
Nucleotide sequence context features: | ||
48 | Transv | whether the substitution is a transversion |
49 | CodPos | position of the substitution within a codon |
50 | CpG | whether the substitution changes the CpG context: 0 - non-CpG context retained, 1 - removes CpG site, 2 - creates new CpG site, 3 - CpG context retained |
51 | MinDJnc | distance of the substitution from the closest exon/intron junction |
Pfam protein family: | ||
52 | PfamHit | Pfam identifier of the query protein |
Substitution scores: | ||
53 | IdPmax | maximum congruency of the mutant amino acid residue to all sequences in the multiple alignment |
54 | IdPSNP | maximum congruency of the mutant amino acid residue to the sequences in the multiple alignment that carry the mutant residue |
55 | IdQmin | query sequence identity with the closest homologue deviating from the wild type amino acid residue |
Comments: | ||
56 | Comments | optional user comments, copied from input |
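The prediction column (12) is derived from the classifier's false positive rate. The sketch below shows how the ternary label could be reproduced from `pph2_FPR` (column 17) using the thresholds quoted for column 12. It makes several assumptions:

- FPR is reported as a fraction, not a percentage.
- A value at or below the lower threshold gives "probably damaging", and a value at or below the higher threshold gives "possibly damaging". Anything else gives "benign".
- Values exactly on a threshold fall into the more severe class. This is a guess.

The name `classify` is hypothetical. Treat the report's own `prediction` column as authoritative.

```python
# (lower, higher) FPR thresholds per model, as quoted for the prediction column
THRESHOLDS = {"HumDiv": (0.05, 0.10), "HumVar": (0.10, 0.20)}


def classify(fpr: float, model: str = "HumDiv") -> str:
    """Map a pph2_FPR value (as a fraction) to a ternary label (sketch only).

    Assumption: lower FPR means a more confident 'damaging' call, and a
    value equal to a threshold falls into the more severe class.
    """
    lower, higher = THRESHOLDS[model]
    if fpr <= lower:
        return "probably damaging"
    if fpr <= higher:
        return "possibly damaging"
    return "benign"
```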
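Column 23 (dScore) is defined as Score1 - Score2, so it can be cross-checked against columns 24 and 25. The hypothetical helper below does this for one row parsed into a dict keyed by column name. It assumes that missing scores appear as empty strings or `?`. It uses a small tolerance because the report prints rounded values.

```python
from typing import Optional


def check_dscore(row: dict[str, str], tol: float = 0.01) -> Optional[bool]:
    """Check that dScore equals Score1 - Score2 for one parsed report row.

    Returns None when any of the three scores is missing. Assumption:
    missing values appear as empty strings or '?'.
    """
    values = [row.get(k, "").strip() for k in ("dScore", "Score1", "Score2")]
    if any(v in ("", "?") for v in values):
        return None
    dscore, score1, score2 = map(float, values)
    # Allow for rounding of the printed values
    return abs(dscore - (score1 - score2)) <= tol
```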
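Two columns, MSAv (26) and CpG (50), store integer codes. A lookup table built from their descriptions turns the codes into readable text. The names `CODES` and `decode` are hypothetical. Treating a blank field as "not available" is an assumption.

```python
# Integer codes for columns 26 (MSAv) and 50 (CpG), copied from the table
CODES: dict[str, dict[int, str]] = {
    "MSAv": {
        1: "pairwise BLAST HSP (obsolete)",
        2: "MAFFT-Leon-Cluspack (default)",
        3: "MultiZ CDS",
    },
    "CpG": {
        0: "non-CpG context retained",
        1: "removes CpG site",
        2: "creates new CpG site",
        3: "CpG context retained",
    },
}


def decode(column: str, value: str) -> str:
    """Return the description for a coded field value."""
    value = value.strip()
    if not value:
        return "not available"  # assumption: an empty field means no data
    return CODES[column][int(value)]
```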
appendix_a.txt · Last modified: 2021/12/03 23:06 by 127.0.0.1