PolyPhen-2 annotation summary report explained

This document describes the PolyPhen-2 annotation summary report. Reports in this format are produced by both the PolyPhen-2 Batch query web service and the standalone PolyPhen-2 software. The report is a plain-text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).
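
As an illustration of the file layout, a report of this kind can be read with Python's standard `csv` module. This is a minimal sketch, not part of PolyPhen-2 itself: the function name `read_pph2_report` is hypothetical, and the code assumes that the first non-blank line names the columns (optionally prefixed with `#`), that whitespace padding around values is not significant, and that any later line starting with `#` is a comment. The sample input is made up and truncated to the first four columns.

```python
import csv
import io


def read_pph2_report(handle):
    """Yield one {column name: value} dict per variant line."""
    # QUOTE_NONE: treat quote characters as ordinary text, split on tabs only.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    for row in reader:
        # Assumption: padding around values carries no meaning.
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if header is None:
            # Assumption: the first non-blank line names the columns,
            # optionally prefixed with '#'.
            cells[0] = cells[0].lstrip("#").strip()
            header = cells
            continue
        if cells[0].startswith("#"):
            continue  # assumption: later '#' lines are comments, not variants
        yield dict(zip(header, cells))


# Made-up input, truncated to the first four columns for brevity.
sample = "#o_acc\to_pos\to_aa1\to_aa2\nP12345\t 10\tA\tV\n"
for record in read_pph2_report(io.StringIO(sample)):
    print(record["o_acc"], record["o_pos"], record["o_aa1"], record["o_aa2"])
```

All values are returned as strings; numeric columns such as `pph2_prob` would need an explicit `float()` conversion.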

Eleven columns (1, 5-9, 12, 16-18, 56) make up the Short version of the report, which is available via the Batch query web page. They are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns of the Full report version are mostly useful if you want to investigate in detail all the features supporting the prediction.
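
The relationship between the two report versions can be sketched as a simple column selection. The snippet below is illustrative only: `short_record` is a hypothetical helper, the column numbers and names are taken from the table in this document, and it assumes that a Full-report row may omit the trailing optional Comments field, in which case an empty string is substituted.

```python
# Column numbers (1-based) of the Short report, with their names
# as listed in the column table.
SHORT_COLUMN_NUMBERS = (1, 5, 6, 7, 8, 9, 12, 16, 17, 18, 56)
SHORT_COLUMN_NAMES = (
    "o_acc", "rsid", "acc", "pos", "aa1", "aa2",
    "prediction", "pph2_prob", "pph2_FPR", "pph2_TPR", "Comments",
)


def short_record(full_row):
    """Map Short-report column names to values from one Full-report row."""
    values = [
        # Assumption: a missing trailing field (e.g. no Comments) becomes "".
        full_row[n - 1] if n <= len(full_row) else ""
        for n in SHORT_COLUMN_NUMBERS
    ]
    return dict(zip(SHORT_COLUMN_NAMES, values))


# Placeholder row: the value "colN" stands in for the content of column N.
full_row = [f"col{n}" for n in range(1, 57)]
print(short_record(full_row)["prediction"])
```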

Column No.	Column Name	Description
Original query (as copied from user input):
1	o_acc	original protein identifier
2	o_pos	original substitution position in the protein sequence
3	o_aa1	original wild type (reference) amino acid residue
4	o_aa2	original mutant (substitution) amino acid residue
Annotated query:
5	rsid	dbSNP reference SNP identifier (rsID) if available
6	acc	UniProtKB accession if known protein, otherwise same as o_acc
7	pos	substitution position in UniProtKB protein sequence, otherwise same as o_pos
8	aa1	wild type amino acid residue in relation to UniProtKB sequence
9	aa2	mutant amino acid residue in relation to UniProtKB sequence
10	nt1	wild type (reference) allele nucleotide
11	nt2	mutant allele nucleotide
PolyPhen-2 prediction outcome:
12	prediction	qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/20% (HumVar) FPR thresholds (“benign”, “possibly damaging”, “probably damaging”)
PolyPhen-1 prediction description (obsolete, please ignore):
13	based_on	prediction basis
14	effect	predicted substitution effect on the protein structure or function
PolyPhen-2 classifier outcome and scores:
15	pph2_class	probabilistic binary classifier outcome (“damaging” or “neutral”)
16	pph2_prob	classifier probability of the variation being damaging
17	pph2_FPR	classifier model False Positive Rate (1 - specificity) at the above probability
18	pph2_TPR	classifier model True Positive Rate (sensitivity) at the above probability
19	pph2_FDR	classifier model False Discovery Rate at the above probability
UniProtKB/Swiss-Prot derived protein sequence annotations:
20	site	substitution SITE annotation
21	region	substitution REGION annotation
22	PHAT	PHAT matrix element for substitutions in the TRANSMEM region
Multiple sequence alignment scores:
23	dScore	difference of PSIC scores for two amino acid residue variants (Score1-Score2)
24	Score1	PSIC score for wild type amino acid residue (aa1)
25	Score2	PSIC score for mutant amino acid residue (aa2)
26	MSAv	version of the multiple sequence alignment used in conservation score calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS
27	Nobs	number of residues observed at the substitution position in multiple alignment (without gaps)
Protein 3D structure features:
28	Nstruct	initial number of BLAST hits to similar proteins with 3D structures in PDB
29	Nfilt	number of 3D BLAST hits after identity threshold filtering
30	PDB_id	PDB protein structure identifier
31	PDB_ch	PDB polypeptide chain identifier
32	length	PDB sequence alignment length
33	PDB_pos	position of substitution in PDB protein sequence
34	ident	sequence identity between query sequence and aligned PDB sequence
35	dVol	change in residue side chain volume
36	dProp	change in solvent accessible surface propensity resulting from the substitution
37	SecStr	DSSP secondary structure assignment
38	MapReg	region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles
39	NormASA	normalized accessible surface area
40	B-fact	normalized B-factor (temperature factor) for the residue
41	H-bonds	number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue
42	AveNHet	number of residue contacts with heteroatoms, average per homologous PDB chain
43	MinDHet	closest residue contact with a heteroatom, Å
44	AveNInt	number of residue contacts with other chains, average per homologous PDB chain
45	MinDInt	closest residue contact with other chain, Å
46	AveNSit	number of residue contacts with critical sites, average per homologous PDB chain
47	MinDSit	closest residue contact with a critical site, Å
Nucleotide sequence context features:
48	Transv	whether substitution is a transversion
49	CodPos	position of the substitution within a codon
50	CpG	whether substitution changes CpG context: 0 - non-CpG context retained, 1 - removes CpG site, 2 - creates new CpG site, 3 - CpG context retained
51	MinDJnc	substitution distance from closest exon / intron junction
Pfam protein family:
52	PfamHit	Pfam identifier of the query protein
Substitution scores:
53	IdPmax	maximum congruency of the mutant amino acid residue to all sequences in multiple alignment
54	IdPSNP	maximum congruency of the mutant amino acid residue to the sequences in multiple alignment with the mutant residue
55	IdQmin	query sequence identity with the closest homologue deviating from the wild type amino acid residue
Comments:
56	Comments	optional user comments, copied from input
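
The coded columns and the threshold-based prediction described in the table can be sketched in a few lines. This is an interpretation under stated assumptions, not the PolyPhen-2 implementation: the names `classify_by_fpr`, `FPR_THRESHOLDS`, `CPG_CODES` and `MSAV_CODES` are hypothetical; `pph2_FPR` is assumed to be a fraction between 0 and 1; and a variant is assumed to be "probably damaging" when its FPR is at or below the stricter threshold, "possibly damaging" when at or below the looser one, and "benign" otherwise. The handling of values exactly on a threshold is a guess.

```python
# FPR threshold pairs per classifier model, from the description of column 12.
FPR_THRESHOLDS = {
    "HumDiv": (0.05, 0.10),
    "HumVar": (0.10, 0.20),
}

# Code tables copied from the descriptions of columns 50 (CpG) and 26 (MSAv).
CPG_CODES = {
    0: "non-CpG context retained",
    1: "removes CpG site",
    2: "creates new CpG site",
    3: "CpG context retained",
}
MSAV_CODES = {
    1: "pairwise BLAST HSP (obsolete)",
    2: "MAFFT-Leon-Cluspack (default)",
    3: "MultiZ CDS",
}


def classify_by_fpr(pph2_fpr, model):
    """Return the ternary label implied by an FPR value for a given model."""
    strict, relaxed = FPR_THRESHOLDS[model]
    # Assumption: thresholds are inclusive upper bounds.
    if pph2_fpr <= strict:
        return "probably damaging"
    if pph2_fpr <= relaxed:
        return "possibly damaging"
    return "benign"


# Made-up FPR value, shown under both models.
for model in ("HumDiv", "HumVar"):
    print(model, classify_by_fpr(0.08, model))
print(CPG_CODES[1])
```

The same FPR can map to different labels under the two models because HumVar uses looser thresholds; the `prediction` column in the report remains the authoritative label.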