====== PolyPhen-2 annotation summary report explained ======

This page describes the **PolyPhen-2** annotation summary report. Reports in this format are produced by both the PolyPhen-2 **Batch query** web service and the **standalone** PolyPhen-2 software. The report is a plain text tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).

The eleven columns highlighted below (1, 5-9, 12, 16-18, 56) are the ones included in the **Short** version of the report, available via the **Batch query** web page. They are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns of the **Full** report version are mostly useful if you want to investigate in detail all the features supporting the prediction.

^ Column\\ No. ^ Column\\ Name ^ Description ^
| **Original query** (as copied from user input): |||
^ 1 | o_acc | original protein identifier |
| 2 | o_pos | original substitution position in the protein sequence |
| 3 | o_aa1 | original wild type (reference) amino acid residue |
| 4 | o_aa2 | original mutant (substitution) amino acid residue |
| **Annotated query**: |||
^ 5 | rsid | dbSNP reference SNP identifier (rsID) if available |
^ 6 | acc | UniProtKB accession if the protein is known, otherwise same as o_acc |
^ 7 | pos | substitution position in the UniProtKB protein sequence, otherwise same as o_pos |
^ 8 | aa1 | wild type amino acid residue in relation to the UniProtKB sequence |
^ 9 | aa2 | mutant amino acid residue in relation to the UniProtKB sequence |
| 10 | nt1 | wild type (reference) allele nucleotide |
| 11 | nt2 | mutant allele nucleotide |
| **PolyPhen-2 prediction outcome**: |||
^ 12 | prediction | qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/20% (HumVar) FPR thresholds ("benign", "possibly damaging", "probably damaging") |
| **PolyPhen-1 prediction description** (obsolete, please ignore): |||
| 13 | based_on | prediction basis |
| 14 | effect | predicted substitution effect on the protein structure or function |
| **PolyPhen-2 classifier outcome and scores**: |||
| 15 | pph2_class | probabilistic binary classifier outcome ("damaging" or "neutral") |
^ 16 | pph2_prob | classifier probability of the variation being damaging |
^ 17 | pph2_FPR | classifier model False Positive Rate (1 - specificity) at the above probability |
^ 18 | pph2_TPR | classifier model True Positive Rate (sensitivity) at the above probability |
| 19 | pph2_FDR | classifier model False Discovery Rate at the above probability |
| **UniProtKB/Swiss-Prot derived protein sequence annotations**: |||
| 20 | site | substitution SITE annotation |
| 21 | region | substitution REGION annotation |
| 22 | PHAT | PHAT matrix element for substitutions in the TRANSMEM region |
| **Multiple sequence alignment scores**: |||
| 23 | dScore | difference of PSIC scores for the two amino acid residue variants (Score1 - Score2) |
| 24 | Score1 | PSIC score for the wild type amino acid residue (aa1) |
| 25 | Score2 | PSIC score for the mutant amino acid residue (aa2) |
| 26 | MSAv | version of the multiple sequence alignment used in conservation score calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS |
| 27 | Nobs | number of residues observed at the substitution position in the multiple alignment (without gaps) |
| **Protein 3D structure features**: |||
| 28 | Nstruct | initial number of BLAST hits to similar proteins with 3D structures in PDB |
| 29 | Nfilt | number of 3D BLAST hits after identity threshold filtering |
| 30 | PDB_id | PDB protein structure identifier |
| 31 | PDB_ch | PDB polypeptide chain identifier |
| 32 | length | PDB sequence alignment length |
| 33 | PDB_pos | position of the substitution in the PDB protein sequence |
| 34 | ident | sequence identity between the query sequence and the aligned PDB sequence |
| 35 | dVol | change in residue side chain volume |
| 36 | dProp | change in solvent accessible surface propensity resulting from the substitution |
| 37 | SecStr | DSSP secondary structure assignment |
| 38 | MapReg | region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles |
| 39 | NormASA | normalized accessible surface area |
| 40 | B-fact | normalized B-factor (temperature factor) for the residue |
| 41 | H-bonds | number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue |
| 42 | AveNHet | number of residue contacts with heteroatoms, average per homologous PDB chain |
| 43 | MinDHet | closest residue contact with a heteroatom, Å |
| 44 | AveNInt | number of residue contacts with other chains, average per homologous PDB chain |
| 45 | MinDInt | closest residue contact with another chain, Å |
| 46 | AveNSit | number of residue contacts with critical sites, average per homologous PDB chain |
| 47 | MinDSit | closest residue contact with a critical site, Å |
| **Nucleotide sequence context features**: |||
| 48 | Transv | whether the substitution is a transversion |
| 49 | CodPos | position of the substitution within a codon |
| 50 | CpG | whether the substitution changes CpG context: 0 - non-CpG context retained, 1 - removes CpG site, 2 - creates new CpG site, 3 - CpG context retained |
| 51 | MinDJnc | substitution distance from the closest exon/intron junction |
| **Pfam protein family**: |||
| 52 | PfamHit | Pfam identifier of the query protein |
| **Substitution scores**: |||
| 53 | IdPmax | maximum congruency of the mutant amino acid residue to all sequences in the multiple alignment |
| 54 | IdPSNP | maximum congruency of the mutant amino acid residue to the sequences in the multiple alignment with the mutant residue |
| 55 | IdQmin | query sequence identity with the closest homologue deviating from the wild type amino acid residue |
| **Comments**: |||
^ 56 | Comments | optional user comments, copied from input |
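
Because the report is a tab-separated file with one variant per line, it can be read with a few lines of Python. The following is a minimal sketch, not part of PolyPhen-2 itself. It assumes a **Full** report in which every data line carries the 56 columns in the order listed above, that header or comment lines start with ''#'', and that field values may be padded with spaces. The function names ''parse_report'', ''short_view'' and ''to_float'' are hypothetical helpers introduced only for this illustration, and the sample values are made up.

```python
# Column names in report order, taken from the table above (columns 1-56).
COLUMNS = [
    "o_acc", "o_pos", "o_aa1", "o_aa2",
    "rsid", "acc", "pos", "aa1", "aa2", "nt1", "nt2",
    "prediction", "based_on", "effect",
    "pph2_class", "pph2_prob", "pph2_FPR", "pph2_TPR", "pph2_FDR",
    "site", "region", "PHAT",
    "dScore", "Score1", "Score2", "MSAv", "Nobs",
    "Nstruct", "Nfilt", "PDB_id", "PDB_ch", "length", "PDB_pos", "ident",
    "dVol", "dProp", "SecStr", "MapReg", "NormASA", "B-fact", "H-bonds",
    "AveNHet", "MinDHet", "AveNInt", "MinDInt", "AveNSit", "MinDSit",
    "Transv", "CodPos", "CpG", "MinDJnc",
    "PfamHit",
    "IdPmax", "IdPSNP", "IdQmin",
    "Comments",
]

# 1-based numbers of the columns highlighted as the Short report version.
SHORT_COLUMN_NUMBERS = [1, 5, 6, 7, 8, 9, 12, 16, 17, 18, 56]


def parse_report(lines):
    """Parse lines of a Full report into a list of dicts keyed by column name."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: header/comment lines start with '#'; blank lines are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        values = [field.strip() for field in line.split("\t")]
        # Pad lines with fewer fields (e.g. no trailing Comments) with empty strings.
        values += [""] * (len(COLUMNS) - len(values))
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def short_view(row):
    """Return only the columns that make up the Short report version."""
    names = [COLUMNS[number - 1] for number in SHORT_COLUMN_NUMBERS]
    return {name: row[name] for name in names}


def to_float(value):
    """Convert a score field to float, or None if it is empty or not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


# Made-up placeholder values for illustration only, not real PolyPhen-2 output.
sample = {
    "o_acc": "Q00001", "o_pos": "42", "o_aa1": "R", "o_aa2": "W",
    "acc": "Q00001", "pos": "42", "aa1": "R", "aa2": "W",
    "prediction": "probably damaging", "pph2_prob": "0.998",
}
sample_line = "\t".join(sample.get(name, "") for name in COLUMNS)

for row in parse_report(["#" + "\t".join(COLUMNS), sample_line]):
    print(short_view(row))
    print(to_float(row["pph2_prob"]))
```

To read a real file, pass an open file object instead of the list, for example ''parse_report(open("report.txt"))'', where ''report.txt'' is a placeholder file name. Mapping fields by position keeps the sketch independent of how the header line is written, but it does not apply to a **Short** report, which contains only the eleven highlighted columns.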