====== PolyPhen-2 annotation summary report explained ======

Following is a description of the **PolyPhen-2** annotation summary report. Reports in this format are produced by both the PolyPhen-2 **Batch query** web service and the **standalone** PolyPhen-2 software. The report is a plain-text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).

The eleven columns highlighted below (1, 5-9, 12, 16-18, 56) are the ones included in the **Short** version of the report, available via the **Batch query** web page. They are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns of the **Full** report version are mostly useful if you want to investigate in detail all the features supporting the prediction.
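
Since the report is a plain tab-separated file, it can be loaded with a few lines of Python. The following is a minimal sketch, not an official parser. It assumes that the first non-empty line is a header whose first field starts with ''#'' and that fields may be padded with spaces; check both assumptions against your own files. The helper names ''read_report'' and ''short_view'' are hypothetical, while the column names are the ones listed in the table below.

```python
import csv

# Columns of the Short report, in the order given in the table (1, 5-9, 12, 16-18, 56).
SHORT_COLUMNS = [
    "o_acc", "rsid", "acc", "pos", "aa1", "aa2",
    "prediction", "pph2_prob", "pph2_FPR", "pph2_TPR", "Comments",
]


def read_report(lines):
    """Yield one dict per variant line, keyed by column name.

    `lines` is any iterable of text lines (an open file works).
    Assumption: the first non-empty line is the header row.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    for row in reader:
        cells = [cell.strip() for cell in row]  # drop padding around values
        if not any(cells):
            continue  # skip blank lines
        if header is None:
            cells[0] = cells[0].lstrip("#").strip()  # "#o_acc" -> "o_acc"
            header = cells
            continue
        if cells[0].startswith("#"):
            continue  # skip any further comment lines
        yield dict(zip(header, cells))


def short_view(record):
    """Keep only the Short-report columns; absent ones become empty strings."""
    return {name: record.get(name, "") for name in SHORT_COLUMNS}
```

A typical call would be ''rows = [short_view(r) for r in read_report(open(path))]'', where ''path'' points to a Full report. All values are kept as strings; convert ''pph2_prob'' and the other scores to ''float'' only where you need them.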

^  Column\\ No.  ^  Column\\ Name  ^ Description  ^
| **Original query** (as copied from user input):   |||
^   1 |  o_acc       | original protein identifier |
|   2 |  o_pos       | original substitution position in the protein sequence |
|   3 |  o_aa1       | original wild type (reference) amino acid residue |
|   4 |  o_aa2       | original mutant (substitution) amino acid residue |
| **Annotated query**:   |||
^   5 |  rsid        | dbSNP reference SNP identifier (rsID) if available |
^   6 |  acc         | UniProtKB accession if the protein is known, otherwise same as o_acc |
^   7 |  pos         | substitution position in the UniProtKB protein sequence if the protein is known, otherwise same as o_pos |
^   8 |  aa1         | wild type amino acid residue in relation to UniProtKB sequence  |
^   9 |  aa2         | mutant amino acid residue in relation to UniProtKB sequence |
|  10 |  nt1         | wild type (reference) allele nucleotide |
|  11 |  nt2         | mutant allele nucleotide |
| **PolyPhen-2 prediction outcome**:   |||
^  12 |  prediction  | qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/20% (HumVar) FPR thresholds ("benign", "possibly damaging", "probably damaging")|
| **PolyPhen-1 prediction description** (obsolete, please ignore):   |||
|  13 |  based_on    | prediction basis |
|  14 |  effect      | predicted substitution effect on the protein structure or function |
| **PolyPhen-2 classifier outcome and scores**:   |||
|  15 |  pph2_class  | probabilistic binary classifier outcome ("damaging" or "neutral") |
^  16 |  pph2_prob   | classifier probability of the variation being damaging |
^  17 |  pph2_FPR    | classifier model False Positive Rate (1 - specificity) at the above probability |
^  18 |  pph2_TPR    | classifier model True Positive Rate (sensitivity) at the above probability |
|  19 |  pph2_FDR    | classifier model False Discovery Rate at the above probability |
| **UniProtKB/Swiss-Prot derived protein sequence annotations**:   |||
|  20 |  site        | substitution SITE annotation |
|  21 |  region      | substitution REGION annotation |
|  22 |  PHAT        | PHAT matrix element for substitutions in the TRANSMEM region |
| **Multiple sequence alignment scores**:   |||
|  23 |  dScore      | difference of PSIC scores for two amino acid residue variants (Score1-Score2) |
|  24 |  Score1      | PSIC score for wild type amino acid residue (aa1) |
|  25 |  Score2      | PSIC score for mutant amino acid residue (aa2) |
|  26 |  MSAv        | version of the multiple sequence alignment used in conservation scores calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS |
|  27 |  Nobs        | number of residues observed at the substitution position in multiple alignment (without gaps) |
| **Protein 3D structure features**:   |||
|  28 |  Nstruct     | initial number of BLAST hits to similar proteins with 3D structures in PDB |
|  29 |  Nfilt       | number of 3D BLAST hits after identity threshold filtering |
|  30 |  PDB_id      | PDB protein structure identifier |
|  31 |  PDB_ch      | PDB polypeptide chain identifier |
|  32 |  length      | PDB sequence alignment length |
|  33 |  PDB_pos     | position of substitution in PDB protein sequence |
|  34 |  ident       | sequence identity between query sequence and aligned PDB sequence |
|  35 |  dVol        | change in residue side chain volume |
|  36 |  dProp       | change in solvent accessible surface propensity resulting from the substitution |
|  37 |  SecStr      | DSSP secondary structure assignment |
|  38 |  MapReg      | region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles |
|  39 |  NormASA     | normalized accessible surface area |
|  40 |  B-fact      | normalized B-factor (temperature factor) for the residue |
|  41 |  H-bonds     | number of sidechain-sidechain and sidechain-mainchain hydrogen bonds formed by the residue |
|  42 |  AveNHet     | number of residue contacts with heteroatoms, average per homologous PDB chain |
|  43 |  MinDHet     | closest residue contact with a heteroatom, Å |
|  44 |  AveNInt     | number of residue contacts with other chains, average per homologous PDB chain |
|  45 |  MinDInt     | closest residue contact with other chain, Å |
|  46 |  AveNSit     | number of residue contacts with critical sites, average per homologous PDB chain |
|  47 |  MinDSit     | closest residue contact with a critical site, Å |
| **Nucleotide sequence context features**:   |||
|  48 |  Transv      | whether substitution is a transversion |
|  49 |  CodPos      | position of the substitution within a codon |
|  50 |  CpG         | whether substitution changes CpG context: 0 - non-CpG context retained, 1 - removes CpG site, 2 - creates new CpG site, 3 - CpG context retained |
|  51 |  MinDJnc     | substitution distance from closest exon / intron junction |
| **Pfam protein family**:   |||
|  52 |  PfamHit     | Pfam identifier of the query protein |
| **Substitution scores**:   |||
|  53 |  IdPmax      | maximum congruency of the mutant amino acid residue to all sequences in multiple alignment |
|  54 |  IdPSNP      | maximum congruency of the mutant amino acid residue to the sequences in multiple alignment with the mutant residue |
|  55 |  IdQmin      | query sequence identity with the closest homologue deviating from the wild type amino acid residue |
| **Comments**:   |||
^  56 |  Comments    | optional user comments, copied from input |
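
Two of the columns above hold numeric codes (''MSAv'' and ''CpG''), and ''dScore'' is defined as ''Score1-Score2''. The sketch below shows one way to decode the codes and to sanity-check the score difference. It assumes each report line is already available as a dict mapping column names to string values; the function names and the tolerance (chosen to absorb rounding in the printed scores) are illustrative assumptions, not part of PolyPhen-2.

```python
import math

# Code tables copied from the column descriptions above.
CPG_CONTEXT = {
    0: "non-CpG context retained",
    1: "removes CpG site",
    2: "creates new CpG site",
    3: "CpG context retained",
}
MSA_VERSION = {
    1: "pairwise BLAST HSP (obsolete)",
    2: "MAFFT-Leon-Cluspack (default)",
    3: "MultiZ CDS",
}


def decode(code_table, raw):
    """Return the label for a coded field, or None if empty or unrecognised."""
    try:
        return code_table.get(int(raw.strip()))
    except ValueError:
        return None


def dscore_consistent(record, tol=0.01):
    """Check dScore == Score1 - Score2 within `tol`.

    Returns None when any of the three values is missing or non-numeric.
    """
    try:
        d = float(record["dScore"])
        s1 = float(record["Score1"])
        s2 = float(record["Score2"])
    except (KeyError, ValueError):
        return None
    return math.isclose(d, s1 - s2, abs_tol=tol)
```

For example, ''decode(CPG_CONTEXT, record["CpG"])'' turns the code into its description, and ''dscore_consistent(record)'' can flag lines whose scores were truncated or corrupted in transit.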