====== PolyPhen-2 annotation summary report explained ======

The following is a description of the **PolyPhen-2** annotation summary report. Reports in this format are produced both by the PolyPhen-2 **Batch query** web service and by the **standalone** PolyPhen-2 software. The report is a plain text, tab-separated file in which each line annotates a single protein variant (amino acid residue substitution).

The fourteen columns highlighted below (1-2, 6-11, 41, 45-48, 91) are the ones included in the **Short** version of the report, available via the **Batch query** web page. These are sufficient if you are interested only in the PolyPhen-2 prediction outcome and prediction confidence scores. The remaining columns in the **Full** report version are mostly useful if you want to investigate in detail all the features supporting the prediction.

^ Column\\ No. ^ Column\\ Name ^ Description ^
| **Original query** (as copied from user input): |||
^ 1 | query_no | input query ordinal |
^ 2 | o_acc | original protein identifier |
| 3 | o_pos | original substitution position in the protein sequence |
| 4 | o_aa1 | original wild type (reference) amino acid residue |
| 5 | o_aa2 | original mutant (substitution) amino acid residue |
| **Annotated query**: |||
^ 6 | rsid | dbSNP reference SNP identifier (rsID) if available |
^ 7 | acc | UniProtKB accession if known protein, otherwise same as o_acc |
^ 8 | length | length of the protein sequence |
^ 9 | pos | substitution position in UniProtKB protein sequence, otherwise same as o_pos |
^ 10 | aa1 | wild type amino acid residue in relation to UniProtKB sequence |
^ 11 | aa2 | mutant amino acid residue in relation to UniProtKB sequence |
| **Nucleotide sequence context annotations**: |||
| 12 | chr_pos | SNP chromosome:position (chromosome coordinates are 1-based) |
| 13 | str | transcript strand ("+" or "-") |
| 14 | gene | gene symbol |
| 15 | transcript | UCSC transcript name (unique identifier) |
| 16 | canon | UCSC knownCanonical representative transcript flag: 1\ -\ canonical, 0\ -\ alternative |
| 17 | cid | UCSC knownCanonical cluster identifier (number) |
| 18 | txcov | transcript coverage, the number of transcripts in UCSC cluster overlapping the mutation position / total number of transcripts in the cluster |
| 19 | ntlen | full transcript length (number of nucleotides) |
| 20 | ntpos | mutation position in the full transcript nucleotide sequence (in the direction of transcription) |
| 21 | nt1 | reference nucleotide (transcript strand) |
| 22 | nt2 | variant nucleotide (transcript strand) |
| 23 | PtGgPaNl | orthologous alleles in chimp\ (Pt), gorilla\ (Gg), orangutan\ (Pa) and gibbon\ (Nl) if different from human reference allele, ?\ -\ otherwise; .\ -\ data not available |
| 24 | dref | putative derived allele found in human reference, score: 0\ -\ no evidence, 1\ -\ variant allele matches orthologous ancestral allele, 2\ -\ dbSNP minor allele matches reference allele, 3\ -\ both dbSNP and orthologous evidence present, ?\ -\ not enough evidence to score |
| 25 | gerprs | Genomic Evolutionary Rate Profiling (GERP++) position-specific conservation score, RS; 0\ -\ when alignment coverage is insufficient |
| 26 | phylop | conservation scoring by phyloP (phylogenetic p-values) from the [[http://compgen.bscb.cornell.edu/phast/|PHAST package]] for multiple alignments of 99 vertebrate genomes to the human genome |
| 27 | trv | transversion mutation flag: 0\ -\ transition, 1\ -\ transversion |
| 28 | CpG | CpG context: 0\ -\ non-CpG context retained, 1\ -\ mutation removes CpG site, 2\ -\ mutation creates new CpG site, 3\ -\ CpG context retained: C(C/G)G substitution |
| 29 | JXmin | distance from mutation position to the nearest exon / intron junction ("-" for upstream, "+" for downstream) |
| 30 | JXc | mutation in a codon that is split across two exons: ?\ -\ no, 1\ -\ yes |
| 31 | exon | mutation in exon # / of total exons (exons are enumerated in the direction of transcription) |
| 32 | cexon | same as above, but only coding (CDS) exons are enumerated |
| 33 | cdnpos | number of the mutated codon within transcript's CDS (1-based) |
| 34 | frame | mutation position offset within the codon (0..2) |
| 35 | dgn | degeneracy index for mutated codon position, by Nei & Kumar (2000) "Molecular Evolution and Phylogenetics", page 64: 0\ -\ non-degenerate, 2\ -\ simple 2-fold degenerate, 3\ -\ complex 2-fold degenerate, 4\ -\ 4-fold degenerate |
| 36 | cdn1 | reference codon |
| 37 | cdn2 | mutated codon |
| **dbSNP annotations**: |||
| 38 | dbrsid | dbSNP SNP rsID |
| 39 | dbminor | dbSNP minor allele nucleotide (transcript strand) |
| 40 | dbmaf | dbSNP minor allele frequency |
| **PolyPhen-2 prediction outcome**: |||
^ 41 | prediction | qualitative ternary classification appraised at 5%/10% (HumDiv) or 10%/15% (HumVar) False Discovery Rate (FDR) thresholds: "benign", "possibly damaging", "probably damaging" |
| **PolyPhen-1 prediction description** (obsolete, please ignore): |||
| 42 | based_on | prediction basis |
| 43 | effect | predicted substitution effect on the protein structure or function |
| **PolyPhen-2 classifier outcome and scores**: |||
| 44 | pph2_class | probabilistic binary classifier outcome: "damaging" or "neutral" |
^ 45 | pph2_prob | classifier probability of the variation being damaging |
^ 46 | pph2_FPR | classifier model False Positive Rate (1 - specificity) at the above probability |
^ 47 | pph2_TPR | classifier model True Positive Rate (sensitivity) at the above probability |
^ 48 | pph2_FDR | classifier model False Discovery Rate at the above probability |
| **UniProtKB/Swiss-Prot/Pfam protein annotations**: |||
| 49 | PfamHit | Pfam identifier of the protein family or domain to which substitution maps |
| 50 | site | substitution SITE annotation |
| 51 | region | substitution REGION annotation |
| 52 | PHAT | PHAT matrix element for substitutions in the TRANSMEM region |
| **Multiple sequence alignment scores**: |||
| 53 | dScore | difference of PSIC scores for two amino acid residue variants (Score1-Score2) |
| 54 | Score1 | PSIC score for wild type amino acid residue (aa1) |
| 55 | Score2 | PSIC score for mutant amino acid residue (aa2) |
| 56 | MSAv | version of the multiple sequence alignment used in conservation scores calculations: 1 - pairwise BLAST HSP (obsolete), 2 - MAFFT-Leon-Cluspack (default), 3 - MultiZ CDS |
| 57 | Nobs | number of residues observed at the substitution position in multiple alignment (without gaps) |
| 58 | Nseqs | number of sequences observed at the substitution position in multiple alignment (including gaps) |
| 59 | Nsubs | number of residues different from reference residue (aa1) observed at the substitution position in multiple alignment |
| 60 | Nvars | number of residues same as substitution residue (aa2) observed at the substitution position in multiple alignment (without gaps) |
| 61 | Nres | number of unique residues observed at the substitution position in multiple alignment (without gaps) |
| **Substitution scores**: |||
| 62 | IdPmax | maximum congruency of the substitution amino acid residue across all sequences with a substitution at the substitution position in multiple alignment |
| 63 | IdPSNP | maximum congruency of the substitution amino acid residue to the sequences in multiple alignment with the substitution residue at the substitution position in multiple alignment |
| 64 | IdQmax | query sequence identity with the closest homologue deviating from the wild type amino acid residue (aa1) |
| **Phylogenetic tree based scores**: |||
| 65 | DistPmin | minimum normalized distance along the phylogenetic tree across all substitution types encountered at the substitution position |
| 66 | DistPSNP | minimum normalized distance along the phylogenetic tree for substitution residues (aa2) encountered at the substitution position |
| 67 | DistQmin | minimum distance (sum of branch lengths) along the phylogenetic tree across all substitution types encountered at the substitution position |
| 68 | BaRE | Bayesian Rate Estimator for scoring evolutionary conservation, D.M. Jordan (2015) |
| **Gene-based scores**: |||
| 69 | RVISraw | Residual Variation Intolerance Score (raw), Petrovski //et al.// (2013) |
| 70 | RVISranked | Residual Variation Intolerance Score (normalized by rank), Petrovski //et al.// (2013) |
| **RCSB PDB annotations**: |||
| 71 | Nstruct | initial number of BLAST hits to similar proteins with 3D structures in PDB |
| 72 | Nfilt | number of 3D BLAST hits after identity threshold filtering |
| 73 | PDB_id | PDB protein structure identifier |
| 74 | PDB_ch | PDB polypeptide chain identifier |
| 75 | PDB_len | PDB sequence alignment length |
| 76 | PDB_pos | position of substitution in the PDB protein sequence |
| 77 | PDB_idn | sequence identity between query sequence and the aligned PDB sequence |
| **Amino acid residues structural features**: |||
| 78 | dVol | change in residue side chain volume |
| 79 | dProp | change in solvent accessible surface propensity resulting from the substitution |
| **Protein 3D structure features**: |||
| 80 | SecStr | DSSP secondary structure assignment |
| 81 | MapReg | region of the phi-psi map (Ramachandran map) derived from the residue dihedral angles |
| 82 | NormASA | normalized accessible surface |
| 83 | B-fact | normalized B-factor (temperature factor) for the residue |
| 84 | H-bonds | number of hydrogen sidechain-sidechain and sidechain-mainchain bonds formed by the residue |
| 85 | AveNHet | number of residue contacts with heteroatoms, average per homologous PDB chain |
| 86 | MinDHet | closest residue contact with a heteroatom, Å |
| 87 | AveNInt | number of residue contacts with other chains, average per homologous PDB chain |
| 88 | MinDInt | closest residue contact with other chain, Å |
| 89 | AveNSit | number of residue contacts with critical sites, average per homologous PDB chain |
| 90 | MinDSit | closest residue contact with a critical site, Å |
| **Comments**: |||
^ 91 | Comments | optional user comments, copied from input |
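
Because the report is tab-separated and the columns are documented by number, one way to reduce a **Full** report to the **Short** column set is to select fields by position. The following Python sketch illustrates the idea and is not part of PolyPhen-2 itself. It rests on several assumptions: that a Full report line has 91 fields in the order listed above, that header or comment lines start with ''#'', and that field values may carry padding whitespace. The function name ''short_records'' is hypothetical.

```python
import csv

# 1-based column numbers of the Short report, as listed in the table above.
SHORT_COLUMNS = [1, 2, 6, 7, 8, 9, 10, 11, 41, 45, 46, 47, 48, 91]

# Column names for those positions, taken from the same table.
SHORT_NAMES = [
    "query_no", "o_acc", "rsid", "acc", "length", "pos", "aa1", "aa2",
    "prediction", "pph2_prob", "pph2_FPR", "pph2_TPR", "pph2_FDR", "Comments",
]


def short_records(lines, n_columns=91):
    """Yield one dict per variant line, keeping only the Short-report columns.

    `lines` is any iterable of text lines (an open file works).
    Assumption: lines starting with '#' are headers or comments and are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        # Pad short rows so that missing trailing fields read as empty strings.
        row = row + [""] * (n_columns - len(row))
        yield {
            name: row[col - 1].strip()  # convert 1-based column number to index
            for name, col in zip(SHORT_NAMES, SHORT_COLUMNS)
        }


# Synthetic demonstration line (made-up placeholder values, not real output).
fields = [f"c{i}" for i in range(1, 92)]
fields[40] = "probably damaging"  # column 41: prediction
fields[44] = " 0.998 "            # column 45: pph2_prob, padded on purpose
demo = ["#header line (skipped)\n", "\t".join(fields) + "\n"]

records = list(short_records(demo))
print(records[0]["prediction"], records[0]["pph2_prob"])
```

To process a real file, pass an open file handle instead of the ''demo'' list, for example ''short_records(open("report.txt", newline=""))'', where the file name is a placeholder. Selecting by position rather than by header text keeps the sketch independent of how the header line is formatted.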