Dataset:
  PolyPhen-2 annotations for whole human RefSeq sequence space
  (WHRESS)

Source databases:
  UCSC GRCh37/hg19 knownGene annotations (08-Oct-2009)
  MultiZ46Way multiple alignments of 45 vertebrate genomes with hg19/GRCh37 human genome (08-Oct-2009)
  UniProtKB/Swiss-Prot/UniRef100 Release 2011_12 (14-Dec-2011)
  NCBI Homo sapiens Annotation Release 104 (02-Nov-2012):
    ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/protein.fa.gz

Software:
  PolyPhen-2 v2.2.2r398

Description:
  The dataset consists of PolyPhen-2 annotations for 157,907,066
  amino acid residue substitutions encoded by a putative single-nucleotide
  codon change, enumerated for each sequence position in the 35,586
  NCBI RefSeq proteins. The database format and contents are similar
  to those of the Whole Human Exome Sequence Space (WHESS) database
  released earlier.

File:
  ========================================
  polyphen-2.2.2-whress-2012_11.sqlite.bz2
  File size:     8.2 GB
  Uncompressed: 73.0 GB
  ========================================

  This is the complete set of annotations, loaded into a database in SQLite v3 format.
  The database schema is identical to the one used for the WHESS database.
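
  The archive has to be decompressed before SQLite can open it, and the
  result needs about 73.0 GB of free disk space. A minimal Python sketch
  of streaming decompression follows; the helper name decompress_bz2 and
  the chunk size are illustrative choices, not part of the distribution.

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Stream-decompress a .bz2 file to dst_path.

    Data is copied in fixed-size chunks, so memory use stays small
    no matter how large the decompressed file is.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

  Example call, assuming the archive is in the current directory:
  decompress_bz2("polyphen-2.2.2-whress-2012_11.sqlite.bz2",
                 "polyphen-2.2.2-whress-2012_11.sqlite")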

  SQLite usage example:
  
  $ sqlite3 -header -column polyphen-2.2.2-whress-2012_11.sqlite 
  SQLite version 3.7.15.2 2013-01-09 11:53:05
  Enter ".help" for instructions
  Enter SQL statements terminated with a ";"
  sqlite> SELECT chrom||':'||chrpos AS chrpos,refa,txname||strand AS txname,gene,nt1,nt2,refs_acc,cdnpos,aa1,aa2,hdiv_prediction,hdiv_prob,hvar_prediction,hvar_prob
    ...> FROM features JOIN scores USING(id)
    ...> WHERE gene='MAP2K1' AND hdiv_prediction LIKE '%damaging%' ORDER BY hdiv_prob DESC LIMIT 10;
  chrpos      refa        txname       gene        nt1         nt2         refs_acc     cdnpos      aa1         aa2         hdiv_prediction    hdiv_prob   hvar_prediction    hvar_prob 
  ----------  ----------  -----------  ----------  ----------  ----------  -----------  ----------  ----------  ----------  -----------------  ----------  -----------------  ----------
                          uc010bhq.2+  MAP2K1                              NP_002746.1  6           P           H           probably damaging  1.0         probably damaging  0.972     
  chr15:6667  CA          uc010bhq.2+  MAP2K1      C           A           NP_002746.1  6           P           Q           probably damaging  1.0         probably damaging  0.979     
  chr15:6667  CG          uc010bhq.2+  MAP2K1      C           G           NP_002746.1  6           P           R           probably damaging  1.0         probably damaging  0.979     
                          uc010bhq.2+  MAP2K1                              NP_002746.1  17          G           W           probably damaging  1.0         probably damaging  0.957     
  chr15:6672  CT          uc010bhq.2+  MAP2K1      C           T           NP_002746.1  49          R           C           probably damaging  1.0         probably damaging  0.997     
  chr15:6672  GA          uc010bhq.2+  MAP2K1      G           A           NP_002746.1  49          R           H           probably damaging  1.0         probably damaging  0.996     
                          uc010bhq.2+  MAP2K1                              NP_002746.1  49          R           M           probably damaging  1.0         probably damaging  0.997     
  chr15:6672  GC          uc010bhq.2+  MAP2K1      G           C           NP_002746.1  49          R           P           probably damaging  1.0         probably damaging  0.998     
                          uc010bhq.2+  MAP2K1                              NP_002746.1  49          R           W           probably damaging  1.0         probably damaging  0.997     
  chr15:6672  TG          uc010bhq.2+  MAP2K1      T           G           NP_002746.1  53          F           C           probably damaging  1.0         probably damaging  0.982     
  sqlite> .q
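
  The same query can also be run from a script. The sketch below uses
  Python's standard sqlite3 module and takes the table and column names
  directly from the session above. The function name top_damaging and
  its parameters are hypothetical; the gene name and row limit are
  passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

# Same SELECT as in the interactive session, with the gene name and the
# row limit turned into bound parameters.
TOP_DAMAGING_SQL = """
SELECT chrom || ':' || chrpos AS chrpos, refa, txname || strand AS txname,
       gene, nt1, nt2, refs_acc, cdnpos, aa1, aa2,
       hdiv_prediction, hdiv_prob, hvar_prediction, hvar_prob
FROM features JOIN scores USING(id)
WHERE gene = ? AND hdiv_prediction LIKE '%damaging%'
ORDER BY hdiv_prob DESC
LIMIT ?
"""


def top_damaging(conn, gene, limit=10):
    """Return up to `limit` damaging HumDiv predictions for `gene` as dicts."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # allows access to columns by name
    return [dict(row) for row in cur.execute(TOP_DAMAGING_SQL, (gene, limit))]
```

  Example call, assuming the decompressed database file is in the
  current directory:
  conn = sqlite3.connect("polyphen-2.2.2-whress-2012_11.sqlite")
  rows = top_damaging(conn, "MAP2K1")
  Rows whose chrom or chrpos is NULL come back with chrpos set to None,
  which corresponds to the blank cells in the column-mode output above.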

Notes:
1) Missing (NULL) values in the "chrom", "chrpos", "refa", "nt1" and "nt2" columns indicate
   substitutions that cannot result from a single-nucleotide change in the context of
   the corresponding transcript's nucleotide sequence. They are included because
   substitutions are initially enumerated from the protein sequences alone, without
   taking the transcript nucleotide sequences into consideration.
2) Original RefSeq identifiers are stored in the "refs_acc" column; original RefSeq
   sequence positions can be found in the "cdnpos" column. All RefSeq identifiers
   include version numbers. Use the SQL LIKE operator to ignore version numbers
   in a search, e.g.: SELECT ... WHERE refs_acc LIKE 'NP_002746.%'
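
The two notes above can be combined in a short Python sketch that counts,
for one protein, how many enumerated substitutions are reachable by a
single-nucleotide change (nt1 is not NULL) and how many are not. The
function names are hypothetical, and the query joins "features" and
"scores" as in the usage example because the README does not say which
table holds each column. One detail worth knowing: in SQL LIKE patterns
the underscore is itself a wildcard that matches any single character,
so this sketch escapes it; the unescaped pattern from note 2 would still
match the intended accessions, only slightly more loosely.

```python
import sqlite3


def like_any_version(accession):
    """Build a LIKE pattern that matches every version of a RefSeq accession.

    Any version suffix on the input is dropped, and the LIKE wildcards
    '_' and '%' are escaped with a backslash so they match literally.
    """
    base = accession.split(".", 1)[0]
    escaped = base.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + ".%"


def count_by_snv_reachability(conn, accession):
    """Return (reachable, unreachable) substitution counts for a protein.

    'Reachable' means nt1 is not NULL, i.e. the substitution can result
    from a single-nucleotide change in the transcript (see note 1).
    """
    sql = """
    SELECT SUM(nt1 IS NOT NULL), SUM(nt1 IS NULL)
    FROM features JOIN scores USING(id)
    WHERE refs_acc LIKE ? ESCAPE '\\'
    """
    row = conn.execute(sql, (like_any_version(accession),)).fetchone()
    # SUM() yields NULL when no rows match, so fall back to 0.
    return (row[0] or 0, row[1] or 0)
```

Example call: count_by_snv_reachability(conn, "NP_002746"), where conn is
an open sqlite3 connection to the decompressed database.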

Released:
  16-Apr-2013

Contacts:
  Ivan Adzhubey   <ivan_adzhubey@hms.harvard.edu>
  Shamil Sunyaev  <ssunyaev@hms.harvard.edu>