Databases available for BLAST2 search

Please also refer to "Which BLAST program is appropriate to search against which database?". Almost all databases are updated on a regular basis (daily or weekly); the genomic sets are the exception.

Peptide Sequence Databases

The database "nrdb" is checked and clustered with Liisa Holm's nrdb program, using a 95% identity criterion. Only one sequence per cluster is kept in "nrdb95", so near-neighbour redundant sequences are removed from it. The reduction in the number of sequences is large: 209190 sequences in "nrdb95" compared with 329796 sequences in nrdb (as of 02.09.1998). This should speed up your homology search. This database is updated daily.
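The idea of keeping one representative per 95%-identity cluster can be sketched as follows. This is only an illustration, not the actual nrdb algorithm: the greedy strategy, the function names, and the use of Python's difflib similarity ratio as a stand-in for sequence identity are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Crude stand-in for sequence identity (assumption): difflib similarity ratio.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_representatives(seqs, threshold=0.95):
    """Greedily keep one representative per cluster of near-identical sequences."""
    reps = {}
    # Visit longer sequences first so that they become the representatives.
    for name, seq in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if not any(identity(seq, rep) >= threshold for rep in reps.values()):
            reps[name] = seq
    return reps


def reduction_percent(before, after):
    # Percentage of sequences removed by the clustering step.
    return 100.0 * (before - after) / before
```

For the counts quoted above, reduction_percent(329796, 209190) comes to about 36.6%, i.e. roughly one sequence in three is removed.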

Non-redundant protein database built from SwissProt+SwissProtNew+SptremblNew+Sptrembl.
More precisely, it contains the non-identical sequences extracted from the above databases. As of 1.4.1998 it has 72558 fewer sequences than nrdb (which is more exhaustive, with 303,844 sequences), mostly because it removes redundancy that NCBI's nrdb program does not identify. The sensitivity should therefore increase.
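Extracting the non-identical sequences from several source databases amounts to dropping exact duplicates while merging. A minimal sketch, assuming each source is a list of (name, sequence) pairs; the function name and the input layout are hypothetical and not taken from the actual build procedure:

```python
def merge_non_identical(*sources):
    """Merge sources, keeping only the first record of each distinct sequence."""
    seen = set()
    merged = []
    for source in sources:
        for name, seq in source:
            key = seq.upper()  # compare sequences case-insensitively
            if key not in seen:
                seen.add(key)
                merged.append((name, seq))
    return merged
```

Sources listed first take precedence, so the order of the arguments decides which entry name survives for a duplicated sequence.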

All non-identical protein sequences extracted from EMBL CDS translations+PDB+SwissProt+PIR
Although it contains more sequences than sp_nrdb and is thus more exhaustive, its redundancy is higher than in sp_nrdb, which might cause a loss of sensitivity.

Always the latest major release of the SWISS-PROT protein sequence database.

The Homo sapiens sequences in the last major release of SWISS-PROT + SwissProtNew + Sptrembl + SptremblNew, updated daily.

Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank. This database is updated daily; all sequences containing only "X" and all theoretical model sequences are removed from it.
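The filtering rule described above can be expressed as a small predicate. This is a sketch under stated assumptions: the function name is hypothetical, and so is the is_theoretical_model flag, since how theoretical models are actually recognised in the source entries is not described here.

```python
def keep_record(seq, is_theoretical_model=False):
    """Return True if a PDB-derived sequence should stay in the database."""
    # A sequence made up solely of "X" carries no usable residue information.
    only_x = set(seq.upper()) == {"X"}
    return not only_x and not is_theoretical_model
```

A sequence that merely contains some "X" residues is kept; only sequences consisting entirely of "X" are dropped.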

Proteomics (22.Jan.1999)

Genomes                             current no.  old no.(Dec.1996)
All Genomic Protein Sets                  40845
Aquifex aeolicus (p)                       1522
Archaeoglobus fulgidus (p)                 2407
Bacillus subtilis (p)                      4100
Borrelia burgdorferi (p)                    850
Chlamydia trachomatis (p)                   894
Escherichia coli (p)                       4289
Haemophilus influenzae (p)                 1709       1680 
Helicobacter pylori (p)                    1565
Helicobacter pylori J99 (p)                1491
Methanobacterium thermoautotrophicum (p)   1869
Methanococcus jannaschii (p)               1715       1735 
Mycobacterium tuberculosis (p)             3918
Mycoplasma genitalium (p)                   480        468  
Mycoplasma pneumoniae (p)                   677        300
Pyrococcus horikoshii (p)                  2064 
Rickettsia prowazekii (p)                   834
Saccharomyces cerevisiae (p)               6261       8692 
Synechocystis pcc6803 (p)                  3169       3168
Treponema pallidum (p)                     1031

Nucleotide Sequence Databases

NCBI's UniGene (Homo sapiens set): the Unique Gene Sequence Collection for human. DNA database (reference: NCBI's UniGene). This is the set of all UniGene clusters.

NCBI's UniGene (Mus musculus set): the Unique Gene Sequence Collection for mouse. DNA database (reference: NCBI's UniGene). This is the set of all UniGene clusters.

hs_nrest (because of the limited memory on dove, nrdb failed to update this database; its last state is from 23.Sep.2000):
The Homo sapiens part of the non-redundant database of the EMBL+GenBank+DDBJ EST divisions.

Non-redundant nucleotide sequence database of all EMBL+GenBank+DDBJ sequences, excluding ESTs and STSs.

Non-redundant Database of EMBL+GenBank+DDBJ EST Divisions