The protein sequence for a PolyPhen query should be pasted into the "Amino acid sequence in FASTA format" text area of the input form. As the name implies, this field accepts only sequences that follow the FASTA format specification described below. More information about FASTA format can be found here.

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line, or definition line (defline), is distinguished from the sequence data by a greater-than (>) symbol at the beginning. The word following the ">" symbol is the identifier of the sequence, and the rest of the line is an optional description. Normally, identifiers are simply protein accessions, names or Entrez GIs (e.g., Q5I7T1, AG10B_HUMAN, 129295), but a bar-separated NCBI sequence identifier (e.g., gi|129295) will also be accepted. Any arbitrary user-specified sequence identifier can also be used (e.g., CLONE00073452), but in that case you are advised to use sufficiently long, unique words. There should be no space between the ">" and the first letter of the identifier. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:


Blank lines are not allowed in the middle of FASTA input.
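
The defline rules above can be checked mechanically before a query is submitted. The following is a minimal Python sketch and is not part of PolyPhen; the function name split_fasta_record and the sample record (identifier, description and residues) are hypothetical and serve only as an illustration.

```python
def split_fasta_record(text):
    """Split one FASTA record into (identifier, description, sequence).

    Raises ValueError if the record breaks one of the rules described above.
    """
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty input")
    defline = lines[0]
    if not defline.startswith(">"):
        raise ValueError("first line must start with '>'")
    if len(defline) < 2 or defline[1].isspace():
        raise ValueError("no space allowed between '>' and the identifier")
    # The ends were trimmed above, so any blank line left is inside the record.
    if any(not line.strip() for line in lines[1:]):
        raise ValueError("blank lines are not allowed inside FASTA input")
    # First word is the identifier, the rest of the line is the description.
    parts = defline[1:].split(None, 1)
    identifier = parts[0]
    description = parts[1].strip() if len(parts) > 1 else ""
    sequence = "".join(line.strip() for line in lines[1:])
    return identifier, description, sequence


# Hypothetical record: the description and the residues are made up.
record = ">CLONE00073452 example protein\nMKTAYIAKQR\nQISFVKSHFS\n"
print(split_fasta_record(record))
```

The sketch handles a single record only and does not enforce the recommended 80-character line length.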

Sequences are expected to be represented in the standard IUB/IUPAC amino acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., X for an unknown amino acid residue).
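
As an illustration of these rules, a query sequence could be pre-checked locally along the following lines. This is a hedged Python sketch, not PolyPhen's own validation code; the names ALLOWED and normalize_sequence are hypothetical, and the choice to reject digits rather than silently drop them is an assumption.

```python
# Letters of the IUB/IUPAC amino acid table, plus '*' for translation stop.
# '-' is deliberately left out: the notes on the table say PolyPhen rejects it.
ALLOWED = set("ABCDEFGHIKLMNPQRSTUVWXYZ*")


def normalize_sequence(seq):
    """Upper-case a query sequence and reject characters that need attention."""
    seq = "".join(seq.split()).upper()  # drop whitespace, map to upper case
    if any(ch.isdigit() for ch in seq):
        raise ValueError("remove digits or replace them with letter codes such as X")
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        raise ValueError("unexpected characters: " + " ".join(bad))
    return seq
```

For example, normalize_sequence("mktay iakqr") returns "MKTAYIAKQR", while a sequence containing a digit or a "-" raises ValueError.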

The IUB/IUPAC amino acid codes are:

		A  alanine               P  proline
		B  aspartate/asparagine  Q  glutamine
		C  cysteine              R  arginine
		D  aspartate             S  serine
		E  glutamate             T  threonine
		F  phenylalanine         U  selenocysteine
		G  glycine               V  valine
		H  histidine             W  tryptophan
		I  isoleucine            Y  tyrosine
		K  lysine                Z  glutamate/glutamine
		L  leucine               X  any
		M  methionine            *  translation stop
		N  asparagine            -  gap of indeterminate length

1. U in protein sequences is replaced by X before the search, since U is not specified in any scoring matrix.
2. PolyPhen will not accept "-" in the query. To represent gaps, use a string of N or X instead.
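
The two notes above can be mirrored in a small local preprocessing step. The Python sketch below is an illustration only: the function name apply_table_notes and the gap_fill parameter are hypothetical, and replacing each "-" with exactly one letter is an assumption, since the notes do not say how long the replacement string should be. The U-to-X substitution is described above as happening before the search, so doing it yourself may be unnecessary.

```python
def apply_table_notes(seq, gap_fill="X"):
    """Return seq with U mapped to X and each '-' replaced by gap_fill.

    gap_fill must be 'N' or 'X', the two letters the notes suggest for gaps.
    """
    if gap_fill not in ("N", "X"):
        raise ValueError("gap_fill must be 'N' or 'X'")
    # Assumption: one replacement letter per '-' character.
    return seq.upper().replace("U", "X").replace("-", gap_fill)


print(apply_table_notes("mku--ay"))
```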