Note that you can also use the newer CBaSE v1.2, which is available for download as a standalone Python script with documentation.
CBaSE v1.1 can be run in two modes, set by the command line argument model:
- model=0: All six possible models are fitted and model selection is done based on the Akaike information criterion (default).
- model>0: Only one of the six possible functional forms is fitted, namely the one whose index equals model.
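The selection step in mode model=0 can be illustrated with a minimal sketch of the Akaike information criterion, AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood. The function names, the log-likelihoods, and the parameter counts below are hypothetical and chosen only for illustration; they are not taken from the CBaSE source code.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(fits):
    """Return the model index with the lowest AIC.

    fits maps a model index (1..6) to a tuple
    (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda idx: aic(*fits[idx]))


# Hypothetical fit results for two of the six models.
example_fits = {
    1: (-100.0, 2),  # AIC = 204.0
    2: (-95.0, 4),   # AIC = 198.0
}
best = select_model(example_fits)
print("Selected model index:", best)
```

A model with more parameters is preferred only if its gain in log-likelihood outweighs the penalty of 2 per additional parameter.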
Unzip the CBaSEv1.1.zip file and run the following command within the folder:
python CBaSE_v1.1.py pathInputFile auxFolder context model dataName
Command line arguments:
(1) pathInputFile: Path to file containing somatic mutation data (see below).
(2) auxFolder: Folder containing program and auxiliary files (default "Auxiliary").
(3) context: Sequence context size used to compute the cancer-type-specific mutation matrix; one of {0=trinucleotides, 1=pentanucleotides}.
(4) model: Model assumption for the distribution of expected synonymous mutation counts; one of {0,1,2,3,4,5,6}, where 0 corresponds to the default option that compares all six possible models.
(5) dataName: Name of the dataset used to identify output files (e.g. cancer type).
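When CBaSE is called from another script, the five positional arguments can be assembled and checked before launching. The following is a minimal sketch under stated assumptions: the helper name build_cbase_command and the example file name mutations.txt are hypothetical, and the argument order simply follows the usage line above.

```python
import subprocess


def build_cbase_command(path_input_file, data_name, aux_folder="Auxiliary",
                        context=0, model=0,
                        interpreter="python", script="CBaSE_v1.1.py"):
    """Assemble the argument list for the documented CBaSE v1.1 call."""
    if context not in (0, 1):
        raise ValueError("context must be 0 (trinucleotides) or 1 (pentanucleotides)")
    if not isinstance(model, int) or not 0 <= model <= 6:
        raise ValueError("model must be an integer from 0 to 6")
    # Order: pathInputFile auxFolder context model dataName
    return [interpreter, script, path_input_file, aux_folder,
            str(context), str(model), data_name]


# Hypothetical input file and dataset name.
cmd = build_cbase_command("mutations.txt", "my_cancer_type")
print(" ".join(cmd))

# To actually launch the run, execute this from inside the unzipped folder:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.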
Input file format:
Gene symbol | Mutation effect | Alternate allele | Context index |
---|---|---|---|
ECE1 | coding-synon | C | 26 |
SAMD11 | missense | A | 53 |
TNFRSF4 | nonsense | A | 52 |
(1) Gene symbol: corresponds to the official gene symbol as used in the UCSC knownGene track.
(2) Mutation effect: one of {“missense”, “nonsense”, “coding-synon”}, denoting missense, nonsense (stop-gain and stop-loss), and synonymous mutations, respectively.
(3) Alternate allele: the final allele after the mutation event; one of {A, C, G, T}.
(4) Context index: index of the sequence context of the reference allele; 0-based indices of tri- or pentanucleotide contexts can be found here.
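Before running CBaSE, each input row can be checked against the four column definitions above. The sketch below makes two assumptions that this page does not state: that the row has already been split into its four fields (the column delimiter of the file is not covered here), and that context indices span all 4^3 = 64 trinucleotides or all 4^5 = 1024 pentanucleotides. The function name validate_mutation_row is hypothetical.

```python
VALID_EFFECTS = {"missense", "nonsense", "coding-synon"}
VALID_ALLELES = {"A", "C", "G", "T"}


def validate_mutation_row(fields, context=0):
    """Return a list of problems found in one input row (empty if none).

    fields: sequence of 4 strings (gene, effect, alternate allele, context index)
    context: 0 for trinucleotides, 1 for pentanucleotides
    """
    if len(fields) != 4:
        return ["expected 4 columns, got %d" % len(fields)]
    gene, effect, alt, ctx = fields
    problems = []
    if not gene:
        problems.append("empty gene symbol")
    if effect not in VALID_EFFECTS:
        problems.append("unknown mutation effect: %r" % effect)
    if alt not in VALID_ALLELES:
        problems.append("alternate allele must be one of A, C, G, T")
    # Assumption: 0-based indices cover every possible tri-/pentanucleotide.
    n_contexts = 4 ** (3 if context == 0 else 5)
    try:
        index_ok = 0 <= int(ctx) < n_contexts
    except ValueError:
        index_ok = False
    if not index_ok:
        problems.append("context index must be an integer in [0, %d)" % n_contexts)
    return problems


print(validate_mutation_row(["ECE1", "coding-synon", "C", "26"]))
print(validate_mutation_row(["SAMD11", "silent", "A", "53"]))
```

The check does not verify that the gene symbol exists in the UCSC knownGene track or that the context index matches the reference sequence; it only catches formatting mistakes.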
Output file format:
CBaSE writes the output, including the q-values for negative and positive selection, into the file "q_values_dataName.txt", located in the folder "Output". The columns contain the following values:
Column | Description |
---|---|
gene | gene symbol, as provided in the input file |
p_phi_neg | p-value of the negative selection signal |
q_phi_neg | q-value of the negative selection signal |
phi_neg | meta-statistic phi of negative selection |
p_phi_pos | p-value of the positive selection signal |
q_phi_pos | q-value of the positive selection signal |
phi_pos | meta-statistic phi of positive selection |
m_obs | observed number of missense mutations |
k_obs | observed number of nonsense mutations |
s_obs | observed number of synonymous mutations |
To rank genes by positive (negative) selection signal, sort by the q-value q_phi_pos (q_phi_neg) in increasing order. When several q-values are exactly zero, sorting those genes by the meta-statistic phi_pos (phi_neg) in decreasing order breaks the ties.
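The ranking with tie-breaking can be sketched in a few lines of Python. The helper names are hypothetical, and read_q_values assumes a tab-delimited file whose header row uses the column names listed above; adjust the delimiter if your output file differs.

```python
import csv


def read_q_values(path, delimiter="\t"):
    """Read the CBaSE output into a list of dicts keyed by column name.

    Assumption: the file is tab-delimited and starts with a header row.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def rank_genes(rows, direction="pos"):
    """Sort by q-value (ascending), then by phi (descending) to break ties.

    direction: "pos" for positive selection, "neg" for negative selection.
    """
    q_key = "q_phi_" + direction
    phi_key = "phi_" + direction
    return sorted(rows, key=lambda r: (float(r[q_key]), -float(r[phi_key])))


# Hypothetical rows, shown here instead of reading a real output file.
rows = [
    {"gene": "GENE_A", "q_phi_pos": "0", "phi_pos": "5.0"},
    {"gene": "GENE_B", "q_phi_pos": "0", "phi_pos": "12.5"},
    {"gene": "GENE_C", "q_phi_pos": "0.3", "phi_pos": "20.0"},
]
for row in rank_genes(rows, "pos"):
    print(row["gene"], row["q_phi_pos"], row["phi_pos"])
```

For a real run, the rows would come from something like read_q_values("Output/q_values_dataName.txt"), with dataName replaced by the name given on the command line.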
The CBaSE v1.1 web tool can be found here. Note that CBaSE v1.2 is only available as a standalone Python script, downloadable here.
If you find our tool useful, please cite Weghorn & Sunyaev, Nature Genetics (2017).