Download

CBaSE v1.1

Note that you can also use the newer CBaSE v1.2, which is available for download as a standalone Python script with documentation.


Running CBaSE v1.1

CBaSE v1.1 can be run in two modes, set by the command line argument model:
- model=0: All six possible models are fitted and model selection is done based on the Akaike information criterion (default).
- model>0: Only the single model with the given index (1 to 6) among the six possible functional forms is fitted.
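
For readers unfamiliar with the criterion used when model=0, the following is a minimal sketch of AIC-based model selection in general, not the CBaSE implementation. It uses the standard definition AIC = 2k - 2 ln(L) and picks the candidate with the lowest value. The function names, the log-likelihoods, and the parameter counts are all hypothetical and chosen only for illustration; they are not taken from CBaSE.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(fits):
    """Return the model index with the lowest AIC.

    fits maps a model index to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda index: aic(*fits[index]))


if __name__ == "__main__":
    # Made-up fit results for three hypothetical models (illustration only).
    example_fits = {
        1: (-1050.0, 2),
        2: (-1046.5, 3),
        3: (-1046.0, 4),
    }
    for index, (log_l, k) in sorted(example_fits.items()):
        print(index, aic(log_l, k))
    print("selected model:", select_model(example_fits))
```

An extra parameter is only worth its cost here if it raises the log-likelihood by more than one unit, which is why the third hypothetical model loses to the second despite fitting slightly better.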

How to run CBaSE_v1.1.py

Unzip the CBaSEv1.1.zip file and run the following command within the folder:

python CBaSE_v1.1.py pathInputFile auxFolder context model dataName

Command line arguments:

(1) pathInputFile: Path to file containing somatic mutation data (see below).
(2) auxFolder: Folder containing program and auxiliary files (default "Auxiliary").
(3) context: Sequence context size used to compute the cancer-type-specific mutation matrix; one of {0=trinucleotides, 1=pentanucleotides}.
(4) model: Model assumption for the distribution of expected synonymous mutation counts; one of {0,1,2,3,4,5,6}, where 0 corresponds to the default option that compares all six possible models.
(5) dataName: Name of the dataset used to identify output files (e.g. cancer type).
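
When CBaSE is called from a larger pipeline, it can help to assemble the five positional arguments programmatically. The sketch below only builds the argument list in the order given above and checks the documented value ranges. The helper name build_cbase_command, the input file name, and the dataset name are hypothetical, and the interpreter name "python" is taken from the usage line.

```python
def build_cbase_command(path_input_file, data_name, aux_folder="Auxiliary",
                        context=0, model=0, python="python",
                        script="CBaSE_v1.1.py"):
    """Assemble the positional argument list for CBaSE_v1.1.py."""
    if context not in (0, 1):
        raise ValueError("context must be 0 (trinucleotides) or 1 (pentanucleotides)")
    if not isinstance(model, int) or not 0 <= model <= 6:
        raise ValueError("model must be an integer from 0 to 6")
    # Order follows the usage line:
    # pathInputFile auxFolder context model dataName
    return [python, script, str(path_input_file), str(aux_folder),
            str(context), str(model), str(data_name)]


if __name__ == "__main__":
    # Hypothetical input file and dataset name.
    command = build_cbase_command("mutations.txt", "melanoma")
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from within the unzipped folder.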

Input file format:

Gene symbol    Mutation effect    Alternate allele    Context index
ECE1    coding-synon    C    26
SAMD11    missense    A    53
TNFRSF4    nonsense    A    52

(1) Gene symbol: corresponds to the official gene symbol as used in the UCSC knownGene track.
(2) Mutation effect: one of {“missense”, “nonsense”, “coding-synon”}, denoting missense, nonsense (stop-gain and stop-loss), and synonymous mutations, respectively.
(3) Alternate allele: the final allele after the mutation event; one of {A, C, G, T}.
(4) Context index: index of the sequence context of the reference allele; 0-based indices of tri- or pentanucleotide contexts can be found here.
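
A quick check of the input rows before running CBaSE can catch formatting problems early. The sketch below validates one row against the four column definitions above. It assumes that columns are separated by whitespace, which is not stated here, and it only checks that the context index is a non-negative integer because the upper bound depends on the chosen context size. The function name parse_mutation_line is hypothetical.

```python
VALID_EFFECTS = {"missense", "nonsense", "coding-synon"}
VALID_ALLELES = {"A", "C", "G", "T"}


def parse_mutation_line(line):
    """Parse one input row into (gene, effect, alt_allele, context_index).

    Assumes whitespace-separated columns; raises ValueError on a malformed row.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    gene, effect, alt_allele, index_text = fields
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown mutation effect: {effect!r}")
    if alt_allele not in VALID_ALLELES:
        raise ValueError(f"unknown alternate allele: {alt_allele!r}")
    context_index = int(index_text)  # raises ValueError if not an integer
    if context_index < 0:
        raise ValueError("context index must be 0-based and non-negative")
    return gene, effect, alt_allele, context_index


if __name__ == "__main__":
    print(parse_mutation_line("ECE1 coding-synon C 26"))
```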

Output file format:

CBaSE writes the output, including the q-values for negative and positive selection, into the file "q_values_dataName.txt", located in the folder "Output". The columns contain the following values:

gene    gene symbol, as provided in the input file
p_phi_neg    p-value of the negative selection signal
q_phi_neg    q-value of the negative selection signal
phi_neg    meta-statistic phi of negative selection
p_phi_pos    p-value of the positive selection signal
q_phi_pos    q-value of the positive selection signal
phi_pos    meta-statistic phi of positive selection
m_obs    observed number of missense mutations
k_obs    observed number of nonsense mutations
s_obs    observed number of synonymous mutations

To rank genes by positive (negative) selection signal, sort by q_phi_pos (q_phi_neg) in increasing order. Sorting additionally by the meta-statistic phi_pos (phi_neg) in decreasing order breaks ties when several q-values are exactly zero.
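
The ranking with tie-breaking described above can be sketched as follows. The column names are those listed in the output format. The assumption that the output file is tab-delimited with a header row is not confirmed here, and the function names, gene names, and values in the demo are made up for illustration.

```python
import csv


def read_q_values(handle):
    """Read output rows as dictionaries.

    Assumes a tab-delimited file with a header row (not confirmed).
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def rank_genes(rows, direction="pos"):
    """Sort rows by q-value (ascending), then by phi (descending)."""
    if direction not in ("pos", "neg"):
        raise ValueError("direction must be 'pos' or 'neg'")
    q_key = f"q_phi_{direction}"
    phi_key = f"phi_{direction}"
    return sorted(rows, key=lambda row: (float(row[q_key]), -float(row[phi_key])))


if __name__ == "__main__":
    # Made-up rows; two genes share a q-value of zero.
    demo_rows = [
        {"gene": "GENE_A", "q_phi_pos": "0", "phi_pos": "12.5"},
        {"gene": "GENE_B", "q_phi_pos": "0.2", "phi_pos": "3.1"},
        {"gene": "GENE_C", "q_phi_pos": "0", "phi_pos": "40.0"},
    ]
    for row in rank_genes(demo_rows, "pos"):
        print(row["gene"], row["q_phi_pos"], row["phi_pos"])
```

With a real output file, the rows would come from read_q_values(open(path)) where path points to the q_values file in the "Output" folder.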


CBaSE web tool

The CBaSE v1.1 web tool can be found here. Note that CBaSE v1.2 is only available as a standalone Python script, downloadable here.


How to cite

If you find our tool useful, please cite Weghorn & Sunyaev, Nature Genetics (2017).