Update: The pentanucleotide model option previously caused an error message; this has been fixed, along with two other rare exceptions. Note that in earlier versions, individual highly positively selected genes may in rare cases not have been included in the final output file (although they were shown in the log).
python CBaSE_v1.0.py input_filename path_aux_folder context model
Command line arguments:
(1) input_filename – File containing somatic mutation data (see below).
(2) path_aux_folder – Path to the folder containing auxiliary files (default "Input").
(3) context – Context used to compute the cancer-type-specific mutation matrix; 0=trinucleotides, 1=pentanucleotides.
(4) model – Model assumption for the distribution of expected synonymous mutation counts; one of [1,2,3,4,5,6].
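As an illustration of the argument order above, the following is a minimal sketch of a helper that checks the allowed values and assembles the command line. The function name `build_cbase_command` and the file name `mutations.txt` are hypothetical and not part of CBaSE; only the script name, the argument order, the default "Input" folder and the allowed values are taken from the description above.

```python
# Hypothetical helper (not part of CBaSE): checks the documented argument
# values and assembles the command line in the documented order.
VALID_CONTEXTS = {0, 1}             # 0 = trinucleotides, 1 = pentanucleotides
VALID_MODELS = {1, 2, 3, 4, 5, 6}   # model assumption for synonymous counts


def build_cbase_command(input_filename, context, model, aux_folder="Input"):
    """Return the argument list for one CBaSE run."""
    if context not in VALID_CONTEXTS:
        raise ValueError(f"context must be 0 or 1, got {context!r}")
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of 1-6, got {model!r}")
    return ["python", "CBaSE_v1.0.py", input_filename, aux_folder,
            str(context), str(model)]


if __name__ == "__main__":
    # "mutations.txt" is a placeholder file name.
    print(" ".join(build_cbase_command("mutations.txt", context=0, model=3)))
```

The returned list can be passed to `subprocess.run` if the script should be launched from Python rather than from a shell.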
Input file format:
| Gene symbol | Mutation effect | Mutated nucleotide | Context index |
(1) Gene symbol – corresponds to the official gene symbol as used in the UCSC knownGene track.
(2) Mutation effect – one of ["missense", "nonsense", "coding-synon", "utr-3", "utr-5"], denoting missense, nonsense (stop-gain and stop-loss), synonymous, 3'-UTR and 5'-UTR mutations, respectively.
(3) Mutated nucleotide – one of [A, C, G, T].
(4) Context index – 0-based indices of tri- or pentanucleotide contexts can be found here.
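Before running CBaSE it can help to check that each line of the input file matches the four columns described above. The sketch below is one possible pre-check, not part of CBaSE itself. It rests on two assumptions that the description does not state: that columns are tab-separated, and that context indices run over all 4^3 = 64 trinucleotides or 4^5 = 1024 pentanucleotides. The function name `validate_row` is hypothetical.

```python
# Hypothetical pre-check (not part of CBaSE) for one line of the input file.
VALID_EFFECTS = {"missense", "nonsense", "coding-synon", "utr-3", "utr-5"}
VALID_NUCLEOTIDES = {"A", "C", "G", "T"}


def validate_row(line, context=0, sep="\t"):
    """Return a list of problems found in one input line (empty if none).

    Assumption: columns are separated by `sep` (tab by default).
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 4:
        return [f"expected 4 columns, found {len(fields)}"]
    gene, effect, nucleotide, index = fields
    problems = []
    if not gene:
        problems.append("empty gene symbol")
    if effect not in VALID_EFFECTS:
        problems.append(f"unknown mutation effect {effect!r}")
    if nucleotide not in VALID_NUCLEOTIDES:
        problems.append(f"invalid nucleotide {nucleotide!r}")
    # Assumption: one 0-based index per possible context (4**3 or 4**5).
    n_contexts = 4 ** (5 if context == 1 else 3)
    if not index.isdecimal() or int(index) >= n_contexts:
        problems.append(f"context index {index!r} outside 0..{n_contexts - 1}")
    return problems


if __name__ == "__main__":
    # Made-up example row, for illustration only.
    print(validate_row("TP53\tmissense\tA\t17"))
```

The check does not verify that a context index is consistent with the mutated nucleotide, since that depends on the index table referenced above.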
Output:
(1) Fitted model distribution of per-gene expected synonymous mutation counts.
(2) Gene-specific q-values of negative and positive selection.
The CBaSE web tool can be found here.
Please cite Weghorn & Sunyaev, Nature Genetics (2017).