Software

This page describes software for tests for pooled association of rare variants with a phenotype, see Price, Kryukov et al., Am J Hum Genet (2010).

R Script

The reference implementation in R is available for download.

The R script implements VT test for pooled association of rare variants with a phenotype, as well as T1, T5, and WE (Madsen-Browning) tests, optionally weighted with PolyPhen-2 scores (see the paper for details).

For speed, the script supports three modes of execution: local on single CPU, multicore, and Sun Grid Engine cluster, see options below.

NOTE: currently, the script is configured to run the tests on a single gene.

Usage:
Rscript rareVariantTests.R -p <permutations> -n <nodes> -a <phenotypeFile> -b <snpWeightFile> -c <genotypeFile> [–multicore] [–seed <seed>]

where:

  • <permutations> is an integer number of permutations to perform
  • <nodes> is an integer number of nodes in the SunGridEngine to use (set 0 to run the script locally, without a cluster)
  • <phenotypeFile> is the name of a file with lines of format: individualID phenotypeValue
  • <snpWeightFile> is the name of a file with lines of format (weight is between 0 and 1): snpid weight
  • <genotypeFile> is the name of a file with lines of format (genotype is the number of rare alleles of the snp in the individual, typically one of {0,1,2}): individualID snpid genotype
  • –multicore is an optional flag that indicates that multiple CPUs of the machine can be used for computations (using this flag implies that the cluster is not to be used)
  • <seed> is an optional random seed value

Example 1 (run on multicore):

Rscript --slave rareVariantTests.R -p 100000 -n 0 -a AHITUV/data1.ind0 -b AHITUV/data1.csnp0 -c AHITUV/data1.cgeno0 --multicore

Example 2 (run on a single CPU):

Rscript --slave rareVariantTests.R -p 100000 -n 0 -a AHITUV/data1.ind0 -b AHITUV/data1.csnp0 -c AHITUV/data1.cgeno0

Example 3 (run on Sun Grid Engine cluster - split the permutations into 50 parts):

Rscript --slave rareVariantTests.R -p 100000 -n 50 -a AHITUV/data1.ind0 -b AHITUV/data1.csnp0 -c AHITUV/data1.cgeno0

Output: p-values for (see paper for explanation)

  • score1, score1P - test T1 and T1P
  • score2, score2P - test T5 and T5P
  • score3, score3P - test WE and WEP
  • score4, score4P - test VT and VTP

Example session

Rscript rareVariantTests.R -p 100000 -n 0 -a AHITUV/data1.ind0 -b AHITUV/data1.csnp0 -c AHITUV/data1.cgeno0 --multicore

Welcome to Rsge
    Version: 0.6.3 

  score1  score1P score2  score2P   score3  score3P   score4  score4P
1     24 16.50323     22 15.50323 619.7794 421.4031 2.800978 3.258393
running  100000  permutations on  8  cores  
counts of how often permuted data has higher value than unpermuted data
  score1 score1P score2 score2P score3 score3P score4 score4P
1   3197    1305   5391    2396   1052     460    924     188
p-values
      score1    score1P     score2    score2P     score3     score3P
1 0.03197968 0.01305987 0.05391946 0.02396976 0.01052989 0.004609954
       score4     score4P
1 0.009249908 0.001889981

Weights Conversion

A simple Perl script that converts PolyPhen-2 Bayesian probabilistic prediction scores to VT weights is also available for download. To install, unpack the archive in a separate directory and run the Perl script with PolyPhen-2 scores (pph2_prob) as input, for example:

$ tar vxzf pph2vt.tar.gz
$ grep -v ^# pph2-short.txt | cut -f8 | ./pph2vt.pl

where pph2-short.txt is a file containing prediction results in PolyPhen-2 Batch Query Web interface format.

To obtain built-in help use the following command:

$ perldoc -F pph2vt.pl
Last modified: 2011/08/30 01:00
   
 
 
Except where otherwise noted, content on this wiki is licensed under the following license:Public Domain