Introduction to qtlpvl

This package focuses on QTL mapping with multiple traits and on testing of pleiotrophy vs. close linkage when multiple traits map near each other, particularly for the of a trans-eQTL hotspot. We provide both exploratory plots and formal statistical tests aimed at dissecting trans-eQTL hotspots.

Data

To illustrate the package, we use the genotype data from the listeria data set (an F$_2$ population) provided with R/qtl along with a set of simulated phenotypes, included in qtlpvl as the data set fake.phenos. The phenotypes were simulated using two markers from chromosome 1 as QTL, with the first QTL having an additive allelic effect, and with one of the alleles at the second QTL being strictly dominant. There are 10 phenotypes. The first 5 are controlled by the first QTL, and the other 5 traits are controlled by the second QTL (and with a negative and larger effect). The 10 phenotypes were generated with these QTL effects plus independent, normally distributed residual variation. Treating these traits as gene expression measurements, we assigned genomic positions at random. The phenotype data is stored in matrix fake.phenos and their positions are stored in data frame fake.probepos.

We load the qtlpvl package, the simulated phenotype data, and the listeria data set as follows. (The R/qtl package is automatically loaded when qtlpvl is loaded.)

library(qtlpvl)

#> Loading required package: qtl
#> Loading required package: Rcpp
#> Loading required package: plyr

data(fake.phenos)
data(listeria)

We will first perform single-trait analysis. For each trait, we plot its LOD score versus its QTL position, when the LOD is bigger than a threshold (the default value is 3). To do so we first calculate the QTL genotype probabilities for listeria and then use the function plotLOD. We will only scan chromosome 1, and we set the variable chr=1 for use throughout.

listeria <- calc.genoprob(listeria, step=1)
chr <- 1
plotLOD(Y=fake.phenos, cross=listeria, chr=chr)

LOD score and QTL positions for each of the simulated traits

joint mapping: `scanone.mvn`

Having the cross object listeria and phenotype trait matrix fake.phenos loaded, we can perform joint, multi-trait QTL mapping with the function scanone.mvn. This assumes a multivariate normal model. chr is set to be “1”, which indicates the mapping is only on chromosome 1. Function scanone.mvn has two more parameters addcovar and intcovar that can be used to control for additive and interactive covariates. intcovar will also be used as additive covariates during mapping, and there is no need to manually add them into addcovar.

The result of joint mapping is a data frame with class scanone (that is, in the same form as returned by the R/qtl function scanone). We can then use summary and plot to look at the results.

out <- scanone.mvn(cross=listeria, Y=fake.phenos, chr=chr)
summary(out)

#>          chr pos  lod
#> c1.loc78   1  78 63.5

plot(out)

Joint mapping result under multivariate normal model for all the simulated triats

With this single-QTL analysis, the QTL is estimated to be at 78.0 cM, with LOD score 63.5. This is closer to the second QTL, as in our simulation, the second QTL had a larger effect than the first QTL.

Statistical Testing

test of 1 vs 2 QTL: `testpleio.1vs2`

We now turn to the question of pleiotropy vs. close linkage. Specifically, we test the hypotheses:

$H_0$: There is only one QTL affecting all the phenotypes;
$H_1$: There are two QTL, with each phenotype affected by one or the other QTL.

We again assume that the residual variation in the traits follows a multivariate normal distribution. We perform joint, multivariate QTL mapping to locate the single QTL, under the null hypothesis that there is one pleiotropic QTL affecting all the traits. We then run single-trait analysis on each trait and find the trait-specific QTL, sort the traits by their estimated QTL position and search for the best separation of the traits into two groups, where the first group (contains the first several traits) is controlled by the right QTL and the second group (contains the rest traits) is controlled by the left QTL. The LOD score for this two-QTL model is then subtracted by the LOD score for the single-QTL model, to give the final test statistic, $\text{LOD}_{2v1}$.

To get the null distribution of the statistics, we have two methods:

parametric bootstrap: generate data from the parameters estimated under the null hypothesis
stratified permutation: randomly permute the genotype data versus phenotype data within each QTL genotype group,

We repeat the entire procedure on data from either method and save the test statistics. The P-value is calculated from this empirical distribution.

The function testpleio.1vs2 is used to do this test. Input parameters:

cross: An R/qtl cross object.
Y: matrix of multiple traits, with samples in the row, traits in the column.
chr: Character string referring to chromosome of interest by name.
region.l, region.r: left and right bounds for the interval of interest.
int.method: method to calculate the interval of interest if region.l and region.r is not specified (either bayes or 1.5lod)
search.method: method for searching the two-QTL model (either fast or complete; the default is fast).
RandomStart: When search.method is fast, indicates whether to use a random starting point for the search over two-QTL models; the default is TRUE.
RandomCut: Indicates whether to use random cutting or not when there are traits mapped to the same location. The default is FALSE (traits mapped to the same location will be bound together).
simu.method: method for determining the null distribution (parametric or permutation; the default is parametric).
n.simu: number of simulations for p-value.
tol: Tolerance value for the qr decomposition in fitting the linear models with lm.
addcovar, intcovar: Optional additive and interactive covariates to include when mapping.

This function will return a list with class testpleio.1vs2. We can use summary and plot to study the results. The result of summary includes the estimated QTL position and LOD score for the single-QTL and two-QTL models, as well as the test statistic $\text{LOD}_{2v1}$ and its P-value.

As a quick illustration, we will perform just 10 bootstrap simulation replicates.

obj <- testpleio.1vs2(cross=listeria, Y=fake.phenos, chr=chr, n.simu=10,
                      region.l=60, region.r=90,
                      search.method="complete")
summary(obj)

#> Single QTL model: lod 63.46, pos 78.00 cM
#> Two QTLs model: lod 81.66, pos 70.11 cM, 81.40 cM
#>   Traits influenced by the left QTL: 1 2 3 4 5
#>   Traits influenced by the right QTL: 6 7 8 9 10
#> Difference of LOD score: 18.20
#> P-value is: 0 (from 10 simulations)

plot(obj)

LOD profile plot for result of test1vs2 on simulated traits

The above figure shows the joint mapping result in black and the profile LOD curves for each of the two QTL under $H_1$:

red line: $L_1(\lambda_1) = \max_{\lambda_2} LOD2(\lambda_1, \lambda_2)$
blue line: $L_2(\lambda_2) = \max_{\lambda_1} LOD2(\lambda_1, \lambda_2)$

Triangles indicate the estimated positions of the QTL and solid points indicate QTL positions and LODs from single trait mapping.

We use plottrace to see how $\text{LOD}_{2v1}$ changes when we move the cut-point of the left versus right group.

plottrace(obj)

$plot of LOD$_{2v1}$ by cut-point$

There are 7 possible QTL positions from the above plot with the results of single-trait mapping. Thus we have 6 possible ways of grouping these positions into the left versus right group. The 6 dots indicate the possible cut-points and the corresponding value for $\text{LOD}_{2v1}$.Thus when the first 5 traits and the last 5 traits are grouped separately, the $\text{LOD}_{2v1}$ is the biggest, it is our best two QTL model. This figure can be used as a diagnostic, to see how our inferred two-QTL model compares to other possible models.

test of 1 vs p QTL: `testpleio.1vsp`

We could also test the hypotheses:

$H_0$: There is only one QTL affecting all the phenotypes;
$H_1$: There are p QTL, each affecting one of the phenotypes (i.e., each phenotype is allowed to have its own QTL).

This function has fewer parameters than testpleio.1vs2. The arguments are cross, Y, chr, addcovar, intcovar and n.simu, and their usage is the same as before.

This function will return a list with class testpleio.1vsp. We can again use summary and plot to look at the results. The summary includes the estimated QTL positions and LOD score for the single-QTL and p-QTL models, as well as test statistics $\text{LOD}_{pv1}$ and its P-value.

obj2 <- testpleio.1vsp(cross=listeria, Y=fake.phenos, chr=chr, n.simu=10)
summary(obj2)

#> Single QTL model: lod 63.46, pos 78.00 cM
#> Multiple QTLs model: lod 82.28
#>          Triat      POS       LOD
#> phenos1      1 70.11204 10.862842
#> phenos2      2 70.11204 10.174272
#> phenos3      3 69.00000  9.895711
#> phenos4      4 70.11204 14.309338
#> phenos5      5 68.00000  9.652426
#> phenos6      6 80.62324 18.919347
#> phenos7      7 79.00000 17.460231
#> phenos8      8 81.39623 17.944594
#> phenos9      9 82.00000 15.928201
#> phenos10    10 81.39623 16.367111
#> 
#> Difference of LOD score: 18.82
#> P-value is: 0 (from 10 simulations)

plot(obj2)

plot for result of testpleio.1vsp, showing joing mapping LOD curve and max LOD score for each trait seperately

Exploratry plots

plot of genetic pattern: `plotGenetpattern`

The function plotGenetpattern takes two kinds of input parameters:

a phenotype matrix Y and a genotype vector genotype, the later being genotypes at a common QTL.
a phenotype matrix Y and a cross object cross and chromosome number chr. This is used when there is uncertainty in the QTL locations for the traits. In this case, each trait is mapped separately and the genotypes at corresponding estimated QTL position are used.

We'll use the labels B and R for the alleles in the cross. (These correspond to the strains in the main application that motivated this work). The additive QTL effect is defined as $a=(\mu_{RR}-\mu_{BB})/2$. The dominance effect is $d=\mu_{BR}-(\mu_{BB}+\mu_{RR})/2$. When plotted against each other, traits with pure additive effects are near the x-axis ($d=0$) and traits with dominant effects are along the diagonals, $d=a$ (that is, $\mu_{BR}=\mu_{RR}$) and $d=-a$ (that is, $\mu_{BR}=\mu_{BB}$).

Here, we'll make plots by each method: with a common QTL and with separate QTL for each trait.

par(mfrow=c(1,2))
qtlpos <- max(out)$pos
m <- find.pseudomarker(listeria, chr, qtlpos, "prob", addchr=FALSE)
qtlgeno <- apply(listeria$geno[[chr]]$prob[,m,], 1, which.max)

plotGenetpattern(Y=fake.phenos, genotype=qtlgeno, main="by common QTL genotype")
plotGenetpattern(Y=fake.phenos, cross=listeria, chr=chr,
                 main="by individual QTL genotype")

plot of inheritance pattern for all the simulated traits

In both plots, we see one set of traits for which the QTL is dominant, with the B allele associated with larger average trait value ($d \approx -a$ and $a < 0$), and another set of traits for which the QTL alleles are additive ($d \approx 0$), with the R allele associated with larger average trait value ($a > 0$). This is evidence for two QTL in the region, with different inheritance patterns.

plot signed LOD score: `plotLODsign`

The function plotLODsign gives a second exploratory plot to display the direction of the QTL effects. The input arguments are cross, Y, and chr, as well as addcovar and intcovar, all as above.

For each trait, the estimated additive QTL effect is used as the sign of the LOD score. We first run single-trait mapping to obtain LOD scores and estimated QTL positions. We use the R/qtl function argmax.geno to obtain imputed QTL genotypes, and estimated the additive QTL effect as $\mu_{RR} - \mu_{BB}$. Only traits with LOD score bigger than a threshold will be displayed, the default value of the threshold (LOD.threshold) is 3.

plotLODsign(Y=fake.phenos, cross=listeria, chr=chr)

plot of signed LOD score for all the simulated triats

We see that the traits mapping to the left side of the region all show positive QTL effects, with the R allele associated with larger average trait, while the traits mapping to the right side all show negative QTL effects, with the B allele associated with larger average trait. This is again evidence for two QTL in the region.

Tick marks at the bottom of the plot indicate the positions of genotyped markers. We can pass these values to parameter map, but the default is pull this information from the input cross object on the given chromosome, chr.

If we had already performed QTL analysis on the traits, we can pass the signed LOD scores and their mapped positions to LODsign and maxPOS, respectively. By skipping the QTL analysis of the traits, this can speed up the procedure. The use of this feature will be shown at the last section.

Linear discriminant plot: `plottrans.LDA`

In our final exploratory plot, we identify individuals with no recombination event in the region of the QTL. For these individuals, we will know their QTL genotype. We apply linear discriminant analysis to the traits with the QTL genotype as the label and plot the first two linear discriminants. If the QTL effect is strong, this should show three distinct clusters. We then calculate the linear discriminants for the individuals that show a recombinantion event in the region, using the same coffecients, and add them as points in the plot. If the recombinant individuals fall within the clusters defined by the non-recombinants, this is consistent with there being a single QTL. If, on the other hand, the recombinants look distinctly different from the non-recombinants, this suggests more than one QTL.

In the following code, we first find which individuals have no recombinant event, and then call the function plottrans.LDA with phenotype matrix Y, QTL genotype qtlgeno and nonrecombinant IDs nonrecomb.

out <- out[out$chr==chr, ]
m <- which(out$pos >= qtlpos-5 & out$pos <= qtlpos+5)
g <- apply(listeria$geno[[chr]]$prob[,m,], 1:2, which.max)
nonrecomb <- which(sapply(apply(g, 1, unique), length) == 1)
names(nonrecomb) <- rownames(fake.phenos)[nonrecomb]
plottrans.LDA(Y=fake.phenos, qtlgeno, nonrecomb)

LDA plot for the simulated triats, non-recombinant mice are colored as yellow

In the above plot, each point is a individual colored by its QTL genotype if it is non-recombinant, or yellow if it was recombinant. This particular example is not terribly informative about whether there is one or two QTL.

Analysis of a trans-eQTL hotspot with gene positions

Lastly, we will demonstrate the entire procedure, as recommended for the analysis of expression data. We start by mapping each phenotype individually, then count the number of eQTL in windows of 10~cM. When there seems to be a trans-eQTL hotspot, we can build an object transband with the function make.transbands. This object contains basic information on the transband: chromosome and position of the transband, number of eQTL in the transband, and an estimate of the QTL position and LOD score under the multivariate normal assumption. It also contains all of the information needed for the exploratory plots: LOD scores and estimated QTL positions, estimated additive and dominance effects, QTL genotypes, and the IDs for the non-recombinant individuals. By extracting such information from this object, we can run all of the statistical tests and create the exploratory plots as a batch.

data(fake.probepos)
phenoname <- colnames(fake.phenos)
listeria$pheno <- data.frame(listeria$pheno, fake.phenos)
out <- scanone(listeria, pheno.col=phenoname, chr=1:19)
out1 <- convert_scan1(out, phenoname, chr=1:19)

marker.info <- get.marker.info(listeria, chr)
out.count <- count.trans(out1, fake.probepos, chr, marker.info)

par(mfrow=c(2,3))
plot(out.count, main="Count of trans-eQTL")

trans <- make.transbands(out1, fake.probepos, cross=listeria, chr=1:19,
                         mlratio = fake.phenos, lod.thr = 5,
                         trans.cM = 5, kernal.width = 1,
                         window.cM = 10, trans.count.thr = 0,
                         regn.cM = 5)
transband <- trans[[1]]

geno <- attr(transband, "geno")
nonrecomb <- attr(transband, "nonrecomb")
out <- attr(transband, "out")
map <- pull.map(listeria)

plot(obj, main="LOD profile")
plottrace(obj, main="Trace")
plotLODsign(maxPOS=out$pos, LODsign=sign(out$eff.a)*out$lod1, map=map[[chr]],
            main="Signed LOD")
plotGenetpattern(a=out$eff.a, d=out$eff.d, main="Inheritance Pattern")
plottrans.LDA(Y=fake.phenos, geno, nonrecomb, main="LDA plot")

Summary plot for a trans-eQTL hotspot using simulated data

The results are as before, with one added plot in the top left: a count of traits with a trans-eQTL in a sliding 10 cM window.

Session Info

The following indicates the set of R packages that were used, as well as the versions of R and the packages that we used.

sessionInfo()

#> R version 3.2.0 (2015-04-16)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: OS X 10.10.2 (Yosemite)
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] qtlpvl_0.1-2 plyr_1.8.3   Rcpp_0.12.0  qtl_1.36-6  
#> 
#> loaded via a namespace (and not attached):
#>  [1] XML_3.98-1.1      assertthat_0.1    digest_0.6.8     
#>  [4] MASS_7.3-40       bitops_1.0-6      jsonlite_0.9.16  
#>  [7] git2r_0.10.1      formatR_1.2       magrittr_1.5     
#> [10] evaluate_0.7      broman_0.55-3     stringi_0.5-5    
#> [13] devtools_1.8.0    RJSONIO_1.3-0     tools_3.2.0      
#> [16] RPushbullet_0.2.0 stringr_1.0.0     RCurl_1.95-4.6   
#> [19] parallel_3.2.0    rversions_1.0.0   memoise_0.2.1    
#> [22] knitr_1.10.5

Introduction to qtlpvl

Data

joint mapping: scanone.mvn

Statistical Testing

test of 1 vs 2 QTL: testpleio.1vs2

test of 1 vs p QTL: testpleio.1vsp