This package focuses on QTL mapping with multiple traits and on testing of pleiotrophy vs. close linkage when multiple traits map near each other, particularly for the of a trans-eQTL hotspot. We provide both exploratory plots and formal statistical tests aimed at dissecting trans-eQTL hotspots.
To illustrate the package, we use the genotype data from the
listeria
data set (an F\(_2\) population) provided with R/qtl
along with a set of simulated phenotypes, included in qtlpvl as the
data set fake.phenos
. The phenotypes were simulated using
two markers from chromosome 1 as QTL,
with the first QTL having an additive allelic effect, and with one of
the alleles at the second QTL being strictly dominant. There are 10
phenotypes. The first 5
are controlled by the first QTL, and the other 5 traits are controlled
by the second QTL (and with a negative and larger effect).
The 10 phenotypes were generated with these QTL effects plus
independent, normally distributed residual variation.
Treating these traits as gene expression measurements, we
assigned genomic positions at random. The phenotype data is stored
in matrix fake.phenos
and their positions are stored in data frame fake.probepos
.
We load the qtlpvl package, the simulated phenotype data, and the
listeria
data set as follows. (The R/qtl package is automatically
loaded when qtlpvl is loaded.)
library(qtlpvl)
#> Loading required package: qtl
#> Loading required package: Rcpp
#> Loading required package: plyr
data(fake.phenos)
data(listeria)
We will first perform single-trait analysis. For each
trait, we plot its LOD score versus its QTL position, when the LOD is
bigger than a threshold (the default value is 3). To do so we first
calculate the QTL genotype probabilities for listeria
and then use
the function plotLOD
. We will only scan chromosome 1, and we set the
variable chr=1
for use throughout.
listeria <- calc.genoprob(listeria, step=1)
chr <- 1
plotLOD(Y=fake.phenos, cross=listeria, chr=chr)
scanone.mvn
Having the cross object listeria
and phenotype trait matrix fake.phenos
loaded, we can perform joint, multi-trait QTL mapping with the
function scanone.mvn
. This assumes
a multivariate normal model. chr
is set to be “1”, which
indicates the mapping is only on chromosome 1. Function scanone.mvn
has two more parameters addcovar
and intcovar
that can be used
to control for additive and interactive covariates.
intcovar
will also be used as additive covariates during
mapping, and there is no need to manually add them into addcovar
.
The result of joint mapping is a data frame with class scanone
(that
is, in the same form as returned by the R/qtl function scanone
).
We can then use summary
and plot
to look at the results.
out <- scanone.mvn(cross=listeria, Y=fake.phenos, chr=chr)
summary(out)
#> chr pos lod
#> c1.loc78 1 78 63.5
plot(out)
With this single-QTL analysis, the QTL is estimated to be at 78.0 cM, with LOD score 63.5. This is closer to the second QTL, as in our simulation, the second QTL had a larger effect than the first QTL.
testpleio.1vs2
We now turn to the question of pleiotropy vs. close linkage. Specifically, we test the hypotheses:
We again assume that the residual variation in the traits follows a multivariate normal distribution. We perform joint, multivariate QTL mapping to locate the single QTL, under the null hypothesis that there is one pleiotropic QTL affecting all the traits. We then run single-trait analysis on each trait and find the trait-specific QTL, sort the traits by their estimated QTL position and search for the best separation of the traits into two groups, where the first group (contains the first several traits) is controlled by the right QTL and the second group (contains the rest traits) is controlled by the left QTL. The LOD score for this two-QTL model is then subtracted by the LOD score for the single-QTL model, to give the final test statistic, \(\text{LOD}_{2v1}\).
To get the null distribution of the statistics, we have two methods:
We repeat the entire procedure on data from either method and save the test statistics. The P-value is calculated from this empirical distribution.
The function testpleio.1vs2
is used to do this test. Input parameters:
cross
: An R/qtl cross
object.Y
: matrix of multiple traits, with samples in the row, traits in the
column.chr
: Character string referring to chromosome of interest by
name.region.l
, region.r
: left and right bounds for the interval of
interest.int.method
: method to
calculate the interval of interest if region.l
and region.r
is
not specified (either bayes
or 1.5lod
)search.method
: method for searching the two-QTL model (either
fast
or complete
; the default is fast
).RandomStart
: When search.method
is fast
, indicates whether to use a random starting point
for the search over two-QTL models; the default is TRUE
.RandomCut
: Indicates whether to use random cutting or not when there are traits mapped to
the same location. The default is FALSE
(traits mapped to the same
location will be bound together).simu.method
: method for determining the null distribution
(parametric
or permutation
; the default is parametric
).n.simu
: number of simulations for p-value.tol
: Tolerance value for the qr
decomposition in fitting the
linear models with lm
.addcovar
, intcovar
: Optional additive and interactive covariates to include
when mapping.This function will return a list with class testpleio.1vs2
. We can
use summary
and plot
to study the results.
The result of summary
includes the estimated QTL position and LOD score
for the single-QTL and two-QTL models, as well as the test statistic
\(\text{LOD}_{2v1}\) and its P-value.
As a quick illustration, we will perform just 10 bootstrap simulation replicates.
obj <- testpleio.1vs2(cross=listeria, Y=fake.phenos, chr=chr, n.simu=10,
region.l=60, region.r=90,
search.method="complete")
summary(obj)
#> Single QTL model: lod 63.46, pos 78.00 cM
#> Two QTLs model: lod 81.66, pos 70.11 cM, 81.40 cM
#> Traits influenced by the left QTL: 1 2 3 4 5
#> Traits influenced by the right QTL: 6 7 8 9 10
#> Difference of LOD score: 18.20
#> P-value is: 0 (from 10 simulations)
plot(obj)
The above figure shows the joint mapping result in black and the profile LOD curves for each of the two QTL under \(H_1\):
Triangles indicate the estimated positions of the QTL and solid points indicate QTL positions and LODs from single trait mapping.
We use plottrace
to see how \(\text{LOD}_{2v1}\) changes
when we move the cut-point of the left versus right group.
plottrace(obj)
There are 7 possible QTL positions from the above plot with the results of single-trait mapping. Thus we have 6 possible ways of grouping these positions into the left versus right group. The 6 dots indicate the possible cut-points and the corresponding value for \(\text{LOD}_{2v1}\).Thus when the first 5 traits and the last 5 traits are grouped separately, the \(\text{LOD}_{2v1}\) is the biggest, it is our best two QTL model. This figure can be used as a diagnostic, to see how our inferred two-QTL model compares to other possible models.
testpleio.1vsp
We could also test the hypotheses:
This function has fewer parameters than testpleio.1vs2
. The
arguments are cross
, Y
, chr
, addcovar
, intcovar
and
n.simu
, and their usage is
the same as before.
This function will return a list with class
testpleio.1vsp
. We can again use summary
and plot
to look at the results. The
summary includes the estimated QTL positions and LOD score for the single-QTL
and p-QTL models, as well as test statistics \(\text{LOD}_{pv1}\) and its
P-value.
obj2 <- testpleio.1vsp(cross=listeria, Y=fake.phenos, chr=chr, n.simu=10)
summary(obj2)
#> Single QTL model: lod 63.46, pos 78.00 cM
#> Multiple QTLs model: lod 82.28
#> Triat POS LOD
#> phenos1 1 70.11204 10.862842
#> phenos2 2 70.11204 10.174272
#> phenos3 3 69.00000 9.895711
#> phenos4 4 70.11204 14.309338
#> phenos5 5 68.00000 9.652426
#> phenos6 6 80.62324 18.919347
#> phenos7 7 79.00000 17.460231
#> phenos8 8 81.39623 17.944594
#> phenos9 9 82.00000 15.928201
#> phenos10 10 81.39623 16.367111
#>
#> Difference of LOD score: 18.82
#> P-value is: 0 (from 10 simulations)
plot(obj2)
plotGenetpattern
The function plotGenetpattern
takes two kinds of input parameters:
Y
and a genotype vector genotype
, the later
being genotypes at a common QTL.Y
and a cross object cross
and chromosome
number chr
. This is used when there is uncertainty in the QTL
locations for the traits. In this case, each trait is mapped
separately and the genotypes at corresponding estimated QTL
position are used.We'll use the labels B and R for the alleles in the cross. (These correspond to the strains in the main application that motivated this work). The additive QTL effect is defined as \(a=(\mu_{RR}-\mu_{BB})/2\). The dominance effect is \(d=\mu_{BR}-(\mu_{BB}+\mu_{RR})/2\). When plotted against each other, traits with pure additive effects are near the x-axis (\(d=0\)) and traits with dominant effects are along the diagonals, \(d=a\) (that is, \(\mu_{BR}=\mu_{RR}\)) and \(d=-a\) (that is, \(\mu_{BR}=\mu_{BB}\)).
Here, we'll make plots by each method: with a common QTL and with separate QTL for each trait.
par(mfrow=c(1,2))
qtlpos <- max(out)$pos
m <- find.pseudomarker(listeria, chr, qtlpos, "prob", addchr=FALSE)
qtlgeno <- apply(listeria$geno[[chr]]$prob[,m,], 1, which.max)
plotGenetpattern(Y=fake.phenos, genotype=qtlgeno, main="by common QTL genotype")
plotGenetpattern(Y=fake.phenos, cross=listeria, chr=chr,
main="by individual QTL genotype")
In both plots, we see one set of traits for which the QTL is dominant, with the B allele associated with larger average trait value (\(d \approx -a\) and \(a < 0\)), and another set of traits for which the QTL alleles are additive (\(d \approx 0\)), with the R allele associated with larger average trait value (\(a > 0\)). This is evidence for two QTL in the region, with different inheritance patterns.
plotLODsign
The function plotLODsign
gives a second exploratory plot to
display the direction of the QTL
effects. The input arguments are
cross
, Y
, and chr
, as well as addcovar
and intcovar
, all as above.
For each trait, the estimated additive QTL effect is used as the sign of the
LOD score. We first run single-trait mapping to obtain LOD scores and
estimated QTL
positions. We use the R/qtl function argmax.geno
to obtain imputed
QTL genotypes, and estimated the additive QTL effect as \(\mu_{RR} - \mu_{BB}\).
Only traits with LOD
score bigger than a threshold will be displayed, the default value of
the threshold (LOD.threshold
) is 3.
plotLODsign(Y=fake.phenos, cross=listeria, chr=chr)
We see that the traits mapping to the left side of the region all show positive QTL effects, with the R allele associated with larger average trait, while the traits mapping to the right side all show negative QTL effects, with the B allele associated with larger average trait. This is again evidence for two QTL in the region.
Tick marks at the bottom of the plot indicate the positions of genotyped markers.
We can pass these values to parameter map
, but the default is pull this information from the
input cross
object on the given chromosome, chr
.
If we had already performed QTL analysis on the traits,
we can pass the signed LOD scores and their
mapped positions to LODsign
and maxPOS
, respectively.
By skipping the QTL analysis of the traits, this can speed up the
procedure. The use of
this feature will be shown at the last section.
plottrans.LDA
In our final exploratory plot, we identify individuals with no recombination event in the region of the QTL. For these individuals, we will know their QTL genotype. We apply linear discriminant analysis to the traits with the QTL genotype as the label and plot the first two linear discriminants. If the QTL effect is strong, this should show three distinct clusters. We then calculate the linear discriminants for the individuals that show a recombinantion event in the region, using the same coffecients, and add them as points in the plot. If the recombinant individuals fall within the clusters defined by the non-recombinants, this is consistent with there being a single QTL. If, on the other hand, the recombinants look distinctly different from the non-recombinants, this suggests more than one QTL.
In the following code, we first find which individuals have no
recombinant event, and then call the function plottrans.LDA
with
phenotype matrix Y
, QTL genotype qtlgeno
and nonrecombinant IDs
nonrecomb
.
out <- out[out$chr==chr, ]
m <- which(out$pos >= qtlpos-5 & out$pos <= qtlpos+5)
g <- apply(listeria$geno[[chr]]$prob[,m,], 1:2, which.max)
nonrecomb <- which(sapply(apply(g, 1, unique), length) == 1)
names(nonrecomb) <- rownames(fake.phenos)[nonrecomb]
plottrans.LDA(Y=fake.phenos, qtlgeno, nonrecomb)
In the above plot, each point is a individual colored by its QTL genotype if it is non-recombinant, or yellow if it was recombinant. This particular example is not terribly informative about whether there is one or two QTL.
Lastly, we will demonstrate the entire procedure, as recommended for
the analysis of
expression data. We start by mapping each phenotype individually,
then count the number of eQTL in windows of 10~cM. When there seems
to be a trans-eQTL hotspot, we can build an object transband
with
the function
make.transbands
. This object contains basic information on the
transband: chromosome and position of the transband, number of eQTL in
the transband, and an estimate of the QTL position and LOD score under
the multivariate normal assumption. It also contains all of the information needed
for the exploratory plots: LOD scores and estimated QTL positions,
estimated additive and
dominance effects, QTL genotypes, and the IDs for the non-recombinant individuals.
By extracting such information from this object, we can run all of the
statistical tests and create the exploratory plots as a batch.
data(fake.probepos)
phenoname <- colnames(fake.phenos)
listeria$pheno <- data.frame(listeria$pheno, fake.phenos)
out <- scanone(listeria, pheno.col=phenoname, chr=1:19)
out1 <- convert_scan1(out, phenoname, chr=1:19)
marker.info <- get.marker.info(listeria, chr)
out.count <- count.trans(out1, fake.probepos, chr, marker.info)
par(mfrow=c(2,3))
plot(out.count, main="Count of trans-eQTL")
trans <- make.transbands(out1, fake.probepos, cross=listeria, chr=1:19,
mlratio = fake.phenos, lod.thr = 5,
trans.cM = 5, kernal.width = 1,
window.cM = 10, trans.count.thr = 0,
regn.cM = 5)
transband <- trans[[1]]
geno <- attr(transband, "geno")
nonrecomb <- attr(transband, "nonrecomb")
out <- attr(transband, "out")
map <- pull.map(listeria)
plot(obj, main="LOD profile")
plottrace(obj, main="Trace")
plotLODsign(maxPOS=out$pos, LODsign=sign(out$eff.a)*out$lod1, map=map[[chr]],
main="Signed LOD")
plotGenetpattern(a=out$eff.a, d=out$eff.d, main="Inheritance Pattern")
plottrans.LDA(Y=fake.phenos, geno, nonrecomb, main="LDA plot")
The results are as before, with one added plot in the top left: a count of traits with a trans-eQTL in a sliding 10 cM window.
The following indicates the set of R packages that were used, as well as the versions of R and the packages that we used.
sessionInfo()
#> R version 3.2.0 (2015-04-16)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: OS X 10.10.2 (Yosemite)
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] qtlpvl_0.1-2 plyr_1.8.3 Rcpp_0.12.0 qtl_1.36-6
#>
#> loaded via a namespace (and not attached):
#> [1] XML_3.98-1.1 assertthat_0.1 digest_0.6.8
#> [4] MASS_7.3-40 bitops_1.0-6 jsonlite_0.9.16
#> [7] git2r_0.10.1 formatR_1.2 magrittr_1.5
#> [10] evaluate_0.7 broman_0.55-3 stringi_0.5-5
#> [13] devtools_1.8.0 RJSONIO_1.3-0 tools_3.2.0
#> [16] RPushbullet_0.2.0 stringr_1.0.0 RCurl_1.95-4.6
#> [19] parallel_3.2.0 rversions_1.0.0 memoise_0.2.1
#> [22] knitr_1.10.5