Twin analysis

\( \newcommand{\cov}{\mathbb{C}\text{ov}} \newcommand{\cor}{\mathbb{C}\text{or}} \newcommand{\var}{\mathbb{V}\text{ar}} \newcommand{\E}{\mathbb{E}} \newcommand{\unitfrac}[2]{#1/#2} \newcommand{\n}{} \)

Mets package

This document provides a brief tutorial to analyzing twin data using the mets package:

library("mets")
options(warn=-1)

The development version may be installed from github, i.e., with the devtools package:

devtools::install_github("kkholst/lava")
devtools::install_github("kkholst/mets")

Twin analysis, continuous traits

In the following we examine the heritability of Body Mass Index\n{}korkeila_bmi_1991 hjelmborg_bmi_2008, based on data on self-reported BMI-values from a random sample of 11,411 same-sex twins. First, we will load data

data("twinbmi")
head(twinbmi)
         tvparnr      bmi      age gender zyg
100001.1  100001 26.33289 57.57974   male  DZ
100002.1  100002 28.65014 57.04860   male  MZ
100003.1  100003 28.40909 57.67830   male  DZ
100004.1  100004 27.25089 53.51677   male  DZ
100005.1  100005 27.77778 52.57495   male  DZ
100006.1  100006 28.04282 52.57221   male  DZ

The data is on long format with one subject per row.

tvparnr
twin id
bmi
Body Mass Index (\(\unitfrac{kg}{m^2}\))
age
Age (years)
gender
Gender factor (male,female)
zyg
zygosity (MZ,DZ)

we transpose the data allowing us to do pairwise analyses

twinwide <- fast.reshape(twinbmi, id="tvparnr",varying=c("bmi"))
head(twinwide)
         tvparnr     bmi1      age gender zyg     bmi2
100001.1  100001 26.33289 57.57974   male  DZ 25.46939
100002.1  100002 28.65014 57.04860   male  MZ       NA
100003.1  100003 28.40909 57.67830   male  DZ       NA
100004.1  100004 27.25089 53.51677   male  DZ 28.07504
100005.1  100005 27.77778 52.57495   male  DZ       NA
100006.1  100006 28.04282 52.57221   male  DZ 22.30936

Next we plot the association within each zygosity group

library("cowplot")

scatterdens <- function(x) {    
    sp <- ggplot(x,  
                aes_string(colnames(x)[1], colnames(x)[2])) +
        theme_minimal() +
        geom_point(alpha=0.3) + geom_density_2d()
    xdens <- ggplot(x, aes_string(colnames(x)[1],fill=1)) + 
        theme_minimal() +
        geom_density(alpha=.5)+ 
        theme(axis.text.x = element_blank(),
              legend.position = "none") + labs(x=NULL)
    ydens <- ggplot(x, aes_string(colnames(x)[2],fill=1)) +
        theme_minimal() +
        geom_density(alpha=.5) +
        theme(axis.text.y = element_blank(),
              axis.text.x = element_text(angle=90, vjust=0),
              legend.position = "none") +
        labs(x=NULL) +
        coord_flip()
    g <- plot_grid(xdens,NULL,sp,ydens,
                  ncol=2,nrow=2,
                  rel_widths=c(4,1.4),rel_heights=c(1.4,4))    
    return(g)
}

We here show the log-transformed data which is slightly more symmetric and more appropiate for the twin analysis (see Figure fig:scatter1 and fig:scatter2)

mz <- log(subset(twinwide, zyg=="MZ")[,c("bmi1","bmi2")])
scatterdens(mz)

scatter1.png

Figure 1: Scatter plot of logarithmic BMI measurements in MZ twins

dz <- log(subset(twinwide, zyg=="DZ")[,c("bmi1","bmi2")])
scatterdens(dz)

scatter2.png

Figure 2: Scatter plot of logarithmic BMI measurements in DZ twins.

The plots and raw association measures shows considerable stronger dependence in the MZ twins, thus indicating genetic influence of the trait

cor.test(mz[,1],mz[,2], method="spearman")
	Spearman's rank correlation rho

data:  mz[, 1] and mz[, 2]
S = 165460000, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6956209
cor.test(dz[,1],dz[,2], method="spearman")
	Spearman's rank correlation rho

data:  dz[, 1] and dz[, 2]
S = 2162500000, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4012686

Ńext we examine the marginal distribution (GEE model with working independence)

l0 <- lm(bmi ~ gender + I(age-40), data=twinbmi)
estimate(l0, id=twinbmi$tvparnr)
            Estimate Std.Err   2.5%  97.5%   P-value
(Intercept)   23.369 0.05453 23.262 23.476  0.00e+00
gendermale     1.407 0.07322  1.264  1.551  2.35e-82
I(age - 40)    0.118 0.00479  0.108  0.127 2.00e-133
library("splines")
l1 <- lm(bmi ~ gender*ns(age,3), data=twinbmi)
marg1 <- estimate(l1, id=twinbmi$tvparnr)
dm <- Expand(twinbmi,
            bmi=0,
            gender=c("male"),
            age=seq(33,61,length.out=50))
df <- Expand(twinbmi,
            bmi=0,
            gender=c("female"),
            age=seq(33,61,length.out=50))

plot(marg1, function(p) model.matrix(l1,data=dm)%*%p, 
     data=dm["age"], ylab="BMI", xlab="Age",
     ylim=c(22,26.5))
plot(marg1, function(p) model.matrix(l1,data=df)%*%p,  
     data=df["age"], col="red", add=TRUE)
legend("bottomright", c("Male","Female"), 
       col=c("black","red"), lty=1, bty="n")

marg1.png

Figure 3:

Polygenic model

Decompose outcome into

\begin{align*} Y_i = A_i + D_i + C + E_i, \quad i=1,2 \end{align*}
\(A\)
Additive genetic effects of alleles
\(D\)
Dominante genetic effects of alleles
\(C\)
Shared environmental effects
\(E\)
Unique environmental genetic effects

Dissimilarity of MZ twins arises from unshared environmental effects only! \(\cor(E_1,E_2)=0\) and

\begin{align*} \cor(A_1^{MZ},A_2^{MZ}) = 1, \quad \cor(D_1^{MZ},D_2^{MZ}) = 1, \end{align*} \begin{align*} \cor(A_1^{DZ},A_2^{DZ}) = 0.5, \quad \cor(D_1^{DZ},D_2^{DZ}) = 0.25, \end{align*} \begin{align*} Y_i = A_i + C_i + D_i + E_i \end{align*} \begin{align*} A_i \sim\mathcal{N}(0,\sigma_A^2), C_i \sim\mathcal{N}(0,\sigma_C^2), D_i \sim\mathcal{N}(0,\sigma_D^2), E_i \sim\mathcal{N}(0,\sigma_E^2) \end{align*} \begin{gather*} \cov(Y_{1},Y_{2}) = \\ \begin{pmatrix} \sigma_A^2 & 2\Phi\sigma_A^2 \\ 2\Phi\sigma_A^2 & \sigma_A^2 \end{pmatrix} + \begin{pmatrix} \sigma_C^2 & \sigma_C^2 \\ \sigma_C^2 & \sigma_C^2 \end{pmatrix} + \begin{pmatrix} \sigma_D^2 & \Delta_{7}\sigma_D^2 \\ \Delta_{7}\sigma_D^2 & \sigma_D^2 \end{pmatrix} + \begin{pmatrix} \sigma_E^2 & 0 \\ 0 & \sigma_E^2 \end{pmatrix} \end{gather*}
dd <- na.omit(twinbmi)
l0 <- twinlm(bmi ~ age+gender, data=dd, 
            DZ="DZ", zyg="zyg", id="tvparnr", type="sat")
l <- twinlm(bmi ~ ns(age,1)+gender, data=twinbmi, 
           DZ="DZ", zyg="zyg", id="tvparnr", type="cor", missing=TRUE)
summary(l)
____________________________________________________
Group 1
                        Estimate Std. Error   Z value Pr(>|z|)
Regressions:                                                  
   bmi.1~ns(age, 1).1    4.08914    0.16354  25.00328   <1e-12
   bmi.1~gendermale.1    1.41143    0.07285  19.37536   <1e-12
Intercepts:                                                   
   bmi.1                22.57414    0.07187 314.08431   <1e-12
Additional Parameters:                                        
   log(var)              2.44584    0.01425 171.68385   <1e-12
   atanh(rhoMZ)          0.78216    0.02290  34.15832   <1e-12
____________________________________________________
Group 2
                        Estimate Std. Error   Z value Pr(>|z|)
Regressions:                                                  
   bmi.1~ns(age, 1).1    4.08914    0.16354  25.00328   <1e-12
   bmi.1~gendermale.1    1.41143    0.07285  19.37536   <1e-12
Intercepts:                                                   
   bmi.1                22.57414    0.07187 314.08431   <1e-12
Additional Parameters:                                        
   log(var)              2.44584    0.01425 171.68385   <1e-12
   atanh(rhoDZ)          0.29927    0.01848  16.19766   <1e-12

                       Estimate 2.5%    97.5%  
Correlation within MZ: 0.65394  0.62750 0.67888
Correlation within DZ: 0.29064  0.25715 0.32344

'log Lik.' -29020.35 (df=6)
AIC: 58052.71 
BIC: 58093.76

A formal test of genetic effects can be obtained by comparing the MZ and DZ correlation:

estimate(l,contr(5:6,6))
                          Estimate Std.Err  2.5% 97.5% P-value
[1@atanh(rhoMZ)] - [4....    0.483  0.0418 0.401 0.565 6.4e-31

 Null Hypothesis: 
  [1@atanh(rhoMZ)] - [4@atanh(rhoDZ)] = 0
l <- twinlm(bmi ~ ns(age,1)+gender, data=twinbmi, 
           DZ="DZ", zyg="zyg", id="tvparnr", type="cor", missing=TRUE)
summary(l)
____________________________________________________
Group 1
                        Estimate Std. Error   Z value Pr(>|z|)
Regressions:                                                  
   bmi.1~ns(age, 1).1    4.08914    0.16354  25.00328   <1e-12
   bmi.1~gendermale.1    1.41143    0.07285  19.37536   <1e-12
Intercepts:                                                   
   bmi.1                22.57414    0.07187 314.08431   <1e-12
Additional Parameters:                                        
   log(var)              2.44584    0.01425 171.68385   <1e-12
   atanh(rhoMZ)          0.78216    0.02290  34.15832   <1e-12
____________________________________________________
Group 2
                        Estimate Std. Error   Z value Pr(>|z|)
Regressions:                                                  
   bmi.1~ns(age, 1).1    4.08914    0.16354  25.00328   <1e-12
   bmi.1~gendermale.1    1.41143    0.07285  19.37536   <1e-12
Intercepts:                                                   
   bmi.1                22.57414    0.07187 314.08431   <1e-12
Additional Parameters:                                        
   log(var)              2.44584    0.01425 171.68385   <1e-12
   atanh(rhoDZ)          0.29927    0.01848  16.19766   <1e-12

                       Estimate 2.5%    97.5%  
Correlation within MZ: 0.65394  0.62750 0.67888
Correlation within DZ: 0.29064  0.25715 0.32344

'log Lik.' -29020.35 (df=6)
AIC: 58052.71 
BIC: 58093.76

Twin analysis, censored outcomes

Twin analysis, binary traits

Time to event

Bibliography

Author: Klaus Holst & Thomas Scheike

Created: 2017-02-26 Sun 17:52