The Stochastic Process Model (SPM) was developed several decades ago [1,2] and has been applied to analyses of clinical, demographic, and epidemiologic longitudinal data, as well as to many other studies that relate the stochastic dynamics of repeated measures to the probability of end-points (outcomes). SPM links the dynamics of stochastic variables with a hazard rate that is a quadratic function of the state variables [3]. The R package “stpm” is a set of utilities for estimating the parameters of stochastic processes and for modeling survival trajectories and time-to-event outcomes observed in longitudinal studies. It is a general framework for studying and modeling survival (censored) traits that depend on random trajectories (stochastic paths) of variables.
library(devtools)
devtools::install_github("izhbannikov/stpm")
If you experience errors during installation, please download a binary file from the following url:
Then, execute this command (from the R environment):
install.packages("<path to the downloaded r-package stpm>", repos=NULL, type="binary")
The data are typical longitudinal data in the form of two datasets: a longitudinal dataset (follow-up studies), in which one record represents a single observation, and vital (survival) statistics, in which one record holds all information about a subject. The longitudinal dataset can contain a subject ID (identification number), status (event (1) / censored (0)), time, and measurements of the variables. The stpm package places no fixed limit on the number of variables, but in practice 5-7 variables are enough.
Below is an example of clinical data that can be used in stpm; the fields are discussed later.
Longitudinal table:
## ID IndicatorDeath Age DBP BMI
## 1 1 0 30 80.00000 25.00000
## 2 1 0 32 80.51659 26.61245
## 3 1 0 34 77.78412 29.16790
## 4 1 0 36 77.86665 32.40359
## 5 1 0 38 96.55673 31.92014
## 6 1 0 40 94.48616 32.89139
Vital statistics table:
## ID IsDead LSmort
## 1 1 1 85.34578
## 2 2 1 80.55053
## 3 3 1 98.07315
## 4 4 1 81.29779
## 5 5 1 89.89829
## 6 6 1 72.47687
There are two main SPM types in the package: a discrete-time model [4] and a continuous-time model [3]. The discrete model assumes equal intervals between follow-up observations. An example of a discrete dataset is given below.
library(stpm)
data <- simdata_discr(N=10) # simulate data for 10 individuals
head(data)
## id xi t1 t2 y1 y1.next
## [1,] 1 0 30 31 80.00000 79.35596
## [2,] 1 0 31 32 79.35596 81.28550
## [3,] 1 0 32 33 81.28550 70.68784
## [4,] 1 0 33 34 70.68784 71.12490
## [5,] 1 0 34 35 71.12490 64.77792
## [6,] 1 0 35 36 64.77792 61.64584
In this case the interval between \(t_1\) and \(t_2\) is the same in every row.
In the continuous-time SPM, the intervals between observations are not equal (they can be arbitrary or random). An example of such a dataset is shown below:
library(stpm)
data <- simdata_cont(N=5) # simulate data for 5 individuals
head(data)
## id xi t1 t2 y1 y1.next
## [1,] 0 0 33.65423 35.50419 79.39082 85.55678
## [2,] 0 0 35.50419 36.57512 85.55678 80.40500
## [3,] 0 0 36.57512 38.40167 80.40500 78.02948
## [4,] 0 0 38.40167 39.56896 78.02948 73.28373
## [5,] 0 0 39.56896 40.92755 73.28373 78.76800
## [6,] 0 0 40.92755 42.10489 78.76800 83.21616
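The simulated tables above hold one row per observation interval (id, xi, t1, t2, y1, y1.next), while a clinical longitudinal table usually holds one row per visit. The following base-R sketch shows one possible way to reshape the latter into the former. The helper `to_intervals` and its arguments are hypothetical and not part of stpm, and it assumes that `xi` is 0 for every interval between two observed visits; the final interval ending at the event or censoring time, which would come from the vital statistics table, is deliberately left out.

```r
# Hypothetical helper (not part of stpm): reshape one-row-per-visit data
# into one-row-per-interval data with columns id, xi, t1, t2, y1, y1.next.
to_intervals <- function(long, id = "ID", time = "Age", value = "DBP") {
  pieces <- lapply(split(long, long[[id]]), function(d) {
    d <- d[order(d[[time]]), ]            # sort visits by time within subject
    n <- nrow(d)
    data.frame(
      id      = d[[id]][-n],
      xi      = rep(0, n - 1),            # assumption: no event between visits
      t1      = d[[time]][-n],            # interval start
      t2      = d[[time]][-1],            # interval end
      y1      = d[[value]][-n],           # covariate at interval start
      y1.next = d[[value]][-1]            # covariate at interval end
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy data in the layout of the longitudinal table (values are made up)
long <- data.frame(
  ID  = c(1, 1, 1, 2, 2),
  Age = c(30, 32, 34, 40, 42),
  DBP = c(80.0, 80.5, 77.8, 90.0, 91.2)
)
intervals <- to_intervals(long)
print(intervals)
```

Each subject with n visits contributes n - 1 rows, so a subject with a single visit contributes none.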
The discrete model assumes fixed time intervals between consecutive observations. In this model, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix of covariate values, where \(k\) is the number of covariates considered) and \(\mu(t, \mathbf{Y}(t))\) (the hazard rate) have the following form:
\(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R} \mathbf{Y}(t) + \mathbf{\epsilon}\)
\(\mu (t, \mathbf{Y}(t)) = [\mu_0 + \mathbf{b} \mathbf{Y}(t) + \mathbf{Y}(t)^* \mathbf{Q} \mathbf{Y}(t)] e^{\theta t}\)
Coefficients \(\mathbf{u}\) (a \(k \times 1\) matrix, where \(k\) is the number of covariates), \(\mathbf{R}\) (a \(k \times k\) matrix), \(\mu_0\), \(\mathbf{b}\) (a \(1 \times k\) matrix), and \(\mathbf{Q}\) (a \(k \times k\) matrix) are assumed to be constant in the particular implementation of this model in the R package stpm. \(\mathbf{\epsilon}\) is a \(k \times 1\) matrix of normally distributed random residuals. The symbol ’*’ denotes the transpose operation. \(\theta\) is a parameter to be estimated along with the other parameters (\(\mathbf{u}\), \(\mathbf{R}\), \(\mu_0\), \(\mathbf{b}\), \(\mathbf{Q}\)).
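To make the two equations concrete, here is a minimal base-R sketch of the one-dimensional case (\(k = 1\)). It iterates the autoregressive update for one trajectory and evaluates the quadratic hazard along it. The coefficient values are illustrative assumptions chosen to be of the same order as typical estimates, and the function name `hazard_discr` is hypothetical; none of this is taken from the stpm internals.

```r
set.seed(1)

# Illustrative coefficients (assumed values, k = 1)
u     <- 4          # intercept of the covariate dynamics
R     <- 0.95       # autoregression coefficient
sigma <- 5          # s.d. of the normal residual epsilon
mu0   <- 1.5e-4     # constant term of the hazard
b     <- -3.4e-6    # linear term of the hazard
Q     <- 2.08e-8    # quadratic term of the hazard
theta <- 0.08       # exponential growth of the hazard with age

# Hazard: mu(t, Y) = (mu0 + b Y + Y Q Y) * exp(theta t)
hazard_discr <- function(t, y) (mu0 + b * y + Q * y^2) * exp(theta * t)

ages <- 30:40                       # equally spaced observation times
y <- numeric(length(ages))
y[1] <- 80                          # starting covariate value
for (i in seq_along(ages)[-1]) {
  # Y(t + 1) = u + R Y(t) + epsilon
  y[i] <- u + R * y[i - 1] + rnorm(1, mean = 0, sd = sigma)
}

mu <- hazard_discr(ages, y)
print(data.frame(age = ages, y = y, hazard = mu))
```

With these values the quadratic form stays positive for every \(Y\), because its minimum \(\mu_0 - b^2 / (4Q)\) is greater than zero.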
library(stpm)
#Data simulation (200 individuals)
data <- simdata_discr(N=200)
#Estimation of parameters
pars <- spm_discrete(data)
pars
## $Ak2005
## $Ak2005$theta
## [1] 0.079
##
## $Ak2005$mu0
## [1] 0.0001512160364
##
## $Ak2005$b
## [1] -3.438749888e-06
##
## $Ak2005$Q
## [,1]
## [1,] 2.083339056e-08
##
## $Ak2005$u
## [1] 4.197279952
##
## $Ak2005$R
## [1] 0.9475382842
##
## $Ak2005$Sigma
## [1] 4.975108355
##
##
## $Ya2007
## $Ya2007$a
## [,1]
## [1,] -0.05246171583
##
## $Ya2007$f1
## [,1]
## [1,] 80.0065321
##
## $Ya2007$Q
## [,1]
## [1,] 2.083339056e-08
##
## $Ya2007$f
## [,1]
## [1,] 82.52977059
##
## $Ya2007$b
## [,1]
## [1,] 4.975108355
##
## $Ya2007$mu0
## [,1]
## [1,] 9.316416729e-06
##
## $Ya2007$theta
## [1] 0.079
##
##
## attr(,"class")
## [1] "spm.discrete"
In the specification of the SPM described in the 2007 paper by Yashin and colleagues [3], the stochastic differential equation describing the age dynamics of a covariate is:
\(d\mathbf{Y}(t)= \mathbf{a}(t)(\mathbf{Y}(t) -\mathbf{f}_1(t))dt + \mathbf{b}(t)d\mathbf{W}(t), \mathbf{Y}(t=t_0)\)
In this equation, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix) is the value of a particular covariate at time (age) \(t\). \(\mathbf{f}_1(t)\) (a \(k \times 1\) matrix) corresponds to the long-term mean value of the stochastic process \(\mathbf{Y}(t)\), which describes the trajectory of an individual covariate influenced by different factors represented by a random Wiener process \(\mathbf{W}(t)\). Coefficient \(\mathbf{a}(t)\) (a \(k \times k\) matrix) is a negative feedback coefficient, which characterizes the rate at which the process reverts to its mean. In research on aging, \(\mathbf{f}_1(t)\) represents the mean allostatic trajectory and \(\mathbf{a}(t)\) represents the adaptive capacity of the organism. Coefficient \(\mathbf{b}(t)\) (a \(k \times 1\) matrix) characterizes the strength of the random disturbances from the Wiener process \(\mathbf{W}(t)\).
The following function \(\mu(t, \mathbf{Y}(t))\) represents a hazard rate:
\(\mu(t, \mathbf{Y}(t)) = \mu_0(t) + (\mathbf{Y}(t) - \mathbf{f}(t))^* \mathbf{Q}(t) (\mathbf{Y}(t) - \mathbf{f}(t))\)
Here \(\mu_0(t)\) is the baseline hazard, which represents the risk when \(\mathbf{Y}(t)\) follows its optimal trajectory; \(\mathbf{f}(t)\) (a \(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk; and \(\mathbf{Q}(t)\) (a \(k \times k\) matrix) represents the sensitivity of the risk function to deviation from the norm.
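As an illustration of how the dynamics and the hazard fit together, the following base-R sketch integrates the one-dimensional equation with the Euler-Maruyama scheme and evaluates the hazard along the resulting path. It is a sketch under simplifying assumptions: all coefficients are held constant in time, the numeric values are illustrative, and the name `hazard_cont` is hypothetical. It does not reproduce how stpm simulates or estimates.

```r
set.seed(42)

# Dynamics: dY = a (Y - f1) dt + b dW   (constant coefficients assumed)
a  <- -0.05     # negative feedback (rate of reversion to f1)
f1 <- 80        # long-term mean of the process
b  <- 5         # strength of the random disturbances

# Hazard: mu = mu0 + Q (Y - f)^2        (constant coefficients assumed)
mu0 <- 2e-5     # baseline hazard
f   <- 80       # optimal trajectory (minimizes the risk)
Q   <- 2e-8     # sensitivity to deviation from f

hazard_cont <- function(y) mu0 + Q * (y - f)^2

dt    <- 0.01
times <- seq(30, 40, by = dt)
y     <- numeric(length(times))
y[1]  <- 80
for (i in seq_along(times)[-1]) {
  dW   <- rnorm(1, mean = 0, sd = sqrt(dt))   # Wiener increment over dt
  y[i] <- y[i - 1] + a * (y[i - 1] - f1) * dt + b * dW
}

mu <- hazard_cont(y)
print(summary(y))
print(range(mu))
```

Because the hazard is a quadratic form around \(f\), it can never fall below the baseline \(\mu_0\) and equals it exactly when \(Y = f\).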
library(stpm)
#Simulate some data for 100 individuals
data <- simdata_cont(N=100)
head(data)
## id xi t1 t2 y1 y1.next
## [1,] 0 0 32.51561550 34.24827353 79.14305965 84.88388519
## [2,] 0 0 34.24827353 35.96770103 84.88388519 79.60417750
## [3,] 0 0 35.96770103 37.78338322 79.60417750 88.71078533
## [4,] 0 0 37.78338322 39.00222437 88.71078533 90.36617434
## [5,] 0 0 39.00222437 40.92267719 90.36617434 89.79047395
## [6,] 0 0 40.92267719 42.58002054 89.79047395 91.81403818
#Estimate parameters
# a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08 are starting values for estimation procedure
pars <- spm_continuous(dat=data,a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08)
## Parameter f achieved lower/upper bound.
## 72
pars
## $a
## [,1]
## [1,] -0.05011696658
##
## $f1
## [,1]
## [1,] 78.32045588
##
## $Q
## [,1]
## [1,] 2.048207562e-08
##
## $f
## [,1]
## [1,] 72
##
## $b
## [,1]
## [1,] 4.938753192
##
## $mu0
## [1] 2.199769966e-05
##
## $theta
## [1] 0.08536684528
##
## $limit
## [1] TRUE
##
## attr(,"class")
## [1] "spm.continuous"
The coefficient conversion between continuous- and discrete-time models is as follows (‘c’ and ‘d’ denote continuous- and discrete-time models respectively; note: these equations can be used if the intervals between consecutive observations of the discrete- and continuous-time models are equal; it is also required that matrices \(\mathbf{a}_c\) and \(\mathbf{Q}_{c,d}\) be full-rank):
\(\mathbf{Q}_c = \mathbf{Q}_d\)
\(\mathbf{a}_c = \mathbf{R}_d - I(k)\)
\(\mathbf{b}_c = \mathbf{\Sigma}\)
\({\mathbf{f}_1}_c = -\mathbf{a}_c^{-1} \times \mathbf{u}_d\)
\(\mathbf{f}_c = -0.5 \mathbf{b}_d \times \mathbf{Q}^{-1}_d\)
\({\mu_0}_c = {\mu _0}_d - \mathbf{f}_c \times \mathbf{Q_c} \times \mathbf{f}_c^*\)
\(\theta_c = \theta_d\)
where \(k\) is the number of covariates, which equals the model’s dimension, and ’*’ denotes the transpose operation; \(\mathbf{\Sigma}\) is a \(k \times 1\) matrix that contains the standard deviations (s.d.) of the corresponding residuals (residuals of the linear regression \(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R}\mathbf{Y}(t) + \mathbf{\epsilon}\)); and \(I(k)\) is the \(k \times k\) identity matrix.
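These relations follow from matching terms: with a unit time step, \(\mathbf{Y}(t+1) = \mathbf{Y}(t) + \mathbf{a}_c(\mathbf{Y}(t) - {\mathbf{f}_1}_c)\) gives \(\mathbf{R}_d = I(k) + \mathbf{a}_c\) and \(\mathbf{u}_d = -\mathbf{a}_c {\mathbf{f}_1}_c\), and expanding the quadratic form around \(\mathbf{f}_c\) (for symmetric \(\mathbf{Q}\)) gives the linear and constant hazard terms. The base-R sketch below applies the conversion to the discrete-time estimates printed in the `$Ak2005` output above; the results should agree with the `$Ya2007` values in the same output up to rounding. The variable names are chosen here for illustration.

```r
# Discrete-time estimates copied from the $Ak2005 output above (k = 1)
R_d     <- matrix(0.9475382842)
u_d     <- matrix(4.197279952)
b_d     <- matrix(-3.438749888e-06)   # 1 x k
Q_d     <- matrix(2.083339056e-08)
Sigma   <- matrix(4.975108355)
mu0_d   <- 0.0001512160364
theta_d <- 0.079

k <- nrow(R_d)

Q_c     <- Q_d
a_c     <- R_d - diag(k)                          # a_c = R_d - I(k)
b_c     <- Sigma
f1_c    <- -solve(a_c) %*% u_d                    # f1_c = -a_c^{-1} u_d
f_c     <- -0.5 * b_d %*% solve(Q_d)              # 1 x k
mu0_c   <- mu0_d - f_c %*% Q_c %*% t(f_c)         # subtract the quadratic constant
theta_c <- theta_d

print(c(a = a_c, f1 = f1_c, Q = Q_c, f = f_c, b = b_c,
        mu0 = mu0_c, theta = theta_c), digits = 10)
```

The calls to `solve()` are where the full-rank requirement on \(\mathbf{a}_c\) and \(\mathbf{Q}\) becomes visible: a singular matrix would make the inversion fail.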
In the previous models, we assumed that the coefficients are time-dependent only in a restricted way: they were multiplied by \(e^{\theta t}\). In general, this may not be the case [5]. We extend this to a more general form of time dependence; for example, in the one-dimensional case:
\(\mathbf{a(t)} = \mathbf{par}_1 t + \mathbf{par}_2\) - linear function.
The corresponding equations will be equivalent to one-dimensional continuous case described above.
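As a small illustration of what a time-dependent coefficient means in practice, the base-R sketch below evaluates a coefficient given as a formula string at several ages. The helper `eval_coef` and the parameter values are hypothetical and are not taken from stpm; the sketch only shows that a constant coefficient and a linear one can be handled by the same mechanism.

```r
# Hypothetical helper (not part of stpm): evaluate a coefficient formula,
# given as a string, for parameter values `pars` at time(s) `t`.
eval_coef <- function(formula, pars, t) {
  eval(parse(text = formula), envir = c(pars, list(t = t)))
}

ages <- c(30, 60, 90)

# Linear in time: a(t) = par1 * t + par2 (illustrative parameter values)
a_linear <- eval_coef("par1 * t + par2",
                      pars = list(par1 = -1e-4, par2 = -0.05),
                      t = ages)

# Constant in time: a(t) = a
a_const <- eval_coef("a", pars = list(a = -0.05), t = ages)

print(a_linear)
print(a_const)
```

With `par1 = 0` the linear form reduces to the constant case, which is why the equations remain those of the one-dimensional continuous model.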
library(stpm)
#Data preparation:
n <- 500
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data,
start = list(a = -0.05, f1 = 80, Q = 2e-08, f = 80, b = 5, mu0 = 0.001),
f = list(at = "a", f1t = "f1", Qt = "Q", ft = "f", bt = "b", mu0t= "mu0"))
## a f1 Q f b mu0
## -5e-02 8e+01 2e-08 8e+01 5e+00 1e-03
opt.par
## [[1]]
## [[1]]$a
## [1] -0.04950011737
##
## [[1]]$f1
## [1] 79.69567895
##
## [[1]]$Q
## [1] 2.293834891e-08
##
## [[1]]$f
## [1] 89.98987456
##
## [[1]]$b
## [1] 4.968638295
##
## [[1]]$mu0
## [1] 0.001067151218
##
## [[1]]$status
## [1] 3
##
## [[1]]$LogLik
## t2
## -80391.90441
##
## [[1]]$objective
## [1] 80391.88387
##
## [[1]]$message
## [1] "NLOPT_FTOL_REACHED: Optimization stopped because ftol_rel or ftol_abs (above) was reached."
We added one- and multi-dimensional simulation so that test data can be generated for hypothesis testing. The simulated data can be discrete (equal intervals between observations) or continuous (arbitrary intervals).
The corresponding function is (k is the number of variables (covariates), equal to the model’s dimension):
simdata_discr(N=100, a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=1e-5, theta=0.08, ystart=80, tstart=30, tend=105, dt=1)
Here:
N - Number of individuals.
a - A k by k matrix that characterizes the rate of the adaptive response.
f1 - A particular state, which is a deviation from the normal (or optimal) state. This is a vector of length k.
Q - A k by k matrix, which is a non-negative-definite symmetric matrix.
f - A vector-function (of length k) of the normal (or optimal) state.
b - A diffusion coefficient, a k by k matrix.
mu0 - Mortality at the start of the period (baseline hazard).
theta - A displacement coefficient of the Gompertz function.
ystart - A vector of length equal to the number of dimensions used; defines the starting values of the covariates.
tstart - The start time (30 by default). Can be a single number or a vector of two numbers, c(a, b); in the latter case, the starting time is drawn from a uniform(a, b) distribution.
tend - A number that defines the final time (105 by default).
dt - The time interval between observations.
This function returns a table with simulated data, as shown in the example below:
library(stpm)
data <- simdata_discr(N=10)
head(data)
## id xi t1 t2 y1 y1.next
## [1,] 1 0 30 31 80.00000000 76.76477343
## [2,] 1 0 31 32 76.76477343 71.59940665
## [3,] 1 0 32 33 71.59940665 74.90510962
## [4,] 1 0 33 34 74.90510962 75.63551828
## [5,] 1 0 34 35 75.63551828 72.21924911
## [6,] 1 0 35 36 72.21924911 75.61134925
The corresponding function is (k is the number of variables (covariates), equal to the model’s dimension):
simdata_cont(N=100, a=-0.05, f1=80, Q=2e-07, f=80, b=5, mu0=2e-05, theta=0.08, ystart=80, tstart=c(30,50), tend=105)
Here:
N - Number of individuals.
a - A k by k matrix that characterizes the rate of the adaptive response.
f1 - A particular state, which is a deviation from the normal (or optimal) state. This is a vector of length k.
Q - A k by k matrix, which is a non-negative-definite symmetric matrix.
f - A vector-function (of length k) of the normal (or optimal) state.
b - A diffusion coefficient, a k by k matrix.
mu0 - Mortality at the start of the period (baseline hazard).
theta - A displacement coefficient of the Gompertz function.
ystart - A vector of length equal to the number of dimensions used; defines the starting values of the covariates.
tstart - The start time. Can be a single number or a vector of two numbers, c(a, b), as in the call above; in the latter case, the starting time is drawn from a uniform(a, b) distribution.
tend - A number that defines the final time (105 by default).
This function returns a table with simulated data, as shown in the example below:
library(stpm)
data <- simdata_cont(N=10)
head(data)
## id xi t1 t2 y1 y1.next
## [1,] 0 0 37.16723486 38.30612625 81.80505519 85.66223660
## [2,] 0 0 38.30612625 39.39132595 85.66223660 84.13378423
## [3,] 0 0 39.39132595 41.16853626 84.13378423 88.58793127
## [4,] 0 0 41.16853626 42.57274700 88.58793127 95.52933991
## [5,] 0 0 42.57274700 44.34708806 95.52933991 87.67130226
## [6,] 0 0 44.34708806 46.25277576 87.67130226 88.08549142
The Stochastic Process Model has many applications in the analysis of longitudinal biodemographic data. Such data contain various physiological variables (known as covariates). The data can also contain genetic information, available for all or some of the participants. Taking advantage of both genetic and non-genetic information can provide further insights into a broad range of processes describing aging-related changes in the organism.
GenSPM (Genetic SPM), presented in 2009 by Arbeev et al. [6] and further advanced in [7,8], elaborates the basic stochastic process model by introducing a categorical variable, \(Z\), which may be a specific value of a genetic marker or, in general, any categorical variable. Currently, \(Z\) has two levels, 0 or 1, in a genetic group of interest, with \(P(Z=1) = p\), \(p \in [0, 1]\), where \(p\) is the proportion of individuals with \(Z = 1\) (for example, carriers of an allele) in the population. An example of longitudinal data with the genetic component \(Z\) is provided below.
library(stpm)
data <- simdata_gen(N=10)
head(data)
## id xi t1 t2 Z y1 y1.next
## [1,] 0 0 61.41051633 62.49976734 0 80.68916159 78.88867423
## [2,] 0 0 62.49976734 63.47274132 0 78.88867423 81.30104926
## [3,] 0 0 63.47274132 64.51027973 0 81.30104926 80.48717219
## [4,] 0 0 64.51027973 65.49772676 0 80.48717219 81.86955043
## [5,] 0 0 65.49772676 66.51982278 0 81.86955043 80.03081997
## [6,] 0 0 66.51982278 67.49157073 0 80.03081997 83.35746302
In the specification of the SPM described in the 2007 paper by Yashin and colleagues [3], the stochastic differential equation describing the age dynamics of a physiological variable (a dynamic component of the model) is:
\(dY(t) = a(Z, t)(Y(t) - f_1(Z, t))dt + b(Z, t)dW(t), Y(t = t_0)\)
In this equation, \(Y(t)\) is a \(k \times 1\) matrix (where \(k\) is the number of covariates, which is the model dimension) describing the value of a physiological variable at time (e.g. age) \(t\). \(f_1(Z,t)\) is a \(k \times 1\) matrix that corresponds to the long-term average value of the stochastic process \(Y(t)\), which describes the trajectory of an individual variable influenced by different factors represented by a random Wiener process \(W(t)\). The negative feedback coefficient \(a(Z,t)\) (a \(k \times k\) matrix) characterizes the rate at which the stochastic process reverts to its mean. In research on aging and well-being, \(f_1(Z,t)\) represents the average allostatic trajectory and \(a(Z,t)\) represents the adaptive capacity of the organism. Coefficient \(b(Z,t)\) (a \(k \times 1\) matrix) characterizes the strength of the random disturbances from the Wiener process \(W(t)\). All of these parameters depend on \(Z\) (a genetic marker with values 1 or 0). The following function \(\mu(t,Y(t))\) represents a hazard rate:
\(\mu(t,Y(t)) = \mu_0(t) + (Y(t) - f(Z, t))^*Q(Z, t)(Y(t) - f(Z, t))\)
In this equation, \(\mu_0(t)\) is the baseline hazard, which represents the risk when \(Y(t)\) follows its optimal trajectory; \(f(Z, t)\) (a \(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk; and \(Q(Z, t)\) (a \(k \times k\) matrix) represents the sensitivity of the risk function to deviation from the norm. In general, the model coefficients \(a(Z, t)\), \(f_1(Z, t)\), \(Q(Z, t)\), \(f(Z, t)\), \(b(Z, t)\) and \(\mu_0(t)\) are time (age)-dependent. Once we have data, we can run the analysis, i.e. estimate the coefficients (they are assumed to be time-independent here, and the data are simulated):
library(stpm)
#Generating data:
data <- simdata_gen(N=1000)
head(data)
## id xi t1 t2 Z y1 y1.next
## [1,] 0 0 52.44653755 53.50164236 0 80.16095394 86.94035595
## [2,] 0 0 53.50164236 54.58320141 0 86.94035595 89.00984860
## [3,] 0 0 54.58320141 55.61535630 0 89.00984860 91.39819612
## [4,] 0 0 55.61535630 56.65300708 0 91.39819612 88.40321075
## [5,] 0 0 56.65300708 57.61002007 0 88.40321075 87.25933471
## [6,] 0 0 57.61002007 58.59272133 0 87.25933471 90.67593891
#Parameters estimation:
pars <- spm_gen(gendat=data)
## Provided mode: genetic
pars
## $aH
## [,1]
## [1,] -0.05123251124
##
## $aL
## [,1]
## [1,] -0.007953630848
##
## $f1H
## [,1]
## [1,] 60.11337286
##
## $f1L
## [,1]
## [1,] 80.6117397
##
## $QH
## [,1]
## [1,] 2.03975181e-08
##
## $QL
## [,1]
## [1,] 2.514154894e-08
##
## $fH
## [,1]
## [1,] 64.97148774
##
## $fL
## [,1]
## [1,] 80.05075166
##
## $bH
## [,1]
## [1,] 4.056928496
##
## $bL
## [,1]
## [1,] 4.973394186
##
## $mu0H
## [1] 7.983052034e-06
##
## $mu0L
## [1] 9.997209371e-06
##
## $thetaH
## [1] 0.08547298895
##
## $thetaL
## [1] 0.1044229574
##
## $p
## [1] 0.2544391401
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "gen.spm"
Here the suffixes H and L denote the parameters for \(Z\) = 1 (H) and \(Z\) = 0 (L).
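The role of \(Z\) can be illustrated with a short base-R sketch: each individual draws \(Z\) with \(P(Z=1) = p\), and the H or L set of coefficients is then used in the dynamics. The values below are illustrative assumptions, only one Euler-Maruyama step is taken, and the code is not how stpm simulates or estimates the genetic model.

```r
set.seed(7)

p <- 0.25                                  # illustrative P(Z = 1)
pars <- list(
  H = list(a = -0.05, f1 = 60, b = 4),     # coefficients used when Z = 1
  L = list(a = -0.01, f1 = 80, b = 5)      # coefficients used when Z = 0
)

N  <- 1000
dt <- 1
Z  <- rbinom(N, size = 1, prob = p)        # genetic marker for each individual

# Pick the group-specific coefficients for every individual
a  <- ifelse(Z == 1, pars$H$a,  pars$L$a)
f1 <- ifelse(Z == 1, pars$H$f1, pars$L$f1)
b  <- ifelse(Z == 1, pars$H$b,  pars$L$b)

# One Euler-Maruyama step of dY = a(Z) (Y - f1(Z)) dt + b(Z) dW
y0 <- rep(80, N)
y1 <- y0 + a * (y0 - f1) * dt + b * rnorm(N, mean = 0, sd = sqrt(dt))

print(mean(Z))                             # empirical share with Z = 1
print(tapply(y1, Z, mean))                 # mean covariate value by group
```

Individuals with \(Z = 1\) are pulled toward their own long-term mean `f1`, so the two groups drift apart even though they start from the same value.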
library(stpm)
data.genetic <- simdata_gen(N=100, mode='genetic')
head(data.genetic)
## id xi t1 t2 Z y1 y1.next
## [1,] 0 0 97.80378424 98.71234172 0 80.11617675 73.92889739
## [2,] 0 0 98.71234172 99.63156793 0 73.92889739 65.38703496
## [3,] 0 0 99.63156793 100.53872736 0 65.38703496 55.87088843
## [4,] 0 0 100.53872736 101.44291087 0 55.87088843 57.68354577
## [5,] 0 0 101.44291087 102.46930606 0 57.68354577 58.65519863
## [6,] 0 0 102.46930606 103.48103863 0 58.65519863 63.72991179
data.nongenetic <- simdata_gen(N=500, mode='nongenetic')
head(data.nongenetic)
## id xi t1 t2 y1 y1.next
## [1,] 0 0 100.1837133 101.14500515 81.46636522 79.82801120
## [2,] 0 0 101.1450052 102.11892349 79.82801120 82.73650992
## [3,] 0 0 102.1189235 103.16353646 82.73650992 83.79220081
## [4,] 0 0 103.1635365 104.18042361 83.79220081 81.54844645
## [5,] 1 0 51.8349018 52.76980260 82.09191808 89.08067724
## [6,] 1 0 52.7698026 53.85527455 89.08067724 91.68975358
#Parameters estimation:
pars <- spm_gen(gendat=data.genetic, nongendat = data.nongenetic, mode='combined')
## Provided mode: combined
pars
## $aH
## [,1]
## [1,] -0.04967476518
##
## $aL
## [,1]
## [1,] -0.01079991902
##
## $f1H
## [,1]
## [1,] 62.52339366
##
## $f1L
## [,1]
## [1,] 75.28066924
##
## $QH
## [,1]
## [1,] 1.936512619e-08
##
## $QL
## [,1]
## [1,] 2.393332831e-08
##
## $fH
## [,1]
## [1,] 64.60830188
##
## $fL
## [,1]
## [1,] 78.73481538
##
## $bH
## [,1]
## [1,] 4.021079902
##
## $bL
## [,1]
## [1,] 5.016507059
##
## $mu0H
## [1] 7.942684365e-06
##
## $mu0L
## [1] 9.64441806e-06
##
## $thetaH
## [1] 0.08650108014
##
## $thetaL
## [1] 0.1086482374
##
## $p
## [1] 0.2748817363
##
## $limit
## [1] FALSE
##
## attr(,"class")
## [1] "gen.spm"
Here mode ‘genetic’ is used to simulate data with the genetic component \(Z\), and ‘nongenetic’ to simulate data without it.
[1] Woodbury M.A., Manton K.G., Random-Walk of Human Mortality and Aging. Theoretical Population Biology, 1977 11:37-48.
[2] Yashin, A.I., Manton K.G., Vaupel J.W. Mortality and aging in a heterogeneous population: a stochastic process model with observed and unobserved variables. Theor Pop Biology, 1985 27.
[3] Yashin, A.I. et al. Stochastic model for analysis of longitudinal data on aging and mortality. Mathematical Biosciences, 2007 208(2) 538-551.
[4] Akushevich I., Kulminski A. and Manton K.: Life tables with covariates: Dynamic model for Nonlinear Analysis of Longitudinal Data. 2005. Mathematical Population Studies, 12(2), pp.: 51-80.
[5] Yashin, A. et al. Health decline, aging and mortality: how are they related? Biogerontology, 2007 8(3), 291-302.
[6] Arbeev, K.G., Akushevich, I., Kulminski, A.M., Arbeeva, L.S., Akushevich, L., Ukraintseva, S.V., Culminskaya, I.V., Yashin, A.I.: Genetic model for longitudinal studies of aging, health, and longevity and its potential application to incomplete data. Journal of Theoretical Biology 258(1), 103-111 (2009). doi:10.1016/j.jtbi.2009.01.023
[7] Arbeev K.G, Akushevich I., Kulminski A.M., Ukraintseva S.V., Yashin A.I., Joint Analyses of Longitudinal and Time-to-Event Data in Research on Aging: Implications for Predicting Health and Survival, Front Public Health. 2014 Nov 6;2:228. doi: 10.3389/fpubh.2014.00228
[8] Arbeev K., Arbeeva L., Akushevich I., Kulminski A., Ukraintseva S., Yashin A., Latent Class and Genetic Stochastic Process Models: Implications for Analyses of Longitudinal Data on Aging, Health, and Longevity, JSM-2015, Seattle, WA.