The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This document gives a demonstration how to use the package to obtain a maximum-likelihood estimate of the protracted birth-death speciation model.
First thing is to load the PBD package itself:
rm(list = ls())
library(PBD)
We will also need ape for branching.times
:
library(ape)
Here we simulate a tree with known parameters:
seed <- 43
set.seed(seed)
b_1 <- 0.3 # speciation-initiation rate of good species
la_1 <- 0.2 # speciation-completion rate
b_2 <- b_1 # the speciation-initiation rate of incipient species
mu_1 <- 0.1 # extinction rate of good species
mu_2 <- mu_1 # extinction rate of incipient species
pars <- c(b_1, la_1, b_2, mu_1, mu_2)
age <- 15 # the age for the simulation
phylogenies <- pbd_sim(pars = pars, age = age)
plot(phylogenies$recontree)
plot(phylogenies$igtree.extant)
## no colors provided. using the following legend:
## g i
## "black" "red"
plot(phylogenies$tree)
names(phylogenies)
## [1] "tree" "stree_random" "stree_oldest" "stree_youngest"
## [5] "L" "sL_random" "sL_oldest" "sL_youngest"
## [9] "igtree.extinct" "igtree.extant" "recontree" "reconL"
## [13] "L0"
Now we try to recover the parameters by maximum likelihood estimation:
brts <- branching.times(phylogenies$recontree) # branching times
init_b <- 0.2 # speciation-initiation rate
init_mu_1 <- 0.05 # extinction rate of good species
init_la_1 <- 0.3 # speciation-completion rate
#init_mu_2 <- 0.05 # extinction rate of incipient species # not used
# The initial values of the parameters that must be optimized
initparsopt <- c(init_b, init_mu_1, init_la_1)
# The extinction rates between incipient and good species are equal
exteq <- TRUE
# The first element of the branching times is the crown age (and not the stem age)
soc <- 2
# Conditioning on non-extinction of the phylogeny
# as I actively selected for a nice phylogeny
cond <- 1
# Give the likelihood of the phylogeny (instead of the likelihood of the branching times)
btorph <- 1
Maximum likelihood estimation can now be performed:
r <- pbd_ML(
brts = brts,
initparsopt = initparsopt,
exteq = exteq,
soc = soc,
cond = cond,
btorph = btorph,
verbose = FALSE
)
The ML parameter estimates are:
knitr::kable(r)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv |
---|---|---|---|---|---|---|
0.2817166 | 0.1003351 | 0.20635 | 0.1003351 | -37.1221 | 3 | 0 |
Comparing the known true value with the recovered values:
loglik_true <- PBD::pbd_loglik(pars, brts = brts)
## Parameters: 0.3, 0.2, 0.3, 0.1, 0.1, Loglikelihood: -37.324677
df <- as.data.frame(r)
df <- rbind(df, c(b_1, mu_1, la_1, mu_2, loglik_true, NA, NA))
row.names(df) <- c("ML", "true")
knitr::kable(df)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | |
---|---|---|---|---|---|---|---|
ML | 0.2817166 | 0.1003351 | 0.20635 | 0.1003351 | -37.12210 | 3 | 0 |
true | 0.3000000 | 0.1000000 | 0.20000 | 0.1000000 | -37.32468 | NA | NA |
Ideally, all parameter columns should have the same values.
To test for the uncertainty of our ML estimate, we can do a parametric bootstrap.
The function pbd_bootstrap
consists of a few steps:
endmc <- 10 # Sets the number of simulations for the bootstrap
b <- pbd_bootstrap(
brts = brts,
initparsopt = initparsopt,
exteq = exteq,
soc = soc,
cond = cond,
btorph = btorph,
plotltt = FALSE,
endmc = endmc,
seed = seed
)
## Finding the maximum likelihood estimates ...
##
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -37.25841
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.281715, mu_1: 0.100335, lambda_1: 0.206352, mu_2: 0.100335
## Maximum loglikelihood: -37.122103
## The expected duration of speciation for these parameters is: 2.807343
## The median duration of speciation for these parameters is: 2.206200
## Bootstrapping ...
##
## Simulated data set 1 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -22.03909
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.375753, mu_1: 0.311977, lambda_1: 0.230678, mu_2: 0.311977
## Maximum loglikelihood: -21.596081
## The expected duration of speciation for these parameters is: 2.030928
## The median duration of speciation for these parameters is: 1.543204
## Simulated data set 2 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -57.99061
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.312123, mu_1: 0.081059, lambda_1: 0.190419, mu_2: 0.081059
## Maximum loglikelihood: -57.720450
## The expected duration of speciation for these parameters is: 2.943368
## The median duration of speciation for these parameters is: 2.365253
## Simulated data set 3 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -115.261
## Optimizing the likelihood - this may take a while.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
##
## Maximum likelihood parameter estimates: b: 751.298215, mu_1: 751.000348, lambda_1: 0.004574, mu_2: 751.000348
## Maximum loglikelihood: -111.629201
## The expected duration of speciation for these parameters is: 0.386462
## The median duration of speciation for these parameters is: 0.310353
## Simulated data set 4 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -107.9402
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.271860, mu_1: 0.000000, lambda_1: 0.126555, mu_2: 0.000000
## Maximum loglikelihood: -106.870092
## The expected duration of speciation for these parameters is: 4.218409
## The median duration of speciation for these parameters is: 3.570808
## Simulated data set 5 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -47.84195
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.709877, mu_1: 0.574456, lambda_1: 0.225125, mu_2: 0.574456
## Maximum loglikelihood: -46.601652
## The expected duration of speciation for these parameters is: 1.643556
## The median duration of speciation for these parameters is: 1.279479
## Simulated data set 6 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -187.144
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 13.979806, mu_1: 13.679004, lambda_1: 0.020678, mu_2: 13.679004
## Maximum loglikelihood: -183.432908
## The expected duration of speciation for these parameters is: 1.413933
## The median duration of speciation for these parameters is: 1.171087
## Simulated data set 7 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -41.38109
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.492110, mu_1: 0.309936, lambda_1: 0.059444, mu_2: 0.309936
## Maximum loglikelihood: -40.448749
## The expected duration of speciation for these parameters is: 4.546741
## The median duration of speciation for these parameters is: 3.828187
## Simulated data set 8 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -87.55681
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.300024, mu_1: 0.000001, lambda_1: 0.153513, mu_2: 0.000001
## Maximum loglikelihood: -86.151794
## The expected duration of speciation for these parameters is: 3.610683
## The median duration of speciation for these parameters is: 3.031341
## Simulated data set 9 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -100.522
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.209810, mu_1: 0.000000, lambda_1: 0.393479, mu_2: 0.000000
## Maximum loglikelihood: -99.916686
## The expected duration of speciation for these parameters is: 2.036930
## The median duration of speciation for these parameters is: 1.540704
## Simulated data set 10 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -61.11511
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.173103, mu_1: 0.000000, lambda_1: 0.890545, mu_2: 0.000000
## Maximum loglikelihood: -60.368468
## The expected duration of speciation for these parameters is: 1.026129
## The median duration of speciation for these parameters is: 0.738873
knitr::kable(b[[3]])
ntips | b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | exp_durspec | median_durspec |
---|---|---|---|---|---|---|---|---|---|
9 | 0.3757529 | 0.3119773 | 0.2306781 | 0.3119773 | -21.59608 | 3 | 0 | 2.0309277 | 1.5432035 |
23 | 0.3121229 | 0.0810590 | 0.1904194 | 0.0810590 | -57.72045 | 3 | 0 | 2.9433681 | 2.3652530 |
40 | 751.2982150 | 751.0003476 | 0.0045737 | 751.0003476 | -111.62920 | 3 | 0 | 0.3864617 | 0.3103533 |
41 | 0.2718601 | 0.0000000 | 0.1265550 | 0.0000000 | -106.87009 | 3 | 0 | 4.2184093 | 3.5708078 |
19 | 0.7098773 | 0.5744562 | 0.2251248 | 0.5744562 | -46.60165 | 3 | 0 | 1.6435563 | 1.2794793 |
69 | 13.9798056 | 13.6790037 | 0.0206780 | 13.6790037 | -183.43291 | 3 | 0 | 1.4139334 | 1.1710870 |
15 | 0.4921098 | 0.3099361 | 0.0594435 | 0.3099361 | -40.44875 | 3 | 0 | 4.5467409 | 3.8281874 |
35 | 0.3000241 | 0.0000011 | 0.1535128 | 0.0000011 | -86.15179 | 3 | 0 | 3.6106829 | 3.0313409 |
38 | 0.2098101 | 0.0000000 | 0.3934793 | 0.0000000 | -99.91669 | 3 | 0 | 2.0369305 | 1.5407041 |
23 | 0.1731033 | 0.0000000 | 0.8905452 | 0.0000000 | -60.36847 | 3 | 0 | 1.0261295 | 0.7388728 |
From the bootstrap analysis, we get
Putting this in a table:
dg <- rbind(df,
list(
b = b[[1]]$b,
mu_1 = b[[1]]$mu_1,
lambda_1 = b[[1]]$lambda_1,
mu_2 = b[[1]]$mu_2,
loglik = b[[1]]$loglik,
df = b[[1]]$df,
conv = b[[1]]$conv
),
list(
b = b[[3]]$b,
mu_1 = b[[3]]$mu_1,
lambda_1 = b[[3]]$lambda_1,
mu_2 = b[[3]]$mu_2,
loglik = b[[3]]$loglik,
df = b[[3]]$df,
conv = b[[3]]$conv
)
)
dg
## b mu_1 lambda_1 mu_2 loglik df conv
## ML 0.2817166 1.003351e-01 0.206350020 1.003351e-01 -37.12210 3 0
## true 0.3000000 1.000000e-01 0.200000000 1.000000e-01 -37.32468 NA NA
## 3 0.2817148 1.003351e-01 0.206351748 1.003351e-01 -37.12210 3 0
## 4 0.3757529 3.119773e-01 0.230678073 3.119773e-01 -21.59608 3 0
## 5 0.3121229 8.105904e-02 0.190419381 8.105904e-02 -57.72045 3 0
## 6 751.2982150 7.510003e+02 0.004573677 7.510003e+02 -111.62920 3 0
## 7 0.2718601 3.331886e-08 0.126555041 3.331886e-08 -106.87009 3 0
## 8 0.7098773 5.744562e-01 0.225124828 5.744562e-01 -46.60165 3 0
## 9 13.9798056 1.367900e+01 0.020678046 1.367900e+01 -183.43291 3 0
## 10 0.4921098 3.099361e-01 0.059443526 3.099361e-01 -40.44875 3 0
## 11 0.3000241 1.094309e-06 0.153512850 1.094309e-06 -86.15179 3 0
## 12 0.2098101 4.833861e-08 0.393479254 4.833861e-08 -99.91669 3 0
## 13 0.1731033 9.079251e-10 0.890545200 9.079251e-10 -60.36847 3 0
row.names(dg) <- c("ML", "true", "ML2", paste("BS", 1:endmc, sep = ""))
knitr::kable(dg)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | |
---|---|---|---|---|---|---|---|
ML | 0.2817166 | 0.1003351 | 0.2063500 | 0.1003351 | -37.12210 | 3 | 0 |
true | 0.3000000 | 0.1000000 | 0.2000000 | 0.1000000 | -37.32468 | NA | NA |
ML2 | 0.2817148 | 0.1003351 | 0.2063517 | 0.1003351 | -37.12210 | 3 | 0 |
BS1 | 0.3757529 | 0.3119773 | 0.2306781 | 0.3119773 | -21.59608 | 3 | 0 |
BS2 | 0.3121229 | 0.0810590 | 0.1904194 | 0.0810590 | -57.72045 | 3 | 0 |
BS3 | 751.2982150 | 751.0003476 | 0.0045737 | 751.0003476 | -111.62920 | 3 | 0 |
BS4 | 0.2718601 | 0.0000000 | 0.1265550 | 0.0000000 | -106.87009 | 3 | 0 |
BS5 | 0.7098773 | 0.5744562 | 0.2251248 | 0.5744562 | -46.60165 | 3 | 0 |
BS6 | 13.9798056 | 13.6790037 | 0.0206780 | 13.6790037 | -183.43291 | 3 | 0 |
BS7 | 0.4921098 | 0.3099361 | 0.0594435 | 0.3099361 | -40.44875 | 3 | 0 |
BS8 | 0.3000241 | 0.0000011 | 0.1535128 | 0.0000011 | -86.15179 | 3 | 0 |
BS9 | 0.2098101 | 0.0000000 | 0.3934793 | 0.0000000 | -99.91669 | 3 | 0 |
BS10 | 0.1731033 | 0.0000000 | 0.8905452 | 0.0000000 | -60.36847 | 3 | 0 |
We expect rows ML and ML2 to be identical. Their values are indeed very similar.
We can calculate the loglikelihood for
ml_b <- b[[1]]$b
ml_mu_1 <- b[[1]]$mu_1
ml_la_1 <- b[[1]]$lambda_1
ml_mu_2 <- b[[1]]$mu_2
ml_pars1 <- c(ml_b, ml_mu_1, ml_la_1, ml_mu_2)
ml_pars2 <- c(cond, btorph, soc, 0, "lsoda")
l <- pbd_loglik(
pars1 = ml_pars1,
pars2 = ml_pars2,
brts = brts
)
print(l)
## [1] -37.1221
# Create .md, .html, and .pdf files
setwd(paste(getwd(), "vignettes", sep = "/"))
knit("PBD_ML_demo.Rmd")
markdown::markdownToHTML('PBD_ML_demo.md', 'PBD_ML_demo.html', options=c("use_xhml"))
system("pandoc -s PBD_ML_demo.html -o PBD_ML_demo.pdf")
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.