The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

pbd_ML demo

Richel Bilderbeek & Rampal S. Etienne

2017-05-04

This document gives a demonstration how to use the package to obtain a maximum-likelihood estimate of the protracted birth-death speciation model.

First thing is to load the PBD package itself:

rm(list = ls())
library(PBD)

We will also need ape for branching.times:

library(ape)

Here we simulate a tree with known parameters:

seed <- 43
set.seed(seed)
b_1 <- 0.3 # speciation-initiation rate of good species
la_1 <- 0.2 # speciation-completion rate
b_2 <- b_1 # the speciation-initiation rate of incipient species
mu_1 <- 0.1 #  extinction rate of good species
mu_2 <- mu_1 # extinction rate of incipient species 
pars <- c(b_1, la_1, b_2, mu_1, mu_2)
age <- 15 # the age for the simulation 
phylogenies <- pbd_sim(pars = pars, age = age)
plot(phylogenies$recontree)

plot(phylogenies$igtree.extant)
## no colors provided. using the following legend:
##       g       i 
## "black"   "red"

plot(phylogenies$tree)

names(phylogenies)
##  [1] "tree"           "stree_random"   "stree_oldest"   "stree_youngest"
##  [5] "L"              "sL_random"      "sL_oldest"      "sL_youngest"   
##  [9] "igtree.extinct" "igtree.extant"  "recontree"      "reconL"        
## [13] "L0"

Now we try to recover the parameters by maximum likelihood estimation:

brts <- branching.times(phylogenies$recontree)  # branching times
init_b <- 0.2  # speciation-initiation rate
init_mu_1 <- 0.05  # extinction rate of good species
init_la_1 <- 0.3 # speciation-completion rate
#init_mu_2 <- 0.05  # extinction rate of incipient species  # not used

# The initial values of the parameters that must be optimized
initparsopt <- c(init_b, init_mu_1, init_la_1)

# The extinction rates between incipient and good species are equal
exteq <- TRUE

# The first element of the branching times is the crown age (and not the stem age)
soc <- 2

# Conditioning on non-extinction of the phylogeny
# as I actively selected for a nice phylogeny
cond <- 1

# Give the likelihood of the phylogeny (instead of the likelihood of the branching times)
btorph <- 1

Maximum likelihood estimation can now be performed:

r <- pbd_ML(
  brts = brts,
  initparsopt = initparsopt, 
  exteq = exteq,
  soc = soc, 
  cond = cond,
  btorph = btorph,
  verbose = FALSE
)

The ML parameter estimates are:

knitr::kable(r)
b mu_1 lambda_1 mu_2 loglik df conv
0.2817166 0.1003351 0.20635 0.1003351 -37.1221 3 0

Comparing the known true value with the recovered values:

loglik_true <- PBD::pbd_loglik(pars, brts = brts)
## Parameters: 0.3, 0.2, 0.3, 0.1, 0.1, Loglikelihood: -37.324677
df <- as.data.frame(r)
df <- rbind(df, c(b_1, mu_1, la_1, mu_2, loglik_true, NA, NA))
row.names(df) <- c("ML", "true")
knitr::kable(df)
b mu_1 lambda_1 mu_2 loglik df conv
ML 0.2817166 0.1003351 0.20635 0.1003351 -37.12210 3 0
true 0.3000000 0.1000000 0.20000 0.1000000 -37.32468 NA NA

Ideally, all parameter columns should have the same values.

To test for the uncertainty of our ML estimate, we can do a parametric bootstrap.

The function pbd_bootstrap consists of a few steps:

  1. Do a ML estimate
  2. Run a simulation with those estimates
  3. Perform ML estimation on the simulated data
  4. Go to 2 depending on the setting of endmc.
endmc <- 10 # Sets the number of simulations for the bootstrap

b <- pbd_bootstrap(
  brts = brts,
  initparsopt = initparsopt, 
  exteq = exteq,
  soc = soc, 
  cond = cond,
  btorph = btorph,
  plotltt = FALSE,
  endmc = endmc,
  seed = seed
)
## Finding the maximum likelihood estimates ...
## 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -37.25841 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.281715, mu_1: 0.100335, lambda_1: 0.206352, mu_2: 0.100335 
##  Maximum loglikelihood: -37.122103 
##  The expected duration of speciation for these parameters is: 2.807343 
##  The median duration of speciation for these parameters is: 2.206200 
## Bootstrapping ...
## 
## Simulated data set 1 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -22.03909 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.375753, mu_1: 0.311977, lambda_1: 0.230678, mu_2: 0.311977 
##  Maximum loglikelihood: -21.596081 
##  The expected duration of speciation for these parameters is: 2.030928 
##  The median duration of speciation for these parameters is: 1.543204 
## Simulated data set 2 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -57.99061 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.312123, mu_1: 0.081059, lambda_1: 0.190419, mu_2: 0.081059 
##  Maximum loglikelihood: -57.720450 
##  The expected duration of speciation for these parameters is: 2.943368 
##  The median duration of speciation for these parameters is: 2.365253 
## Simulated data set 3 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -115.261 
## Optimizing the likelihood - this may take a while. 
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## 
##  Maximum likelihood parameter estimates: b: 751.298215, mu_1: 751.000348, lambda_1: 0.004574, mu_2: 751.000348 
##  Maximum loglikelihood: -111.629201 
##  The expected duration of speciation for these parameters is: 0.386462 
##  The median duration of speciation for these parameters is: 0.310353 
## Simulated data set 4 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -107.9402 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.271860, mu_1: 0.000000, lambda_1: 0.126555, mu_2: 0.000000 
##  Maximum loglikelihood: -106.870092 
##  The expected duration of speciation for these parameters is: 4.218409 
##  The median duration of speciation for these parameters is: 3.570808 
## Simulated data set 5 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -47.84195 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.709877, mu_1: 0.574456, lambda_1: 0.225125, mu_2: 0.574456 
##  Maximum loglikelihood: -46.601652 
##  The expected duration of speciation for these parameters is: 1.643556 
##  The median duration of speciation for these parameters is: 1.279479 
## Simulated data set 6 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -187.144 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 13.979806, mu_1: 13.679004, lambda_1: 0.020678, mu_2: 13.679004 
##  Maximum loglikelihood: -183.432908 
##  The expected duration of speciation for these parameters is: 1.413933 
##  The median duration of speciation for these parameters is: 1.171087 
## Simulated data set 7 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -41.38109 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.492110, mu_1: 0.309936, lambda_1: 0.059444, mu_2: 0.309936 
##  Maximum loglikelihood: -40.448749 
##  The expected duration of speciation for these parameters is: 4.546741 
##  The median duration of speciation for these parameters is: 3.828187 
## Simulated data set 8 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -87.55681 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.300024, mu_1: 0.000001, lambda_1: 0.153513, mu_2: 0.000001 
##  Maximum loglikelihood: -86.151794 
##  The expected duration of speciation for these parameters is: 3.610683 
##  The median duration of speciation for these parameters is: 3.031341 
## Simulated data set 9 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -100.522 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.209810, mu_1: 0.000000, lambda_1: 0.393479, mu_2: 0.000000 
##  Maximum loglikelihood: -99.916686 
##  The expected duration of speciation for these parameters is: 2.036930 
##  The median duration of speciation for these parameters is: 1.540704 
## Simulated data set 10 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -61.11511 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.173103, mu_1: 0.000000, lambda_1: 0.890545, mu_2: 0.000000 
##  Maximum loglikelihood: -60.368468 
##  The expected duration of speciation for these parameters is: 1.026129 
##  The median duration of speciation for these parameters is: 0.738873
knitr::kable(b[[3]])
ntips b mu_1 lambda_1 mu_2 loglik df conv exp_durspec median_durspec
9 0.3757529 0.3119773 0.2306781 0.3119773 -21.59608 3 0 2.0309277 1.5432035
23 0.3121229 0.0810590 0.1904194 0.0810590 -57.72045 3 0 2.9433681 2.3652530
40 751.2982150 751.0003476 0.0045737 751.0003476 -111.62920 3 0 0.3864617 0.3103533
41 0.2718601 0.0000000 0.1265550 0.0000000 -106.87009 3 0 4.2184093 3.5708078
19 0.7098773 0.5744562 0.2251248 0.5744562 -46.60165 3 0 1.6435563 1.2794793
69 13.9798056 13.6790037 0.0206780 13.6790037 -183.43291 3 0 1.4139334 1.1710870
15 0.4921098 0.3099361 0.0594435 0.3099361 -40.44875 3 0 4.5467409 3.8281874
35 0.3000241 0.0000011 0.1535128 0.0000011 -86.15179 3 0 3.6106829 3.0313409
38 0.2098101 0.0000000 0.3934793 0.0000000 -99.91669 3 0 2.0369305 1.5407041
23 0.1731033 0.0000000 0.8905452 0.0000000 -60.36847 3 0 1.0261295 0.7388728

From the bootstrap analysis, we get

Putting this in a table:

dg <- rbind(df, 
  list(
    b = b[[1]]$b, 
    mu_1 = b[[1]]$mu_1, 
    lambda_1 = b[[1]]$lambda_1, 
    mu_2 = b[[1]]$mu_2,
    loglik = b[[1]]$loglik,
    df = b[[1]]$df,
    conv = b[[1]]$conv
  ),
  list(
    b = b[[3]]$b, 
    mu_1 = b[[3]]$mu_1, 
    lambda_1 = b[[3]]$lambda_1, 
    mu_2 = b[[3]]$mu_2,
    loglik = b[[3]]$loglik,
    df = b[[3]]$df,
    conv = b[[3]]$conv
  )
)
dg
##                b         mu_1    lambda_1         mu_2     loglik df conv
## ML     0.2817166 1.003351e-01 0.206350020 1.003351e-01  -37.12210  3    0
## true   0.3000000 1.000000e-01 0.200000000 1.000000e-01  -37.32468 NA   NA
## 3      0.2817148 1.003351e-01 0.206351748 1.003351e-01  -37.12210  3    0
## 4      0.3757529 3.119773e-01 0.230678073 3.119773e-01  -21.59608  3    0
## 5      0.3121229 8.105904e-02 0.190419381 8.105904e-02  -57.72045  3    0
## 6    751.2982150 7.510003e+02 0.004573677 7.510003e+02 -111.62920  3    0
## 7      0.2718601 3.331886e-08 0.126555041 3.331886e-08 -106.87009  3    0
## 8      0.7098773 5.744562e-01 0.225124828 5.744562e-01  -46.60165  3    0
## 9     13.9798056 1.367900e+01 0.020678046 1.367900e+01 -183.43291  3    0
## 10     0.4921098 3.099361e-01 0.059443526 3.099361e-01  -40.44875  3    0
## 11     0.3000241 1.094309e-06 0.153512850 1.094309e-06  -86.15179  3    0
## 12     0.2098101 4.833861e-08 0.393479254 4.833861e-08  -99.91669  3    0
## 13     0.1731033 9.079251e-10 0.890545200 9.079251e-10  -60.36847  3    0
row.names(dg) <- c("ML", "true", "ML2", paste("BS", 1:endmc, sep = ""))
knitr::kable(dg)
b mu_1 lambda_1 mu_2 loglik df conv
ML 0.2817166 0.1003351 0.2063500 0.1003351 -37.12210 3 0
true 0.3000000 0.1000000 0.2000000 0.1000000 -37.32468 NA NA
ML2 0.2817148 0.1003351 0.2063517 0.1003351 -37.12210 3 0
BS1 0.3757529 0.3119773 0.2306781 0.3119773 -21.59608 3 0
BS2 0.3121229 0.0810590 0.1904194 0.0810590 -57.72045 3 0
BS3 751.2982150 751.0003476 0.0045737 751.0003476 -111.62920 3 0
BS4 0.2718601 0.0000000 0.1265550 0.0000000 -106.87009 3 0
BS5 0.7098773 0.5744562 0.2251248 0.5744562 -46.60165 3 0
BS6 13.9798056 13.6790037 0.0206780 13.6790037 -183.43291 3 0
BS7 0.4921098 0.3099361 0.0594435 0.3099361 -40.44875 3 0
BS8 0.3000241 0.0000011 0.1535128 0.0000011 -86.15179 3 0
BS9 0.2098101 0.0000000 0.3934793 0.0000000 -99.91669 3 0
BS10 0.1731033 0.0000000 0.8905452 0.0000000 -60.36847 3 0

We expect rows ML and ML2 to be identical. Their values are indeed very similar.

We can calculate the loglikelihood for

ml_b <- b[[1]]$b
ml_mu_1 <- b[[1]]$mu_1
ml_la_1 <- b[[1]]$lambda_1
ml_mu_2 <- b[[1]]$mu_2
ml_pars1 <- c(ml_b, ml_mu_1, ml_la_1, ml_mu_2)
ml_pars2 <- c(cond, btorph, soc, 0, "lsoda")

l <- pbd_loglik(
  pars1 = ml_pars1,
  pars2 = ml_pars2,
  brts = brts
)
print(l)
## [1] -37.1221
# Create .md, .html, and .pdf files
setwd(paste(getwd(), "vignettes", sep = "/"))
knit("PBD_ML_demo.Rmd")
markdown::markdownToHTML('PBD_ML_demo.md', 'PBD_ML_demo.html', options=c("use_xhml"))
system("pandoc -s PBD_ML_demo.html -o PBD_ML_demo.pdf")

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.