Growth curves are commonly used in a variety of microbial experiments, including experimental evolution. The data are typically obtained by repeatedly measuring the cell density, or some other representative measure like absorbance. Modern microbial growth curves can be conducted in a plate reader and may result in hundreds of absorbance measurements over the course of 24 hours.
In the Growthcurver
package, we fit growth curve data to a standard form of the logistic equation common in ecology and evolution whose parameters (the growth rate, the initial population size, and the carrying capacity) provide meaningful population-level information with straight-forward biological interpretation. The logistic equation describes the population size \(N_t\) at time \(t\) using:
\[ \label{nt} N_t = \frac{K}{1 + \left( \frac{K-N_0}{N_0} \right) e^{-rt}} \ \]
Here, the population size at the beginning of the growth curve is given by \(N_0\). The maximum possible population size in a particular environment, or the carrying capacity, is given by \(K\). The intrinsic growth rate of the population, \(r\), is the growth rate that would occur if there were no restrictions imposed on total population size. Growthcurver
finds the best values of \(K\), \(r\), and \(N_0\) for the growth curve data.
We have provided simulated sample data in the simplest format for Growthcurver
. The sample data that we’ve provided has one column for the time (in hours), and one column for each well in a 96-well plate. Each row contains the OD600 reading for each well at a given time, and the rows are sorted by time.
Preparing your data in this format will allow you to easily adapt the example simple code and the sample pipeline code (detailed in A simple first example and How to summarize growth curves for a plate) to your data.
Below, we show the first few rows and columns of the sample data.
time | A1 | B1 | C1 | D1 | E1 | F1 | G1 |
---|---|---|---|---|---|---|---|
0.0000000 | 0.0534859 | 0.0450445 | 0.0524558 | 0.0559717 | 0.0555779 | 0.0482403 | 0.0480286 |
0.1666667 | 0.0480034 | 0.0438943 | 0.0513760 | 0.0535398 | 0.0498355 | 0.0545188 | 0.0508498 |
0.3333333 | 0.0558745 | 0.0499971 | 0.0471483 | 0.0490218 | 0.0547812 | 0.0508200 | 0.0461799 |
0.5000000 | 0.0513175 | 0.0521323 | 0.0481636 | 0.0455630 | 0.0461267 | 0.0552204 | 0.0459967 |
0.6666667 | 0.0451672 | 0.0471601 | 0.0474251 | 0.0515392 | 0.0528446 | 0.0529912 | 0.0462149 |
0.8333333 | 0.0529321 | 0.0493626 | 0.0512493 | 0.0487991 | 0.0466157 | 0.0467418 | 0.0506130 |
1.0000000 | 0.0486773 | 0.0511002 | 0.0485762 | 0.0525684 | 0.0450770 | 0.0504815 | 0.0551161 |
1.1666667 | 0.0494391 | 0.0493021 | 0.0488559 | 0.0451704 | 0.0496254 | 0.0512137 | 0.0464252 |
1.3333333 | 0.0458304 | 0.0530621 | 0.0472789 | 0.0417502 | 0.0471547 | 0.0530060 | 0.0462033 |
1.5000000 | 0.0518993 | 0.0519433 | 0.0471127 | 0.0474941 | 0.0505342 | 0.0550318 | 0.0539161 |
Your data will probably come from the plate reader as an Excel spreadsheet. Convert the data in your spreadsheet so that the wells and time are the column headers and the data are the rows as shown above (Excel’s Paste Special | Transpose checkbox is useful for this). Save your growth curve data file as a tab-separated txt
file, and then read that file into R.
# Replace the next line with the location and name of your input data file.
file_name <- "the/path/to/my/data/myfilename.txt"
d <- read.table(file_name, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
With the exception of the column containing the time data, the names of the columns for Growthcurver
are unimportant. However, if you plan to adapt the pipeline code in How to summarize growth curves for a plate, then the column containing the time must be named time
, and the remaining columns must have a unique name that will be eventually be identified as the sample name.
If you don’t have a column named time
, you can create easily create it.
# Here, I assume that your time data are stored in a column called "hours" and
# you don't have a column named "time". You can copy the data from the "hours"
# column to the "time" column as follows.
d$time <- d$hours
In this example, time is reported in units of hours. This means that all metrics involving time for these data are reported by Growthcurver
in hours (e.g., \(r\) is in hours\(^{-1}\)). If your data are in a different time unit than you would like, the following code may help you adjust the data in your time column.
# Convert the "time" column from hours to minutes
d$time <- d$time * 60
# Convert the "time" column from minutes to seconds
d$time <- d$time * 60
# Convert the "time" column from seconds to hours
d$time <- d$time * 60 * 60
You will also need to do a background correction to remove the absorbance signal from the media. There are several approaches for doing the background correction, and I’ve implemented the simplest in the sample code in the A simple first example and How to summarize growth curves for a plate sections. The simple method finds the minimum value from each well, and subtracts it from all timepoints (for that well only). This method works if you do not expect your background readings to change over time. Sometimes the background readings may change over time, for example when the media precipitates over time. In this case, you can subtract the average background reading obtained from a set of blank wells at the same timepoint that the measurements were made.
# How to do a background correction by averaging several "blank" wells
# (for the sake of this example, let's assume that wells B2, D8, and G11
# are blanks even though they actually hold growth curve data).
# If you want to use this method for background correction,
# replace my well names with yours.
# First, get the "blank" values over time by averaging the "blank" columns at
# each timepoint at which measurements were made.
d$blank <- apply(d[, c("B2", "D8", "G11")], 1, mean)
# Next, subtract the blank value from the measurements for a column.
d$A1 <- d$A1 - d$blank
If you want to use this advanced background correction approach, replace the background correction code I provide in the examples in the A simple first example and How to summarize growth curves for a plate sections with these two lines. Be sure to replace “B2”, “D8”, and “G11” with the column names of the blanks on your plates.
In this example, we will use Growthcurver
to summarize a single growth curve.
# First, load the package and the dataset.
library(growthcurver)
# Load the sample growth curve data provided in the Growthcurver package.
# The first column is the time in hours, and there is one column
# for each well in a 96-well plate.
d <- growthdata
Let’s summarize the growth curve data in well A1 using Growthcurver
. We’ll first need to do a background correction to remove the absorbance signal due to media. Then we can run Growthcurver
and look at the results.
# First, we'll do a background correction by subtracting the minimum
# value from the well during the growth curve from all the timepoints.
# You could replace the following two lines with your favorite background
# correction method (one is given in the "Format of the input data" section).
min_value <- min(d$A1)
d$A1 <- d$A1 - min_value
# Now, we'll use Growthcurver to summarize the growth curve data using the
# simple background correction method (minimum value correction).
# This returns an object of type "gcfit" that holds information about
# the best parameters, the model fit, and additional metrics summarizing
# the growth curve.
gc_fit <- SummarizeGrowth(d$time, d$A1)
# It is easy to get the most useful metrics from a gcfit object
gc_fit
## Fit data to K / (1 + ((K - N0) / N0) * exp(-r * t)):
## K N0 r
## val: 0.336 0 1.119
## Residual standard error: 0.004685978 on 142 degrees of freedom
##
## Other useful metrics:
## DT 1 / DT auc_l auc_e
## 0.62 1.6e+00 5.11 5.15
# And it is easy to plot the raw data and the best fit logistic curve
plot(gc_fit)
The object returned from SummarizeGrowth
contains the metrics, and you can view them easily (as you did above), or you can manipulate them with R commands (as we will do now).
# The gcfit object returned from SummarizeGrowth also contains further metrics
# summarizing the growth curve data.
gc_fit$vals
# look at the structure of the gc_fit object
str(gc_fit)
You’ll see that there are three main parts of a gcfit
object: the vals
, the model
, and the data
. vals
contains the summarized metrics for your growth curver, model
contains the details of the model fit (for advanced users), and data
contains the input data to SummarizeGrowth
. For most purposes, vals
is the most useful. Let’s see what else we can access in the vals
.
# To see all the available metrics
str(gc_fit$vals)
## List of 15
## $ k : num 0.336
## $ k_se : num 0.000552
## $ k_p : num 1.65e-244
## $ n0 : num 1.82e-05
## $ n0_se: num 2.42e-06
## $ n0_p : num 5.59e-12
## $ r : num 1.12
## $ r_se : num 0.0151
## $ r_p : num 3.57e-115
## $ sigma: num 0.00469
## $ df : num 142
## $ t_mid: num 8.78
## $ t_gen: num 0.62
## $ auc_l: num 5.11
## $ auc_e: num 5.15
## - attr(*, "class")= chr "gcvals"
# To access a single metric (for example the residual sum of squares
# from the fit of the model to the data)
gc_fit$vals$sigma
## [1] 0.004685978
The most useful values are k
, n0
, and r
, which are the values of the parameters for the logistic equation that best fit the data. The fitting algorithm provides a measure of uncertainty for each, which is available (for n
) in the n_p
and n_se
values, for example. The values sigma
and df
are both determined during the nonlinear regression fit. Df
is the degrees of freedom and sigma
is a measure of the goodnesss of fit of the parameters of the logistic equation for the data; it is the residual sum of squares from the nonlinear regression model. Smaller sigma
values indicate a better fit of the logistic curve to the data than larger values. t_mid
is the time at which the population density reaches \(\frac{1}{2}K\) (which occurs at the inflection point), t_gen
is the fastest possible generation time (also called the doubling time), auc_l
is the area under the logistic curve obtained by taking the integral of the logistic equation, and auc_e
is the empirical area under the curve which is obtained by summing up the area under the experimental curve from the measurements in the input data.
One often measures growth curves in a plate reader for 96 wells at the same time. Following is some sample R code that uses Growthcurver
to summarize the growth curve data for a whole plate. To adapt this to your data, just ensure that your data are in the same format as the example data (discussed previously in the Format of the input data section). Most importantly, this code assumes that the time information is stored in a column named time
.
Here, we will first create a data frame in which to store the output data. We will then loop through the columns in the experimental growth curve data, call SummarizeGrowth
for each, and store the metrics for each column in the output data frame.
# As in the simple example, load the package and the data.
library(growthcurver)
d <- growthdata
# Let's create an output data frame to store the results in.
# We'll create it so that it is the right size (it's faster this way!),
# but leave it empty.
num_analyses <- length(names(d)) - 1
d_gc <- data.frame(sample = character(num_analyses),
k = numeric(num_analyses),
n0 = numeric(num_analyses),
r = numeric(num_analyses),
t_mid = numeric(num_analyses),
t_gen = numeric(num_analyses),
auc_l = numeric(num_analyses),
auc_e = numeric(num_analyses),
sigma = numeric(num_analyses),
stringsAsFactors = FALSE)
# Truncate or trim the input data to observations occuring in the first 20 hours.
# Remember that the times in these sample data are reported in hours. To use
# minutes (or to trim at a different time), change the next line of code.
# For example, if you still would like to trim at 20 hours, but your time data
# are reported in minutes use: trim_at_time <- 20 * 60
trim_at_time <- 20
# Now, loop through all of the columns in the data frame. For each column,
# run Growthcurver, save the most useful metrics in the output data frame,
# and make a plot of all the growth curve data and their best fits.
# First, create a plot for each of the wells in the 96-well plate.
# Uncomment the next line to save the plots from your 96-well plate to a
# pdf file in the working directory.
# pdf("growthcurver.pdf", height = 8.5, width = 11)
par(mfcol = c(8,12))
par(mar = c(0.25,0.25,0.25,0.25))
y_lim_max <- max(d[,setdiff(names(d), "time")]) - min(d[,setdiff(names(d), "time")])
n <- 1 # keeps track of the current row in the output data frame
for (col_name in names(d)) {
# Don't process the column called "time". It contains time and not OD600 data.
if (col_name != "time") {
# Create a temporary data frame that contains just the time and current col
d_loop <- d[, c("time", col_name)]
# Do the background correction.
# You could replace the following two lines with your favorite background
# correction method (one is given in the "Format of the input data" section).
min_value <- min(d_loop[, col_name])
d_loop[, col_name] <- d_loop[, col_name] - min_value
# Now, call Growthcurver to calculate the metrics using SummarizeGrowth
gc_fit <- SummarizeGrowth(data_t = d_loop[, "time"],
data_n = d_loop[, col_name],
t_trim = trim_at_time)
# Now, add the metrics from this column to the next row (n) in the
# output data frame, and increment the row counter (n)
d_gc$sample[n] <- col_name
d_gc[n, 2:9] <- c(gc_fit$vals$k,
gc_fit$vals$n0,
gc_fit$vals$r,
gc_fit$vals$t_mid,
gc_fit$vals$t_gen,
gc_fit$vals$auc_l,
gc_fit$vals$auc_e,
gc_fit$vals$sigma)
n <- n + 1
# Finally, plot the raw data and the fitted curve
# Here, I'll just print some of the data points to keep the file size smaller
n_obs <- length(gc_fit$data$t)
idx_to_plot <- 1:20 / 20 * n_obs
plot(gc_fit$data$t[idx_to_plot], gc_fit$data$N[idx_to_plot],
pch = 20,
xlim = c(0, trim_at_time),
ylim = c(0, y_lim_max),
cex = 0.6, xaxt = "n", yaxt = "n")
text(x = trim_at_time / 4, y = y_lim_max, labels = col_name, pos = 1)
lines(gc_fit$data$t, predict(gc_fit$model), col = "red")
}
}
# Uncomment the next line to save the plots from your 96-well plate to a file
# dev.off()
After running the above code, the summary metrics are available for each well (each column in the input data corresponds to a single well). Each column of OD600 data is summarized, and the summary is a row in the output data frame that we created.
# Look at the first few rows (samples) of data in the output data frame.
# (I'm only showing the first 4 rows of results, but you may want to see more.
# You can either look at everything using the command "d_gc", or adjust the
# number of rows displayed by changing the 4 to something else,
# e.g., "d_gc[1:15,]").
d_gc[1:4, ]
## sample k n0 r t_mid t_gen auc_l auc_e sigma
## 1 A1 0.33586 2e-05 1.11875 8.77932 0.61957 3.76854 3.75229 0.00488
## 2 B1 0.40389 2e-05 1.02383 9.94781 0.67701 4.06000 4.03270 0.00470
## 3 C1 0.37075 2e-05 0.98563 10.09205 0.70325 3.67337 3.64760 0.00463
## 4 D1 0.38262 2e-05 1.02168 9.64024 0.67844 3.96389 3.95078 0.00572
Sometimes Growthcurver
doesn’t find the best fit for your data, especially if the population didn’t reach stationary phase during your data collection. To see if this is the case, we strongly recommend plotting all the curves and checking them manually, or looking for outliers that have unusually large sigma
values. Each sigma value is the residual sum of squares from the fit of the logistic curve to the data. To simplify looking at the sigma values, I use the package dplyr
for data wrangling and exploration.
# Load dplyr and the sample data
library(dplyr)
d_gc <- as_data_frame(d_gc)
# Plot a histogram of the sigma values in order to check for outliers
hist(d_gc$sigma, main = "Histogram of sigma values", xlab = "sigma")
# Show the top 5 samples with the largest sigma value
# (with the worst model fit to the growth curve data)
d_gc %>% top_n(5, sigma) %>% arrange(desc(sigma))
## Source: local data frame [5 x 9]
##
## sample k n0 r t_mid t_gen auc_l auc_e sigma
## (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1 E12 0.35551 4e-05 0.85970 10.72706 0.80627 3.29671 3.31902 0.00813
## 2 B11 0.57500 2e-05 0.93970 11.12467 0.73763 5.10347 5.07103 0.00614
## 3 D8 0.52894 2e-05 1.27314 8.02597 0.54444 6.33354 6.29540 0.00588
## 4 G12 0.43505 2e-05 0.77377 12.65278 0.89581 3.19832 3.18598 0.00574
## 5 D1 0.38262 2e-05 1.02168 9.64024 0.67844 3.96389 3.95078 0.00572
We simulated this dataset, so it is not very noisy. Therefore, it is not surprising that there aren’t any extreme sigma
outliers in this case.