Philipp Ottolinger
2016-04-04
In demography a Lexis diagram (named after economist and social scientist Wilhelm Lexis) is a two dimensional diagram that is used to represent events (such as births or deaths) that occur to individuals belonging to different cohorts. Calendar time is usually represented on the horizontal axis, while age is represented on the vertical axis. (https://en.wikipedia.org/wiki/Lexis_diagram)
LexisPlotR provides a couple of functions to draw Lexis Diagrams in R. Besides drawing empty Lexis grids, LexisPlotR can also highlight certain areas of the grid and add actual data to the Lexis Diagram.
A Lexis Diagram is determined by two measures: a range of years on the horizontal axis and a range of ages on the vertical axis. To plot an empty Lexis grid, use lexis.grid(), which takes these measures as numeric inputs:
library(LexisPlotR)
# Plot a Lexis grid from year 1900 to year 1905, representing the ages from 0 to 5
lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 5)
The aspect ratio of the axes is fixed to ensure right-angled triangles. So even non-square Lexis grids show right-angled triangles:
lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 7)
Sometimes it is useful to highlight certain areas in the Lexis Diagram, such as a certain age, year or cohort. To highlight an age, use lexis.age(), which draws a coloured rectangle covering all points in the grid that belong to that age group.
First, define an empty Lexis grid with the desired dimensions:
mylexis <- lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 5)
mylexis
You can now use lexis.age() to add a coloured layer to that Lexis grid:
# Highlight all points that belong to the age of 2
lexis.age(lg = mylexis, age = 2)
The default fill colour for lexis.age() is "yellow", but you can change the colour as well as the level of transparency:
# Change the fill colour to "red" and make the layer nearly non-transparent
lexis.age(lg = mylexis, age = 2, fill = "red", alpha = 0.9)
lexis.year(), which highlights a certain year, and lexis.cohort(), which does the same for a desired cohort, work in nearly the same way:
# Highlight the year 1902
lexis.year(lg = mylexis, year = 1902)
# Highlight the cohort 1898
lexis.cohort(lg = mylexis, cohort = 1898)
Again, fill colour and the level of transparency can be altered:
# Highlight the year 1902, change fill colour to "orange" and increase transparency
lexis.year(lg = mylexis, year = 1902, fill = "orange", alpha = 0.2)
# Highlight the cohort 1898, change fill colour to "grey" and decrease transparency
lexis.cohort(lg = mylexis, cohort = 1898, fill = "grey", alpha = 0.8)
To add more than one layer or to make changes permanent you have to overwrite your Lexis object:
mylexis <- lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 5)
mylexis <- lexis.age(lg = mylexis, age = 2)
mylexis <- lexis.year(lg = mylexis, year = 1903)
mylexis <- lexis.cohort(lg = mylexis, cohort = 1898)
mylexis
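The three kinds of highlight are linked by one identity: a point at calendar time t and exact age a belongs to the cohort born at t - a. The base R sketch below illustrates this relationship. Both `cohort_of()` and `cohort_age_range()` are hypothetical helpers written for this example; they are not part of LexisPlotR.

```r
# Hypothetical helper (not part of LexisPlotR): birth cohort of a point in
# the Lexis plane, given calendar time and exact age in decimal years.
cohort_of <- function(time, age) {
  floor(time - age)
}

# A person aged 3.8 in mid-1902 was born during 1898
cohort_of(time = 1902.5, age = 3.8)

# Hypothetical helper: the range of exact ages a cohort can have during one
# calendar year. The youngest members were born at the very end of the cohort
# year, the oldest at its very beginning.
cohort_age_range <- function(cohort, year) {
  c(youngest = year - (cohort + 1), oldest = (year + 1) - cohort)
}

# Cohort 1898 crosses the year 1902 between ages 3 and 5
cohort_age_range(cohort = 1898, year = 1902)
```

This is why a highlighted cohort appears as a diagonal band that spans two age rows within any single year.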
A life line is a simple tool to represent an individual's life in a Lexis Diagram. The life line is a straight line and starts with the individual's birth at the respective point on the horizontal axis. The line ends with an individual's death (if observed).
To draw an arbitrary life line into your Lexis Diagram, use lexis.lifeline() and provide at least an entry or birth date to the function. If death is not observed or the date of death is unknown, exit is NA, resulting in a never-ending life line.
# Define a Lexis grid
mylexis <- lexis.grid(year.start = 1990, year.end = 1995, age.start = 0, age.end = 5)
# Add a life line for an individual born on 1991-09-23
lexis.lifeline(lg = mylexis, entry = "1991-09-23")
If death or any other date that can serve as an "exit" is observed, you can add the exit date:
lexis.lifeline(lg = mylexis, entry = "1991-09-23", exit = "1994-06-11")
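To see where such a life line starts and ends in the diagram, the end points can be worked out by hand. The following base R sketch does this for the dates used above. `lifeline_coords()` is a hypothetical helper, not a LexisPlotR function, and it assumes that the entry date is the birth date (age 0) and that age in years can be approximated as days / 365.25.

```r
# Hypothetical helper (not part of LexisPlotR): end points of a life line.
# Assumptions: entry marks the birth (age 0); age in years = days / 365.25.
lifeline_coords <- function(entry, exit = NA) {
  entry <- as.Date(entry)
  exit  <- as.Date(exit)
  data.frame(
    x_start = entry,                             # date at which the line starts
    y_start = 0,                                 # age at entry
    x_end   = exit,                              # NA if the exit is not observed
    y_end   = as.numeric(exit - entry) / 365.25  # age at exit in years
  )
}

# Observed exit: the line ends at an age of roughly 2.7 years
lifeline_coords("1991-09-23", "1994-06-11")

# Unobserved exit: the end point is NA
lifeline_coords("1991-09-23")
```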
You can also use entry and exit dates from a data.frame, which is useful when plotting life lines of several individuals or whole populations. LexisPlotR comes with a random dataset of entry and exit dates for 300 individuals from 1895 to 1905. Some of the deaths (or exits) are not observed or unknown. Take a look at the lifelines_sample dataset:
data("lifelines_sample")
str(lifelines_sample)
## 'data.frame': 300 obs. of 2 variables:
## $ entry: Date, format: "1898-04-25" "1899-12-28" ...
## $ exit : Date, format: "1898-07-30" NA ...
head(lifelines_sample, 10)
##         entry       exit
## 1  1898-04-25 1898-07-30
## 2  1899-12-28       <NA>
## 3  1903-01-15       <NA>
## 4  1901-04-13       <NA>
## 5  1895-05-30 1900-03-29
## 6  1897-09-22       <NA>
## 7  1896-02-16 1896-04-24
## 8  1896-11-13 1902-10-30
## 9  1904-10-31       <NA>
## 10 1899-04-02 1902-04-11
To add all this data to your Lexis Diagram, use lexis.lifeline() and provide the respective columns of lifelines_sample as arguments:
mylexis <- lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 5)
lexis.lifeline(lg = mylexis, entry = lifelines_sample$entry, exit = lifelines_sample$exit)
Because this is random data, the plot is neither very informative nor easy to read. You can, however, change the default plotting behaviour: add marks to the line ends, and change the colour, width and transparency of the lines:
lexis.lifeline(lg = mylexis, entry = lifelines_sample$entry, exit = lifelines_sample$exit, lineends = TRUE, colour = "blue", lwd = 1.5, alpha = 0.3)
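Before plotting many life lines, it can help to check how many exits are unobserved and which life lines reach the calendar window of the grid at all. The sketch below uses only base R and the ten rows shown in the head() output above, so it runs without the package. The window check is a simplification that looks at calendar time only and ignores the age range of the grid.

```r
# First ten rows of lifelines_sample, copied from the head() output above
sample10 <- data.frame(
  entry = as.Date(c("1898-04-25", "1899-12-28", "1903-01-15", "1901-04-13",
                    "1895-05-30", "1897-09-22", "1896-02-16", "1896-11-13",
                    "1904-10-31", "1899-04-02")),
  exit  = as.Date(c("1898-07-30", NA, NA, NA, "1900-03-29",
                    NA, "1896-04-24", "1902-10-30", NA, "1902-04-11"))
)

# Share of life lines without an observed exit
mean(is.na(sample10$exit))

# Life lines that overlap the calendar window 1900 to 1905:
# they start before the window ends and do not end before it starts
in_window <- sample10$entry < as.Date("1905-01-01") &
  (is.na(sample10$exit) | sample10$exit >= as.Date("1900-01-01"))
sample10[in_window, ]
```

Rows 1 and 7 end before 1900 and are dropped by this check.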
The Human Mortality Database (HMD) contains original calculations of death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources. (http://www.mortality.org/Public/Overview.php)
To access data from the HMD, you first have to register; registration is free.
The data used here are the "Deaths by Lexis triangles" files, which you can download for a number of countries. These datafiles contain death counts for every combination of year, age and cohort, so every row represents one of the triangles in the Lexis Diagram. The function lexis.hmd() takes these death counts and fills the respective triangles according to a gradient scale.
First you have to download a "Deaths by Lexis triangles" file from the HMD. Alternatively, you can use the sample data (Deaths_lexis_sample.txt) that ships with LexisPlotR. This raw dataset contains random death counts but emulates the structure of the HMD datafiles.
To load and prepare the HMD data for further use, LexisPlotR ships with prepare.hmd(), which reads the raw data from the .txt file and adds the columns needed for plotting.
# Find the path to the sample data
path <- system.file("extdata", "Deaths_lexis_sample.txt", package = "LexisPlotR")
# read the raw data with prepare.hmd()
mydata <- prepare.hmd(path)
## Warning in prepare.hmd(path): NAs introduced by coercion
# Inspect your data
str(mydata)
## 'data.frame': 4400 obs. of 13 variables:
## $ Year : num 1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
## $ Age : num 0 0 1 1 2 2 3 3 4 4 ...
## $ Cohort: num 1970 1969 1969 1968 1968 ...
## $ Female: int 473 367 151 273 721 564 819 421 674 290 ...
## $ Male : int 137 55 889 184 640 379 5 52 67 784 ...
## $ Total : int 610 422 1040 457 1361 943 824 473 741 1074 ...
## $ upper : logi FALSE TRUE FALSE TRUE FALSE TRUE ...
## $ x1 : Date, format: "1970-01-01" "1970-01-01" ...
## $ x2 : Date, format: "1971-01-01" "1970-01-01" ...
## $ x3 : Date, format: "1971-01-01" "1971-01-01" ...
## $ y1 : num 0 0 1 1 2 2 3 3 4 4 ...
## $ y2 : num 0 1 1 2 2 3 3 4 4 5 ...
## $ y3 : num 1 1 2 2 3 3 4 4 5 5 ...
summary(mydata[,c("Year", "Age", "Cohort")])
##       Year           Age             Cohort
##  Min.   :1970   Min.   :  0.0   Min.   :1860
##  1st Qu.:1975   1st Qu.: 27.0   1st Qu.:1897
##  Median :1980   Median : 54.5   Median :1924
##  Mean   :1980   Mean   : 54.5   Mean   :1924
##  3rd Qu.:1984   3rd Qu.: 82.0   3rd Qu.:1952
##  Max.   :1989   Max.   :109.0   Max.   :1989
As the summary() output shows, this datafile contains death counts from 1970 to 1989 for the ages 0 to 109. These death counts can be plotted in a Lexis grid with suitable dimensions. You may choose whether to plot total death counts or death counts for females or males.
mylexis <- lexis.grid(year.start = 1980, year.end = 1985, age.start = 0, age.end = 5)
# Plot total death counts
lexis.hmd(lg = mylexis, hmd.data = mydata, column = "Total")
Again, this is just random data. For real insights use data from the HMD.
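The upper, x1 to x3 and y1 to y3 columns in the str() output above suggest how each row is mapped to a triangle. The base R sketch below reproduces that mapping for the first two rows. It is inferred from the printed output only and is not the code of prepare.hmd(); `triangle_vertices()` is a hypothetical helper, and it uses numeric years on the horizontal axis where the prepared data use dates.

```r
# Hypothetical helper (not part of LexisPlotR): vertices of the Lexis
# triangle for each row, inferred from the str() output above.
triangle_vertices <- function(year, age, cohort) {
  # Lower triangle: death after the birthday in `year`  (cohort == year - age)
  # Upper triangle: death before the birthday in `year` (cohort == year - age - 1)
  upper <- cohort == year - age - 1
  data.frame(
    upper = upper,
    x1 = year,                          y1 = age,
    x2 = ifelse(upper, year, year + 1), y2 = ifelse(upper, age + 1, age),
    x3 = year + 1,                      y3 = age + 1
  )
}

# First two rows of the sample data: year 1970, age 0, cohorts 1970 and 1969
triangle_vertices(year = c(1970, 1970), age = c(0, 0), cohort = c(1970, 1969))
```

Both triangles share the diagonal from (year, age) to (year + 1, age + 1), which is the line separating the two cohorts.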
The HMD datafiles provide three death counts: Total, Female and Male. If you want to plot the share of male deaths in total deaths, you first have to add a corresponding column:
mydata$ratioMales <- mydata$Male / mydata$Total
lexis.hmd(lg = mylexis, hmd.data = mydata, column = "ratioMales")
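One caveat when deriving ratios: a triangle with zero total deaths gives 0 / 0, which R evaluates to NaN. The base R sketch below shows one possible way to make such cases explicit. The first three rows are copied from the str() output above; the fourth row is invented for this example to trigger the edge case. How lexis.hmd() displays missing values is not covered here.

```r
# Three rows copied from the str() output above, plus one invented row
# with zero deaths to demonstrate the edge case
deaths <- data.frame(
  Male  = c(137, 55, 889, 0),
  Total = c(610, 422, 1040, 0)
)

# Plain division yields NaN where Total is 0
deaths$Male / deaths$Total

# Set the ratio to NA explicitly when there are no deaths at all
deaths$ratioMales <- ifelse(deaths$Total > 0, deaths$Male / deaths$Total, NA)
deaths
```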
LexisPlotR is simply a specialised wrapper for ggplot2. Therefore you can edit the appearance of your Lexis Diagram by adding labs() and theme() layers, just like with any other ggplot2 plot.
# Attach ggplot2 so that labs() and theme() are available
library(ggplot2)
mylexis <- lexis.grid(year.start = 1900, year.end = 1905, age.start = 0, age.end = 5)
# Add a title
mylexis <- mylexis + labs(title = "LexisPlotR")
mylexis
# Make the axis titles bold and red
mylexis <- mylexis + theme(axis.title = element_text(face = "bold", colour = "red"))
mylexis