Summary functions

summarize

sum_up prints detailed summary statistics (it corresponds to the Stata command summarize).

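library(dplyr)
library(statar)

N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
# summary statistics for every variable
sum_up(df)
# detailed statistics for the variables whose name starts with "v"
df %>% sum_up(starts_with("v"), d = TRUE)
# statistics within each group of v1
df %>% group_by(v1) %>% sum_up()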

tabulate

tab prints distinct rows with their count. Compared to the dplyr function count, this command only adds the frequency and the cumulative frequency.

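N <- 1e2
df <- tibble(
  id = sample(5, N, TRUE),
  v1 = sample(5, N, TRUE)
)
# counts for each distinct (id, v1) pair
tab(df, id, v1)
# same, dropping rows with missing values
tab(df, id, v1, na.rm = TRUE)
# counts of v1 within each group of id
df %>% group_by(id) %>% tab(v1)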
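The relationship to count can be sketched in plain dplyr. This is an illustration only, not the implementation of tab: the column names freq and cum_freq are hypothetical, and frequencies are assumed here to mean shares of the total row count.

```r
library(dplyr)

set.seed(1)  # fixed seed so the sketch is reproducible
N <- 100
df <- tibble(
  id = sample(5, N, TRUE),
  v1 = sample(5, N, TRUE)
)

# count() returns one row per distinct (id, v1) pair with its count n.
# The two mutate() columns are hypothetical names for the extra output.
counts <- df %>%
  count(id, v1) %>%
  mutate(
    freq     = n / sum(n),   # share of all rows (assumed definition)
    cum_freq = cumsum(freq)  # running share, in the row order of count()
  )
print(counts)
```

The cumulative column depends on row order, so it is computed after count() has sorted the rows by id and v1.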

Join

join is a wrapper for the dplyr join functions, with two added options:

- The option check verifies the type of the merge, for example m~1 for a many-to-one merge (as in Stata).

# merge m:1 v1
join(x, y, kind = "full", check = m~1)

- The option gen specifies the name of a new variable that identifies non matched and matched rows (as in Stata).

# merge m:1 v1, gen(_merge)
join(x, y, kind = "full", gen = "_merge")
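What a many-to-one check and a merge indicator amount to can be sketched with plain dplyr. This is a rough illustration under assumptions, not the implementation of join: the example tables x and y, the helper columns in_x and in_y, and the labels stored in the indicator variable are all hypothetical.

```r
library(dplyr)

# Hypothetical example tables: the key v1 repeats in x (m) and is unique in y (1)
x <- tibble(v1 = c(1, 1, 2, 3), a = c("p", "q", "r", "s"))
y <- tibble(v1 = c(1, 2, 4), b = c(10, 20, 40))

# Rough analogue of a many-to-one check: stop if the key is duplicated in y
stopifnot(anyDuplicated(y$v1) == 0)

# Rough analogue of a merge indicator: record which table each row came from
merged <- full_join(
  mutate(x, in_x = TRUE),
  mutate(y, in_y = TRUE),
  by = "v1"
) %>%
  mutate(
    `_merge` = case_when(
      is.na(in_y) ~ "x only",  # key found in x but not in y
      is.na(in_x) ~ "y only",  # key found in y but not in x
      TRUE        ~ "both"     # key found in both tables
    )
  ) %>%
  select(-in_x, -in_y)
print(merged)
```

Running the duplicate check before the join means a badly keyed table fails early instead of silently multiplying rows.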

Visual exploration

stat_binmean() is a layer for ggplot2. It plots the mean of y against the mean of x within bins of x, similarly to the Stata command binscatter:

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) + stat_binmean()

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) + stat_binmean(n = 10)
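To make the computation concrete, the binned means can be sketched by hand with dplyr. This is an illustration under assumptions, not the code behind stat_binmean: it uses ntile() to form 10 bins with roughly equal numbers of observations, and the names bin, x_mean and y_mean are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Assign each observation to one of 10 bins of Sepal.Width (assumed binning rule),
# then average x and y within each bin
binned <- iris %>%
  mutate(bin = ntile(Sepal.Width, 10)) %>%
  group_by(bin) %>%
  summarise(
    x_mean = mean(Sepal.Width),
    y_mean = mean(Sepal.Length)
  )

# One point per bin: mean of y against mean of x
p <- ggplot(binned, aes(x = x_mean, y = y_mean)) + geom_point()
print(p)
```

Plotting the bin means rather than the raw points gives one point per bin, which makes the overall shape of the relationship easier to read.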

Since stat_binmean is just a layer for ggplot2, you can superimpose any model fit:

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  stat_binmean(n = 10) +
  stat_smooth(method = "lm", se = FALSE)