statcanR Vignette: Example 1

Thierry Warin

Intro

In this vignette, we leverage a Statistics Canada dataset in order to demonstrate the commands available in the statcanR package and their functionality. More detailed data analyses are provided in a separate vignette.

We are interested in exploring how different organizations are succeeding in adopting sustainable practices. To do this, we begin with the statcan_search() command, which matches keywords related to our topic of interest against the tables available on the Statistics Canada website. For our example topic, suitable keywords are “green” and “business”.


Data identification: Using the statcan_search() command

library(statcanR)
library(dplyr)
library(DT)
library(curl)
library(ggplot2)
library(reshape2)

# Search for data table with keywords 
statcan_search(c("green","business"),"eng")

We choose the first table in the search results, which covers the barriers that businesses or organizations face in adopting more green practices over the next 12 months, for the third quarter of 2022. The search results already list a unique identifier for each table, so we can copy this identifier and paste it directly into the next command, statcan_download_data().


Data extraction: Using the statcan_download_data() command

# Enter tableNumber and preferred language ("eng" for English or "fra" for French)
biz_barriers <- statcan_download_data(tableNumber = "33-10-0548-01",lang="eng")  %>%
  rename(potential_barriers = `Potential barriers of the business or organization when adopting more green practices over the next 12 months`) %>%
  
  # Remove unneeded variables
  select(-c(REF_DATE,DGUID,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,SYMBOL,TERMINATED,DECIMALS))

datatable(biz_barriers, options = list(pageLength = 5))
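
Because the identifier is pasted by hand, a quick format check before downloading can catch copy errors. The sketch below assumes that table numbers follow the hyphenated pattern of the identifier used above (two digits, two digits, four digits, two digits). is_table_number() is a hypothetical helper written for this illustration and is not part of statcanR.

```r
# Hypothetical helper (not part of statcanR): checks that a string looks like
# a table number such as "33-10-0548-01". The pattern is an assumption taken
# from the example identifier used in this vignette.
is_table_number <- function(x) {
  grepl("^[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}$", x)
}

is_table_number("33-10-0548-01")  # well-formed identifier
is_table_number("33-10-548-01")   # a digit was lost while copying
```

A check like this only validates the shape of the string; it cannot tell whether the table actually exists.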


Analysis / Exploration

The first part of the analysis concentrates on which types of organizations are unable to adopt sustainable practices because of COVID. To do this, we filter for the COVID barrier and then tabulate across the different types of businesses. We also keep only data that Statistics Canada considers acceptable for publication (i.e., those with a status of “A” through “D”). Finally, we examine how these values differ across Canadian provinces.

Let’s first look at how COVID is preventing sustainability adoption across different industries. To do this, consider two industries: construction and finance/insurance.


# Filter for COVID barrier
covid_barrier <- biz_barriers %>%
  subset(potential_barriers == "COVID-19") %>%

  # Filter for data status (keep statuses A to D)
  subset(STATUS %in% c("A", "B", "C", "D")) %>%

  # Filter for provinces (drop the national total and the territories)
  subset(!GEO %in% c("Canada", "Nunavut", "Northwest Territories", "Yukon")) %>%
  distinct(GEO, STATUS, `Business characteristics`, .keep_all = TRUE)

# Construction vs. finance and insurance
ind_comparison <- covid_barrier %>%
  subset(`Business characteristics` %in% c("Finance and insurance [52]", "Construction [23]")) %>%
  mutate(biz_type = as.factor(`Business characteristics`)) %>%
  select(biz_type, GEO, VALUE)

# Plot data
ggplot(ind_comparison, aes(x = GEO, y = VALUE)) +
  geom_bar(aes(fill = biz_type), stat = "identity", position = "dodge", width = 0.5) +
  theme(axis.text = element_text(size = 7)) +
  ylab("Percent of businesses") +
  ggtitle("Percent of businesses facing sustainability barriers due to COVID")
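
To see what the status and geography filters do in isolation, here is a minimal sketch on a small made-up data frame. The values are invented for illustration and are not Statistics Canada figures; only the column names GEO, STATUS and VALUE mirror the downloaded table, and the status "F" is used here simply as an example of a status outside the A to D range.

```r
library(dplyr)

# Made-up rows for illustration only (not real Statistics Canada values)
toy <- data.frame(
  GEO    = c("Canada", "Quebec", "Quebec", "Ontario", "Yukon"),
  STATUS = c("A", "B", "B", "F", "A"),
  VALUE  = c(10, 12, 12, 30, 5)
)

toy_filtered <- toy %>%
  # Keep rows with a status from "A" to "D"
  filter(STATUS %in% c("A", "B", "C", "D")) %>%
  # Drop the national total and the territories
  filter(!GEO %in% c("Canada", "Nunavut", "Northwest Territories", "Yukon")) %>%
  # Keep one row per GEO/STATUS combination
  distinct(GEO, STATUS, .keep_all = TRUE)

toy_filtered
```

Only the Quebec row survives: the national total and the territory are dropped by the geography filter, the Ontario row by the status filter, and the repeated Quebec row by distinct().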


Staying with the business barriers to sustainable development, we now take a more macroeconomic perspective and consider the Canadian economy as a whole. This time, we break the data down by business size and see how the different barriers affect businesses of each size.

biz_size <- biz_barriers %>%
  subset(`Business characteristics` %in% c("1 to 4 employees", "5 to 19 employees", "20 to 99 employees", "100 or more employees")) %>%
  subset(GEO == "Canada") %>%
  mutate(biz_type = as.factor(`Business characteristics`)) %>%
  mutate(biz_type = ordered(biz_type, levels = c("100 or more employees",  "20 to 99 employees", "5 to 19 employees" ,"1 to 4 employees"))) %>%
  select(biz_type, potential_barriers, VALUE)

biz_size$potential_barriers[biz_size$potential_barriers == "Potential barriers of the business or organization when adopting more green practices over the next 12 months, other"] <- "Other"
biz_size$potential_barriers[biz_size$potential_barriers == "Potential barriers of the business or organization when adopting more green practices over the next 12 months, none"] <- "None"
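
The two assignments above shorten each long label by hand. An alternative sketch, assuming the long labels all contain the original column title followed by a comma, removes that prefix with sub() and capitalizes what remains. The prefix string is copied from the code above; the vector labels is a stand-in built for illustration, with "COVID-19" representing a label that has no prefix.

```r
prefix <- "Potential barriers of the business or organization when adopting more green practices over the next 12 months, "

# Stand-in labels for illustration
labels <- c(
  paste0(prefix, "other"),
  paste0(prefix, "none"),
  "COVID-19"
)

# Remove the first occurrence of the prefix; fixed = TRUE treats it as a
# literal string rather than a regular expression
short <- sub(prefix, "", labels, fixed = TRUE)

# Capitalize the first character of each remaining label
short <- paste0(toupper(substr(short, 1, 1)), substring(short, 2))

short
```

Labels without the prefix pass through unchanged, so the same two lines could be applied to the whole potential_barriers column.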


ggplot(arrange(biz_size, desc(biz_type)), aes(x = potential_barriers, y = VALUE)) +
  geom_bar(aes(fill = biz_type), stat = "identity", position = "dodge", width = 0.5) +
  theme(axis.text = element_text(size = 8)) +
  ylab("Percent of businesses") +
  xlab("") +
  coord_flip() +
  ggtitle("Percent of businesses facing sustainability barriers, by business size")
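
The size classes in this plot are grouped according to the levels of the ordered factor biz_type. When rows need to be sorted by such a factor in descending order, dplyr's desc() is the usual tool, because arithmetic negation is not meaningful for factors. A minimal sketch on made-up values:

```r
library(dplyr)

# Size classes, listed from smallest to largest
sizes <- c("1 to 4 employees", "5 to 19 employees",
           "20 to 99 employees", "100 or more employees")

# Made-up values for illustration only
toy <- data.frame(
  biz_type = factor(sizes, levels = sizes, ordered = TRUE),
  VALUE    = c(4, 3, 2, 1)
)

# desc() sorts by level order, from the last level to the first
toy_sorted <- arrange(toy, desc(biz_type))

as.character(toy_sorted$biz_type)
```

The sorted data frame starts with "100 or more employees" and ends with "1 to 4 employees", the reverse of the level order defined in sizes.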


Here, no size class of business, measured by number of employees, clearly faces more or fewer barriers to adapting its practices to sustainability measures. This contrasts with the breakdown by industry, where one industry (construction) was clearly more affected by COVID than another (finance/insurance) in its attempt to make its business practices sustainable.

Conclusion

We see that in all but one province, COVID makes it harder for construction businesses than for finance/insurance businesses to adapt their practices to be sustainable.