Introduction

The Executive Communications Dataset (ECD) is a dataset of executive communications across 41 different countries. The ecdata package is a minimal package for downloading data from the ECD repositories. It includes caching and data dictionaries.

`load_ecd`

The default function for loading the ECD is load_ecd. It downloads data from our repositories and loads them into memory. You can load the full ECD by setting load_ecd(full_ecd = TRUE). This can take a while because you are downloading a 1.9GB parquet file.


full_ecd = load_ecd(full_ecd = TRUE)

If you want a specific country or countries, you can pass a character vector to the country argument.


load_ecd(country = 'Greece')

The country argument tolerates some typos, common abbreviations, and common country names. If you want to load data based on the language of the statement, you can provide a character string or character vector of languages to the language argument.


english = load_ecd(language = 'English')

polyglot = load_ecd(language = c('French', 'Italian', 'Korean'))

For a full list of accepted country names and abbreviations, you can call ecd_country_dictionary:


ecd_country_dictionary |>
  head()
#>   name_in_dataset  file_name language abbr_three_letter abbr_two_letter
#> 1       Argentina  argentina  Spanish               ARG              AR
#> 2       Australia  australia  English               AUS              AU
#> 3         Austria    austria  English               AUT              AT
#> 4      Azerbaijan azerbaijan  English               AZE              AZ
#> 5      Azerbaijan azerbaijan  English               AZE              AZ
#> 6         Bolivia    bolivia  Spanish               BOL              BO
#>   other_valid_inputs common_abr
#> 1               <NA>       <NA>
#> 2               <NA>       <NA>
#> 3               <NA>       <NA>
#> 4               <NA>       <NA>
#> 5               <NA>       <NA>
#> 6               <NA>       <NA>
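The dictionary is an ordinary data frame, so it can be filtered like any other. The sketch below is only an illustration: `dict` is a hypothetical stand-in built from the six rows printed above (a subset of the columns), so that it runs without the package installed. With ecdata loaded, the same subsetting should apply to ecd_country_dictionary itself.

```r
# Hypothetical stand-in for the first rows of ecd_country_dictionary,
# copied from the printed output above (subset of columns only).
dict = data.frame(
  name_in_dataset   = c('Argentina', 'Australia', 'Austria',
                        'Azerbaijan', 'Azerbaijan', 'Bolivia'),
  language          = c('Spanish', 'English', 'English',
                        'English', 'English', 'Spanish'),
  abbr_three_letter = c('ARG', 'AUS', 'AUT', 'AZE', 'AZE', 'BOL'),
  abbr_two_letter   = c('AR', 'AU', 'AT', 'AZ', 'AZ', 'BO'),
  stringsAsFactors  = FALSE
)

# Keep the rows listed as Spanish and drop any duplicated entries.
is_spanish = dict$language == 'Spanish'
spanish = unique(dict[is_spanish, c('name_in_dataset', 'abbr_three_letter')])

print(spanish)
```

The values in name_in_dataset can then be passed to the country argument of load_ecd.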

Note that the time to download and load a file varies considerably because the files differ in size.

`lazy_load_ecd`

We also have a “lazy” option, which downloads the files and then uses arrow::open_dataset to open the dataset without reading it into memory.


nigeria = lazy_load_ecd(country = 'Nigeria')

To bring the dataset into memory, call dplyr::collect:


nigeria |>
  dplyr::collect()

Lazy loading has some speed benefits when data wrangling. One thing to be aware of is that if you have previously lazy loaded a dataset, a later lazy load may bring in additional files. To prevent this behavior, run:


clear_cache()

Then restart your R session.
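To illustrate what the lazy workflow looks like in general, here is a minimal sketch that uses arrow and dplyr directly on a small toy dataset written to a temporary directory. It assumes both packages are installed. The data and the column names (`country`, `year`) are hypothetical and are not the ECD schema; the point is only that a filter on an arrow dataset is recorded first and evaluated when dplyr::collect is called.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data; these columns are illustrative, not the ECD schema.
toy = data.frame(
  country = c('Nigeria', 'Nigeria', 'Greece'),
  year    = c(2019L, 2021L, 2021L),
  stringsAsFactors = FALSE
)

# Write the toy data as a parquet file in a fresh temporary directory.
toy_dir = tempfile('toy_ecd_')
dir.create(toy_dir)
write_parquet(toy, file.path(toy_dir, 'toy.parquet'))

# open_dataset() reads the file metadata; the rows stay on disk.
lazy = open_dataset(toy_dir)

# The filter is stored as a query and only runs when collect() is called.
recent = lazy |>
  filter(year >= 2021L) |>
  collect()

print(recent)
```

Filtering before collecting means only the matching rows are brought into memory, which is where the lazy option tends to help with larger files.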
