Authors: Steven Liao (steven.liao@ucr.edu), In Song Kim (insong@mit.edu), Sayumi Miyano (smiyano@princeton.edu), Feng Zhu (zucxjo@gmail.com)
This R package provides a set of utilities for matching products in different classification codes used in international trade research. It currently supports concordance between the classifications below:
Support between the above and the below classifications will be offered soon:
Additionally, the package provides functions for:
concordance is available on CRAN and can be installed using:
install.packages("concordance")
You can install the most recent development version of concordance using the devtools package. First, install devtools using the following code. Note that you only have to do this once:
if(!require(devtools)) install.packages("devtools")
Then, load devtools and use the function install_github() to install concordance:
library(devtools)
install_github("insongkim/concordance", dependencies=TRUE)
To cite concordance in publications use:
Steven Liao, In Song Kim, Sayumi Miyano, Feng Zhu (2020). concordance: Product Concordance.
R package version 2.0.0. https://CRAN.R-project.org/package=concordance
A BibTeX entry for LaTeX users is:
@Manual{,
title = {concordance: Product Concordance},
author = {Steven Liao and In Song Kim and Sayumi Miyano and Feng Zhu},
year = {2020},
note = {R package version 2.0.0},
url = {https://CRAN.R-project.org/package=concordance},
}
Users can look up the product description of different classification codes using the get_desc function. The example below focuses on HS codes.
# load package
library(concordance)
# get product description
get_desc(sourcevar = c("120600", "854690"), origin = "HS5")
[1] "Oil seeds; sunflower seeds, whether or not broken" "Electrical insulators; other than of glass and ceramics"
Users can also input codes at different digit levels. For HS codes, 2-, 4-, and 6-digit codes are supported. Note that users should always include leading zeroes in the codes (e.g. use HS code 010110 instead of 10110); results may be incorrect otherwise.
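One way to guard against dropped leading zeroes, for instance when codes were read from a spreadsheet as numbers, is to pad them before calling get_desc. The helper below is a minimal base-R sketch: pad_code is a hypothetical name, not a function in concordance, and it assumes you already know the intended digit level of your codes.

```r
# Hypothetical helper (not part of concordance): left-pad codes with zeroes.
pad_code <- function(codes, width = 6) {
  # Format numeric input without scientific notation before padding
  if (is.numeric(codes)) codes <- sprintf("%.0f", codes)
  codes <- as.character(codes)
  # Number of zeroes each code is missing (never negative)
  n_missing <- pmax(0, width - nchar(codes))
  paste0(strrep("0", n_missing), codes)
}

pad_code(c("10110", "120600"))
# [1] "010110" "120600"
```

The padded vector can then be passed as sourcevar. Padding only helps when the digit level is known in advance, since a 5-character code could also be a truncated code rather than a 6-digit code with a dropped zero.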
# get product description
get_desc(sourcevar = c("1206", "8546"), origin = "HS5")
[1] "Sunflower seeds; whether or not broken" "Electrical insulators of any material"
# get product description
get_desc(sourcevar = c("12", "85"), origin = "HS5")
[1] "Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder"
[2] "Electrical machinery and equipment and parts thereof; sound recorders and reproducers; television image and sound recorders and reproducers, parts and accessories of such articles"
Users can concord between different classification codes using the concord function. The example below converts HS5 to NAICS codes.
Users can choose to retain all matches for each input by setting all = TRUE. This option will also return the share of occurrences for each matched output among all matched outputs at the user-specified digit level.
# HS to NAICS
concord(sourcevar = c("120600", "854690"),
origin = "HS5", destination = "NAICS",
dest.digit = 6, all = TRUE)
$`120600`
$`120600`$match
[1] "111120"
$`120600`$weight
[1] 1
$`854690`
$`854690`$match
[1] "326199" "335932"
$`854690`$weight
[1] 0.5 0.5
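For downstream analysis it can be convenient to reshape this nested list into a long data frame. The sketch below uses base R only and rebuilds the output shown above by hand as res, so it runs without the package. It assumes every element has the match/weight structure printed above. The names res and long are illustrative.

```r
# Output copied from the example above (rebuilt by hand for illustration)
res <- list(
  `120600` = list(match = "111120", weight = 1),
  `854690` = list(match = c("326199", "335932"), weight = c(0.5, 0.5))
)

# One row per (input code, matched code) pair
long <- do.call(rbind, lapply(names(res), function(code) {
  data.frame(
    source = code,
    match  = res[[code]]$match,
    weight = res[[code]]$weight,
    stringsAsFactors = FALSE
  )
}))
long
```

In this layout the weights for each source code sum to 1, which makes it straightforward to use them to apportion trade values across matched codes.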
Alternatively, users can simply obtain the matched output with the largest share of occurrences (the mode match) with all = FALSE (default). If several matched outputs tie for the largest share, the function will return the first of them.
concord(sourcevar = c("120600", "854690"),
origin = "HS5", destination = "NAICS",
dest.digit = 6, all = FALSE)
[1] "111120" "326199"
Users can double-check the validity of the matches with get_desc.
# get product description of NAICS output
get_desc(sourcevar = c("111120", "326199"), origin = "NAICS2017")
[1] "Oilseed (except Soybean) Farming" "All Other Plastics Product Manufacturing"
More technically, the function works by matching an input code to the most fine-grained level of destination codes in our package (e.g., the 6-digit NAICS codes above) and then calculating the occurrence share of each matched code at the user-specified digit level. A mode can arise when users choose destination codes at a more aggregated level and several finer-grained matched codes fall into the same group at that level.
We illustrate the above mechanics using HS5 code “8546” as an example. When users ask for 6-digit NAICS codes (the most fine-grained level available), HS5 code “8546” is matched to five NAICS codes: “327212”, “327113”, “327110”, “326199”, and “335932”, with weights of 0.2 (1/5) each.
concord(sourcevar = "8546",
origin = "HS5", destination = "NAICS",
dest.digit = 6, all = TRUE)
$`8546`
$`8546`$match
[1] "327212" "327113" "327110" "326199" "335932"
$`8546`$weight
[1] 0.2 0.2 0.2 0.2 0.2
Instead, when users ask for 4-digit NAICS codes, HS5 code “8546” is matched to four NAICS codes: “3271”, “3272”, “3261”, “3359”. NAICS code “3271” gets a weight of 0.4 since it consists of two finer-grained matches “327113” and “327110” out of the 5 total matches (2/5).
concord(sourcevar = "8546",
origin = "HS5", destination = "NAICS",
dest.digit = 4, all = TRUE)
$`8546`
$`8546`$match
[1] "3271" "3272" "3261" "3359"
$`8546`$weight
[1] 0.4 0.2 0.2 0.2
Thus, when all = FALSE, the function will retain the matched code with the largest weight, “3271”.
concord(sourcevar = "8546",
origin = "HS5", destination = "NAICS",
dest.digit = 4, all = FALSE)
[1] "3271"
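The arithmetic behind these weights can be reproduced in a few lines of base R. This is only a sketch of the logic described above, not the package's actual implementation. It starts from the five 6-digit matches printed earlier and assumes ties keep their original order.

```r
# 6-digit NAICS matches for HS5 code "8546", copied from the output above
matches6 <- c("327212", "327113", "327110", "326199", "335932")

# Truncate to the requested digit level and count occurrences per group,
# keeping groups in order of first appearance
codes4 <- substr(matches6, 1, 4)
counts <- table(factor(codes4, levels = unique(codes4)))
weights <- as.numeric(counts) / length(codes4)
names(weights) <- names(counts)

# Sort by weight, largest first; order() leaves ties in their original order
weights <- weights[order(-weights)]
weights

# The mode match is the first element after sorting
names(weights)[1]
```

Here “3271” collects two of the five fine-grained matches (2/5 = 0.4), while the remaining three groups each collect one (1/5 = 0.2), matching the output above.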
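Rauch (1999) classifies each SITC Rev. 2 industry into one of three possible types: homogeneous goods traded on an organized exchange (“w”), reference priced goods (“r”), and differentiated products (“n”).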
The get_proddiff function concords users’ input codes to SITC2 codes and then extracts the corresponding Rauch classifications.
There are two main options. First, users can set prop = "n", prop = "r", or prop = "w", in which case the function will return, for each input code, the proportion of the chosen type in the resulting vector of Rauch indices.
# get the proportion of type "r"
get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", prop = "r")
120600 854690
     1      0
If prop is not set to any of these, the function returns, for each input code, a data frame that summarizes the frequencies and proportions of “w”, “r”, and “n”.
get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", prop = "")
$`120600`
  rauch freq proportion
1     w    0          0
2     r    1          1
3     n    0          0
$`854690`
  rauch freq proportion
1     w    0          0
2     r    0          0
3     n    1          1
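The summary above is a frequency table over the three Rauch types. The base-R sketch below shows how such a table can be built. The input vector rauch_matched is hypothetical (made up for illustration, not drawn from Rauch's data), and the code is not the package's own implementation.

```r
# Hypothetical Rauch indices for the SITC2 codes matched to one input code
rauch_matched <- c("n", "n", "r", "n")

# Count each type, including types that do not occur
types <- c("w", "r", "n")
freq <- as.integer(table(factor(rauch_matched, levels = types)))

summary_df <- data.frame(
  rauch = types,
  freq = freq,
  proportion = freq / sum(freq),
  stringsAsFactors = FALSE
)
summary_df
```

Setting prop to one of the types corresponds to picking the matching row's proportion, e.g. summary_df$proportion[summary_df$rauch == "r"] in this sketch.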
Second, users can choose Rauch’s conservative classification with setting = "CON" (default). setting = "LIB" returns Rauch’s liberal classification.
get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", setting = "LIB", prop = "")
$`120600`
  rauch freq proportion
1     w    1          1
2     r    0          0
3     n    0          0
$`854690`
  rauch freq proportion
1     w    0          0
2     r    0          0
3     n    1          1
Broda and Weinstein (2006) estimate product-level import demand elasticities for 73 countries using HS0 3-digit codes.
The get_sigma function concords users’ input codes to 3-digit HS0 codes and then extracts the corresponding product-level elasticities in the country selected by the user.
There are two main options. First, when give_avg = TRUE (default), each output element will be a simple average of all elasticities (of matched codes) in the corresponding vector.
get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
country = "USA", give_avg = TRUE)
[1] 3.733456 1.233216
Users can also set give_avg = FALSE to obtain the full vector of elasticities for all matching codes of each element in the input vector. In this case, there is only one match per input.
get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
country = "USA", give_avg = FALSE)
$`120600`
$`120600`$elasticity
[1] 3.733456
$`854690`
$`854690`$elasticity
[1] 1.233216
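When an input matches several codes, the give_avg = TRUE result is described above as the simple average of the corresponding vector. The sketch below illustrates that relationship on a hand-built list with the same shape as the output above. The code names and elasticity values are made up for illustration and are not Broda and Weinstein's estimates.

```r
# Hypothetical give_avg = FALSE output; names and values are made up
sigma_list <- list(
  code_a = list(elasticity = c(2.0, 4.0)),
  code_b = list(elasticity = 1.5)
)

# Simple average of the elasticities for each input code
avg <- vapply(sigma_list, function(x) mean(x$elasticity), numeric(1))
avg
```

Keeping the full vectors is useful when a different summary, such as the median, is preferred over the simple average.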
Second, for the United States (only), Broda and Weinstein (2006) have also estimated elasticities based on more fine-grained 5-digit SITC3 codes. Users can obtain elasticities in the United States via this method with use_SITC = TRUE.
get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
country = "USA", use_SITC = TRUE, give_avg = TRUE)
[1] 2.562991 1.345522