
PCCC: An Example Using the Center for Disease Control’s Multiple Cause of Death Dataset

James Feinstein

Seth Russell

Tell Bennett

2020-06-01

Introduction

This vignette provides an example using publicly available death certificate data to illustrate how the pccc package generates the Complex Chronic Condition (CCC) categories from ICD-9-CM and ICD-10-CM codes. For an overview of the CCC classification system, see pccc-overview.

To evaluate the code chunks in this example you will need to load the following R packages.

library(pccc)
library(dplyr)

Accessing the Data

The Centers for Disease Control and Prevention (CDC) maintains vital statistics, including death certificate data. The publicly available death certificate data, known as the Multiple Cause of Death (MCD) file, contain ICD diagnostic codes specifying the diseases and conditions leading to each decedent’s death. In particular, the 1996 MCD data contain both ICD-9-CM and ICD-10 codes, which makes them an ideal example for demonstrating how the PCCC software categorizes ICD codes. Please note that because of the way ICD-9-CM codes are mapped to ICD-10-CM codes (https://www.cms.gov/Medicare/Coding/ICD10/2018-ICD-10-CM-and-GEMs.html), the calculated frequencies of CCCs may differ between corresponding ICD-9-CM and ICD-10-CM diagnosis codes for a decedent.

The data documentation and instructions for direct download are available at: ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/comparability/icd9_icd10/ICD9_ICD10_comparability_file_documentation.pdf

Preparing the Data

For this illustrative example, we have provided just two columns of the data for decedents aged 21 years or younger: the ICD-9-CM underlying cause of death diagnosis code and the ICD-10-CM underlying cause of death diagnosis code. If you wish to recreate the data yourself from the direct download site, you will need to use column positions 142-145 (ICD-9-CM) and 444-447 (ICD-10) and restrict the data to records with age <= 21 years (column positions 64-66). Column 64 holds the unit of the age value in columns 65-66, so records with ages given in months, weeks, days, hours, or minutes (age codes 2 through 6) are all under one year old and are kept as well.

Here’s a sample of how the file could be read and processed:

# download and unzip file from ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/comparability/icd9_icd10/ICD9_ICD10_comparability_public_use_ASCII.ZIP
# columns of interest
# start end     width description
# 64  -  64     1     Age Code
# 65  -  66     2     Age Value
#                     Code Value     Description
#                     0    01-99     Years less than 100
#                     1    00-99     Years 100 or more
#                     2    01-11,99  Months
#                     3    01-03,99  Weeks
#                     4    01-27,99  Days
#                     5    01-23, 99 Hours
#                     6    01-59, 99 Minutes
#                     9    99        Age not stated
# 142 - 145     4     ICD Code 9th Revision (Underlying Cause of Death)
# 444 - 447     4     ICD-10 Underlying Cause Code

library(readr)

mcod <- readr::read_fwf("ICD9_ICD10_comparability_public_use_ASCII.dat",
                        readr::fwf_positions(
                          start = c(64, 65, 142, 444),
                          end = c(64, 66, 145, 447),
                          col_names = c('age_code', 'age', 'icd9', 'icd10')),
                        col_types = 'iicc')
mcod <- mcod[
             (mcod$age_code == 0 & mcod$age <= 21) |
             (mcod$age_code %in% c(2, 3, 4, 5, 6))
            , ]
mcod <- dplyr::mutate(mcod, id = seq_along(age))
mcod <- mcod[c("id", "icd9", "icd10")]
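
The column positions and the age filter can be sanity-checked without downloading the file. The base R sketch below builds a few synthetic 447-character records, extracts the same four fields with substr(), and applies the same filter. The helper make_record() and every record value are hypothetical and exist only for illustration; they are not part of pccc or of the MCD data.

```r
# Hypothetical helper: build one blank 447-character record and write the
# four fields of interest into the documented column positions.
make_record <- function(age_code, age, icd9, icd10) {
  rec <- strrep(" ", 447)
  substr(rec, 64, 64)   <- as.character(age_code)
  substr(rec, 65, 66)   <- sprintf("%02d", age)
  substr(rec, 142, 145) <- sprintf("%-4s", icd9)   # left-justified, padded to 4
  substr(rec, 444, 447) <- sprintf("%-4s", icd10)
  rec
}

# Made-up records: a 15-year-old, a 3-day-old, and a 45-year-old
records <- c(
  make_record(0L, 15L, "7423", "Q039"),
  make_record(4L,  3L, "7980", "R95"),
  make_record(0L, 45L, "4109", "I219")
)

# Extract the fields by position, as read_fwf() does for the real file
parsed <- data.frame(
  age_code = as.integer(substr(records, 64, 64)),
  age      = as.integer(substr(records, 65, 66)),
  icd9     = trimws(substr(records, 142, 145)),
  icd10    = trimws(substr(records, 444, 447)),
  stringsAsFactors = FALSE
)

# Same filter as above: age in years <= 21, or age given in a sub-year unit
keep <- (parsed$age_code == 0 & parsed$age <= 21) |
        (parsed$age_code %in% c(2, 3, 4, 5, 6))
parsed[keep, ]
```

With these made-up records, the first two rows pass the filter and the 45-year-old is dropped.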

Running the PCCC Software

Within the example data, there are two string variables, one holding ICD-9-CM codes and one holding ICD-10 codes. If you inspect the first 10 rows, you will notice that the codes conform to the formatting guidelines outlined in the pccc-overview vignette.

# Show data
head(pccc::comparability, 10)
#>    id icd9 icd10
#> 1   1  912   W80
#> 2   2 7423  Q039
#> 3   3 7980   R95
#> 4   4 9229   W34
#> 5   5 8199  V892
#> 6   6 8120  V877
#> 7   7 7718  D689
#> 8   8 7980   R95
#> 9   9 7980   R95
#> 10 10 7650  P072
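
If your own codes are stored with decimal points, one common preprocessing step is to strip them so the strings resemble the undotted codes shown above. The following is a minimal base R sketch under that assumption. The input codes are made up, and the pccc-overview vignette remains the authority on the expected format.

```r
# Made-up codes written with decimal points and stray whitespace
raw_codes <- c("742.3", " Q03.9", "798.0 ")

# Trim whitespace, then drop the literal "." (fixed = TRUE avoids regex meaning)
clean_codes <- gsub(".", "", trimws(raw_codes), fixed = TRUE)
clean_codes
```

Keeping the codes as character strings matters here, since converting them to numbers would discard any leading zeros.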

To run the PCCC classification on the ICD-9-CM codes:

# Run PCCC on ICD-9-CM codes

ccc_result_icd9 <-
    ccc(pccc::comparability, # data frame with id and diagnosis code columns
        id      = id,
        dx_cols = icd9,      # no procedure codes in these data, so pc_cols is omitted
        icdv    = 9)

# review results
head(ccc_result_icd9)
#>   id neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1  1         0   0           0     0  0           0         0               0
#> 2  2         1   0           0     0  0           0         0               0
#> 3  3         0   0           0     0  0           0         0               0
#> 4  4         0   0           0     0  0           0         0               0
#> 5  5         0   0           0     0  0           0         0               0
#> 6  6         0   0           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        0
#> 2          0        0        0          0        1
#> 3          0        0        0          0        0
#> 4          0        0        0          0        0
#> 5          0        0        0          0        0
#> 6          0        0        0          0        0

# view number of decedents with each CCC
sum_results <- dplyr::summarize_at(ccc_result_icd9, vars(-id), sum) %>% print.data.frame
#>   neuromusc  cvd respiratory renal  gi hemato_immu metabolic congeni_genetic
#> 1      2559 3341        1651   366 189         794       294            2146
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1       2848     1202        6          0    15390

# view proportion of all decedents with each CCC (multiply by 100 for percent)
dplyr::summarize_at(ccc_result_icd9, vars(-id), mean) %>% print.data.frame
#>    neuromusc        cvd respiratory       renal          gi hemato_immu
#> 1 0.03934683 0.05137076  0.02538555 0.005627566 0.002906038  0.01220844
#>     metabolic congeni_genetic malignancy   neonatal     tech_dep transplant
#> 1 0.004520504       0.0329966 0.04379046 0.01848179 9.225518e-05          0
#>    ccc_flag
#> 1 0.2366345
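
The two summaries above are column sums and column means of 0/1 indicator columns, so the proportion is the count divided by the number of rows. A dependency-free sketch of the same calculation follows. The data frame toy_result is a hypothetical stand-in for the output of ccc() with only a few of its columns.

```r
# Hypothetical stand-in for a ccc() result: one row per id, 0/1 flags
toy_result <- data.frame(
  id        = 1:4,
  neuromusc = c(0, 1, 0, 0),
  cvd       = c(0, 0, 1, 1),
  ccc_flag  = c(0, 1, 1, 1)
)

# Drop the id column, then summarize the indicator columns
flags  <- toy_result[setdiff(names(toy_result), "id")]
counts <- colSums(flags)    # number of rows flagged for each category
props  <- colMeans(flags)   # counts / nrow(toy_result)

counts
props
```

Because a record can be flagged in more than one category, the category counts need not add up to the ccc_flag count.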

To run the PCCC classification on the ICD-10-CM codes:

# Run PCCC on ICD-10-CM codes

ccc_result_icd10 <-
    ccc(pccc::comparability, # data frame with id and diagnosis code columns
        id      = id,
        dx_cols = icd10,     # no procedure codes in these data, so pc_cols is omitted
        icdv    = 10)

# review results
head(ccc_result_icd10)
#>   id neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1  1         0   0           0     0  0           0         0               0
#> 2  2         1   0           0     0  0           0         0               0
#> 3  3         0   0           0     0  0           0         0               0
#> 4  4         0   0           0     0  0           0         0               0
#> 5  5         0   0           0     0  0           0         0               0
#> 6  6         0   0           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        0
#> 2          0        0        0          0        1
#> 3          0        0        0          0        0
#> 4          0        0        0          0        0
#> 5          0        0        0          0        0
#> 6          0        0        0          0        0

# view number of decedents with each CCC
sum_results <- dplyr::summarize_at(ccc_result_icd10, vars(-id), sum) %>% print.data.frame
#>   neuromusc  cvd respiratory renal  gi hemato_immu metabolic congeni_genetic
#> 1      1990 3500        1385   377 185         794       277            1979
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1       2949     1421        6          0    14863

# view proportion of all decedents with each CCC (multiply by 100 for percent)
dplyr::summarize_at(ccc_result_icd10, vars(-id), mean) %>% print.data.frame
#>    neuromusc        cvd respiratory     renal          gi hemato_immu
#> 1 0.03059797 0.05381552  0.02129557 0.0057967 0.002844535  0.01220844
#>     metabolic congeni_genetic malignancy  neonatal     tech_dep transplant
#> 1 0.004259114      0.03042883 0.04534342 0.0218491 9.225518e-05          0
#>    ccc_flag
#> 1 0.2285315
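
As noted earlier, the ICD-9-CM and ICD-10 results need not agree for a given decedent. One way to see where they differ is to join the two results by id and cross-tabulate a flag. The sketch below uses base R and two small hypothetical data frames, res9 and res10, standing in for ccc_result_icd9 and ccc_result_icd10. Their values are invented.

```r
# Hypothetical stand-ins for the two ccc() results (values invented)
res9  <- data.frame(id = 1:6, ccc_flag = c(0, 1, 0, 1, 1, 0))
res10 <- data.frame(id = 1:6, ccc_flag = c(0, 1, 0, 0, 1, 1))

# Join on id; the suffixes distinguish the two versions of ccc_flag
both <- merge(res9, res10, by = "id", suffixes = c("_icd9", "_icd10"))

# 2 x 2 agreement table: rows are the ICD-9 flag, columns the ICD-10 flag
agreement <- table(icd9 = both$ccc_flag_icd9, icd10 = both$ccc_flag_icd10)
agreement

# ids whose CCC status differs between the two code versions
discordant_ids <- both$id[both$ccc_flag_icd9 != both$ccc_flag_icd10]
discordant_ids
```

The same pattern applies to any individual category column, such as neuromusc or neonatal, by swapping the column name.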

