tidy


The package provides functions to tidy a summarised result into a data frame that is easier to use in subsequent calculations.

To that end, the split functions, described in “split and unite functions”, let you work with name-level columns.

For the estimates there is the pivotEstimates function, and for the settings there is addSettings. Finally, the tidy method combines the split and pivot functionality in a single function.

Estimates

First, let’s load relevant libraries and create a mock summarised result table.

library(visOmopResults)
library(dplyr)
result <- mockSummarisedResult()
result |> glimpse()
#> Rows: 126
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "9337847", "4006478", "2868369", "7818476", "9065176"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

The function pivotEstimates adds one column of estimate values for each combination of values in the columns listed in pivotEstimatesBy. For instance, in the following example we use the columns variable_name, variable_level, and estimate_name to pivot the estimates.

result |> 
  pivotEstimates(pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")) |>
  glimpse()
#> Rows: 18
#> Columns: 15
#> $ result_id                          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name                           <chr> "mock", "mock", "mock", "mock", "mo…
#> $ group_name                         <chr> "cohort_name", "cohort_name", "coho…
#> $ group_level                        <chr> "cohort1", "cohort1", "cohort1", "c…
#> $ strata_name                        <chr> "overall", "age_group &&& sex", "ag…
#> $ strata_level                       <chr> "overall", "<40 &&& Male", ">=40 &&…
#> $ additional_name                    <chr> "overall", "overall", "overall", "o…
#> $ additional_level                   <chr> "overall", "overall", "overall", "o…
#> $ `number subjects_NA_count`         <int> 9337847, 4006478, 2868369, 7818476,…
#> $ age_NA_mean                        <dbl> 30.49621, 27.51317, 19.64153, 84.40…
#> $ age_NA_sd                          <dbl> 3.3287556, 4.6797953, 3.8420378, 7.…
#> $ Medications_Amoxiciline_count      <int> 21944, 70846, 27309, 44353, 34557, …
#> $ Medications_Amoxiciline_percentage <dbl> 12.759029, 81.434293, 99.356778, 49…
#> $ Medications_Ibuprofen_count        <int> 2795, 1362, 94596, 12537, 66965, 25…
#> $ Medications_Ibuprofen_percentage   <dbl> 30.713166, 8.628628, 59.166925, 83.…
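
To make the reshaping concrete, here is a minimal sketch of the same kind of long-to-wide pivot done by hand with tidyr on a small hypothetical table. The table `long`, its values, and the object `wide` are made up for illustration, and this is not presented as how pivotEstimates is implemented.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format table: one row per estimate, values stored as text
long <- tibble(
  strata_level   = c("overall", "overall", "Male", "Male"),
  variable_name  = c("age", "age", "age", "age"),
  estimate_name  = c("mean", "sd", "mean", "sd"),
  estimate_value = c("30.5", "3.3", "27.5", "4.7")
)

# Convert the values to numeric, then spread them into one column per
# variable_name/estimate_name combination (names are joined with "_")
wide <- long |>
  mutate(estimate_value = as.numeric(estimate_value)) |>
  pivot_wider(
    names_from  = c(variable_name, estimate_name),
    values_from = estimate_value
  )

colnames(wide)
# "strata_level" "age_mean" "age_sd"
```

The four long rows collapse into two wide rows, one per stratum, which mirrors the drop from 126 to 18 rows in the example above.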

The nameStyle argument customises the names of the new columns, using glue package syntax. For instance:

result |> 
  pivotEstimates(pivotEstimatesBy = "estimate_name",
                 nameStyle = "{toupper(estimate_name)}") |>
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ COUNT            <int> 9337847, 4006478, 2868369, 7818476, 9065176, 2211710,…
#> $ MEAN             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ SD               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ PERCENTAGE       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
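
Because nameStyle is a glue template, one way to preview the names a template would produce is to evaluate it with glue directly. The following is a small sketch assuming the glue package is installed; the vectors hold illustrative values rather than real results, and the second template is a hypothetical style that is not taken from the package documentation.

```r
library(glue)

# Illustrative values standing in for the columns being pivoted
estimate_name <- c("count", "mean", "sd", "percentage")
variable_name <- c("number subjects", "age", "age", "Medications")

# The template used with pivotEstimates above
upper_names <- as.character(glue("{toupper(estimate_name)}"))
upper_names
# "COUNT" "MEAN" "SD" "PERCENTAGE"

# A hypothetical template that combines two columns
combined_names <- as.character(glue("{variable_name} ({estimate_name})"))
combined_names[2]
# "age (mean)"
```

Anything inside the braces is evaluated as R code, so a template can call functions such as toupper on the column values.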

Settings

The function addSettings adds a new column for each of the settings in the summarised result, if any:

mockSummarisedResult() |>
  addSettings() |>
  glimpse()
#> Rows: 126
#> Columns: 16
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "2703410", "3101646", "4285343", "2451643", "6496595"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ result_type      <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name     <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version  <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0",…

Tidy

Finally, the tidy method combines the splitting of name-level columns with the pivoting of estimates and settings. By default, it splits group, strata and additional, pivots estimates by the column “estimate_name”, and also pivots the settings.

result <- mockSummarisedResult()

result |> 
  tidy() |> 
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cdm_name        <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock"…
#> $ cohort_name     <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1",…
#> $ age_group       <chr> "overall", "<40", ">=40", "<40", ">=40", "overall", "o…
#> $ sex             <chr> "overall", "Male", "Male", "Female", "Female", "Male",…
#> $ variable_name   <chr> "number subjects", "number subjects", "number subjects…
#> $ variable_level  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ count           <int> 3397666, 5378334, 1665180, 7493291, 1764428, 6818035, …
#> $ mean            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sd              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ percentage      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ result_type     <chr> "mock_summarised_result", "mock_summarised_result", "m…
#> $ package_name    <chr> "visOmopResults", "visOmopResults", "visOmopResults", …
#> $ package_version <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", …

Which column pairs to split can be customised with the split arguments, while pivotEstimatesBy and nameStyle control how estimates are pivoted. If pivotEstimatesBy is NULL or character(), estimates will not be modified. Settings will always be pivoted if present.
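
As an illustration of these options, the sketch below keeps the group columns unsplit and leaves the estimates in long format. It assumes the split arguments are named splitGroup, splitStrata and splitAdditional, as in the 0.3.0 release shown in the output above. Other versions may differ, so check ?tidy before relying on these names.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

# Assumed argument names (visOmopResults 0.3.0): splitGroup, splitStrata,
# splitAdditional. Here group_name/group_level are kept as they are, strata
# and additional are split (the defaults), and estimates are not pivoted.
custom <- result |>
  tidy(
    splitGroup = FALSE,
    pivotEstimatesBy = NULL
  )

custom |> glimpse()
```

With these settings the result should still contain group_name, group_level and estimate_value, alongside the split strata columns and the settings columns.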
