Data imported from Excel and CSV files in business situations can have
messy characteristics and data formats. This package provides functions
to tidy your data frame using the power of tidyselect.
# create sample data
sample_table <- tibble::tibble(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-")
)
sample_table
## # A tibble: 3 × 3
##   date     numeric_val char_val
##   <chr>          <dbl> <chr>
## 1 20190101           1 ""
## 2 20190305          NA " val "
## 3 20201012           5 "-"
Data sometimes represents missing values in different ways. By default,
set_na checks for c("-", "", " ", "null"), but any values can be supplied
to be set to NA automatically. This is helpful when you want to check the
NA profile of a data frame using validata::diagnose.
## # A tibble: 3 × 3
##   date     numeric_val char_val
##   <chr>          <dbl> <chr>
## 1 20190101           1 <NA>
## 2 20190305          NA " val "
## 3 20201012           5 <NA>
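The replacement described above can be approximated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: `replace_with_na` is a hypothetical helper, and a plain data frame stands in for the tibble.

```r
# Sample data, rebuilt as a base R data frame so the sketch is self-contained.
sample_table <- data.frame(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): replace sentinel strings
# with NA in every character column, leaving other columns untouched.
replace_with_na <- function(df, vals = c("-", "", " ", "null")) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[x %in% vals] <- NA_character_
    x
  })
  df
}

na_table <- replace_with_na(sample_table)
na_table$char_val
```

Note that " val " is not in the default set, so it survives unchanged, which matches the output shown above.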
Remove whitespace from the ends of character variables that may otherwise be undetectable by inspection.
## # A tibble: 3 × 3
##   date     numeric_val char_val
##   <chr>          <dbl> <chr>
## 1 20190101           1 ""
## 2 20190305          NA "val"
## 3 20201012           5 "-"
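For comparison, trimming can be sketched with base R's `trimws`. The helper name `trim_chr_cols` is hypothetical and this is not the package's own code; it only illustrates the idea on a plain data frame.

```r
# Sample data as a base R data frame so the sketch runs on its own.
sample_table <- data.frame(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip leading and trailing whitespace from every
# character column.
trim_chr_cols <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], trimws)
  df
}

trimmed <- trim_chr_cols(sample_table)
trimmed$char_val
```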
Automatically convert character columns that should be dates.
## # A tibble: 3 × 3
##   date       numeric_val char_val
##   <date>           <dbl> <chr>
## 1 2019-01-01           1 ""
## 2 2019-03-05          NA " val "
## 3 2020-10-12           5 "-"
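One way to picture the conversion is a base R sketch that tries a single date format and converts a character column only when every non-missing value parses. The helper `convert_date_cols` and the fixed `"%Y%m%d"` format are assumptions made for illustration; the package's detection logic may differ.

```r
# Sample data as a base R data frame so the sketch runs on its own.
sample_table <- data.frame(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert a character column to Date only if all of
# its non-missing values parse with the given format.
convert_date_cols <- function(df, fmt = "%Y%m%d") {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (!is.character(x)) next
    parsed <- as.Date(x, format = fmt)  # unparseable strings become NA
    present <- !is.na(x)
    if (any(present) && all(!is.na(parsed[present]))) {
      df[[nm]] <- parsed
    }
  }
  df
}

dated <- convert_date_cols(sample_table)
str(dated)
```

Requiring every value to parse keeps columns such as char_val as character instead of turning them into all-NA dates.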
Relocates the columns of an unorganized data frame using heuristics such as putting character and date columns first and ordering columns alphabetically.
## # A tibble: 3 × 3
##   date     char_val numeric_val
##   <chr>    <chr>          <dbl>
## 1 20190101 ""                 1
## 2 20190305 " val "           NA
## 3 20201012 "-"                5
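A simplified version of such a heuristic can be sketched in base R. The helper `reorder_cols` is hypothetical and implements only two rules (character and Date columns first, alphabetical within each group), so it does not reproduce the package's exact ordering: in the output above, date stays ahead of char_val, which a purely alphabetical rule would not do.

```r
# Sample data with the numeric column deliberately placed first.
sample_table <- data.frame(
  numeric_val = c(1, NA, 5),
  date = c("20190101", "20190305", "20201012"),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: character and Date columns first, then the rest,
# each group sorted alphabetically by column name.
reorder_cols <- function(df) {
  is_label <- vapply(
    df,
    function(x) is.character(x) || inherits(x, "Date"),
    logical(1)
  )
  nms <- names(df)
  df[c(sort(nms[is_label]), sort(nms[!is_label]))]
}

reordered <- reorder_cols(sample_table)
names(reordered)
```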
Wrapper function to apply all cleaning operations to a data frame using sensible defaults.
## # A tibble: 3 × 3
##       DATE NUMERIC_VAL CHAR_VAL
##      <int>       <int> <chr>
## 1 20190101           1 <NA>
## 2 20190305          NA " val "
## 3 20201012           5 <NA>
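The transformations visible in this output (sentinel strings set to NA, whole-number columns stored as integers, upper-case column names) can be imitated in base R. The sketch below is an assumption about which steps are involved, inferred only from the printed result; `clean_sketch` and `to_int_if_whole` are hypothetical helpers, not the package's wrapper.

```r
# Sample data as a base R data frame so the sketch runs on its own.
sample_table <- data.frame(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: store a column as integer when all of its
# non-missing values are whole numbers (digit strings or whole doubles).
to_int_if_whole <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) == 0) return(x)
  if (is.character(x) && all(grepl("^[0-9]+$", vals))) {
    return(as.integer(x))
  }
  if (is.double(x) && !inherits(x, "Date") && all(vals == round(vals))) {
    return(as.integer(x))
  }
  x
}

# Hypothetical wrapper chaining the three steps seen in the output.
clean_sketch <- function(df, na_vals = c("-", "", " ", "null")) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[x %in% na_vals] <- NA_character_
    x
  })
  df[] <- lapply(df, to_int_if_whole)
  names(df) <- toupper(names(df))
  df
}

cleaned <- clean_sketch(sample_table)
str(cleaned)
```

Values too large for R's 32-bit integers would become NA with a warning under `as.integer`, so a real implementation would need a guard for that case.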
Use tidyselect to fill NAs with a single value.
## # A tibble: 3 × 3
##   date     numeric_val char_val
##   <chr>          <dbl> <chr>
## 1 20190101           1 ""
## 2 20190305           0 " val "
## 3 20201012           5 "-"
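The same fill can be sketched in base R, with explicit column names standing in for a tidyselect expression. The helper `fill_missing` is hypothetical and is shown only to make the operation concrete.

```r
# Sample data as a base R data frame so the sketch runs on its own.
sample_table <- data.frame(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NA with a single value in the named columns.
fill_missing <- function(df, cols, fill = 0) {
  for (nm in cols) {
    x <- df[[nm]]
    x[is.na(x)] <- fill
    df[[nm]] <- x
  }
  df
}

filled <- fill_missing(sample_table, cols = "numeric_val", fill = 0)
filled$numeric_val
```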