library(framecleaner)
library(printr)
library(dplyr)
Data imported from Excel and CSV files in business settings often has messy values and inconsistent formats. This package provides functions to tidy your data frame using the power of tidyselect.
Create sample data.
tibble::tibble(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-")
) -> sample_table
sample_table
date | numeric_val | char_val |
---|---|---|
20190101 | 1 | |
20190305 | NA | val |
20201012 | 5 | - |
Data sources often represent missing values in different ways. By default, make_na checks for c("-", "", " ", "null"), but any values can be supplied and will be set to NA. This is helpful when you want to check the NA profile of a data frame using validata::diagnose.
sample_table %>%
  make_na()
date | numeric_val | char_val |
---|---|---|
20190101 | 1 | NA |
20190305 | NA | val |
20201012 | 5 | NA |
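
To make the replacement concrete, here is a minimal dplyr-only sketch of the same idea. It is not framecleaner's implementation: the `na_strings` vector and the `df` and `df_na` names are assumptions for illustration, and the NA profile is taken with base `colSums(is.na())` rather than validata::diagnose.

```r
library(dplyr)

# Sample data mirroring the table above (names are illustrative)
df <- tibble(
  date = c("20190101", "20190305", "20201012"),
  numeric_val = c(1, NA, 5),
  char_val = c("", " val ", "-")
)

# Strings to treat as missing; same values the text lists as defaults
na_strings <- c("-", "", " ", "null")

# NA count per column before the replacement
na_profile_before <- colSums(is.na(df))

# Replace any matching string in character columns with NA
df_na <- df %>%
  mutate(across(
    where(is.character),
    ~ replace(.x, .x %in% na_strings, NA_character_)
  ))

# NA count per column after the replacement
na_profile_after <- colSums(is.na(df_na))
```

In this sketch `char_val` goes from zero to two missing values, which is the kind of difference an NA profile would otherwise hide.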
Remove whitespace from the ends of character variables, which may otherwise be undetectable by inspection.
sample_table %>%
  remove_whitespace()
date | numeric_val | char_val |
---|---|---|
20190101 | 1 | |
20190305 | NA | val |
20201012 | 5 | - |
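
Because the printed tables look identical before and after, it can help to compare string lengths. The following is a rough sketch using base `trimws()` and `nchar()`; it only approximates what remove_whitespace does, and the `df` and `df_trimmed` names are illustrative.

```r
library(dplyr)

df <- tibble(char_val = c("", " val ", "-"))

# nchar() exposes padding that printed output hides
nchar(df$char_val)
# 0 5 1

# Trim leading and trailing whitespace in every character column
df_trimmed <- df %>%
  mutate(across(where(is.character), trimws))

nchar(df_trimmed$char_val)
# 0 3 1
```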
Automatically convert character columns that should be dates.
sample_table %>%
  set_date()
date | numeric_val | char_val |
---|---|---|
2019-01-01 | 1 | |
2019-03-05 | NA | val |
2020-10-12 | 5 | - |
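
One way such automatic detection could work is sketched below. This is an assumption about the general approach, not a description of how set_date decides: the hypothetical `looks_like_ymd()` helper only recognizes eight-digit `YYYYMMDD` strings, and a real implementation may accept more formats.

```r
library(dplyr)

df <- tibble(
  date = c("20190101", "20190305", "20201012"),
  char_val = c("", " val ", "-")
)

# Hypothetical helper: TRUE when a character column has at least one
# non-missing value and every non-missing value parses as YYYYMMDD
looks_like_ymd <- function(x) {
  if (!is.character(x)) return(FALSE)
  x <- x[!is.na(x)]
  length(x) > 0 &&
    all(grepl("^[0-9]{8}$", x)) &&
    all(!is.na(as.Date(x, format = "%Y%m%d")))
}

# Convert only the columns that pass the check; others are left untouched
df_dates <- df %>%
  mutate(across(where(looks_like_ymd), ~ as.Date(.x, format = "%Y%m%d")))
```

Here `date` becomes a `Date` column while `char_val` stays character, because none of its values match the pattern.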
Reorder the columns of an unorganized data frame using heuristics, such as putting character and date columns first and sorting alphabetically.
sample_table %>%
  relocate_all()
date | char_val | numeric_val |
---|---|---|
20190101 | | 1 |
20190305 | val | NA |
20201012 | - | 5 |
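
A comparable ordering can be approximated with `dplyr::relocate()`. The sketch below is one possible heuristic (alphabetical first, then character and date columns moved to the front); it is not relocate_all's actual rule set, so the resulting order may differ from the package's output. The `is_date()` helper is hypothetical.

```r
library(dplyr)

df <- tibble(
  numeric_val = c(1, NA, 5),
  date = as.Date(c("2019-01-01", "2019-03-05", "2020-10-12")),
  char_val = c("", " val ", "-")
)

# Hypothetical predicate for Date columns
is_date <- function(x) inherits(x, "Date")

df_ordered <- df %>%
  # Step 1: alphabetical order
  select(all_of(sort(names(df)))) %>%
  # Step 2: move character columns, then date columns, to the front
  relocate(where(is.character), where(is_date))

names(df_ordered)
# "char_val" "date" "numeric_val"
```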
clean_frame is a wrapper function that applies all cleaning operations to a data frame using sensible defaults. I encourage you to build your own clean_frame function to suit your needs.
sample_table %>%
  clean_frame()
DATE | NUMERIC_VAL | CHAR_VAL |
---|---|---|
20190101 | 1 | NA |
20190305 | NA | val |
20201012 | 5 | NA |
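
As a starting point for a custom wrapper, here is a minimal sketch built from plain dplyr steps so that it runs on its own. The function name `my_clean_frame`, its `na_strings` argument, and the step order are all assumptions; the corresponding framecleaner functions could be substituted for each step.

```r
library(dplyr)

# Hypothetical wrapper; steps and defaults are illustrative, not framecleaner's
my_clean_frame <- function(df, na_strings = c("-", "", "null")) {
  df %>%
    # Trim first, so padded placeholders such as " - " are caught next
    mutate(across(where(is.character), trimws)) %>%
    # Then turn placeholder strings into NA
    mutate(across(
      where(is.character),
      ~ replace(.x, .x %in% na_strings, NA_character_)
    )) %>%
    # Upper-case the column names, as in the output above
    rename_with(toupper)
}

messy <- tibble(id = c("a ", " b", "c"), note = c(" - ", "null", " ok "))
cleaned <- my_clean_frame(messy)
```

Step order matters in a wrapper like this: trimming before the NA replacement means a padded `" - "` is treated as missing, which it would not be if the steps were reversed.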
Use tidyselect to fill NAs with a single value.
sample_table %>%
  fill_na()
date | numeric_val | char_val |
---|---|---|
20190101 | 1 | |
20190305 | 0 | val |
20201012 | 5 | - |
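
The role of tidyselect here is to choose which columns get filled. A rough dplyr equivalent, shown only to illustrate the selection idea and not fill_na's own interface, uses `across()` with `coalesce()`. The `other_val` column and the result names are made up for the example.

```r
library(dplyr)

df <- tibble(
  numeric_val = c(1, NA, 5),
  other_val = c(NA, 2, 3),
  char_val = c("", " val ", "-")
)

# Fill NAs with 0 in every numeric column
df_all <- df %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)))

# Fill NAs with 0 in a single named column only
df_one <- df %>%
  mutate(across(numeric_val, ~ coalesce(.x, 0)))
```

In `df_one` the missing value in `other_val` is left as NA, since only `numeric_val` was selected.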