
Type: Package
Title: Inspect and Clean Subject-Generated ID Codes and Related Data
Version: 1.0.0
Maintainer: Annemarie Pläschke <anneplaeschke@gmail.com>
Description: Makes data wrangling with ID-related aspects more comfortable. Provides functions that make it easy to inspect various subject-generated ID codes (SGIC) for plausibility. Also helps with inspecting other common identifiers, ensuring that your data stays clean and reliable.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Suggests: knitr, rmarkdown, spelling, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Depends: R (≥ 2.10)
Imports: dplyr, tibble, rlang
VignetteBuilder: knitr
URL: https://kuuuwe.github.io/trustmebro/, https://github.com/kuuuwe/trustmebro
Language: en-US
BugReports: https://github.com/kuuuwe/trustmebro/issues
NeedsCompilation: no
Packaged: 2025-05-07 20:50:26 UTC; Anwender
Author: Annemarie Pläschke [aut, cre, cph], Tobias Brändle [aut]
Repository: CRAN
Date/Publication: 2025-05-09 14:10:02 UTC

Identify duplicate cases

Description

Identify duplicate cases in a data frame or tibble based on specific variables. A logical column 'has_dupes' is added that indicates whether a row shares its combination of values in the provided variables with at least one other row.

Usage

find_dupes(data, ...)

Arguments

data

A data frame or tibble

...

Variable names to check for duplicates

Value

The original data frame or tibble with an additional logical column 'has_dupes' which is 'TRUE' for rows that have duplicates based on the specified variables and 'FALSE' otherwise.

Examples

# Example data
print(sailor_students)

# Find duplicate cases based on 'sgic', 'school' and 'class'
sailor_students_dupes <- find_dupes(sailor_students, sgic, school, class)

# Rows where 'has_dupes' is `TRUE` indicate duplicates based on the provided columns
print(sailor_students_dupes)
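To show what the 'has_dupes' flag amounts to, here is a minimal base-R sketch that computes an equivalent flag with duplicated(). It is an illustration only. flag_dupes_base() is a hypothetical helper and not part of trustmebro, and the package may treat details such as NA values differently.

```r
# Hypothetical helper, not part of trustmebro: flags every row whose
# combination of values in 'cols' occurs more than once in 'data'.
flag_dupes_base <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  # duplicated() marks repeats after the first occurrence; combining it with
  # fromLast = TRUE also marks the first occurrence of each repeated key.
  data$has_dupes <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data
}

students <- data.frame(
  sgic   = c("AB12", "CD34", "AB12"),
  school = c("S1", "S1", "S1")
)
flag_dupes_base(students, c("sgic", "school"))$has_dupes
# [1]  TRUE FALSE  TRUE
```

Using both directions of duplicated() flags every member of a duplicate group, matching the documented 'TRUE for rows that have duplicates'. duplicated() alone would leave the first occurrence marked FALSE.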

Inspect birthday-component of a string

Description

Check whether a given string contains exactly one two-digit number that represents a valid day of the month (between 01 and 31). The string is assumed to be a code (e.g., an SGIC), which may include letters and digits.

Usage

inspect_birthday(code)

Arguments

code

A character string containing a SGIC or similar code that may include a numeric birthday-component.

Value

A logical value: 'TRUE' if the string contains exactly one valid birthday-component (between 01 and 31), otherwise 'FALSE'.

Examples

inspect_birthday("DEF66") # FALSE - 66 is not a valid day
inspect_birthday("GHI02") # TRUE - 02 is a valid day
inspect_birthday("ABC12DEF34") # FALSE - Multiple numeric components
inspect_birthday("XYZ") # FALSE - No numeric component
inspect_birthday("JKL31") # TRUE - 31 is a valid day
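The rule described above can be sketched in base R. This assumes that a "numeric component" means a maximal run of digits. The sketch extracts all digit runs, requires exactly one run of exactly two digits, and checks that its value lies between 1 and 31. check_birthday_sketch() is hypothetical and may differ from inspect_birthday() in edge cases such as NA input.

```r
# Hypothetical sketch of the documented rule, not the package's code.
check_birthday_sketch <- function(code) {
  # All maximal runs of digits, e.g. "ABC12DEF34" -> c("12", "34")
  runs <- regmatches(code, gregexpr("[0-9]+", code))[[1]]
  # Exactly one numeric component, and it must have exactly two digits
  if (length(runs) != 1 || nchar(runs) != 2) return(FALSE)
  day <- as.integer(runs)
  day >= 1 && day <= 31
}

check_birthday_sketch("GHI02")       # TRUE
check_birthday_sketch("DEF66")       # FALSE
check_birthday_sketch("ABC12DEF34")  # FALSE
check_birthday_sketch("XYZ")         # FALSE
```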

Inspect birthday- and birthmonth-component of a string

Description

Check whether a given string contains exactly one four-digit number representing a valid combination of a day (birthday) and a month (birth month). The numeric component is interpreted in either "DDMM" (day-month) or "MMDD" (month-day) format, depending on 'format'. The string is assumed to be a code (e.g., an SGIC), which may include letters and digits.

Usage

inspect_birthdaymonth(code, format = "DDMM")

Arguments

code

A character string containing a SGIC or similar code that may include a numeric component representing a birthday and birth month.

format

A string specifying the order of the day and month components in 'code'. Use "DDMM" for day-month format and "MMDD" for month-day format. Default is "DDMM".

Value

A logical value: 'TRUE' if the string contains exactly one valid numeric component that forms a valid birthday (day and month), otherwise 'FALSE'.

Examples

inspect_birthdaymonth("DEF2802") # TRUE - 28th of February is a valid date
inspect_birthdaymonth("GHI3002") # FALSE - 30th of February is invalid
inspect_birthdaymonth("XYZ3112") # TRUE - 31st of December is valid
inspect_birthdaymonth("18DEF02") # FALSE - Multiple numeric components
inspect_birthdaymonth("XYZ") # FALSE - No numeric components
inspect_birthdaymonth("ABC1231", format = "MMDD") # TRUE - December 31st is valid
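The following base-R sketch shows one way to implement the day-month check. It splits the single four-digit run according to 'format' and compares the day against the length of the month. This is an assumption-laden illustration, not the package's code. In particular, it accepts 29 February because the code carries no year, and the documentation above does not say how inspect_birthdaymonth() handles that case. check_daymonth_sketch() is hypothetical.

```r
# Hypothetical sketch, not the package's implementation.
# Assumption: 29 February counts as valid because no year is known.
check_daymonth_sketch <- function(code, format = "DDMM") {
  runs <- regmatches(code, gregexpr("[0-9]+", code))[[1]]
  # Exactly one numeric component of exactly four digits
  if (length(runs) != 1 || nchar(runs) != 4) return(FALSE)
  first  <- as.integer(substr(runs, 1, 2))
  second <- as.integer(substr(runs, 3, 4))
  if (format == "DDMM") {
    day <- first; month <- second
  } else if (format == "MMDD") {
    month <- first; day <- second
  } else {
    stop("format must be \"DDMM\" or \"MMDD\"")
  }
  days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  # && short-circuits, so days_in_month is only indexed with 1..12
  month >= 1 && month <= 12 && day >= 1 && day <= days_in_month[month]
}

check_daymonth_sketch("DEF2802")                   # TRUE
check_daymonth_sketch("GHI3002")                   # FALSE
check_daymonth_sketch("ABC1231", format = "MMDD")  # TRUE
```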

Inspect birthmonth-component of a string

Description

Check whether a given string contains exactly one two-digit number that represents a valid month of the year (between 01 and 12). The string is assumed to be a code (e.g., an SGIC), which may include letters and digits.

Usage

inspect_birthmonth(code)

Arguments

code

A character string containing a SGIC or similar code that may include a numeric birth month-component.

Value

A logical value: 'TRUE' if the string contains exactly one valid birth month-component (between 01 and 12), otherwise 'FALSE'.

Examples

inspect_birthmonth("DEF66") # FALSE - 66 is not a valid month
inspect_birthmonth("GHI02") # TRUE - 02 (February) is a valid month
inspect_birthmonth("ABC12DEF10") # FALSE - Multiple numeric components
inspect_birthmonth("XYZ") # FALSE - No numeric component
inspect_birthmonth("JKL11") # TRUE - 11 (November) is a valid month
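The Arguments section describes 'code' as a single character string, so checking a whole column needs some form of iteration. The sketch below shows one way to do this with vapply(). It uses a hypothetical check_birthmonth_sketch() that reimplements the month rule in base R and is not the package's code.

```r
# Hypothetical, vectorised sketch (not the package's code): returns one
# TRUE/FALSE per element of 'codes'.
check_birthmonth_sketch <- function(codes) {
  vapply(codes, function(code) {
    runs <- regmatches(code, gregexpr("[0-9]+", code))[[1]]
    # Exactly one two-digit numeric component with a value from 1 to 12
    length(runs) == 1 && nchar(runs) == 2 &&
      as.integer(runs) >= 1 && as.integer(runs) <= 12
  }, logical(1), USE.NAMES = FALSE)
}

check_birthmonth_sketch(c("DEF66", "GHI02", "ABC12DEF10", "XYZ", "JKL11"))
# [1] FALSE  TRUE FALSE FALSE  TRUE
```

vapply() with logical(1) guarantees a plain logical vector of the same length as the input, which can be stored directly as a new column.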

Inspect if a string matches an expected pattern

Description

Check whether a given string matches a specified pattern using regular expressions (regex). The string is assumed to be a code (e.g., an SGIC) that should follow a predefined format.

Usage

inspect_characterid(code, pattern)

Arguments

code

A character string containing a SGIC or similar code that should follow a predefined format.

pattern

A character string specifying the expected pattern using regular expressions (regex). The pattern defines the format 'code' should match.

Value

A logical value: 'TRUE' if the code matches the expected pattern, otherwise 'FALSE'.

Examples

inspect_characterid("ABC1234", "^[A-Za-z]{3}[0-9]{4}$") #TRUE - Matches the pattern
inspect_characterid("12DBG45FG", "^[A-Za-z]{3}[0-9]{4}$") #FALSE - Does not match the pattern
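When writing 'pattern', the anchors matter. The base-R snippet below uses grepl(), which reports a match if the pattern occurs anywhere in the string. Without ^ and $, a code with extra characters around a valid core would pass. Whether inspect_characterid() uses grepl() internally is not stated here; the snippet only illustrates the regex behaviour.

```r
codes <- c("ABC1234", "12DBG45FG", "xABC1234y")

# Anchored: the whole string must be 3 letters followed by 4 digits
grepl("^[A-Za-z]{3}[0-9]{4}$", codes)
# [1]  TRUE FALSE FALSE

# Unanchored: any substring of that shape is enough
grepl("[A-Za-z]{3}[0-9]{4}", codes)
# [1]  TRUE FALSE  TRUE
```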

Inspect if a number has the expected length

Description

Check whether a given numeric value has the expected number of digits.

Usage

inspect_numberid(number, expected_length)

Arguments

number

A numeric value.

expected_length

An integer specifying the expected number of digits.

Value

A logical value: 'TRUE' if 'number' has the expected length and consists only of digits, otherwise 'FALSE'.

Examples

inspect_numberid(12345, 5)  # TRUE - 5 digits
inspect_numberid(1234, 5)    # FALSE - 4 digits
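Counting the digits of a number in R has two pitfalls. as.character(100000) gives "1e+05", and leading zeros are lost because the literal 01234 is stored as 1234. The base-R sketch below avoids the first pitfall with format(..., scientific = FALSE) and rejects anything that is not purely digits. check_numberid_sketch() is hypothetical and is not how inspect_numberid() is necessarily implemented.

```r
# Hypothetical sketch, not the package's code.
check_numberid_sketch <- function(number, expected_length) {
  # Render without scientific notation and without padding
  digits <- format(number, scientific = FALSE, trim = TRUE)
  # Only digits (no sign, no decimal point) and the expected count
  grepl("^[0-9]+$", digits) && nchar(digits) == expected_length
}

check_numberid_sketch(12345, 5)   # TRUE
check_numberid_sketch(1234, 5)    # FALSE
check_numberid_sketch(100000, 6)  # TRUE
check_numberid_sketch(12.5, 3)    # FALSE: "12.5" contains a decimal point
check_numberid_sketch(01234, 5)   # FALSE: the literal 01234 is stored as 1234
```

If IDs can legitimately start with zero, storing them as character strings rather than numbers avoids the second pitfall entirely.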

Inspect if a value is in a recode map

Description

Check whether a given value is present as a key (i.e., a name) in a specified recode map. This is useful for validating inputs against a set of predefined categories or labels.

Usage

inspect_valinvec(value, recode_map)

Arguments

value

A single value to inspect, which is checked against the keys of a recode map.

recode_map

A named vector where the names represent the keys to check against. The values of the vector are ignored.

Value

A logical value: 'TRUE' if the 'value' is a key in the 'recode_map', otherwise 'FALSE'.

Examples

recode_map <- c(male = "M", female = "F")
inspect_valinvec("female", recode_map) # TRUE - "female" is a key in the recode map
inspect_valinvec("other", recode_map) # FALSE - "other" is not a key in the recode map
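Because only the names of 'recode_map' matter, the documented behaviour corresponds to a membership test against names(). The base-R lines below illustrate this; they are not taken from the package source. They also show that name matching is case-sensitive, which is worth keeping in mind when checking free-text input.

```r
recode_map <- c(male = "M", female = "F")

"female" %in% names(recode_map)  # TRUE
"other"  %in% names(recode_map)  # FALSE
"Female" %in% names(recode_map)  # FALSE: names are matched case-sensitively
"F"      %in% names(recode_map)  # FALSE: the values of the map are ignored
```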

Purge strings in a data frame

Description

Clean specified character columns in a data frame or tibble by replacing non-alphanumeric characters with a specified character (default "#"). NA values are also replaced, and additional characters can be exempted from replacement via 'keep'. The resulting strings are converted to uppercase.

Usage

purge_string(data, ..., replacement = "#", keep = "")

Arguments

data

A data frame or tibble containing columns to be cleaned.

...

Variables to clean. If none are provided, all character columns will be processed.

replacement

A character string used to replace unwanted characters and empty strings. Default is "#".

keep

A character string containing any additional characters that should be retained in the cleaned strings.

Value

A data frame or tibble with the specified character columns cleaned and modified as per the given parameters.

Examples

# Example data
print(sailor_students)

# Clean the 'sgic', 'school', 'class' and 'gender' columns, replacing unwanted characters with "#" and retaining "-"
sailor_students_cleaned <-
  purge_string(sailor_students, sgic, school, class, gender, keep = "-")

# Tibble with cleaned 'sgic', 'school', 'class' and 'gender' columns
print(sailor_students_cleaned)
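The cleaning rule can be sketched for a single character vector in base R. Checking each character against an allow-list avoids having to escape the 'keep' characters inside a regular expression. This is a hypothetical illustration, not the package's code. It assumes that NA and empty strings become 'replacement' and treats only ASCII letters and digits as alphanumeric. purge_string() may behave differently, for example with accented letters.

```r
# Hypothetical sketch of the cleaning rule for one character vector.
purge_sketch <- function(x, replacement = "#", keep = "") {
  # Assumption: only ASCII letters and digits count as alphanumeric
  allowed <- c(LETTERS, letters, 0:9, strsplit(keep, "")[[1]])
  vapply(x, function(s) {
    # Assumption: NA and empty strings are replaced as a whole
    if (is.na(s) || s == "") return(replacement)
    chars <- strsplit(s, "")[[1]]
    chars[!chars %in% allowed] <- replacement
    toupper(paste(chars, collapse = ""))
  }, character(1), USE.NAMES = FALSE)
}

purge_sketch(c("ab-12!", NA, "", "x y"), keep = "-")
# [1] "AB-12#" "#"      "#"      "X#Y"
```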

Recode a variable

Description

Recode a specified variable in a data frame or tibble based on a provided recode map. If the recode map is empty, the original variable is retained under a new name.

Usage

recode_valinvec(data, var, recode_map, new_var)

Arguments

data

A data frame or tibble.

var

A variable to be recoded.

recode_map

A named vector specifying the recode map.

new_var

Name of the new variable holding the recoded values.

Value

A data frame or tibble with the new variable added.

Examples

# Example data
print(sailor_students)

# Define a recode map for gender
recode_map_gender <- c("Female" = "F", "Male" = "M", "Other" = "X")

# Recode gender
sailor_students_recoded <-
  recode_valinvec(sailor_students, gender, recode_map_gender, recode_gender)

# A tibble with a recoded gender variable
print(sailor_students_recoded)
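A recode map stored as a named vector can be applied in base R by indexing it with the original values. The sketch below assumes, as with inspect_valinvec(), that the names are the values to match and the elements are the new values. The package's actual direction and its handling of unmatched values are not documented here. In this sketch, unmatched values become NA.

```r
# Hypothetical sketch of a lookup-based recode, not the package's code.
recode_map_gender <- c("Female" = "F", "Male" = "M", "Other" = "X")
gender <- c("Male", "Female", "Other", "unknown")

# Index the map by name; unname() drops the names carried over from the map
recoded <- unname(recode_map_gender[gender])
recoded
# [1] "M" "F" "X" NA
```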

Key data on students from the Sailor Moon universe

Description

A fictional key data set.

Usage

sailor_keys

Format

'sailor_keys' A tibble with 12 rows and the following columns:

schoolyear

school year

guid

hexadecimal ID number

name, birthday, sex

student information

school, schoolnumber, class, grade_level

school information

sgic1, sgic2, sgic3

subject-generated ID codes


Assessment data on students from the Sailor Moon universe

Description

A fictional assessment data set.

Usage

sailor_students

Format

'sailor_students' A tibble with 12 rows and 6 columns:

sgic

Subject-generated ID code

school

School number

class

class designation

gender

gender

testscore_language, testscore_calculus

testscores
