Type: Package
Title: Survey Data Cleaning, Weighting and Analysis
Version: 1.0.3
Description: Provides utilities for cleaning survey data, computing weights, and performing descriptive statistical analysis. Methods follow Lohr (2019, ISBN:978-0367272454) "Sampling: Design and Analysis" and Lumley (2010) <doi:10.1002/9780470580066>.
License: GPL-3
Encoding: UTF-8
LazyData: true
Depends: R (>= 4.0.0)
Imports: dplyr, ggplot2, rlang, stats
Suggests: knitr, rmarkdown, markdown, testthat
VignetteBuilder: knitr
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2026-01-19 05:21:43 UTC; HP Computers
Author: Muhammad Ali [aut, cre]
Maintainer: Muhammad Ali <aliawan1170@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-22 21:20:02 UTC

Apply survey weights to data

Description

This function applies survey weights by creating a weighted version of the dataset. The weights are normalized to sum to the sample size for computational stability.

Usage

apply_weights(data, weight_col)

Arguments

data

A data.frame containing survey data

weight_col

Character string specifying column name containing weights

Value

A data.frame with normalized weights

Examples

data <- data.frame(age = c(25, 30, 35), weight = c(1.2, 0.8, 1.0))
weighted_data <- apply_weights(data, "weight")
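
The normalization described above can be illustrated in base R. This is only a sketch, assuming that "normalized to sum to the sample size" means rescaling the weights so that their sum equals the number of rows; `normalize_weights` is a hypothetical helper and not the package's own code.

```r
# Hypothetical helper (not part of the package): rescale weights so that
# they sum to the number of observations. Assumes no missing weights.
normalize_weights <- function(w) {
  w * length(w) / sum(w)
}

w <- c(2, 1, 1)
normalized <- normalize_weights(w)
normalized       # 1.50 0.75 0.75
sum(normalized)  # 3, the sample size
```

Rescaling leaves the relative sizes of the weights unchanged, so weighted means and proportions are unaffected; only weighted totals change scale.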

Clean missing values in specified column

Description

This function replaces missing values in a column using the specified imputation method. Mean, median, and mode imputation are supported for numeric variables.

Usage

clean_missing(data, col, method = c("mean", "median", "mode"))

Arguments

data

A data.frame containing survey data

col

Character string specifying column name to clean

method

Character string specifying imputation method ("mean", "median", or "mode")

Value

A data.frame with missing values imputed

Examples

data <- data.frame(age = c(25, NA, 30, NA, 35))
clean_data <- clean_missing(data, "age", method = "mean")
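
The three imputation methods can be sketched on a plain numeric vector as follows. The helpers `stat_mode` and `impute_vector` are hypothetical names used for illustration, and breaking ties in the mode by taking the first value encountered is an assumption, not documented package behaviour.

```r
# Hypothetical helper: most frequent non-missing value (first one wins ties).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: replace NA in a numeric vector with a summary statistic.
impute_vector <- function(x, method = c("mean", "median", "mode")) {
  method <- match.arg(method)
  fill <- switch(method,
    mean   = mean(x, na.rm = TRUE),
    median = stats::median(x, na.rm = TRUE),
    mode   = stat_mode(x)
  )
  x[is.na(x)] <- fill
  x
}

age <- c(25, NA, 30, NA, 35)
impute_vector(age, "mean")            # NA entries become 30
impute_vector(c(1, 2, 2, NA), "mode") # NA entry becomes 2
```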

Generate cross-tabulation table with chi-square test

Description

This function creates a cross-tabulation of two categorical variables and performs a chi-square test of independence. Survey weights can optionally be incorporated.

Usage

cross_tabulation(data, col1, col2, weight_col = NULL)

Arguments

data

A data.frame containing survey data

col1

Character string specifying first categorical variable

col2

Character string specifying second categorical variable

weight_col

Character string specifying column name containing weights (optional)

Value

A list containing cross-tabulation and chi-square test results

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), 
                   education = c("HS", "College", "HS", "College"))
cross_tab <- cross_tabulation(data, "gender", "education")
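
For comparison, an unweighted and a weighted cross-tabulation can be built with base R and the stats package alone. This sketch does not show how `cross_tabulation` works internally; the data frame `df` and its `weight` column are made up for the example.

```r
df <- data.frame(
  gender    = c("M", "F", "M", "F", "M", "F"),
  education = c("HS", "College", "College", "HS", "HS", "College"),
  weight    = c(1, 1.2, 0.8, 1.1, 0.9, 1.0)
)

# Unweighted cell counts.
unweighted <- table(df$gender, df$education)

# Weighted cell totals: xtabs() sums the left-hand side within each cell.
weighted <- stats::xtabs(weight ~ gender + education, data = df)

# Chi-square test of independence on the unweighted counts. With cells this
# small R warns that the approximation may be poor, so the warning is muted.
test <- suppressWarnings(stats::chisq.test(unweighted))
test$p.value
```

A plain chi-square test applied to weighted totals does not account for the survey design, so results from weighted tables should be read with that limitation in mind.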

Generate comprehensive survey description

Description

This function provides a comprehensive description of survey data including sample size, variable types, missing value patterns, and basic statistics. Can incorporate survey weights if provided.

Usage

describe_survey(data, weight_col = NULL)

Arguments

data

A data.frame containing survey data

weight_col

Character string specifying column name containing weights (optional)

Value

A list containing descriptive statistics

Examples

data <- data.frame(
  age = c(25, 30, 35),
  gender = c("M", "F", "M"),
  weight = c(1.2, 0.8, 1.0)
)
desc <- describe_survey(data)
desc_weighted <- describe_survey(data, "weight")

Example Survey Dataset

Description

A small example dataset used to demonstrate SurveyStat functions.

Usage

example_survey

Format

A data frame with 10 rows and 5 variables:

Age

Numeric age of respondent

Gender

Gender of respondent (Male/Female)

Education

Education level (High School/Bachelor/Graduate)

Income

Numeric income value

Weight

Survey weight

Source

Simulated data for demonstration purposes


Generate frequency table for categorical variable

Description

This function creates a frequency table for a categorical variable, optionally incorporating survey weights.

Usage

frequency_table(data, col, weight_col = NULL)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for categorical variable

weight_col

Character string specifying column name containing weights (optional)

Value

A data.frame with frequency statistics

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), weight = c(1, 1.2, 0.8, 1.1))
freq_table <- frequency_table(data, "gender")
weighted_freq <- frequency_table(data, "gender", "weight")
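
A weighted frequency table amounts to summing the weights within each category. The base R sketch below illustrates the idea; the output column names (`category`, `weighted_n`, `percent`) are assumptions chosen for the example and may differ from what `frequency_table` returns.

```r
df <- data.frame(gender = c("M", "F", "M", "F"), weight = c(1, 1.2, 0.8, 1.1))

# Sum the weights within each category; tapply() returns a named result.
weighted_n <- tapply(df$weight, df$gender, sum)

freq <- data.frame(
  category   = names(weighted_n),
  weighted_n = as.numeric(weighted_n),
  percent    = 100 * as.numeric(weighted_n) / sum(weighted_n)
)
freq
```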

Declare global variables to suppress NOTES in CRAN checks

Description

This file declares variables that are used in non-standard evaluation contexts (dplyr pipelines, ggplot aesthetics) to avoid "no visible binding" notes during CRAN checks.


Create publication-quality box plot

Description

This function creates a clean, publication-quality box plot for numeric variables, optionally grouped by a categorical variable.

Usage

plot_boxplot(data, col, group_col = NULL, add_points = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for numeric variable

group_col

Character string specifying column name for grouping variable (optional)

add_points

Logical; whether to add individual data points (default: TRUE)

Value

A ggplot object

Examples

data <- data.frame(age = c(25, 30, 35, 40, 45), gender = c("M", "F", "M", "F", "M"))
box_plot <- plot_boxplot(data, "age")
grouped_box <- plot_boxplot(data, "age", "gender")

Create publication-quality histogram

Description

This function creates a clean, publication-quality histogram for numeric variables using ggplot2 with minimal theme and appropriate statistical overlays.

Usage

plot_histogram(data, col, bins = 30, add_density = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for numeric variable

bins

Number of bins for histogram (default: 30)

add_density

Logical; whether to add a density curve (default: TRUE)

Value

A ggplot object

Examples

data <- data.frame(age = rnorm(100, 35, 10))
hist_plot <- plot_histogram(data, "age")
print(hist_plot)

Create weighted bar plot for categorical variables

Description

This function creates a bar plot for categorical variables, optionally using survey weights to show weighted frequencies.

Usage

plot_weighted_bar(data, col, weight_col = NULL, show_percentages = TRUE)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name for categorical variable

weight_col

Character string specifying column name containing weights (optional)

show_percentages

Logical; whether to show percentage labels (default: TRUE)

Value

A ggplot object

Examples

data <- data.frame(gender = c("M", "F", "M", "F"), weight = c(1, 1.2, 0.8, 1.1))
bar_plot <- plot_weighted_bar(data, "gender")
weighted_bar <- plot_weighted_bar(data, "gender", "weight")

Rake survey weights to match population targets

Description

This function implements simple raking (iterative proportional fitting) to adjust survey weights to match known population marginal totals. Assumes two-dimensional raking for simplicity.

Usage

rake_weights(data, population_targets, weight_col = "weight")

Arguments

data

A data.frame containing survey data

population_targets

Named list with population totals for each variable

weight_col

Character string specifying initial weight column name

Value

A data.frame with raked weights

Examples

# Assuming we have gender and education population totals
targets <- list(
  gender = c(Male = 1000000, Female = 1050000),
  education = c(HighSchool = 800000, Bachelor = 900000, Graduate = 350000)
)
data <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male"), 
  education = c("HighSchool", "Bachelor", "Bachelor", "HighSchool", "Graduate"),
  weight = c(1, 1, 1, 1, 1)
)
raked_data <- rake_weights(data, targets, "weight")
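
Iterative proportional fitting can be sketched in a few lines of base R. The function `rake_sketch` and its `max_iter` and `tol` arguments are hypothetical and are not the package's implementation. The sketch assumes that every name in the targets matches a category present in the data and that the target totals agree across variables.

```r
# Hypothetical helper: adjust weights `w` until the weighted totals of each
# variable named in `targets` match the supplied population totals.
rake_sketch <- function(data, targets, w, max_iter = 100, tol = 1e-8) {
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (var in names(targets)) {
      groups  <- as.character(data[[var]])
      # Current weighted total per category, as a named numeric vector.
      current <- vapply(split(w, groups), sum, numeric(1))
      # Scaling factor that moves each category onto its target.
      factors <- targets[[var]][names(current)] / current
      w <- w * unname(factors[groups])
      max_change <- max(max_change, abs(factors - 1))
    }
    # Stop once a full pass over all variables barely changes the weights.
    if (max_change < tol) break
  }
  w
}

targets <- list(
  gender = c(Male = 1000000, Female = 1050000),
  education = c(HighSchool = 800000, Bachelor = 900000, Graduate = 350000)
)
df <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male"),
  education = c("HighSchool", "Bachelor", "Bachelor", "HighSchool", "Graduate"),
  weight = c(1, 1, 1, 1, 1)
)
raked <- rake_sketch(df, targets, df$weight)
tapply(raked, df$gender, sum)     # close to the gender targets
tapply(raked, df$education, sum)  # close to the education targets
```

Each pass rescales the weights to match one margin at a time, which disturbs the other margin slightly; repeating the passes shrinks that disturbance until both margins agree with the targets.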

Remove duplicate rows from survey data

Description

This function identifies and removes duplicate rows, comparing all columns. The first occurrence of each duplicated row is kept.

Usage

remove_duplicates(data)

Arguments

data

A data.frame containing survey data

Value

A data.frame with duplicates removed

Examples

data <- data.frame(id = c(1, 2, 2, 3), age = c(25, 30, 30, 35))
clean_data <- remove_duplicates(data)
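
The same behaviour can be reproduced with base R's `duplicated()`, shown here as a rough equivalent rather than as the package's implementation.

```r
df <- data.frame(id = c(1, 2, 2, 3), age = c(25, 30, 30, 35))

# duplicated() flags the second and later copies of a row, so negating it
# keeps the first occurrence of each row.
deduped <- df[!duplicated(df), , drop = FALSE]
nrow(deduped)  # 3
```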

Standardize categorical values

Description

This function standardizes categorical variables by mapping values to standardized categories. Useful for consolidating different representations of the same category.

Usage

standardize_categories(data, col, mapping)

Arguments

data

A data.frame containing survey data

col

Character string specifying column name to standardize

mapping

Named list or vector mapping old values to new values

Value

A data.frame with standardized categories

Examples

data <- data.frame(gender = c("M", "Male", "F", "Female", "m"))
mapping <- list("M" = "Male", "Male" = "Male", "F" = "Female", "Female" = "Female", "m" = "Male")
clean_data <- standardize_categories(data, "gender", mapping)
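
Recoding with a mapping is essentially a lookup by name. The sketch below uses a named character vector; keeping unmapped values unchanged (here the made-up value "Other") is an assumption for illustration, and `standardize_categories` may treat such values differently.

```r
gender  <- c("M", "Male", "F", "Female", "m", "Other")
mapping <- c(M = "Male", Male = "Male", F = "Female", Female = "Female", m = "Male")

# Look each value up by name; values absent from the mapping come back as NA.
recoded <- unname(mapping[gender])

# Assumption: unmapped values are kept as they were rather than set to NA.
unmapped <- is.na(recoded)
recoded[unmapped] <- gender[unmapped]
recoded  # "Male" "Male" "Female" "Female" "Male" "Other"
```

Name matching is case-sensitive, which is why "M" and "m" need separate entries in the mapping.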

Calculate weighted mean

Description

This function calculates the weighted mean of a numeric variable using the standard formula sum(x * w) / sum(w), where x is the target variable and w is the weight.

Usage

weighted_mean(data, target_col, weight_col)

Arguments

data

A data.frame containing survey data

target_col

Character string specifying column name for target variable

weight_col

Character string specifying column name containing weights

Value

Numeric weighted mean

Examples

data <- data.frame(income = c(50000, 75000, 100000), weight = c(1.2, 0.8, 1.0))
weighted_income <- weighted_mean(data, "income", "weight")
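
The formula can be checked by hand against `stats::weighted.mean()`, which computes the same quantity. This works on plain vectors and is independent of the package.

```r
income <- c(50000, 75000, 100000)
weight <- c(1.2, 0.8, 1.0)

# Weighted mean computed directly from the formula.
manual <- sum(income * weight) / sum(weight)

# The same quantity from the stats package.
builtin <- stats::weighted.mean(income, weight)

all.equal(manual, builtin)  # TRUE
```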

Calculate weighted total

Description

This function calculates the weighted total of a numeric variable. Useful for estimating population totals from survey data.

Usage

weighted_total(data, target_col, weight_col)

Arguments

data

A data.frame containing survey data

target_col

Character string specifying column name for target variable

weight_col

Character string specifying column name containing weights

Value

Numeric weighted total

Examples

data <- data.frame(income = c(50000, 75000, 100000), weight = c(1000, 800, 1200))
total_income <- weighted_total(data, "income", "weight")
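
A weighted total is the sum of each value multiplied by its weight. In the sketch below each weight is read as the number of population units a respondent represents, which is an interpretation assumed for the example.

```r
income <- c(50000, 75000, 100000)
weight <- c(1000, 800, 1200)

# Each respondent contributes value * weight to the estimated total.
total <- sum(income * weight)
total  # 2.3e+08
```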
