Title: Efficient Data Filtering and Aggregation Using Grep
Version: 0.1.0
Description: Provides an interface to the system-level 'grep' utility for efficiently reading, filtering, and aggregating data from multiple flat files. By pre-filtering data at the command line before it enters the R environment, the package reduces memory overhead and improves ingestion speed. Includes functions for counting records across large file systems and supports recursive directory searching.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3
Suggests: ggplot2, knitr, rmarkdown
VignetteBuilder: knitr
Imports: data.table, methods
NeedsCompilation: no
Packaged: 2026-01-20 12:59:50 UTC; akshat
Author: David Shilane [aut], Atharv Raskar [aut], Akshat Maurya [aut, cre]
Maintainer: Akshat Maurya <codingmaster902@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-23 21:10:02 UTC

Build grep command string

Description

Constructs a safe and properly formatted grep command string for system execution. This function handles input sanitization by utilizing R's internal shell quoting mechanism, ensuring compatibility across different operating systems.

Usage

build_grep_cmd(pattern, files, options = "", fixed = FALSE)

Arguments

pattern

Character vector of patterns to search for.

files

Character vector of file paths to search in.

options

Character string containing grep flags (e.g., "-i", "-v").

fixed

Logical; if TRUE, grep is told to treat patterns as fixed strings.

Value

A properly formatted command string ready for system execution.
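
The quoting step described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the helper name sketch_grep_cmd, the use of "-e" before each pattern, and the mapping of fixed = TRUE to the "-F" flag are assumptions made for illustration. Only shQuote() is taken from the description ("R's internal shell quoting mechanism"), and the quoting type is pinned to "sh" so that the output is the same on every platform.

```r
# Hypothetical helper (not part of the package): assembles a grep command
# string and quotes every user-supplied value with base R's shQuote().
sketch_grep_cmd <- function(pattern, files, options = "", fixed = FALSE) {
  # Assumption: fixed = TRUE is expressed with grep's -F flag.
  flags <- trimws(paste(options, if (fixed) "-F" else ""))
  # Assumption: each pattern is passed through its own -e argument.
  pattern_args <- paste("-e", shQuote(pattern, type = "sh"), collapse = " ")
  file_args <- paste(shQuote(files, type = "sh"), collapse = " ")
  parts <- c("grep", flags, pattern_args, file_args)
  # Drop empty pieces so that no double spaces appear in the command.
  paste(parts[nzchar(parts)], collapse = " ")
}

cmd <- sketch_grep_cmd("3.94", c("data 1.csv", "data2.csv"),
                       options = "-i", fixed = TRUE)
print(cmd)
```

Quoting each file path separately keeps a path containing a space, such as "data 1.csv", as a single shell argument.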


grep_count: Efficiently count the number of relevant records from one or more files using grep

Description

Counts the records that match a pattern in one or more files by delegating the search to grep, so the files do not have to be loaded into R first.

Usage

grep_count(
  files = NULL,
  path = NULL,
  file_pattern = NULL,
  pattern = "",
  invert = FALSE,
  ignore_case = FALSE,
  fixed = FALSE,
  recursive = FALSE,
  word_match = FALSE,
  only_matching = FALSE,
  skip = 0,
  header = TRUE,
  include_filename = FALSE,
  show_cmd = FALSE,
  show_progress = FALSE,
  ...
)

Arguments

files

Character vector of file paths to read.

path

Optional. Directory path to search for files.

file_pattern

Optional. A pattern to filter filenames when using the path argument. Passed to list.files.

pattern

Pattern to search for within files (passed to grep).

invert

Logical; if TRUE, return non-matching lines.

ignore_case

Logical; if TRUE, perform case-insensitive matching (default: FALSE).

fixed

Logical; if TRUE, pattern is a fixed string, not a regular expression.

recursive

Logical; if TRUE, search recursively through directories.

word_match

Logical; if TRUE, match only whole words.

only_matching

Logical; if TRUE, return only the matching part of the lines.

skip

Integer; number of rows to skip.

header

Logical; if TRUE, treat first row as header.

include_filename

Logical; if TRUE, include source filename as a column.

show_cmd

Logical; if TRUE, return the grep command string instead of executing it.

show_progress

Logical; if TRUE, show progress indicators.

...

Additional arguments passed to fread.

Value

A data.table containing file names and counts.
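
To make the shape of such a result concrete, here is a small base R sketch that counts matching lines per file. It is only an illustration under stated assumptions: it reads each file into R with readLines() rather than calling grep, it returns a plain data.frame instead of a data.table, and the column names file and count are hypothetical, not the package's actual output names.

```r
# Create two small example files in a temporary directory.
tmp_dir <- tempfile("count_sketch_")
dir.create(tmp_dir)
writeLines(c("id,status", "1,ok", "2,error", "3,error"),
           file.path(tmp_dir, "a.csv"))
writeLines(c("id,status", "4,ok"), file.path(tmp_dir, "b.csv"))

# list.files() returns the paths in alphabetical order.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Count the lines containing the literal string "error" in each file.
counts <- data.frame(
  file = basename(files),
  count = vapply(
    files,
    function(f) sum(grepl("error", readLines(f), fixed = TRUE)),
    integer(1),
    USE.NAMES = FALSE
  )
)
print(counts)

# Remove the temporary files again.
unlink(tmp_dir, recursive = TRUE)
```

The sketch loads every line into memory, which is exactly the cost that counting with grep is meant to avoid on large files.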


grep_read: Efficiently read and filter lines from one or more files using grep, returning a data.table.

Description

Reads one or more files through grep so that only the lines matching pattern are loaded into R, and returns the result as a data.table.

Usage

grep_read(
  files = NULL,
  path = NULL,
  file_pattern = NULL,
  pattern = "",
  invert = FALSE,
  ignore_case = FALSE,
  fixed = FALSE,
  show_cmd = FALSE,
  recursive = FALSE,
  word_match = FALSE,
  show_line_numbers = FALSE,
  only_matching = FALSE,
  nrows = Inf,
  skip = 0,
  header = TRUE,
  col.names = NULL,
  include_filename = FALSE,
  show_progress = FALSE,
  ...
)

Arguments

files

Character vector of file paths to read.

path

Optional. Directory path to search for files.

file_pattern

Optional. A pattern to filter filenames when using the path argument. Passed to list.files.

pattern

Pattern to search for within files (passed to grep).

invert

Logical; if TRUE, return non-matching lines.

ignore_case

Logical; if TRUE, perform case-insensitive matching (default: FALSE).

fixed

Logical; if TRUE, pattern is a fixed string, not a regular expression.

show_cmd

Logical; if TRUE, return the grep command string instead of executing it.

recursive

Logical; if TRUE, search recursively through directories.

word_match

Logical; if TRUE, match only whole words.

show_line_numbers

Logical; if TRUE, include line numbers from source files. Headers are automatically removed and lines renumbered.

only_matching

Logical; if TRUE, return only the matching part of the lines.

nrows

Integer; maximum number of rows to read.

skip

Integer; number of rows to skip.

header

Logical; if TRUE, treat the first row as a header. If FALSE, the first row is read as a row of data.

col.names

Character vector of column names.

include_filename

Logical; if TRUE, include source filename as a column.

show_progress

Logical; if TRUE, show progress indicators.

...

Additional arguments passed to fread.

Value

A data.table whose structure depends on the options used.

Note

When searching for a literal string rather than a regular expression, set fixed = TRUE. For example, with fixed = FALSE the pattern "3.94" also matches "3894", because "." is a regex metacharacter that matches any single character.
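
The difference can be checked with a short sketch. It uses base R's grepl() rather than the package or the system grep, on the assumption that both treat "." the same way for this pattern; the example values are made up.

```r
# Example values: one exact literal, one regex-only match, one superstring.
values <- c("3.94", "3894", "13.940")

# As a regular expression, "." matches any single character.
regex_hits <- grepl("3.94", values)

# As a fixed string, "." matches only a literal dot.
fixed_hits <- grepl("3.94", values, fixed = TRUE)

print(data.frame(values, regex_hits, fixed_hits))
```

Only the fixed search rejects "3894"; both searches accept "13.940", because it contains the literal text "3.94".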

Header rows are handled automatically.


Detect Windows reliably

Description

Checks whether the current R session is running on Windows.

Usage

is_windows()
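
One common way to perform such a check in base R is shown below. This is a sketch only: the helper name sketch_is_windows is hypothetical, and the page does not say how is_windows() is implemented or what it returns.

```r
# Hypothetical helper (not the package's code): .Platform$OS.type is
# documented to be either "unix" or "windows".
sketch_is_windows <- function() {
  identical(.Platform$OS.type, "windows")
}

on_windows <- sketch_is_windows()
print(on_windows)
```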

Split columns based on a delimiter

Description

Efficiently splits character vectors into multiple columns based on a specified delimiter. This function is optimized for performance and handles common use cases like parsing grep output or other delimited text data.

Usage

split_columns(
  x,
  column.names = NA,
  split = ":",
  resulting.columns = 3,
  fixed = TRUE
)

Arguments

x

Character vector to split

column.names

Names for the resulting columns (optional)

split

Delimiter to split on (default: ":")

resulting.columns

Number of columns to create (default: 3)

fixed

Whether to use fixed string matching (default: TRUE)

Value

A data.table with split columns. Column names are automatically assigned as V1, V2, V3, etc. unless custom names are provided via column.names.
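
The idea of splitting into a fixed number of columns, with any surplus pieces folded back into the last column, can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the helper name sketch_split is hypothetical, it always splits on a fixed string, and it returns a data.frame rather than a data.table.

```r
# Hypothetical helper: split each string on a literal delimiter into
# exactly n columns named V1, V2, ...
sketch_split <- function(x, split = ":", n = 3) {
  pieces <- strsplit(x, split, fixed = TRUE)
  rows <- lapply(pieces, function(p) {
    if (length(p) > n) {
      # Fold the surplus pieces back into the last column.
      p <- c(p[seq_len(n - 1)], paste(p[n:length(p)], collapse = split))
    }
    length(p) <- n  # pad short rows with NA
    p
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(n))
  out
}

lines <- c("file.txt:15:error: disk full", "file.txt:23:warning")
res <- sketch_split(lines, n = 3)
print(res)
```

Folding the surplus pieces into the last column keeps a message that itself contains the delimiter, such as "error: disk full", intact.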

Examples

# Split grep-like output with colon delimiter
data <- c("file.txt:15:error message", "file.txt:23:warning message")
result <- split_columns(data, resulting.columns = 3)
print(result)

# With custom column names
result_named <- split_columns(data, 
                             column.names = c("filename", "line", "message"),
                             resulting.columns = 3)
print(result_named)

# Split into 2 columns (combining remaining elements)
result_2col <- split_columns(data, resulting.columns = 2)
print(result_2col)
