Christopher Gandrud
Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.
DataCombine is a set of miscellaneous tools intended to make combining data sets, especially time-series cross-sectional data, easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:
- `CasesTable`: reports cases after listwise deletion of missing values for time-series cross-sectional data.
- `change`: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.
- `CountSpell`: returns a variable counting the spell number for an observation. Works with grouped data.
- `dMerge`: merges two data frames and reports, drops, or keeps only duplicates.
- `DropNA`: drops rows from a data frame when they have missing (`NA`) values on one or more given variables.
- `FillDown`: fills in missing (`NA`) values with the previous non-missing value.
- `FillIn`: fills in missing values of a variable from one data frame with the values from another variable.
- `FindDups`: finds duplicated values in a data frame and subsets it to either include or not include them.
- `FindReplace`: replaces multiple patterns found in a character string column of a data frame.
- `grepl.sub`: subsets a data frame if a specified pattern is found in a character string.
- `InsertRow`: allows the user to insert a row into a data frame. Largely implements Ari B. Friedman’s function.
- `MoveFront`: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.
- `NaVar`: creates new variable(s) indicating if there are missing values in other variable(s).
- `shift`: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned to a new vector. This function is largely based on TszKin Julian’s shift function.
- `slide`: creates lag and lead variables, including for time-series cross-sectional data. The slid variable is added to the original data frame. This expands the capabilities of `shift`.
- `slideMA`: creates a moving average for a period before or after each time point for a given variable.
- `SpreadDummy`: spreads a dummy variable (1’s and 0’s) over a specified time period and for specified groups.
- `StartEnd`: finds the starting and ending time points of a spell, including for time-series cross-sectional data.
- `rmExcept`: removes all objects from a workspace except those specified by the user.
- `TimeExpand`: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
- `TimeFill`: creates a continuous `Unit`-`Time`-`Dummy` data frame from a data frame with `Unit`-`Start`-`End` times.
- `VarDrop`: drops one or more variables from a data frame.
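
As a rough illustration of how a few of these functions fit together, the sketch below lags a variable within groups with `slide`, drops incomplete rows with `DropNA`, and carries values forward with `FillDown`. The `panel` data frame and its column names are hypothetical, and the argument names follow the package documentation as I understand it; check `?slide`, `?DropNA`, and `?FillDown` for the installed version before relying on them.

```r
# A minimal sketch, assuming DataCombine is installed from CRAN.
library(DataCombine)

# Hypothetical time-series cross-sectional data: two units over three years
panel <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(2000, 2001, 2002, 2000, 2001, 2002),
  gdp     = c(10, NA, 12, 20, 21, NA)
)

# Lag gdp by one time point within each country;
# the lagged variable is added to the data frame as gdp_lag1
lagged <- slide(panel, Var = "gdp", TimeVar = "year", GroupVar = "country",
                NewVar = "gdp_lag1", slideBy = -1)

# Drop rows that have a missing value on gdp
complete <- DropNA(panel, Var = "gdp")

# Fill missing gdp values with the previous non-missing value
filled <- FillDown(panel, Var = "gdp")
```

Note that in this toy example no group starts with a missing value, so `FillDown` never carries a value across from one country to the next; with real data that case is worth checking by hand.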
I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.
DataCombine is on CRAN.
You can also install the most recent stable version with `install_github` from the devtools package:
devtools::install_github('christophergandrud/DataCombine')