
diffpriv

Overview

The diffpriv package makes privacy-aware data science in R easy. diffpriv implements the formal framework of differential privacy: differentially private mechanisms can safely release, to untrusted third parties, statistics computed, models fit, or arbitrary structures derived from privacy-sensitive data. Due to the worst-case nature of the framework, mechanism development typically requires involved theoretical analysis. diffpriv offers a turn-key approach to differential privacy by automating this process, using sensitivity sampling in place of theoretical sensitivity analysis.

Installation

Obtaining diffpriv is easy. From within R:

##  Install the development version of diffpriv from GitHub:
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")

Example

A typical example in differential privacy is privately releasing a simple target function of privacy-sensitive input data X, such as the mean of numeric data:

## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)

First load the diffpriv package (installed as above) and construct a chosen differentially-private mechanism for privatizing target.

## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)

To run mech on a dataset X we must first determine the sensitivity of target to small changes in the input dataset. One avenue is to bound sensitivity analytically (on paper; see the vignette) and supply it via the sensitivity argument of mechanism construction. That is not hard in this case if we assume bounded data, but in general sensitivity can be highly non-trivial to calculate by hand. The other approach, which we follow in this example, is sensitivity sampling: repeated probing of target to estimate sensitivity automatically. We need only specify a distribution for generating random probe datasets; sensitivitySampler() takes care of the rest. The price we pay for this convenience is a weaker guarantee, random differential privacy.

## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
#> Sampling sensitivity with m=285 gamma=0.1 k=285
mech@sensitivity    ## DPMech and subclasses are S4: slots accessed via @
#> [1] 0.8089517
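
To make the idea of "repeated probing" concrete, here is a simplified base-R sketch. It is not the package's implementation: the helper name sample_sensitivity is hypothetical, and taking the maximum observed difference is a simplifying assumption (how sensitivitySampler() derives m and k from gamma is not reproduced here). Each probe draws n + 1 records, forms two datasets that differ in one record, and measures how far target moves between them.

```r
## Hypothetical helper for illustration only; not part of diffpriv.
## Estimates sensitivity as the largest change in target observed over
## m random pairs of datasets that differ in a single record.
sample_sensitivity <- function(target, oracle, n, m) {
  diffs <- replicate(m, {
    probe <- oracle(n + 1)                          # n + 1 random records
    D1 <- probe[1:n]                                # first n records
    D2 <- c(probe[seq_len(n - 1)], probe[n + 1])    # swap the last record
    abs(target(D1) - target(D2))                    # change in the target
  })
  max(diffs)  # simplifying assumption: use the largest observed change
}

set.seed(1)
## Same target, oracle and n as in the example; m chosen to match the output above
est <- sample_sensitivity(mean, function(n) rnorm(n), n = 5, m = 285)

## With data bounded in [0, 1], one record can move the mean by at most 1/n
est_unif <- sample_sensitivity(mean, function(n) runif(n), n = 5, m = 285)
```

Under the uniform oracle the estimate cannot exceed the analytic bound 1/n = 0.2, which is the kind of bound one would otherwise derive on paper and pass via the sensitivity argument.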

With a sensitivity-calibrated mechanism in hand, we can release private responses on a dataset X, displayed alongside the non-private response for comparison:

X <- c(0.328,-1.444,-0.511,0.154,-2.062) # length matches n given to sensitivitySampler()
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response:   ", r$response,
  "\nNon-private response target(X):", target(X))
#> Private response r$response:    -1.119506 
#> Non-private response target(X): -0.707
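
For intuition about what a Laplace mechanism does with the calibrated sensitivity, the following base-R sketch adds Laplace noise of scale sensitivity / epsilon to the non-private response. It is a minimal illustration of the standard Laplace mechanism, not diffpriv's code: the helper names rlaplace_sketch and laplace_release are hypothetical, and the sensitivity value is simply copied from the output above.

```r
## Hypothetical helpers for illustration only; not part of diffpriv.
## The difference of two independent exponentials with mean `scale`
## is Laplace-distributed with location 0 and that scale.
rlaplace_sketch <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

## Standard Laplace mechanism: perturb the target's value with noise
## whose scale grows with sensitivity and shrinks with epsilon.
laplace_release <- function(target, X, sensitivity, epsilon) {
  target(X) + rlaplace_sketch(1, scale = sensitivity / epsilon)
}

set.seed(42)
X <- c(0.328, -1.444, -0.511, 0.154, -2.062)
r <- laplace_release(mean, X, sensitivity = 0.8089517, epsilon = 1)

## Sanity check of the noise sampler on a large sample
draws <- rlaplace_sketch(1e5, scale = 2)
```

A smaller epsilon (stronger privacy) or a larger sensitivity both widen the noise, which is why the private response can land some distance from target(X) on a dataset of only five records.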

Getting Started

The above example demonstrates the main components of diffpriv:

Read the package vignette for more, or news for the latest release notes.

Citing the Package

diffpriv is an open-source package released under the permissive MIT License. Please acknowledge use of diffpriv by citing the paper on the sensitivity sampler:

Benjamin I. P. Rubinstein and Francesco Aldà. “Pain-Free Random Differential Privacy with Sensitivity Sampling”, to appear in the 34th International Conference on Machine Learning (ICML’2017), 2017.

Other relevant references to cite depending on usage:
