In this example we consider an application to a CUSUM charts based on a logistic regression model.
Assume we have \(n\) past in-control data \((Y_{-n},X_{-n}),\ldots,(Y_{-1},X_{-1})\), where \(Y_i\) is a binary response variable and \(X_i\) is a corresponding vector of covariates.
Suppose that in control \(\mbox{logit}(\mbox{P}(Y_i=1|X_i))=X_i\beta\). A maximum likelihood estimate \(\hat\beta\) of the parameters is obtained by the glm function.
For detecting a change to \(\mbox{logit}(\mbox{P}(Y_i=1|X_i))=\Delta+X_i\beta\), a CUSUM chart based on the cumulative sum of likelihood ratios of the in-control versus out-of-control model can be defined by (\citep{Steiner2000Msp}) \[S_t=\max(0, S_{t-1}+R_t), \quad S_0=0, \] where \[ \exp(R_t)=\frac{\exp(\Delta+X_t\beta)^{Y_t}/(1+\exp(\Delta+X_t\beta))}{\exp(X_t\beta)^{Y_t}/(1+\exp(X_t\beta))} =\exp(Y_t\Delta)\frac{1+\exp(X_t\beta)}{1+\exp(\Delta+X_t\beta)}. \]
The following generates a data set of past observations (replace this with your observed past data) from the model \(\mbox{logit}(\mbox{P}(Y_i=1|X_i))=-1+x_1+x_2+x_3\) and distribution of the covariate values as specified below.
n <- 1000
Xlogreg <- data.frame(x1=rbinom(n,1,0.4), x2=runif(n,0,1), x3=rnorm(n))
xbeta <- -1+Xlogreg$x1+Xlogreg$x2+Xlogreg$x3
Xlogreg$y <- rbinom(n,1,exp(xbeta)/(1+exp(xbeta)))
Next, we initialise and compute the resulting estimate for running the chart - in this case \(\hat \beta\).
library(spcadjust)
chartlogreg <- new("SPCCUSUM",model=SPCModellogregLikRatio(Delta= 1, formula="y~x1+x2+x3"))
xihat <- xiofdata(chartlogreg,Xlogreg)
xihat
##
## Call: glm(formula = formula, family = binomial("logit"), data = P)
##
## Coefficients:
## (Intercept) x1 x2 x3
## -1.0768 1.0341 1.0357 0.9918
##
## Degrees of Freedom: 999 Total (i.e. Null); 996 Residual
## Null Deviance: 1385
## Residual Deviance: 1142 AIC: 1150
Next find the threshold that with roughly 90\% probability results in an average run length of at least 1000 in control. You should increase the number of bootstrap replications (the argument nrep) for real applications.
cal <- SPCproperty(data=Xlogreg,
nrep=100,chart=chartlogreg,
property="calARL",params=list(target=1000),quiet=TRUE)
cal
## 90 % CI: A threshold of 4.906 gives an in-control ARL of at least
## 1000.
## Unadjusted result: 4.206
## Based on 100 bootstrap repetitions.
Next, we run the chart with new observations (that are in-control).
n <- 100
newXlogreg <- data.frame(x1=rbinom(n,1,0.4), x2=runif(n,0,1), x3=rnorm(n))
newxbeta <- -1+newXlogreg$x1+newXlogreg$x2+newXlogreg$x3
newXlogreg$y <- rbinom(n,1,exp(newxbeta)/(1+exp(newxbeta)))
S <- runchart(chartlogreg, newdata=newXlogreg,xi=xihat)
In the next example, the chart is run with data that is out-of-control from time 51 and onwards.
n <- 100
newXlogreg <- data.frame(x1=rbinom(n,1,0.4), x2=runif(n,0,1), x3=rnorm(n))
outind <- c(rep(0,50),rep(1,50))
newxbeta <- -1+newXlogreg$x1+newXlogreg$x2+newXlogreg$x3+outind
newXlogreg$y <- rbinom(n,1,exp(newxbeta)/(1+exp(newxbeta)))
S <- runchart(chartlogreg, newdata=newXlogreg,xi=xihat)