

Sliced inverse regression

Sliced inverse regression, or sir, was proposed by Li (1991); see also Cook (1998, Chapter 11). In sir, we make use of the fact that, given certain assumptions on the marginal distribution of $X$, the inverse regression $\mbox{E}(z\vert y)$ lies in ${\mathcal S}(B)$. The general computational outline for sir is as follows:
  1. Examine $\mbox{E}(z\vert y)$ by dividing the range of $Y$ into $h$ slices, each with approximately the same number of observations. With a multivariate response ($Y$ has $k$ columns), divide the range of $Y_1 \times Y_2 \times \ldots \times Y_k$ into $h$ cells. For example, when $k = 3$, and we slice $Y_1$ into 3 slices, $Y_2$ into 2 slices, and $Y_3$ into 4 slices, we will have $h=3\times2\times4=24$ cells. The number of slices or cells $h$ is a tuning parameter of the procedure.
  2. Assume that within each slice or cell $\mbox{E}(z\vert y)$ is approximately constant. Then the expected value of the within-slice vector of sample means will be a vector in ${\mathcal S}(B)$.
  3. Form the $h \times p$ matrix whose $i$-th row is the vector of weighted sample means in the $i$-th slice. The matrix $\hat{M}$ is the $p\times p$ sample covariance matrix of these sample mean vectors.
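The three steps above can be sketched in a few lines of code. The following NumPy sketch is illustrative only: the dr package itself is written in R, and its internals (case weights, tie handling, output formatting) differ; the function name and defaults here are invented for the example.

```python
import numpy as np

def sir_directions(X, y, h=5):
    """Illustrative sliced inverse regression: eigenvalues and X-scale directions."""
    n, p = X.shape
    # Standardize: z = (X - mean) L^{-T}, where L is the lower Cholesky
    # factor of the sample covariance, so z has identity sample covariance.
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    z = (X - X.mean(axis=0)) @ np.linalg.inv(L).T
    # Step 1: slice the range of y into h slices of roughly equal size.
    slices = np.array_split(np.argsort(y), h)
    # Steps 2-3: M-hat is the weighted covariance of the within-slice means.
    M = np.zeros((p, p))
    for idx in slices:
        zbar = z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(zbar, zbar)
    evals, evecs = np.linalg.eigh(M)   # returned in ascending order
    order = np.argsort(evals)[::-1]    # largest eigenvalue first
    # Back-transform the directions from the z-scale to the original X-scale.
    B = np.linalg.inv(L).T @ evecs[:, order]
    return evals[order], B
```

For data generated as $y = \beta'x + \mbox{error}$, the first column of the returned matrix should be roughly proportional to $\beta$, mirroring the observation below that the first sir direction usually agrees with ols.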
sir thus concentrates on the mean function $\mbox{E}(Z\vert Y)$, and ignores any other dependence. The output given in the last section is an example of typical output for sir. First come the eigenvalues and eigenvectors of $\hat{M}$; the eigenvectors have been back-transformed to the original $X$-scale. Assuming that the dimension is $d$, the estimate of ${\mathcal S}(B)$ is given by the first $d$ eigenvectors. Given along with the eigenvectors is the square of the correlation between the ols fitted values and the first $d$ principal directions. The first direction selected by sir is almost always about the same as the first direction selected by ols, as is the case in the example above.

For sir, Li (1991) provided asymptotic tests of dimension based on partial sums of eigenvalues, and these tests are given in the summary. The tests have asymptotic Chi-square distributions, with the number of degrees of freedom shown in the output. Examining the tests shown in the final output, we see that the test of $d=0$ versus $d>0$ has a very small $p$-value, so we would reject $d=0$. The test of $d=1$ versus $d>1$ has $p$-value near $0.001$, suggesting that $d$ is at least 2. The test of $d=2$ versus $d>2$ has $p$-value of about $0.31$, so we suspect that $d=2$ for this problem. This suggests that further analysis of this regression can be based on the 3D graph of the response versus the linear combinations of the predictors determined by the first two eigenvectors, and the dimension of the problem can be reduced from 4 to 2 without loss of information. See Cook (1998) and Cook and Weisberg (1994, 1999) for further examples and interpretation.

When the response is multivariate, the format of the call is:
m1 <- dr(cbind(LBM,RCC)~Ht+Wt+WCC)
The summary for a multivariate response is similar:
> summary(m1)

Call:
dr(formula = cbind(LBM, RCC) ~ Ht + Wt + WCC)

Terms:
cbind(LBM, RCC) ~ Ht + Wt + WCC

Method:
sir with 9 slices, n = 202, using weights.

Slice Sizes:
24 23 23 23 22 21 22 22 22

Eigenvectors:
      Dir1    Dir2    Dir3
Ht  0.4857  0.3879  0.1946
Wt  0.8171 -0.2238 -0.1449
WCC 0.3105 -0.8941  0.9701

              Dir1    Dir2    Dir3
Eigenvalues 0.7076 0.05105 0.02168
R^2(LBM|dr) 0.9911 0.99124 1.00000
R^2(RCC|dr) 0.9670 0.97957 1.00000

Asymp. Chi-square tests for dimension:
              Stat df p-value
0D vs >= 1D 157.63 24  0.0000
1D vs >= 2D  14.69 14  0.3995
2D vs >= 3D   4.38  6  0.6254
The test statistics are the same as in the univariate response case, as is the interpretation of the eigenvalues and eigenvectors. The output also gives the squared correlation of each of the responses with the eigenvector directions.
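As a check, the test statistics in the table above can be recomputed directly from the printed eigenvalues: the statistic for testing $d$ against $>d$ is $n$ times the sum of the smallest $p-d$ eigenvalues of $\hat{M}$, with $(p-d)(h-1-d)$ degrees of freedom, which matches the Stat and df columns printed by summary. A short Python sketch (the package itself performs this in R):

```python
# Recompute the dimension-test statistics from the summary output above.
n, p, h = 202, 3, 9                      # sample size, predictors, slices
evals = [0.7076, 0.05105, 0.02168]       # eigenvalues printed by summary(m1)

for d in range(p):
    stat = n * sum(evals[d:])            # n times sum of smallest p - d eigenvalues
    df = (p - d) * (h - 1 - d)           # chi-square degrees of freedom
    print(f"{d}D vs >= {d+1}D: stat = {stat:.2f}, df = {df}")
```

Running this reproduces 157.63 on 24 df, 14.69 on 14 df, and 4.38 on 6 df, agreeing with the table; the $p$-values are then tail probabilities of the corresponding Chi-square distributions.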
Sandy Weisberg 2002-01-10