
Introduction

In the general regression problem, we have a response $y$ of dimension $k$ (usually, $k=1$) and a $p$-dimensional predictor $x$, and the goal is to learn about how the conditional distributions $F(y\vert x)$ change as $x$ varies through its sample space. In parametric regression, we specify a functional form for the conditional distributions that is known up to a few parameters. In nonparametric regression, no assumptions are made about $F$, but progress is really only possible if the dimensions $p$ and $k$ are small. Dimension reduction regression is one intermediate possibility between the parametric and nonparametric extremes. In this setup, we assume, without loss of information, that the conditional distributions can be indexed by $d$ linear combinations; that is, for some typically unknown $p\times d$ matrix $B$,

\begin{displaymath}
F(y\vert x) = F(y\vert B'x) \qquad (1)
\end{displaymath}

This representation always holds trivially, by setting $B=I$, the $p\times p$ identity matrix, and so the usual goal is to find the $B$ of lowest possible dimension for which this representation holds. If (1) holds for a particular $B$, then it also holds for $B^*=BA$, where $A$ is any full rank matrix, and hence the unique part of the regression summary is the subspace spanned by $B$, which we denote ${\mathcal S}(B)$. Cook (1998) provides a more complete introduction to these ideas, including discussion of when this subspace, which we call the central subspace, exists, and when it is unique.

In this paper, we discuss software for estimating the subspace ${\mathcal S}(B)$ spanned by $B$, and for testing hypotheses concerning the dimension $d$, based on dimension reduction methods. This software was written using R, but can also be used with Splus. Most, but not all, of the methods available here are also included in the Xlisp-Stat program Arc, Cook and Weisberg (1999). The R platform allows dimension reduction methods to be combined with existing statistical methods that are not readily available in Xlisp-Stat, and hence in Arc. For example, the R code is more suitable for Monte Carlo experimentation than Arc. In addition, R includes a much wider array of options for smoothing, including multidimensional smoothers. On the other hand, Arc takes full advantage of the dynamic graphical capabilities of Xlisp-Stat, and at least for now the graphical summaries of dimension reduction regression are clearly superior in Arc. Thus, there appears to be good reason to have these methods available on both platforms.

Cook (1998) provides the most complete introduction to this area. See also Cook and Weisberg (1994) for a gentler introduction to dimension reduction. In this paper we give only the barest outline of dimension reduction methodology, concentrating on the software.

Suppose we have independent data $(x_i,y_i)$, for $i=1,\ldots,n$, collected into a matrix $X$ and a vector $Y$ if $k=1$, or a matrix $Y$ if $k>1$. In addition, suppose we have nonnegative weights $w_1,\ldots,w_n$ whose sum is $n$; if unspecified, we take all the $w_i=1$. Generally following Yin (2000), a procedure for estimating ${\mathcal S}(B)$ and for obtaining tests concerning $d$ is as follows (an R sketch of these steps is given after the list):
  1. Scale and center $X$ as

    \begin{displaymath}
Z =
W^{1/2}(X - 1\bar{x}') \hat{\Sigma}^{-1/2}
\end{displaymath}

    where $\bar{x} = \sum w_ix_i/\sum w_i$ is the vector of weighted column means, $W = \mathrm{diag}(w_i)$, and

    \begin{displaymath}
\hat{\Sigma} = \frac{1}{n -1} (X - 1\bar{x}')'W(X - 1\bar{x}')
\end{displaymath}

    $\hat{\Sigma}^{-1/2}$ is any square root of the inverse of this weighted sample covariance matrix (computed, for example, from a singular value decomposition). In this scaling, the rows of $Z$ have zero mean and identity sample covariance matrix.
  2. Use the scaled and centered data $Z$ to find a $p\times p$ symmetric matrix $\hat{M}$ that is a consistent estimate of a population matrix $M$ with the property that ${\mathcal S}(M) \subseteq {\mathcal S}(B)$. For most procedures, all we can guarantee is that $M$ tells us about a part, but not necessarily all, of ${\mathcal S}(B)$. Each of the methods (for example, sir, save, and phd) has its own way of selecting $\hat{M}$.
  3. Let $\vert\hat{\lambda}_1\vert\geq \ldots \geq \vert\hat{\lambda}_p\vert$ be the ordered absolute eigenvalues of $\hat{M}$, and $\hat{u}_1,\ldots,\hat{u}_p$ the corresponding eigenvectors of $\hat{M}$. In some applications (like phd) the eigenvalues may be negative.
  4. A test of the hypothesis that the dimension is $d=d_0$ against the alternative that $d > d_0$ is based on a partial sum of eigenvalues of the form:

    \begin{displaymath}
\Lambda_{d_0} = n \hat{c} \sum_{j=d_0+1}^p \vert\hat{\lambda}_j\vert^{\nu}
\end{displaymath}

    where $\hat{c}$ is a method-specific term, and $\nu$ is generally equal to 1, but it is equal to 2 for phd. The distribution of these partial sums depends on assumptions and on the method of obtaining $\hat{M}$.
  5. Given $d$, the estimate of ${\mathcal S}(B)$ is the span of the first $d$ eigenvectors. When viewed as a subspace of $\Re^n$, the basis for this estimated subspace is $Z\hat{u}_1,\ldots,Z\hat{u}_d$. These directions can then be back-transformed to the $X$-scale. Given the estimate of ${\mathcal S}(B)$, graphical methods can be used to recover information about $F$, or about particular aspects of the conditional distributions, such as the conditional mean function.
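
To make these steps concrete, the following is a minimal R sketch of the procedure, assuming unit weights ($w_i=1$) and using sir slice means to form $\hat{M}$; the function name sir_sketch, the slicing rule, and the returned names are illustrative assumptions, not the implementation in the software described in this paper.

\begin{verbatim}
## Minimal sketch of steps 1-5 with unit weights, using SIR to form M-hat.
## The function name, slicing rule, and output names are illustrative only.
sir_sketch <- function(X, y, nslices = 5, d0 = 0) {
  n <- nrow(X); p <- ncol(X)

  ## Step 1: scale and center X so the rows of Z have mean zero and
  ## identity sample covariance matrix
  xbar <- colMeans(X)
  Xc <- sweep(X, 2, xbar)
  Sigma <- crossprod(Xc) / (n - 1)
  e <- eigen(Sigma, symmetric = TRUE)
  Sigma.inv.sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  Z <- Xc %*% Sigma.inv.sqrt

  ## Step 2: for SIR, M-hat is the weighted covariance matrix of the
  ## within-slice means of Z, slicing on the ordered response
  slice <- cut(rank(y, ties.method = "first"), nslices, labels = FALSE)
  M <- matrix(0, p, p)
  for (s in unique(slice)) {
    zbar <- colMeans(Z[slice == s, , drop = FALSE])
    M <- M + (sum(slice == s) / n) * tcrossprod(zbar)
  }

  ## Step 3: eigenvalues and eigenvectors of M-hat (decreasing order)
  ev <- eigen(M, symmetric = TRUE)

  ## Step 4: partial-sum statistic for testing d = d0
  ## (nu = 1 and c-hat = 1 for SIR)
  stat <- n * sum(abs(ev$values)[(d0 + 1):p])

  ## Step 5: back-transform the eigenvectors to the X-scale; given d,
  ## the first d columns of 'directions' span the estimate of S(B)
  list(evalues = ev$values,
       directions = Sigma.inv.sqrt %*% ev$vectors,
       stat = stat)
}
\end{verbatim}

For sir with normally distributed predictors, for example, the statistic returned in stat would be compared to a chi-squared distribution with $(p-d_0)(h-d_0-1)$ degrees of freedom, where $h$ is the number of slices.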
