The library convey estimates measures of poverty and income concentration. At least two other libraries already cover this subject: vardpoor and laeken. The main difference is that convey builds directly on the survey library.
Some measures of poverty and income concentration are defined by non-differentiable functions, so Taylor linearization cannot be used to estimate their variances. An alternative is to use influence functions, as described in Deville (1999) and Osier (2009). The library convey implements this methodology for both survey.design objects and svyrep.design objects.
Some examples of these measures are:
At-risk-of-poverty threshold: \[ arpt=.60q_{.50} \] where \(q_{.50}\) is the income median;
At-risk-of-poverty rate: \[ arpr=\frac{\sum_U 1(y_i \leq arpt)}{N}\times 100 \]
Quintile share ratio: \[ qsr=\frac{\sum_U y_i 1(y_i>q_{.80})}{\sum_U y_i 1(y_i\leq q_{.20})} \]
Standard Taylor linearization cannot be applied directly to these measures because they depend on quantiles; the Gini index, similarly, is defined as a function of ranks. Their variances can instead be approximated with the approach proposed by Deville (1999), which is based on influence functions.
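As a rough illustration of these three definitions, the base R sketch below evaluates them on a small made-up income vector treated as the whole population \(U\). The data and the helper `q` are assumptions for illustration only; the quintile share ratio is computed as the income total above \(q_{.80}\) divided by the income total at or below \(q_{.20}\).

```r
# Toy population of 10 incomes (made-up values, not taken from eusilc)
y <- c(4000, 7000, 8500, 12000, 15000, 18000, 22000, 27000, 35000, 60000)

# type = 1 inverts the empirical CDF, so every quantile is an observed income
q <- function(p) unname(quantile(y, probs = p, type = 1))

arpt <- 0.60 * q(0.50)                                # at-risk-of-poverty threshold
arpr <- 100 * sum(y <= arpt) / length(y)              # at-risk-of-poverty rate, in percent
qsr  <- sum(y[y > q(0.80)]) / sum(y[y <= q(0.20)])    # quintile share ratio

c(arpt = arpt, arpr = arpr, qsr = qsr)
```

None of these quantities is a smooth function of totals, which is why a different linearization technique is needed.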
Let \(U\) be a population of size \(N\) and \(M\) be a measure that allocates mass one to each set consisting of a single unit, that is, \(M(i)=M_i= 1\) if \(i\in U\) and \(M(i)=0\) if \(i\notin U\).
Now, a population parameter \(\theta\) can be expressed as a functional of \(M\) \[ \theta=T(M) \]
Examples of such parameters are:
Total: \[Y=\sum_Uy_i=\sum_U y_iM_i=\int ydM=T(M)\]
Ratio of two totals: \[R=\frac{Y}{X}=\frac{\int y dM}{\int x dM}=T(M)\]
Cumulative distribution function: \[F(x)=\frac{\sum_U 1(y_i\leq x)}{N}=\frac{\int 1(y\leq x)dM}{\int{dM}}=T(M)\]
To estimate these parameters from the sample, we replace the measure \(M\) by the estimated measure \(\hat{M}\) defined by: \(\hat{M}(i)=\hat{M}_i= w_i\) if \(i\in s\) and \(\hat{M}(i)=0\) if \(i\notin s\).
The estimators of the population parameters can then be expressed as functionals of the measure \(\hat{M}\).
Total: \[\hat{Y}=T(\hat{M})=\int yd\hat{M}=\sum_s w_iy_i\]
Ratio of totals: \[\hat{R}=T(\hat{M})=\frac{\int y d\hat{M}}{\int x d\hat{M}}=\frac{\sum_s w_iy_i}{\sum_s w_ix_i}\]
Cumulative distribution function: \[\hat{F}(x)=T(\hat{M})=\frac{\int 1(y\leq x)d\hat{M}}{\int{d\hat{M}}}=\frac{\sum_s w_i 1(y_i\leq x)}{\sum_s w_i}\]
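The three estimators above amount to weighted sums over the sample. A minimal base R sketch, assuming a made-up sample with study variables `y` and `x` and sampling weights `w` (all hypothetical values):

```r
# Toy sample: y and x are study variables, w are sampling weights (all made up)
y <- c(10, 20, 30, 40)
x <- c(1, 2, 2, 5)
w <- c(5, 5, 10, 10)

# Total: integral of y with respect to the estimated measure, i.e. a weighted sum
Y_hat <- sum(w * y)

# Ratio of two estimated totals
R_hat <- sum(w * y) / sum(w * x)

# Estimated CDF at a point x0: weighted share of sample units with y <= x0
F_hat <- function(x0) sum(w * (y <= x0)) / sum(w)

Y_hat
R_hat
F_hat(20)
```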
The variance of the estimator \(T(\hat{M})\) can be approximated by:
\[ Var\left[T(\hat{M})\right]\cong var\left[\sum_s w_i z_i\right] \]
The linearized variable \(z\) is given by the derivative of the functional:
\[ z_k=\lim_{t\rightarrow0}\frac{T(M+t\delta_k)-T(M)}{t}=IT_k(M) \] where \(\delta_k\) is the Dirac measure at \(k\): \(\delta_k(i)=1\) if and only if \(i=k\).
This derivative is called the influence function and was introduced in the area of robust statistics.
Total: \[ \begin{align} IT_k(M)&=\lim_{t\rightarrow 0}\frac{T(M+t\delta_k)-T(M)}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(M+t\delta_k)-\int y\,dM}{t}\\ &=\lim_{t\rightarrow 0}\frac{\int y\,d(t\delta_k)}{t}=y_k \end{align} \]
Ratio of two totals: \[ \begin{align} IR_k(M)&=I\left(\frac{U}{V}\right)_k(M)=\frac{V(M)\times IU_k(M)-U(M)\times IV_k(M)}{V(M)^2}\\ &=\frac{X y_k-Y x_k}{X^2}=\frac{1}{X}(y_k-Rx_k) \end{align} \]
At-risk-of-poverty threshold: \[ z_k= -\frac{0.6}{f(m)}\times\frac{1}{N}\times\left[I(y_k\leq m)-0.5 \right] \]
At-risk-of-poverty rate: \[ arpr=\frac{\sum_U I(y_i \leq t)}{N}\times 100 \] \[ z_k=\frac{1}{N}\left[I(y_k\leq t)-F(t)\right]-\frac{0.6}{N}\times\frac{f(t)}{f(m)}\left[I(y_k\leq m)-0.5\right] \]
where:
\(N\) - population size;
\(t\) - at-risk-of-poverty threshold;
\(y_k\) - income of person \(k\);
\(m\) - median income;
\(f\) - income density function;
\(F\) - income cumulative distribution function, so that \(F(t)\) is the arpr expressed as a proportion.
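The limit that defines \(z_k\) can be checked numerically. The sketch below does so in base R for the ratio of two totals, replacing the limit by a finite difference with a small \(t\) and comparing it with the closed form \(\frac{1}{X}(y_k-Rx_k)\) given above. The toy population and the function names `T_ratio` and `influence_numeric` are assumptions made for this illustration, not part of convey.

```r
# Toy population (made-up values); the measure M puts mass 1 on every unit
y <- c(10, 20, 30, 40)
x <- c(1, 2, 2, 5)
M <- rep(1, length(y))

# The ratio R = Y / X written as a functional of the measure
T_ratio <- function(m) sum(y * m) / sum(x * m)

# Finite-difference version of the limit: add a small mass t at unit k
influence_numeric <- function(k, t = 1e-6) {
  delta_k <- as.numeric(seq_along(y) == k)   # Dirac measure at k
  (T_ratio(M + t * delta_k) - T_ratio(M)) / t
}

# Closed form from the text: (1 / X) * (y_k - R * x_k)
X <- sum(x)
R <- sum(y) / X
z_closed  <- (y - R * x) / X
z_numeric <- sapply(seq_along(y), influence_numeric)

all.equal(z_closed, z_numeric, tolerance = 1e-4)
```

The two vectors agree up to the finite-difference error, which shrinks as `t` gets smaller.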
In the library convey, some basic functions produce the linearized variables of estimates that often enter the definition of measures of concentration and poverty. One example is the quantile, which is linearized by the function svyiqalpha. Another is the function svyisq, which linearizes the total below a quantile of the variable.
From the linearized variables of these basic estimates, the influence function of more complex estimates can be derived by using rules of composition. By definition the influence function is a Gateaux derivative, so the rules of composition valid for Gateaux derivatives also hold for influence functions.
The following property of Gateaux derivatives is used often in the library convey. Let \(g\) be a differentiable function of \(m\) variables. Suppose we want to compute the influence function of the estimator \(g(T_1, T_2,\ldots, T_m)\), knowing the influence functions of the estimators \(T_i, i=1,\ldots, m\). Then the following holds:
\[ I(g(T_1, T_2,\ldots, T_m)) = \sum_{i=1}^m \frac{\partial g}{\partial T_i}I(T_i) \]
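As a quick consistency check, the rule can be applied to the ratio of two totals, taking \(g(T_1,T_2)=T_1/T_2\) with \(T_1=Y\) and \(T_2=X\), and using the influence function of a total, \(I(Y)_k=y_k\) and \(I(X)_k=x_k\):

\[ I(R)_k=\frac{\partial g}{\partial T_1}I(Y)_k+\frac{\partial g}{\partial T_2}I(X)_k=\frac{1}{X}y_k-\frac{Y}{X^2}x_k=\frac{1}{X}(y_k-Rx_k) \]

This agrees with the expression obtained directly for the ratio of two totals.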
In the library convey this rule is implemented by the function contrastinf, which uses the R function deriv to compute the formal partial derivatives \(\frac{\partial g}{\partial T_i}\).
For example, suppose we want to linearize the relative median poverty gap (rmpg), defined as the difference between the at-risk-of-poverty threshold (arpt) and the median of incomes less than the arpt, relative to the arpt:
\[ rmpg= \frac{arpt-medpoor} {arpt} \]
where medpoor is the median of incomes less than arpt.
Suppose we know how to linearize arpt and medpoor. Then, by applying the function contrastinf with
\[ g(T_1,T_2)= \frac{(T_1 - T_2)}{T_1} \]
we linearize the rmpg.
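A minimal sketch of this step using only base R's deriv is given below. It does not call contrastinf itself; the point estimates and the linearized variables `z_arpt` and `z_medpoor` are made-up numbers standing in for values that would come from the survey data.

```r
# Symbolic partial derivatives of g(T1, T2) = (T1 - T2) / T1
g <- deriv(~ (T1 - T2) / T1, c("T1", "T2"))

# Hypothetical point estimates: T1 plays the role of arpt, T2 of medpoor
est  <- list(T1 = 10000, T2 = 8000)
val  <- eval(g, est)               # value of g, with the gradient as an attribute
grad <- attr(val, "gradient")      # 1 x 2 matrix with columns T1 and T2

# Hypothetical linearized variables of arpt and medpoor for three sample units
z_arpt    <- c(0.5, -0.2, 0.1)
z_medpoor <- c(0.3, 0.4, -0.6)

# Composition rule: sum of partial derivatives times influence functions
z_rmpg <- grad[, "T1"] * z_arpt + grad[, "T2"] * z_medpoor
z_rmpg
```

Here the partial derivatives are \(T_2/T_1^2\) and \(-1/T_1\), so the linearized rmpg is a fixed linear combination of the two input linearized variables.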
In the following examples we will use the data set eusilc contained in the libraries vardpoor and laeken.
library(vardpoor)
data(eusilc)
Next, we create an object of class survey.design using the function svydesign of the library survey:
library(survey)
des_eusilc <- svydesign(ids = ~rb030, strata =~db040, weights = ~rb050, data = eusilc)
Right after creating the design object des_eusilc, we should run the function convey_prep. It adds an attribute to the survey design that stores information based on the whole sample, which is needed to work with subset designs.
library(convey)
des_eusilc <- convey_prep( des_eusilc )
## preparing your full survey design to work with R convey package functions
##
## note that this function must be run on the full survey design object immediately after the svydesign() or svrepdesign() call.
##
To estimate the at-risk-of-poverty rate we use the function svyarpr:
svyarpr(~eqIncome, design=des_eusilc)
arpr SE
eqIncome 0.14444 0.0028
To estimate the at-risk-of-poverty rate for domains defined by the variable db040 we use:
svyby(~eqIncome, by = ~db040, design = des_eusilc, FUN = svyarpr, deff = FALSE)
db040 eqIncome se.eqIncome
Burgenland Burgenland 0.1953984 0.017202243
Carinthia Carinthia 0.1308627 0.010610622
Lower Austria Lower Austria 0.1384362 0.006517660
Salzburg Salzburg 0.1378734 0.011579280
Styria Styria 0.1437464 0.007452360
Tyrol Tyrol 0.1530819 0.009880430
Upper Austria Upper Austria 0.1088977 0.005928336
Vienna Vienna 0.1723468 0.007682826
Vorarlberg Vorarlberg 0.1653731 0.013754670
Using the same data set, we estimate the quintile share ratio:
# for the whole population
svyqsr(~eqIncome, design=des_eusilc, alpha= .20)
qsr SE
eqIncome 3.97 0.0426
# for domains
svyby(~eqIncome, by = ~db040, design = des_eusilc,
FUN = svyqsr, alpha= .20, deff = FALSE)
db040 eqIncome se.eqIncome
Burgenland Burgenland 5.008486 0.32755685
Carinthia Carinthia 3.562404 0.10909726
Lower Austria Lower Austria 3.824539 0.08783599
Salzburg Salzburg 3.768393 0.17015086
Styria Styria 3.464305 0.09364800
Tyrol Tyrol 3.586046 0.13629739
Upper Austria Upper Austria 3.668289 0.09310624
Vienna Vienna 4.654743 0.13135731
Vorarlberg Vorarlberg 4.366511 0.20532075
These functions can be used as S3 methods for the classes survey.design and svyrep.design.
Let’s create a design object of class svyrep.design and run the function convey_prep on it:
des_eusilc_rep <- as.svrepdesign(des_eusilc, type = "bootstrap")
des_eusilc_rep <- convey_prep(des_eusilc_rep)
## preparing your full survey design to work with R convey package functions
##
## note that this function must be run on the full survey design object immediately after the svydesign() or svrepdesign() call.
##
and then use the function svyarpr:
svyarpr(~eqIncome, design=des_eusilc_rep)
arpr SE
eqIncome 0.14444 0.0026
svyby(~eqIncome, by = ~db040, design = des_eusilc_rep, FUN = svyarpr, deff = FALSE)
db040 eqIncome se.eqIncome
Burgenland Burgenland 0.1953984 0.015948955
Carinthia Carinthia 0.1308627 0.009369766
Lower Austria Lower Austria 0.1384362 0.006378286
Salzburg Salzburg 0.1378734 0.012678287
Styria Styria 0.1437464 0.007245318
Tyrol Tyrol 0.1530819 0.010223210
Upper Austria Upper Austria 0.1088977 0.005749901
Vienna Vienna 0.1723468 0.008765321
Vorarlberg Vorarlberg 0.1653731 0.014346126
The functions of the library convey are called in a similar way to the functions in library survey.
It is also possible to deal with missing values by using the argument na.rm.
# survey.design using a variable with missings
svygini( ~ py010n , design = des_eusilc )
gini SE
py010n NA NA
svygini( ~ py010n , design = des_eusilc , na.rm = TRUE )
gini SE
py010n 0.64606 0.0036
# svyrep.design using a variable with missings
# svygini( ~ py010n , design = des_eusilc_rep )  # without na.rm this call raises an error
svygini( ~ py010n , design = des_eusilc_rep , na.rm = TRUE )
gini SE
py010n 0.64606 0.0041
Foster et al. (1984) proposed a family of indicators to measure poverty.
The class of \(FGT\) measures can be defined as
\[ p=\frac{1}{N}\sum_{k\in U}h(y_{k},\theta ), \]
where
\[ h(y_{k},\theta )=\left[ \frac{(\theta -y_{k})}{\theta }\right] ^{\gamma }\delta \left\{ y_{k}\leq \theta \right\} , \]
where: \(\theta\) is the poverty threshold; \(\delta\) the indicator function that assigns value 1 if the condition \(\{y_{k}\leq \theta \}\) is satisfied and 0 otherwise, and \(\gamma\) is a non-negative constant.
When \(\gamma =0\), \(p\) can be interpreted as the proportion of poor people. For \(\gamma \geq 1\), the weight given to the poorest people increases with the value of \(\gamma\) (Foster et al., 1984).
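The definition can be sketched in a few lines of base R. The function `fgt` below is a hypothetical helper written for this illustration, not the convey implementation; it assumes a sample with weights `w` and replaces \(N\) by the sum of the weights. The data are made up.

```r
# Weighted FGT estimate for a given threshold theta and exponent gamma
fgt <- function(y, w, theta, gamma) {
  poor <- y <= theta                                  # indicator of y_k <= theta
  h <- ifelse(poor, ((theta - y) / theta)^gamma, 0)   # h(y_k, theta)
  sum(w * h) / sum(w)                                 # weighted analogue of (1/N) * sum of h
}

# Toy data (made up): incomes and sampling weights
y <- c(4000, 8000, 12000, 20000)
w <- c(10, 10, 10, 10)

fgt(y, w, theta = 10000, gamma = 0)   # share of units at or below the threshold
fgt(y, w, theta = 10000, gamma = 1)   # average relative shortfall from the threshold
```

With \(\gamma=0\) every poor unit contributes 1, while with \(\gamma=1\) each contributes its relative distance to the threshold, so units far below the threshold count more.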
The poverty measure FGT is implemented in the library convey by the function svyfgt. The argument type_thresh of this function defines the type of poverty threshold adopted. There are three possible choices:
abs – fixed and given by the argument abs_thresh
relq – a proportion of a quantile; the proportion is fixed by the argument percent and the quantile is defined by the argument order
relm – a proportion of the mean, fixed by the argument percent
The quantile and the mean involved in the definition of the threshold are estimated for the whole population. When \(\gamma=0\) and \(\theta= .6*MED\) the measure is equal to the indicator arpr computed by the function svyarpr.
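To see why, note that for \(\gamma=0\) the function \(h\) reduces to the indicator \(\delta\{y_k\leq\theta\}\). Assuming \(\theta=0.6\times MED=arpt\), this gives

\[ p=\frac{1}{N}\sum_{k\in U}\delta \left\{ y_{k}\leq arpt \right\} \]

which is the at-risk-of-poverty rate expressed as a proportion rather than as a percentage.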
Next, we give some examples of the function svyfgt to estimate the values of the FGT poverty index.
Consider first a poverty threshold fixed at the value \(10000\). The headcount ratio (FGT0, \(\gamma=0\)) is
svyfgt(~eqIncome, des_eusilc, g=0, abs_thresh=10000)
fgt0 SE
eqIncome 0.11444 0.0027
The poverty gap index (FGT1, \(\gamma=1\)) for the poverty threshold fixed at the same value is
svyfgt(~eqIncome, des_eusilc, g=1, abs_thresh=10000)
fgt1 SE
eqIncome 0.032085 0.0011
To estimate the FGT0 with the poverty threshold fixed at \(0.6* MED\) we set the argument type_thresh = "relq" and use the default values for percent and order:
svyfgt(~eqIncome, des_eusilc, g=0, type_thresh= "relq")
fgt0 SE
eqIncome 0.14444 0.0028
which matches the estimate obtained by
svyarpr(~eqIncome, design=des_eusilc, .5, .6)
arpr SE
eqIncome 0.14444 0.0028
To estimate the poverty gap (FGT1) with the poverty threshold equal to \(0.6*MEAN\) we use:
svyfgt(~eqIncome, des_eusilc, g=1, type_thresh= "relm")
fgt1 SE
eqIncome 0.051187 0.0012
Berger, Y.G. and C.J. Skinner (to be published). Variance Estimation for a Low-Income Proportion.
Foster, J., J. Greer and E. Thorbecke (1984). A Class of Decomposable Poverty Measures. Econometrica, 52, 761-766.
Osier, G. (2009). Variance estimation for complex indicators of poverty and inequality. Vol. 3, No. 3, pp. 167-195, ISSN 1864-3361.
Deville, J.-C. (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203.