Basic background

Unit-level small area estimation models are widely used in poverty mapping, which is one of the most common applications of small area estimation. The basic ideas are described below; for a deeper methodological introduction to the model, the reader is referred to the standard literature. This model type assumes unit-level data: the variable of interest (e.g., household income) is contained in the survey data, auxiliary variables with predictive power are available in the same unit-level survey data, and domain means of the same auxiliary variables are available from an additional data source.
The basic unit-level model, also known as the nested error linear regression model (Battese et al. 1988), has the following nested structure:

\[
y_{ij} = x_{ij}^{\top} \beta + u_{i} + e_{ij}, \quad i = 1, \ldots, D, \quad j = 1, \ldots, n_{i},
\]
where \(y_{ij}\) is the variable of interest for the \(j\)-th unit (individual/household) in the \(i\)-th domain/area, \(x_{ij}\) is the auxiliary information and \(\beta\) are the fixed-effects parameters. The model contains independent and identically distributed domain-specific random effects \(u_{i}\) and unit-level error terms \(e_{ij}\), which are normal with \(u_{i} \sim N(0, \sigma_{u}^{2})\) and \(e_{ij} \sim N(0, \sigma_{e}^{2})\), respectively. Common estimation approaches are empirical best linear unbiased prediction (EBLUP) as well as empirical and hierarchical Bayesian methods.

The empirical best linear unbiased predictor can also be expressed as a weighted average of the survey regression estimator and the regression-synthetic part:

\[
\hat{\theta}_{i}^{EBLUP} = \hat{\gamma}_{i} \left[ \bar{y}_{i} + (\bar{X}_{i}^{\top} \hat{\beta} - \bar{x}_{i}^{\top} \hat{\beta})\right] + (1 - \hat{\gamma}_{i}) \bar{X}_{i}^{\top} \hat{\beta},
\]
where \(\bar{y}_{i}\) is the sample mean of the variable of interest for domain \(i\), \(\bar{X}_{i}\) and \(\bar{x}_{i}\) are the means of the auxiliary information from the additional data source and the survey, respectively, and \(\hat{\beta}\), \(\hat{\sigma}_{u}^{2}\), \(\hat{\sigma}_{e}^{2}\) are the estimated parameters. The weight

\[
\hat{\gamma}_{i} = \frac{\hat{\sigma}_{u}^{2}}{\hat{\sigma}_{u}^{2} + \frac{\hat{\sigma}_{e}^{2}}{n_{i}}}
\]

measures the share of the unexplained between-area variability in the total variability. With increasing sample size \(n_{i}\), the weight on the survey regression estimator increases.

In order to account for non-negligible sampling fractions \(f_{i} = \frac{n_{i}}{N_{i}}\), where \(N_{i}\) is the population size of domain \(i\), the EBLUP can be expressed as follows:

\[
\hat{\theta}_{i}^{EBLUP} = f_{i} \bar{y}_{i} + \left(\bar{X}_{i} - f_{i}\bar{x}_{i}\right)^{\top} \hat{\beta} + (1 - f_{i}) \left[\hat{\gamma}_{i} \left(\bar{y}_{i} - \bar{x}_{i}^{\top} \hat{\beta}\right) \right].
\]
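As an illustration, the two EBLUP expressions can be evaluated for a single domain once the model parameters have been estimated. The following is a minimal sketch under stated assumptions, not a reference implementation: the function names (`shrinkage_weight`, `eblup_basic`, `eblup_finite`) and all input values are hypothetical, and the parameter estimates are taken as given rather than fitted.

```python
# Minimal sketch: domain-level EBLUP from already-estimated parameters.
# All names and numbers are illustrative assumptions, not taken from a library.

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def shrinkage_weight(sigma_u2, sigma_e2, n_i):
    """gamma_i = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i)."""
    return sigma_u2 / (sigma_u2 + sigma_e2 / n_i)


def eblup_basic(y_bar, x_bar, X_bar, beta, sigma_u2, sigma_e2, n_i):
    """Weighted average of survey regression estimator and synthetic part."""
    gamma = shrinkage_weight(sigma_u2, sigma_e2, n_i)
    synthetic = dot(X_bar, beta)
    survey_regression = y_bar + (synthetic - dot(x_bar, beta))
    return gamma * survey_regression + (1.0 - gamma) * synthetic


def eblup_finite(y_bar, x_bar, X_bar, beta, sigma_u2, sigma_e2, n_i, N_i):
    """EBLUP accounting for the sampling fraction f_i = n_i / N_i."""
    gamma = shrinkage_weight(sigma_u2, sigma_e2, n_i)
    f = n_i / N_i
    # (X_bar - f * x_bar)^T beta, computed component by component
    adjusted = [X - f * x for X, x in zip(X_bar, x_bar)]
    u_hat = gamma * (y_bar - dot(x_bar, beta))  # predicted random effect
    return f * y_bar + dot(adjusted, beta) + (1.0 - f) * u_hat


# Made-up inputs for one domain (intercept plus one covariate).
beta = [1.0, 2.0]
x_bar = [1.0, 3.0]   # sample means of the auxiliary variables
X_bar = [1.0, 3.5]   # means from the additional data source
basic = eblup_basic(7.5, x_bar, X_bar, beta, 4.0, 16.0, 4)
finite = eblup_finite(7.5, x_bar, X_bar, beta, 4.0, 16.0, 4, 40)
print(basic, finite)
```

With these inputs the weight equals 0.5, so the survey regression estimator and the regression-synthetic part contribute equally. As the sampling fraction tends to zero, the second expression reduces to the first.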
Extension for non-linear indicators

While the model above only supports the estimation of means and totals, extensions based on the nested error linear regression model allow the estimation of non-linear indicators such as proportions. The two most popular approaches are the World Bank or ELL method (Elbers et al. 2003) and the empirical best predictor, EBP (Molina and Rao 2010). Both are common approaches in poverty mapping. Roughly speaking, the survey data is used to fit a unit-level model relating the variable of interest to the auxiliary information, which results in estimates \(\hat{\beta}\), \(\hat{\sigma}_{u}^{2}\), \(\hat{\sigma}_{e}^{2}\) of the model parameters. The model relation, the estimated parameters and the unit-level auxiliary information from the additional data source are then used to produce predictions of the variable of interest for every unit in every domain. The predicted values are used to estimate the indicator of interest at the required domain level.

While the basic unit-level model only requires means of the auxiliary information, the ELL and the EBP require auxiliary information for all units in all domains. Consequently, the greater flexibility in possible indicators comes with stronger data requirements.
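The general idea of predicting the variable of interest for every unit and then computing a non-linear indicator can be sketched as a small Monte Carlo simulation. The sketch below is a deliberately simplified, assumption-laden illustration and implements neither ELL nor EBP in full: it plugs in fixed parameter estimates, draws the random effects unconditionally from \(N(0, \hat{\sigma}_{u}^{2})\), ignores parameter uncertainty and transformations of the variable of interest, and uses a head count ratio (share of units below a threshold) as the indicator. The function name `simulate_headcount` and all data are hypothetical.

```python
import random

# Simplified illustration only: NOT a full ELL or EBP implementation.
# Parameter estimates are plugged in as fixed values (an assumption).


def simulate_headcount(aux_units, beta, sigma_u2, sigma_e2, threshold,
                       n_sim=200, seed=1):
    """Estimate the share of units below `threshold` in each domain.

    aux_units maps a domain label to a list of unit-level covariate
    vectors from the additional data source.
    """
    rng = random.Random(seed)
    sd_u, sd_e = sigma_u2 ** 0.5, sigma_e2 ** 0.5
    estimates = {}
    for domain, units in aux_units.items():
        rates = []
        for _ in range(n_sim):
            u = rng.gauss(0.0, sd_u)  # one random effect per domain and replicate
            below = 0
            for x in units:
                # Predicted value: x^T beta + u_i + e_ij
                y = sum(xi * bi for xi, bi in zip(x, beta)) + u + rng.gauss(0.0, sd_e)
                if y < threshold:
                    below += 1
            rates.append(below / len(units))
        # Average the indicator over all simulated populations
        estimates[domain] = sum(rates) / n_sim
    return estimates


# Made-up unit-level auxiliary data (intercept plus one covariate).
aux_units = {
    "A": [[1.0, 0.4], [1.0, 0.5], [1.0, 0.6], [1.0, 0.5]],
    "B": [[1.0, 3.9], [1.0, 4.0], [1.0, 4.1], [1.0, 4.0]],
}
rates = simulate_headcount(aux_units, beta=[1.0, 2.0], sigma_u2=0.25,
                           sigma_e2=1.0, threshold=4.0)
print(rates)
```

Because the indicator is computed from simulated unit-level values, the threshold comparison can be swapped for any other function of the variable of interest. This is also why unit-level auxiliary information, rather than domain means, is needed.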
| SAE method | Pros | Cons |
|---|---|---|
| ELL | - Any indicator that is a function of the dependent variable can be estimated, also simultaneously.<br>- May be more accurate when the number of domains is large and there are non-sampled domains. | - It is a model-based approach, i.e., model diagnostics need to be conducted.<br>- It assumes homogeneity in the small domains and can have a high MSE under the model if the unexplained heterogeneity between domains is significant.<br>- Results can be affected by individual outliers. |
| EBP | - Any indicator that is a function of the dependent variable can be estimated, also simultaneously.<br>- Better performance than ELL regarding the MSE under the model if the unexplained heterogeneity between domains is significant. | - It is a model-based approach, i.e., model diagnostics need to be conducted.<br>- The sampling design is not considered.<br>- Results can be affected by individual outliers or lack of normality.<br>- It assumes homogeneity in clusters. |

Das and Haslett (2019) provide a detailed comparison of the ELL and the EBP approaches for poverty estimation in developing countries.