The section provides an overview of the basic unit-level model and some of the extensions. 

Basic background

Unit-level small area estimation models are popular in poverty mapping, which is one of the most common applications of small area estimation. In the following, the basic ideas are described. For a deeper methodological introduction to the model, the reader is referred to the standard literature, e.g., Rao and Molina (2015).

This model type assumes unit-level data: the variable of interest, e.g., household income, is contained in the survey data together with auxiliary variables that have predictive power, and the domain means of the same auxiliary variables are available from an additional data source.

The basic unit-level model, also known as the nested error linear regression model (Battese et al. 1988), has a nested structure as follows:

(mathjax-block(y_{ij} = x_{ij}^{\top} \beta + u_{i} + e_{ij}, i = 1, ...,D, j = 1,...,n_{i}, )mathjax-block)

where (mathjax-inline(y_{ij})mathjax-inline) is the variable of interest for the (mathjax-inline(j^{th})mathjax-inline) unit (individual/household) in the (mathjax-inline(i^{th})mathjax-inline) domain/area, (mathjax-inline(D)mathjax-inline) is the number of domains and (mathjax-inline(n_{i})mathjax-inline) is the sample size in domain (mathjax-inline(i)mathjax-inline). The vector (mathjax-inline(x_{ij})mathjax-inline) contains the auxiliary information and (mathjax-inline(\beta)mathjax-inline) the fixed-effects parameters. The domain-specific random effects (mathjax-inline(u_{i})mathjax-inline) and the unit-level error terms (mathjax-inline(e_{ij})mathjax-inline) are each independent and identically distributed, with (mathjax-inline(u_{i} \sim N(0, \sigma_{u}^{2}))mathjax-inline) and (mathjax-inline(e_{ij} \sim N(0, \sigma_{e}^{2}))mathjax-inline), respectively. Common estimation approaches are empirical best linear unbiased prediction (EBLUP), empirical Bayes and hierarchical Bayes methods.
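To make the nested structure concrete, the following minimal sketch simulates data from the model with an intercept and a single auxiliary variable. The function name, the uniform covariate and all parameter values are assumptions chosen for illustration only.

```python
import random

def simulate_nested_error(n_per_domain, beta0, beta1, sigma_u, sigma_e, seed=0):
    """Draw y_ij = beta0 + beta1 * x_ij + u_i + e_ij for every sampled unit.

    Returns a list of (domain_id, x_ij, y_ij) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for i, n_i in enumerate(n_per_domain):
        u_i = rng.gauss(0.0, sigma_u)        # one random effect shared by the whole domain
        for _ in range(n_i):
            x_ij = rng.uniform(0.0, 10.0)    # hypothetical auxiliary variable
            e_ij = rng.gauss(0.0, sigma_e)   # unit-level error
            rows.append((i, x_ij, beta0 + beta1 * x_ij + u_i + e_ij))
    return rows

# Three domains with sample sizes 5, 20 and 50 (illustrative values).
sample = simulate_nested_error([5, 20, 50], beta0=1.0, beta1=0.5,
                               sigma_u=1.0, sigma_e=2.0)
```

All units of a domain share the same draw of the random effect, which is what induces the within-domain correlation.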

The empirical best linear unbiased predictor of the domain mean can also be expressed as a weighted average of the survey regression estimator and the regression-synthetic part:

(mathjax-block(\hat{\theta}_i^{EBLUP} = \hat{\gamma}_{i} \left[ \bar{y}_{i} + (\bar{X}_{i}^{\top} \hat{\beta} - \bar{x}_{i}^{\top} \hat{\beta})\right] + (1 - \hat{\gamma}_{i}) \bar{X}_{i}^{\top} \hat{\beta} )mathjax-block)

where (mathjax-inline(\bar{y}_{i})mathjax-inline) is the sample mean of the variable of interest for domain (mathjax-inline(i)mathjax-inline), (mathjax-inline(\bar{X}_{i}^{\top})mathjax-inline) and (mathjax-inline(\bar{x}_{i}^{\top})mathjax-inline) are the means of the auxiliary information from the additional data source and the survey, respectively, and (mathjax-inline(\hat{\beta})mathjax-inline), (mathjax-inline(\hat{\sigma}_{u}^{2})mathjax-inline), (mathjax-inline(\hat{\sigma}_{e}^{2})mathjax-inline) are the estimated parameters. The weight (mathjax-inline(\hat{\gamma}_{i} = \frac{\hat{\sigma}_{u}^{2}}{\hat{\sigma}_{u}^{2} + \frac{\hat{\sigma}_{e}^{2}}{n_{i}}})mathjax-inline) measures the unexplained between-area variability relative to the total variability. With increasing sample size (mathjax-inline(n_{i})mathjax-inline), the weight on the survey regression estimator increases.
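The weighted-average form can be sketched in a few lines, assuming the parameter estimates are already available. The function names and the numbers in the usage lines are illustrative assumptions; the weight is computed from the estimated variance components.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def shrinkage_weight(sigma_u2, sigma_e2, n_i):
    """gamma_i = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i)."""
    return sigma_u2 / (sigma_u2 + sigma_e2 / n_i)

def eblup_mean(y_bar, x_bar_sample, x_bar_pop, beta, sigma_u2, sigma_e2, n_i):
    """Weighted average of the survey regression and regression-synthetic parts."""
    gamma = shrinkage_weight(sigma_u2, sigma_e2, n_i)
    synthetic = dot(x_bar_pop, beta)
    survey_regression = y_bar + synthetic - dot(x_bar_sample, beta)
    return gamma * survey_regression + (1.0 - gamma) * synthetic

# Illustrative values: intercept plus one auxiliary variable.
estimate = eblup_mean(y_bar=10.0, x_bar_sample=[1.0, 2.0], x_bar_pop=[1.0, 3.0],
                      beta=[2.0, 3.0], sigma_u2=1.0, sigma_e2=4.0, n_i=4)
```

With these inputs the weight is 0.5, so the estimate lies halfway between the survey regression estimator and the synthetic estimator. A larger sample size moves it towards the survey regression estimator.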

In order to account for non-negligible sampling fractions, (mathjax-inline(f_{i} = \frac{n_{i}}{N_{i}})mathjax-inline) , the EBLUP can be expressed as follows:

(mathjax-block(\hat{\theta}_{i}^{EBLUP} = f_{i} \bar{y}_{i} + \left(\bar{X}_{i} - f_{i}\bar{x}_{i}\right)^{\top} \hat{\beta} + (1 - f_{i}) \left[\hat{\gamma}_{i} \left(\bar{y}_{i} - \bar{x}_{i}^{\top} \hat{\beta}\right) \right].)mathjax-block)
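The version with sampling fractions can be sketched the same way. The function name and the example values are assumptions for illustration; the parameter estimates are taken as given.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def eblup_mean_finite_pop(y_bar, x_bar_sample, x_bar_pop, beta,
                          sigma_u2, sigma_e2, n_i, N_i):
    """EBLUP of the domain mean that accounts for the sampling fraction f_i = n_i / N_i."""
    f_i = n_i / N_i
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
    # (X_bar_i - f_i * x_bar_i)^T beta: fixed-effects part for the non-sampled units
    non_sampled = [X - f_i * x for X, x in zip(x_bar_pop, x_bar_sample)]
    predicted_effect = gamma * (y_bar - dot(x_bar_sample, beta))
    return f_i * y_bar + dot(non_sampled, beta) + (1.0 - f_i) * predicted_effect

# A small sample from a large domain: f_i is close to zero.
small_f = eblup_mean_finite_pop(10.0, [1.0, 2.0], [1.0, 3.0], [2.0, 3.0],
                                sigma_u2=1.0, sigma_e2=4.0, n_i=4, N_i=4_000_000)
# A full enumeration: f_i = 1 and sample means equal population means.
census = eblup_mean_finite_pop(10.0, [1.0, 2.0], [1.0, 2.0], [2.0, 3.0],
                               sigma_u2=1.0, sigma_e2=4.0, n_i=4, N_i=4)
```

For a negligible sampling fraction the result approaches the weighted-average form of the EBLUP, and for a fully enumerated domain it reduces to the observed mean.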

Extension for non-linear indicators

While the model above only supports the estimation of means and totals, extensions based on the nested error linear regression model allow the estimation of non-linear indicators such as proportions. The two most popular approaches are the World Bank or ELL method (Elbers et al. 2003) and the empirical best predictor (EBP) (Molina and Rao 2010). Both are common approaches in poverty mapping. Roughly speaking, the survey data is used to fit a model relating the variable of interest to the auxiliary information at the unit level, which results in estimates (mathjax-inline(\hat{\beta})mathjax-inline), (mathjax-inline(\hat{\sigma}_{u}^{2})mathjax-inline), (mathjax-inline(\hat{\sigma}_{e}^{2})mathjax-inline) of the model parameters. The model relation, the estimated parameters and the unit-level auxiliary information from the additional data source are then used to predict the variable of interest for every unit in every domain. Finally, the predicted values are used to estimate the indicator of interest at the required domain level.

While the basic unit-level model only requires means of the auxiliary information, the ELL and the EBP require auxiliary information for all units in all domains. Consequently, the higher flexibility in possible indicators comes along with stronger data requirements.
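The prediction step can be illustrated with a strongly simplified Monte Carlo sketch for one domain and a headcount ratio (share of units below a threshold z). This is not the full ELL or EBP algorithm as implemented in any package: parameter estimates are taken as given, no transformation of the response is applied, and the function name and arguments are assumptions for illustration.

```python
import random

def mc_headcount(census_x, beta, u_mean, u_var, sigma_e2, z, n_rep=200, seed=1):
    """Monte Carlo approximation of the share of units with y below z in one domain.

    census_x : covariate vectors of all units in the domain (additional data source)
    u_mean, u_var : mean and variance of the distribution the domain effect is drawn from
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        u = rng.gauss(u_mean, u_var ** 0.5)              # one domain effect per replicate
        below = 0
        for x in census_x:
            fixed = sum(a * b for a, b in zip(x, beta))  # x_ij^T beta
            y = fixed + u + rng.gauss(0.0, sigma_e2 ** 0.5)
            below += y < z
        total += below / len(census_x)                   # indicator for this replicate
    return total / n_rep                                 # average over replicates

census_x = [[1.0, 1.0], [1.0, 5.0]]                      # two units, intercept + one covariate
# Unconditional draw of the domain effect (in the spirit of ELL).
ell_like = mc_headcount(census_x, [0.0, 1.0], u_mean=0.0, u_var=1.0, sigma_e2=4.0, z=3.0)
# Draw conditional on the sample (in the spirit of the EBP); 0.4 and 0.5 are made-up values.
ebp_like = mc_headcount(census_x, [0.0, 1.0], u_mean=0.4, u_var=0.5, sigma_e2=4.0, z=3.0)
```

In an EBP-type prediction for a sampled domain, the mean of the domain effect would be the weight times the mean sample residual and its variance the random effect variance times one minus the weight. The unconditional draw ignores the information the sample carries about the domain effect.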


Pros and cons of the ELL and EBP methods

ELL

Pros:
  • Any indicator that is a function of the dependent variable can be estimated, also simultaneously.
  • May be more accurate when the number of domains is large and there are non-sampled domains.

Cons:
  • It is a model-based approach, i.e., model diagnostics need to be conducted.
  • It assumes homogeneity in the small domains and can have a high MSE under the model if the unexplained heterogeneity between domains is significant.
  • Results can be affected by individual outliers.

EBP

Pros:
  • Any indicator that is a function of the dependent variable can be estimated, also simultaneously.
  • Better performance compared to ELL regarding the MSE under the model if the unexplained heterogeneity between domains is significant.

Cons:
  • It is a model-based approach, i.e., model diagnostics need to be conducted.
  • The sampling design is not considered.
  • Results can be affected by individual outliers or lack of normality.
  • It assumes homogeneity in clusters.

Das and Haslett (2019) provide a detailed comparison of the ELL and the EBP approaches for poverty estimation in developing countries.

ELL explained

For an explanation of the World Bank method, also known as ELL approach, consult the first 10 minutes of the video “Leaving no one behind” - small area estimation with new data and new methods.

Poverty mapping explained

For an introduction to poverty mapping that also explains the EBP and ELL approaches, consult the lecture What is poverty mapping? by Nikos Tzavidis.


How to apply

Besides a basic understanding of the methodology, the application of approaches with statistical software is of interest for practitioners. While the inputs partly depend on the chosen software packages, some common data inputs and inputs for some implemented extensions are described.

Inputs

The starting point of the application is unit-level data, i.e., a survey that is available at the unit level (microdata) and contains the domain-identifying variables. For the auxiliary information, the requirements differ depending on the indicator of interest and thus the chosen method:

  • Mean → Basic unit-level model →  Area-level means of the auxiliary information are sufficient.
  • Non-linear indicators, e.g., ratios or quantiles → ELL or EBP →  Auxiliary information is required at the unit level.

In addition to the variable of interest and the predictor variables, the variable indicating the domains or clusters needs to be given. For the other necessary arguments, such as the estimation method for the model parameters, defaults (predefined values) are often set to simplify the application.

Outputs

The outputs of all software packages are domain-specific estimates. Furthermore, all software packages provide estimates of the mean squared error.

Implementation

The table below gives a first overview of modules in standard software that implement the basic unit-level model, the ELL and the EBP. Other extensions to the basic unit-level model are described below.

Overview of availability of the unit-level model in statistical software (not comprehensive)


  • Basic model: R: sae, hbsae, JoSAE, rsae; SAS: prototype by Statistics Canada, Mukhopadhyay and McDowell; Python: samplics
  • EBP: R: sae, emdi; Other: PovMap
  • ELL: Stata: sae; Other: PovMap

Estimation of model parameters 

The model parameters to be estimated in the basic unit-level model are the variance of the domain random effects (mathjax-inline(\sigma_{u}^{2})mathjax-inline), the unit-level error variance (mathjax-inline(\sigma_{e}^{2})mathjax-inline) and the fixed-effects parameters (mathjax-inline(\beta)mathjax-inline). Common estimation approaches are empirical best linear unbiased prediction (EBLUP), empirical Bayes (EB) and hierarchical Bayes (HB) methods. In practice, EBLUP and HB are usually used. The following table briefly shows the distinctions between the approaches; for detailed information, please see the literature recommendations.

EBLUP vs. Empirical Bayes vs. Hierarchical Bayes (based on Ghosh and Rao 1994)

EBLUP
  • Classical frequentist framework
  • Small area parameters can be expressed as linear combinations of fixed and random effects
  • Estimators minimize the mean squared error among the class of linear unbiased estimators of fixed parameters → Best linear unbiased predictor (BLUP)
  • Variance of random effects is assumed to be known but needs to be estimated in practical applications → Empirical best linear unbiased predictor (EBLUP)

Empirical Bayes
  • Posterior distribution of the parameters of interest given the data is obtained
  • Model parameters are assumed to be known
  • Parameters are estimated from the marginal distribution
  • Inferences are based on the estimated posterior distribution

Hierarchical Bayes
  • Prior distribution on model parameters is specified
  • Posterior distribution of the parameters of interest is obtained
  • Inferences are based on the posterior distribution → parameter of interest is estimated by posterior mean and the uncertainty is estimated by the posterior variance

For the basic unit-level model, EBLUP and HB approaches are implemented in standard software. The standard estimation methods for the EBLUP are maximum likelihood (ML), residual maximum likelihood (REML) and the fitting-of-constants method (Battese et al. 1988). For ML and REML, normality is assumed for the two error terms. The HB approach requires the specification of a subjective prior on the model parameters, i.e., the random effect variance (mathjax-inline(\sigma_{u}^{2})mathjax-inline), the unit-level error variance (mathjax-inline(\sigma_{e}^{2})mathjax-inline) and the fixed-effects parameters (mathjax-inline(\beta)mathjax-inline). The prior can be informative or non-informative. Prior information could be derived from expert knowledge or relevant previous studies. While the HB approach is straightforward, the prior should be selected carefully since improper priors can lead to improper posteriors and thus to invalid SAE estimates.
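As a rough illustration of how variance components can be obtained from the data, the sketch below uses the classical method-of-moments (ANOVA) estimators for the special case without auxiliary variables, i.e., a model with an intercept and a domain effect only. This is an assumption made to keep the example short; it is not the ML or REML routine of any of the packages, and the function name is hypothetical.

```python
def anova_variance_components(groups):
    """Method-of-moments estimates (sigma_u2, sigma_e2) for y_ij = mu + u_i + e_ij.

    groups : list of lists, one list of observed y values per domain.
    """
    n = sum(len(g) for g in groups)              # total sample size
    d = len(groups)                              # number of domains
    grand_mean = sum(sum(g) for g in groups) / n
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        mean_i = sum(g) / len(g)
        ss_within += sum((y - mean_i) ** 2 for y in g)
        ss_between += len(g) * (mean_i - grand_mean) ** 2
    sigma_e2 = ss_within / (n - d)               # within-domain mean square
    n0 = n - sum(len(g) ** 2 for g in groups) / n
    # Truncate at zero because the moment estimator can become negative.
    sigma_u2 = max(0.0, (ss_between - (d - 1) * sigma_e2) / n0)
    return sigma_u2, sigma_e2

sigma_u2_hat, sigma_e2_hat = anova_variance_components([[1.0, 3.0], [5.0, 7.0]])
```

The truncation at zero reflects a practical issue that also arises with covariates: when the between-domain variation is small, the estimated random effect variance can hit the boundary, and the EBLUP then collapses to the synthetic estimator.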

Availability of the different estimation approaches in standard statistical software

  • EBLUP (e.g., ML, REML, fitting-of-constants): R: sae, rsae, JoSAE; SAS: prototype by Statistics Canada, Mukhopadhyay and McDowell; Python: samplics
  • HB (priors for model parameters need to be selected): R: hbsae


For the ELL, the implemented approaches to obtain the variance components are the Henderson III method (Henderson 1953, Huang and Hidiroglou 2003) and the ELL approach (Elbers et al. 2003). The EBP makes use of the ML and REML estimates of the variance components.


Uncertainty measurement

All software packages report estimates for the uncertainty along with the SAE estimators.

For the basic unit-level model, estimators of the mean squared error (MSE) either have analytical formulations, are derived by replication methods, or are obtained from the posterior variance when HB estimation is used. For the ELL and EBP, replication methods in the form of bootstraps are available.
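The idea of a replication-based MSE estimator can be sketched with a deliberately simplified parametric bootstrap. Two assumptions go beyond the text: the model has an intercept only, and the parameters are held fixed across replicates instead of being re-estimated, so only the leading term of the MSE is captured. Bootstrap procedures in the packages re-fit the model in every replicate.

```python
import random

def bootstrap_mse_fixed_params(mu, sigma_u2, sigma_e2, n_i, n_boot=20000, seed=42):
    """Simplified parametric bootstrap MSE of the predictor mu + gamma * (y_bar - mu)."""
    rng = random.Random(seed)
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
    squared_error = 0.0
    for _ in range(n_boot):
        u_star = rng.gauss(0.0, sigma_u2 ** 0.5)              # bootstrap domain effect
        e_bar_star = rng.gauss(0.0, (sigma_e2 / n_i) ** 0.5)  # mean of n_i unit-level errors
        truth = mu + u_star                                   # bootstrap "true" domain mean
        y_bar_star = mu + u_star + e_bar_star                 # bootstrap sample mean
        estimate = mu + gamma * (y_bar_star - mu)             # predictor on the bootstrap sample
        squared_error += (estimate - truth) ** 2
    return squared_error / n_boot

mse_boot = bootstrap_mse_fixed_params(mu=10.0, sigma_u2=1.0, sigma_e2=4.0, n_i=4)
```

Under these assumptions the result should be close to the weight times the unit-level error variance divided by the sample size, which is the leading term of the analytical MSE. The uncertainty from estimating the parameters is ignored here and would increase the MSE.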

Extensions

In addition to the basic unit-level model, some extensions are implemented in standard software. In the following, the ideas behind the extensions are briefly described and the table gives an overview of which package provides which extension. For detailed information about the methodology, please see the original references or the package descriptions/vignettes.

  • Pseudo-EBLUP: The basic unit-level model does not have the option to include survey design weights. Therefore, You and Rao (2002) proposed a method that allows the inclusion of survey weights to obtain design-consistent model-based estimators of the domain means.
  • Under heteroscedasticity: One common assumption in linear mixed models is a constant variance of the residual errors. When this assumption is violated, model-based methods may result in unreliable estimates. Thus, Breidenbach et al. (2018) propose a method that addresses heteroscedasticity in the model.
  • Robust estimation: In the presence of influential outliers, robust estimation methods may help to reduce the influence of outlying observations on the estimation (Schoch 2012). For the estimation, a tuning constant k that regulates the degree of robustness needs to be specified. As k tends to infinity, the predictions equal the standard EBLUP.
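Two of these ideas can be illustrated with small helper functions. Both are sketches under assumptions and do not reproduce any package code: the first assumes a Huber-type function for bounding standardized residuals in robust estimation, and the second follows a reading of You and Rao (2002) in which the shrinkage weight of the pseudo-EBLUP uses the sum of squared normalized survey weights in place of the inverse sample size.

```python
def huber_psi(r, k):
    """Huber-type function: residuals are left unchanged within [-k, k] and clipped outside."""
    return max(-k, min(k, r))

def pseudo_eblup_weight(weights, sigma_u2, sigma_e2):
    """Shrinkage weight with survey weights: sigma_u2 / (sigma_u2 + sigma_e2 * delta2),
    where delta2 is the sum of squared normalized weights of the domain."""
    total = sum(weights)
    delta2 = sum((w / total) ** 2 for w in weights)
    return sigma_u2 / (sigma_u2 + sigma_e2 * delta2)

clipped = huber_psi(5.0, k=1.345)                 # outlying residual is bounded by k
unclipped = huber_psi(5.0, k=float("inf"))        # no bounding as k goes to infinity
equal_w = pseudo_eblup_weight([1.0, 1.0, 1.0, 1.0], sigma_u2=1.0, sigma_e2=4.0)
unequal_w = pseudo_eblup_weight([3.0, 1.0, 1.0, 1.0], sigma_u2=1.0, sigma_e2=4.0)
```

With an infinite tuning constant no residual is modified, which matches the statement that the robust predictions then equal the standard EBLUP. With equal survey weights the sum of squared normalized weights is the inverse sample size, so the weight reduces to the one of the basic model. The value 1.345 is a commonly used tuning constant and is only an example.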

  • Pseudo-EBLUP (allows inclusion of survey weights): SAS: prototype by Statistics Canada
  • Under heteroscedasticity (addresses the potential heteroscedasticity in residual errors): R: JoSAE
  • Robust estimation (downweighting of influential outliers): R: rsae


References

Battese, G. E., Harter, R. M. and Fuller, W. A. (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data, Journal of the American Statistical Association, 83(401), 28-36.

Breidenbach, J., Magnussen, S., Rahlf, J. and Astrup, R. (2018). Unit-level and area-level small area estimation under heteroscedasticity using digital aerial photogrammetry data, Remote Sensing of Environment, 212, 199–211.

Das, S. and Haslett, S. (2019). A Comparison of Methods for Poverty Estimation in Developing Countries, International Statistical Review, 87(2), 368-392.

Elbers, C., Lanjouw, J. and Lanjouw, P. (2003). Micro-level estimation of poverty and inequality, Econometrica, 71, 355–364.

Ghosh, M. and Rao, J. N. K. (1994). Small Area Estimation: An Appraisal, Statistical Science, 9(1), 55-76.

Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics 9 (2), 226–252.

Huang, R. and Hidiroglou, M. (2003). Design consistent estimators for a mixed linear model on survey data, Proceedings of the Survey Research Methods Section, American Statistical Association, 1897–1904.

Molina, I. and Rao, J. N. K. (2010). Small area estimation of poverty indicators. The Canadian Journal of Statistics, 38, 369–385.

Rao, J.N.K. and Molina, I. (2015). Small Area Estimation. New York: Wiley.

Schoch, T. (2012). Robust Unit-Level Small Area Estimation: A Fast Algorithm for Large Data Sets, Austrian Journal of Statistics, 41(4), 243-265.

You, Y., and Rao, J.N.K. (2002). A pseudo empirical best linear unbiased prediction approach to small area estimation using survey weights. The Canadian Journal of Statistics, 30, 431-439.
