The chapter covers the enabling environment that is required for National Statistical Offices to move from SAE experimentation to production. Whenever available, discussions under each key area are accompanied by national examples.
Establishing a clear and focused objective that links SAE to data use for policymaking
Before embarking on any new statistical exercise, it is good practice to ensure that there is real demand and to understand how the data produced will be used: whether they will feed into a specific government programme, serve to evaluate a social assistance programme, or meet other specific objectives. In the US Census Bureau example, estimates from the SAIPE program are used to allocate federal funds to school districts.
In Chile, the law of the Fondo Común Municipal (FCM) was amended in 2009 to require the Ministerio de Desarrollo Social y Familia to provide poverty rate estimates every 2 years for all comunas in the country. Funding is allocated to all comunas based on these data. Similarly, in Colombia the Department of Social Prosperity (DPS) was required to redesign the monetary transfer programs Jóvenes en Acción and Familias en Acción, the most critical conditional monetary transfers in Colombia. Poverty data for smaller geographical areas are needed for these transfers.
In addition, SAE should only be used when design-based estimates are not available.
Building the legal foundation for using SAE for official data production
The example of the SAIPE program from the United States demonstrates how the legal foundation for the US Small Area Income and Poverty Estimates (SAIPE) Program emerged. Excerpts from Article PL 103-382 - Improving America's Schools Act 1994 covered 3 aspects: (a) requirement of using the Census Bureau data to allocate Federal funds to school districts; (b) an implicit requirement on the quality of data; and (c) creation of an external panel to evaluate the estimates.
Building a sustainable SAE data production system is a long-term process that requires substantive financial resources to support (a) the initial methodological development; (b) regular and periodical adjustment of methods; and (c) sustainable data production and validation. SAE is usually a small proportion of all the activities within a National Statistical Office. Given the continuous efforts required, a dedicated team on SAE would usually be preferred.
Fostering an environment for research and development
NSOs’ advances in small area estimation generally have their roots in research projects responding to calls from government and other stakeholders for granular social, business and environmental data. The projects are usually complex, demanding time and a qualified team. Fostering research within the NSO, and liaising with academic experts, may be the key to successful innovative outcomes. In addition, joining forces with other NSOs or institutions can create a constructive environment, not only for sharing knowledge but also for combining efforts on the required methodological work.
Although time and personnel resources are always scarce in most NSOs, these are the essentials for enabling research. In addition, since the small area estimation model is a tailor-made tool for a specific target indicator, it is not uncommon that attempts to produce model-based estimates do not deliver publishable outcomes. This may happen, for example, due to the choice of target indicator, the lack of auxiliary data or the need for additional methodological development. Every experiment is, in fact, a capacity building movement towards innovation.
While successful experiences are those being reported, NSOs have been involved in projects and feasibility studies that did not result in small area experimental or official statistics. Acknowledging this challenge is also crucial to a productive research and development environment.
This Toolkit presents examples of collaborative research projects jointly carried out by NSOs and academia such as EURAREA and ESSnet Project for SAE.
In addition, the SAE Practices section highlights NSOs’ partnerships with academia and institutions (such as the World Bank, UN-ECLAC, UNDP, etc.) that supported their progress and achievements in the subject.
Design-based versus model-based estimates: a changing culture in the national statistical offices
In the context of sample surveys, statistical inference can be design-based, model-assisted or model-based. The inferences typically made from a sample survey are design-based or model-assisted, both of which rest on the stochastic structure induced by the sampling design. Parameter and variance estimators are derived under the concept of repeatedly drawing samples from a finite population according to the same sampling design, while statistical modelling plays a minor role. In model-based inference, on the other hand, the probability structure of the sampling design plays a less pronounced role, since the inference is based on the probabilistic structure of an assumed statistical model (Van den Brakel and Bethlehem, 2008).
The above paper further noted that at that time (2008) the use of model-based estimation procedures was rather limited at Statistics Netherlands and most European National Statistical Institutes. Two factors were cited as underlying reasons for the rather reserved attitude towards model-based estimation: (a) playing safe in the production of official statistics and not wanting to rely on model assumptions, particularly if they are not verifiable; and (b) model-based methods being technically and practically difficult to implement.
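The contrast between the two approaches can be illustrated with a minimal sketch. It assumes an area-level model in which the final estimate is a weighted average of the direct (design-based) estimate and a model prediction, with the weight determined by the sampling variance of the direct estimate and the between-area model variance. The area names, variances and predictions below are hypothetical, and the variances are treated as known, which is rarely the case in practice.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Combine a direct estimate with a model prediction.

    gamma is the weight given to the direct estimate: it approaches 1 when
    the direct estimate is precise (small sampling variance) and 0 when it
    is noisy, in which case the model prediction dominates.
    """
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma


# Hypothetical inputs: (direct estimate, sampling variance, model prediction)
areas = {
    "area_A": (0.30, 0.0004, 0.24),  # large sample, precise direct estimate
    "area_B": (0.30, 0.0100, 0.24),  # small sample, noisy direct estimate
}
MODEL_VAR = 0.0016  # assumed between-area variance, treated as known here

results = {
    name: composite_estimate(direct, pred, samp_var, MODEL_VAR)
    for name, (direct, samp_var, pred) in areas.items()
}

for name, (estimate, gamma) in results.items():
    print(f"{name}: estimate={estimate:.4f}, weight on direct={gamma:.3f}")
```

In this sketch the area with the larger sample keeps an estimate close to its direct value, while the area with the smaller sample is pulled towards the model prediction. This is why the validity of the model assumptions matters most precisely where the survey data are weakest.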
Another reason that limits model-based estimation is that it is important, but time-consuming, to choose models and to check model assumptions. This is not necessary in a design-based framework which is hence much more efficient. In addition, a model-based estimation procedure is usually a tailor-made solution to produce statistics of a specific indicator, whereas a single set of design-based weights (and/or one design-based calibration procedure) can be employed to yield estimates for many survey variables and indicators, all at once.
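The point about a single set of design-based weights can be sketched as follows. The respondent records, weights and variable names are hypothetical; the estimator shown is a simple weighted mean, used here only to show that the same weights serve every variable without any variable-specific modelling.

```python
def weighted_mean(values, weights):
    """Design-weighted mean of one survey variable."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: one weight per respondent, reused for all variables
weights = [100.0, 100.0, 200.0, 200.0]
survey = {
    "poor": [1, 0, 0, 1],            # indicator variable
    "employed": [0, 1, 1, 1],        # indicator variable
    "household_size": [5, 3, 2, 4],  # count variable
}

# The same weights produce estimates for every variable at once
estimates = {name: weighted_mean(vals, weights) for name, vals in survey.items()}

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

A model-based approach would instead need a separate model, with its own auxiliary variables and diagnostics, for each of the three indicators.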
Input data is the most crucial element for producing good SAE estimates. Countries interested in SAE for official production often face two challenges in terms of input data:
At the beginning of a SAE project, one needs to identify proper auxiliary variables for the modelling procedure. In the example of Chile, a table was put together with all relevant data sources:
The selection of the auxiliary variables should be made jointly with subject-matter experts. For Chile, when producing poverty rate SAE estimates, three criteria were used to select auxiliary variables: they should be (1) associated with social and economic conditions that help explain the phenomenon of interest in the target small areas; (2) collected and administered by reliable sources, which helps to control measurement errors; and (3) compiled and published periodically, to depict the evolution of the phenomenon under study more accurately and in a more timely way. In the UK ONS example, the quality of the administrative data was assessed in terms of its coverage, whether it has been used or assessed before by another study, how frequently the data are updated and, if there is a time lag, how much the lag affects overall quality. The quality assurance of administrative data sources used as SAE input builds on ONS' Toolkit for the quality assurance of administrative data, but additional quality requirements specific to SAE are also assessed; these mostly relate to the predictive power of the auxiliary variables, the alignment of administrative boundaries and the consistency of concepts and definitions across sources.
Maintaining a high and fit-for-purpose quality standard
It is extremely important to appraise the outcomes of small area estimation before publishing the estimates. The guide on small area estimation for NSOs (ADB, 2020, Section 2.5) highlights two levels of assessment to secure the quality of estimates: an internal assessment to evaluate the models, the estimation procedure and corresponding results; and an external validation with “prospective users of the statistics and other experts”.
The internal assessment is built into the production process of SAE estimates. Model diagnostics should be run, and the estimates produced should also be compared against other data sources (proxies). Usually, these alternative sources are not available at the small area, or domain, level, and the model-based estimates have to be aggregated to the appropriate level to allow comparisons.
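The aggregation step described above can be sketched as follows, assuming the target indicator is a rate, so that small area estimates are combined using population shares. All area names, populations, estimates and regional benchmark figures are hypothetical.

```python
def aggregate_to_region(area_estimates, area_population, area_region):
    """Population-weighted average of small area rates within each region."""
    weighted_sum, population = {}, {}
    for area, estimate in area_estimates.items():
        region = area_region[area]
        pop = area_population[area]
        weighted_sum[region] = weighted_sum.get(region, 0.0) + estimate * pop
        population[region] = population.get(region, 0.0) + pop
    return {region: weighted_sum[region] / population[region] for region in weighted_sum}


# Hypothetical model-based poverty rates for three small areas
area_estimates = {"a1": 0.10, "a2": 0.30, "a3": 0.20}
area_population = {"a1": 1000, "a2": 3000, "a3": 2000}
area_region = {"a1": "north", "a2": "north", "a3": "south"}

# Hypothetical direct survey estimates, reliable only at the regional level
direct_regional = {"north": 0.26, "south": 0.19}

aggregated = aggregate_to_region(area_estimates, area_population, area_region)
gaps = {region: aggregated[region] - direct_regional[region] for region in aggregated}

for region in aggregated:
    print(f"{region}: aggregated={aggregated[region]:.3f}, "
          f"direct={direct_regional[region]:.3f}, gap={gaps[region]:+.3f}")
```

Large or systematic gaps between the aggregated model-based estimates and the regional direct estimates would prompt a closer look at the model. Judging whether a gap is large would normally also take the sampling error of the direct estimate into account, which this sketch omits.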
In addition, external validation is required to guarantee that estimates are plausible and fit for purpose (in the sense of meeting users’ needs and being produced according to valid methods). This can comprise public consultations, quality review by stakeholders and external experts, as well as presentation of methods and corresponding results in specialized forums and conferences.
An independent evaluation of the SAE methods and results is highly recommended. The main benefits of having methods and data sources independently reviewed by external experts are (1) protection against unseen weaknesses or failures in the small area estimation process, and (2) greater public confidence in the estimates, through independent verification of the quality of the approach.
When developing new methods, good-quality input data sources are generally more important than the latest technical developments, which may not have been thoroughly tested. Fitness of models and methods for the task at hand matters more than novelty.
Academia is an important partner for developing SAE methods and reviewing the SAE work. In the US example, review of SAE work within the Census Bureau is carried out by an advisory committee, by chartering review panels from the National Academy of Sciences, and by requesting review from individual experts. In the example of the Australian Bureau of Statistics, independent review is carried out by the Methodology Advisory Committee, composed of academics in Australia. Rather than giving feedback on each SAE project, they are involved at key times, for example, when ABS was establishing random effects models and when investigating temporal SAE models. When the UK Office for National Statistics (ONS) initiated its work on SAE, the University of Southampton was commissioned to examine the potential for small area estimation. Joint work has continued since then, as can be seen in reports and papers.
Several NSOs liaise with academia for capacity building activities as well as for the development and evaluation of small area projects. Academics, in turn, welcome real small area estimation problems that prompt advances in the subject. This toolkit displays several examples of this partnership in its case studies, SAE country practices and listed references.
Other government institutions and private sector
Availability of and access to auxiliary information are essential for small area estimation procedures. Administrative data held by different government institutions (or administrative authorities), as well as alternative data (such as mobile phone and sensor data, or data generated by tracking devices) owned by private companies, constitute valuable input to small area models. Institutional arrangements or agreements between NSOs and other entities may facilitate partnerships. Although this pathway is usually laborious, the benefits of data sharing and record linkage for improving statistical production have already been highlighted in many official statistics discussion forums.
The successful small area estimation projects presented in this toolkit exemplify the incorporation of government administrative data and privately owned alternative data in small area estimation models.
Administrative data from the Ministerio de Desarrollo Social (Ministry of Social Development) were incorporated as auxiliary information in the small area model to produce poverty rate estimates for Chilean comunas.
The small area income estimates, which ONS-UK has published for many years, illustrate how ONS cooperates with a range of UK government institutions to obtain not only auxiliary administrative information but also the income data (the target variable). ONS established a data sharing agreement with the Department for Work and Pensions (DWP) for accessing the Family Resources Survey (FRS) data. The DWP, in turn, reports as one of the Uses of FRS Data:
“The Office for National Statistics produces small area model-based income estimates as the official estimates of annual household income at the middle layer super output area (MSOA) level in England and Wales. The estimates are produced using a combination of survey data from the Family Resources Survey and previously published data from the 2011 Census and a number of administrative data sources.”
To date, the use of proprietary data, as target or auxiliary variables, for small area estimation remains limited. The main challenges are related to data access and data quality. The SAE2021 conference theme was Big Data for Small Area Estimation and the SAE2022 edition will present conferences on small area estimation and Big Data, encompassing small area methods in the cutting-edge and comprehensive subject of data integration.
One example of proprietary data employed as an auxiliary variable for small area estimation is reported in Schmid et al. (2017), who developed a small area estimation model for deriving literacy rate estimates by gender for the 431 communes in Senegal, using mobile phone auxiliary data. The estimates are based on the 2011 Demographic and Health Survey (DHS) carried out by the National Agency of Statistics and Demography of Senegal (ANSD, Agence Nationale de Statistique et de la Démographie) and mobile phone data covering the year 2013 (provided by the Senegalese telecommunication company Sonatel).
Within national statistical system
For the development of a small area estimation project, the assessment to secure quality of estimates and the publication of results, a diverse team of specialists is required. SAE methodologists and subject-matter experts usually work together to define the target indicators, to support decisions related to the selection of auxiliary variables and to validate the estimates (internal assessment). It may also be the case that specialists such as data journalists, data visualization experts or media experts would contribute to prepare material for communicating the results.
Depending on how the country’s national statistical system is organized, the team of specialists may work in different government institutions or in different units/divisions of the NSO. Sharing of technical resources across NSO units/divisions and across institutions may be the key to a successful outcome. Working groups are also helpful for sharing expertise and tasks and for facilitating data sharing.
In BPS-Statistics Indonesia, for example, the research on small area estimation is carried out by BPS employees from several office units, forming an SAE ad-hoc team comprising SAE technical coordinators, SAE methodologists, data experts on the substantive topic and interns. The SAE unit within Istat consists of 4 methodologists who liaise with other office units through working groups. This has helped the unit respond to user needs regarding the required level of disaggregation for small area statistics and the target indicators, obtain the available auxiliary information, and validate the model results.
Geospatial information plays an essential role in the scope of official statistics. Integration of geoinformation and GIS techniques in the development of small area models can enable improvements and production of estimates where no other administrative or census data is available. Expertise in this area is vital and welcome.
For example, night-time lights density measured through satellite images can be used to delineate urban and rural areas at the lower administrative level. This not only assists in providing estimates at an urban and rural level; night-time lights can also be used as a proxy for local economic activity or urbanization, and they correlate well with other welfare proxies (Henderson et al., 2012; Ghosh et al., 2013).
Geospatial covariates, such as road density, elevation and precipitation, can also be incorporated into mapping analysis. For example, Jalan and Ravallion (1997 and 2002) show a significant correlation between poor resource endowments (climate variables, land use / land cover) and poverty outcomes, while controlling for household education and other human capital endowments.
A recent report from the World Bank (Masaki et al., 2020) presents results from small area estimation of non-monetary poverty for Sri Lanka and Tanzania incorporating geospatial data (such as night-time lights and normalized difference vegetation index).
Mapping the small area estimation outcomes using GIS software is also valuable for policymakers to make informed decisions.
Building capacity on using SAE for official statistics can take different forms, depending on the NSO's existing competences in this topic. The following highlights a number of lessons learnt from national experiences:
One broad challenge that is not specific to capacity building on small area estimation, but is certainly relevant, is the high level of staff turnover in national statistical offices. As seen in the discussion throughout this Wiki platform, it usually requires a large effort to train national staff to the level of comfortably using SAE techniques. Once well-trained staff leave the office, the SAE work programme is affected. One question in the ongoing discussion of this challenge is whether close collaboration with a research institution would help alleviate this pressure to some degree. The institution could potentially lead further research on testing new methods or new data, and the tested methods could then be provided to the national statistical office to run in production. (Some reflections from colleagues supporting countries on SAE under the Data For Now project).
Transparency in releasing methodology and communicating quality
The relevance and usefulness of small area estimates to improve countries’ capacity to produce and disseminate granular data is acknowledged. Still, trust in official statistics is a pillar of the national statistical system, and Principle 1 of the Fundamental Principles of Official Statistics is related to Relevance, Impartiality and Equal Access.
“To make information widely known and available on an impartial basis requires dissemination and marketing activities (including dealing with the media ) to provide information in the form required by users, and release policies which provide equal opportunity of access. Sound statistical principles needs to be followed with the presentation of statistics so that they are easy to understand and impartially reported.”
The Section Communicating SAE Results highlights important issues on this topic. NSOs with a long tradition of producing model-based estimates offer good examples of reports and initiatives for disseminating SAE methods and the resulting quality of statistics.
The US SAIPE project provides plenty of material on its website. ONS-UK, besides publishing a technical report, also presents information on the quality of the estimates and guidance on their use.
In various countries, small area estimates are published as experimental statistics, facilitating users’ consultation as well as familiarization with innovative methods and corresponding statistics. See, for example:
Practical way forward: from experimental statistics to official statistics
It is common practice that small area estimates are initially published as experimental statistics, since they are new outputs, resulting from statistical models, and will be evaluated by users and other stakeholders. Therefore, careful planning is necessary to progress from SAE research to the production of experimental statistics, and then to reach the official statistics status. The latter is achieved when statistical methods are recognized as mature and well-known, estimates are considered reliable, quality standards are met for both the outcomes and the production process, and the statistics are judged as useful and trustworthy by users. There are no shortcuts in this route but planning and organization can facilitate the course. Various sections of this toolkit address main points and stages of this enterprise.
The following chart shows the road map that Statistics Indonesia (BPS) put together when planning its research pathway from SAE experiments towards SAE data production.
Source: Statistics Indonesia (BPS, as of May 2021)
At a more advanced stage, Statistics Canada reports the development of a small area estimation production system that “has been used to produce small area estimates on an experimental basis for several surveys” (Hidiroglou, Beaumont and Yung, 2019).
In addition, Small Area Income Estimates for England and Wales (ONS-UK) have been published as official statistics since 2012, when the designation as National Statistics was awarded following an assessment by the UK Statistics Authority (see Assessment Report 160 from 2011 and Assessment Report 324 from 2016).
“The more we invest in meeting the demands for small area statistics and the more successful we are in obtaining small area estimates, the more complex the estimation system becomes as care has to be taken to ensure comparability over time (when repeat estimation takes place) as well as consistency over area/domains and coherence over variables. Queries on how to compare or analyse change over time and on how to aggregate small area estimates to broader areas or domains have to be addressed due to soaring users’ calls”. (Silva and Clarke, 2008)
An interesting development in SAE is the effort to reduce reliance on a customized approach for each outcome indicator, which has been a major hurdle to adopting SAE for more indicators. ISTAT highlights this case: it is developing "methods that, taking into account sampling design and benchmarking needs, would deliver a set of weights for each domain of interest that can be used to produce the estimates of all target indicators based on a survey (reducing the disadvantage of current SAE tailor-made methods)". Basel (2020) also notes scalability challenges of SAE methods, since modelling large sets of indicators/responses from the same survey is a real need.
In addition, the incorporation of alternative data sources as auxiliary information and the need for data integration have to be considered within the development of small area estimation methods and procedures. For example, the Italian Statistical Society has recently launched a call for papers to be published in a special issue of the Statistical Methods and Applications journal on Big Data and alternative data sources for Small Area Estimation. Other research problems and developments are displayed by the US Census Bureau (Small Area Estimation: research problems) and Statistics Canada (Statistical Methodology Research and Development Program Achievements, 2020/2021).
Silva, D.B.N. and Clarke, P. (2008) Some Initiatives on Combining Data to Support Small Area Statistics. Paper presented at the IAOS 2008 Conference on Reshaping Official Statistics. [online] Available from: http://www.stats.gov.cn/english/specialtopics/iaos/Papers/CS_4_3_Silva_Clarke.pdf. Accessed 2 February 2022.
US Census Bureau, SAIPE program, courtesy of Mr. William R Bell
Australia Bureau of Statistics, courtesy of Mr. Sean Buttsworth
Henderson, J. V., A. Storeygard and D. Weil. 2012. Measuring Growth from Outer Space. American Economic Review, 102(2), pp.994-1028.
Ghosh, T., S.J. Anderson, C.D. Elvidge, and P.C. Sutton. 2013. Using Nighttime Satellite Imagery as a Proxy Measure of Human Well-Being. Sustainability. 5: 4988-5019; doi:10.3390/su5124988
Jalan, J. and M. Ravallion. 1997. Spatial Poverty Traps? World Bank Policy Research Working Paper # 1798. Washington, DC: World Bank.
Jalan, J. and M. Ravallion. 2002. Geographic Poverty Traps? A Micro Model of Consumption Growth in Rural China. Journal of Applied Econometrics 17(4): 329-346.
Basel, W. (2020). Scalability issues for SAE methods. Paper presented at: Year of Data Science Workshop on Small Area Data Analytics, March 30 - April 3, 2020, University of Maryland. Book of Abstracts, Abstract 31, p. 31. Available at https://mti.umd.edu/sites/mti.umd.edu/files/Tentative%20Program%20March%20%202.pdf