Descriptive analysis of data
Modified on 2013/05/20 17:01 by Haoyi Chen — Categorized as: Chapter 4 - Analysis and presentation of gender statistics
The degree of data processing and analysis varies by the type of statistical product prepared by national statistical offices. (See Box 4.1 for types of statistical products that may include gender statistics.) Typically, tables constructed to disseminate data collected in censuses or surveys involve minimal data processing and analysis. A large amount of data is provided, often as absolute frequencies or counts of observations, making it difficult to discern the main differences between women and men. Additional processing and analysis are carried out when more analytical reports or articles focused on specific topics are prepared. In this case, the differences between women and men have a chance of becoming more visible.
Gender statistics require at least two cross-tabulated statistical variables: sex and the main characteristic that is studied, such as educational attainment or labour force participation. Ideally, additional variables are used in further cross-tabulation of data (for example, by age group or geographic area) in three-way or multi-way tables. Although statistics on individuals have traditionally been disseminated as totals with no further information on women and men, data are increasingly disaggregated by sex in dissemination materials. Still, one limitation in producing gender statistics persists: sex is often used as the only breakdown variable for the data presented. As explained in chapter 1 and shown in chapter 2, gender statistics and a meaningful gender analysis commonly require disaggregation by sex and other characteristics at the same time. For example, gender segregation in the labour market is partially determined by the gender gap in education; therefore, data on occupations should be further disaggregated by level of educational attainment.
Basic descriptive analysis of data involves calculation of simple measures of composition and distribution of variables by sex and for each sex that facilitate straightforward gender-focused comparisons between different groups of population. Depending on the type of data, these measures may be proportions, rates, ratios or averages, for example. Furthermore, when necessary, such as in the case of sample surveys, measures of association between variables can be used to decide whether the differences observed for women and men are statistically significant or not.
Percentages, ratios, rates or averages are the basis for the calculation of gender indicators. Indicators, in general, are used to “indicate” how differently one group performs by comparison to a norm or a reference group. Gender indicators should show how women perform by comparison to men, and what their status is relative to men’s, in areas such as education, formal work, access to resources, health or decision-making. In this regard, gender indicators are important tools for planners and policy makers in monitoring progress toward gender equality.
The following sections present the types of data involved in gender statistics, the measures of composition and distribution used in gender statistics, and the types of gender indicators that can be constructed using those measures.
Box 4.1 Types of statistical products disseminating gender statistics
Gender statistics are made available by national statistical offices through various types of dissemination products. Some of the dissemination products are part of the regular production of a statistical office and are aimed at making available data collected in censuses or sample surveys or compiled from administrative sources. They usually concern one type of data source or one statistical field and are intended for specialists who wish to further analyse the results of censuses or surveys or carry out research on specific topics. The data disseminated in this type of product can be detailed, organized in large tables, and often presented as absolute values or raw data that give specialists more flexibility in doing their own analysis. A gender perspective can be integrated in these products by systematic sex-disaggregation of data and systematic coverage of data needed to address gender issues.
Other dissemination products that may include gender statistics are analytical reports or articles focused on specific topics. Data and other information may be compiled from more than one source and different statistical fields may be covered. Policy concerns are usually taken into account. These publications are intended for a larger audience: not only statisticians, but also research and policy specialists in the topic or topics covered. Data disseminated in this type of product are presented in small summary tables and charts and discussed in the accompanying text. Large tables with more detailed data may be provided in annexes. A gender perspective can be integrated in these products through three elements: data-based analysis of gender issues specific to the selected topic; illustrations with gender-sensitive tables and charts; and systematic sex-disaggregation of data presented in annexes of the publication.
Statistical publications focused on gender issues are one type of analytical report. The typical example is the “Women and men” publications produced by many national statistical offices. These publications contain data from different statistical fields and from different sources; cover multiple policy areas and gender issues; and are addressed to a large audience, including persons with limited or no experience in statistics. They are an important tool for non-statisticians, gender specialists, gender advocates and policy makers. Instead of presenting data and letting readers analyse them and draw their own conclusions, these publications focus on presenting the main results of data analysis and their interpretation, including implications for policymaking. They are usually designed to be user friendly, with easily understood language, simple tables and charts, and attractive presentation.
Finally, gender statistics are disseminated through dedicated databases or through more comprehensive databases such as those focused on social indicators, development indicators or human development indicators. Data disseminated in this format usually cover several areas of concern and several points in time or time periods. Data are usually presented already processed into indicators that facilitate comparisons over time or between various population groups. Information on the calculation of indicators included in the database, the underlying definitions or concepts, and the sources of data is sometimes made available along with the database. This type of dissemination product is usually targeted at specialists interested in analysing the statistical information themselves, including for monitoring purposes.
Hedman, Birgitta, Francesca Perucci and Pehr Sundström, 1996. Engendering Statistics. A Tool for Change.
United Nations, 1997. Handbook for Producing National Statistical Reports on Women and Men. DESA, United Nations Statistics Division, New York.
United Nations Economic Commission for Europe and World Bank Institute, 2010. Developing Gender Statistics: A Practical Tool.
Type of data involved in gender statistics: qualitative and quantitative variables
Statistical variables are classified into two broad classes based on their measurement level: qualitative variables, also called categorical variables (for example, sex, marital status, ethnicity, educational attainment); and quantitative variables (for example, age, income, and time spent on paid or unpaid activities). Categorical variables are of two major types: nominal variables (such as sex and marital status) and ordinal variables (such as educational attainment). Nominal variables do not imply any continuum or sequence of their categories. Typical examples include sex or ethnicity. The categories can be arranged in any order without affecting the analysis. However, for convenience in presentation, they can be arranged alphabetically, in order of their relative size in the population, or in order of relative focus of the publication (for example, first women, followed by men). Ordinal variables imply an underlying continuum. When dealing with ordinal variables, the categories must be arranged in the order implied by the continuum to facilitate analysis of the data. A typical example is “level of educational attainment”. The categories can be ordered in ascending or descending order of level of education. For example: no education, primary education, secondary education, post-secondary non-tertiary education, and tertiary education. Some continuous variables tend to be coded into a few categories and treated as ordinal variables. For example, age in single years can be recoded in 5-year age groups and displayed from the youngest to the oldest ages. The distinction between types of variables is important because specific statistical measures can be applied to each type, as shown in the following paragraphs.
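The recoding and ordering described above can be sketched in a few lines of Python. This is only an illustration: the records are hypothetical, and the helper name five_year_age_group is an assumption introduced here, not part of any statistical package.

```python
# Ordinal variable: categories listed in the order implied by the continuum.
EDUCATION_LEVELS = [
    "no education",
    "primary education",
    "secondary education",
    "post-secondary non-tertiary education",
    "tertiary education",
]


def five_year_age_group(age):
    """Recode age in single years into a 5-year group label such as '15-19'."""
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


# Hypothetical records: (sex, age in single years, educational attainment).
records = [
    ("female", 17, "secondary education"),
    ("male", 34, "tertiary education"),
    ("female", 62, "primary education"),
    ("male", 15, "no education"),
]

age_groups = [five_year_age_group(age) for _, age, _ in records]

# Sort by position on the ordinal scale, not alphabetically.
by_education = sorted(records, key=lambda r: EDUCATION_LEVELS.index(r[2]))
print(age_groups)
print([r[2] for r in by_education])
```

Sorting on the position in EDUCATION_LEVELS, rather than on the label text, is what keeps the ordinal categories in their meaningful order.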
Measures of composition or distribution for qualitative variables
Computing proportions, percentages, ratios and rates is a basic statistical procedure for describing the categorical composition or distribution of qualitative variables, and a useful tool for standardizing the statistics compared. It is important to keep in mind that measures of composition or distribution should not be calculated for a small number of observations. In that case, actual numbers (absolute frequencies) should be preferred.
Proportions and percentages
A proportion is defined as the number of observations in a given category of a variable relative to the total number of observations for that variable. It is calculated as the number of observations in the given category divided by the total number of observations. The sum of the proportions of observations in each category of a variable should equal unity, unless the categories of the variable are not mutually exclusive. Most often, proportions are expressed as percentages. Percentages are obtained by multiplying proportions by 100. Percentages will add up to 100 unless the categories are not mutually exclusive.
In gender statistics, proportions can be calculated as relative measures of (a) distributions of each sex by selected characteristics; and (b) sex distributions within the categories of a characteristic. These two types of proportions are presented in Table 4.1. In the first case, the proportions are calculated as relative frequencies of the categories of a characteristic for each sex, with women’s and men’s respective totals used as the denominators. For example, in the third column of data in Table 4.1 it can be observed that the employed represent 39 per cent of all women. This is calculated as the number of women employed divided by women’s total population in the corresponding age group and multiplied by 100. In comparison, the employed represent 73 per cent of all men, as shown in the fourth column of data. This is calculated as the number of men employed divided by men’s total population in the corresponding age group and multiplied by 100.
In gender-related analysis, proportions calculated as percentage distributions can be used to compare women and men with regard to various social or economic characteristics. A simple measure of the gender gap is the differential prevalence, where the percentages in the distribution of a characteristic within the female population are subtracted from the corresponding percentages in the distribution of the characteristic within the male population. The resulting percentage-point difference indicates the gender gap in the characteristic considered. In our case, the proportion of women employed is lower than the proportion of men employed by 34 percentage points (73 minus 39).
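As a rough illustration of the percentage distribution by sex and the percentage-point gap, the sketch below uses hypothetical counts. They are not the Table 4.1 figures; they were simply chosen so that the employed shares come out at 39 and 73 per cent, matching the example in the text.

```python
# Hypothetical counts (NOT the Table 4.1 data): population aged 15-64
# by economic activity status and sex.
counts = {
    "women": {"employed": 390, "unemployed": 20, "not_active": 590},
    "men": {"employed": 730, "unemployed": 30, "not_active": 240},
}


def percentage_distribution(by_status):
    """Percentage of one sex's total population found in each status."""
    total = sum(by_status.values())
    return {status: 100 * n / total for status, n in by_status.items()}


women_pct = percentage_distribution(counts["women"])
men_pct = percentage_distribution(counts["men"])

# Gender gap in percentage points: men's percentage minus women's percentage.
gap_employed = men_pct["employed"] - women_pct["employed"]
print(women_pct["employed"], men_pct["employed"], gap_employed)
```

Each sex's total is used as its own denominator, so the percentages for each sex sum to 100 independently of the other.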
The percentage distribution of the categories of a characteristic for each sex is the basis of most gender indicators. A few examples are the labour force participation rate, literacy rate, school attendance rate, or contraceptive use. Based on the proportions calculated in data columns 3 and 4 in Table 4.1, two indicators of the status of women and men in the labour market can be derived directly. For example, the proportion of women who are employed (39 per cent in our case) is the employment-to-population ratio, one of the indicators for the first Millennium Development Goal, on eradication of poverty and hunger. Furthermore, the proportion of women who are employed or unemployed gives the labour force participation rate (in our case, the labour force participation rate for women is 39 + 2 = 41 per cent). Based on the data presented in the table, two other indicators can be calculated: the unemployment rate (the proportion of unemployed in the total of employed and unemployed); and the employment rate (the proportion of employed in the total of employed and unemployed).
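The four labour-market indicators just described can be sketched as follows. The function name labour_indicators is hypothetical, and the inputs are the women's percentages quoted in the text (39 employed and 2 unemployed per 100 women aged 15-64), used here in place of the underlying counts.

```python
def labour_indicators(employed, unemployed, population):
    """Labour-market indicators (per cent), following the definitions in the text."""
    labour_force = employed + unemployed
    return {
        "employment_to_population_ratio": 100 * employed / population,
        "labour_force_participation_rate": 100 * labour_force / population,
        "unemployment_rate": 100 * unemployed / labour_force,
        "employment_rate": 100 * employed / labour_force,
    }


# Women's figures from the text, expressed per 100 women aged 15-64.
women = labour_indicators(employed=39, unemployed=2, population=100)
for name, value in women.items():
    print(f"{name}: {value:.1f}")
```

Note the change of denominator: the first two indicators relate to the whole population of the age group, while the unemployment and employment rates relate to the labour force only, so the last two always sum to 100.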
Table 4.1 Economic activity status for population 15-64 years old, Peru, 2007
[Table body not recoverable; surviving labels: column group "Sex distribution (per cent)", row "Not economically active population".]
Source: United Nations Statistics Division, DYB, Census data sets (accessed January 2012).
Sex distributions within the categories of a characteristic are shown in data columns 5 and 6 in Table 4.1. In this case the proportions are calculated by row, as opposed to the previous type of proportions, which are calculated by column. For example, 36 per cent of the employed are women and the remaining 64 per cent are men. The share of women among the employed is calculated as the number of women employed divided by the total number of women and men employed and multiplied by 100.
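A minimal sketch of the row-wise calculation follows. The counts are hypothetical, chosen only so that the result reproduces the 36 and 64 per cent shares quoted above, and share_of_women is a name introduced here for illustration.

```python
def share_of_women(women, men):
    """Women as a percentage of all persons (women plus men) in a category."""
    return 100 * women / (women + men)


# Hypothetical counts of employed persons (NOT the Table 4.1 data).
employed_women, employed_men = 360, 640

women_share = share_of_women(employed_women, employed_men)
men_share = 100 - women_share  # the two shares always sum to 100
print(women_share, men_share)
```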
Among the gender indicators constructed based on sex distribution within a category of population are the proportion of seats in parliament held by women, the share of girls among children out of school, the share of women among agricultural workers, and the share of women among the older population living alone.
This type of indicator is often used for population groups known to have an overrepresentation of women or men. The selected groups are often linked to a policy concern. For example, in many countries women represent a minority of parliament members, ministers, chief executives of corporations, mayors, or researchers. Policies based on gender quotas are used by some countries to increase the participation of women in those groups.
The percentage of women and the percentage of men in a group always add up to one hundred per cent. Because of that, often only one of the indicators (usually the share of women) is presented in tables or graphs.
Particular compositional aspects of a population can be made explicit by the use of ratios. A ratio is a single number that expresses the relative size of two numbers. The ratio of one number A to another number B is defined as A divided by B. Ratios can take values greater than unity. Because of the way they are calculated, proportions can be considered a special type of ratio in which the denominator includes the numerator. However, ordinarily, the term ratio is used to refer to instances in which the numerator (A) and the denominator (B) represent separate and distinct categories. Ratios can be expressed in any convenient base; a base of 100 is often used.
A well-known example of a ratio based on qualitative variables is the sex ratio, the number of males per 100 females, used to state the degree to which members of one sex outnumber those of the other sex in a population or subgroup of a population. A variation of this indicator is the sex ratio at birth, defined as the number of male live births per 100 female live births.
Other gender indicators based on sex ratios may involve the standardization of the variables used. For example, the gender parity index calculated for participation at various levels of education is intended to reflect the surplus of girls or boys enrolled in school. The indicator can be calculated simply by dividing the number of girls enrolled by the number of boys enrolled. This gives a good estimate of the distribution by sex in enrolment. However, it gives a poor measure of gender differences in access to education, because the differences between the number of girls and the number of boys who should be in school (the school-age population) are not taken into account. An alternative calculation of the indicator that controls for the sex composition of the school-age population uses the ratio of net enrolment rates (or gross enrolment ratios) for girls to net enrolment rates (or gross enrolment ratios) for boys.
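The difference between the two calculations can be made concrete with a small sketch. All figures below are hypothetical and were chosen so that the two versions of the index point in opposite directions; the variable names are illustrative only.

```python
# Hypothetical enrolment and school-age population figures.
girls_enrolled, boys_enrolled = 450, 500
girls_school_age, boys_school_age = 500, 600

# Simple ratio of counts: describes the sex distribution of enrolment only.
simple_ratio = girls_enrolled / boys_enrolled

# Ratio of enrolment rates: controls for the size of each school-age population.
girls_rate = girls_enrolled / girls_school_age
boys_rate = boys_enrolled / boys_school_age
parity_index = girls_rate / boys_rate

print(round(simple_ratio, 2), round(parity_index, 2))
```

In this made-up case fewer girls than boys are enrolled, so the simple ratio is below 1, yet a larger share of school-age girls than of school-age boys is in school, so the rate-based index is above 1.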
In general, proportions and ratios are useful for analysis of the composition of a population or of a set of events. Rates, in contrast, are used to study the dynamics of change. Most often used in gender statistics are rates of incidence. A rate of incidence is usually defined as the number of events that occur within a given time interval (usually a year) divided by the number of members of the population who were exposed to the risk of the event during the same time interval. Rates can be considered a special type of ratio, in the sense that they are obtained by dividing one number (of events) by another number (of population exposed to the event). In calculating rates, it is usually assumed that the events are evenly distributed throughout the year, while the population at risk is approximated by the midyear population. Demographic rates such as fertility rates and mortality rates are typical examples of rates calculated in gender statistics. By convention, some ordinary percentage figures showing the composition of a population group are called rates. For example, what is called the literacy rate is actually a simple percentage of the population that is literate.
When data on population exposed to risk are not easily available, a close approximation of that population is used as denominator to summarize the incidence of the events considered. The indicator obtained is not considered a rate anymore, but a ratio. For example, in the case of maternal mortality, when the originating population – that is the number of pregnant women – is not available, the indicator is calculated on the number of live births, and is more accurately called maternal mortality ratio.
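The distinction between a rate and a ratio can be sketched as below. The figures are hypothetical, the function names are introduced here for illustration, and the bases (per 1,000 population, per 100,000 live births) follow common demographic convention rather than anything stated in the text.

```python
def incidence_rate(events, midyear_population, per=1000):
    """Events during the year per `per` persons, with the population at risk
    approximated by the midyear population."""
    return per * events / midyear_population


def maternal_mortality_ratio(maternal_deaths, live_births, per=100_000):
    """Maternal deaths per `per` live births; live births stand in for the
    number of pregnant women, which is why this is a ratio and not a rate."""
    return per * maternal_deaths / live_births


# Hypothetical figures for one year.
death_rate = incidence_rate(events=8_000, midyear_population=1_000_000)
mmr = maternal_mortality_ratio(maternal_deaths=120, live_births=60_000)
print(death_rate, mmr)
```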
Data used for the numerator and data used for the denominator in calculating rates sometimes come from different sources. For example, in the case of mortality rates, data on deaths used for the numerator may come from the civil registration system, while data on population used for the denominator may come from population censuses. When data from different sources are to be combined, it is essential to ascertain whether they are comparable in terms of time period and coverage of all population groups and geographic areas (see Box 4.2).
A probability is similar to a rate, with one important difference: the denominator is composed of all those persons in a given population at the beginning of the period of observation. Typical examples are the infant mortality rate and the under-five mortality rate. The numerators are infant and child deaths respectively. The denominator used is the number of births, which represents the population at risk of dying at the beginning of the period of observation.
Measures of composition or distribution for quantitative variables
In gender statistics, the measures of central tendency and dispersion commonly used to analyse continuous variables are the median and quantiles, the arithmetic mean and the standard deviation.
Medians and quantiles
The median is the value that divides a set of ranked observations into two groups of observations of equal size. Examples of indicators based on the median are the median age of the population and the median income of the population. The concept of the median can be generalized to obtain quantiles, which divide a ranked distribution into groups of equal numbers of observations. Examples of quantiles are quartiles, quintiles, deciles and percentiles. Quartiles divide the ranked distribution into four equal groups, quintiles into five groups, deciles into ten groups, and percentiles into one hundred groups. These measures are often used in presenting the distribution of income or wealth scores.
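As an illustration, the median and quartile cut points can be computed with Python's standard statistics module. The income values are hypothetical, and statistics.quantiles uses its default interpolation method here; other software may apply slightly different conventions and return slightly different cut points.

```python
import statistics

# Hypothetical monthly incomes for a small group of respondents.
incomes = [120, 150, 180, 200, 240, 300, 420, 900]

median_income = statistics.median(incomes)

# n=4 returns the three cut points that split the ranked data into quartiles.
quartiles = statistics.quantiles(incomes, n=4)

print(median_income, quartiles)
```

The second quartile coincides with the median, which is unaffected by the single very high income of 900, unlike the arithmetic mean.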
Means and standard deviation
The arithmetic mean (or average) is defined as the sum of values recorded for a quantitative variable divided by the total number of observations. Examples of indicators based on the arithmetic mean are the average time spent on unpaid work by sex, the average size of land owned by sex of the owner, the mean age at first marriage by sex and the mean age of mothers at first birth. Some gender indicators are calculated as ratios between the averages calculated for women and for men. For example, one of the indicators commonly used to show the gender pay gap is the ratio of female to male earnings in manufacturing. It is calculated by dividing the average earnings of women employed in manufacturing by the average earnings of men employed in manufacturing.
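The ratio of female to male average earnings can be sketched as follows, using hypothetical earnings for a handful of workers; real calculations would use survey or administrative data and typically apply sample weights, which are omitted here.

```python
import statistics

# Hypothetical monthly earnings of workers in manufacturing, by sex.
women_earnings = [800, 900, 1000]
men_earnings = [1000, 1200, 1400]

mean_women = statistics.mean(women_earnings)
mean_men = statistics.mean(men_earnings)

# Ratio of female to male average earnings; below 1 means women earn less.
earnings_ratio = mean_women / mean_men
print(mean_women, mean_men, earnings_ratio)
```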
Deviations from the mean are differences between the values of each observation for a particular variable and the mean of all values observed for that variable. Values of some observations are greater than the mean, so their deviations from the mean are positive; values of other observations are smaller than the mean, so their deviations from the mean are negative. When the deviations from the mean are squared, all the negative deviations become positive. The sum of all squared deviations divided by the number of observations (or by the number of observations minus 1 in the case of data from sample-based surveys) is called the variance. Variance is a measure of variability in the distribution of a variable. It represents the degree to which individuals differ from the mean value of a variable. The greater the spread of observations, the greater the variance. Because the variance is measured in squared units of the variable, its values are difficult to interpret. Taking the square root of the variance returns the measure to the original unit of the variable. This measure is called the standard deviation. The size of the standard deviation relative to that of the mean is called the coefficient of variation.
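These measures can be illustrated with Python's standard statistics module. The data are hypothetical daily hours of unpaid work; pvariance divides by the number of observations, while variance divides by the number of observations minus 1, as for sample-based data.

```python
import statistics

# Hypothetical daily hours of unpaid work reported by eight respondents.
hours = [2, 4, 4, 4, 5, 5, 7, 9]

mean_hours = statistics.mean(hours)

# Divide the sum of squared deviations by n (all observations available).
population_variance = statistics.pvariance(hours)
# Divide by n - 1 instead (data from a sample-based survey).
sample_variance = statistics.variance(hours)

# The square root of the variance returns to the original unit (hours).
standard_deviation = statistics.pstdev(hours)

# Standard deviation relative to the mean.
coefficient_of_variation = standard_deviation / mean_hours

print(mean_hours, population_variance, standard_deviation, coefficient_of_variation)
```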
Although measures of dispersion such as the standard deviation and the coefficient of variation are not often presented in gender statistics, they have an important role in measuring the degree of association between variables and in making inferences about a population based on data collected from a sample of that population.
Box 4.2 Using data from different sources
When data from different sources are to be combined, it is essential to ascertain whether they are comparable in terms of the coverage, time period, definitions and concepts. Statistics from different government sources may differ in arrangement, detail and choice of derived figures. Moreover, what appear to be comparable figures may not be, due to errors or variations in classification or data-processing procedures. Lack of comparability can also be a problem with time-series data, if concepts or methods have changed from one period to another.
Checks for consistency and comparability between different sources should be made any time different sources are to be combined. Obtaining comparable data for the period covered by a study or for completing time series should be a paramount concern. It is most problematic when different sources are used for the same indicator (say, if missing years require supplementary data). Any variations in concepts from different sources and even different years within the same source should be thoroughly checked.
In most cases these checks can be made by reviewing the source’s documentation. It is also a good idea to consult specialists in different fields who may themselves supply or use the data. These specialists often have additional information on availability of data (which may not be well publicized). They often understand special considerations of specific types of data, and know of existing evaluations.
Source: Excerpt from United Nations 1997, Handbook for Producing National Statistical Reports on Women and Men.