UNSD Document

Statistics New Zealand Graphics Guidelines

The Graphics Guidelines have been prepared as a supplement to the Protocols for Official Statistics. They will assist with the implementation of Principle 8 of the Protocols.

Principle 8: in analysing and reporting the results of a collection, objectivity and professionalism must be maintained and the data impartially presented in ways which are easy to understand.

1. Introduction

Graphs have two primary uses. Firstly, they can be used to explore and analyse data in order to uncover patterns and relationships. The second use of graphs forms the scope of these guidelines: the communication and display of results.

Graphs are widely used to communicate information. Unfortunately a focus on eye-catching graphic design and a lack of attention to principles for accurate presentation of information can result in graphs which are not clear and are misunderstood. The objective of these guidelines is to provide assistance in the production of graphs which accurately reflect the major story in the data and are presented in the clearest and most consistent possible way.

The design principles that follow are based primarily on recommendations from the following sources.

1. A. Wallgren, B. Wallgren, R. Persson, U. Jorner, and J. Haaland (1996). Graphing Statistics and Data, (Statistics Sweden). 93pp, Sage Publications Inc, Newbury Park.
2. E. Tufte, (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut.
3. N. Fisher, Informative graphics. CSIRO.
4. The Australian Bureau of Statistics graphics standards (1990).

These references (especially the first one) should be consulted where information is required that is beyond the scope of this document.

Producing graphs is an art as well as a skill. While adherence to the points in these guidelines will go a long way towards ensuring that a graph presents data in the best way possible, the process is not complete until at least one other person has reviewed the graph. This final step is vital.

Take particular care when using computer packages to produce graphs. Many default settings, such as vertically written text or large spacing between the bars of a bar graph, are not ideal for good graphics.

1.1 When and why use graphics?

Graphs can be more revealing than statistical tables. The objective of a graph should be to convey the major story being revealed by the data in an unambiguous and illuminating form. Graphs should not only emphasise important statistical messages and indicate relative sizes or trends, but also create reader interest in the statistics.

The first step in deciding what to graph is to analyse the statistical output and understand the major elements to be represented. It should then be decided whether a graph is the best way of representing these elements. A table may be better.

How does one choose between use of a graph or a table? There are some fairly simple indicators of situations in which a table will be preferable. These are where the data sets:

1. are very small (perhaps just 3 or 4 values),
2. have several cross-classifications,
3. have comments attached to some of the data points,
4. contain numerical values which are of direct interest, or
5. contain numerical values that are likely to be required for future reference.

However, in general a graph is preferable to a table, since patterns are more easily revealed.

Graphs should generally be located as closely as possible to the relevant tabular or descriptive presentation. In some cases, however (for example in small publications, or those where users may wish to compare one graph with another), it may be more appropriate to show all the graphs together.

1.2 Types of variables

Different types of variables require different sorts of graph. A variable will be one of the following:

1. Qualitative ('Words') e.g. Sex, Region
2. Quantitative ('Numbers')
   a) Discrete ('Certain Values') e.g. number of rooms, family size.
   b) Continuous ('All Values') e.g. an economic index, age, weight, temperature.

Continuous variables are often grouped into classes such as age ranges.

1.3 What should be in a graph?

The following are principles of constructing a graph. The components of a graph are defined in the Appendix.

- The graph should induce the reader to think about the data it contains, not the graphic design.

- Graphs should not give a false impression of the data by exaggerating differences.

- There should be as much white space as possible on a graph. In practice this means that a large proportion of the ink on a graph should be used to present the data itself. Grids, tick marks, labels etc. should be kept to a minimum.

- Graph 'junk' should be minimised, e.g. hatching, stipples, unnecessary labelling and third dimensions.

- Graphs that contain only a few data points should be small in size.

- The interior of a graph, the plot area, is for data. This region should not be cluttered. Labels should be kept to a minimum; tick marks and scale labels should be outside the data region and when several series of data sets are included in the data region they must be visually distinguishable.

- The amount of text in the graph should be kept to a minimum. Explanatory text should be restricted to the title of the graph (and, where absolutely necessary, footnotes).

- A graph should still be intelligible after black-and-white photocopying or printing, so lines or bars should be distinguished by more than just colour.

Section 4, ‘Graphics standards’, gives more detail on what should be in a graph.

2. The process of graphing

The process of graphing falls into two stages, the first comprising data analysis and selection of the graph, and the second comprising construction of the graph and critical review of it.

2.1 First stage: Data analysis and graph selection

1. Perform a statistical analysis of the data set to find out what patterns and relationships (if any) it contains.
2. Decide if this information in the data is to be presented in a graph rather than in a table.
3. Decide on the basic, or primary, variables involved.
4. Identify the types of variables: quantitative or qualitative (categorical).
5. Decide on the specific variables and comparisons of interest.
6. Select an appropriate graph (time series, bar graph, dot graph etc) for the type of data. The interpretation should not be prejudiced by the technique of presentation.

2.2 Second stage: Construction of the graph

1. Construct an initial graph.
2. Consider re-ordering the variables.
3. Consider using extra plots.
4. Consider adding or removing zero.
5. Consider allowing for a break in an axis.
6. Consider changing the size of the graph.
7. Re-check against principles of graph construction.
   - Is the graph easy to read?
   - Can the graph be misinterpreted?
   - Does the graph have a good size and shape?
   - Is the graph in the right place?
   - Does the graph benefit from being in colour?
8. Try the graph out on somebody.

These steps should be repeated until all points are satisfied.

3. When to use which graphs

3.1 Bar graphs

- A bar graph is best for comparison of quantities.

- Use when graphing a continuous variable by a categorical variable or when graphing classes (e.g. age ranges).

- Keep category labels as short as possible.

- It is often best to align the bars horizontally. This means that there is room for longer category labels (although these should be kept as short as possible). Vertical bar graphs with labels which do not fit neatly along the axis and require very large legends are difficult to interpret. The principal exception is when time is involved: the time axis should always be horizontal.

Example 3.1.1: Simple bar graph

Horizontal bars allow space for long category labels thus facilitating reading of the graph. Note that the bars are not touching and are evenly spaced. This indicates the categorical nature of the data.

4. Graphics standards

The following requirements should be adhered to for any graph.

4.1 Shape and size

- If the nature of the data suggests the shape of the graphic, follow that suggestion.
- Otherwise the frame should be 1.5 times as wide as it is high.
- Small graphs should be used for simple messages, larger graphs for more complex messages.
- Comparison of related graphs should be facilitated by using identical scales of measurement and placing graphs side by side.
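The default frame proportion above is simple arithmetic. The sketch below is illustrative only: the function name `frame_size` and the choice of width as the input are assumptions, not part of the guidelines.

```python
def frame_size(width, ratio=1.5):
    """Return (width, height) for a frame `ratio` times as wide as it is high.

    Hypothetical helper: the 1.5 default is the proportion suggested in
    section 4.1; the units are whatever the caller uses (mm, points, ...).
    """
    if width <= 0 or ratio <= 0:
        raise ValueError("width and ratio must be positive")
    return width, width / ratio


print(frame_size(120))  # (120, 80.0)
```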

4.2 Graph title

- A graph title must be left aligned.
- The title should be informative but as short as possible. Supplement it with a separate caption under the graph if necessary.
- The title should be in mixed upper and lower (title) case, e.g. 'Sex Ratios of Elderly, Urban and Rural Areas'.

4.3 Plot Border

- A graph should have a border around the plot area if the plot area is the same colour as the rest of the page. This helps visually to connect the elements of the graph.

4.4 Scale

- A scale should be chosen which results in a balanced presentation and assists interpolation between labelled tick points. Use 1, 2, 5 (or 10, 20, 50 etc) as scale intervals. This gives easily recognisable values (even numbers and multiples of 5) in the scale. For example, avoid a scale such as 30, 60, 90, 120, which omits 100.

- Intervals should be evenly spaced. Non-linear scales (e.g. logarithmic) should only be used where absolutely necessary and where readers will not be misled.

- Use the same scale and format for graphs that are likely to be compared.
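One possible way to arrive at a 1, 2, 5 interval is sketched below. The function names and the default of six intervals are assumptions made for illustration. The routine picks the smallest interval of the form 1, 2 or 5 times a power of ten that covers the raw data range in the requested number of steps, then rounds the ends of the scale outwards to multiples of that interval. Rounding outwards can add an interval, so the result should still be checked by eye.

```python
import math


def nice_interval(data_min, data_max, max_intervals=6):
    """Smallest 1, 2 or 5 x 10^k interval covering the raw data range
    in at most `max_intervals` steps (hypothetical helper)."""
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")
    raw = (data_max - data_min) / max_intervals
    exponent = math.floor(math.log10(raw))
    # Try 1, 2, 5 at this power of ten, then at the next power up.
    for power in (exponent, exponent + 1):
        for multiplier in (1, 2, 5):
            step = multiplier * 10 ** power
            if step >= raw:
                return step
    return 10 ** (exponent + 1)  # fallback for rounding edge cases


def tick_values(data_min, data_max, max_intervals=6):
    """Tick values from the rounded-down minimum to the rounded-up maximum."""
    step = nice_interval(data_min, data_max, max_intervals)
    first = math.floor(data_min / step) * step
    last = math.ceil(data_max / step) * step
    count = round((last - first) / step)
    return [first + i * step for i in range(count + 1)]


print(tick_values(0, 120))  # [0, 20, 40, 60, 80, 100, 120]
```

For data running from 0 to 120 this gives an interval of 20, so 100 appears on the scale, unlike the 30, 60, 90, 120 scale mentioned above.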

4.5 Axes

- Place the axes at the left and bottom of the graph.
- A second right-hand axis should be used where the graph spans a whole page.
- Where the vertical axis has positive and negative values, the zero line should be clearly indicated.
- Two different types of vertical axis for different overlaid graphs should in general not be used as this is confusing to the reader. However occasionally this is a useful tool to compare patterns of trends (see Section 4.18).

4.6 Axis labels

- There should be name labels for both axes. These should begin with an initial capital followed by lower case e.g. 'Number never married'. An exception is where the category labels in a bar graph clearly identify what is being plotted on the axis (e.g. years, region names). In this case an axis name label may only add clutter to the graph.
- The unit and scale of measurement should be placed in the axis title and not in the graph title.
- The interval between the two highest y-axis labels should contain data.

4.7 Tick marks

- Tick marks must be outside of the axes.
- The width of the axis and the number of plot points will determine the number of tick points that are labelled. The number of labelled (major) tick marks must be less than 10 for the horizontal axis and less than 8 for the vertical axis.
- Minor ticks should be kept to the minimum necessary for clarity.
- The data should span the tick marks, i.e. the data should begin at the first tick mark and end at the last tick mark.
- The first and last major tick mark along the horizontal axis of a time series graph must be labelled.
- Do not put ticks between bars on graphs. They have no value and are confusing to readers.
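For a time series, the requirements that the first and last major tick marks are labelled and that fewer than 10 ticks are labelled on the horizontal axis can be met together by labelling every k-th point, where k divides the number of intervals exactly. The sketch below is one way to do this. The function name and the choice of the smallest such k are assumptions.

```python
def labelled_positions(n_points, max_labels=9):
    """Return evenly spaced indices to label, always including the first
    and last point, using at most `max_labels` labels (hypothetical helper).

    max_labels=9 reflects 'less than 10' for the horizontal axis.
    """
    if n_points < 2 or max_labels < 2:
        raise ValueError("need at least two points and two labels")
    last = n_points - 1
    # Smallest spacing that divides the series exactly and is not too dense.
    for step in range(1, last + 1):
        if last % step == 0 and last // step + 1 <= max_labels:
            return list(range(0, last + 1, step))


years = list(range(1990, 2011))  # 21 annual observations
print([years[i] for i in labelled_positions(len(years))])
# [1990, 1994, 1998, 2002, 2006, 2010]
```

When the number of intervals has no convenient divisor (for example 12 points, giving 11 intervals), the sketch falls back to labelling only the first and last points. In that case it may be better to adjust the period shown.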

4.8 Tick mark labels

- Numeric tick mark labels should have fewer than 4 digits and must have fewer than 6 digits (i.e. preferably 3 or fewer, with a maximum of 5 digits). A comma should be used as a thousands separator in graphs, as it makes large numbers easier to read.

- The scale factor is the scaling to apply to the values labelling the tick marks e.g. a maximum scale value of 55,000 with a scale factor of 1,000 will display 55 as the maximum figure on the axis. The correct numeric axis label depends on the scale factor, the scale of the data, and the units of measurement (e.g. a label of '$M' where the scale factor is 1,000,000 and the units are dollars).

- The maximum and minimum values of the numeric scale and the interval between tick marks must be selected appropriately so that suitable values appear for the axis tick labels. The value (maximum minus minimum) must be divisible by the specified interval value with no remainder.

- The tick mark labels should always be written under the plot area, not under the zero line.
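The scale factor, the digit limit and the divisibility rule can be checked together. The following is a minimal sketch, assuming integer scale values. The function name `tick_labels` and its error messages are illustrative only.

```python
def tick_labels(minimum, maximum, interval, scale_factor=1):
    """Build tick mark labels for an integer scale (hypothetical helper).

    Enforces two rules from section 4.8: (maximum - minimum) must be an
    exact multiple of the interval, and no label may exceed 5 digits.
    """
    if interval <= 0 or (maximum - minimum) % interval != 0:
        raise ValueError("(maximum - minimum) must be divisible by the interval")
    labels = []
    for value in range(minimum, maximum + 1, interval):
        quotient, remainder = divmod(value, scale_factor)
        # Keep whole numbers as integers so that '55' is shown, not '55.0'.
        scaled = quotient if remainder == 0 else value / scale_factor
        label = f"{scaled:,}"  # comma as thousands separator
        if sum(ch.isdigit() for ch in label) > 5:
            raise ValueError(f"label {label!r} has more than 5 digits")
        labels.append(label)
    return labels


print(tick_labels(0, 60000, 10000, scale_factor=1000))
# ['0', '10', '20', '30', '40', '50', '60']
print(tick_labels(0, 12000, 4000))
# ['0', '4,000', '8,000', '12,000']
```

With a scale factor of 1,000 the axis name label would need to state the scaling (for example thousands of the unit concerned), since the tick labels alone no longer show it.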

4.9 Category labels

- Labels for categories of variables should be as short as possible consistent with interpretability.

4.10 Label alignment

- All labels should run horizontally.

4.11 Data point labels

It may occasionally be necessary to identify specific data points with labels. These should be

- inside the plot border, and
- with a line joining a label to its corresponding point.

4.12 Line styles

- Use different line styles where lines cross or touch each other.

4.13 Legend

- Each line in a graph should be individually labelled if space permits. These labels must be clearly associated with the correct line only.
- Otherwise a legend (see definition of legend in the appendix) should be shown outside the graph area, preferably next to the lines, and to the right.
- Each column or group of columns in a column graph should be individually labelled when clearly possible, in preference to a legend.

4.14 Colours

- When colour is available use it sparingly, and normally only one colour, in soft tones and in a limited range of shades.

- Ensure that lines or bars can still be distinguished when reproduced in black and white, e.g. by using different line styles.

4.15 Fonts

- Use a sans serif font, preferably Arial, for any text. On axes Arial Narrow can be used. Sources should be written in a font such as Goudy Old Style Italic, 7 points. In general the size depends on the publication. As a guideline, the Publication Section of Statistics New Zealand uses 12-point Arial for headings in analytical reports.

4.16 Bars

- Bars must be filled with a solid shade, not hatching.
- Bars in a bar graph should not normally have borders (or the border must be the same shade as the bar itself).
- In grouped bar graphs in monochrome, the bars should be shaded in tints with the first bar shaded 80%, the second 60%, the third (where it exists) 40%, and the fourth (where it exists) 20%. If the background is grey, then the 20% bar may not show up very well. In this case a border is permitted around the bar.
- Where colour is used, it should be muted and preferably only one colour should be used. A colour should not be especially prominent as this could give false emphasis on the category (e.g. black, red and green: the red category is likely to be overemphasised). Recommended colours are blue, (bluish) green and purple.
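Where graphs are produced by software, the monochrome tints for grouped bars have to be expressed as colour values. The sketch below assumes that a tint of p% can be approximated on screen by a grey whose RGB level is (100 - p)% of white. That mapping is an assumption for illustration, not part of the guidelines, and printed output may differ.

```python
# Tints for the first to fourth bar in a group, from section 4.16.
TINTS = (80, 60, 40, 20)


def grey_hex(tint_percent):
    """Approximate a black tint as an RGB grey (assumed linear mapping)."""
    if not 0 <= tint_percent <= 100:
        raise ValueError("tint must be between 0 and 100")
    level = round(255 * (100 - tint_percent) / 100)
    return f"#{level:02x}{level:02x}{level:02x}"


def bar_shades(n_series):
    """Shades for a grouped bar graph with up to four bars per group."""
    if not 1 <= n_series <= len(TINTS):
        raise ValueError("the guidelines define tints for 1 to 4 bars only")
    return [grey_hex(tint) for tint in TINTS[:n_series]]


print(bar_shades(3))  # ['#333333', '#666666', '#999999']
```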

4.17 Grids

- Subtle (grey-on-white or white-on-grey) grids should be used where possible to facilitate accurate value judgements.

4.18 Multiple scales

- Care should be taken with graphs with two different vertical scales as they can be difficult to interpret. Usually it is better to present the data on separate graphs (see Section 4.5).

- They are easiest to interpret where changes in pattern, not in levels, are of interest.

Example 4.18.1: Poor example: graph with two different vertical scales.

There is no relationship between the two scales so they only confuse the reader. In addition there is a broken axis which is not clearly indicated. This graph would be better as two separate graphs.