The following table lists standard quality checks to perform on the raw data, with the priority, description, and pass criteria of each.


Table 1. Quality assurance checklist for raw MPD

| # | Priority | Indicator | Dataset | Description | Pass criteria |
|---|---|---|---|---|---|
| 1 | critical | Missing values | Cells, domestic | Share of empty values, out of all records, in each field needed for calculations | Every field has less than 5% missing values. |
| 2 | critical | Number of records per day | Domestic | Records per day | There should be no illogical lows or peaks on the timeline. |
| 3 | critical | Number of unique subscribers per day | Domestic | Unique subscribers per day | There should be no illogical lows or peaks on the timeline. |
| 4 | critical | Geographical distribution of cells | Cells | Points to check: (a) How many cells have incorrect coordinates (e.g., outside the country)? (b) How are cells distributed across the country; are any regions without cells? (c) Are there any illogical cell locations? | On visual inspection, fewer than 5% of cells are outside the country or have clearly incorrect coordinates. |
| 5 | critical | Cell occupancy | Cells, domestic | How many of the cells have records in the domestic dataset? | Fewer than 5% of the cells have zero records. |
| 6 | critical | Cell occupancy | Cells, domestic | How many cells referenced in the domestic data are missing from the cells table? | Fewer than 5% of the referenced cells are missing. |
| 7 | critical | Subscriber presence in data | Domestic | Number of days each domestic subscriber is present, out of all days in the period | In domestic data, subscribers should be present on most days. |
| 8 | critical | Diurnal distribution of records | Domestic | Average number of records per hour of day (0–23) | There should be peaks in the morning and afternoon and no sudden spikes. |
| 9 | important | Weekly distribution of records | Domestic | Average number of records per day of the week | The chart should show a weekly pattern, with lower values on weekends. |
| 10 | critical | Average number of records per day per subscriber | Domestic | Average number of records per day per subscriber | CDR: 3–4; IPDR: 10–50; signalling: > 50 |
| 11 | low | Time between subsequent events | Domestic | Time gap between subsequent events of a subscriber | Should follow a folded normal distribution. |
| 12 | low | Identify time zone | Domestic | Identify the time zone used, based on the diurnal distribution of records | A single time zone is used and it can be identified. |

Source: Positium.
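The Python sketch below shows one way to compute several of the checks in Table 1 (1, 2, 3, 5, 6, 8, 10 and 11) on an in-memory sample. It rests on several assumptions that are not part of the checklist:

- The record layout is hypothetical. Each domestic record is a dict with `subscriber_id`, `timestamp` and `cell_id`.
- All function names are illustrative.
- Check 6 is measured on distinct cell IDs.
- Check 10 is averaged over (subscriber, day) pairs with activity.

Visual checks (4, 9, 12), and the judgement of what counts as an "illogical" peak, are left to inspection.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical layout: each domestic record is a dict with "subscriber_id",
# "timestamp" (a datetime) and "cell_id"; empty values are None or "".
FIELDS = ("subscriber_id", "timestamp", "cell_id")
THRESHOLD = 0.05  # 5% limit from checks 1, 5 and 6


def _ok(r, *keys):
    """True if every named field of record r is non-empty."""
    return all(r.get(k) not in (None, "") for k in keys)


def missing_value_shares(records, fields=FIELDS):
    """Check 1: share of empty values per field, out of all records."""
    n = len(records)
    return {f: sum(not _ok(r, f) for r in records) / n for f in fields}


def records_per_day(records):
    """Check 2: number of records per calendar day."""
    return Counter(r["timestamp"].date() for r in records if _ok(r, "timestamp"))


def unique_subscribers_per_day(records):
    """Check 3: number of distinct subscribers per calendar day."""
    ids = defaultdict(set)
    for r in records:
        if _ok(r, "subscriber_id", "timestamp"):
            ids[r["timestamp"].date()].add(r["subscriber_id"])
    return {day: len(s) for day, s in ids.items()}


def cell_occupancy(cell_ids, records):
    """Checks 5 and 6: (share of cells with no records,
    share of distinct referenced cell IDs missing from the cells table)."""
    cells = set(cell_ids)
    used = {r["cell_id"] for r in records if _ok(r, "cell_id")}
    return len(cells - used) / len(cells), len(used - cells) / len(used)


def avg_records_per_subscriber_day(records):
    """Check 10: mean records per active (subscriber, day) pair."""
    pairs = Counter((r["subscriber_id"], r["timestamp"].date())
                    for r in records if _ok(r, "subscriber_id", "timestamp"))
    return sum(pairs.values()) / len(pairs)


def diurnal_profile(records):
    """Check 8: average number of records per hour of day (0-23)."""
    stamps = [r["timestamp"] for r in records if _ok(r, "timestamp")]
    days = len({t.date() for t in stamps})
    per_hour = Counter(t.hour for t in stamps)
    return [per_hour[h] / days for h in range(24)]


def inter_event_gaps(records):
    """Check 11: seconds between consecutive events of the same subscriber."""
    times = defaultdict(list)
    for r in records:
        if _ok(r, "subscriber_id", "timestamp"):
            times[r["subscriber_id"]].append(r["timestamp"])
    gaps = []
    for ts in times.values():
        ts.sort()
        gaps.extend((b - a).total_seconds() for a, b in zip(ts, ts[1:]))
    return gaps


if __name__ == "__main__":
    recs = [
        {"subscriber_id": "a", "timestamp": datetime(2024, 1, 1, 8), "cell_id": "c1"},
        {"subscriber_id": "a", "timestamp": datetime(2024, 1, 1, 17), "cell_id": "c2"},
        {"subscriber_id": "b", "timestamp": datetime(2024, 1, 2, 9), "cell_id": "c9"},
        {"subscriber_id": "b", "timestamp": datetime(2024, 1, 2, 9), "cell_id": None},
    ]
    shares = missing_value_shares(recs)
    print({f: s < THRESHOLD for f, s in shares.items()})
    # {'subscriber_id': True, 'timestamp': True, 'cell_id': False}
    print(cell_occupancy(["c1", "c2", "c3"], recs))
    # (0.3333333333333333, 0.3333333333333333)
    print(avg_records_per_subscriber_day(recs))  # 2.0
```

On real data, the daily series from checks 2 and 3 and the hourly profile from check 8 would normally be plotted and compared with the pass criteria by eye, rather than tested against a fixed threshold.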
