A review of the quality of data on agency websites
by Richard Garfield and Philip Skinner, October 2008

As the scrutiny of information for humanitarian programming grows, the quality of that information becomes ever more important. There is growing interest both in the presentation of information for key variables and in transparency regarding its origins and accuracy.

This article reviews eight websites which we believe to be the most widely used sources of basic information on health and wellbeing by those interested in humanitarian conditions. Data from these websites frequently finds its way into analytical reports and orients humanitarian workers in the field. Typically, press and publicity reports are unclear as to the sources of their data, often simply indicating that the presenting organisation is highly regarded and thus deserving of trust. Because of this high impact, and the poor visibility of the sources, we considered it important to examine the information that they present.

We reviewed data for a possible 12 variables present on the websites of eight major humanitarian and development agencies, accessed in August 2007. We sought to establish whether data was present or absent, whether its sources were given, and how reliable the data presented across these agencies appeared to be.

Methods

We reviewed and collected into a database data on 12 social indicators for six countries of high priority for humanitarian actors in recent years (Afghanistan, Chad, the Democratic Republic of Congo (DRC), Sudan, Uganda and Zimbabwe). These variables were drawn from eight websites, comprising five humanitarian or development agencies of the UN system, the statistical agency of the UN, the World Bank and a humanitarian web press portal (see Box 1). The data was reviewed to establish if a numerical variable was provided, and if a source and dates for the variable were given. Data was then examined to determine the level of agreement and the frequency of common source origins. Numerical indicators were compared across websites for consistency by identifying repeated values, and for variance by calculating the coefficient of variation (CV). The CV is a measure of dispersion of a probability distribution, defined as the ratio of the standard deviation to the mean.
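The two comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the figures are hypothetical, the function names are our own, and we assume the population standard deviation, since the article does not say whether the population or sample form was used.

```python
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean.

    Assumption: population standard deviation (pstdev); the article
    does not specify which form was used.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


def repeated_values(values):
    """Return each value reported by more than one website, with its count."""
    return {v: n for v, n in Counter(values).items() if n > 1}


# Hypothetical figures for one indicator in one country, one per website.
reported = [45.0, 45.0, 45.0, 47.0, 43.0]

cv = coefficient_of_variation(reported)
repeats = repeated_values(reported)

print(f"CV = {cv:.3f}")
print(f"Repeated values: {repeats}")
```

In this made-up example three of the five websites give an identical figure, which points to a shared source rather than independent agreement, and the low CV reflects that repetition.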

[Box 1: The eight websites reviewed]

Results

There was a high level of heterogeneity among data presented across these websites. Table 1 shows the average number of data points provided per variable for the six countries examined. If all the websites had a variable for each of the countries, a perfect score of 8 would have been generated. The average, instead, was 3.9, meaning that slightly less than half of all possible data entries were provided among the eight sites examined.
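The scoring described above is simple arithmetic, sketched here in Python. The counts in the list are hypothetical and the function name is our own; only the perfect score of 8 and the reported average of 3.9 come from the article.

```python
N_SITES = 8  # number of websites reviewed; also the perfect score


def average_sites_reporting(counts):
    """Average number of websites giving a value per variable-country pair."""
    return sum(counts) / len(counts)


# Hypothetical counts: how many of the 8 sites gave a value for each
# variable-country pair (not the article's actual data).
counts = [8, 5, 2, 1, 4, 4]

avg = average_sites_reporting(counts)
share = avg / N_SITES

# The article's reported average, expressed as a share of the maximum.
article_share = 3.9 / N_SITES

print(f"Hypothetical average: {avg:.1f} of {N_SITES} ({share:.0%})")
print(f"Article average: 3.9 of {N_SITES} ({article_share:.1%})")
```

An average of 3.9 out of 8 corresponds to about 49% of possible entries, which is the "slightly less than half" stated in the text.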

Some variables derived from modelling were also presented. These include percent of population urban, percent of population under 15 years, and life expectancy at birth. Variables derived from modelling or household surveys, or a combination of the two, were provided less frequently (about half the time). They include mortality rates among under-1s and under-5s. Data derived exclusively from surveys was provided less frequently than other variables. This data includes water and sanitation source, literacy and HIV prevalence. Data derived from financial accounts was also infrequently cited; while five of the eight websites provided data on income per capita, only two unique entries were provided as four of the five websites used a common source and identical estimates. It is notable, however, that the variable was labelled using a variety of terms, including GDP, GNI and GNI atlas method. On one site different measures were used for different countries, making comparisons meaningless. Data on health spending per capita was provided by only one site.

[Table 1: Average number of data points provided per variable for the six countries examined]

The average coefficients of variation were very small for demographic variables. This is probably because most of this data comes from common source estimates. Data for adult literacy, however, was calculated differently on each website. Not surprisingly, the average CV for this variable was larger.

Discussion

Without more information than is available on the examined websites, it is not possible to examine the accuracy, definitions or original sources for data presented.

It is little wonder that those preparing reports on countries are often left confused. Data is missing for many countries from major relevant websites that provide this information. Often, data is not provided for the same variables, making comparisons difficult or impossible. When common variables are used, the sources of data are often not provided. With or without specification of sources, the presentation of identical numerical variables is common, providing many users with a false sense of comfort that sources are in agreement (rather than repetitive), and thus likely to be accurate.

Repetition of a common source can provide a sense of reliability, without providing insight into the validity of the source data used. This sense of reliability is rapidly shattered when variables from more than one source are provided, and a high degree of unexplained variability is observed. Data is often widely and uncritically used by academics, journalists and policymakers. This emphasises the need for the editors of these websites to provide documentation on the sources used, and to raise concerns regarding the reliability and accuracy of the data. Confidence in data should be based on a combination of validity (meaning that the data reports what it intends to report) and repeatability (meaning that multiple observers would come up with similar counts). As Warren Buffett put it: ‘It is better to be approximately right than precisely wrong’.

To interpret and monitor progress on a variable over time, sources of data should always be provided. Most websites did provide at least some information about their data sources. Alertnet did not, leaving the user without guidance. UNSTATS provided ‘official statistics’ (for example, from government sources), and stated that some are of ‘uncertain reliability’. UNDP and UNSTATS provided no operative definitions, and WHO and the World Bank provided no sources. DESA was clear both on sources and on the data definitions used.

It is worth noting that, although population data is widely provided and shows little variance, this does not necessarily mean that population figures are accurate. The frequent absence of values and the heterogeneity among the data provided were surprising in light of the increased coordination among development and humanitarian agencies in recent years. It is possible that evaluation staff are not working closely with their publicists and website editors. Whatever the reason, the poor presentation of data on humanitarian variables by a wide range of agencies appears to be a vestige of the era prior to Sphere and the Good Humanitarian Donorship initiative. These limitations can probably be overcome if several principles are followed:

  • Data sources should be presented as references in the interests of transparency, comparability and interpretation.
  • Website data summaries should be reviewed and updated at least once a year.
  • Those managing data on websites should receive training to better present and interpret their sources.

In recent years there has been an upsurge in efforts to improve the collection and analysis of data on humanitarian conditions. SMART and WFP have created manuals to help standardise approaches to field surveys for nutrition and mortality. CRED has contributed to the standardisation of reporting of these surveys to a common database. The Health and Nutrition Tracking Service (HNTS) has begun to improve the triangulation of these and other data on health and nutrition. To date, no similar effort has been made to improve and standardise the reporting of information on humanitarian conditions. Such an effort is needed if we are to evaluate and improve humanitarian response.

Richard Garfield is Manager of the Health and Nutrition Tracking Service and Henrik H. Bendixen Professor of Clinical International Nursing, School of Nursing, Columbia University, New York. Philip Skinner is an undergraduate student at the University of Bristol, UK. Any correspondence relating to this article should be addressed to: garfieldr@who.int.