How on earth can reliable data be identified?
These days, a huge array of answers can be found on the web to satisfy just about any data need. While the amount of data available has actually exploded, intentional dissemination of incorrect information has also grown immensely. So how can one know which data can be trusted?
Critical data literacy and source criticism are often suggested as tools for identifying reliable data. But what do they mean in practice, and what should be taken into account when assessing the reliability of data? And how can the responsibility and openness of the producer of a data source be assessed?
Answers to these questions can be found in the UN's Fundamental Principles of Official Statistics, which have their 30th anniversary this year. There are ten principles in total, and they guide the work of statistical authorities. The principles state that “to facilitate a correct interpretation of the data, the statistical agencies are to present information according to scientific standards on the sources, methods and procedures of the statistics.”
In practice, this means that statistics should be accompanied by detailed information on how they have been compiled. In accordance with this principle, statistical authorities have for decades produced reliable data and described the methods and data sources used. The same principle can also be used to assess the reliability of other data generated in society.
A good description makes data easier to use
A responsible data producer can be recognised by the description of methods attached to the disseminated data. A clear description allows users to assess the quality of the data.
The quality and reliability of data consist of many parts. Firstly, it must be known how the data have been collected.
Where were the source data obtained and how were they selected? Were the data produced by random sampling, or is the data source, for example, exhaustive register data? It is also good to know who maintains the register and how and when the data were obtained from it, or how the sample was designed.
Once we know where the data come from, we also need to know how they were compiled. How were the data processed and what methods were used to obtain the results? It is important to know the grounds for the selected methods and procedures.
A responsible data producer also gives information about the limitations related to the use of the data and about possible errors contained in them. For example, data produced with a sample survey always include sampling error, and a responsible data producer gives users an estimate of it, for instance in the form of a margin of error.
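As an illustration of what a margin of error expresses, the following minimal sketch calculates an approximate 95% margin of error for a proportion estimated from a simple random sample. The function name, the example figures (40% of 1,000 respondents) and the use of the normal approximation are assumptions made for this example only. Real statistics often use more complex sample designs, for which the producer's own method description should be consulted.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a confidence level of about 95%.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical survey: 40% of 1,000 respondents answered "yes".
moe = margin_of_error(0.40, 1000)
print(f"Estimate: 40.0% +/- {moe * 100:.1f} percentage points")
```

With these hypothetical figures the margin of error is about three percentage points, so the estimate should be read as a range of roughly 37% to 43% rather than as an exact value.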
Another important part of data quality is how the data meet users’ needs. Users should know how and when the data are released and where the descriptions of the data compilation are available.
Use a checklist to check whether the data are reliable
The requirements described above can be summarised as a checklist. When selecting/looking for data on the web for a survey, thesis or report, please note the following:
- Who is the producer of the data?
- Are the data accompanied by a description of the data sources used, such as data collection or registers?
- Do the data describe the phenomenon as a whole, or were they collected from a random sample of respondents?
- Is the processing of data described in more detail? For example, have missing data been replaced or have duplicate data been removed?
- Have mathematical models or other scientific methods been used in the processing and analysis of data?
- How are the data distributed to users? Are the prepared tables, reports and indicators clear and easy to use?
If you get clear answers to the questions above, you can rely on the data.
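When the description does not say whether missing values were replaced or duplicates removed, users can run a quick check on the data themselves. The sketch below is only an illustration: the records, the field layout and the function name are hypothetical, and it counts only empty values and exact duplicate rows.

```python
from collections import Counter

# Hypothetical records: (respondent_id, age); None marks a missing value.
rows = [
    ("a1", 34),
    ("a2", None),
    ("a3", 51),
    ("a1", 34),  # exact duplicate of the first row
]


def quality_summary(records):
    """Count missing values and exact duplicate rows in a list of tuples."""
    missing = sum(1 for row in records for value in row if value is None)
    counts = Counter(records)
    duplicates = sum(count - 1 for count in counts.values() if count > 1)
    return {
        "rows": len(records),
        "missing_values": missing,
        "duplicate_rows": duplicates,
    }


summary = quality_summary(rows)
print(summary)
```

A check like this does not replace the producer's own description of the processing, but it can show whether the questions in the checklist need a closer look.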
The author works in Statistics Finland's Partnership and Ecosystem Relations service area.