What data quality information is important to you?


“Provide data quality information” is one of the W3C Data on the Web Best Practices.

As a user of open data, what data quality information would help you most?

  • If the data was validated against its schema, would that suffice? :white_check_mark:
  • Would knowing the delay between data creation and publication be more helpful? :hourglass_flowing_sand:
  • What about data being published on time, as promised by the metadata? :alarm_clock:
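The last two measures are simple to compute once a publisher records the relevant dates. Here is a minimal sketch, assuming hypothetical metadata fields `reference_date`, `published` and `promised_by`. These names are illustrative only and do not come from any standard:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetRelease:
    """Hypothetical metadata for one published release of a dataset."""
    reference_date: date  # when the data was created / the date it describes
    published: date       # when the release was actually published
    promised_by: date     # publication date promised in the metadata


def publication_delay(release: DatasetRelease) -> timedelta:
    """Delay between data creation and publication."""
    return release.published - release.reference_date


def published_on_time(release: DatasetRelease) -> bool:
    """True if the release appeared on or before the promised date."""
    return release.published <= release.promised_by


# Made-up example dates, not taken from a real dataset.
release = DatasetRelease(
    reference_date=date(2017, 2, 8),
    published=date(2017, 3, 15),
    promised_by=date(2017, 3, 1),
)
print(publication_delay(release).days)  # 35
print(published_on_time(release))       # False
```

Both values are cheap for a publisher to attach to each release, and a user could read them without opening the data itself.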

What data quality measures are important to you?


The data quality issues which I observe are:

  1. interoperability of data, especially geospatial data. We recently tried to use the TMR qldtraffic.qld.gov.au web service and found that we didn’t have the tools to consume the GeoJSON into our mapping environment.
  2. comparability of data: statistical data collections (from multiple publishers) are not published to an agreed statistical geography, making it very difficult to combine demography (potential need) with actual service demand.
  3. timeliness and frequency: recently we were combining data from the education and health sectors. Both provided high-quality statistical outputs, but the latest health dataset was from 2015, while the education data was from School Day 8, 2017.
  4. completeness: one example is finding a single dataset of all school enrolments in Queensland, across state, state special, Catholic and independent schools. It doesn’t exist, yet all of these school sectors are regulated.


The W3C Data on the Web Best Practices don’t address the interoperability and comparability points raised above. :scream: Approaches to making your data interoperable and comparable would be an interesting topic to discuss.

The Intended Outcome of the “Provide data quality information” best practice is:

Humans and software agents will be able to assess the quality and therefore suitability of a dataset for their application.

So I guess the data quality information for the points above could be:

Knowing that information would help you determine, without downloading and inspecting the data, that it is not suitable for your purpose.
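As a rough illustration of how a software agent might make that call from metadata alone, the sketch below uses property names from the W3C Data Quality Vocabulary (DQV): `dqv:hasQualityMeasurement`, `dqv:isMeasurementOf` and `dqv:value`. The metric identifiers (`ex:...`), their values and the thresholds are all hypothetical. The lookup treats the JSON-LD as plain JSON, so a real consumer would use a proper JSON-LD/RDF library instead:

```python
import json

# Hypothetical quality annotation, loosely modelled on DQV.
# The dqv: property names are DQV terms; the ex: metrics and values are made up.
QUALITY_METADATA = json.loads("""
{
  "@context": {
    "dqv": "http://www.w3.org/ns/dqv#",
    "ex": "http://example.org/metrics#"
  },
  "@id": "ex:school-enrolments",
  "dqv:hasQualityMeasurement": [
    {"dqv:isMeasurementOf": "ex:publicationDelayDays", "dqv:value": 35},
    {"dqv:isMeasurementOf": "ex:schemaValid", "dqv:value": true},
    {"dqv:isMeasurementOf": "ex:sectorCoverageRatio", "dqv:value": 0.25}
  ]
}
""")


def measurements(metadata: dict) -> dict:
    """Map each metric identifier to its measured value (plain-JSON lookup)."""
    return {
        m["dqv:isMeasurementOf"]: m["dqv:value"]
        for m in metadata.get("dqv:hasQualityMeasurement", [])
    }


def is_suitable(metadata: dict, max_delay_days: int, min_coverage: float) -> bool:
    """Judge suitability from quality metadata only, without fetching the data.

    A missing metric counts as failing its check.
    """
    values = measurements(metadata)
    return (
        values.get("ex:schemaValid") is True
        and values.get("ex:publicationDelayDays", float("inf")) <= max_delay_days
        and values.get("ex:sectorCoverageRatio", 0.0) >= min_coverage
    )


# A user needing full coverage of all school sectors can rule this dataset out.
print(is_suitable(QUALITY_METADATA, max_delay_days=60, min_coverage=1.0))  # False
```

Treating a missing measurement as a failure is a deliberately cautious choice. If the publisher says nothing about coverage, the agent does not assume the data is complete.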

There are many data quality dimensions that can be described, some more practical than others.

edit: As an aside, I found the list of all schools via https://schoolsdirectory.eq.edu.au/# and I’m told an API is coming soon.