Just wondering how others are handling changes to datasets while ensuring their existing user base can continue to work. Thought it might be a good topic for a future lunchtime lecture.
That’s an interesting topic. Off the top of my head I can think of a couple of techniques in the W3C Data on the Web Best Practices:
How do others tackle this challenge?
Yeah, it’s tricky and depends a bit on your platform. IMHO there is something a bit untenable about this set of assumptions made by both Socrata and CKAN:
- Each dataset provides an API, and developers should build apps directly off that API
- The structure of the API is formed directly from the names and types of the columns in the dataset
- Anyone can access this API
Essentially, this means that from the moment you upload a dataset, there could exist an app that is totally dependent on the structure of your dataset, and if you change anything about the dataset, it will break. Worse, you have no idea if such an app exists, or how to contact the developers in advance.
This isn’t a theoretical concern: we have some datasets with typos in field names, but I’m reluctant to change them for fear of breaking something out there.
Possible solutions (that are a bit outside our scope for implementing):
- All dataset APIs should be versioned by default, and each version should keep working for as long as possible. (That is, you should still be able to access a renamed field by its old name through the old API version. Obviously deleted fields are a different story…)
- All API access requires an API key (which just requires an email address). Before making a breaking change to a dataset, email anyone who has accessed its API within the previous 3 months.
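To make those two ideas a bit more concrete, here is a rough Python sketch. It is not how CKAN or Socrata work today; the alias table, the field names, the access log layout and both function names are all made up for illustration. The first function serves a renamed column under the name an old API version originally exposed, and the second picks out the API-key holders who touched a dataset recently enough to be worth emailing.

```python
from datetime import datetime, timedelta

# Hypothetical alias table: API version -> {name exposed to clients: current column name}
FIELD_ALIASES = {
    "v1": {"suburb_nmae": "suburb_name"},  # v1 shipped with a typo; keep serving it
    "v2": {},                              # v2 exposes the current names unchanged
}


def render_row(row, version):
    """Return a row keyed by the field names that `version` originally exposed."""
    # Invert the aliases: current column name -> name this version's clients expect
    old_name_for = {current: old for old, current in FIELD_ALIASES[version].items()}
    return {old_name_for.get(col, col): value for col, value in row.items()}


def contacts_to_notify(access_log, dataset_id, now, window_days=90):
    """Emails of API-key holders who accessed `dataset_id` within the window."""
    cutoff = now - timedelta(days=window_days)
    return sorted({
        entry["email"]
        for entry in access_log
        if entry["dataset_id"] == dataset_id and entry["accessed_at"] >= cutoff
    })


# Example with made-up data
row = {"suburb_name": "Paddington", "count": 3}
old_view = render_row(row, "v1")  # typo'd name preserved for v1 clients

access_log = [
    {"email": "a@example.org", "dataset_id": "ds1", "accessed_at": datetime(2024, 5, 1)},
    {"email": "b@example.org", "dataset_id": "ds1", "accessed_at": datetime(2024, 1, 1)},
    {"email": "c@example.org", "dataset_id": "ds2", "accessed_at": datetime(2024, 5, 20)},
]
to_email = contacts_to_notify(access_log, "ds1", now=datetime(2024, 6, 1))
```

The point of keeping the aliases in a per-version table is that fixing a typo becomes a new version plus one table entry, rather than a silent break for whoever is still on the old one.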
Thanks Steve, will do some more reading. I am at Brisbane City Council and using CKAN, but I am relatively new to the domain and still learning. I have a programming background, so deprecation and keeping previous versions available aren’t new topics to me, but I haven’t seen many great examples of them working well, and I’m not sure how well they apply to the data domain.
I think it gets complicated when you have a mix of API, CSV and other dataset formats. Versioning of APIs is probably ideal, and keeping the older versions available for a defined period would be useful. I’m not sure yet what the best approach for CSV files is; maybe users of that data expect it to change more. Not knowing who you are impacting is something I really struggle with.
I suspect many publishers are doing something to assist with backward compatibility. I would love to hear about the approaches others are using or have tried, both successes and failures.