Privacy, distribution, licences and standards - more notes on the London Linked Data meet-up
Last week I went to the 2nd London Linked Data meet-up, and I've already blogged a few things about it for journalism.co.uk ("How media sites can make use of linked data" and "Silver Oliver on A history of linked data at the BBC"), The Guardian's Open Platform blog ("London Linked Data meet-up"), and here on currybetdotnet about human-readable URIs. Twice. [1]
Here are the few remaining bits of my notes that didn't make it into those articles.
Government data and privacy
One of the best questions from the floor on the day asked John Sheridan & Jeni Tennison, who were talking about data.gov.uk, about the implications for privacy of governments releasing all of this data. The questioner suggested that in recent weeks we've seen that bright people at Facebook and Google are not able to stand back far enough from their products to realise the privacy problems they may be posing. They suggested individual datasets may not pose a risk, but that combining them could have implications.
John Sheridan insisted that at data.gov.uk they understood the difference between 'public data' and 'personal data', and that civil servants would be trained in the Data Protection Act and in publishing data.
Personally, I think there is a bigger issue to be explored here. It seems incongruous that in one country people can be arrested for taking photographs because this might pose a terrorist threat, whilst at the same time the government is releasing datasets that would allow you to track and hijack school buses without having to do any of the surveillance yourself.
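To make the point about combining datasets concrete, here is a toy sketch - all of the data in it is invented for illustration - of how two individually innocuous open datasets, joined on a shared key, can yield exactly the kind of tracking information that neither reveals alone:

```python
# Toy illustration (all data invented): two datasets that look harmless
# on their own, joined on a shared key to reveal something sensitive.

# Open dataset 1: vehicle registrations by route
routes = [
    {"route_id": "SB-14", "vehicle": "school bus", "operator": "Anytown Coaches"},
]

# Open dataset 2: published timetables with stop locations
timetable = [
    {"route_id": "SB-14", "stop": "Elm Street", "time": "08:15"},
    {"route_id": "SB-14", "stop": "High Road", "time": "08:32"},
]

# Neither dataset alone says where a school bus will be and when,
# but linking them on route_id produces exactly that.
by_route = {r["route_id"]: r for r in routes}
for entry in timetable:
    vehicle = by_route[entry["route_id"]]["vehicle"]
    print(f'{vehicle} at {entry["stop"]} at {entry["time"]}')
```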
Software licences
A question from the floor at one point suggested it was a concern that datasets were being released under licences that made the data free, but that the tools being built to process it used more restrictive licences - the Uberblic platform was specifically mentioned. Was there a danger, they wondered, that because people were not using truly open software licences, free data could get trapped in proprietary software?
The gentle answer from the panel was that provided the data was licensed properly and was in portable formats, then it shouldn't be a problem. I'd have been more tempted to suggest that if you don't like the licences the tools are being written under, then instead of moaning about it, you are completely free to write your own...
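The panel's point about portability is easy to demonstrate. Here's a minimal sketch, assuming the open-source rdflib Python library is installed, showing that data in an open RDF serialisation can be read and re-serialised by any conforming tool, whatever licence the tool itself carries:

```python
# Minimal sketch of the panel's point: if the data itself is in a portable,
# openly specified format, it can move between tools regardless of the
# licence on any one tool. Uses the rdflib library (pip install rdflib).
from rdflib import Graph

turtle_data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/person/1> foaf:name "Ada Lovelace" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# Re-serialise the same triples as N-Triples: no tool lock-in, because
# the formats are open standards that any RDF library can read.
print(g.serialize(format="nt"))
```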
Distribution is good
One of the organisers of the day, Georgi Kobilarov, made a telling point about why the distributed nature of open linked data publishing was so vital to the health of the datasets. He pointed out that it meant each dataset was curated by a community that cared specifically about that data, whether that was music fans on MusicBrainz or a specific scientific community. It certainly seemed like a challenge to me for news and media organisations to get actively involved in curating the datasets that are important to them.
Standards that work
At the weekend I was in conversation with Jonathan Stray of the Nieman Journalism Lab about linked data, and I finally managed to articulate one of my nagging doubts about the current approach. It seems to me quite right that there is a healthy debate around setting an appropriate standard for people to follow, but I am concerned that, at present, the debate is being framed in a totally academic way, by organisations that are chiefly free from time and commercial pressures.
The world view around linked data seems to be that people will produce markup that validates, containing spam-free, 100% accurate data. I've yet to see a single area of the web where that has happened. The reason that the various flavours of HTML/XHTML work, and were adopted as standards, is not because they were rigidly correct, but because they, and the browsers that displayed them, were forgiving.
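To show what "forgiving" means in practice, here is a small sketch using Python's standard-library HTML parser, which, much like a browser, happily consumes markup that would fail strict validation:

```python
# Sketch of a forgiving parser: Python's standard-library HTMLParser
# consumes markup that would fail strict validation (unclosed tags,
# missing end tags), which is exactly how browsers behave.
from html.parser import HTMLParser

broken_html = "<p>Unclosed paragraph<b>bold without end<p>another"

class TagLogger(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("start:", tag)

    def handle_data(self, data):
        print("data:", data)

# No exception is raised; the parser recovers and reports what it can.
TagLogger().feed(broken_html)
```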
Standards get adopted because they are usable, not because they are perfect. I think we need to be aiming for usable open linked data, not a perfect open linked data schema.
Next...
I feel like I've said more than enough about linked data over the last few days. Tomorrow I'll be starting a new series of blog posts on currybetdotnet, looking at how the recent Winter Olympics were covered online.
[1] Personally I was quite excited that the event was being held in the room at ULU where I have seen loads of great bands early on in their careers - Blur, Muse, Goldfrapp and Happy Mondays amongst others - although it looked astonishingly small in daylight!
> It seems to me quite right that there is a healthy debate around setting an appropriate standard for people to follow, but I am concerned that, at present, the debate is being framed in a totally academic way, by organisations that are chiefly free from time and commercial pressures.
I would agree with this. It was even worse a few years ago when I tried to use RDF as the basis of the MusicBrainz web service. Most people who were concerned with creating end-user applications hated RDF and the available RDF tools.
The adoption of the XML web service skyrocketed because good tools and best practices were readily established. The end user doesn't care that RDF is a much better-suited tool for representing metadata than XML.
See this graph of our 2006 traffic
We introduced the XML web service in March, and by the end of the year it already had as much traffic as the RDF service.