Introducing the rNews metadata standard at Hacks/Hackers London
Last night I was at the Hacks/Hackers meet-up to hear Andreas Gebhart, Stuart Myles and Evan Sandhaus talk about the proposed new IPTC semantic metadata standard rNews. Stuart and Evan are also coming in to our Developer Drop-In today at The Guardian.
rNews is an attempt to standardise machine-readable metadata around news stories on the web, and as someone who thinks “the more metadata the merrier” it looks like a promising development. One of the important things to understand is that most news CMS software already stores all of this information at the database level, but as an industry we’ve been historically poor at exposing it at the presentation level of the web page.
rNews is built around a model of a news item, and can be expressed in a current RDFa implementation, with the promise of an HTML5 microdata implementation to come.
If you were more of a “hack” than a “hacker”, then last night had lots of scary slides full of gnarly code, but it did have a key message to take back to our businesses - the idea that the news industry “has to be the chicken, not the egg”.
In other words, for a metadata standard around news publishing to gain credibility on the web, we all need to rally round one. If we start publishing semantically rich metadata based around our shared understanding of the news domain model, then it will make sense for others to adopt it. If we don’t, we’ll always be trying to bend formats like Facebook’s Open Graph or the search engine driven schema.org to fit the news model.
There were 4 things that I particularly liked about the rNews model:
Everything was optional. That means you are not hindered in implementing rNews because your particular CMS doesn’t support one metadata field, and you can choose not to mark-up information that you don’t want to be machine readable.
Publishing principles. Explicit ethics and a code of practice are one of the things that distinguishes professional newsgathering and publishing from other “content” on the web. rNews includes the ability to link to an organisation’s principles.
It included print metadata. As Evan put it: “It is not like articles are randomly dispersed throughout a newspaper”. The relative size and positioning of stories throughout the printed product gives you implicit information about the editorial importance of those stories, which often gets thrown away at the “shovelware” stage of web publishing.
Comments and community interaction were part of the spec. It is testament to how much audience participation in news sites has become the norm that the rNews model specifically includes a class around user interaction. This includes properties like comment count, and a URL where comments can be left on a story.
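To make the “everything is optional” point concrete, here is a minimal Python sketch of how a CMS template might expose whatever fields it happens to hold as RDFa. It is only an illustration: the namespace URL is a placeholder, and the property names (headline, printPage, printSection, commentCount, publishingPrinciples, discussionUrl) are my assumptions based on the features described above, not terms copied from the rNews specification.

```python
from html import escape

# Placeholder namespace, for illustration only. The real rNews
# namespace and term names should be taken from the IPTC specification.
RNEWS_NS = "http://example.org/rnews#"

# Assumed property names, grouped by how they are rendered.
TEXT_PROPS = ["headline", "printPage", "printSection", "commentCount"]
LINK_PROPS = ["publishingPrinciples", "discussionUrl"]


def render_rdfa(item: dict) -> str:
    """Render a news item as an RDFa-annotated HTML fragment.

    Every field is optional: keys that are missing or None are skipped,
    so a CMS that lacks one field can still publish the others.
    """
    lines = [f'<div xmlns:rnews="{RNEWS_NS}" typeof="rnews:NewsItem">']
    for prop in TEXT_PROPS:
        value = item.get(prop)
        if value is not None:
            # Literal values go in a span with a property attribute.
            lines.append(
                f'  <span property="rnews:{prop}">{escape(str(value))}</span>'
            )
    for prop in LINK_PROPS:
        value = item.get(prop)
        if value is not None:
            # URL values go in a link with a rel attribute.
            lines.append(
                f'  <a rel="rnews:{prop}" href="{escape(str(value))}">{prop}</a>'
            )
    lines.append("</div>")
    return "\n".join(lines)
```

Calling render_rdfa with only a headline and a comment count produces markup for just those two properties, while print placement and the publishing principles link appear only when the CMS actually supplies them.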
There was some criticism of the emerging standard on the night. It doesn’t yet, for example, map to any external vocabularies, and there was a strong feeling that there should be some concept of “the news events” in the standard.
For me, I think we should focus on raising the temperature of a pan of water a little before we attempt to boil the ocean - very few news organisations publish machine-readable structured data at present. Once we’ve got over that hurdle, then perhaps we can start arguing the finer points of “one perfect news ontology to rule them all”.
In the meantime, rNews seems to have a sensible and flexible approach to preserving and publishing all that structured data that we so often throw away.
Another thing that I thought was interesting is that these moves to improve news in the browser definitely shut up all those Valley wags who claim (at SXSW at least) that the browser is dead and that content is moving towards apps. There is definitely room for improvement there, and apps can't fulfill all the requirements that the IPTC guys were listing. That's my 2 cents at least...
rNews's strength should be its lightweight requirements and a focus on doing one job really well. Currently the model mixes metadata about the article and metadata about the domain discussed in the article. A 'Person' is both the topic and a creator of articles and comments. They are actually different creatures and our interest in them and their associated metadata is different. We are unlikely to do any interesting inference over people commenting on an article, but we would most likely do so on people written about in the news (e.g. give me all the board members of private health companies with political party affiliations). Adding events would be going further in that wrong direction. Keep it lightweight. Let rNews do the work of representing articles, and avoid opening the door onto the hell that is domain modelling.
Thanks for coming out last night.
Thanks for your insight. This is more than useful for everybody using NewsML-G2.
You mentioned that it doesn't map to any external vocabularies. Will this feature be added?
Many thanks for this write-up, Martin. Great to hear a variety of perspectives on rNews.
I was disappointed that it didn't reuse external vocabularies and think that this might limit its usefulness. Why not use geonames, foaf, dc etc? It would make my news items much more linked and enable others to build cooler applications.
I'm no expert (far far far from it) but one app I might want to build is a travel app that shows me news related to the place (international or national) that I am going to travel to. If I were building such an app I would use geonames or Ordnance Survey for data, but because rNews doesn't relate to these my build becomes much more difficult.
I really hope rNews does go for the LOD route because otherwise it seems like it could just become a dead end rather than enriching and benefiting from being linked to other data sets.
Yes, the next version of rNews should include a set of mappings of some classes / properties to other well-used schemas and ontologies in the web of data. Candidates are FOAF, Dublin Core, SIOC, LODE and of course schema.org.