Inline article links to tag pages on guardian.co.uk

by Martin Belam, 10 August 2010

In yesterday's post, "5 ways that The Guardian puts external links onto web pages", I mentioned how Patrick Smith had sparked off a lot of debate with a blog post entitled "Link to the past: why do some news sites STILL not link out in 2010?". In it, he suggested The Guardian website as being one of the best examples of linking out - an assertion he partially retracted, saying:

"On the subject of Guardian.co.uk and linking - it does do a good job directing readers to interesting and relevant things on its blogs and in the technology/media section, but I am swayed by some commenters below criticising my assertion that the site is 'good at linking', as the majority of links do appear to be internal-facing subject page links."

In our web CMS, we have a check-box that offers the option to 'Automate linking of keywords'. We have literally thousands of topic keywords, and the CMS will automatically insert a hyperlink to a tag page if the text in the body of an article matches the keyword. The automatic link occurs once on the first mention of the keyword, and we maintain a 'blacklist' of terms that don't get linked. The tool skips over any words that are already forming part of a hyperlink.
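
To make the rules above concrete, here is a minimal sketch in Python. It is not the CMS's actual code, which isn't shown here. The function name `autolink`, the `tag_pages` mapping and the example URLs are all hypothetical. It also assumes that a mention inside an existing hyperlink does not count as the "first mention", so the next plain-text mention gets the link instead; the real tool may behave differently.

```python
import re

def autolink(body, tag_pages, stop_list=()):
    """Link the first plain-text mention of each keyword to its tag page.

    body      -- article HTML (assumed to be simple, well-formed markup)
    tag_pages -- hypothetical mapping of keyword -> tag page URL
    stop_list -- keywords that must never be auto-linked
    """
    stop = {term.lower() for term in stop_list}
    urls = {kw.lower(): url for kw, url in tag_pages.items()
            if kw.lower() not in stop}
    if not urls:
        return body

    # Try longer keywords first so "credit crunch" wins over "credit".
    alternatives = "|".join(
        re.escape(kw) for kw in sorted(urls, key=len, reverse=True))
    pattern = re.compile(
        r"(<a\b[^>]*>.*?</a>|<[^>]+>)"     # group 1: existing links, other tags
        r"|\b(" + alternatives + r")\b",   # group 2: a keyword in plain text
        re.IGNORECASE | re.DOTALL,
    )
    already_linked = set()

    def replace(match):
        if match.group(1):
            # Existing hyperlinks and markup pass through untouched.
            return match.group(1)
        word = match.group(2)
        key = word.lower()
        if key in already_linked:
            # Only the first plain-text mention is linked.
            return word
        already_linked.add(key)
        return '<a href="{}">{}</a>'.format(urls[key], word)

    return pattern.sub(replace, body)


article = ('The credit crunch hit <a href="http://example.com">Microsoft</a>. '
           'The credit crunch and Microsoft again; no business today.')
print(autolink(
    article,
    {"credit crunch": "/business/credit-crunch",
     "Microsoft": "/technology/microsoft",
     "business": "/business"},
    stop_list=["business"],
))
```

Doing everything in a single regular-expression pass means freshly inserted links are never rescanned, so a keyword can't end up linked inside another keyword's link.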

Whilst it certainly serves to increase the number of internal links pointing to those keyword resource pages, I think the benefits for the end user here are obvious in a lot of cases. If a journalist uses the phrase 'credit crunch' in an article, and it is automatically turned into a hyperlink to our credit crunch tag page, and that page opens with explainers on 'Credit crisis - how it began' and 'How the bubble burst', then that is a valuable and useful service to readers.

[Image: Credit Crunch tag page on guardian.co.uk]

It is, however, a much less convincing user experience when the keyword in question is a company or organisation name.

If you click on the hyperlinked word 'Microsoft' in the middle of a Guardian news article, as a user are you expecting to see more news stories about the company? To get a stock-quote for MSFT? Or to stop reading news entirely and instead go to www.microsoft.com? Or directly to office.microsoft.com if the piece in question was about that particular bit of their software portfolio?

The automatic linking is an admittedly blunt tool for putting topic-based hyperlinks into articles that would otherwise be without any inline links at all - and as a result some are more useful than others for the end user. What could be better, I think, is if there were some finer-grained control over what got automatically hyperlinked. Personally I'd prefer to see the tags for people, companies and organisations exempt from it, as I think those are the types of links where there is the most disconnect between expecting an external link and receiving an internal one.
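
One way that finer-grained control might look, purely as a sketch: if each tag carried a type, the tags for people, companies and organisations could be filtered out before any automatic linking takes place. The `type` field, the type names and the function below are assumptions for illustration, not a description of how our tags are actually stored.

```python
# Hypothetical tag types that should never be auto-linked.
EXEMPT_TYPES = frozenset({"person", "company", "organisation"})

def linkable_keywords(tags, exempt_types=EXEMPT_TYPES):
    """Return a keyword -> tag page URL mapping, leaving out exempt tag types.

    tags -- iterable of dicts with hypothetical keys "keyword", "type", "url"
    """
    return {tag["keyword"]: tag["url"]
            for tag in tags
            if tag["type"] not in exempt_types}


tags = [
    {"keyword": "credit crunch", "type": "topic", "url": "/business/credit-crunch"},
    {"keyword": "Microsoft", "type": "company", "url": "/technology/microsoft"},
]
print(linkable_keywords(tags))
```

With this filter in place, 'credit crunch' would still be linked to its tag page, while 'Microsoft' would be left for a journalist to link by hand.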

The question is, of course, how much effort do you put into devising an algorithm to perfect the automatic linking of keywords, versus optimising workflow so that you don't need to automate links on an ongoing basis?

Next...

As I said, Patrick's blog post rekindled a debate about when and how news organisations should include external links, a theme I hope to return to in a couple of further blog posts.

9 Comments

Sometimes, especially with tech and media stories, it's incredibly annoying -- things you'd definitely expect to be hyperlinks out to an external site end up at tag pages (I've a feeling I've seen this happen, though I wouldn't swear on it, where the source article was the only piece with that tag in the first place). While you could argue that linking to a company's homepage isn't necessarily the most useful thing to do, it is what most [handwaving alert] people expect the target of a link to be when they see "This week, ((Microsoft)) announced...".

The key here is that automatic links are no substitute for properly-written hypertext, which is -- after all -- what it's supposed to be. By all means augment the text with internal tag links (although I'd hope a document containing a mixture would style the tag links differently to external ones). I don't think there's anything bad about linking to tagged collections, either, but the conclusion I've reached after a few years of using guardian.co.uk is that this means of doing it generally errs towards violating the principle of least surprise (and makes for articles which read quite oddly in some cases, given the link styling acts as a highlight of sorts).

Personally -- and this is all my opinion and so effectively worth squat, despite it being a fairly well-considered one -- I think inline links should generally link to the most authoritative/canonical location of the thing you're linking: companies should link to the company website, names of websites should link to the website you're talking about ("The video-sharing website Vimeo today announced..." should never link the word Vimeo to an internal tag collection page for heaven's sake!).

As you suggest, generic terms are fairer game, but there needs to be clear differentiation between internal and external links and the purpose of the link should be comparatively obvious from the outset (only ever linking to internal tagged collections is cheating, as it creates a mismatch between expected and actual behaviour IMO).

er, rant over, sorry.

Mo | 10 August 2010

I guess I can see how in some circumstances inline links add value for the reader. If they don't add value for reader though, is there any sense including them? I mean, does it add any value for the site?

Daniel Rose | 10 August 2010

Automatically linking keywords might not be a very good idea. I do not think any algorithm is as intelligent as the human brain. Try to do exactly what you want to do yourself. This might take some of your time, but still, do not leave the job to any "intelligent software".

Joe | 10 August 2010

Auto linking is an interesting concept. Do you have something in your algorithm that only links the first listed keyword, or does it link every matching keyword in the article?

Dustin | 10 August 2010

I agree that inline links can add a great deal of value to the reader. I guess the issue these days is that, as a post or article reader, I never know whether I should trust the links I see or not. Are they going to assist me to find further, relevant information, or are they really part of an affiliate marketing campaign? I can see why some sites continue to stay away from them.

Sarah | 11 August 2010

"Do you have something in your algorithm that only links the first listed keyword, or does it link every matching keyword in the article?"

Hi Dustin, as far as I'm aware, the tool autolinks the first instance of every keyword, unless it occurs within an already manually inserted hyperlink, or the keyword is on a 'blacklist' of things not to autolink.

Martin Belam replied to comment from Dustin | 11 August 2010

"If they don't add value for reader though, is there any sense including them? I mean, does it add any value for the site?"

Hi Daniel, as I mentioned, it does in some cases seem useful to readers. We don't generally tag articles with every single topic mentioned in the body copy, and so this extends the lateral navigation possibilities from a story - particularly when it may have been repurposed from print copy and be completely devoid of hyperlinks. The issue for me I guess is whether we should be investing time and effort perfecting the world's greatest autolinking algorithm, or whether we should be looking at workflow improvements that would render such a tool unnecessary.

Martin Belam replied to comment from Daniel Rose | 11 August 2010

Hi Mo, no worries, rant away. If it was up to me, all news articles everywhere would include relevant hand-picked hyperlinks to both external and internal stories, sources, topic pages and websites, and we'd go back into the archive adding them in where they were lacking. But I think we have to accept that, with the time and technology constraints that exist in newsrooms, this isn't going to happen anytime soon. Auto-linking is a crude tool to at least get some of our tag pages exposed to the reader. It may well be that we don't make enough use of our 'stop' list of keywords that shouldn't be auto-linked.

As an aside, I note you suggest that external and internal links should be styled differently, but also point out that the linking of keywords as we do it currently makes for an odd reading experience because the styling emphasises odd words. I've written a little bit about this in today's blog post on the user experience of links on news sites.

Martin Belam replied to comment from Mo | 11 August 2010

Bit late to this as I was on holiday. But I often find the Guardian's autolinking really jarring - I noticed it this morning when reading this article about the Liverpool sale: http://www.guardian.co.uk/football/2010/aug/12/liverpool-bids-deadline-day.

The word business is autolinked to the business section in this paragraph:

"Yesterday the family announced the death at 95 of Sheikha Maryam al-Shamsi, the mother of Sharjah's ruler, Sheikh Sultan bin Mohammed al-Qassimi. All members of the family will now observe seven days mourning during which no business will be conducted."

That aside, the biggest problem, I think, is the inconsistency - some links are auto-added by the CMS, some are manually added to external sites, and some are manually added by the author. It's only by mousing over a given link that you can tell where it's going to go. As a result, I have all but given up clicking on in-line links (I tend to head for the tags list in the right hand column). Don't suppose you'd care to share any stats on CTRs on the in-line links?!? I just find it hard to believe that people, in the middle of reading an article about a subject, click on the auto-link to a topic page ... I can see they might want to do it (a) once they've finished or (b) as a way to escape something they've lost interest in. But hunting around for a link in the copy doesn't seem the best way to accomplish those goals.

Returning to how it works ...

On this page, say: http://www.guardian.co.uk/football/2010/aug/12/phil-jagielka-everton-arsenal-transfer

The two instances of (I imagine) autolinked words are "Everton" and "Transfer window". Which are the terms listed in the tag/keyword list in your skinny middle column. Does that mean the autolinking is only done to words that appear there (which would appear to make it even more redundant from a reader's point of view)? I couldn't work out why Arsenal wasn't linked in the copy or a tag?

All that aside, I do think the Guardian is one of the best newspapers at linking out - but it's clearly much more common in the more bloggy bits than in the hard news bits.

malcolm coles | 13 August 2010
