Newspapers and the search game

At a glance
This entry was written on January 30, 2006.
This entry has been tagged as Blogs, Newspapers, Recommended, Work, XHTML, google.

Jason Kottke's ad-hoc research on the search-engine ranking of New York Times articles doesn't really surprise me. Newspaper websites, as a whole, aren't yet built with search-engine optimization in mind, and Kottke very succinctly explains why:

"This isn't exactly surprising given that most NY Times articles disappear behind a paywall after a week and some of their content (TimesSelect) isn't even publicly accessible at all. Also, I didn't look too closely at the HTML markup of the NY Times, but it could also be that it's not as optimized for Google as well as that of some weblogs and other media outlets."

The fact of the matter is that the Times' table-based markup makes it tough for search engines to find meaningful information. Then, as soon as they do find it, newspapers such as the Times shuffle stories off behind a paywall that also keeps search bots at bay.
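To make the markup point concrete, here's a rough sketch of how a very naive crawler might hunt for a page's headline. Everything in it is my own illustration: the two sample pages are made up (they are not the Times' actual markup), and real search engines are far more sophisticated than a parser that only looks for heading tags. Still, it shows why a headline that is just bold text in a table cell is harder to pick out than one marked up as a heading.

```python
from html.parser import HTMLParser


class HeadlineFinder(HTMLParser):
    """Collects the text inside <h1> and <h2> elements (a crude stand-in for a crawler)."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._in_heading = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_heading = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside a heading element.
        if self._in_heading:
            self.headlines[-1] += data


def find_headlines(markup):
    finder = HeadlineFinder()
    finder.feed(markup)
    finder.close()
    return [headline.strip() for headline in finder.headlines]


# Hypothetical table-based layout: the headline is only bold text in a cell.
TABLE_PAGE = """
<table><tr><td><font size="5"><b>Council approves budget</b></font></td></tr>
<tr><td>The city council voted Tuesday to approve the budget.</td></tr></table>
"""

# Hypothetical semantic layout: the headline is marked up as a heading.
SEMANTIC_PAGE = """
<div id="story"><h1>Council approves budget</h1>
<p>The city council voted Tuesday to approve the budget.</p></div>
"""

print(find_headlines(TABLE_PAGE))
print(find_headlines(SEMANTIC_PAGE))
```

The table version gives the parser nothing to grab, while the semantic version hands over the headline directly.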

I've already mentioned that I think the Times hiring Khoi Vinh is a good move, and I fully expect their markup to become markedly better under his watch. I suspect that just by cleaning up the code, they're going to see an immediate improvement in search-engine ranking of individual stories.

As for the disappearing articles problem, it's going to take someone in management at one of these newspapers suddenly realizing that too few people are willing to pay for their content to offset the money lost in pageviews and advertising.

Even old articles have value, and storage is cheap, so it still makes no sense to me that any content provider (and newspapers are just the biggest and worst example of this) would cut off their content from readers. It's not like they're making money off their archives anyway (nobody's paying them for surfing through microfiche at the local library).

Get some advertising scratch, build reader goodwill and add some Google juice.

I still see no downside to opening up archives.

A glimpse of the future

For a while now, I've been eyeballing with some interest the Django Project, a Python-based web application framework that was born in the Lawrence (Kan.) Journal-World newsroom. It looks extremely easy to use from a newspaper-website perspective and it's free, which is a huge step up from the bad proprietary solutions that most newspapers currently use (most Gannett papers, for instance, use Publicus, a system from Saxotech that produces, at best, bad URLs).

It's also led me to the Journal-World's website, which is, by far, my new favorite newspaper website when it comes to Getting It Right. The J-W includes open archives back to at least 1989; clean, easy-to-understand URLs; generally valid (10 errors on the homepage, most of them unencoded ampersands), semantically rich markup; iPod-friendly versions of their stories (brilliant!); and a host of other things that make me long for a future where every newspaper website is put together with as much competence and foresight.
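Unencoded ampersands are about the easiest validation error there is to catch. As a minimal sketch (my own, and in no way a substitute for running the real validator), the check below flags any line where an `&` doesn't start a named or numeric character reference. The sample markup and the `find_bare_ampersands` helper are hypothetical, and the check is deliberately crude: it would also flag ampersands inside scripts, where they may be perfectly fine.

```python
import re

# Treat an ampersand as encoded only if it begins a named entity (&amp;),
# a decimal reference (&#38;) or a hexadecimal reference (&#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_bare_ampersands(markup):
    """Return (line_number, line_text) for each line with an unencoded ampersand."""
    hits = []
    for number, line in enumerate(markup.splitlines(), start=1):
        if BARE_AMPERSAND.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical markup: lines 1 and 4 contain bare ampersands, lines 2 and 3 do not.
SAMPLE = """<a href="/search?q=lawrence&page=2">More results</a>
<a href="/search?q=lawrence&amp;page=2">More results</a>
<p>Sports &#38; Recreation</p>
<p>Arts & Entertainment</p>"""

for number, line in find_bare_ampersands(SAMPLE):
    print(number, line)
```

The usual culprits are query strings in links, which is why a homepage full of section and search links can rack up several of these errors at once.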

You'll notice on the 1989 story that they have a pair of ads on there. Now, not many people are likely to search for 17-year-old women's college basketball gamers, but if those people exist, the J-W provides the content and reaps the advertising benefit of it. Other newspapers, which move stories into a for-pay archive after a week of display, lose that ad money (however small it might be) and have to rely on someone wanting to see that gamer so badly that they'll pay $4.99 or whatever it is newspapers are charging these days.

If you ask me, it's better to get a little money than no money at all.

The other nice benefit of the Journal-World's semantic richness is the Google ranking. Lawrence is a college town, a boy's name and a fairly common search term (just a guess). The Journal-World is the third result in Google for a search for "lawrence."

By comparison, the newspaper I currently work for is a regional newspaper in suburban New York City that concentrates on a three-county area (again, not mentioning it, but not really hiding it, either). A search for one of the counties gets us on the second page of Google results. The other two counties don't even bring the newspaper's home page up in the first ten pages of results. A search for our general geographic area doesn't include the newspaper in the first page of results (aside from a few specific articles that mention the phrase). Only a specific, made-up-by-us slang version of that area brings up our website, and that's as the first result in a search that no one does.

It goes without saying that the site has issues in its markup ... issues enough that the validator chokes on them before the first error.

UPDATE: And just when I start wondering if I need anything to back up my little theories about clean, semantic code and Google juice, Mike Davidson of sIFR and Newsvine fame goes and does it for me.

If newspaper companies want their sites and stories to rank highly in search engines, the first and most important step is to bring their code up to modern standards.

And that means that the people who design and build newspaper websites need to take a time out and relearn their craft (and I hesitate to call the process that brought us the current state of most sites anything approaching a craft).
