Proud member of the sinister Smart Goat keiretsu.

Fair and Balanced

August 19, 2004 02:11 PM

[From topix.net: Headline reads: Prostitution Ring. Article begins: Employment Opportunity.]

I really like reading services like Google News and Topix, which generate pages by pulling news from thousands of news sites. The variety of sources is nice, and the format makes it easy to scan the headlines. Topix even offers RSS feeds of its pages.

But probably the real reason I like these sites is that the programs behind them seem to share my sense of humor. What can you say about this headline and excerpt? Hey, the software was just doing what it was told: pull the headline, and use the first few words on the page as an excerpt. Completely innocent, and highly amusing.
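For the curious, here is roughly how little it takes to produce that kind of result. This is only a sketch of the "headline plus first few words" rule described above; I have no idea what Topix actually runs, and the function name, the word count, and the sample page text are all made up for illustration.

```python
def naive_summary(headline, page_text, word_count=2):
    """Pair a headline with the first few words found on the page.

    Hypothetical helper: it has no idea whether those first words
    belong to the article or to an ad sitting above it.
    """
    words = page_text.split()
    excerpt = " ".join(words[:word_count])
    return headline, excerpt


# Made-up page text in which an ad happens to come before the story.
page = "Employment Opportunity Earn money from home. Police said today that..."
summary = naive_summary("Prostitution Ring", page)
```

Nothing in there is wrong, exactly. The program just has no way to tell an advertisement from the lead paragraph.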

This does, however, highlight a real problem for web designers. With all the search engine bots running around on the web, pulling data and presenting it however they see fit, how can you make sure your pages are presented in the proper context?

Brad Choate restricts what Google sees on each page using PHP. This keeps the search engine from indexing content that doesn’t really apply to what the page is about: navigation, ads, etc. It’s a good idea but, much like the Force, it would be easy to misuse. Scammers could show one page to Google and a completely different page to browsers in order to get a higher page rank and hijack visitors.

As XML and XHTML become more prevalent, I would like to see an XHTML module for search indexing. This would allow you to put a standard set of clues in your markup to let search bots know what was important. Something as simple as a search-index attribute allowed on every tag would let you pick and choose what gets included in search engine results. You would still have to worry about misuse, but I’m sure Google would adapt, and other search engines would soon follow. News sites could mark all their advertisements as search-index="no", and not have to worry about Topix making them look silly.
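To make the idea a little more concrete, here is a sketch of what a bot might do if it honored such an attribute. Keep in mind that search-index is my own invention, not part of any real XHTML module, and that the class name, function name, and sample markup are hypothetical. The sketch uses Python's standard html.parser and simply skips the text inside any element marked search-index="no".

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SearchIndexFilter(HTMLParser):
    """Collect page text, skipping elements marked search-index="no".

    The search-index attribute is hypothetical; no search engine is
    known to honor it.
    """

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1  # nested element inside an excluded one
        elif dict(attrs).get("search-index") == "no":
            self.skip_depth = 1   # start skipping at this element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(markup):
    """Return the text a cooperating bot would be allowed to index."""
    parser = SearchIndexFilter()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page: the ad comes first but is marked as off-limits.
page = (
    '<div>'
    '<p search-index="no">Employment Opportunity <b>Apply now</b></p>'
    '<h1>Prostitution Ring</h1>'
    '<p>Police said<br />today.</p>'
    '</div>'
)
text = indexable_text(page)
```

With the ad marked as off-limits, the first words a bot would see come from the story itself. Of course, this only works if the bot chooses to play along, which is the same trust problem in the other direction.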

But then, what fun would that be?
