Sunday, 20 December 2009


One of the things I want to experiment more with next year is semantic data or, as Tim Berners-Lee increasingly calls it, linked data. Once data is tagged and coded for semantic meaning, or once applications emerge that 'extract' semantics, we are on the verge of something very exciting.

Google has been experimenting with semantic search for a while. For example, search for the query "~what is the capital of France?" and you get a semantic result. Or try "~who is the president of Nigeria?"

I recently came across Open Calais, a new application from a subsidiary of Thomson Reuters. It is free, has an API, and is a semantic application that actually works. It describes itself as follows:

The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

The tags are delivered to you; you can then incorporate them into other applications - for search, news aggregation, blogs, catalogs, you name it.

It looks at unstructured data and sorts what it finds into the three classes of content it recognises: named entities, facts, and events, as shown in the diagram below.
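To make the three classes a bit more concrete, here is a minimal Python sketch of how an application might group tags once it has them. Everything in it is an assumption for illustration: the `Annotation` record, the `group_by_kind` helper, and the hand-written sample data are hypothetical and are not the actual Open Calais API or its response format.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one extracted tag.
# This is NOT the real Open Calais response format.
@dataclass(frozen=True)
class Annotation:
    kind: str   # one of "entity", "fact", "event"
    label: str  # illustrative type name, e.g. "Company"
    text: str   # the span of text the tag refers to


def group_by_kind(annotations):
    """Group annotations into the three classes described above."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.kind].append(annotation)
    return dict(grouped)


# Hand-written sample data, standing in for the output of a semantic tagger.
SAMPLE = [
    Annotation("entity", "Company", "Thomson Reuters"),
    Annotation("entity", "Person", "Tim Berners-Lee"),
    Annotation("fact", "Quotation", "Tim Berners-Lee calls it linked data"),
    Annotation("event", "Acquisition", "Company A acquires Company B"),
]

if __name__ == "__main__":
    for kind, items in group_by_kind(SAMPLE).items():
        print(kind, [item.text for item in items])
```

Once tags are grouped like this, feeding them into search, aggregation, or a blog's tag cloud is mostly a matter of picking which class you care about.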

Watch the video below (apologies for the cheesiness), and play with their simulator. More information is available at Open Calais.


Dhiren Shingadia said...

I was using an Open Calais plugin on my blog for quite some time last year to generate tags for all my blog posts and content.

It was interesting to see that the plugin was one of many in the WordPress plugin directory that utilised the Calais API.

As the importance of semantics increases, I think we'll see developer communities produce more and more interesting applications that improve user experience through semantic tagging.

The search engines are already paying attention to Microformats and RDF, so hopefully we'll see greater use of these types of markup across other applications and services.

Tony Effik said...

Completely agree with you, Dhiren. Interesting to see you beat me to discovering and even using Calais.

Semantics is one of the really hot areas in digital, yet communications agencies are at the periphery even though semantics is an integral part of communications.