The Semantic Web – The Next Big Thing?

I have just started delving into the principles and practices involved in implementing semantic web technologies. The so-called semantic web is periodically touted as the next big thing, the future of the web, or more pithily “web 3.0”. The idea is being vigorously championed by the “inventor” of the web – Tim Berners-Lee, as well as by a major working group of the W3C – the group responsible for laying down the standards for the web.

To put it in a nutshell, the idea of the semantic web is to provide information in a way that allows intelligent applications to use and combine it in order to infer or derive other useful information. Just like web 2.0, the semantic web names both a trend and a set of technologies. Where web 2.0 involved the notion of social networking and a set of technologies – AJAX – for providing a rich, integrated user experience, the semantic web names both the trend towards providing richly structured metadata on the web, and the languages, such as OWL, that will be used for implementation.

As a trend the semantic web marks a shift of focus away from offering visually rich content and towards offering information in a format that will allow it to be used not just by web browsers as visual content, but by a whole range of applications as data. In particular, it is hoped that such data will be used intelligently to infer trends and patterns of behaviour through deep analysis, often called “data mining”.

In order to make such content reusable, web semanticists create what they call “ontologies” – structured data representations of things, written in dedicated modelling languages. To be effective, these modelling languages need to be well structured enough to allow information to be processed and combined, whilst also being flexible enough to capture an appropriately granular level of significant and useful detail.

The move towards the sharing of knowledge has, of course, been going on for a long time. Basic HTML, the markup language of the earlier web, which was used to control layout as much as anything else, has been largely superseded by XHTML, which is used with more of an eye to the structure of the information being presented. Many organisations explicitly share content by publishing RSS news feeds or web-enabled data services. Despite this, the current level of reuse of data is still quite modest.

One of the doubts hanging over the semantic web is its reliance on a very simplified model of language – propositions expressed as subject–predicate–object triples, such as “Jon is tall” or “Jon speaks English”. Only in such a simplified form is it possible to combine assertions in anything like a simple fashion, using syllogisms.

An example of a syllogism is:

A) Socrates is Athenian
B) Athenians speak Greek
C) Socrates speaks Greek
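To see how such triples might be combined mechanically, here is a minimal sketch in plain Python (the predicate names "is_a" and "speaks" are invented for illustration; a real semantic-web application would express such facts as RDF triples and derive conclusions with a proper reasoner rather than hand-rolled code):

```python
# A hand-rolled sketch of syllogistic inference over
# subject-predicate-object triples, for illustration only.
facts = {
    ("Socrates", "is_a", "Athenian"),   # A) Socrates is Athenian
    ("Athenian", "speaks", "Greek"),    # B) Athenians speak Greek
}

def infer(triples):
    """If X is_a C, and C has some property P with value V,
    conclude that X has property P with value V."""
    derived = set(triples)
    changed = True
    while changed:                      # repeat until no new triples appear
        changed = False
        for (x, rel, cls) in list(derived):
            if rel != "is_a":
                continue
            for (s, p, o) in list(derived):
                if s == cls and p != "is_a" and (x, p, o) not in derived:
                    derived.add((x, p, o))   # C) Socrates speaks Greek
                    changed = True
    return derived

print(("Socrates", "speaks", "Greek") in infer(facts))  # True
```

Even this toy example shows the appeal: the conclusion C was never stated, yet it falls out of a purely mechanical rule applied to well-structured data.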

The problem is that only a very tiny portion of useful language can be expressed like this. Most of what we understand is provided by an understanding of context. Take for example the following three statements:

A) The child is safe
B) The beach is safe
C) The car is safe

In these three cases it is clear that “is safe” means something significantly different. We understand the differences because we know practically what one does on a beach, or with a car, and what it means to keep a child safe. Without such contextual cultural knowledge, untangling these distinctions would be impossible.

Provided we can all agree about the context of use and how we are to interpret a given set of propositional triples, it is entirely possible to reduce a given area of useful language to this form. But a good deal of work is required to get the language into this form – and for a business this means expensive consultancy on agreeing frameworks for sharing meaning and information. It may be possible for limited-use frameworks to be developed for specific common scenarios, so that individuals can join in, but again, there will be a certain amount of work involved that will not be worth it for most sites.

So, we come to a basic case of cost–benefit analysis: what will you gain from making your information available on the semantic web, and what will it cost you to choose or create the appropriate structures? How long will it take to sanitise your existing information, or to prepare knowledge workers to label new information correctly? For anyone who has ever been involved in a major data migration, these issues are very far from trivial – so I can’t help thinking that, as usual, cost will be the driver.

In areas where information has intrinsic value – digital rights management, or the payment of royalties for broadcast material, for example – it will make sense to use semantic web approaches, and here we are likely to see comparatively rapid uptake. In most cases, though, the cost of implementing the semantic web is likely to be prohibitive – so the next big thing, I would suggest, will only be big in a small way.


About jonallenby
I'm the co-founder and Technical Director of a new media agency - Lime Media. I would describe myself as having a healthy scepticism about technology - new ways of doing things are always new, but they are not necessarily better. Best to cut through the hype and think about how technology will physically change people's lives, for better or worse. I am also struggling to finish a part-time PhD in language, metaphor and philosophy at Goldsmiths, University of London. Apart from thinking and reading, I like playing with my children, cross-country running and White Crane Kung-Fu - though usually not all at once.
