How Do I Create A Semantic Web Site?

A member of the Web4lib mailing list asked:

How do I create a semantic web site?

I know I have to use either RDF or OWL but do I use either of these to create a mark up language which I then use to create the web site or, with the semantic web do we move away from mark up languages altogether?

Am I right in thinking that OWL and RDF do not contain any information on how the document is to be displayed or presented? They do not seem to allow for style sheets.

Is the creation of a semantic web site completely different from anything that has gone before and I am stuck in an old way of looking at the problem? Are mark up languages a thing of the past as far as the Web is concerned?

Any clarification would be much appreciated.

RDF is certainly among the acronyms most identified with the Semantic Web, but building a more semantic site isn’t necessarily as complex as all that, and there are things we can do today to answer the question. Among the best of them, and one that will always deliver value, is making sure our sites are marked up meaningfully. I know this sounds simple, but it’s surprising how few data-rich library sites take advantage of it.

Example: if you want all the titles of works on a page to be bold, don’t use the <b> tag. Instead, use a semantic class name, as in <span class="title">, and use CSS to make it look the way you want. Otherwise, our pages are just a jumble of bold and non-bold text, with nothing to tell a machine which bold phrase is a title (think how much easier printed citations would be to parse if they were marked up that way).
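To make the difference concrete, here’s a minimal Python sketch using only the standard library’s html.parser. The two citation fragments are hypothetical, and the TextCollector class and collect helper are names made up for this illustration; it assumes simple, reasonably well-formed markup. A CSS rule such as .title { font-weight: bold; } would make both fragments look the same in a browser, but only the semantic one lets a script pick out the title without also grabbing every other bold phrase.

```python
from html.parser import HTMLParser

# Hypothetical citation fragments: identical text, different markup.
PRESENTATIONAL = ('<p><b>Walden</b>, by Henry David Thoreau. '
                  '<b>Note:</b> circulating copy.</p>')
SEMANTIC = ('<p><span class="title">Walden</span>, by Henry David Thoreau. '
            '<b>Note:</b> circulating copy.</p>')


class TextCollector(HTMLParser):
    """Collect the text inside elements matching a tag name and/or class name."""

    # Void elements never get an end tag, so they must not change the depth count.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self, tag=None, class_name=None):
        super().__init__()
        self.tag, self.class_name = tag, class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []    # text pieces of the element currently being read
        self.results = []

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return ((self.tag is None or tag == self.tag)
                and (self.class_name is None or self.class_name in classes))

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1          # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # closed the matching element
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def collect(html, **criteria):
    parser = TextCollector(**criteria)
    parser.feed(html)
    parser.close()
    return parser.results


# Bold tags can't distinguish a title from any other bold text...
print(collect(PRESENTATIONAL, tag="b"))        # ['Walden', 'Note:']
# ...but a semantic class name can.
print(collect(SEMANTIC, class_name="title"))   # ['Walden']
```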

The costs and benefits of semantic markup are argued frequently on a number of lists, but it’s worth remembering that we no longer substitute ‘i’ for ‘1’ or ‘O’ for ‘0’ when we type. Characters that look alike to a human reader mean very different things to a machine. Binary just doesn’t work as well with i and O.

It’s also worth looking into Microformats, a way of encoding semantic details into the data we use every day, using the tools we already have. Tantek explains them in a recent presentation.
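Here’s a rough idea of what that looks like. The class names below (vcard, fn, org, adr, locality, tel) come from the hCard microformat; the library, town, and phone number are hypothetical, as is the parse_hcards helper. The same markup reads as ordinary text to a person and as structured contact data to a script. This sketch assumes well-formed XHTML so it can use Python’s xml.etree.ElementTree; real-world pages would call for a forgiving HTML parser or a dedicated microformats parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment marked up with hCard class names.
# "fn org" on the same element marks the card as an organization.
PAGE = """
<div>
  <div class="vcard">
    <a class="fn org url" href="http://library.example.org/">Example Public Library</a>
    <span class="adr"><span class="locality">Springfield</span></span>
    <span class="tel">555-0100</span>
  </div>
</div>
"""


def has_class(element, name):
    """True if the element's space-separated class list contains name."""
    return name in (element.get("class") or "").split()


def parse_hcards(markup, properties=("fn", "org", "locality", "tel")):
    """Return one dict per vcard, mapping property names to their text."""
    root = ET.fromstring(markup.strip())
    cards = []
    for card in root.iter():
        if not has_class(card, "vcard"):
            continue
        found = {}
        for el in card.iter():
            for prop in properties:
                # Keep the first occurrence of each property.
                if has_class(el, prop) and prop not in found:
                    found[prop] = "".join(el.itertext()).strip()
        cards.append(found)
    return cards


print(parse_hcards(PAGE))
```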

One huge difference between the Microformats crowd and the semantic webbers is the issue of human usability. Microformats are built for humans first and machines second. That’s partly because we just don’t have good, widely distributed tools for using data that isn’t formatted for people, but also because data that people can see is more likely to have its errors noticed and fixed, and is harder to game.

Tantek speaks of Microformats as a cornerstone of the “lower case semantic web” in this presentation from 2004, and ReadWriteWeb directly compares the two.

I’ve been working on some of these challenges myself, and have worked hard to make content presented in Scriblio semantically clear. Take a look at some of the markup in this example. All the bibliographic data is represented inside an unordered list and is parsable as XML. Here’s an excerpt of the ISBNs:

```
<li>ISBN
  <ul>
    <li>1586421158</li>
  </ul>
</li>
```
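Because a record marked up this way is just a nested list, any XML tool can read it back. The sketch below is an assumption about the general shape (a field label followed by a nested list of values), not Scriblio’s actual template; only the ISBN value comes from the excerpt, and the second field and the parse_record helper are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record markup: each outer <li> holds a field label followed
# by a nested list of values. Only the ISBN value comes from the excerpt.
RECORD = """
<ul>
  <li>ISBN
    <ul>
      <li>1586421158</li>
    </ul>
  </li>
  <li>Example field
    <ul>
      <li>placeholder value</li>
    </ul>
  </li>
</ul>
"""


def parse_record(markup):
    """Map each field label to the list of values in its nested list."""
    root = ET.fromstring(markup.strip())
    record = {}
    for field in root.findall("li"):
        # The label is the text before the nested <ul>.
        label = (field.text or "").strip()
        values = [(li.text or "").strip() for li in field.findall("ul/li")]
        record[label] = values
    return record


print(parse_record(RECORD))
```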

That’s not to say the Semantic Web folks don’t see a difference: this article at Semantic Focus argues that Microformats miss the point. Still, I side with Clay Shirky’s In Praise of Evolvable Systems. Writing about how HTTP and HTML finally delivered on the promise of hyperlinks envisioned decades earlier, he notes:

Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It’s dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that’s also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.