Rdfa-use-cases

From RDFaWiki

Introduction

This page lists all of the use cases that are supported by RDFa. Some of the use cases were created at the beginning of the RDF in XHTML standards initiative and others were created after XHTML+RDFa reached W3C REC.

Text that isn't actually a use case is marked like this (in italics).

Use Case Template

If you are going to add a use case, please use this template:

== Problem Description ==

== Why do we need to support this? ==

== RDFa Solution == 
{{not-implemented}} or {{implemented}}

=== Example Markup ===

=== Pseudo-code Example Using Markup ===

Resource List Management Tool for Undergraduate Students

The system uses RDFa to speed up user interaction when editing structured data. Instead of communicating with a remote server on every change, the WYSIWYG editor manipulates the RDFa in the page directly and sends the generated RDF graph to the server only when the interaction finishes.

Yahoo! SearchMonkey

Using SearchMonkey, developers and site owners can use structured data to make Yahoo! Search results more useful and visually appealing, and drive more relevant traffic to their sites. See also the SW Use Case published at W3C.

Creative Commons Rights Expression Language

The Creative Commons Rights Expression Language (ccREL) is a specification describing how license information may be described using RDF and how license information may be attached to works.

Someone is writing a Web page that embeds a video that embeds photos that are CC licensed, and they want their software to automatically answer and address these questions:

  • Under what license has a copyright holder released her work, and what are the associated permissions and restrictions?
  • Can I redistribute this work for commercial purposes?
  • Can I distribute a modified version of this work?
  • How should I assign credit to the original author?

Bitmunk - An Open, Digital Media Commerce Standard

Digital Bazaar is using Semantic Web Technology to establish a set of open mark-up and communication standards for Web-based, peer-to-peer marketplaces. The system that Digital Bazaar has created, called Bitmunk, is used to transact digital media such as music, movies, television and books between independent agents on the Web. The decentralized nature of the peer-to-peer marketplace requires flexible, open standards for communication and knowledge representation across multiple websites. They have chosen RDFa as a mechanism for marking up music, movie, TV, and book information on blogs and websites.

Problem Description

Bitmunk is a peer-to-peer marketplace that allows anybody to list and sell digital content such as music, movies, and books directly from their blogs. Buyers purchase data directly from peers on the network, rather than directly from a website such as Apple's iTunes Music Store or Amazon's video store. There is a central location for searching all content for sale on the network, but listing items for sale directly on blogs is another mechanism that Bitmunk supports.

In order to make the music acquisition process seamless, Bitmunk needed a mechanism to mark up music, video and other digital content in a blog or website. The Bitmunk Firefox plug-in would then detect the purchase information required from the embedded meta-data in the same web page that the browser is viewing. For example, while browsing the Scissorkick website, it would be nice to be able to purchase the music directly from one's favorite online music store without leaving the page. Marking up the music information in a way that works across websites would hopefully help drive a universal set of tools to enable this use case.

Why do we need to support this?

While the iTunes Music Store and Amazon make it very easy to purchase music, it is difficult to price-check and purchase music without going through a special website or application. There should be a universal digital content acquisition mechanism that is built into browsers (via a simple extension) in order to acquire digital content quickly, inexpensively and, most importantly, legally. No such cross-browser, cross-website mechanism exists.

RDFa Solution

Bitmunk utilizes the Media, Audio, Video and Commerce RDF vocabularies expressed using RDFa on websites and blogs to gather information about items that are described on the page as well as purchase information for those items. The Bitmunk plug-in detects audio and video objects on a page that may be purchased through Bitmunk and unobtrusively displays an option to the person browsing to purchase the items. The RDFa information contains, at a minimum, the item's URL, the item's name, type and information describing a purchase URL. More information can be gleaned via "follow your nose" to the item's URL.

Example Markup

<div xmlns:dcterms="http://purl.org/dc/terms/" 
     xmlns:commerce="http://purl.org/commerce#"
     xmlns:audio="http://purl.org/media/audio#"
     xmlns:bitmunk="http://bitmunk.com/vocabs/purchase#"
     about="http://bitmunk.com/media/6995811#song" typeof="audio:Recording">
   <span property="dcterms:title">The Way I Am</span> by 
   <span property="dcterms:creator">Ingrid Michaelson</span>
   Buy from <a rel="commerce:payment bitmunk:payment" href="http://192.0.2.20/bitmunk/purchase/6995811">me</a>.
</div>

Pseudo-code Example Using Markup

1. Check current page for any music, video information. (Assume music object is detected with purchase information).

2. Since music purchase information was detected, enable Firefox 3 AwesomeBar to list a music icon.

3. Person clicks Firefox 3 AwesomeBar icon, which builds a UI containing more information for the music item. Bitmunk plugin fetches document listed in the @about for the music item and parses it for RDFa triples.

4. Triple information from current page and target music page are combined and displayed to user in one UI along with purchase button.

5. Purchase button is clicked, Bitmunk serializes RDFa triples into a native format and starts the purchase transaction with the blogger's Bitmunk sales server.
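
The detection in steps 1 and 2 can be sketched roughly as follows. This is only an illustration, not the actual Bitmunk plug-in code: it assumes the RDFa parser has already produced plain {subject, predicate, object} objects with fully expanded IRIs, and the function name findPurchasableItems is hypothetical.

```javascript
// Hypothetical sketch: find items on a page that carry purchase information.
// Assumes each triple is a {subject, predicate, object} object with full IRIs.
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const DC_TITLE = "http://purl.org/dc/terms/title";
const PAYMENT = "http://purl.org/commerce#payment";

function findPurchasableItems(triples) {
  const items = new Map();
  for (const t of triples) {
    if (!items.has(t.subject)) {
      items.set(t.subject, { url: t.subject, type: null, title: null, paymentUrl: null });
    }
    const item = items.get(t.subject);
    if (t.predicate === RDF_TYPE) item.type = t.object;
    else if (t.predicate === DC_TITLE) item.title = t.object;
    else if (t.predicate === PAYMENT) item.paymentUrl = t.object;
  }
  // Only items with a payment link can be offered for purchase.
  return Array.from(items.values()).filter((item) => item.paymentUrl !== null);
}

// Triples corresponding to the example markup, plus one made-up item
// (http://example.org/#other) that has no purchase link.
const song = "http://bitmunk.com/media/6995811#song";
const pageTriples = [
  { subject: song, predicate: RDF_TYPE, object: "http://purl.org/media/audio#Recording" },
  { subject: song, predicate: DC_TITLE, object: "The Way I Am" },
  { subject: song, predicate: PAYMENT, object: "http://192.0.2.20/bitmunk/purchase/6995811" },
  { subject: "http://example.org/#other", predicate: DC_TITLE, object: "Untitled" },
];

const purchasable = findPurchasableItems(pageTriples);
console.log(purchasable.map((item) => item.title + " -> " + item.paymentUrl));
```

If the returned list is non-empty, the plug-in would show the music icon; each entry's url is the @about value that step 3 fetches for more detail.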

Using a Data Model to Generate User Interfaces

Problem Description

As a browser interface developer, I find it really annoying that I have to keep creating new screen scrapers for websites in order to build UIs that work with page data differently than the page developer intended. The data is all there on the page, but it takes a great amount of effort to extract it into a usable form. Even worse, the screen scraper breaks whenever a major update is made to the page, requiring me to solve the scraping problem yet again.

Microformats were a step in the right direction, but I keep having to create a new parser and special rules for every new Microformat that is created. Every time I develop a new parser it takes precious time away from making the browser actually useful. Can we create a world where we don't have to worry about the data model anymore and instead focus on the UI?

Why do we need to support this?

Browser UIs for working with web page data suck. We can add an RSS feed and bookmark a page, but many other more complex tasks force us to tediously cut and paste text instead of working with the information on the page directly. It would increase productivity and reduce frustration for many people if we could use the data in a web page to generate a custom browser UI for calling a phone number using our cellphone, adding an event to our calendaring software, including a person in our address book, or buying a song from our favorite online music store.

RDFa Solution

implemented

Fuzzbot is designed to detect semantic data and display it to the person browsing using a UI that is specific to that data. For example, Fuzzbot can show you information about movies that it has found on a web page and then perform actions on a movie item that was found, such as searching for it on IMDB.

Example Markup

<div xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:media="http://purl.org/media#"
     xmlns:video="http://purl.org/media/video#"
     about="#why-so-serious" typeof="video:Movie">
   <p>
   <a rel="media:depiction" href="http://images.rottentomatoes.com/images/movie/custom/51/1184851.jpg">
      <span about="#why-so-serious" property="dcterms:title">The Dark Knight</span>
   </a> directed by 
   <span property="dcterms:creator">Christopher Nolan</span>
   </p>
</div>

Video Markup Example Page

Pseudo-code Example Using Markup

After installing the Fuzzbot plugin, the browser will feed every page visited to the librdfa library for processing. There is an option for disabling Fuzzbot in the Firefox status bar. Here is the process for extracting and displaying UIs based on RDFa triples that are detected.

1. Fuzzbot registers a function that is called on page load in each tab to see if there is any RDFa data.

2. Firefox loads a page.

3. The Fuzzbot XUL code detects a page load and serializes the DOM to text, feeding it to the C++ Fuzzbot back-end.

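// Callback invoked once for every triple found on the page
function tripleHandler(subject, predicate, object) {}
...
// Look up the Fuzzbot XPCOM service
var gPlugin = Components
     .classes["@rdfa.digitalbazaar.com/fuzzbot/xpcom;1"]
     .getService()
     .QueryInterface(
        Components.interfaces.nsIFuzzbotExtension);
// The original snippet does not show where the serializer is created;
// a standard XMLSerializer is assumed here.
var serializer = new XMLSerializer();
// Serialize the DOM of the current tab to text and hand it to the back-end
var url = gBrowser.selectedBrowser.contentDocument.URL;
var xml = serializer.serializeToString(gBrowser.selectedBrowser.contentDocument);
gPlugin.processRdfaTriples(url, xml, tripleHandler);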

4. The C++ Fuzzbot back-end calls librdfa to extract the triples.

5. When a new triple is detected, a Javascript callback in the Fuzzbot XUL code is called with the detected triple data.

// Fuzzbot Firefox C++ plugin code: process_triple is called from librdfa 
// whenever a triple is detected in the input stream
void process_triple(rdftriple* triple, void* callback_data)
{
   buffer_status* status = (buffer_status*)callback_data;
  
   // Perform a Javascript callback using XPCOM
   nsCOMPtr<fuzzbotJSTripleHandlerCallback> javascript_callback =
      status->javascript_callback;
   PRBool ret = PR_FALSE;
   javascript_callback->Call(
      triple->subject, triple->predicate, triple->object, &ret);
}

6. The Fuzzbot XUL code stores all triples in a per-Firefox-tab triple store which exists for as long as the browser is viewing the page.

function tripleHandler(subject, predicate, object)
{
   // Declare the triple locally so that it does not leak into the global scope.
   var triple = {
      subject: subject,
      predicate: predicate,
      object: object
   };
   // If the subject doesn't exist in the triple store, create it.
   if(!gTripleStore[subject])
   {
      gTripleStore[subject] = [];
   }
   gTripleStore[subject].push(triple);
}

7. If a class is detected that Fuzzbot XUL code recognizes, such as video:Movie, an icon is placed in the Firefox 3 AwesomeBar.

// Setting the objectsDetected attribute triggers the visibility flag for the icon
fuzzbotVideoIcon.setAttribute("objectsDetected", "true");

8. If the person browsing clicks on the video icon in the Firefox 3 AwesomeBar, they are presented with a list of all of the video objects on the page.

9. If a particular video object is selected, then information for that video object is used to construct a Firefox 3 UI.

function videoSelected(event)
{
  var subj = event.currentTarget.getAttribute("subject");
  var title = "Video Information";
  var params = {inn:{subject:subj, triples:gTripleStore}, out:null};
  // create and display the new window.
  var newWindow = 
     window.openDialog("chrome://fuzzbot/content/displays/video.xul", "",
        "chrome, dialog, resizable=yes", params);
  newWindow.focus();
  newWindow.title = title;
  event.stopPropagation();
}

10. Each piece of information, such as the director or title of the video, can be used to launch an automatic search or lookup on a related site that knows about that kind of information (such as Rotten Tomatoes or IMDB, in the case of video).

serviceArguments = {"searchby":"celebs"};
buildActionMenu("fuzzbot-video-details-creator-menupopup",
   "Search Rotten Tomatoes", "rotten-tomatoes", serviceArguments, 
   gTripleStore[currentSubject]["dcterms:creator"]["value"]);

Basic Structured Blogging

Paul maintains a blog and wishes to "mark up" his existing page with structure so that tools can pick up his blog post tags, authors, titles, and his blogroll. In particular, his HTML blog should be usable as its own structured feed.

Publishing an Event

Overriding Some of the Rendered Data: Paul sometimes gives talks on various topics, and announces them on his blog. He would like to mark up these announcements with proper scheduling information, so that his readers' software can automatically obtain the scheduling information and add it to their calendar. Importantly, some of the rendered data might be more informal than the machine-readable data required to produce a calendar event. Also of importance: Paul may want to annotate his event with a combination of existing vocabularies and a new vocabulary of his own design.

Content Management Metadata

Tod sells an HTML-based content management system, where all documents are processed and edited as HTML, sent from one editor to another, and eventually published and indexed. He would like to build up the editorial metadata within the HTML document itself, so that it is easier to manage and less likely to be lost.

Self-Contained HTML Fragments

Tara runs a video sharing web site. When Paul wants to blog about a video, he can paste a fragment of HTML provided by Tara directly into his blog. The video is then available inline, in his blog, along with any licensing information (Creative Commons?) about the video.

Web Clipboard

Problem Description

Lucy is looking for a new apartment and some items with which to furnish it. She browses various RDFa-enabled web pages, including apartment listings, furniture stores, kitchen appliances, etc. Every time she finds an item she likes, she can point to it, extract the locally-relevant structured data expressed using RDFa, and transfer it to her apartment-hunting page, where it can be organized, sorted, and categorized.

Why do we need to support this?

Extracting relevant information from web pages is still a very manual process. Unless a particular site allows you to add items to a shopping cart or a "favorites list", it is very difficult to store relevant details for later use. Using a web browser to remember items from multiple sites is even more daunting, and usually results in dropping web tools in favor of desktop tools such as a text editor.

There is no reason why copying concepts to a web-based clipboard should be so difficult. The idea has failed to gain traction until now because there has not been an easy-to-implement data model and mark-up mechanism allowing people to right-click and store items into a semantic clipboard.

RDFa Solution

not-implemented

One potential solution uses semantic web technologies to express structured data about apartments and products on different sites that publish RDFa. This content is "semantically clippable", meaning that it can be extracted from a page, stored and published elsewhere on the web.

To extract the data, Lucy would right-click on the image and the web browser would present a new option to her titled "Save Object to Web Clipboard...". This would copy all semantic objects associated with the image (any triples that contain it in a subject-predicate-object statement, as well as all parent objects) to a storage area in the browser. Once these objects are in the clipboard, they could be transferred to a different website or application that consumes RDFa.
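
One possible reading of "any triples that contain it, as well as all parent objects" is sketched below. This is an assumption about how a clipboard might select triples, not an existing browser API: the function name collectClipboardTriples and the sample IRIs are made up for illustration, and triples are assumed to be plain {subject, predicate, object} objects.

```javascript
// Hypothetical sketch: choose the triples to copy when a resource is clipped.
// Assumes each triple is a {subject, predicate, object} object.
function collectClipboardTriples(triples, resource) {
  // Walk "upwards": any subject that points at a selected resource is a parent.
  const selected = new Set([resource]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const t of triples) {
      if (selected.has(t.object) && !selected.has(t.subject)) {
        selected.add(t.subject);
        grew = true;
      }
    }
  }
  // Keep every statement made about the resource or one of its parents.
  // Child objects of a parent (such as a price) are referenced but not
  // expanded in this sketch.
  return triples.filter((t) => selected.has(t.subject));
}

// Made-up IRIs standing in for the apartment listing and its photo.
const apartment = "http://example.org/listing#1BR";
const photo = "http://example.org/listing/photo.jpg";
const price = "http://example.org/listing#1BR-price";
const sampleTriples = [
  { subject: apartment, predicate: "rdf:type", object: "shelter:Apartment" },
  { subject: apartment, predicate: "shelter:depiction", object: photo },
  { subject: apartment, predicate: "commerce:costs", object: price },
  { subject: price, predicate: "commerce:amount", object: "595" },
  { subject: "http://example.org/couch", predicate: "dcterms:description", object: "Couch set" },
];

const clipped = collectClipboardTriples(sampleTriples, photo);
console.log(clipped.length + " triples copied to the clipboard");
```

A fuller implementation would probably also follow links downwards from the parent (for example to the price object) so that the clipped item is self-contained.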

Example Markup

There are two sites that Lucy searches for items on - ApartmentFinder and Craigslist.

After searching for a while, this apartment looks interesting to Lucy:


<div xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:dcterms="http://purl.org/dc/terms/" 
     xmlns:commerce="http://purl.org/commerce#"
     xmlns:shelter="http://example.org/shelter#"
     about="http://losangeles.apartmentfinder.com/Beverly-Hills-Apartments/Lido-Equities-Management-Apartments#1BR" typeof="shelter:Apartment">

   <span rel="shelter:depiction">
      <img src="http://images.apartmentfinder.com/phototmp/Thumbnails/ExtraLarge/88340/E56DED46-5FDF-418F-93DA-4E18AABB9ACC.jpg" />
   </span>

   <span rel="shelter:phone" href="tel:18004706219">800-470-6219</span>
   <span property="shelter:address-street">218 N. Cannon Drive</span> 
   <span property="shelter:address-room">Suite C</span>
   <span property="shelter:address-city">BEVERLY HILLS</span> 
   <span property="shelter:address-locality">CA</span>
   <span property="shelter:address-zipcode">90210</span>

   <span property="dcterms:description">
      Prime Westside Living. We have bachelors, singles, one and two bedroom floorplans. Close to UCLA, easy access to freeways. 
   </span>

   Rent: <div rel="commerce:costs">
             <div about="http://losangeles.apartmentfinder.com/Beverly-Hills-Apartments/Lido-Equities-Management-Apartments#1BR-price" 
                  typeof="commerce:Price">
                <span property="commerce:currency" content="USD">$</span> <span property="commerce:amount" datatype="xsd:decimal">595</span>
                <span property="commerce:billingCycle" content="RP1M">monthly</span>
             </div>
          </div>
</div>

She right-clicks on the image of the apartment and selects "Save Object to Web Clipboard...".

She then goes to Craigslist and finds a couch in Hollywood, saving that semantic object as well:


<div xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:dcterms="http://purl.org/dc/terms/" 
     xmlns:commerce="http://purl.org/commerce#"
     xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
     xmlns:product="http://example.org/product#"
     about="http://losangeles.craigslist.org/sfv/fuo/1133629621.html" typeof="product:Furniture">

   <div rel="product:depiction">
      <img src="http://images.craigslist.org/3k13mb3pbZZZZZZZZZ94lbdf5b6de5f561d9d.jpg" />
   </div>

   <span property="dcterms:description">
      Couch set for sale 3 years old no rips or tears suede material dark brown color good brand
   </span>

   <div rel="commerce:contact">
      You can call Tal at <span typeof="vcard:AGENT" rel="vcard:TEL" href="tel:18189687953">818-968-7953</span> 
   </div>

   Price: <div rel="commerce:costs">
             <div about="http://losangeles.craigslist.org/sfv/fuo/1133629621.html#price" 
                  typeof="commerce:Price">
                <span property="commerce:currency" content="USD">$</span> <span property="commerce:amount" datatype="xsd:decimal">175</span>
             </div>
          </div>
</div>

Pseudo-code Example Using Markup

The storage mechanism isn't important; an on-disk database, web service, JSON-encoded text file or any other persistent storage engine would work. The internal representation format could be RDF/XML, XHTML1.1+RDFa, Turtle, N-Triples, or any other semantic storage format. The data could be accessed from the browser-based storage via a web page.

Lucy could then use a website called TheBigMove.com to organize all aspects of her move, including items that she is tracking for the move. She would go to her "To Do" list and add the semantic objects she had cut from other places. To ensure that sites don't try to steal any of her web clipboard objects, she would be required to click a browser-activated button labeled "Upload Web Objects", which would ask her which web objects she would like to share with the web page. Lucy would select the apartment listing and the couch listing, which would prompt the code to do the following:


// get all of the recent web clipboard objects (as XHTML1.1+RDFa) 
// that were authorized for transfer by Lucy
var authorizedObjects = webClipboard.getAuthorizedObjects("XHTML1.1+RDFa");

// parse all of the triples into Javascript objects
var rdfaParser = new RdfaParser();
var semanticObjects = rdfaParser.parse(authorizedObjects);

// add each object to the page
// for...in yields keys, so look up each object before adding it
for(var key in semanticObjects)
{
   pageDisplay.addObject(semanticObjects[key]);
}

If Lucy is happy with the items that were added to the page, she could then click "Save Changes..." on the TheBigMove.com website and log out.

Semantic Wiki

Tim runs an RDFa-aware Semantic Wiki, where users contribute content in Wiki markup, using a WYSIWYG tool, or using HTML+RDFa. In all cases, the semantic wiki produces HTML+RDFa, so that users like Lucy can transfer the structured content from one semantic wiki (or any other RDFa source) to another semantic wiki (or any other RDFa destination). In particular, Lucy may be pasting her apartment-and-furnishing finds into her own Semantic Wiki.

Augmented Browsing for Scientific Minds

Problem Description

Patrick writes a science blog where he discusses proteins, genes, and chemicals. He has just gotten his DNA analyzed by a genomics firm and would like to publish one finding to his blog. He has very little control over the layout. He's using a fairly constrained hosting provider that disallows layout changes but conveniently allows him to embed metadata. An example of this would be an RDFa-enabled Drupal installation that doesn't strip RDFa attributes from markup. Patrick adds RDFa to indicate the genes that he is talking about. Visitors to his website can then browse Patrick's site with an RDFa-aware browser and automatically cross-reference the proteins and genes that Patrick is talking about.

Why do we need to support this?

There are a variety of data-driven websites that are increasingly used by scientists and scientifically savvy readers to publish, collect and search distributed research data. For example 23andMe, SNPedia and KEGG could be cross-referenceable if only there were a universal mechanism that all of them could use to express genomic and organic chemical compound information. The base standards exist, but there is no higher-level semantic expression language that allows them to express the data in an open way. By expressing this data semantically, in-browser systems could ease the burden of cross-referencing information spread across numerous websites.

RDFa Solution

not-implemented

Our example demonstrates expressing an SNP on a blog and making that information cross-referenceable with 23andMe as well as SNPedia.

From Wikipedia: A Single Nucleotide Polymorphism is also known as a SNP or snp (pronounced 'snip').

The importance of SNPs comes from their ability to influence disease risk, drug efficacy and side-effects, tell you about your ancestry, and predict aspects of how you look and even act. SNPs are probably the most important category of genetic changes influencing common diseases. And in terms of common diseases, 9 of the top 10 leading causes of death have a genetic component and thus most likely one or more SNPs influencing your risk.

Example Markup

A blog post could look like the following:

<div xmlns:dna="http://example.com/vocabs/dna#"
     about="http://example.com/me" typeof="dna:Human">
   Just got my results back from 23andMe and they're showing that I have an
   SNP marker for <span property="dna:SNP">rs333</span>, which is basically 
   a resistance to HIV, which is good. The bad news is that I have an
   increased risk for aneurysms.
</div>

Pseudo-code Example Using Markup

To extract and cross-reference the data, a web browser would:

  1. Use an RDFa parser to extract all of the triples from the page.
  2. Highlight all dna:SNP triples that are visible on the page.
  3. Let the user click on a highlighted triple, and offer the option to look up the SNP on SNPedia or 23andMe.
  4. If the user chooses SNPedia, redirect them to a base URL + the SNP text. In the case above, this would be: http://www.snpedia.com/index.php/rs333
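
The URL construction in the last step could look something like the sketch below. It is illustrative only: the function name buildSnpLookups and the LOOKUP_SERVICES table are hypothetical, triples are assumed to be plain objects with expanded IRIs, and only the SNPedia base URL is taken from the text above. A 23andMe entry would need a base URL that this page does not give.

```javascript
// Hypothetical sketch: turn dna:SNP triples into lookup links.
const DNA_SNP = "http://example.com/vocabs/dna#SNP";

// Base URL per lookup service. Only the SNPedia URL comes from this page;
// other services would have to be added with their own base URLs.
const LOOKUP_SERVICES = {
  SNPedia: "http://www.snpedia.com/index.php/",
};

function buildSnpLookups(triples) {
  const lookups = [];
  for (const t of triples) {
    if (t.predicate !== DNA_SNP) continue;
    for (const [service, baseUrl] of Object.entries(LOOKUP_SERVICES)) {
      // Base URL + the SNP text, escaped so it is safe inside a URL path.
      lookups.push({ snp: t.object, service: service, url: baseUrl + encodeURIComponent(t.object) });
    }
  }
  return lookups;
}

// The single triple expressed by the blog post markup above.
const blogTriples = [
  { subject: "http://example.com/me", predicate: DNA_SNP, object: "rs333" },
];

const snpLookups = buildSnpLookups(blogTriples);
console.log(snpLookups[0].url);
```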

Advanced Data Structures

Patrick keeps a list of his scientific publications on his web site. Using the BibTeX vocabulary, he would like to provide structure within this publications page so that Ulrich, who browses the web with an RDFa-aware client, can automatically extract this information and use it to cite Patrick's papers.

Publishing an RDF Vocabulary and Validating Usage

Problem Description

Paul wants to publish a large RDF vocabulary in HTML in order to provide a clear, human readable description of the same vocabulary, that mixes the terms with descriptive text in HTML. He wants to do this using a combination of RDFS and/or OWL because he would like to ensure that vocabulary validators can use his vocabulary page to read more detailed information about the vocabulary. This extra information could include information about vocabulary term datatypes, similar or equivalent terms in other vocabularies, parent class information, vocabulary term stability, and datatype range information.

Using RDFa, the terms themselves can be mixed with descriptive text in HTML. The RDFa engine can then extract the vocabulary in RDF/XML, Turtle, N-Triples, or N3 formats, to be used directly by RDF-aware applications (e.g., reasoners).

Why do we need to support this?

In order to make general vocabulary editing/checking tools for RDFa that are capable of helping authors detect mistakes and warn them about experimental features, a mechanism to express semantics about vocabulary documents is necessary. Without the ability to annotate vocabularies with meta-data, developers will be forced to create vocabulary-specific solutions. Creating hand-coded, vocabulary-specific validators is a very time consuming process, time which could be spent solving less monotonous problems.

RDFa Solution

not-implemented

There are a number of RDF vocabularies whose primary purpose is describing other vocabularies. Some of these vocabularies include RDF Schema (RDFS), the Web Ontology Language (OWL), Vocabulary Status, and the Simple Knowledge Organization System (SKOS).

Example Markup

The markup below not only provides a human-readable description of the RDF vocabulary, but also makes certain parts of the vocabulary machine-readable. The markup expresses the type of the vocabulary term, the stability of the term, the human-readable description of the term, and the range of allowable values for the term.


<dl xmlns:xsd="http://www.w3.org/2001/XMLSchema#" 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
    xmlns:owl="http://www.w3.org/2002/07/owl#" 
    xmlns:vs="http://www.w3.org/2003/06/sw-vocab-status/ns#"
    about="http://purl.org/media#position" typeof="owl:DatatypeProperty">
  <dt><a href="http://purl.org/media#position">media:position</a></dt>

  <dd class="details">
    <table>
      <tr>
        <td class="detailtype">Status</td>

        <td property="vs:term_status">stable</td>
      </tr>

      <tr>
        <td class="detailtype">Description</td>

        <td class="description" property="rdfs:comment">The position of
        the audio recording in an album, LP, playlist, top 10 list,
        podcast history or other ordered list of audio recordings.</td>
      </tr>

      <tr>
        <td class="detailtype">Datatype:</td>

        <td class="datatype">
          <a rel="rdfs:range" 
             href="http://www.w3.org/2001/XMLSchema#integer">xsd:integer</a>
        </td>
      </tr>
    </table>

  </dd>
</dl>

Pseudo-code Example Using Markup

Using the XHTML+RDFa document above, a developer could write a general vocabulary validator to ensure that authors are using the vocabulary as its designer intended it to be used.

In order to provide general range checking information, a validator could:

  1. Parse an RDFa page that uses an annotated RDF vocabulary. Save the triples as the [active page triples].
  2. For each vocabulary used on the page, follow the vocabulary URL to its source and parse the vocabulary using an RDFa parser. Save each vocabulary's triples as [vocabulary description triples].
  3. For each [triple] in [active page triples] look up the predicate in [vocabulary description triples]. We'll refer to this set of triples as [vocabulary predicate triples].
  4. Look up the "rdfs:range" in the [vocabulary predicate triples] and then check it against the value for the [triple]. If the type information of the [triple]'s object does not match the "rdfs:range", then warn the author that they are misusing the vocabulary.

Given the XHTML+RDFa markup above, the validation tool would check that every "media:position" predicate has data of type "xsd:integer". Any author using a value that is not of type "xsd:integer" would be warned that they are using improper values for their "media:position" item.
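
The range check could be sketched as follows. This is a minimal illustration under stated assumptions rather than a real validator: the function name checkRanges is hypothetical, both inputs are assumed to be already-parsed triples with expanded IRIs, and each literal on the page is assumed to carry its datatype IRI in a datatype field. Only the declared datatype is compared; the lexical value itself is not checked.

```javascript
// Hypothetical sketch of the range check described above.
const RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range";
const XSD = "http://www.w3.org/2001/XMLSchema#";
const MEDIA_POSITION = "http://purl.org/media#position";

// pageTriples: [active page triples]; literals carry a `datatype` IRI.
// vocabTriples: [vocabulary description triples].
function checkRanges(pageTriples, vocabTriples) {
  // Map each described predicate to its declared rdfs:range.
  const ranges = new Map();
  for (const t of vocabTriples) {
    if (t.predicate === RDFS_RANGE) ranges.set(t.subject, t.object);
  }
  const warnings = [];
  for (const t of pageTriples) {
    const expected = ranges.get(t.predicate);
    if (expected !== undefined && t.datatype !== expected) {
      warnings.push(t.predicate + ": expected " + expected +
        ", found " + (t.datatype || "no datatype"));
    }
  }
  return warnings;
}

// The range statement from the vocabulary markup above.
const vocabTriples = [
  { subject: MEDIA_POSITION, predicate: RDFS_RANGE, object: XSD + "integer" },
];
// Two made-up page triples: one typed correctly, one not.
const pageTriples = [
  { subject: "#track1", predicate: MEDIA_POSITION, object: "3", datatype: XSD + "integer" },
  { subject: "#track2", predicate: MEDIA_POSITION, object: "third", datatype: XSD + "string" },
];

const rangeWarnings = checkRanges(pageTriples, vocabTriples);
console.log(rangeWarnings);
```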

In order to provide warnings about experimental features, a validator could:

  1. Parse an RDFa page that uses an annotated RDF vocabulary. Save the triples as the [active page triples].
  2. For each vocabulary used on the page, follow the vocabulary URL to its source and parse the vocabulary using an RDFa parser. Save each vocabulary's triples as [vocabulary description triples].
  3. For each [triple] in [active page triples] look up the predicate in [vocabulary description triples]. We'll refer to this set of triples as [vocabulary predicate triples].
  4. If the "vs:term_status" is not equal to "stable", warn the author that they are using a non-stable vocabulary term.
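
The stability check can be sketched in the same style. Again this is an illustration under assumptions: the function name findUnstableTerms is hypothetical, the inputs are already-parsed triples with expanded IRIs, and the media#mood term in the sample data is invented to show a non-stable term. Terms with no recorded status are left alone here, which is a design choice rather than something the steps above prescribe.

```javascript
// Hypothetical sketch of the term-stability warning described above.
const TERM_STATUS = "http://www.w3.org/2003/06/sw-vocab-status/ns#term_status";

function findUnstableTerms(pageTriples, vocabTriples) {
  // Map each described term to its vs:term_status value.
  const status = new Map();
  for (const t of vocabTriples) {
    if (t.predicate === TERM_STATUS) status.set(t.subject, t.object);
  }
  // Collect every predicate used on the page whose status is not "stable".
  const unstable = new Set();
  for (const t of pageTriples) {
    const termStatus = status.get(t.predicate);
    if (termStatus !== undefined && termStatus !== "stable") {
      unstable.add(t.predicate);
    }
  }
  return Array.from(unstable);
}

// media#position is stable per the markup above; media#mood is a made-up term.
const statusTriples = [
  { subject: "http://purl.org/media#position", predicate: TERM_STATUS, object: "stable" },
  { subject: "http://purl.org/media#mood", predicate: TERM_STATUS, object: "unstable" },
];
const usedTriples = [
  { subject: "#track1", predicate: "http://purl.org/media#position", object: "3" },
  { subject: "#track1", predicate: "http://purl.org/media#mood", object: "calm" },
  { subject: "#track2", predicate: "http://purl.org/media#mood", object: "tense" },
];

const unstableTerms = findUnstableTerms(usedTriples, statusTriples);
console.log(unstableTerms);
```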

To provide more information to an interactive RDFa editor, an authoring tool could:

  1. Allow the author to hover over a [vocabulary term].
  2. The authoring tool would fetch the annotated vocabulary via the RDF vocabulary URL and save each vocabulary's triples as [vocabulary description triples].
  3. The authoring tool would then find the [vocabulary term] triples in the [vocabulary description triples] by searching all subjects for the [vocabulary term].
  4. Upon finding the [vocabulary term] triples, the UI could then display the "rdfs:comment", "vs:term_status", "rdfs:range" and a number of other helpful hints for page authors.

Extending a tag-based language with flexible metadata

Michael has developed a domain-specific XML language. He wants to be able to add metadata to his documents, not limited in granularity or vocabulary. Therefore, he adds [a subset of] RDFa to his language and implements an extraction of RDF from his language.

General Mechanism for Assigning ISO Codes to Object Data

XML: The Microformats community has been struggling with the abbr design pattern when attempting to specify certain machine-readable object attributes that differ from the human-readable content. For example, when specifying times, dates, weights, countries and other data in French, Japanese, or Urdu, it is helpful to use the ISO format to express the data to a machine and associate it with an object property, but to specify the human-readable value in the speaker's natural language.
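
In RDFa, the @content attribute plays this role: it carries the machine-readable value while the element text stays in the author's language. A consumer's choice between the two can be sketched as below. The helper name machineValue and the plain-object stand-ins for elements are assumptions made for illustration, and the French date is an invented example.

```javascript
// Hypothetical sketch: prefer the machine-readable @content value, if present,
// over the human-readable text of an element. Elements are modelled as plain
// {text, content} objects rather than DOM nodes.
function machineValue(element) {
  if (element.content !== undefined && element.content !== null) {
    return element.content;
  }
  return element.text.trim();
}

// Billing cycle as marked up in the apartment example on this page.
const billingCycle = { text: "monthly", content: "RP1M" };
// Made-up example: a date written in French with an ISO 8601 value.
const talkDate = { text: "5 février 2009", content: "2009-02-05" };
// No @content: the visible text is also the machine-readable value.
const amount = { text: " 595 " };

console.log(machineValue(billingCycle), machineValue(talkDate), machineValue(amount));
```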

Enhancing a User-Agent's copy/paste operations with meta data

Include relevant metadata in copy/paste operations, so that the user agent (UA) being copied from can annotate the data using a common metadata interchange format and the target application can interpret it. See E-mail and discussion.

Enabling Physical/Aural Rendering Semantics in SVG

SVG: Doug Schepers raised an interesting use case at Web Directions North 2009. People who have visual disabilities often rely on touch, notably texture, to understand visual data such as maps and charts. If semantics could be attached to graphics or parts of web pages that are of interest, textural rendering machines (such as thermal printers that print maps with different textures for water, roads, sidewalks, crosswalks, etc.) could extract the semantic visual information from a page or image and generate the appropriate textured print-out. Doug marked sections of an image with semantic attributes during his demo, which would allow you to render the image using texture data to express semantics, or provide aural cues when one touches part of an image.

Replacing Screen Scraping

Problem Description

Service and product providers can't include the meaning or data behind the things they publish in a structured way using HTML. For example, how do you find out where the price of a book is located in a page from Amazon? Anybody who wants to use this data is forced to perform "screen scraping". What is needed is publisher-push rather than consumer-pull semantics.

Why do we need to support this?

There are a variety of websites that publish data for the world to use: governments, non-profits, medical agencies, reporting agencies and a variety of other data-driven organizations produce intellectual property products that they would like to share with the world. Until now, they have had to provide data files alongside HTML files describing the contents of those data files. There is a rather large search industry built around web scraping, and also a vibrant third-party software community built around the need to extract data from web pages. Some web scraping software retails for as much as $2,500 per copy, underscoring the complexity of the task.

It has been demonstrated that a single data model, RDF, could express a vast amount of information while re-using the standard information markup mechanism on the web, HTML. Supporting this approach of structured data representation on the web would lower the barriers for re-using data across the web and spur innovation related to data re-use and cross-site data mash-ups.

RDFa Solution

not-implemented

Focusing specifically on marking up book prices on web pages, one could define a variety of RDF vocabularies for book and pricing markup on the web, which would allow authors, publishers and book sellers to submit their current book prices to search engines and web browsers. This mark-up would allow indexing of not only what is for sale, but how much, availability, shipping source and a variety of other items that people shopping for books would be interested in knowing.

Example Markup

This item could be shown, for instance, in a set of search listings for Amazon's bestsellers list.


<div xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:dcterms="http://purl.org/dc/terms/" 
     xmlns:commerce="http://purl.org/commerce#"
     xmlns:biblio="http://purl.org/net/biblio#"
     about="http://www.amazon.com/Watchmen-Alan-Moore/dp/0930289234" typeof="biblio:Book">
   <span property="dcterms:title">Watchmen</span> by 
   <span property="dcterms:creator">Alan Moore</span> and <span property="dcterms:creator">Dave Gibbons</span>.

   4.6 out of 5 stars (830 customer reviews) | 28 customer discussions
   In Stock

   List Price: $19.99
   Price: <div rel="commerce:costs">
             <div about="http://www.amazon.com/Watchmen-Alan-Moore/dp/0930289234#price" typeof="commerce:Price">
                <span property="commerce:currency" content="USD">$</span> <span property="commerce:amount" datatype="xsd:decimal">11.99</span>
             </div>
          </div>
   You Save: $8.00 (40%)

   <a rel="commerce:payment" href="http://www.amazon.com/gp/offer-listing/0930289234/">125 used &amp; new</a> from $8.00.
</div>

Pseudo-code Example Using Markup

A screen scraper could easily extract the data in the markup above by running a regular RDFa processor on the page. The same screen scraper would work for any book site that used a common set of vocabularies. This means that one can write one screen scraper for books, instead of having to write a screen scraper for Amazon.com, another for Barnes and Noble, and yet another for a favorite independent online book store. The book scraper program would no longer have to worry about the physical layout of the page, and instead could do something like this:

// create a book parser and get all of the triples from a particular URL
var bookParser = new RDFaBookParser();
bookParser.parseTriples("http://www.amazon.com/gp/bestsellers/books/");

// get all of the book triples
var allBooks = bookParser.getByType("biblio:Book");

// get all of the books by Alan Moore
var alanMooreBooks = allBooks.get("dcterms:creator", "Alan Moore");

// store the title and price of each book by Alan Moore
for(var i = 0; i < alanMooreBooks.length; i++)
{
   var book = alanMooreBooks[i];

   // if the book has an associated price, store it in our local database
   if(book["commerce:costs"])
   {
      myDatabase.store(book.url, book["dcterms:title"], book["commerce:costs"]["commerce:amount"]);
   }
}
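For readers who want something they can run, the same idea can be sketched over a plain list of triples. The triples below are hand-written to match what a processor might emit for the markup above, and the helper names (`objectsOf`, `subjectsWith`, `pricesByCreator`) are assumptions for this sketch rather than part of any RDFa API.

```javascript
// Hand-written triples approximating the output for the book markup above.
const BOOK = "http://www.amazon.com/Watchmen-Alan-Moore/dp/0930289234";
const PRICE = BOOK + "#price";
const triples = [
  [BOOK, "rdf:type", "biblio:Book"],
  [BOOK, "dcterms:title", "Watchmen"],
  [BOOK, "dcterms:creator", "Alan Moore"],
  [BOOK, "dcterms:creator", "Dave Gibbons"],
  [BOOK, "commerce:costs", PRICE],
  [PRICE, "rdf:type", "commerce:Price"],
  [PRICE, "commerce:currency", "USD"],
  [PRICE, "commerce:amount", "11.99"],
];

// Return every object for a given subject and predicate.
function objectsOf(subject, predicate) {
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

// Return every subject that carries the given predicate/object pair.
function subjectsWith(predicate, object) {
  return triples
    .filter(([, p, o]) => p === predicate && o === object)
    .map(([s]) => s);
}

// Find books by a creator and pair each title with its price amount, if any.
function pricesByCreator(creator) {
  return subjectsWith("rdf:type", "biblio:Book")
    .filter((book) => objectsOf(book, "dcterms:creator").includes(creator))
    .map((book) => {
      const [price] = objectsOf(book, "commerce:costs");
      return {
        url: book,
        title: objectsOf(book, "dcterms:title")[0],
        amount: price ? objectsOf(price, "commerce:amount")[0] : undefined,
      };
    });
}

const alanMooreBooks = pricesByCreator("Alan Moore");
```

Nothing in the query depends on page layout, so the same functions would work on triples from any site using these vocabularies.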

Mash-ups Sans APIs

People doing data mash-ups need to learn a plethora of APIs or formats, when all they likely want is one format and a set of vocabularies covering the domain.

Expressing and Associating Concepts on a Web Page

Problem Description

HTML should be usable for making annotations that aid the production and dissemination of knowledge on a global basis. I shouldn't have to switch to another language in order to express or identify concepts associated with the text/blurb in a Web page I am reading, writing, or publishing.

When writing HTML (by hand or indirectly via a program), I want to identify, at the time of writing, what the content is about with regard to people, places, and other real-world things. For instance, if writing about the subject "Napoleon", I want to be able to isolate "Napoleon" from a paragraph or heading and state that this subject is an entity of type "Person" and that he is associated with another entity, "France".

From a reader's perspective, the use case above is like taking a highlighter and making notes while reading about "Napoleon".

The case above is akin to what we all do when studying. The trouble with this scenario is that sharing the annotations feels somewhat unnatural, due to conditioning that goes right back to childhood. For example, when we were kids we performed these tasks but rarely shared that part of our endeavors, since it was typically the route to competitive advantage, i.e., being the top student in the class.

The childhood scenario outlined above is antithetical to the essence of the World Wide Web, as vital infrastructure harnessing collective intelligence.

Why do we need to support this?

When authoring content on the web, we do not give web authors the option of expressing deeper meaning in the content that they write. While we do provide the concept of the hyperlink to associate relevant documents with one another, we do not provide a mechanism to hyperlink relevant concepts to one another. Being able to link concepts not only across the web, but also within the same document, provides deeper meaning to both humans and computers. This deeper meaning can then be mined from the Web at large. Deeper relationships that the original author had identified can be discovered by crawling not only the documents, but the concepts on the Web. Knowledge expression can happen with higher fidelity when semantic technologies are used.

RDFa Solution

not-implemented

A website could use a vocabulary focused on expressing historical concepts to express concepts about Napoleon. For example, the following Wikipedia entry text could be enhanced using RDFa to establish inter-page concepts and relate them to one another:

Napoleon Bonaparte (French: Napoléon Bonaparte French pronunciation: [napoleɔ̃ bɔnɑpaʁt]; 15 August 1769 – 5 May 1821) later known as Emperor Napoleon I, was a military and political leader of France whose actions shaped European politics in the early 19th century.

Example Markup


<div xmlns:history="http://example.org/history#"
     about="http://en.wikipedia.org/wiki/Napoleon_I_of_France" typeof="history:Person">
<span property="history:name" xml:lang="en">Napoleon Bonaparte</span> 
(French: <span property="history:name" xml:lang="fr">Napoléon Bonaparte</span> 
French pronunciation: [napoleɔ̃ bɔnɑpaʁt]; 15 August 1769 – 5 May 1821) later known 
as Emperor Napoleon I, was a military and political leader of 
<div rel="history:leader">
   <div typeof="history:Country">
      <span property="history:name">France</span> whose actions shaped European politics in the early 19th century.
   </div>
</div>
</div>

The markup above establishes three semantic concepts: a person (Napoleon), a country (France), and their relationship (Napoleon was a leader of France).

Pseudo-code Example Using Markup

A search engine or web browser could then parse and save this information for later by running the following code:


var rdfaParser = new RdfaParser();

myDatabase.addTriples(rdfaParser.parse(currentPage));

A query agent could then run a query against the database of history information (either in-browser, or via a search page). For example, somebody could type the following question into a search box:

"Who is Napoleon?" [SEARCH]

The following code could be executed to answer the question:



// "Who" could be interpreted as a search hint on a person's name
var people = myDatabase.search("history:Person").search("history:name", "Napoleon");

// The following loop would print all of the people who are named Napoleon and
// their relationships to countries. In this particular case, the following
// information would be printed: "Napoleon Bonaparte" leader "France"
for(var i = 0; i < people.length; i++)
{
   printPersonRelationshipsToCountries(people[i]);
}


Automatic Disambiguation of Search Terms on the Web

When Google answers a search like "Who is Napoleon" you get an answer, but where is the disambiguation? How does it determine the context for the search? There are many dimensions to "Napoleon", and Google statistically guesses one based on the link density of its subjectively assembled index and its PageRank algorithm. How do you, as a writer or reader, efficiently navigate the many aspects/facets associated with the pattern "Napoleon"? What if the answer you are looking for is in the statistically insignificant links and not the major links?
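As a rough illustration of how typed data could help, the sketch below groups entities matching "Napoleon" by their declared type, so a search interface could offer facets instead of a single statistical guess. The URIs, the `ex:` types and the `groupByType` helper are all hypothetical.

```javascript
// Hypothetical typed entities that all match the search term "Napoleon".
const matches = [
  { uri: "http://example.org/people#napoleon-i", type: "ex:Person", label: "Napoleon Bonaparte" },
  { uri: "http://example.org/films#napoleon-1927", type: "ex:Film", label: "Napoléon (1927 film)" },
  { uri: "http://example.org/people#napoleon-iii", type: "ex:Person", label: "Napoleon III" },
];

// Group entity labels by type so each type can be presented as a facet.
function groupByType(entities) {
  const facets = {};
  for (const entity of entities) {
    if (!facets[entity.type]) {
      facets[entity.type] = [];
    }
    facets[entity.type].push(entity.label);
  }
  return facets;
}

const facets = groupByType(matches);
```

A reader could then choose the "Person" or "Film" facet explicitly, which also surfaces results that link statistics alone would bury.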

Building Universal Knowledge Systems Where Natural Language Processing Fails

How do you merge statements made in multiple languages about a single subject or topic into a particular knowledge base? If there are a number of statements about George W. Bush made by people who speak Spanish and a number of statements made by people who speak English, how do you coalesce those statements into one knowledge base? How do you differentiate those statements from statements made about George H.W. Bush? One approach would be to use a shared underlying vocabulary to describe each person and to identify the person using a universally unique string. This would allow the human language to change while ensuring that the semantics of what is being expressed stay the same.
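A minimal sketch of that approach follows, assuming statements have already been extracted into simple records. The subject URIs, the `ex:description` predicate and the `mergeBySubject` helper are made up for illustration; the point is only that statements are keyed by the unique identifier, not by the name as written in any one language.

```javascript
// Hypothetical identifiers for two different people with similar names.
const GWB = "http://example.org/people#george-w-bush";
const GHWB = "http://example.org/people#george-h-w-bush";

// Hypothetical statements harvested from English and Spanish pages.
const statements = [
  { subject: GWB, predicate: "ex:description", value: "43rd President of the United States", lang: "en" },
  { subject: GWB, predicate: "ex:description", value: "43.º presidente de los Estados Unidos", lang: "es" },
  { subject: GHWB, predicate: "ex:description", value: "41st President of the United States", lang: "en" },
];

// Collect statements under their subject identifier, whatever their language.
function mergeBySubject(items) {
  const merged = new Map();
  for (const item of items) {
    if (!merged.has(item.subject)) {
      merged.set(item.subject, []);
    }
    merged.get(item.subject).push(item);
  }
  return merged;
}

const knowledgeBase = mergeBySubject(statements);
```

The English and Spanish statements about the same person end up together, while statements about the other person stay separate.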

Externally Tagging Parts of Video with Information

Problem Description

Sam has posted a video tutorial on how to grow tomatoes on his video blog. Jane uses the tutorial and would like to leave feedback to others that view the video regarding certain parts of the video she found most helpful. Since Sam has comments disabled on his blog, Jane cannot comment on the particular sections of the video other than linking to it from her blog and entering the information there. This is not useful to most people viewing the video as they would have to go to every blogger's site to read each comment. Luckily, Jane has a video player that is capable of finding comments distributed around blogs on the net. The video player shows the comments as a video is being watched (shown as sub-titles). How can Jane specify her comments on parts of a video in a distributed manner?

Why do we need to support this?

The web is inherently a collaborative medium. While many people do not have the knowledge and skill to make a site collaborative, enabling static content to receive input from visitors is not only helpful to those that utilize the site, but it adds value to the Web. The more information that is known and expressed about content on the web, the greater the network effects.

RDFa Solution

not-implemented

Annotation of remote media files can be performed via a simple RDF vocabulary and RDFa. While this example mashes HTML5's video element and RDFa together, it could work with any embedded media object.

Example Markup


<div xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:video="http://purl.org/media/video#"
     xmlns:caption="http://example.org/caption#"
     xmlns:media="http://purl.org/media#"
     about="#tomato-tutorial" typeof="video:Episode">
   <div rel="media:download">
      <video controls src="http://example.org/growing-tomatoes.ogg"></video>
   </div>
   <p>I really liked this growing tomatoes video by Sam. His use
   of upside-down growing techniques and terra cotta pots looks great.</p>

   <div about="http://example.org/growing-tomatoes.ogg" rel="caption:comment">
      <div typeof="caption:Text">
         Around
         <span property="caption:offset" content="PT2M3S" datatype="xsd:duration">two minutes</span>
         in, he talks about using expensive pots.
         <span property="caption:content">I ended up using 5 gallon buckets instead of terra cotta pots.</span>
      </div>
   </div>
</div>

Pseudo-code Example Using Markup

A video player embedded in the same page could extract the triples from the page like so:


// grab all of the captions on the current page
var rdfaParser = new RdfaParser();
var semanticObjects = rdfaParser.parse(currentPage);
var captions = semanticObjects.getObjectsBySubjectAndType("http://example.org/growing-tomatoes.ogg", "caption:Text");

var videoPlayer = getVideoPlayer("http://example.org/growing-tomatoes.ogg");
videoPlayer.loadCaptions(captions);
videoPlayer.play();

If someone were to embed the video on their website and use a third-party indexing engine to retrieve captions associated with the video, those captions could be expressed in RDFa and loaded into the video player in the same way:


// grab all of the captions for the video from a third-party caption index
var rdfaParser = new RdfaParser();
var captions = rdfaParser.parse("http://example.org/video-captions?video=http://example.org/growing-tomatoes.ogg&filter-spam=yes");

var videoPlayer = getVideoPlayer("http://example.org/growing-tomatoes.ogg");
videoPlayer.loadCaptions(captions);
videoPlayer.play();

Note that a one-line change can switch the caption data feed from the current page to a different page without needing to change the web API or objects used by the video player.
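A video player would also need to turn each caption's `xsd:duration` offset into a playback position. The sketch below handles only time-based durations such as "PT2M3S", which is an assumption made to keep the example short. The caption records, the second caption's text and the function names are hypothetical.

```javascript
// Convert a time-only xsd:duration such as "PT2M3S" into seconds.
// Year, month and day components are deliberately not supported here.
function durationToSeconds(duration) {
  const match = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$/.exec(duration);
  if (!match || duration === "PT") {
    throw new Error("Unsupported duration: " + duration);
  }
  const [, hours = 0, minutes = 0, seconds = 0] = match;
  return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

// Hypothetical caption records as they might look after RDFa extraction.
const captions = [
  { offset: "PT5M", content: "A later, made-up comment used only for this example." },
  { offset: "PT2M3S", content: "I ended up using 5 gallon buckets instead of terra cotta pots." },
];

// Return captions whose offset has been reached, in playback order.
function captionsUpTo(items, currentSeconds) {
  return items
    .map((c) => ({ seconds: durationToSeconds(c.offset), content: c.content }))
    .filter((c) => c.seconds <= currentSeconds)
    .sort((a, b) => a.seconds - b.seconds);
}

const visible = captionsUpTo(captions, 130);
```

Because the offset travels with the caption as typed data, the player does not need to parse the surrounding prose ("two minutes in") to know when to show it.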