Inaugural Meeting of the Concept Web Alliance

May 18, 2009 Ian Hamilton Technology

On May 7–8, I attended the inaugural meeting of the Concept Web Alliance (CWA). The CWA wants to enable interoperability between large triple stores such as the Large Knowledge Collider (LarKC) and to provide an Open Access mechanism for accessing them. This is good news for life-sciences projects, since semantic triple stores are becoming the de facto way to store data for gene expression, sequencing, biobanks, and the like.

The CWA mission statement is:

“To enable an open collaborative environment to jointly address the challenges associated with high volume scholarly and professional data production, storage, interoperability and analyses for knowledge discovery.”

You can read the entire CWA declaration.

Several representatives from the STM publishing world attended (Nature, Springer, SEED, The Scientist, Thomson Reuters, and Abel Packer from Bireme, a founding CWA member), and I had some good conversations with them about the CWA’s vision in relation to STM publishers. Everyone agreed that the CWA is a much-needed initiative, but there are open questions about how it could feed into a revenue model. Most STM publishers don’t have triple stores they can offer to the CWA effort; they publish the end result, research articles based on the data in those stores.

But I see a few ways that publishers can benefit from working with the CWA:

1. The CWA can provide tools that link the data stores directly to the content of the research article. Search, data-mining, and visualization tools could be built for publishers, giving their users new ways of interacting with the research article. Users could find the articles they really want and dig into the underlying data, even when that data lives in a massive data store. For the publisher, this could increase revenue by bringing in more traffic, enabling more focused advertising campaigns, and so on. The CWA could provide these tools for a fee, which would be used to sustain and further its mission.

2. The CWA can provide tools to automate the semantic encoding of research articles. David Shotton has shown how publishers might use this by semantically enhancing a PLoS NTD article (see Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article); he has also written a paper titled “Semantic Publishing: the coming revolution in scientific journal publishing” (preprint available here). Knewco also has some great technology in this space; check out their Concept Web.

3. Publishers can provide Open Data back to the CWA. They could offer content tagged with RDFa for easier automated machine discovery, open up the data in a research article’s supplemental information, or provide triples generated from the content of the article itself. PLoS is a bit ahead of the publishing curve here: all of the PLoS journals run on the Ambra/Topaz platform, which stores the content of research articles as triples. We’re looking at ways to provide access to a subset of these triples (we would need to remove user information) through a SPARQL endpoint or other means. This would give direct access to the triples, which could then be contributed back to the CWA.
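To make the third point a little more concrete, here is a minimal Python sketch of the general idea of exposing a public subset of article triples: triples that carry user information are filtered out, and the remainder can be matched against a simple subject/predicate/object pattern, roughly what a single triple pattern in a SPARQL query does. Everything here is an assumption for illustration: the `ex:` identifiers, the `USER_PREDICATES` set, and the `public_subset` and `match` helpers are hypothetical and do not reflect the actual Ambra/Topaz schema or API. Only `dcterms:title` and `dcterms:subject` are borrowed from the Dublin Core vocabulary.

```python
# Minimal in-memory sketch: expose article triples while withholding user
# information. All "ex:" URIs and predicate names are hypothetical
# illustrations, NOT the actual Ambra/Topaz schema.
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicates assumed to carry user (not article) information.
USER_PREDICATES = {
    "ex:annotationAuthorEmail",
    "ex:commentedBy",
}


def public_subset(triples: Iterable[Triple]) -> List[Triple]:
    """Drop every triple whose predicate is marked as user information."""
    return [t for t in triples if t[1] not in USER_PREDICATES]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a basic pattern; None acts as a wildcard,
    roughly like a variable in a single SPARQL triple pattern."""
    return [
        (ts, tp, to) for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Hypothetical store mixing article content with user information.
store: List[Triple] = [
    ("ex:article/123", "dcterms:title", "An example research article"),
    ("ex:article/123", "dcterms:subject", "gene expression"),
    ("ex:article/123", "ex:commentedBy", "ex:user/42"),
    ("ex:user/42", "ex:annotationAuthorEmail", "reader@example.org"),
]

public = public_subset(store)
print(len(public))                        # 2
print(match(public, p="dcterms:subject"))
```

In a real deployment the filtered triples would more likely be loaded into an RDF store behind a SPARQL endpoint, where the `match(public, p="dcterms:subject")` call corresponds roughly to `SELECT ?s ?o WHERE { ?s dcterms:subject ?o }`. Filtering by predicate is just one simple way to strip user information; which predicates count as user data would depend on the actual schema.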

What other ways can publishers interact with the CWA? The CWA wants to know how your organization can participate.