Archive for December, 2009

ResearchForge - An open access repository of scientific papers, shared data and open source software

A recent position paper in Communications of the ACM titled “Assessing Open Source Software as a Scholarly Contribution” calls for publishing open source software as part of scientific contributions to become standard practice, and advocates a software review process to ensure the software meets an adequate level of quality.  Furthermore, the authors would like to see scientists continue to maintain and support their software.  Some bibliometrics are even suggested as a means to measure the impact of the implemented software.

Digging deeper, I expected PLoS (Public Library of Science), as an innovator in this space, to already have this solved.  Although not there yet, PLoS makes Open Access, Data Sharing and Open Source Software publication a policy, and their IT Director is a believer in open source, even making it a part of job descriptions on their site.  Software must (or should) be submitted to one of the listed software repositories (SourceForge, etc.).  I think it’s more than just making source code available; it’s also integrating it with a well-established open source platform.  For instance, anyone publishing data mining algorithms these days should integrate them into Weka; otherwise applied researchers will likely either choose an alternate algorithm that IS implemented in Weka (especially if up against a deadline) or custom develop the algorithm as described in pseudocode.  The applied researcher might not even be aware that your algorithm exists, since the software repositories and scientific citation databases are disjoint.  Custom developing an algorithm from pseudocode increases the likelihood of implementation errors, the obvious consequence of which is erroneous scientific findings derived from its use.

Weka’s acceptance criteria for new classifiers are that they be published in a reputable journal or conference proceedings and that they outperform standard algorithms.  I guess that makes the challenge a bit of a “chicken and egg” problem.  Further complicating the adoption of publishing open source software as an accepted scholarly practice, the selection of programming language for the software implementation will always be a contentious issue.

Seems like there’s an opportunity for a “ResearchForge” site that combines an open source software repository (like SourceForge) with a citation and publishing platform (like CiteULike or others) and a dataset repository, all nicely packaged together.  I suppose the same could be accomplished if existing repositories published semantically annotated content, so that a machine could automatically connect the dots between the paper citation, the accompanying datasets, and the software implementation, all residing in disparate repositories.
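
To make the “connect the dots” idea concrete, here is a minimal sketch in Python.  It assumes each repository exposes records that carry the DOI of the paper they relate to; the record layout, the field names (`describedBy`, `implements`) and every identifier and URL are hypothetical, invented only for illustration, and not taken from any real repository.

```python
# Hypothetical records, as three separate repositories might expose them.
# All identifiers, field names and URLs are made up for illustration.
PAPERS = [
    {"id": "paper:123", "title": "A Hypothetical Classifier",
     "doi": "10.0000/example.123"},
]
DATASETS = [
    {"id": "data:77", "url": "http://data.example.org/77",
     "describedBy": "10.0000/example.123"},
]
SOFTWARE = [
    {"id": "code:9", "url": "http://forge.example.org/p/9",
     "implements": "10.0000/example.123"},
    {"id": "code:10", "url": "http://forge.example.org/p/10",
     "implements": "10.0000/example.999"},  # refers to a paper we do not hold
]


def link_by_doi(papers, datasets, software):
    """Group datasets and software under the paper whose DOI they reference."""
    linked = {
        p["doi"]: {"paper": p["id"], "datasets": [], "software": []}
        for p in papers
    }
    for d in datasets:
        if d["describedBy"] in linked:
            linked[d["describedBy"]]["datasets"].append(d["id"])
    for s in software:
        if s["implements"] in linked:
            linked[s["implements"]]["software"].append(s["id"])
    return linked


if __name__ == "__main__":
    for doi, bundle in link_by_doi(PAPERS, DATASETS, SOFTWARE).items():
        print(doi, bundle)
```

The join only works because all three sources agree on one identifier for the paper, which is exactly what shared semantic annotations would have to provide.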



Semantic Annotation and the Tipping Point for Semantic Web

Semantic annotation capabilities embedded within publishing platforms are necessary (but not sufficient) for semantic web adoption to reach its tipping point.  We already have the Semantic MediaWiki extension for semantic wikis and the WordPress RDFa plugin for annotated blogging, and a recent paper in Communications of the ACM from Microsoft Research titled “A ‘Smart’ Cyberinfrastructure for Research” discusses add-in support within Microsoft Word for semantic annotations, first announced in March 2009 and available from CodePlex.  Drupal bundles RDFa annotations within its Content Management System (CMS) as a core, out-of-the-box feature.  Bundling semantic annotation out of the box is key to widespread adoption; community plugins and extensions like Semantic MediaWiki and the WordPress RDFa plugin will not turn as many users into publishers of semantic content, which in turn leaves the potential on the consumer side of semantic technologies not fully realized.  I anticipate other publishing platforms will follow Drupal’s lead, first with a community extension available for semantic annotation and then working the capability into the roadmap to become a core feature.  I wonder how long until a semantic annotation capability exists within PDF?  The 2007 paper “An Annotation Tool for Semantic Documents” demonstrates a Protégé ontology editor plugin that allows users to semantically annotate PDF documents.

Some companies already publish RDF (see the GoodRelations ontology for e-commerce; BestBuy publishes its store information and inventory as RDF), but the practice won’t become widespread until e-commerce sites that do NOT publish RDF of their inventory find themselves at a competitive disadvantage.
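
For readers who have not seen what published inventory RDF might look like, here is a rough sketch in Python that renders one product offering as Turtle using a few GoodRelations terms (`gr:Offering`, `gr:name`, `gr:hasPriceSpecification`, `gr:UnitPriceSpecification`, `gr:hasCurrency`, `gr:hasCurrencyValue`).  The function name, the `ex:` namespace, and the product itself are hypothetical; this is not how BestBuy or any other retailer actually generates its data.

```python
def offering_to_turtle(sku, name, price, currency):
    """Render one product offering as Turtle using GoodRelations terms.

    Assumptions (for illustration only): `sku` is already a valid Turtle
    local name and `name` contains no double quotes, so no escaping is done.
    """
    return "\n".join([
        "@prefix gr: <http://purl.org/goodrelations/v1#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        # Hypothetical namespace for this imaginary shop's offers.
        "@prefix ex: <http://shop.example.org/offers/> .",
        "",
        f"ex:{sku} a gr:Offering ;",
        f'    gr:name "{name}" ;',
        "    gr:hasPriceSpecification [",
        "        a gr:UnitPriceSpecification ;",
        f'        gr:hasCurrency "{currency}" ;',
        f'        gr:hasCurrencyValue "{price:.2f}"^^xsd:float',
        "    ] .",
    ])


if __name__ == "__main__":
    # A made-up product, not real inventory data.
    print(offering_to_turtle("sku42", "Example Camera", 199.5, "USD"))
```

A real publisher would more likely use an RDF library and embed the result as RDFa in product pages; plain string formatting is used here only to keep the sketch dependency-free.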

In terms of the research community eating its own semantic web dogfood, I think Semantically Annotated LaTeX holds a lot of promise, as I would expect Computer Science and related scientific disciplines to be the earliest adopters, but I guess the bioinformatics field already has them beat.

As a case in point, I plan to add the WordPress RDFa extension to this blog as soon as I get around to it.



