ResearchForge - An open access repository of scientific papers, shared data and open source software
Posted by Dustin Burke in uncategorized on December 4th, 2009
A recent position paper in Communications of the ACM titled “Assessing Open Source Software as a Scholarly Contribution” calls for publishing open source software as part of a scientific contribution to become standard practice, and advocates a software review process to ensure the software meets an adequate level of quality. Furthermore, the authors would like to see scientists continue to maintain and support their software. Some bibliometrics are even suggested as a means of measuring the impact of the implemented software.
Digging deeper, I expected PLoS (Public Library of Science), as an innovator in this space, to have already solved this. Although not there yet, PLoS makes Open Access, Data Sharing and Open Source Software publication a matter of policy, and their IT Director is a believer in open source, even making it a part of job descriptions on their site. Software must (or should) be submitted to one of the listed software repositories (SourceForge, etc.). I think it's about more than just making source code available; it's also about integrating it with a well-established open source platform. For instance, anyone publishing data mining algorithms these days should integrate them into Weka. Otherwise, applied researchers will likely either choose an alternative algorithm that IS implemented in Weka (especially if they are up against a deadline) or custom develop the algorithm from its pseudocode description. The applied researcher might not even be aware that your algorithm exists, since the software repositories and scientific citation databases are disjoint. Custom developing an algorithm from pseudocode increases the likelihood of implementation errors, the obvious consequence of which is erroneous scientific findings derived from its use.
Weka’s acceptance criteria for new classifiers are that they be published in a reputable journal or conference proceedings and that they outperform standard algorithms. I guess that makes the challenge a bit of a “chicken and egg” problem. Further complicating the adoption of open source software publication as an accepted scholarly practice, the choice of programming language for the implementation will always be a contentious issue.
Seems like there’s an opportunity for a “ResearchForge” site that combines an open source software repository (like SourceForge), a citation and publishing platform (like CiteULike or others) and a dataset repository, all nicely packaged together. I suppose the same could be accomplished if existing repositories published semantically annotated content, so that a machine could automatically connect the dots between the paper citation, the accompanying datasets, and the software implementation, even though they reside in disparate repositories.
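To make the "connect the dots" idea a little more concrete, here is a rough Python sketch of what a machine could do if each repository annotated its records with a shared paper identifier. Everything in it is an assumption made for illustration: the `paper_id` field, the record layout, the `LinkedWork` class, the `connect` function and the example.org URLs are all made up, and no existing repository is claimed to expose data in this form.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class LinkedWork:
    """Everything known about one paper, gathered from separate repositories."""
    paper_id: str
    citation: Optional[str] = None
    datasets: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)


def connect(
    citations: Iterable[dict],
    datasets: Iterable[dict],
    software: Iterable[dict],
) -> Dict[str, LinkedWork]:
    """Group records from three sources by their shared paper identifier."""
    works: Dict[str, LinkedWork] = {}

    def work_for(paper_id: str) -> LinkedWork:
        # Create the entry the first time an identifier is seen.
        return works.setdefault(paper_id, LinkedWork(paper_id))

    for record in citations:
        work_for(record["paper_id"]).citation = record["title"]
    for record in datasets:
        work_for(record["paper_id"]).datasets.append(record["url"])
    for record in software:
        work_for(record["paper_id"]).software.append(record["url"])
    return works


# Hypothetical records, standing in for annotated content that three
# disparate repositories might publish.
citations = [
    {"paper_id": "doi:10.0000/example.1", "title": "A Hypothetical Classifier"},
]
datasets = [
    {"paper_id": "doi:10.0000/example.1", "url": "https://data.example.org/sets/42"},
]
software = [
    {"paper_id": "doi:10.0000/example.1", "url": "https://code.example.org/classifier"},
    {"paper_id": "doi:10.0000/example.2", "url": "https://code.example.org/orphan"},
]

linked = connect(citations, datasets, software)
for work in linked.values():
    print(work)
```

The second software record has no matching citation, so it ends up as an entry with an empty `citation` field. That is the disjoint-repository situation in miniature: code that exists but that a citation search would never surface.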







