Workflows from Sage Bionetworks.

2011 July 8
by Monica Duke

The aims of the SageCite project included extending Taverna to incorporate a citation service, and collecting evidence of the types of network models, data and processes used in the modelling of disease networks. A review of the literature shows that current conventions for reporting research, and methods of citing and linking data in publications, do not always provide enough detail to reproduce the research reported. This has led to calls to change methods of publication to improve the links and citations to the research underlying the journal article. Eric Schadt described the launch of a new journal, Open Network Biology, to address this problem, and Phil Bourne, winner of the 2010 Jim Gray eScience award, has called for the data, and the knowledge derived from that data, to become less distinct and more easily navigated.

SageCite set about working closely with Sage Bionetworks to take some first steps in investigating the development of disease network models, documenting the process, and adding support for citation.

Following a visit in November 2010 to the Sage Bionetworks base in Seattle, specific data sets and tools were provided by Brig Mecham at Sage Bionetworks to Peter Li. Using these data and tools, workflows were implemented in Taverna which capture and document a particular stage of the disease model building process as part of the metaGEO project that is co-ordinated by Brig Mecham. These workflows, together with others to be developed in the future, provide the basis for understanding the issues of data citation in network biology. They are also an integral part of the demonstrator application that has been developed. The demonstrator shows how the DataCite service can be used for registering data that are generated from the building of disease network models. The workflows themselves are now shared through the myExperiment environment.

The development and use of the demonstrator is described in these slides. A recording of the process will also be available shortly.

The registration of workflow data was implemented as a plugin for the Taverna workflow system. The plugin provides an activity which allows a data item to be associated with a DOI that is registered using the DataCite service. The decision on whether to register and cite workflow data is made by the workflow builder during the development of the workflow. In addition, the DataCite service can be used to associate the DOI with a web page which can be opened in a web browser to view the data item. For the purposes of software testing in the SageCite project, these web pages were created on a Google Sites web site.
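
The registration step described above can be illustrated with a small sketch. It is written in Python for brevity and is not the SageCite plugin code (Taverna plugins are written in Java). It only models the idea: the workflow builder flags which outputs to register, and each flagged output is paired with a DOI and the web page where the data item can be viewed. The names `WorkflowOutput` and `plan_registrations`, the DOI naming scheme, the `10.5072` prefix and the `example.org` URLs are all assumptions made for illustration; no call to the DataCite service is made.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WorkflowOutput:
    """A data item produced by a workflow (hypothetical model)."""
    name: str
    register: bool                      # decision taken by the workflow builder
    landing_page: Optional[str] = None  # web page where the data item can be viewed


def plan_registrations(prefix: str, run_id: str,
                       outputs: List[WorkflowOutput]) -> List[Dict[str, str]]:
    """Return one DOI-to-URL record per output flagged for registration.

    The DOI naming scheme (prefix/run_id.name) is an assumption for this
    sketch; a real deployment would follow its own minting policy.
    """
    records = []
    for output in outputs:
        if not output.register:
            continue  # the builder chose not to cite this data item
        if not output.landing_page:
            raise ValueError(f"{output.name} needs a landing page before registration")
        records.append({
            "doi": f"{prefix}/{run_id}.{output.name}",
            "url": output.landing_page,
        })
    return records


# Placeholder values throughout: prefix, run identifier and URLs are illustrative.
outputs = [
    WorkflowOutput("normalised_expression", True,
                   "https://example.org/data/normalised_expression"),
    WorkflowOutput("intermediate_matrix", False),
]
records = plan_registrations("10.5072", "run42", outputs)
```

Here only the first output yields a record, because the second was not flagged for registration. Submitting such a record to a registration service is left out deliberately, since the details depend on the service account and API in use.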

Through the work on SageCite, a better understanding of how Sage Bionetworks' predictive models of diseases are developed has been obtained. The complexity and work involved in building such models required close collaboration between the two researchers at Sage Bionetworks and the SageCite project. As a result of the SageCite project, specific stages of the modelling process have been documented as Taverna workflows which are shared with the life sciences community using myExperiment. The SageCite project also led to a collaboration on the metaGEO project with Brig Mecham from Sage Bionetworks, who has developed tools for integrating gene expression data sets for meta-analyses of diseases, as described by Brig in his presentation at the Sage Congress. MetaGEO tools are now being used by Peter Li for studying blood cancers and diseases of inflammation at the University of Birmingham in the Systems Science for Health project.

