OPM in 20 bullet points

2010 December 23

by Monica Duke

Yesterday I mentioned that I had started to get stuck into the review of ontologies by tackling OPM – the Open Provenance Model. Here I have tried to summarise OPM in 20 bullet points:

OPM emerged as a consensus from community participants, with activity starting back in 2006.
Provenance challenge activities led to substantial agreement on a core representation of provenance.
The OPM specification v.1 was released in 2007; an open-source model was adopted for governance of OPM; version 1.1 of OPM was presented in 2009.
OPM is designed to meet the following requirements:
- To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
- To allow developers to build and share tools that operate on such a provenance model.
- To define provenance in a precise, technology-agnostic model.
- To support a digital representation of provenance for any “thing”, whether produced by computer systems or not.
- To allow multiple levels of description to co-exist.
- To define a core set of rules that identify the valid inferences that can be made on provenance.

OPM consists of a directed graph expressing what caused things to be, i.e. how things depended on others and resulted in specific states. Provenance graphs are causality graphs that explain how processes and artifacts came to be. A graphical notation for provenance graphs is suggested.
OPM is based on three kinds of nodes in the graph, defined as:
- Artifact: Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
- Process: Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
- Agent: Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling or affecting its execution.
Causal dependencies between artifacts, processes and agents are captured in the graph. The edges denote one of the following categories of dependency (the nodes are as defined in the previous bullet):
- used(R)
- wasGeneratedBy(R)
- wasControlledBy(R)
- wasTriggeredBy
- wasDerivedFrom
R denotes a role which is meaningful in the context of the application, and aims to distinguish the nature of the dependency when multiple such edges are connected to the same process. For example, a process may use several files, reading parameters from one (R=parameters) and reading data from another (R=data). Communities need to define their own roles in OPM profiles, and roles should always be specified.
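To make the nodes, edges and roles above a little more concrete, here is a minimal Python sketch of how they might be held in memory. It is not taken from the OPM specification or from any OPM tool: the class names, the file names and the "output" role are invented for illustration, and the table of which node kinds each edge may connect reflects my own reading of OPM v1.1.

```python
from dataclasses import dataclass
from typing import Optional

# (effect kind, cause kind) allowed for each edge type.
# Assumption: this table is my reading of OPM v1.1, not quoted from it.
EDGE_ENDPOINTS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}
# Edge types that carry a role R in the list above.
ROLE_EDGES = {"used", "wasGeneratedBy", "wasControlledBy"}


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # "artifact", "process" or "agent"


@dataclass(frozen=True)
class Edge:
    kind: str
    effect: Node  # the node that depends on the cause
    cause: Node
    role: Optional[str] = None

    def __post_init__(self):
        # Reject unknown edge types and mismatched endpoint kinds.
        if self.kind not in EDGE_ENDPOINTS:
            raise ValueError(f"unknown edge type: {self.kind}")
        expected = EDGE_ENDPOINTS[self.kind]
        actual = (self.effect.kind, self.cause.kind)
        if actual != expected:
            raise ValueError(f"{self.kind} expects {expected}, got {actual}")
        # Roles should always be specified on role-carrying edges.
        if self.kind in ROLE_EDGES and not self.role:
            raise ValueError(f"{self.kind} needs a role")


# Hypothetical example: one process reading two files in different roles.
analysis = Node("analysis", "process")
params = Node("params.txt", "artifact")
data = Node("data.csv", "artifact")
result = Node("result.csv", "artifact")

edges = [
    Edge("used", analysis, params, role="parameters"),
    Edge("used", analysis, data, role="data"),
    Edge("wasGeneratedBy", result, analysis, role="output"),
    Edge("wasDerivedFrom", result, data),
]

roles_used = sorted(e.role for e in edges if e.kind == "used")
print(roles_used)
```

The two used edges point at the same process and are told apart only by their roles, which is the situation the role mechanism is meant for.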
OPM adopted a weak notion of causal dependence and defines the dependencies, but recognises that subclasses that capture stronger notions of causality may be needed in specific systems (e.g. a strong interpretation of the used edge requires the artifact to be available for the process to start).
OPM recognises a need for detail at different levels of abstraction or from different viewpoints of processes – giving rise to different accounts of the same execution – and describes how this can be achieved.
An account represents a description at some level of detail as provided by one or more observers. The concept of account allows multiple descriptions to co-exist.
OPM allows the addition of time information to processes. Time is optional. The model specifies constraints that time information must satisfy with respect to causal dependencies.
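As an illustration of the kind of time constraint meant here, the sketch below checks one rule that, as I understand it, follows from causality: an artifact cannot be used before it has been generated. The function name, the dictionary layout and the example timestamps are all my own assumptions, and because time is optional, any use of an artifact with no recorded generation time is simply skipped.

```python
from datetime import datetime


def time_violations(generated_at, used_at):
    """Return (artifact, process) pairs where the use precedes the generation.

    generated_at maps artifact id -> time it was generated.
    used_at maps (process id, artifact id) -> time it was used.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    return [
        (artifact, process)
        for (process, artifact), t_used in used_at.items()
        if artifact in generated_at and t_used < generated_at[artifact]
    ]


generated_at = {"table": datetime(2010, 12, 1, 10, 0)}
used_at = {
    ("plot", "table"): datetime(2010, 12, 1, 9, 0),     # before generation
    ("report", "table"): datetime(2010, 12, 1, 11, 0),  # after generation
    ("report", "notes"): datetime(2010, 12, 1, 8, 0),   # no generation time known
}

violations = time_violations(generated_at, used_at)
print(violations)
```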
OPM expects that reasoning algorithms may be used over provenance models and describes completion rules and multistep inferences to show how causal dependencies can be summarised by transitive closure.
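A rough sketch of both ideas follows, using plain (effect, cause) tuples rather than any real OPM library. The first function computes a multi-step wasDerivedFrom by transitive closure; the second applies one completion rule as I read it in OPM v1.1 (if a process used an artifact that another process generated, the first was triggered by the second). Function names and example identifiers are invented for illustration.

```python
def derived_from_star(derived_from):
    """Map each artifact to everything it was derived from, over any number of steps."""
    direct = {}
    for effect, cause in derived_from:
        direct.setdefault(effect, set()).add(cause)

    closure = {}
    for start in direct:
        seen = set()
        stack = list(direct[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            # Follow one more derivation step from this ancestor, if any.
            stack.extend(direct.get(node, ()))
        closure[start] = seen
    return closure


def infer_triggered_by(used, generated_by):
    """If P2 used A and A wasGeneratedBy P1, infer (P2, P1) as wasTriggeredBy."""
    producers = {}
    for artifact, process in generated_by:
        producers.setdefault(artifact, set()).add(process)
    return {
        (p2, p1)
        for p2, artifact in used
        for p1 in producers.get(artifact, ())
    }


# Hypothetical provenance: raw -> table -> report, with table made by "clean".
derived = [("report", "table"), ("table", "raw")]
closure = derived_from_star(derived)
triggered = infer_triggered_by(
    used=[("plot", "table")],
    generated_by=[("table", "clean")],
)
print(closure["report"], triggered)
```

Note that the sketch deliberately does not infer wasDerivedFrom from a used edge followed by a wasGeneratedBy edge; my understanding is that OPM does not treat that as a valid inference, since a process need not have used every input to produce a given output.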
OPM provides a formal definition of what constitutes a legal graph and defines rules for OPM, e.g. the requirement for identifiers for accounts, artifacts, processes and agents, and what is optional and what is mandatory.
An annotation framework allows extra information to be added to OPM entities to allow meaningful exchange in specific communities. Some properties that are expected to be commonly used (such as Label) are defined.
OPM profiles are intended to define a specialisation of OPM, and capture best practice and usage guidelines developed by communities. Profiles must remain compliant with the semantics of OPM.
Attribution can be attached as an annotation, but work on how to deal with such concepts is still in progress.
The Open Provenance Model Vocabulary was released in December 2010; it implements OPM and is designed as a lightweight provenance vocabulary.
There are a number of alternative provenance vocabularies; these have been mapped to OPM in an analysis by the W3C Provenance Incubator Group task force.
http://openprovenance.org/ is the main page for OPM activity and contains links to the specifications, a tutorial, tools that implement OPM, the OPM wiki and other useful pointers.

Although I have been aware of OPM for some years, this has been my first really close look at its description, so I hope that I have done it justice in the above 20 bullet points. I invite others with more knowledge to make comments, additions or corrections, and I look forward to working collaboratively with colleagues on JISC MRD projects to evaluate the various standards that we are interested in.

2 Responses

Amin

June 5, 2011

Hi Monica, nice post!
I’m interested in provenance management and provenance frameworks. However, I don’t see any real use case validating OPM for provenance between distributed applications (all of the work has focused on scientific workflows).
Also, I wanna know if the OPM annotation framework allows provenance to be queried using annotations (or are they just textual descriptions)?
Anyway, I hope that we can exchange views on different issues in OPM.
Thanks.

Trackbacks and Pingbacks

Open Provenance Model | SageCite KnowledgeBlog

