
If we are concerned with standardization, would it not be good to decide whether it is “web” or “Web” in our writing?

reply

    This is a wiki draft, and the editorial work to fix things like case will happen when it is turned into a “real” document. It would be good to know, for things like “web”, what people find clearest in terms of case.

Would it not be helpful to spell out acronyms at first use in each section, e.g., “ROI”, as is done for “URIs” at 38?

reply

I honestly think this is the best way to get the library culture to buy into linked data implementations and the semantic web. If librarians become part owners of the process, claiming their role as information professionals, staking their interests in assisting controlled vocabulary mapping and development and improving access for patrons… then we’ll make progress. You may need to frame this in terms of Ranganathan’s five laws, modernized:
1) Information is for use.
2) Every user his/her information.
3) Every bit of information, its user.
4) Save the time of the user.
5) Information access is a growing organism.

Assisting in getting the right information in the right hands at the point of need: isn’t that what librarianship is all about?

Returning to the basic focus of our profession may help to sell the change.

reply

I am not sure where to put general comments; because they deal mostly with the outcome of the paper, I leave them in this section:

When I first read the paper I was disappointed; I think I expected somewhat more concrete recommendations. Later I learned that the task of the group is to evaluate whether a library linked data group is needed under the roof of the W3C, or, as it is written in the success criteria, “leading to a clear and agreed view regarding what further standards and guidelines should be developed, and what organization should be set up in order to develop them.” In the deliverables this point is described as “propose a way forward for these communities to participate productively in further W3C standardization actions.” I try to focus on this point in my comments.

A main question covered too little in these recommendations is, in my personal view, the question of complexity. If the W3C is to be active in the field of library data, what level of complexity should it take on? Must it really be the level of internal library formats like RDA and MARC, or should it be more general, so that the result is understandable and usable by a broader audience? If we decide for the second option (and this is the only sensible option at the W3C level; otherwise the work could be better done by the Library of Congress or IFLA), it follows automatically that not the full complexity of library data can be handled by a W3C standard. A second question is for which types of library data we need to develop standards. Three obvious topics are bibliographic data, vocabularies, and classifications, but this list need not be complete.

Even in the benefits section the conflict around complexity is not covered very well; instead, the goal reads more like finding a system that is suitable for all needs, even for internal library use. In my personal view the discussion focused too much on how the work of this group could replace existing systems in libraries, but that can be much better discussed in a library-only community and does not need the W3C. Certainly the library community’s discussion should be connected to the W3C discussion, so that in future the transition costs to a format easily reusable by a broader community can be kept low. If this focus were changed, many of the barriers would also fall away, because the goal would no longer be to change library systems at their foundations; that in turn means that not all libraries would have to adopt such a new system, a process that always takes a long time, as the transition from MAB (the German internal library format) to MARC21 is showing. The costs would also not be so high, because there would be no force to change.

The future work of a library data group as part of the W3C should first focus on the last two problems: which rights protection data should have, so that it can easily be reused outside of the library world, and how this data can be more easily shared outside of the library world. Under these conditions a graph paradigm is still useful, but not a must. The vision of globally unique identifiers sounds nice, but in a bottom-up approach like the semantic web it is not realizable, especially because different identifiers for library objects already exist. Building effective mapping and resolver systems like VIAF sounds more promising.

The resulting recommendations are nice for library management, but they do not answer the main questions:

A) Which standards must be developed?

B) What should an organization look like that takes care of these standards?

With the outcome of the group I would answer these questions in the following way:

A) Existing library standards are sufficient; they only need to be translated, following best practices, into a linked data format.

B) There is no need for further work by a W3C group; this development can be done by existing library standardization committees such as the Library of Congress Future of MARC group or various groups within IFLA.

But in my personal view these are the wrong answers, because they will again lead to standards which are suitable for and understandable by libraries, but not sufficiently so for external players like citation management software or other existing tools.

Instead, the answers should be:

A) We need the development of a lightweight W3C standard, e.g. for the representation of bibliographic data, able to deal with the complexity that is mainly useful for non-library systems.

B) A group should be created in which the voting weight of the library community is at most 50% of the total votes; otherwise it will happen as in the incubator group that too many librarians are represented, and the result will be a library-only standard.

The work of a W3C group is only needed if the result will be broadly accepted in the web community, not just in the library world.

reply

Agree on the value of authority files as LOD.

Release of national bibliographies as LOD also has potential to generate a lot of data without excessive duplication and could provide the hooks for holdings of individual libraries.

reply

Statistical analyses of redundancy in current metadata processes may identify a lot of waste, but comparing well-established standards with emerging standards is complex, and measuring linked open data against current processes will be difficult. A further complexity is that there is no agreement within the LOD community on significant issues, such as which types of persistent identifier to prefer and how to apply the RDF model: there are differences of opinion concerning class versus property modeling, the use of literals, and the use of blank nodes. This makes engagement with LOD complex, confusing and costly.

reply

BL will publish experience of converting BNB from MARC 21 to LOD. In principle, BL is also open to publishing information on the tools employed and where possible, the tools themselves.

The tools are only part of the equation; the expertise necessary for their effective deployment should not be underestimated. BL’s experience certainly confirms the expectation that this is an iterative process.

http://www.bl.uk/bibliographic/datafree.html

reply

BL endorses the finding that any restrictions on reuse of metadata inhibit value as linked data

reply

BL experience illustrates that issues such as identification of the real object distinct from the concept of an object are still very much alive. It seems prudent while such fundamental debates remain unresolved to err on the side of caution and identify both separately. Real work is needed on use cases to illustrate that identification of the real object is sufficient for all needs, not just library requirements.

reply

The British Library welcomes the work of the W3C incubator group on library linked data. The British Library has been experimenting with the practicalities of expressing the British National Bibliography as linked data, and our comments draw on this experience.

reply

The report should substantiate its assertions regarding the value of linked data more explicitly. It would be instructive to include examples of the benefits derived by other communities.

reply

There are also issues of concern around attribution licenses and linked data.
Attribution only works, and thus has practical value, at the dataset level. Given the composite nature of RDF, any single triple could be referenced or reused by another record or service; attribution does not work practically in this context.

reply

This paragraph contains a more differentiated view on open data than the previous paragraphs which touch the topic. As said in other comments, a general explanation of ‘Open Data’ (especially compared to ‘Linked Data’) is missing in the report.

reply

Is it right to call OCLC Research an “independent research group”?

reply

“Use of HTTP URIs (URLs) for the identification of library elements, vocabularies and bibliographic data…”

Perhaps the term “data” should be exchanged as it is especially important to use HTTP URIs for bibliographic _resources_ and not only for the data about them.

reply
1 0

To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page

2 0

This section comprises the following sub-sections:

4 2

The general recommendation of the report is for libraries to embrace the web of information, both in terms of making their data available for use and in terms of making use of the web of data in library services. Ideally, library data should integrate fully with web resources, creating greater visibility for libraries and bringing library services to information seekers. In engaging with the Web of linked data, libraries can take on a leadership role around traditional library values of managing resources for permanence, application of rules-based data creation, and attention to the needs of information seekers.

5 0

Assess

6 0

Identify sets of data as possible candidates for early exposure as LD

7 1

A very early step should be the identification of high-priority/low-effort linked data projects. The very nature of linked data facilitates taking an incremental approach to making a set of data available for use on the Web. Libraries are in possession of a complex data environment, and attempting to expose all of that complexity in the first steps to linked data would probably not be successful. At the same time, there are library resources that are highly conducive to being published as linked data without disrupting current library systems and services. Among these are authority files (which function as identifiers and have discrete values) and controlled lists. Identification of this low-hanging fruit will allow libraries to enter the linked data cloud soon and without having to make changes elsewhere in their workflows.

8 1

For each set of data, determine ROI of current practices, and costs and ROI of exposing as LD

9 0

There must be some measurement of the relative costs of current library data practices and the potential of Linked Data to aid in making decisions about the future of library data. There are various areas of library metadata practices that could be studied, either separately or together. Among these are:

10 1
  • The relative costs of the Record v. statement approach: for editing by humans, as full record replacement in systems, and the capabilities for sharing
  • The use of text versus identifiers also has costs: actual records must change when displays change (Cookery to Cooking); international cooperation requires extensive language mapping processes; some needed data elements must be extracted from textual fields using algorithms, which also hinders sharing; and some library data formats require catalogers to duplicate information in the record, providing both textual fields and coded data for the same information.
  • Study ways to eliminate duplication of effort in metadata creation and in service development.
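The cost difference between textual headings and identifiers noted above can be sketched in a few lines of code. This is an illustration only: the URIs are invented (the local identifier merely imitates the LCSH style), not drawn from the report.

```python
# Sketch (illustrative, not from the report): why identifiers are cheaper
# than text strings. Records storing the literal "Cookery" must all be
# edited when the heading changes; records storing a concept URI need no
# edits, because the label is resolved at display time.

# A tiny controlled vocabulary: concept URI -> preferred label
labels = {"http://example.org/concept/sh0001": "Cookery"}

# Bibliographic records refer to the concept by URI, not by text
records = [
    {"title": "The Joy of Cooking", "subject": "http://example.org/concept/sh0001"},
    {"title": "Kitchen Science",    "subject": "http://example.org/concept/sh0001"},
]

def display(record):
    """Resolve the subject URI to its current label at display time."""
    return f'{record["title"]} -- {labels[record["subject"]]}'

# The heading changes: one edit in the vocabulary, zero edits in the records.
labels["http://example.org/concept/sh0001"] = "Cooking"

for r in records:
    print(display(r))
```

The same reasoning applies to the multilingual case: one concept URI can carry labels in many languages, whereas textual headings require record-by-record mapping.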
11 0

Consider migration strategies

12 0

A full migration to Linked Data for library and cultural heritage metadata will likely be a lengthy and highly distributed effort. The existence of large stores of already standardized data, however, makes possible economies of scale if the community can coordinate its activities.

13 1

Migration plans will need to recognize that there is a difference between “publish” and “migrate”. Publishing existing data as library linked data will make limited use of linked data capabilities, because the existing underlying data formats are built on prior data concepts. In particular, existing formats lack the ability to create many of the links that one would like. Migration is likely to be a multi-step process, perhaps publishing non-LD formats as RDF while encouraging libraries to include LD-friendly data elements in current data formats (e.g. the MARC21 $0 subfield for identifiers), then adding identifiers and relationships to that RDF. In addition, the data held in today’s databases was designed to be coherent only within that database environment and does not interact with other data that might be found in the LD environment. The magnitude of this change means that it cannot be done as a single, one-time conversion; there will be many seemingly incomplete stages before the community arrives at a destination close to an idealized LD environment.
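The intermediate step described here can be sketched minimally: a MARC-style field whose $0 subfield carries an authority identifier can be published as a real RDF link rather than a text string. The URIs and the field-to-property mapping below are assumptions made for illustration, not part of any published conversion.

```python
# Hedged sketch of the intermediate migration step: a parsed MARC-like
# field becomes one RDF triple. When $0 supplies an identifier, the
# object is a URI (a real link); otherwise it falls back to a literal.
# The record URI, authority URI, and the 100 -> dcterms:creator mapping
# are all illustrative assumptions.

def field_to_triple(record_uri, field):
    """Turn a parsed MARC-like field into one (subject, predicate, object)."""
    if "0" in field["subfields"]:
        obj = f'<{field["subfields"]["0"]}>'          # URI: linkable
    else:
        obj = f'"{field["subfields"]["a"]}"'          # literal: display-only
    predicate = "<http://purl.org/dc/terms/creator>"  # illustrative mapping
    return (f"<{record_uri}>", predicate, obj)

field = {"tag": "100",
         "subfields": {"a": "Austen, Jane",
                       "0": "http://example.org/auth/jane-austen"}}
triple = field_to_triple("http://example.org/bib/123", field)
print(" ".join(triple) + " .")
```

Data converted this way is only as linkable as its $0 subfields: records lacking them fall back to literals, which is exactly why the report urges adding LD-friendly elements to current formats first.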

14 0

The length of time to perform the migration will be large because of the number of activities: emergence of best practices for LLD, creation and adoption of new software, consensus on global identifiers and deduplication strategies, and so forth. A plan must be drawn up that stages activities in ways that allow innovators to participate sooner while paving the path for the majority adopters to make use of the work later. Adoption of the plan would also reduce duplication of effort as the community moves from a self-contained, record-based approach to a worldwide graph approach for bibliographic data.

15 0

These tasks will require the cooperation of libraries and institutions in a broad coalition. The coalition will need to address some difficult questions. For example, not all institutions will be equally able to make this transition in a timely manner, but it will be important that progress not depend on the actions of a few institutions. The community must be allowed to move forward with new standards as a whole even where past practices have assigned development of standards to particular institutions.

16 0

Each of these possible paths have costs and benefits that should be studied and understood as part of the transition to linked data, taking into account the investment that libraries have in their current systems and economic factors. Concurrent with a plan to migrate data is the need for a plan to change data production processes to take advantage of linked data technologies.

17 0

Foster a discussion about open data and rights

18 3

Rights owners who are better informed of the issues associated with open data publishing will be able to make safer decisions. It makes sense for consortia with common views on the potential advantages and disadvantages of linked data to discuss rights and licensing issues and identify areas of agreement. A mixture of rights within linked data space will complicate re-use of metadata, so there is an incentive to have rights agreements on a national or international scale. For the perspective of UK higher education libraries, see the Rights and licensing section of the Open bibliographic data guide.

19 0

Facilitate

20 0

Cultivate an ethos of innovation

21 0

Small-scale, independent research and development by innovators at individual library organizations is particularly important, because small organizations have resources others don’t, such as the freedom to make independent changes iteratively and close contact with internal users and end-users. Sharing and reuse of these innovations is important, and it is particularly critical for innovators at small organizations, who may otherwise lack outlets for contact with their counterparts elsewhere. Communication of ideas and loose-knit collaboration across the community can save time and achieve common goals. Existing ad hoc communities such as Code4Lib, dev8D, and the mashedUp series provide support, networking, and information sharing for innovators. Developers and other innovators in these communities need to be further engaged and supported to grow libraries’ capacity for problem-solving and innovation.

22 1

Research and development is also advanced at library- and information-focused graduate schools, through research-oriented organizations like ASIST and the Dublin Core Metadata Initiative, and in independent research groups like OCLC Research. Connections between such research organizations and individual libraries (especially small libraries, public libraries, and special libraries) could also be fruitful, both in translating research advances more quickly into production-level implementations and in directing research attention to new problems.

23 0

CREATION OF TOOLS [PEM]

24 0

Identify Linked Data literacy needed for different staff roles in the library

25 0

The linked data environment offers a very different perspective on metadata and its applications than traditional approaches. Obtaining best value from this environment requires orientation and education for professional staff interacting with metadata applications and vendors supplying metadata support infrastructures. This should be seen as an extension to existing knowledge and expertise, rather than a replacement of it. It is particularly important that decision-makers in libraries understand the technology environment well enough to make informed decisions.

26 0

Include metadata design in library and information science education

27 0

The principles and practice of Linked Data offer a fundamental shift in the way metadata is designed. To prepare future professionals in the creation of new services, metadata design should be included in professional degree programs. Topics could include evaluation of datasets and element sets with regard to quality, provenance, and trust, and Semantic Web modeling, with special attention to simple inference patterns and the semantics of data alignment.

28 1

Increase library participation in Semantic Web standardization

29 0

If Semantic Web standards do not support the translation of library data with sufficient expressivity, the standards can be extended. For example, if Simple Knowledge Organization System (SKOS), a standard used for publishing knowledge organization systems as Linked Data, does not include mechanisms for expressing concept coordination, LLD implementers should consider devising solutions within the idiom of Linked Data, i.e. on the basis of RDF and OWL. In order to ensure that their structures will be understood by consumers of Linked Data, implementers should work in collaboration with the Semantic Web community, both to ensure that the proposed solutions are compatible with Semantic Web best practice and to maximize the applicability of their work outside the library environment. Members of the library world should contribute to standardization efforts of relevance to libraries, such as the W3C efforts to extend RDF to encompass notions of named graphs and provenance, by joining working groups and participating in public review processes.
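One possible shape such an extension could take, sketched with triples as plain tuples: a coordinated concept (e.g. a subject heading combining two concepts) is modeled as a resource of its own, linked to its component concepts. The ex:memberConcept property is a hypothetical invention for this sketch, not part of SKOS or any published vocabulary.

```python
# Hedged sketch: expressing concept coordination in the RDF idiom, since
# SKOS itself has no construct for it. Triples are (subject, predicate,
# object) tuples; ex:memberConcept is a hypothetical property.

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"

def coordinate(uri, *components):
    """Model a coordinated concept as a resource linking its components."""
    triples = [(uri, "rdf:type", SKOS + "Concept")]
    for c in components:
        triples.append((uri, EX + "memberConcept", c))
    return triples

g = coordinate(EX + "spain--history",   # the coordinated heading
               EX + "spain",            # component 1
               EX + "history")          # component 2
for s, p, o in g:
    print(s, p, o)
```

Because the coordinated heading is itself a skos:Concept, generic SKOS consumers can still display and index it even if they ignore the coordination structure; that graceful degradation is the kind of compatibility the collaboration with the Semantic Web community should verify.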

30 0

Design

31 0

Translate library data, and data standards, into forms appropriate for Linked Data

32 1

In the library environment, conformance to conceptual models and content rules has traditionally been tested at the level of metadata records, the syntactic conformance of which can be validated. As with natural language, there is more than one way to translate such models and constraints into the language of Linked Data. In an OWL ontology, for example, content rules may be expressed as semantic constraints on properties and classes, while an application profile (in the Dublin Core style) uses properties and classes, their semantics untouched, with syntactic constraints for validating metadata records. RDF data can also differentiate between natural-language labels for things and identifiers (URIs) for the underlying things themselves, a distinction relevant when translating authority data for subjects or persons, traditionally represented by text-string labels. In order to make informed choices among the design options, translators of library standards should involve Semantic Web experts who can verify whether the translations correctly convey the translators’ intentions, and they should make the results of that process available for public comment and testing before widespread implementation is undertaken.
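The application-profile option mentioned above can be illustrated concretely: the profile constrains the syntax of records (which properties are required, which are repeatable) while leaving the semantics of the properties untouched. The profile below is invented for illustration; the Dublin Core property URIs are the published ones.

```python
# Hedged sketch of a Dublin Core-style application profile: syntactic
# constraints validated at the record level. The profile itself is an
# invented example, not a real published profile.

profile = {
    "http://purl.org/dc/terms/title":   {"required": True,  "repeatable": False},
    "http://purl.org/dc/terms/creator": {"required": False, "repeatable": True},
}

def validate(record):
    """Return a list of constraint violations for one metadata record."""
    errors = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if rule["required"] and not values:
            errors.append(f"missing required property {prop}")
        if not rule["repeatable"] and len(values) > 1:
            errors.append(f"property {prop} is not repeatable")
    return errors

record = {"http://purl.org/dc/terms/creator": ["Austen, Jane", "Anonymous"]}
print(validate(record))  # one violation: the required title is absent
```

The contrast with the OWL option is worth noting: an OWL cardinality restriction on dcterms:title would change what a reasoner infers, whereas this profile merely flags the record, leaving the property’s meaning unchanged.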

33 0

Develop and disseminate best-practices design patterns tailored to LLD

34 0

Design patterns allow implementers to build on the experience of predecessors. Traditional cataloging practices are documented with a rich array of patterns and examples, and best practices are starting to be documented for the Linked Data space as a whole (e.g., http://linkeddatabook.com/editions/1.0/#htoc61). What is needed are design patterns specifically tailored to LLD requirements. These patterns will meet the needs of people and developers who rely on patterns to understand new techniques and will increase the coherence of Library Linked Data overall.

35 0

Design user stories and exemplar user interfaces

36 0

Obviously the point of library linked data is to provide new and better services to users, as well as to allow anyone to create applications and services based on library data. Because the semantic web is new, it isn’t going to be possible to predict all of the types of services that can be developed for information discovery and use, but the design of some use cases and experimental user services is necessary to test library data in this environment and to determine fruitful directions for development activities.

37 0

Identify and link

38 0

Assign unique identifiers (URIs) for all significant things in library data

39 0

There are shared things (subject headings, data elements, controlled vocabularies) that all need identifiers. The actual records in library catalogs that would be shared also need to be given identifiers, although these may be local, not global, in their range.

40 0

Create URIs for the items in library datasets

41 1

Library data cannot be used in a linked data environment if URIs for specific resources and the concepts of library standards are not available. The official owners of resource data and standards should assign URIs as soon as possible, since application developers and other users of such data will not delay their activities, but are more likely to assign URIs themselves, outside of the owning institution. To avoid proliferation of URIs for the same thing and to encourage re-use of URIs already assigned, owners who are not able to assign URIs in good time should seek partners for this work or delegate the assignment and maintenance of URIs to others.
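A minimal sketch of what URI assignment under a documented namespace policy might look like. The base URI, the path pattern, and the set of resource types are all assumptions for illustration; the point is that minting follows a published, validated pattern rather than ad hoc choices.

```python
# Illustrative sketch of minting stable HTTP URIs for library resources
# under a documented namespace pattern {BASE}/{type}/{local-id}. The
# base URI and the allowed types are assumptions, not recommendations
# from the report.

BASE = "http://example.org/id"
ALLOWED_TYPES = {"bib", "authority", "concept"}

def mint_uri(resource_type, local_id):
    """Mint a URI following the documented pattern; validate inputs first."""
    if resource_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    if not local_id.isalnum():
        # opaque alphanumeric identifiers avoid encoding and stability issues
        raise ValueError("local identifier must be opaque and alphanumeric")
    return f"{BASE}/{resource_type}/{local_id}"

print(mint_uri("bib", "b1234567"))
```

Keeping the local identifier opaque (no titles, dates, or other mutable metadata in the URI) is one common way to honor the persistence commitments discussed below.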

42 0

Some libraries or library organizations should play a leading role in curating the RDF representations of library metadata elements, including URIs, in a similar way to existing patterns of standards maintenance, where a specific organization acts on behalf of the community. Such roles should operate in a more cross-domain environment, to reflect the networking infrastructure of linked data. Agencies responsible for the creation of catalogue records and other metadata, such as national bibliographies, on behalf of national and international library communities should take a leading role in creating URIs for the resources described, as a priority over publishing entire records as linked data, to help local libraries avoid creating duplicate URIs for the same resource.

43 0

Namespace policies should be documented and published in a way that allows users of the URIs and namespace to make safe choices based on quality, stability, and persistence.

44 0

Create explicit links from library datasets to other well-used datasets

45 1

Libraries should also assign URIs for relationships between their things, and between their things and other things in LD space. Without these relationships library data remains isolated, much as it is today.

46 0

Directly use, or map to, commonly understood Linked Data vocabularies

47 0

In order to ensure linkability with other datasets in the cloud, library datasets must be described using commonly understood vocabularies. Library data is too important to be relegated to an RDF silo: an island of information described formally with RDF, but using vocabularies not familiar to less specialized Linked Data consumers. If library data is described using library-specific vocabularies, then those vocabularies should, to the extent possible, be mapped to (aligned with) well-known RDF vocabularies such as Dublin Core, FOAF, BIO, and GeoNames. Alternatively, the maintainers of library-specific vocabularies should promote the widespread use of their vocabularies in mainstream Linked Data applications.
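The alignment step can be sketched as a simple mapping table applied to triples: library-specific predicates are rewritten to well-known equivalents so generic consumers can read the data. The library-specific property URIs below are invented; the Dublin Core target URIs are the published ones.

```python
# Hedged sketch: aligning library-specific properties with well-known
# vocabularies via a mapping table. The example.org properties are
# invented stand-ins for a library-specific vocabulary; the Dublin Core
# URIs are real published terms.

ALIGNMENT = {
    "http://example.org/marc/mainEntryPersonalName":
        "http://purl.org/dc/terms/creator",
    "http://example.org/marc/titleStatement":
        "http://purl.org/dc/terms/title",
}

def align(triples):
    """Rewrite predicates through the mapping table where a target exists."""
    return [(s, ALIGNMENT.get(p, p), o) for s, p, o in triples]

data = [("http://example.org/bib/1",
         "http://example.org/marc/titleStatement",
         "Pride and Prejudice")]
print(align(data))
```

In practice such alignments are usually published declaratively (e.g. as subproperty or equivalence assertions) rather than applied destructively, so the finer-grained library semantics remain available to specialized consumers.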

48 0

Existing value vocabularies for entities such as places, concepts, events, and persons should be considered for use in LLD. Where library-specific value vocabularies are created, either by translation from legacy standards or through new initiatives, their widespread use and alignment with other vocabularies should be promoted.

49 0

Prepare

50 0

Develop best practices and design patterns for LLD

51 1

Reusable design patterns for common data needs and data structures will be necessary to achieve some homogeneity of metadata across the community. Design patterns will facilitate sharing, but will also create efficiencies for data creation by providing solutions to common problems and narrowing the range of options in development activities. These design patterns will also facilitate large-scale searching of these resources and dealing with duplication of data in the linked data space. Best-practice documentation should emerge from this development to communicate LLD patterns within the community and among the broader users of LLD.

52 0

Commit to best-practice policies for managing and preserving RDF vocabularies

53 0

Organizations and individuals who create and maintain URIs for resources and standards will benefit if they develop policies for the namespaces used to derive those URIs. Policies encourage a consistent, coherent, and stable approach, which improves effectiveness and efficiency and provides quality assurance for users of URIs and their namespaces. Policies might cover:

54 1
  • Use of patterns to design URIs, based on good practice guidelines.
  • Persistence of URIs.
  • Good practice guidelines and recipes for constructing ontologies and structured vocabularies.
  • Version control for individual URIs and the namespace itself.
  • Use of HTTP URIs (URLs) for the identification of library elements, vocabularies and bibliographic data, in keeping with the principles of the semantic web.
  • Extensibility of use of the namespace by smaller organizations.
  • Translations of labels and other annotations into other languages.
55 0

Identify tools that support the creation and use of LLD

56 0

Easy-to-use, task-appropriate tools are needed to facilitate library use of linked data. Both generic linked data tools (e.g. a URI generator which facilitates the creation of URIs) and custom domain-oriented LLD tools are required (e.g. a MARC-to-RDF converter; tools incorporating LLD-related vocabularies in easy-to-use ways). Domain-oriented tools should be based on mainstream technologies, so should be adapted from generic tools and regularly maintained so that library approaches don’t diverge too far from the mainstream. Developing appropriate tools will require identification of the necessary tasks (including programming, metadata development, metadata instance creation, and end-user browse and search), the needs and technical abilities of the tool users, as well as the resources available. Tools for non-programmers are especially important, and should adopt an appropriate technical level (i.e. that of a general computer user) and terminology (i.e. terms familiar to the user), including providing help and direction where decisions must be made.

57 0

Curate

58 0

Apply library experience in curation and long-term preservation to Linked Data datasets

59 0

Much of the content in today’s Linked Data cloud is of questionable quality: the result of ad-hoc, one-off conversions of publicly available datasets into RDF, not subject to regular accuracy checks or maintenance updates. With their ethos of quality control and long-term maintenance commitments, memory institutions have a significant opportunity to take a key role in the important (and hitherto neglected) function of linked-data curation: duties by which libraries and archives have proven their worth, extended to a new domain. By curating and maintaining Linked Data sets, memory institutions can also reap the benefits of integrating value-added contributions from other communities. Data from biographers or genealogists, for example, would immensely enrich resource descriptions in areas to which librarians traditionally do not themselves attend, greatly improving the possibilities for discovering and navigating the collections of libraries and archives.

60 0

Preserve Linked Data vocabularies

61 0

Linked Data will remain usable twenty years from now only if its URIs persist and remain resolvable to documentation of their meaning. As keys to the correct interpretation of data, both now and in the future, element and value vocabularies are particularly important as objects of preservation. The vocabularies used in Linked Data today are developed and curated by maintainers ranging from private individuals to stable institutions. Many innovative developments occur in projects of limited duration. Such a diverse ecosystem of vocabularies is in principle healthier than a semantic monoculture, but most vocabularies reside on a single Web server, representing a single point of failure, with maintainers responsible individually for ensuring that domain-name fees are paid and that URIs are not re-purposed or simply forgotten.

62 0

This situation presents libraries with an important opportunity to assume a key role in supporting the Linked Data ecosystem. By mentoring active maintainers with agreements for caching vocabularies in the present and assuming ownership when projects end, maintainers retire, or institutions close, libraries could ensure their long-term preservation, much as they have preserved successive print copies of Dewey Decimal Classification since 1873. With help from automated mirrored-cache technology (such as the LOCKSS system) libraries could, in the short term, ensure uninterrupted access to vocabularies by maintaining versioned snapshots of a vocabularys documentation at multiple sites, automatically coming online whenever a primary server should fail (see Baker and Halpin). In this role, libraries could provide protection against service outages and thus improve the robustness of Linked Data generally.