W3C incubator group on LLD — Draft Report for comment
A blog to allow the community to comment on the draft report of the W3C Linked Library Data incubator group.

Separate deliverable — LLD Vocabularies and Datasets
The LLD Vocabularies and Datasets report is only available through the LLD wiki.

Separate deliverable — Use Cases
The Use Case report is only available through the LLD wiki.

Recommendations
To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

This section comprises the following sub-sections:

  • Assess
  • Facilitate
  • Design
  • Identify and link
  • Prepare
  • Curate

The general recommendation of the report is for libraries to embrace the web of information, both in terms of making their data available for use and in terms of making use of the web of data in library services. Ideally, library data should integrate fully with web resources, creating greater visibility for libraries and bringing library services to information seekers. In engaging with the Web of linked data, libraries can take on a leadership role around traditional library values of managing resources for permanence, application of rules-based data creation, and attention to the needs of information seekers.

Assess

Identify sets of data as possible candidates for early exposure as LD

A very early step should be the identification of high priority/low effort linked data projects. The very nature of linked data facilitates taking an incremental approach to making a set of data available for use on the Web. Libraries are in possession of a complex data environment and attempting to expose all of that complexity in the first steps to linked data would probably not be successful. At the same time, there are library resources that are highly conducive to being published as linked data without disrupting current library systems and services. Among these are authority files (which function as identifiers and have discrete values) and controlled lists. Identification of these “low hanging fruits” will allow libraries to enter the linked data cloud soon and without having to make changes elsewhere in their workflows.

For each set of data, determine ROI of current practices, and costs and ROI of exposing as LD

To aid decisions about the future of library data, there must be some measurement of the relative costs of current library data practices and of the potential of Linked Data. There are various areas of library metadata practices that could be studied, either separately or together. Among these are:

  • The relative costs of the record-based versus statement-based approach: for editing by humans, for full-record replacement in systems, and for the capability to share data.
  • The costs of using text rather than identifiers: actual records must change when displays change (Cookery to Cooking); international cooperation requires extensive language-mapping processes; some needed data elements must be extracted from textual fields using algorithms, which also hinders sharing; and some library data formats require catalogers to duplicate information in the record, providing both textual fields and coded data for the same information (see the sketch after this list).
  • Ways to eliminate duplication of effort in metadata creation and in service development.
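
To make the second point concrete, here is a minimal sketch in Python using the rdflib library; the record and subject URIs are illustrative, not taken from the report. A description that points at a subject concept by URI is left untouched when the preferred label of the heading changes from Cookery to Cooking; only the single label statement attached to the concept is updated.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, SKOS

g = Graph()
book = URIRef("http://example.org/item/123")  # hypothetical record URI
heading = URIRef("http://id.loc.gov/authorities/subjects/sh12345678")  # illustrative subject URI

# Statement-based description: the subject is an identifier, not a text string.
g.add((book, DCTERMS.subject, heading))
g.add((heading, SKOS.prefLabel, Literal("Cookery", lang="en")))

# When the display form of the heading changes, only the concept's label is
# replaced; the description of the book itself does not need to be edited.
g.set((heading, SKOS.prefLabel, Literal("Cooking", lang="en")))

print(g.serialize(format="turtle"))
```

By contrast, a record that embeds the text string "Cookery" directly would itself have to be retrieved and rewritten.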

Consider migration strategies

A full migration to Linked Data for library and cultural heritage metadata will likely be a lengthy and highly distributed effort. The existence of large stores of already standardized data, however, makes possible economies of scale if the community can coordinate its activities.

Migration plans will need to recognize that there is a difference between “publish” and “migrate”. Publishing existing data as library linked data will make limited use of linked data capabilities because the existing underlying data formats are built on prior data concepts. In particular, existing formats lack the ability to create many of the links that one would like. Migration is likely to be a multi-step process, perhaps publishing non-LD formats as RDF while encouraging libraries to include LD-friendly data elements in current data formats (e.g. MARC21 $0 field for identifiers), then adding identifiers and relationships to that RDF. In addition, the data held in today’s databases was designed to be coherent only within that database environment and does not interact with other data that might be found in the LD environment. The magnitude of this change will mean that it cannot be done as a single, one-time conversion; there will be many seemingly incomplete stages before the community arrives at a destination close to an idealized LD environment.
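
As one illustration of such an intermediate step, the following sketch (Python, using the pymarc and rdflib libraries) publishes existing MARC21 bibliographic records as RDF, reusing a $0 identifier wherever the cataloguer has recorded one and falling back to a text literal otherwise. The file name, base URI, and choice of Dublin Core properties are assumptions made for the example, not recommendations of this report.

```python
from pymarc import MARCReader
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
with open("records.mrc", "rb") as fh:  # hypothetical file of MARC21 records
    for record in MARCReader(fh):
        # Mint a (possibly only locally meaningful) URI from the control number.
        work = URIRef("http://example.org/bib/" + record["001"].value().strip())

        title_field = record["245"]
        if title_field and title_field["a"]:
            g.add((work, DCTERMS.title, Literal(title_field["a"])))

        for field in record.get_fields("650"):
            authority = field["0"]  # $0: identifier supplied by the cataloguer
            if authority and authority.startswith("http"):
                g.add((work, DCTERMS.subject, URIRef(authority)))    # LD-friendly link
            elif field["a"]:
                g.add((work, DCTERMS.subject, Literal(field["a"])))  # plain-text fallback

g.serialize(destination="records.ttl", format="turtle")
```

The resulting data inherits the limitations of the underlying format, as noted above, but it can be published now and enriched with further identifiers and relationships in later passes.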

The migration will take considerable time because of the number of activities involved: emergence of best practices for LLD, creation and adoption of new software, consensus on global identifiers and deduplication strategies, and so forth. A plan must be drawn up that stages activities in ways that allow innovators to participate sooner while paving the path for the majority adopters to make use of the work later. Adoption of the plan would also reduce duplication of effort as the community moves from a self-contained, record-based approach to a worldwide graph approach for bibliographic data.

These tasks will require the cooperation of libraries and institutions in a broad coalition. The coalition will need to address some difficult questions. For example, not all institutions will be equally able to make this transition in a timely manner, but it will be important that progress not depend on the actions of a few institutions. The community must be allowed to move forward with new standards as a whole even where past practices have assigned development of standards to particular institutions.

Each of these possible paths has costs and benefits that should be studied and understood as part of the transition to linked data, taking into account the investment that libraries have in their current systems and economic factors. Concurrent with a plan to migrate data is the need for a plan to change data production processes to take advantage of linked data technologies.

Foster a discussion about open data and rights

Rights owners who are better informed of the issues associated with open data publishing will be able to make safer decisions. It makes sense for consortia with common views on the potential advantages and disadvantages of linked data to discuss rights and licensing issues and identify areas of agreement. A mixture of rights within linked data space will complicate re-use of metadata, so there is an incentive to have rights agreements on a national or international scale. For the perspective of UK higher education libraries, see the Rights and licensing section of the Open bibliographic data guide.

Facilitate

Cultivate an ethos of innovation

Small-scale, independent research and development by innovators at individual library organizations is particularly important, because small organizations have resources others don’t, such as the freedom to make independent changes iteratively and close contact with internal users and end-users. Sharing and reuse of these innovations is important, and it is particularly critical for innovators at small organizations, who may otherwise lack outlets for contact with their counterparts elsewhere. Communication of ideas and loose-knit collaboration across the community can save time and achieve common goals. Existing ad hoc communities such as Code4Lib, dev8D, and the mashedUp series provide support, networking, and information sharing for innovators. Developers and other innovators in these communities need to be further engaged and supported to grow libraries’ capacity for problem-solving and innovation.

Research and development is also advanced at library and information-focused graduate schools, and through research-oriented organizations like ASIS&T and the Dublin Core Metadata Initiative, and in independent research groups like OCLC Research. Connections between such research organizations and individual libraries (especially small libraries, public libraries, and special libraries) could also be fruitful, both in translating research advances more quickly into production-level implementations and in directing research attention to new problems.

CREATION OF TOOLS [PEM]

Identify Linked Data literacy needed for different staff roles in the library

The linked data environment offers a very different perspective on metadata and its applications than traditional approaches. Obtaining best value from this environment requires orientation and education for professional staff interacting with metadata applications and vendors supplying metadata support infrastructures. This should be seen as an extension to existing knowledge and expertise, rather than a replacement of it. It is particularly important that decision-makers in libraries understand the technology environment well enough to make informed decisions.

Include metadata design in library and information science education

The principles and practice of Linked Data offer a fundamental shift in the way metadata is designed. To prepare future professionals in the creation of new services, metadata design should be included in professional degree programs. Topics could include evaluation of datasets and element sets with regard to quality, provenance, and trust, and Semantic Web modeling, with special attention to simple inference patterns and the semantics of data alignment.

Increase library participation in Semantic Web standardization

If Semantic Web standards do not support the translation of library data with sufficient expressivity, the standard can be extended. For example, if Simple Knowledge Organization System (SKOS), a standard used for publishing knowledge organization systems as Linked Data, does not include mechanisms for expressing concept coordination, LLD implementers should consider devising solutions within the idiom of Linked Data — i.e., on the basis of RDF and OWL. In order to ensure that their structures will be understood by consumers of Linked Data, implementers should work in collaboration with the Semantic Web community both to ensure that the proposed solutions are compatible with Semantic Web best practice and to maximize the applicability of their work outside the library environment. Members of the library world should contribute to standardization efforts of relevance to libraries, such as the W3C efforts to extend RDF to encompass notions of named graphs and provenance, by joining working groups and participating in public review processes.
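
One possible shape for such an extension, sketched here with Python and rdflib, is to declare a coordination property and a subclass of skos:Concept in a local namespace; the namespace and terms are hypothetical and would need review by the Semantic Web community as recommended above.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, SKOS

EX = Namespace("http://example.org/ns#")  # hypothetical extension namespace
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# Declare the extension in the RDF/OWL idiom, anchored to SKOS.
g.add((EX.CoordinatedConcept, RDF.type, OWL.Class))
g.add((EX.CoordinatedConcept, RDFS.subClassOf, SKOS.Concept))
g.add((EX.coordinationOf, RDF.type, OWL.ObjectProperty))
g.add((EX.coordinationOf, RDFS.range, SKOS.Concept))

# A coordinated concept built from two component concepts (URIs are illustrative).
coordinated = URIRef("http://example.org/concept/children--nutrition")
g.add((coordinated, RDF.type, EX.CoordinatedConcept))
g.add((coordinated, EX.coordinationOf, URIRef("http://example.org/concept/children")))
g.add((coordinated, EX.coordinationOf, URIRef("http://example.org/concept/nutrition")))

print(g.serialize(format="turtle"))
```

Because everything is expressed with ordinary RDF and OWL constructs, generic Linked Data consumers can still process the data even if they do not understand the coordination semantics.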

Design

Translate library data, and data standards, into forms appropriate for Linked Data

In the library environment, conformance to conceptual models and content rules has traditionally been tested at the level of metadata records, the syntactic conformance of which can be validated. As with natural language, there is more than one way to “translate” such models and constraints into the language of Linked Data. In an OWL ontology, for example, content rules may be expressed as “semantic” constraints on properties and classes, while an application profile (in the Dublin Core style) “uses” properties and classes, their semantics untouched, with “syntactic” constraints for validating metadata records. RDF data can also differentiate natural-language labels for things from identifiers (URIs) for the underlying things themselves — a distinction relevant when translating authority data for subjects or persons, traditionally represented by text-string labels. In order to make informed choices between the design options, translators of library standards should involve Semantic Web experts who can verify whether the translations correctly convey the translators’ intentions, and they should make the results of that process available for public comment and testing before widespread implementation is undertaken.

Develop and disseminate best-practices design patterns tailored to LLD

Design patterns allow implementers to build on the experience of predecessors. Traditional cataloging practices are documented with a rich array of patterns and examples, and best practices are starting to be documented for the Linked Data space as a whole (e.g., http://linkeddatabook.com/editions/1.0/#htoc61). What is needed are design patterns specifically tailored to LLD requirements. These patterns will meet the needs of people and developers who rely on patterns to understand new techniques and will increase the coherence of Library Linked Data overall.

Design user stories and exemplar user interfaces

Obviously the point of library linked data is to provide new and better services to users, as well as to allow anyone to create applications and services based on library data. Because the semantic web is new, it isn’t going to be possible to predict all of the types of services that can be developed for information discovery and use, but the design of some use cases and experimental user services is necessary to test library data in this environment and to determine fruitful directions for development activities.

Identify and link

Assign unique identifiers (URIs) for all significant things in library data

There are shared things (subject headings, data elements, controlled vocabularies) that all need identifiers. The actual records in library catalogs that would be shared also need to be given identifiers, although these may be local, not global, in their “range”.

Create URIs for the items in library datasets

Library data cannot be used in a linked data environment if URIs for specific resources and the concepts of library standards are not available. The official owners of resource data and standards should assign URIs as soon as possible, since application developers and other users of such data will not delay their activities, but are more likely to assign URIs themselves, outside of the owning institution. To avoid proliferation of URIs for the same thing and to encourage re-use of URIs already assigned, owners who are not able to assign URIs in good time should seek partners for this work or delegate the assignment and maintenance of URIs to others.

Some libraries or library organizations should play a leading role in curating the RDF representations of library metadata elements, including URIs, in a similar way to existing patterns of standards maintenance, where a specific organization acts on behalf of the community. Such roles should operate in a more cross-domain environment, to reflect the networking infrastructure of linked data. Agencies responsible for the creation of catalogue records and other metadata, such as national bibliographies, on behalf of national and international library communities should take a leading role in creating URIs for the resources described, as a priority over publishing entire records as linked data, to help local libraries avoid creating duplicate URIs for the same resource.

Namespace policies should be documented and published in a way that allows users of the URIs and namespace to make safe choices based on quality, stability, and persistence.

Create explicit links from library datasets to other well-used datasets

Libraries should also assign URIs for relationships between their things, and between their things and other things in LD space. Without these relationships library data remains isolated, much as it is today.

Directly use, or map to, commonly understood Linked Data vocabularies

In order to ensure linkability with other datasets in the cloud, library datasets must be described using commonly understood vocabularies. Library data is too important to be relegated to an “RDF silo” — an island of information described formally with RDF, but using vocabularies not familiar to less specialized Linked Data consumers. If library data is described using library-specific vocabularies, then those vocabularies should, to the extent possible, be mapped to (aligned with) well-known RDF vocabularies such as Dublin Core, FOAF, BIO, and GeoNames. Alternatively, the maintainers of library-specific vocabularies should promote the widespread use of their vocabularies in mainstream Linked Data applications.
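
A hedged sketch of what such a mapping might look like, in Python with rdflib: a hypothetical library-specific vocabulary is aligned with Dublin Core and FOAF using standard RDFS alignment properties, so that consumers who only know the mainstream vocabularies can still interpret the data.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import DCTERMS, FOAF, RDFS

LIB = Namespace("http://example.org/libvocab#")  # hypothetical library-specific vocabulary
g = Graph()
g.bind("lib", LIB)
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

# Specialized terms are declared as refinements of well-known terms.
g.add((LIB.titleProper, RDFS.subPropertyOf, DCTERMS.title))
g.add((LIB.personalAuthor, RDFS.subPropertyOf, DCTERMS.creator))
g.add((LIB.Author, RDFS.subClassOf, FOAF.Person))

print(g.serialize(format="turtle"))
```

Publishing such alignments alongside the vocabulary lets a consumer apply simple RDFS inference to read lib:titleProper statements as dcterms:title statements.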

Existing value vocabularies for entities such as places, concepts, events, and persons should be considered for use in LLD. Where library-specific value vocabularies are created, either by translation from legacy standards or through new initiatives, their widespread use and alignment with other vocabularies should be promoted.

Prepare

Develop best practices and design patterns for LLD

Reusable design patterns for common data needs and data structures will be necessary to achieve some homogeneity of metadata across the community. Design patterns will facilitate sharing, but will also create efficiencies for data creation by providing solutions to common problems and narrowing the range of options in development activities. These design patterns will also facilitate large-scale searching of these resources and dealing with duplication of data in the linked data space. Best practice documentation should emerge from this development to communicate LLD patterns within the community and among the broader users of LLD.

Commit to best-practice policies for managing and preserving RDF vocabularies

Organizations and individuals who create and maintain URIs for resources and standards will benefit if they develop policies for the namespaces used to derive those URIs. Policies encourage a consistent, coherent, and stable approach, which improves effectiveness and efficiency and provides quality assurance for users of URIs and their namespaces. Policies might cover:

  • Use of patterns to design URIs, based on good practice guidelines (see the sketch after this list).
  • Persistence of URIs.
  • Good practice guidelines and recipes for constructing ontologies and structured vocabularies.
  • Version control for individual URIs and the namespace itself.
  • Use of HTTP URIs (URLs) for the identification of library elements, vocabularies and bibliographic data, in keeping with the principles of the semantic web.
  • Extensibility of use of the namespace by smaller organizations.
  • Translations of labels and other annotations into other languages.
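
The following sketch illustrates the first point, assuming a namespace policy that fixes an http URI pattern for authority resources; the base URI, the allowed schemes, and the identifier syntax are all invented for the example.

```python
import re

# Assumed policy: http://example.org/authority/{scheme}/{identifier},
# where scheme is "names" or "subjects" and identifiers are lowercase alphanumerics.
URI_PATTERN = re.compile(r"^http://example\.org/authority/(names|subjects)/[a-z0-9]+$")

def mint_uri(scheme: str, local_id: str) -> str:
    """Mint a URI and check it against the documented namespace policy."""
    uri = f"http://example.org/authority/{scheme}/{local_id.lower()}"
    if not URI_PATTERN.match(uri):
        raise ValueError(f"URI does not conform to the namespace policy: {uri}")
    return uri

print(mint_uri("subjects", "sh12345678"))
```

Making the pattern explicit and machine-checkable in this way helps users of the namespace make the safe choices mentioned earlier.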

Identify tools that support the creation and use of LLD

Easy-to-use, task-appropriate tools are needed to facilitate library use of linked data. Both generic linked data tools (e.g. a URI generator which facilitates the creation of URIs) and custom domain-oriented LLD tools are required (e.g. a MARC-to-RDF converter; tools incorporating LLD-related vocabularies in easy-to-use ways). Domain-oriented tools should be based on mainstream technologies, so should be adapted from generic tools and regularly maintained so that library approaches don’t diverge too far from the mainstream. Developing appropriate tools will require identification of the necessary tasks (including programming, metadata development, metadata instance creation, and end-user browse and search), the needs and technical abilities of the tool users, as well as the resources available. Tools for non-programmers are especially important, and should adopt an appropriate technical level (i.e. that of a general computer user) and terminology (i.e. terms familiar to the user), including providing help and direction where decisions must be made.

Curate

Apply library experience in curation and long-term preservation to Linked Data datasets

Much of the content in today’s Linked Data cloud is of questionable quality — the result of ad-hoc, one-off conversions of publicly available datasets into RDF and not subject to regular accuracy checks or maintenance updates. With their ethos of quality control and long-term maintenance commitments, memory institutions have a significant opportunity to take a key role in the important (and hitherto ignored) function of linked-data curation — duties by which libraries and archives have proven their worth, extended to a new domain. By curating and maintaining Linked Data sets, memory institutions can reap the benefits of integrating value-added contributions from other communities. Data from biographers or genealogists, for example, would immensely enrich resource descriptions in areas to which librarians traditionally do not themselves attend, greatly improving the possibilities for discovering and navigating the collections of libraries and archives.

Preserve Linked Data vocabularies

Linked Data will remain usable twenty years from now only if its URIs persist and remain resolvable to documentation of their meaning. As keys to the correct interpretation of data, both now and in the future, element and value vocabularies are particularly important as objects of preservation. The vocabularies used in Linked Data today are developed and curated by maintainers ranging from private individuals to stable institutions. Many innovative developments occur in projects of limited duration. Such a diverse ecosystem of vocabularies is in principle healthier than a semantic monoculture, but most vocabularies reside on a single Web server, representing a single point of failure, with maintainers responsible individually for ensuring that domain-name fees are paid and that URIs are not re-purposed or simply forgotten.

This situation presents libraries with an important opportunity to assume a key role in supporting the Linked Data ecosystem. By mentoring active maintainers with agreements for caching vocabularies in the present and assuming ownership when projects end, maintainers retire, or institutions close, libraries could ensure their long-term preservation, much as they have preserved successive print copies of Dewey Decimal Classification since 1873. With help from automated mirrored-cache technology (such as the LOCKSS system) libraries could, in the short term, ensure uninterrupted access to vocabularies by maintaining versioned snapshots of a vocabulary’s documentation at multiple sites, automatically coming online whenever a primary server should fail (see Baker and Halpin). In this role, libraries could provide protection against service outages and thus improve the robustness of Linked Data generally.

Implementation challenges and barriers to adoption
To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

This section comprises the following sub-sections:

Designed for stability, the library ecosystem resists change

As stable and reliable archives with long-term goals, cultural heritage organizations — particularly libraries — are predisposed to traditionalism and conservation. This emphasis on larger goals has led libraries to fall out of step with the faster-moving technology culture of the past few decades. When most information was in print format, libraries were at the forefront of information organization and retrieval. With the introduction of machine-readable catalogs in the 1960s, libraries were early adopters of the computer, though primarily for automating the production of printed catalogs of print materials. As the volume of information in digital format has overtaken print, libraries have struggled both to maintain their function as long-term archives and to extend their missions to include digital information. Decreased budgets for libraries and their parent institutions have greatly hindered libraries’ ability to create competitive information services.

Cooperative metadata creation is economical but creates barriers to change

Libraries take advantage of cooperative agreements allowing them to share resources, as well as metadata describing those resources. These cooperative efforts are both a strength and a weakness: while shared data creation has economic benefit, changes to shared data require coordination among the sharing parties.

Consequently, major changes require a strong agent to coordinate the effort. In most countries, the national library provides this type of leadership. Changes that transcend the borders of any single country — such as adopting data standards like FRBR or moving to linked library data — require a broad leadership that can take into account the many local needs of the international library community.

Library Data is shareable among libraries, but not yet with the wider world

Linked Data reaches a diverse community far broader than the library community; moving to library Linked Data requires libraries to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries to interact with them as fully as they interact with other libraries today. This will be a huge cultural change that must be addressed.

Libraries are understaffed in the technology area

As libraries have not kept pace with technological change, they also have not provided sufficient educational opportunities for staff. Training within libraries is limited in some countries, and workers are not encouraged to seek training on their own. Technological changes have taken place so quickly that many in library positions today began their careers long before the World Wide Web was a reality, and these workers may not fully understand the import of these changes. Libraries struggle to maintain their technological presence and are often under-staffed in key areas of technology.

  • In-house software developers. An informal survey of Code4Lib participants suggests that there are few software developers in libraries. Although the developers are embedded in library operations, coding is often a small part of their duties. Staff developers tend to be closely bound to working with systems from off-the-shelf software providers. These developers are for the most part maintaining existing systems and do not have much time to explore new technology paradigms and new software systems. They are dependent on a shrinking number of off-the-shelf providers as market players have consolidated over the past two decades (see Marshall Breeding’s History of Library Automation).
  • Library workers. Software development skills, including metadata modeling, have often not been a strong part of a library worker’s education. Libraries have in essence out-sourced their technology development to a few organizations in the community and to the library systems vendors. These vendors understand library functionality and data, but they need an expectation of development-costs recovery before beginning work on new products.
  • Library leaders. There are many individual Linked Data projects coming out of libraries and related institutions, but no obvious emerging leaders. IFLA has been a thought-leader in this area, but there is still a need to use their work to provide functional systems and software. Many national libraries have an interest in exploring LLD and some have ongoing projects. LLD will be international in scope, and this increases the amount of coordination that will be needed. Because of its strong community ties, however, leadership from within can be expected to have a dramatic effect on the community’s ability to move in the direction of Linked Data.

Library technology has largely been implemented by a small set of vendors

Much of the technical expertise in the library community is concentrated in the small number of vendors who provide the systems and software that run library management functions as well as the user discovery service. These vendor systems hold the bibliographic data integrated into library management functions like acquisitions, receipt of materials, user data, and circulation. Other technical expertise exists primarily in large academic libraries where development of independent discovery systems for local materials is not uncommon. These latter systems are more likely to use mainstream technologies for data creation and management, but they do not represent the primary holdings of the library.

Libraries do not adapt well to technological change

Technology has continually evolved since computers were first used for library operations in the 1960s. However, the library community tends to engage only with established technologies that have brought proven benefits to their operations and services. The Linked Data approach is relatively new, with enabling technologies and best practices being developed outside of mainstream library applications. Experimentation with Linked Data in the library community has been limited in part due to lack of developer tools for LD in general but also because there are no tools that specifically address library data. It can be difficult to demonstrate the value of LD to librarians because the few examples of implementations that do exist use unfamiliar data and interfaces.

The long-term view by libraries applies also to standards

While both library and Web communities value preservation and endurance (or permanence) of information, the timescales differ: library preservation is measured in generations and centuries (if not millennia) while Web-native information might be considered old at two decades. Ensuring this long-term life of information promotes a conservative outlook for library organizations, which is in contrast to the mainstream perspective of the Web community, which values novelty and experimentation over preservation of the past.

Therefore it is not surprising that the library standardization process is slower than comparable Web standards development. Current developments towards a new metadata environment can be traced back more than ten years: the basic groundwork for a shift to a new data format was laid in 1998 with the development of the Functional Requirements for Bibliographic Records (FRBR), which provides an entity-relation view of library catalog data. That model is the basis for a new set of cataloguing rules, Resource Description and Access (RDA), which, although finalized in 2010, is still under review before implementation. RDA is a standard of four Anglo-American library communities and has not had international acceptance, although it is being studied widely. LLD standards associated with RDA are still in the process of development. Through a joint working group with DCMI, the Joint Steering Committee for RDA approved an RDF implementation of the properties and value vocabularies of RDA. These have not yet been moved to production status and are not integrated with the primary documentation and cataloguer tools in the RDA Toolkit.

Library standardization process is cumbersome

A further difference is that Web-related organizations focus on implementations, often hammering out differences with practical examples, and leaving edge cases for later work. This is in contrast to the library standardization approach: standards such as FRBR and RDA have been created as documents, without the use of test cases, prototype implementations, and iterative development methodologies that characterize modern IT approaches. Library standards have a strong “top-down” direction, and major standards efforts are undertaken by national or international bodies. Development of an international standard takes years, and that development cannot keep up with the increasingly fast pace of technological change. Development cycles are often locked into face-to-face business meetings of the parent organization or group to comply with formal approval procedures. As a result, standards may be technologically out-of-date as soon as they are published.

Bottom-up standards can be successful but garner little recognition

On the Web, bottom-up development is common for all but the largest and most-used standards (e.g. HTML5), yet such development often does not get proper recognition from the library community. Even so, some bottom-up initiatives have led to successful standards adopted by the library community, including OpenURL, METS, OAI, and Dublin Core. LLD will require funding and will need institutional support (though it isn’t clear where funding and support will come from), but it will also require an environment where the bottom-up developers can flourish.

Library standards are limited to library data

While the Web values global interchange between all parties, library cataloguing standards in the past have aimed to address only the exchange of data within the library community where the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted. There is fear that library data will need to be “dumbed down” in order to interact with other communities; few see the possibility of “smarting up” bibliographic data using library-produced information.

ROI is difficult to calculate

Some cost issues are known but are unmeasured

It is admittedly difficult to calculate or estimate costs and benefits in a publicly funded service environment. This makes it particularly difficult to create concrete justifications for large-scale changes of the magnitude required for adopting Linked Data in libraries. While there is a general recognition of distinct disadvantages to the silo’d library data practices, no measurement exists that would compare the resources required to create and manage current library data compared to linked library data. (Note: there are some studies on the cost of cataloging, but they do not separately study costs related to data technology: Library of Congress Study of the North American MARC Records Marketplace, R2 Consulting LLC, Ruth Fischer, Rick Lugg, October 2009, and Implications of MARC Tag Usage on Library Metadata Practices, OCLC, March 2010.)

“MARC data cannot continue to exist in its own discrete environment, separate from the rest of the information universe. It will need to be leveraged and used in other domains to reach users in their own networked environments. The 200 or so MARC 21 fields in use must be mapped to simpler schema.” Smith-Yoshimura, et al., Implications of MARC Tag Usage on Library Metadata Practices. www.oclc.org/research/publications/library/2010/2010-06.pdf

Library-specific data formats require niche systems solutions

It is possible, however, to observe the consequences of library data practices. Libraries use data technology specific to libraries and library systems. They are therefore dependent on niche software systems tailored to formats that nobody uses outside of the library world. Because the formats used in libraries (notably MARC) are unique to libraries, vendors of library systems cannot use mainstream data modeling systems, programmer tools, and database software to build library systems. Development of library systems also requires personnel specifically trained in library data. This makes it expensive to provide systems for the library community. The common practice of commissioning a new, customized system in every library — every 5 to 10 years — is very expensive; the aggregate cost to the library community has not been reliably estimated.

Vocabulary changes in library data are costly

Controlled vocabularies will play an important role in linked data in general, and although controlled vocabularies are used in library data (in particular for names of persons and organizations, and for subjects) they are not managed in a manner to facilitate linked data: changes to vocabularies require that all related records be retrieved and changed; this is a disruptive process, made even more expensive because the library metadata record, being designed primarily as a communication format, requires a full record replace for updates to any of its fields.

Data may have rights issues that prevent open publication

For a perspective from Europe, see Free library data? by Raymond Bérard.

Some data cannot be published openly

Data related to user identity and use of the library is protected by privacy policies and legislation. Other data, such as that related to purchasing and contracts, is not included in our analysis.

Rights ownership can be unmanageably complex

Some library bibliographic data has unclear and untested rights issues that can hinder the release of open data. Ownership of legacy catalogue records has been complicated by data sharing among libraries over the past 50 years. The records most shared are those created by national cataloguing agencies such as the Library of Congress in the USA and the British Library in the UK. Records are frequently copied and the copies are modified or enhanced for local catalogue users. These records may be subsequently re-aggregated into the catalogues of regional, national, and international consortia. Assigning legally sound intellectual property rights between relevant agents and agencies is difficult, and the lack of certainty is a hindrance to data sharing in a community which is necessarily extremely cautious on legal matters such as censorship and data privacy/protection.

Rights have perceived value

On the other hand, some bibliographic data may never have been shared with another party, so rights may be exclusively held by creating agencies, who put a value on past, present and future investment in creating, maintaining, and collecting metadata. Larger agencies are likely to treat records as assets in their business plans, and may be reluctant to publish them as open LD, or may be willing to release them only in a stripped- or dumbed-down form with loss of semantic detail. For example, data about specific types of title such as preferred title and parallel title might be output as a general title, losing the detail required for a formal citation of the resource.

Library data is expressed in library-specific formats that cannot be easily shared outside the library community

Library data is expressed primarily as text strings, not “linkable” URIs

Most information in library data is encoded as display-oriented text strings. There are a few shared identifiers for resources, such as ISBNs for books, but most identification is done with text strings. Some coded data fields are used in MARC records, but there is not a clear incentive to include these in all records, since most coded data fields are not used in library system functions. Some data fields, such as authority controlled names and subjects, do have their own associated records in separate files, which have identifiers that could be used to represent those entities in library metadata. However, the data formats currently used do not support the inclusion of these identifiers in existing library records and consequently neither do current library systems support their use.

Some library data is being expressed in RDF on an experimental basis, but without standardization or best practices

Work has begun to express library data in RDF. Some libraries have experimented with publishing LD from their catalogue records although no standard or best practice has yet emerged. There has been progress in defining value vocabularies currently used in libraries. Transformation of legacy data will require more than the mapping of attributes to RDF properties; where possible, library data should be transformed from text to data with identified values. New approaches for library data, such as the FRBR model which informs RDA, offer an opportunity for incorporating linked data principles into future library data practices, particularly when these new standards are implemented.
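
One way to perform the text-to-identified-values step mentioned above is reconciliation against an authority file. The sketch below (Python, rdflib) replaces creator text strings with URIs where a match is found in a toy lookup table; in practice the table would be an authority file or a reconciliation service, and the URI shown is illustrative.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

# Toy reconciliation table mapping heading strings to authority URIs.
HEADING_TO_URI = {
    "Austen, Jane, 1775-1817": URIRef("http://viaf.org/viaf/102333412"),  # illustrative
}

def identify_creators(g: Graph) -> None:
    """Replace creator text strings with identified values where a match exists."""
    for work, _, name in list(g.triples((None, DCTERMS.creator, None))):
        if isinstance(name, Literal) and str(name) in HEADING_TO_URI:
            g.remove((work, DCTERMS.creator, name))
            g.add((work, DCTERMS.creator, HEADING_TO_URI[str(name)]))
```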

The library community and the Semantic Web community have no shared terminology for metadata concepts

Work on LLD can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community. Few in libraries would use a term like “statement” for metadata, and the Web community does not have concepts equivalent to libraries’ “headings” or “authority control.” Each community has its own vocabulary and these reflect the differences in their points of view. Mutual understanding must be fostered as both groups bring important expertise to the potential web of data.

Library data must be conceptualized according to the Graph Paradigm

Translators of legacy library standards into Linked Data must recognize that Semantic Web technologies are not merely variants of practices but represent a fundamentally different way to conceptualize and interpret data. Since the introduction of MARC formats in the 1960s, digital data in libraries has been managed predominantly in the form of “records” — bounded sets of information described in documents with a precisely specified structure — in accordance with what may be called a Record Paradigm. The Semantic Web and Linked Data, in contrast, are based on a Graph Paradigm. In graphs, information is conceptualized as a boundless “web” of links between resources — in visual terms as sets of nodes connected by arcs (or “edges”), and in semantic terms as sets of “statements” consisting of subjects and objects connected by predicates. The three-part statements of Linked Data, or “triples”, are expressed in the language of the Resource Description Framework (RDF). In the Graph Paradigm, the “statement” is an atomic unit of meaning that stands on its own and can be combined with statements from many different sources to create new graphs — a notion ideally suited for the task of integrating information from multiple sources into recombinant graphs.

Under the Record Paradigm, a data architect can specify with precision the form and expected content of a data record, which can be “validated” for completeness and accuracy. Data sharing among libraries has been based largely on the standardization of fixed record formats, and the consistency of that data has been ensured by adherence to well-defined content rules. Under the Graph Paradigm, in contrast, data is conceptualized according to significantly different assumptions. According to the so-called “open-world assumption”, any data at hand may, in principle, be incomplete. It is assumed that data may be supplemented by incorporating information from other, possibly unanticipated, sources, and that information can be added without invalidating information already present.

The notion of “constraints” takes on significantly different meanings under these two paradigms. Under the Record Paradigm, if the format schema for a metadata record says that the description of a book can have only one subject heading and a description with two subject headings is encountered, a validator will report an error in the record. Under the Graph Paradigm, if an OWL ontology says that a book has only one subject heading, and a description with two subject headings (URIs) is encountered, an OWL reasoner will infer that the two subject-heading URIs identify the same subject.
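
The difference can be made concrete with a small sketch in Python, using rdflib together with the owlrl reasoner; the vocabulary and data URIs are hypothetical. Where a record validator would report an error, the OWL RL rules instead infer an owl:sameAs statement between the two subject-heading URIs.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF
import owlrl

EX = Namespace("http://example.org/ns#")  # hypothetical vocabulary and data
g = Graph()

# The ontology states that a book has at most one subject heading.
g.add((EX.subjectHeading, RDF.type, OWL.FunctionalProperty))

# A description arrives with two subject-heading URIs for the same book.
g.add((EX.book1, EX.subjectHeading, EX.cooking))
g.add((EX.book1, EX.subjectHeading, EX.nutrition))

# An OWL RL reasoner concludes that the two URIs must name the same thing,
# rather than flagging the data as invalid.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
print((EX.cooking, OWL.sameAs, EX.nutrition) in g)  # True
```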

As will be discussed below, the two paradigms may be seen as complementary. The traditional “closed-world” approach is good for flagging data that is inconsistent with the structure of a metadata record as a document, while OWL ontologies are good for flagging logical inconsistencies with respect to a conceptualization of things in the world. The differences between these two approaches mean that the process of translating library standards and datasets into Linked Data cannot be undertaken mechanically, but requires intellectual effort and modeling skill. Translators, in other words, must acquire some fluency in the language of RDF.

Relevant Technologies
To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

Linked Data is an emerging technology, so most tools are still developmental. Fortunately, the principles of Linked Data are not tied to any particular tool; rather, they are tied to Web standards themselves. In many situations, production and consumption of Linked Data can be layered or interwoven with existing applications without the need for massive redevelopment efforts. The following examples are not exhaustive, but are intended to illustrate a few broad categories. From a non-technical perspective, these technologies are relevant because they support the creation and use of HTTP URIs that identify and describe discrete and recognizable individuals.

Discrete and bulk access to information

The Semantic Web has been around many years, but Linked Data gives it a major boost in the form of “Cool URIs”. Linked Data http URIs are “Cool” because raw RDF can be easily and automatically negotiated and rendered into an HTML format for human (browser) consumption. The DBpedia resource http://dbpedia.org/resource/Jane_Austen is a good example. This is great for diagnosing data and serendipitous discovery, but the atomic nature of Linked Data http URIs makes it impractical for high-volume network access. Fortunately, more and more Linked Datasets are being published in bulk and consistently described using the VoID Vocabulary.
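
A small sketch of this negotiation, assuming the DBpedia resource URI cited above remains reachable and honours an RDF Accept header as described: the same "Cool URI" yields HTML in a browser and Turtle for a client that asks for it.

```python
import requests
from rdflib import Graph

uri = "http://dbpedia.org/resource/Jane_Austen"

# Ask for a machine-readable representation; a browser sending Accept: text/html
# would instead be redirected to a human-readable page for the same resource.
resp = requests.get(uri, headers={"Accept": "text/turtle"})

g = Graph()
g.parse(data=resp.text, format="turtle")
print(len(g), "triples retrieved about", uri)
```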

Linked Data front-ends to existing data stores

Unlike information represented hierarchically in typical XML documents, resources published as Linked Data allow information to be freed from use-case-specific hierarchies and thus made available for unexpected reuse. This not only makes the information easier to mash up, it also makes tools and services easier to mash up. This is true for both producers and consumers of Linked Data. For example, an existing relational database can be mounted as Linked Data and SPARQL by using D2R Server. Similarly, Linked Data can be produced from existing SRU databases with a few rewrite rules. If the information is already available from a SPARQL endpoint, then a Linked Data front-end like Pubby can be used to provide dereferenceable URIs.
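
Once data is exposed through a SPARQL endpoint (whether natively or via a front-end such as D2R Server), clients can query it with a few lines of code. The sketch below uses the SPARQLWrapper library against the public DBpedia endpoint; the endpoint, property, and resource chosen are illustrative rather than library-specific.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?work WHERE {
        ?work <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Jane_Austen> .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["work"]["value"])
```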

Tools for data designers

Another boost for Linked Data is the growing use of OWL for purposes of data design. Prior to OWL, domain experts could use RDFS to create domain-specific vocabularies, but there was no way to map equivalencies across vocabularies. Among other features, OWL builds on RDFS to support ontology mapping. This allows experts to describe their domain using community idioms, while still being interoperable with related or more common idioms. A variety of tools related to OWL can be found on the W3C’s RDF wiki and OWL wiki.

SKOS and related tools

Yet another key technology boost is being provided by SKOS, which is an OWL ontology for dealing with a broad base of conceptual schemes including the management of preferred and alternate labels. Many SKOS-related tools are listed on the W3C’s SKOS community wiki.

Microformats, Microdata and RDFa

Microformats, Microdata and RDFa all provide ways to embed structured data into web pages. Since publishing information on the web has historically meant publishing web pages, these technologies provide ways to enhance what is already there rather than necessarily deploying separate infrastructure. RDFa supports expression of RDF data in this way and is therefore the most directly interoperable with other linked data infrastructure.

Microdata, which is defined as part of the new HTML5 specification, provides another way of doing this. It has notably gained prominence for Search Engine Optimisation purposes with the announcement of http://schema.org/ by Google, Microsoft and Yahoo. This particular type of microdata does not appear to be intended to represent arbitrarily complex data, and the vocabulary that they have published places special emphasis on commerce and tourism. Though it is in principle extensible, it would require a lot of extension to express library information in this way, as most of the required vocabulary is lacking. There is some level of interoperability with linked data thanks to the efforts at http://schema.rdfs.org/, but at this time it seems like it would be difficult to cultivate the high level of interconnectedness between library and other datasets that is possible with linked data using this approach.

It should be noted that the http://schema.org/ protagonists do support harvesting of RDFa data and have pledged to continue doing so, therefore it does not appear to be the case that by publishing HTML pages marked up with RDFa one might somehow “miss out” on the opportunities afforded by microdata. Modulo bugs in the search engines’ parsers it is even possible to do both in the same web page. If for some reason it is not possible to make use of the full expressive power of RDF with RDFa, some structured data is better than none.

Web Application Frameworks

As the Web has grown in popularity, the software development community has created a variety of software libraries that make it easier to create, maintain and reuse web applications. These libraries are often referred to as web application frameworks, and typically implement the Model-View-Controller (MVC) pattern in some fashion. In addition web application frameworks have typically encoded and encouraged best practices with respect to the REST Architectural Style and Resource Oriented Architecture which have informed much of the standardization around web technologies.

A common component to web application frameworks is a URI routing mechanism, which allows software developers to define http URI patterns, and map them to controllers, which in turn generate an HTTP response using the appropriate views and models. This activity encourages best practices with respect to Cool URIs, and also forces the developer to think about the resources that she is making available on the Web. Linked Data’s focus on naming resources with http URIs, and delivering representations of them (HTML for humans, and RDF for machines) makes it a natural fit for web application frameworks which already provide some of the scaffolding for these activities. The wide availability of web application frameworks in many different programming languages and operating system environments has led to them being heavily used in the cultural heritage sector.

However, web developers are sometimes turned off by Semantic Web (Linked Data) technologies because they feel they would need to throw away their current application, to swap their database for a triplestore, and their database query language for SPARQL. This is simply not the case, since RDF serializations can be generated on the fly just as web application frameworks do for HTML, XML and JSON representations. The use of http URIs to identify and link together resources in RDF’s data model makes it a natural choice for serializing and sharing entity state in a database-neutral way, which has traditionally been of great interest to cultural heritage organizations and the digital preservation community.
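
A minimal sketch of this pattern, using the Flask web framework and rdflib; the route, base URI, and placeholder data are assumptions for the example, and a real application would populate the graph from its existing database rather than hard-coding a title.

```python
from flask import Flask, Response, request
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

app = Flask(__name__)
BASE = "http://example.org"  # hypothetical base URI for the service

@app.route("/resource/<item_id>")
def resource(item_id):
    subject = URIRef(f"{BASE}/resource/{item_id}")
    g = Graph()
    g.add((subject, DCTERMS.title, Literal("A placeholder title")))  # would come from the database

    # One URI, two representations: RDF for machines, HTML for humans.
    best = request.accept_mimetypes.best_match(["text/turtle", "text/html"])
    if best == "text/turtle":
        return Response(g.serialize(format="turtle"), mimetype="text/turtle")
    rows = "".join(f"<li>{p}: {o}</li>" for _, p, o in g)
    return f"<h1>{subject}</h1><ul>{rows}</ul>"

if __name__ == "__main__":
    app.run()
```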

Content Management Systems

Just as web application frameworks have evolved as the Web has spread, so has the class of web applications known as content management systems (CMS). CMS are often built using a web application framework, but provide out-of-the-box functionality for easily creating/editing/presenting content (text, images, video) on the Web, and for managing workflows associated with the content. Since CMS are typically built using web frameworks, the same best practices for naming resources with http URIs are naturally followed. The wide availability of content management systems has led to heavy use in the cultural heritage sector. Some content management systems such as Drupal are starting to expose structured database information to machine clients by seamlessly layering it into their HTML using RDFa. As a result, data consumers such as Google Scholar, Google Maps, Facebook, etc. are starting to leverage this structured metadata in their own service offerings. Conversely, Drupal is also starting to make plugins available to consume RDF, such as VARQL and SPARQL Views.

Web Services for Library Linked Data

Theoretically, most domain-specific Web Service API capabilities could be refactored as Linked Data URIs, OWL, SPARQL, and SPARQL/Update. But even though it should be possible to layer a Linked Data URI front-end on an existing back-end datastore, it may not be so easy for the back-end to support SPARQL and SPARQL/Update access. Security, robustness and performance considerations could also preclude supporting SPARQL in production situations. Furthermore, SPARQL endpoints and bulk RDF downloads can greatly facilitate discovery and reuse of the published Linked Data. Most web developers, however, face a steep learning curve before being able to exploit SPARQL, and for many application requirements this is too much of a burden.

Web Services for the most common uses should be offered as an alternative. Most Web Service APIs tend to be domain-specific, though, and require custom-coded agents. This means they should be well-documented. More general approaches to web service interfaces include OpenSearch (which can be documented using a Description Document), the Linked Data API and ongoing work of the W3C RDF Web Applications Working Group on RDF and RDFa APIs. Some Linked Datasets could also benefit from syndicated access using Atom Syndication Format and/or RSS.

A few Linked Data implementations have endeavored to implement Web Services to enhance discovery and use of resources, often by providing some form of an application programming interface (API). Agrovoc and STW provide an API to discover resources based on relationships in the data, among many other web services. VIAF, Library of Congress, and STW offer autosuggest services for resources, delivering JSON responses ready for consumption in AJAX browser applications (in principle, though, JSON could be content-negotiable via the Linked Data URI, just like HTML and RDF). Agrovoc and STITCH/CATCH include support for RDF responses. Some services provide full-fledged SOAP APIs, while others support a RESTful approach.
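
For example, an autosuggest service of this kind can be called with an ordinary HTTP request and returns JSON directly usable by a browser application. The URL pattern below reflects the VIAF AutoSuggest service as deployed at the time of this report and may have changed; the query string is, of course, just an example.

```python
import requests

resp = requests.get(
    "http://viaf.org/viaf/AutoSuggest",
    params={"query": "austen, jane"},
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
print(resp.json())  # suggestion list ready for consumption in an AJAX application
```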

By focusing on request parameters and response formats to provide enhanced discovery, Linked Data Web Services diminish, if not eliminate, the requirement that data be stored in a triplestore or be made searchable via SPARQL. And, because web service APIs are common, web services can lower the barrier to entry.

Available Vocabularies and Datasets
To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

The success of linked library data relies on the ability of its practitioners to identify, re-use or connect to existing datasets and data models. Linked datasets and vocabularies that are essential in the library and related domains, however, have previously been unknown or unfamiliar to many.

An inventory of existing library linked data resources

The complexity and variety of available vocabularies, overlapping coverage, derivative relationships and alignments, all result in layers of uncertainty for re-use or connection efforts. Therefore, a current and reliable bird’s eye view is essential for both novices seeking an overview of the library linked data domain and experts needing a quick look-up or refresher for a library linked data project.

The LLD XG thus prepared a side deliverable that identifies a set of useful resources for creating or consuming linked data in the library domain. These are classified into three main groups, which are not mutually exclusive, as shown in our side deliverable: metadata element sets, value vocabularies, and datasets.

  • Metadata element sets: A metadata element set is a namespace that contains terms used to describe entities. In the linked data paradigm, such element sets are materialized through (RDF) schemas or (OWL) ontologies, with “RDF vocabulary” occasionally being used as an umbrella term. It may help to think of metadata element sets as defining the model, as distinct from the instance data (which falls into the value vocabulary or dataset categories below). Some examples:
    • Dublin Core defines elements such as Creator and Date (but DC does not define bibliographic records that use those elements).
    • FRBR defines entities such as Work and Manifestation and elements that link and describe them.
    • MARC21 defines elements (fields) to describe bibliographic records and authorities.
    • FOAF and ORG define elements to describe people and organisations, as might be used for describing authors and publishers.
  • Value vocabularies: A value vocabulary can be thought of as a specialized dataset that focuses on the management of discrete value/label literals for use in metadata records and/or user displays. Value vocabularies commonly focus on specific areas such as topic labels, art styles, author names, etc. They are not typically used to manage complex bibliographic resources such as books, but they are appropriate for related components, such as personal names, languages, countries, and codes. These act as “building blocks” with which more complex metadata record structures can be built. Many libraries require specific value vocabularies for use in particular metadata elements; a value vocabulary thus represents a “controlled list” of allowed values for an element. Broad categories of value vocabularies include: thesaurus, code list, term list, classification scheme, subject heading list, taxonomy, authority file, digital gazetteer, concept scheme, and other types of knowledge organisation systems. Note, however, that value vocabularies often have HTTP URIs assigned to the label/value, which can be used in a metadata record instead of or in addition to the literal value. Some examples:
    • LCSH defines topics of books
    • The Art and Architecture Thesaurus defines, among other things, art styles.
    • VIAF defines authorities
    • GeoNames defines geographical locations (e.g. cities).
  • Datasets: A dataset is a collection of structured metadata (also known as instance data): descriptions of things, such as books in a library. Library records consist of statements about things, where each statement consists of an element (an “attribute” or “relationship”) of the entity and a “value” for that element. The elements used are often selected from a set of standard elements, such as Dublin Core. The values for the elements are either taken from value vocabularies such as LCSH or are free-text values. Similar notions to “dataset” include “collection” or “metadata record set”. Note that in the Linked Data context, datasets do not necessarily consist of clearly identifiable “records”. Some examples:
    • a record from a dataset for a given book could have a Subject element drawn from Dublin Core and a value for Subject drawn from LCSH (a minimal sketch of such a record follows this list).
    • the same dataset may contain records for authors as first-class entities that are linked from their book, described with elements like “name” from FOAF.
    • a dataset may be self-describing, in that it contains information about itself as a distinct entity, for example with modified-date and maintainer/curator elements drawn from Dublin Core.
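To make the distinction concrete, here is a minimal sketch, using the Python rdflib library, of a single "record" from such a dataset: the elements come from Dublin Core and FOAF, while the subject value points to an LCSH-style URI. All specific URIs below are placeholders chosen for illustration.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, FOAF

# Placeholder URIs for the book, its author, and an LCSH-style subject heading.
book = URIRef("http://example.org/books/moby-dick")
author = URIRef("http://example.org/people/herman-melville")
subject = URIRef("http://id.loc.gov/authorities/subjects/shXXXXXXX")  # illustrative pattern only

g = Graph()
g.add((book, DCTERMS.title, Literal("Moby Dick")))      # free-text value
g.add((book, DCTERMS.subject, subject))                 # value drawn from a value vocabulary
g.add((book, DCTERMS.creator, author))                  # author as a first-class entity
g.add((author, FOAF.name, Literal("Herman Melville")))

print(g.serialize(format="turtle"))
```

Serialized as Turtle, the same graph shows how element sets (dcterms:, foaf:) and a value vocabulary (the subject URI) come together in one dataset.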

Instances of these categories are listed in the side deliverable along with a brief introduction, a basic description, and links to their locations. For metadata element sets and value vocabularies, use cases collected by the LLD XG are listed under each entry, which provides clear context for their usage. For the available metadata element sets, namespaces and descriptions of their domain coverage are briefly presented. Two visualizations are also provided to help reveal the inter-relations of metadata element sets and the relationships between datasets and value vocabularies registered in CKAN.

Our side deliverable aims at broad coverage for each of these categories. However, we are well aware that our report cannot capture the entire diversity of what is out there, especially given the dynamic nature of linked data: new resources are continuously made available, and existing ones are regularly updated. To get a representative overview, we intentionally grounded our work in the use cases that our group has gathered from the community. Additional coverage has been added by the experts who participated in the LLD XG to ensure that the most visible resources available at the time of writing have not been forgotten. Finally, to help make our report useful in the longer run, we have included a number of links to tools and web spaces which we believe can help a reader get a more continuously updated snapshot after this incubator group has ended its work. Notably, we have set up a “Library Linked Data” group in the CKAN repository to gather information on relevant library linked datasets. We hope to actively maintain this CKAN group, but for the sake of long-term success the entire community is invited to contribute.

Some observations

Coverage

The coverage of available metadata element sets and value vocabularies is encouraging. Many such resources have been released over the past couple of years, including some flagship value vocabularies already used by many libraries, such as the Library of Congress Subject Headings or the Dewey Decimal Classification. Reference metadata frameworks are also provided in a linked-data-compatible form, including Dublin Core and various FRBR implementations.

The main concern regarding coverage is the relatively low availability of bibliographic datasets. Descriptions of individual books and other library-held items are somewhat less critical than metadata element sets and value vocabularies when re-use comes into play; indeed, tools like union catalogues already realize a significant level of exchange of book-level data. Yet it remains crucial, and it is truly one of the expected benefits of linked data applied to our domain, that library-related datasets get published and interconnected rather than continuing to exist in their own silos.

Quality and support for available sources

The level of maturity and stability of available resources varies greatly. Many resources we found are the result of (ongoing) project work or of individual initiatives, and advertise themselves as mere prototypes. The abundance of such efforts is a sign of healthy activity in the library linked data domain. In fact, it should come as no surprise, given that the whole linked data endeavor encourages a much more agile view of data than any previous paradigm. Yet it does put the long-term availability of, and support for, library linked data resources at risk.

From this perspective, we find it encouraging that more and more established institutions are committing resources to linked data projects, from the national libraries of Sweden and Hungary, to the Food and Agriculture Organization of the United Nations, not to mention the Library of Congress or OCLC.

Linking

Establishing connections across various datasets is a core aspect of linked data technology, and a key condition of its success. Many semantic links across value vocabularies are already available, some of them obtained through high-quality manual work, as in the MACS or CRISSCROSS projects. Many value vocabulary publishers also clearly strive to establish and maintain links to resources close to their own. VIAF, for example, merges authority records from over a dozen national and regional agencies. And although quantitative evaluation was outside the scope of our effort, we hypothesize that many more such links are possible. Consumers of library linked data should be aware of the open-world assumption that characterizes it: data cannot generally be assumed to be complete, and more data could always be released for any given entity.

A similar concern can be voiced regarding metadata element sets. As attested by the LOV inventory, practitioners generally follow the good practice of re-using existing element sets or building “application profiles” of them, but the lack of long-term support for these element sets threatens their enduring meaning and common understanding. Further, some reference frameworks, notably FRBR, have been implemented in different RDF vocabularies that are not always connected to each other. Such a situation lowers the semantic interoperability of the datasets expressed using these vocabularies. Here, we hope that better communication between the creators and maintainers of these resources, as encouraged by our own incubator group and the LOD-LAM initiative, will help consolidate the conceptual connections between them.

At the level of datasets, one may observe the same phenomenon as for the previous categories: explicit cross-links are still relatively sparse. We note, however, that efforts are being undertaken (Open Library, for example, has started attaching OCLC numbers to its manifestations) and that the community is already well aware of challenges such as “de-duplication”.

We also observe that links are being built between library-originated resources and resources originating in other organizations or domains, DBpedia being an obvious case. Again, VIAF provides an example, taking its merged authority records and linking them to DBpedia whenever possible. This illustrates one of the expected benefits of linked data: data can easily be networked, irrespective of its origins. The library domain can thus benefit from re-using data from other fields, while library data can itself contribute to initiatives that do not strictly fall within the library scope. In the same vein, LLD efforts could benefit from the availability of generic tools for linking data, such as the Silk – Link Discovery Framework, Google Refine, or the Google Refine Reconciliation Service API. However, the community needs to gain experience in using them, sharing linking results, and possibly building further tools better suited to the LLD environment.
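As a toy illustration of what such linking tools automate, the sketch below matches person names across two rdflib graphs and emits owl:sameAs links. Real frameworks such as Silk use far more sophisticated matching heuristics, and nothing here reflects Silk's actual API.

```python
from rdflib import Graph
from rdflib.namespace import FOAF, OWL

def discover_links(library_graph: Graph, external_graph: Graph) -> Graph:
    """Emit owl:sameAs links for resources whose foaf:name values match exactly."""
    links = Graph()
    # Index the external dataset by normalized name.
    external_by_name = {
        str(name).strip().lower(): resource
        for resource, name in external_graph.subject_objects(FOAF.name)
    }
    # Look for library resources carrying the same name.
    for resource, name in library_graph.subject_objects(FOAF.name):
        match = external_by_name.get(str(name).strip().lower())
        if match is not None:
            links.add((resource, OWL.sameAs, match))
    return links
```

In practice one would combine several properties (names, dates, identifiers) and a similarity measure rather than relying on exact string matching.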

]]>
http://blogs.ukoln.ac.uk/w3clld/2011/06/26/available-vocabularies-and-datasets/feed/ 16
Benefits http://blogs.ukoln.ac.uk/w3clld/2011/06/26/benefits/?utm_source=rss&utm_medium=rss&utm_campaign=benefits http://blogs.ukoln.ac.uk/w3clld/2011/06/26/benefits/#comments Sun, 26 Jun 2011 15:04:27 +0000 Antoine Isaac http://blogs.ukoln.ac.uk/w3clld/?p=27 @@ To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page

“Library Linked Data”: Scope of this report

The scope of this report — “library linked data” — can be understood as follows:

Library. The word “library” (analogously to “archive” and “museum”) refers to three distinct but related concepts: a collection, a place where the collection is located, and an agent which curates the collection and administers the location. Collections may be public or private, large or small, and are not limited to any particular types of resources.

Library data. “Library data” refers to any type of digital information produced or curated by libraries that describes resources or aids their discovery. Data used primarily for library-management purposes is generally out of scope. As discussed in more detail below, this report pragmatically distinguishes three types of library data based on their typical use: datasets, element sets, and value vocabularies.

Linked Data. “Linked Data” (LD) refers to data published in accordance with principles designed to facilitate linkages among datasets, element sets, and value vocabularies. Linked Data uses Web addresses (URIs) as globally unique identifiers for dataset items, elements, and value concepts, analogously to the library world’s identifiers for authority control. Linked Data defines relationships between things; these relationships can be used for navigating between, or integrating, complementary sources of information.

Library Linked Data. “Library Linked Data” (LLD) is any type of library data that is either natively maintained, or merely exposed, in the form of RDF triples, thus facilitating linking.

Benefits of the Linked Data approach

The Linked Data approach offers significant advantages over current practices for creating and delivering library data, while providing a natural extension to the collaborative sharing models historically employed by libraries, archives, and museums (“memory institutions”). Linked data is sharable, extensible, and easily re-usable. It supports internationalization of data and user services. These characteristics are inherent in the linked data standards and are supported by the use of web-friendly identifiers for data and concepts. Resources can be described in collaboration with other libraries and linked to data contributed by other communities or even individuals. Like the linking that takes place today between web documents, linked data allows anyone to contribute their unique expertise in a form that can be reused and recombined with the expertise of others. The use of identifiers ensures that the diverse descriptions are all talking about the same thing. Through rich linkages with complementary data from trusted sources, libraries can increase the value of their own data beyond the sum of its sources taken individually, much as in the story of the stone soup, where hungry travellers boiling a pot of stones attracted enough curiosity from the locals, and enough small contributions of herbs and carrots, to create a nourishing meal.

By using globally unique identifiers to designate works, places, people, events, subjects, and other objects or concepts of interest, memory institutions can make trusted metadata descriptions available for common use, allowing resources to be cited across a broad range of data sources. An important aspect of the identifier system is its use of the Domain Name System of the Web. This assures stability and trust in a regulated and well-understood ownership and maintenance context. This is fully compatible with the long-term mandate of memory institutions. Libraries, and memory institutions generally, are thus in a unique position to provide the metadata for resources of long-term cultural importance as data on the Web.

Library authority data for names and subjects will help reduce redundancy of bibliographic descriptions on the Web by clearly identifying key entities that are shared across linked data. This will also aid in the reduction of redundancy of metadata representing library holdings.

Benefits to Researchers, Students and Patrons

Users of library and cultural institution services may not be immediately aware that linked data is being employed. Although the changes will occur “under the hood,” the underlying structured data will become more richly linked, and the user experience will provide greater discovery and use capabilities. The resulting webs of data will enable more sophisticated discovery and navigation across library and non-library information resources. Links can be used to expand indexes far more easily than today’s federated searching allows, and can offer users a nearly unlimited number of pathways for browsing.

Library users should be comfortable with the basic concepts of linked data, since it uses HTTP, the Web’s standard retrieval protocol. Applications may allow users to “follow their nose” (i.e., resolve trails of URI links) to the data itself. Once retrieved, the recombinational nature of RDF will allow information seekers to extract the parts of the data they need and understand, re-mix them as required, or even add their own annotations as contributions to the global graph. These capabilities meet expectations for an interactive user experience, such as is found in social networking applications.
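The following sketch shows what “following one’s nose” can look like in code: a placeholder Linked Data URI is dereferenced with an Accept header asking for Turtle, and any owl:sameAs links found in the response are printed so they could be followed in turn. The URI is hypothetical, and the server is assumed to support content negotiation.

```python
import requests
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

def fetch_graph(uri: str) -> Graph:
    """Dereference a Linked Data URI, asking for Turtle via content negotiation."""
    response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=10)
    response.raise_for_status()
    return Graph().parse(data=response.text, format="turtle")

start = "http://example.org/authority/jane-austen"  # placeholder Linked Data URI
g = fetch_graph(start)

# Follow owl:sameAs links to complementary descriptions elsewhere on the Web.
for _, _, other in g.triples((URIRef(start), OWL.sameAs, None)):
    print("Also described at:", other)
```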

Relationships to and from non-library services such as Wikipedia, GeoNames, and MusicBrainz will help connect local collections into the larger universe of information available on the Web. The rise of semantics in HTML, which plays a role in the crawling and relevancy algorithms of Google, Google Scholar, and Facebook, will provide a way for libraries to enhance their visibility through search engine optimization (SEO), allowing resources to be discovered from the Websites that information seekers use routinely. Citation management can become as simple as cutting and pasting URLs. Embedding structured data in Web pages, for example as RDFa markup in HTML, will also facilitate re-use of library data in services to information seekers. Automating the retrieval of citations from linked data, or creating links from Web resources to library resources, will mean that library data is fully integrated into research documents and bibliographies.

Benefits to Organizations

By promoting a bottom-up approach to publishing data, Linked Data creates an opportunity for cultural organizations (including libraries) to improve the value proposition of describing their assets.

The technology itself can help organizations improve their internal data curation processes, maintain better links between, for instance, digitized objects and their descriptions, and improve data publishing processes within the organization, even in contexts where not all of the data is open. Cultural organizations will be able to use mainstream technologies to manage their data. (Today’s library technology is built around library-specific data formats, which has led to a dedicated Integrated Library Systems industry.) Library system vendors will benefit from the adoption of mainstream technology, as it will give them an opportunity to broaden their user base.

Linked Data may be a first step toward an “in the cloud” approach to managing cultural information — one which will be more cost-effective than individual systems in institutions. This approach will make it possible for small institutions or individual projects to be visible and connected, with reduced infrastructure costs.

Moreover, in an open data context, these institutions will gain greater visibility on the Web, which is where most information seekers may be found. The focus on identifiers allows descriptions to be tailored to specific communities such as museums, archives, galleries, and audiovisual archives. The openness of data is more an opportunity than a threat. One benefit may be a clarification of the licensing of descriptive metadata towards openness, facilitating the re-use and sharing of data and improving institutional visibility. Data thus exposed will be put to unexpected uses, as in the adage: “The best thing to do to your data will be thought of by somebody else.”

Benefits to Librarians, Archivists and Curators

The benefits to Patrons and Organizations will also have a direct impact on library and memory-institution professionals. By using Linked Data, memory institutions will create an open, global pool of shared data that can be used and re-used to describe resources, with a limited amount of redundant effort compared with current cataloguing processes.

The use of the Web and Web-based identifiers will make resources immediately available and up to date for cataloguers to re-use. They will be able to pull together descriptions for resources outside their domain environment, from across all cultural heritage datasets and even the web at large. They will be able to concentrate their effort on their domain of local expertise, rather than re-creating descriptions that have already been elaborated by others.

Benefits to Developers

Linked Data methods support the retrieval and re-mixing of data in a way that is consistent across all metadata providers. Instead of requiring data to be accessed via library-centric protocols such as Z39.50, linked data uses well-known standard Web protocols like HTTP. Developers will also no longer have to work with library-specific data formats such as MARC and EAD, which require custom software tools and applications. Linked Data methods involve pushing data onto the Web in a form that is understandable to Web applications. By leveraging RDF and HTTP, library developers are freed from the need to use domain-specific software and gain access to a growing range of generic tools, many of which are open source. They will thus find it much easier to build new services on top of their data. This also opens up a much larger developer community able to provide support to IT professionals in libraries. In a sea of RDF triples, no developer is an island.

]]>
http://blogs.ukoln.ac.uk/w3clld/2011/06/26/benefits/feed/ 21
Welcome – W3C LLD Draft report for comment http://blogs.ukoln.ac.uk/w3clld/2011/06/16/hello-world/?utm_source=rss&utm_medium=rss&utm_campaign=hello-world http://blogs.ukoln.ac.uk/w3clld/2011/06/16/hello-world/#comments Thu, 16 Jun 2011 14:46:43 +0000 Monica Duke http://blogs.ukoln.ac.uk/w3clld/2011/06/16/hello-world/ This blog is intended as a means to invite comments on the draft report of the W3C  Library Linked Data Incubator Group. The report contains the following sections:

Feel free to use the post for each section to leave specific comments on it. Comments can be posted to our public mailing list (public-lld@w3.org) using descriptive subject lines such as ‘Subject: [COMMENTS] “Benefits” – section on “Benefits to Developers”‘.

To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

In addition, the W3C LLD group will release two separate deliverables:

Comments are very welcome on these documents as well!

]]>
http://blogs.ukoln.ac.uk/w3clld/2011/06/16/hello-world/feed/ 1