Implementation challenges and barriers to adoption « W3C incubator group on LLD

6/28/2011

While I appreciate that this section is part of a larger one about barriers to adoption, I do feel that the heading is overly critical. I think it would be fairer to say that libraries are no longer early adopters of new technology for the parts of their service that they consider business-critical, partly because of the issues of retro-conversion of the collections they already hold and partly because they are service providers who need to ensure that the service continues to run.

Catherine Jones

6/28/2011

While I agree that cataloguing standards were designed to exchange data between libraries, I’m not sure that I would agree that bib. exchange with publishers is new and not accepted. While libraries may not use individual publishers, it is common practice to get bib records from your book supplier. This also doesn’t address journals: our institutional repository uses CrossRef to look up the DOIs of journal articles to enhance the information recorded about the publication in the IR.

laura k

7/18/2011

Perhaps this could be broadened to note that we share metadata within our supply chain, for lack of a better phrase (publishers, indexing and abstracting services, etc.), but not frequently with organizations outside the traditional information world.

Catherine Jones

6/28/2011

This is true, but one could say this of most disciplines; for example, scientific instruments producing data in a certain format need specialist, niche systems solutions. What is special about library systems in particular?

Karen

6/28/2011

I actually think that the picture is brighter than this. Although libraries haven’t been technology leaders, they have embraced new technologies to the benefit of their communities, providing free Internet access, lending ebooks (even before they were popular). This is separate from the struggle to manage the flood of digital content.

There is another issue, which is that managing digital content might be better done on a scale that is larger than any one library, while managing physical items is suitable to local institutions. There are tens or hundreds of thousands of libraries, many very small. Digital materials need to be managed globally, not locally, and there is no global library organization to do this.

Laura Smart

6/28/2011

Catherine is correct that this is true in other disciplines. I question, however, to what extent it is true of library systems. Database work is database work no matter how the element sets are structured. I think most commercial systems probably use ER modeling/diagramming when creating their systems; those systems are often built on a commercial DB (e.g. an Innovative ILS implementation can be Oracle-based) or an open source DB (MySQL), and who really knows what type of programming tools and paradigms are being used behind the proprietary wall. I think vendors like VTLS and Ex Libris are probably using agile development techniques.

Laura Smart

6/28/2011

“the library metadata record, being designed primarily as a communication format, requires a full record replace for updates to any of its fields.” Not true. It’s possible to overlay specific fields while doing global updates in an ILS. It is true that it is costly, however.
The process of propagating the need to do updates is what’s expensive. When LC changes a subject term or the NAF changes an authority heading, the news of the change has to be spread, and then local databases have to do the global updates. This introduces a time delay in addition to the monetary cost. The process can be automated and/or outsourced, but it still has its price.

laura k

7/18/2011

This is costly when metadata needs to be changed in many local records across thousands of libraries; if metadata were in a centralized database and linked to by library records, however, the vocabulary changes would only need to occur in one place, thus saving costs in the long run.

Laura Smart

6/28/2011

I think the web community does have a concept which is equivalent (or at least quasi-equivalent) to library headings or authority control. It’s the concept of “unique identifiers.” It is true that the communities don’t share a common language or vocabulary, but I don’t think it’s true that they don’t have concepts in common.

karencoyle

6/29/2011

This has been criticized as being not only negative but not really true. Is there another way to say this? I think it is mainly about libraries having trouble being on the leading edge and generally having a hard time changing.

karencoyle

6/29/2011

The paragraph doesn’t really match the heading. The paragraph talks about the lack of library-related LD tools. Maybe that is an issue on its own?

karencoyle

6/29/2011

I think it would be worthwhile to talk separately about the issue of having iterative standards with proof-of-concept development, and the issue of the time lag imposed by the meeting cycles.

I also think somewhere we should mention the variety of standards fora — IFLA, national fora, NISO, and the recent awareness of non-library fora, like W3C.

karencoyle

6/29/2011

I’m not sure this is true as stated. I think that the issue is that local development efforts mainly spread through vendor adoption, and that means that a local development must have widespread utility to be adopted. The issue isn’t so much bottom-up development as developments that are of interest only to a niche market that vendors cannot economically support.

karencoyle

6/29/2011

It may be useful, however, to say that there is some valuable information to be gleaned from these private areas, like overall circulation statistics for individual titles. Scrubbing the data of any personally identifiable information adds cost to these projects. Privacy is essential and should not be compromised, but it has an impact on projects.

karencoyle

6/29/2011

I agree with Laura that, if we investigate fully, we may find that we have more in common than it appears on the surface. An advantage to that investigation would be that it would require us to clarify our data goals in new terms; we might learn something from the exercise.

Roy Tennant

7/17/2011

That libraries “do not adapt well to technological change” is debatable, and largely orthogonal in any event to whether libraries are using linked data. The crux of the problem to me is the old “chicken and egg” problem. Libraries won’t use linked data until/unless it solves a need. Right now it doesn’t, or at least we lack the tools to make linked data effective in a library environment. Frankly, I don’t see any killer apps out there in any industry, which inhibits adoption in any industry, and even more so in libraries, which are organizations of limited resources.

laura k

7/18/2011

I think linked data solves a fairly large need which tends to be overlooked: Making library metadata interoperable with the rest of the web, and with other networked information. I don’t necessarily think this is a situation where an application is going to pop up that makes people see its usefulness, but one where these ideas need to be taken into consideration when we think about how to reconstruct bibliographic metadata. How do we implement these ideas as we re-work (or get rid of) MARC?

Jody DeRidder

7/18/2011

There appears to be a cultural prejudice against software developers in at least some segments of the library culture. Such work is seen as tasks for underlings, and hence the pay scale for programmers in libraries cannot compete with the commercial market. Those in higher ranks in the library are often expected to set aside programming work and not “get their hands dirty.” Programmers may not get respect from librarians, particularly if the programmers do not also have librarian degrees. Continued cutbacks to library funding also reduce the ability to hire decent programmers. These issues combine to keep many libraries in the backwaters of technological development.

Jennifer Bowen

7/20/2011

An important point. We have discovered this first-hand working on the eXtensible Catalog – programmer salaries in libraries cannot compete within the marketplace. This deserves mention in the report.

Jody DeRidder

7/18/2011

Difficult, but perhaps not impossible. A test implementation could be used as a basis for feedback and user testing. Measurement should be made of experienced researchers as well as undergraduate students, in comparison with the same site unmodified. If indeed we can measure improved research capabilities, speed, and discovery, we will have built a case for expanding this effort.

laura k

7/18/2011

I don’t think this heading is entirely clear. Does it mean that standards should also be considered as something that should last a long time? That standards take a long time to develop? That we need to start thinking about how to preserve digital objects and web-based objects with new standards?

Jennifer Bowen

7/20/2011

My general impression of this section is that it fairly accurately describes the difficulties that libraries may face in adopting linked data (although some statements may be a bit too sweeping at times; I agree with some of the other comments below). However, my concern is that these challenges and barriers are presented without any attempt to suggest possible solutions to them. Since there are indeed many challenges to the library community, the recommendations presented later in the report do not seem to be adequately justified by the content of the report. In other words, the way the report reads right now, the challenges may outweigh the benefits. I do not believe that is what the report intends to convey, and that is not what it SHOULD convey. I recommend that some of the sections below include at least a brief discussion of possible steps to mitigate these challenges and barriers; otherwise the whole situation just begins to seem pretty hopeless. Making the benefits section at the beginning more compelling will also help considerably. I will add other suggestions below where additions could be made.

Jennifer Bowen

7/20/2011

One of the conclusions that I would make from this is that libraries can derive great benefit from linked data to try to address this situation of decreased budgets and inability to extend their missions to include digital information. Libraries have a great NEED for linked data, and this paragraph explains why.

Jennifer Bowen

7/20/2011

Are there any signs that this is beginning to change, where libraries are beginning to interact with these other communities? Cite some examples here? How about the mere existence of this incubator group?

Jennifer Bowen

7/20/2011

The sentences on library workers do not follow logically one from another. I would like to see this paragraph suggest possible ways to change the way that library workers are educated, or provide continuing education in linked data. Much has been happening in that arena over the past year or so.

Jennifer Bowen

7/21/2011

More should be made here (or somewhere in the report) about how the strong cooperative culture that is now present in the library community can be an asset for implementing linked data: use of common vocabularies and standards, consistency of metadata, structures in place to mobilize community action toward a shared goal…

Jennifer Bowen

7/21/2011

This section deals with libraries being understaffed technologically, so the section on library leaders should probably address the problems that library leaders have in employing technology staff. The points that are made here about libraries taking leadership in LLD should perhaps go in a separate section. I suggest that this also include discussion of how some library organizations are now exploring what actions to take regarding LLD (ALA and the Program for Cooperative Cataloging are two examples) and that what is needed is advice and leadership from outside the library community, to enable library leaders to know what specific steps need to be taken and to make informed decisions. That process is already beginning.

Jennifer Bowen

7/21/2011

If the statement about the library community only engaging with established technologies is allowed to stand, then there needs to be some explanation of WHY that is the case.

Jennifer Bowen

7/21/2011

The statement that there are “no tools that specifically address library data” is a bit strong. I suggest at least a mention of emerging tools, such as the eXtensible Catalog, which will help to make library data “linked data ready”, if not (currently) to create true linked data yet.

karencoyle

7/23/2011

Jennifer, I would gladly add XC but (and I just checked) there is no documentation that demonstrates that it produces LD. The only documentation that I can find talks about MARC and FRBR, but there is nothing on the record format or serialization. That information has to be public and open before a service can be included in the report. Your ontology needs to be open access on the Web in RDF format. If it is, please give a pointer.

Jennifer Bowen

7/21/2011

This paragraph could use some clarification. Who are the “few” in the last sentence? People within the library community? I assume the bibliographic data that needs “smarting up” is meant to be data from outside the library community that would be enriched with data from the library community? This is not clear.

Adrian

7/28/2011

Jennifer Bowen

7/21/2011

I would like to see an acknowledgment of other measures of success for linked data other than those that can be calculated, in particular the ability of libraries to meet the needs of their users. The success of this can best be studied using other methods, such as participatory design, as described in the recent book, “Scholarly Practice, Participatory Design and the eXtensible Catalog” http://www.alastore.ala.org/detail.aspx?ID=3408.

karencoyle

7/23/2011

For non-profits and other service organizations, ROI includes intangible benefits like “making society better.” The non-profit management literature addresses this. So we should assume ROI to include those “less tangibles.”

Jennifer Bowen

7/21/2011

But there are ways to address this issue, by providing tools that enable a smooth migration process for libraries to begin using linked data while continuing to use these niche systems. What is needed are cost-effective strategies for moving libraries forward.

Jennifer Bowen

7/21/2011

On the one hand this can be seen as a barrier for libraries to participate in linked data. On the other hand it represents an area where linked data could be a huge improvement for libraries in terms of managing such changes using a different infrastructure (registries, etc.)

Jennifer Bowen

7/21/2011

It seems to me that more detail is needed here about the issues with data sharing and the history of cooperative cataloging using centralized databases. This is so brief that it seems to be skirting around the issue. Perhaps just an additional sentence or two.

Jennifer Bowen

7/21/2011

Some work has been done to try to change this situation, and some progress has been made (e.g. MARC 21 subfield zero). It seems misleading to me not to include some mention of efforts to get around these limitations.

karencoyle

7/23/2011

Jennifer, can you give examples? I’m not sure what you’re referring to.

Jennifer Bowen

7/21/2011

Discussed where below? This paragraph is very intriguing and deserves more attention. It seems related to the migration strategies in the Recommendations. Coming up with a plan for making these two paradigms coexist will be extremely important for the success of LLD.

Alan Danskin

7/22/2011

This whole section has a rather negative tone. Libraries are aware of the need for change. Linked data is one of the directions that change might take, if the benefits can be demonstrated, but as the section makes clear the challenges are considerable.

Alan Danskin

7/22/2011

“While the Web values global interchange between all parties, library cataloguing standards in the past have aimed to address only the exchange of data within the library community where the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted.”

This is not a new issue. Libraries and publishers have different business models, which are reflected in their development of different standards for exchange. Publishers think of publications as products; libraries are concerned with inventory of their collections and the content of publications. The granularity of open linked data may provide an opportunity for a fresh look at what could be shared for mutual benefit. However publishers, as well as librarians, may regard metadata as a commodity to be restricted.

Adrian

7/28/2011

“the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted”

I suggest adding scholars to the brackets as an example of communities with which data exchange and interlinking would be very fruitful for academic libraries.

Adrian

8/2/2011

I think this paragraph has to be fundamentally changed or even omitted. It implicitly argues that individual records are copyrighted. There is much to suggest that individual records aren’t copyrighted at all and that, thus, nobody owns any rights in them. At least in Europe, there is only the related database right on collections of records.

I believe the legal status of records is quite clear (not copyrighted), at most this is a grey area. The report shouldn’t speak in favour of the view that individual records are copyrightable.

Adrian

8/2/2011

The heading seems to assume that linked data necessarily means open data. This isn’t the case: you can publish data as RDF without an open license or without any license at all (as several organisations do), you can even do linked data in an intranet, and you can publish linkable data, let it be linked to, and then establish a paywall around the data.

In general, the report lacks a clarification regarding the terms “Linked Data” vs. “Open Data”.

I suggest adding a paragraph or section to the report which clarifies the two terms, “open data” being about open access, open standards and open licenses in the first place and “linked data” being about a specific set of standards or best practices for publishing data on the web recommended by the W3C. An important aspect of open data is legal compatibility of data while linked data deals with technical compatibility of data.

Johan Oomen

8/10/2011

Please elaborate on the “particularly libraries” statement.

Also “have greatly hindered libraries ability to create competitive information services.” => add a reference

Johan Oomen

8/10/2011

“cooperative agreements ” between whom?

Johan Oomen

8/10/2011

“Technological changes have taken place so quickly that many in library positions today began their careers long before the World Wide Web was a reality, and these workers may not fully understand the import of these changes.” => what an extremely bold statement.

Implementation challenges and barriers to adoption

To read the most up-to-date version of this section, in the context of the entire report, please see our wiki page.

This section comprises the following sub-sections:

Designed for stability, the library ecosystem resists change

As stable and reliable archives with long-term goals, cultural heritage organizations, particularly libraries, are predisposed to traditionalism and conservation. This emphasis on larger goals has led libraries to fall out of step with the faster-moving technology culture of the past few decades. When most information was in print format, libraries were at the forefront of information organization and retrieval. With the introduction of machine-readable catalogs in the 1960s, libraries were early adopters of the computer, though primarily for automating the production of printed catalogs of print materials. As the volume of information in digital format has overtaken print, libraries have struggled both to maintain their function as long-term archives and to extend their missions to include digital information. Decreased budgets for libraries and their parent institutions have greatly hindered libraries’ ability to create competitive information services.

Cooperative metadata creation is economical but creates barriers to change

Libraries take advantage of cooperative agreements allowing them to share resources, as well as metadata describing those resources. These cooperative efforts are both a strength and a weakness: while shared data creation has economic benefit, changes to shared data require coordination among the sharing parties.

Consequently, major changes require a strong agent to coordinate the effort. In most countries, the national library provides this type of leadership. Changes that transcend the borders of any single country, such as adopting data standards like FRBR or moving to linked library data, require broad leadership that can take into account the many local needs of the international library community.

Library data is shareable among libraries, but not yet with the wider world

Linked Data reaches a diverse community far broader than the library community; moving to library Linked Data requires libraries to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries to interact with them as fully as they interact with other libraries today. This will be a huge cultural change that must be addressed.

Libraries are understaffed in the technology area

As libraries have not kept pace with technological change, they also have not provided sufficient educational opportunities for staff. Training within libraries is limited in some countries, and workers are not encouraged to seek training on their own. Technological changes have taken place so quickly that many in library positions today began their careers long before the World Wide Web was a reality, and these workers may not fully understand the import of these changes. Libraries struggle to maintain their technological presence and are often under-staffed in key areas of technology.

In-house software developers. An informal survey of Code4Lib participants suggests that there are few software developers in libraries. Although the developers are embedded in library operations, coding is often a small part of their duties. Staff developers tend to be closely bound to working with systems from off-the-shelf software providers. These developers are for the most part maintaining existing systems and do not have much time to explore new technology paradigms and new software systems. They are dependent on a shrinking number of off-the-shelf providers as market players have consolidated over the past two decades (see Marshall Breeding’s History of Library Automation).
Library workers. Software development skills, including metadata modeling, have often not been a strong part of a library worker’s education. Libraries have in essence out-sourced their technology development to a few organizations in the community and to the library systems vendors. These vendors understand library functionality and data, but they need an expectation of development-costs recovery before beginning work on new products.
Library leaders. There are many individual Linked Data projects coming out of libraries and related institutions, but no obvious emerging leaders. IFLA has been a thought-leader in this area, but there is still a need to use their work to provide functional systems and software. Many national libraries have an interest in exploring LLD and some have ongoing projects. LLD will be international in scope, and this increases the amount of coordination that will be needed. Because of its strong community ties, however, leadership from within can be expected to have a dramatic effect on the community’s ability to move in the direction of Linked Data.

Library technology has largely been implemented by a small set of vendors

Much of the technical expertise in the library community is concentrated in the small number of vendors who provide the systems and software that run library management functions as well as the user discovery service. These vendor systems hold the bibliographic data integrated into library management functions like acquisitions, receipt of materials, user data, and circulation. Other technical expertise exists primarily in large academic libraries where development of independent discovery systems for local materials is not uncommon. These latter systems are more likely to use mainstream technologies for data creation and management, but they do not represent the primary holdings of the library.

Libraries do not adapt well to technological change

Technology has continually evolved since computers were first used for library operations in the 1960s. However, the library community tends to engage only with established technologies that have brought proven benefits to their operations and services. The Linked Data approach is relatively new, with enabling technologies and best practices being developed outside of mainstream library applications. Experimentation with Linked Data in the library community has been limited in part due to lack of developer tools for LD in general but also because there are no tools that specifically address library data. It can be difficult to demonstrate the value of LD to librarians because the few examples of implementations that do exist use unfamiliar data and interfaces.

The long-term view by libraries applies also to standards

While both library and Web communities value preservation and endurance (or permanence) of information, the timescales differ: library preservation is measured in generations and centuries (if not millennia) while Web-native information might be considered old at two decades. Ensuring this long-term life of information promotes a conservative outlook for library organizations, which is in contrast to the mainstream perspective of the Web community, which values novelty and experimentation over preservation of the past.

Therefore it is not surprising that the library standardization process is slower than comparable Web standards development. Current developments towards a new metadata environment can be traced back more than ten years: the basic groundwork for a shift to a new data format was laid in 1998 with the development of the Functional Requirements for Bibliographic Records (FRBR), which provides an entity-relationship view of library catalog data. That model is the basis for a new set of cataloguing rules, Resource Description and Access (RDA), which, although finalized in 2010, are still under review before implementation. RDA is a standard of four Anglo-American library communities, and has not had international acceptance, although it is being studied widely. LLD standards associated with RDA are still in the process of development. Through a joint working group with DCMI, the Joint Steering Committee for RDA approved an RDF implementation of the properties and value vocabularies of RDA. These have not yet been moved to production status and are not integrated with the primary documentation and cataloguer tools in the RDA Toolkit.

Library standardization process is cumbersome

A further difference is that Web-related organizations focus on implementations, often hammering out differences with practical examples, and leaving edge cases for later work. This is in contrast to the library standardization approach: standards such as FRBR and RDA have been created as documents, without the use of test cases, prototype implementations, and iterative development methodologies that characterize modern IT approaches. Library standards have a strong top-down direction, and major standards efforts are undertaken by national or international bodies. Development of an international standard takes years, and that development cannot keep up with the increasingly fast pace of technological change. Development cycles are often locked into face-to-face business meetings of the parent organization or group to comply with formal approval procedures. As a result, standards may be technologically out-of-date as soon as they are published.

Bottom-up standards can be successful but garner little recognition

While bottom-up development is common on the Web for all but the largest and most-used standards (e.g. HTML5), it often does not get proper recognition from the library community. Even so, some bottom-up initiatives have led to successful standards adopted by the library community, including OpenURL, METS, OAI, and Dublin Core. LLD will require funding and institutional support (though it isn’t clear where that funding and support will come from), but it will also require an environment where bottom-up developers can flourish.

Library standards are limited to library data

While the Web values global interchange between all parties, library cataloguing standards in the past have aimed to address only the exchange of data within the library community where the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted. There is fear that library data will need to be dumbed down in order to interact with other communities; few see the possibility of smarting up bibliographic data using library-produced information.

ROI is difficult to calculate

Some cost issues are known but are unmeasured

It is admittedly difficult to calculate or estimate costs and benefits in a publicly funded service environment. This makes it particularly difficult to create concrete justifications for large-scale changes of the magnitude required for adopting Linked Data in libraries. While there is a general recognition of distinct disadvantages to siloed library data practices, no measurement exists that would compare the resources required to create and manage current library data with those required for linked library data. (Note: there are some studies on the cost of cataloging, but they do not separately study costs related to data technology: Library of Congress Study of the North American MARC Records Marketplace, R2 Consulting LLC, Ruth Fischer, Rick Lugg, October 2009; and Implications of MARC Tag Usage on Library Metadata Practices, OCLC, March 2010.)

MARC data cannot continue to exist in its own discrete environment, separate from the rest of the information universe. It will need to be leveraged and used in other domains to reach users in their own networked environments. The 200 or so MARC 21 fields in use must be mapped to simpler schemas. Smith-Yoshimura, et al., Implications of MARC Tag Usage on Library Metadata Practices. www.oclc.org/research/publications/library/2010/2010-06.pdf
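
To illustrate what mapping to a simpler schema involves, here is a minimal sketch that flattens a few MARC 21 fields into Dublin Core-style elements. It rests on several assumptions. The record shape (a dict of tag to subfield dicts) is a hypothetical in-memory simplification, not a real MARC parser. The small CROSSWALK table is an illustrative subset loosely modeled on published MARC-to-Dublin Core crosswalks, not a complete or authoritative mapping. The function name marc_to_dc is invented for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory record shape: MARC tag -> list of field occurrences,
# each occurrence mapping subfield codes to values. This is not a MARC parser.
MarcRecord = Dict[str, List[Dict[str, str]]]

# Illustrative subset of a MARC 21 -> simple Dublin Core crosswalk.
# The tag/subfield choices are assumptions for this sketch, not a complete mapping.
CROSSWALK: Dict[str, Tuple[str, List[str]]] = {
    "020": ("identifier", ["a"]),  # ISBN
    "100": ("creator", ["a"]),     # main entry, personal name
    "245": ("title", ["a", "b"]),  # title proper and remainder of title
    "260": ("date", ["c"]),        # date of publication
    "650": ("subject", ["a"]),     # topical subject term
}


def marc_to_dc(record: MarcRecord) -> Dict[str, List[str]]:
    """Map the crosswalked fields of a record to Dublin Core-style elements."""
    dc: Dict[str, List[str]] = {}
    for tag, occurrences in record.items():
        if tag not in CROSSWALK:
            continue  # fields without a mapping are simply dropped
        element, codes = CROSSWALK[tag]
        for occurrence in occurrences:
            parts = [occurrence[code] for code in codes if code in occurrence]
            if parts:
                dc.setdefault(element, []).append(" ".join(parts))
    return dc


record: MarcRecord = {
    "245": [{"a": "Linked data", "b": "a primer"}],
    "100": [{"a": "Doe, Jane"}],
    "650": [{"a": "Metadata"}, {"a": "Semantic Web"}],
    "500": [{"a": "Includes index."}],  # general note: no mapping in this sketch
}
print(marc_to_dc(record))
```

In this sketch the 500 note is silently discarded. That is a small instance of the information loss that mapping to a simpler schema can entail. A richer target vocabulary, or a more complete crosswalk, would reduce but not necessarily eliminate it.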

Library-specific data formats require niche systems solutions

It is possible, however, to observe the consequences of library data practices. Libraries use data technology specific to libraries and library systems. They are therefore dependent on niche software systems tailored to formats that nobody uses outside of the library world. Because the formats used in libraries (notably MARC) are unique to libraries, vendors of library systems cannot use mainstream data modeling systems, programmer tools, and database software to build library systems. Development of library systems also requires personnel specifically trained in library data. This makes it expensive to provide systems for the library community. The common practice of commissioning a new, customized system in every library every 5 to 10 years is very expensive; the aggregate cost to the library community has not been reliably estimated.

Vocabulary changes in library data are costly

Controlled vocabularies will play an important role in linked data in general. Although controlled vocabularies are used in library data (in particular for names of persons and organizations, and for subjects), they are not managed in a manner that facilitates linked data: changes to vocabularies require that all related records be retrieved and changed. This is a disruptive process, made even more expensive because the library metadata record, designed primarily as a communication format, requires a full record replacement for an update to any of its fields.
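To make the cost difference concrete, the following minimal Python sketch compares two hypothetical storage models: records that embed the display string of a heading, and records that point to a vocabulary entry by identifier. The record layout, the labels and the example.org URI are invented for illustration and do not correspond to MARC or to any real library system.

```python
# Hypothetical sketch: field names, labels and URIs are illustrative only.

# Model 1: every record embeds the display string of its subject heading.
string_records = [
    {"id": "rec1", "title": "A history of aviation", "subject": "Flying machines"},
    {"id": "rec2", "title": "Early flying machines", "subject": "Flying machines"},
    {"id": "rec3", "title": "Birds of prey", "subject": "Raptors"},
]

def rename_heading_in_records(records, old, new):
    """Rewrite each record whose subject string equals `old`.

    Returns the ids of the records that had to be replaced, since a
    communication-format record is updated by replacing it in full."""
    touched = []
    for rec in records:
        if rec["subject"] == old:
            rec["subject"] = new
            touched.append(rec["id"])
    return touched

# Model 2: records point at a vocabulary entry; the label is stored once.
vocabulary = {"http://example.org/subject/1": "Flying machines"}
linked_records = [
    {"id": "rec1", "title": "A history of aviation", "subject": "http://example.org/subject/1"},
    {"id": "rec2", "title": "Early flying machines", "subject": "http://example.org/subject/1"},
]

def rename_heading_in_vocabulary(vocab, uri, new):
    """Change the label in one place; linked records are left untouched."""
    vocab[uri] = new
    return [uri]

def display_subject(rec, vocab):
    """Resolve the current label of a linked record at display time."""
    return vocab[rec["subject"]]

touched_records = rename_heading_in_records(string_records, "Flying machines", "Aircraft")
touched_entries = rename_heading_in_vocabulary(vocabulary, "http://example.org/subject/1", "Aircraft")
print(len(touched_records), "records rewritten;", len(touched_entries), "vocabulary entry changed")
print(display_subject(linked_records[0], vocabulary))  # Aircraft
```

In the first model the number of rewritten records grows with the size of the catalogue; in the second, a change to the vocabulary touches one entry, and records pick up the new label when it is resolved.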

Data may have rights issues that prevent open publication

For a perspective from Europe, see "Free library data?" by Raymond Bérard.

Some data cannot be published openly

Data related to user identity and use of the library is protected by privacy policies and legislation. Other data, such as that related to purchasing and contracts, is not included in our analysis.

Rights ownership can be unmanageably complex

Some library bibliographic data has unclear and untested rights issues that can hinder the release of open data. Ownership of legacy catalogue records has been complicated by data sharing among libraries over the past 50 years. The most-shared records are those created by national cataloguing agencies such as the Library of Congress in the USA and the British Library in the UK. Records are frequently copied, and the copies are modified or enhanced for local catalogue users. These records may subsequently be re-aggregated into the catalogues of regional, national, and international consortia. Assigning legally sound intellectual property rights between relevant agents and agencies is difficult, and the lack of certainty is a hindrance to data sharing in a community that is necessarily extremely cautious on legal matters such as censorship and data privacy/protection.

Rights have perceived value

On the other hand, some bibliographic data may never have been shared with another party, so rights may be exclusively held by creating agencies, who put a value on past, present and future investment in creating, maintaining, and collecting metadata. Larger agencies are likely to treat records as assets in their business plans, and may be reluctant to publish them as open LD, or may be willing to release them only in a stripped- or dumbed-down form with loss of semantic detail. For example, data about specific types of title such as preferred title and parallel title might be output as a general title, losing the detail required for a formal citation of the resource.
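The loss of semantic detail described above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions: the property names preferredTitle, parallelTitle and title are hypothetical stand-ins, not terms from RDA, Dublin Core or any published vocabulary, and the export function is not modelled on any real system.

```python
# Hypothetical property names; not taken from any real vocabulary.
detailed = [
    ("book1", "preferredTitle", "Example title"),
    ("book1", "parallelTitle", "Exemple de titre"),
]

# A stripped-down export maps every specific title type to one generic property.
STRIPPED_DOWN = {"preferredTitle": "title", "parallelTitle": "title"}

def export(statements, mapping):
    """Rename properties according to `mapping`, keeping unmapped ones as-is."""
    return [(s, mapping.get(p, p), o) for s, p, o in statements]

released = export(detailed, STRIPPED_DOWN)

# Both values survive, but a consumer can no longer tell which one is the
# preferred title needed for a formal citation.
preferred = [o for s, p, o in released if p == "preferredTitle"]
print(released)
print(preferred)  # []
```

The strings themselves are not lost; what disappears is the distinction between them, which cannot be reconstructed from the released data alone.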

Library data is expressed in library-specific formats that cannot be easily shared outside the library community

Library data is expressed primarily as text strings, not linkable URIs

Most information in library data is encoded as display-oriented text strings. There are a few shared identifiers for resources, such as ISBNs for books, but most identification is done with text strings. Some coded data fields are used in MARC records, but there is not a clear incentive to include these in all records, since most coded data fields are not used in library system functions. Some data fields, such as authority controlled names and subjects, do have their own associated records in separate files, which have identifiers that could be used to represent those entities in library metadata. However, the data formats currently used do not support the inclusion of these identifiers in existing library records and consequently neither do current library systems support their use.
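The practical difference between text strings and identifiers shows up as soon as data from two sources is combined. The Python sketch below assumes two hypothetical catalogues that record the same author with slightly different punctuation; an exact-string comparison misses the match, while a comparison on a shared identifier finds it. The strings and the example.org URI are invented for illustration.

```python
# Two hypothetical sources describing works by the same person.
source_a = [{"work": "w1", "author": "Smith, Jane, 1950-"}]
source_b = [{"work": "w2", "author": "Smith, Jane 1950-"}]  # punctuation differs

def works_by_same_author(a, b, key):
    """Pair works from `a` and `b` whose `key` values are exactly equal."""
    return [(x["work"], y["work"]) for x in a for y in b if x[key] == y[key]]

print(works_by_same_author(source_a, source_b, "author"))  # → []

# The same data with an authority identifier attached to each heading.
AUTHOR_URI = "http://example.org/authority/smith-jane-1950"
for rec in source_a + source_b:
    rec["author_id"] = AUTHOR_URI

print(works_by_same_author(source_a, source_b, "author_id"))  # → [('w1', 'w2')]
```

String normalization can recover some such matches, but it is heuristic; an identifier makes the intended match explicit.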

Some library data is being expressed in RDF on an experimental basis, but without standardization or best practices

Work has begun to express library data in RDF. Some libraries have experimented with publishing LD from their catalogue records although no standard or best practice has yet emerged. There has been progress in defining value vocabularies currently used in libraries. Transformation of legacy data will require more than the mapping of attributes to RDF properties; where possible, library data should be transformed from text to data with identified values. New approaches for library data, such as the FRBR model which informs RDA, offer an opportunity for incorporating linked data principles into future library data practices, particularly when these new standards are implemented.
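The difference between mapping attributes to RDF properties and transforming text into identified values can be sketched as follows. This Python is a minimal illustration under stated assumptions: the flat record layout, the property URIs under example.org and the lookup table standing in for an authority service are all hypothetical. A purely attribute-level mapping would emit each heading as a literal string; the sketch replaces it with an identifier where one is found and keeps the literal only as a fallback.

```python
EX = "http://example.org/vocab/"  # hypothetical property namespace

# Stand-in for an authority lookup; a real one would consult an authority file.
SUBJECT_LOOKUP = {"Flying machines": "http://example.org/subject/1"}

def record_to_triples(rec_uri, record):
    """Turn one flat legacy record into (subject, predicate, object) triples."""
    triples = [(rec_uri, EX + "title", record["title"])]
    for heading in record.get("subjects", []):
        uri = SUBJECT_LOOKUP.get(heading)
        if uri is not None:
            # Identified value: the object is a URI that other datasets can link to.
            triples.append((rec_uri, EX + "subject", uri))
        else:
            # No match found: keep the text string so nothing is lost.
            triples.append((rec_uri, EX + "subjectLabel", heading))
    return triples

legacy = {"title": "Early flying machines", "subjects": ["Flying machines", "Kites"]}
triples = record_to_triples("http://example.org/book/2", legacy)
for t in triples:
    print(t)
```

The unmatched heading ("Kites") shows why such transformations need review: the output mixes identified values with strings that still await reconciliation.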

The library community and the Semantic Web community have no shared terminology for metadata concepts

Work on LLD can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community. Few in libraries would use a term like "statement" for metadata, and the Web community does not have concepts equivalent to libraries' headings or authority control. Each community has its own vocabulary, and these vocabularies reflect the differences in their points of view. Mutual understanding must be fostered, as both groups bring important expertise to the potential web of data.

Library data must be conceptualized according to the Graph Paradigm

Translators of legacy library standards into Linked Data must recognize that Semantic Web technologies are not merely variants of existing practices but represent a fundamentally different way to conceptualize and interpret data. Since the introduction of MARC formats in the 1960s, digital data in libraries has been managed predominantly in the form of records: bounded sets of information described in documents with a precisely specified structure, in accordance with what may be called a Record Paradigm. The Semantic Web and Linked Data, in contrast, are based on a Graph Paradigm. In graphs, information is conceptualized as a boundless web of links between resources: in visual terms, as sets of nodes connected by arcs (or edges), and in semantic terms, as sets of statements consisting of subjects and objects connected by predicates. The three-part statements of Linked Data, or triples, are expressed in the language of the Resource Description Framework (RDF). In the Graph Paradigm, the statement is an atomic unit of meaning that stands on its own and can be combined with statements from many different sources to create new graphs, a notion ideally suited to the task of integrating information from multiple sources into recombinant graphs.
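Because each triple stands on its own, combining graphs that use the same URIs amounts to taking the union of their statements. A minimal Python sketch, assuming a graph is represented as a set of (subject, predicate, object) tuples; all URIs are hypothetical, blank nodes are ignored (a real RDF merge must keep those apart), and a production system would use an RDF library and a triple store rather than plain sets.

```python
# Graph from a library catalogue (hypothetical URIs).
catalogue = {
    ("http://example.org/book/1", "http://example.org/vocab/creator", "http://example.org/person/7"),
    ("http://example.org/book/1", "http://example.org/vocab/title", "Example title"),
}

# Graph from an unrelated source describing the same person.
biography = {
    ("http://example.org/person/7", "http://example.org/vocab/birthYear", "1950"),
}

# Merging is set union: no shared record structure has to be agreed first.
merged = catalogue | biography

def objects(graph, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Follow a link across the two sources: book -> creator -> birth year.
for creator in objects(merged, "http://example.org/book/1", "http://example.org/vocab/creator"):
    print(objects(merged, creator, "http://example.org/vocab/birthYear"))  # {'1950'}
```

Neither source had to anticipate the other; the shared person URI is what connects them.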

Under the Record Paradigm, a data architect can specify with precision the form and expected content of a data record, which can be validated for completeness and accuracy. Data sharing among libraries has been based largely on the standardization of fixed record formats, and the consistency of that data has been ensured by adherence to well-defined content rules. Under the Graph Paradigm, in contrast, data is conceptualized according to significantly different assumptions. According to the so-called open-world assumption, any data at hand may, in principle, be incomplete. It is assumed that data may be supplemented by incorporating information from other, possibly unanticipated, sources, and that information can be added without invalidating information already present.
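The contrast between the two sets of assumptions can be made concrete with a small Python sketch. The required-field list, the record and the graph are hypothetical. Under the closed-world reading, a validator treats an absent field as an error in the record; under the open-world reading, the absence of a statement only means that the answer is not known from the data at hand, and a later source may supply it.

```python
from enum import Enum

REQUIRED_FIELDS = {"title", "creator", "date"}  # hypothetical record schema

def validate_record(record):
    """Closed world: return the required fields missing from the record."""
    return sorted(REQUIRED_FIELDS - set(record))

class Answer(Enum):
    YES = "yes"
    UNKNOWN = "unknown"  # open world: absence is not evidence of falsity

def has_date(graph, subject):
    """Open world: a missing statement yields UNKNOWN, never a definite no."""
    if any(s == subject and p == "date" for s, p, _ in graph):
        return Answer.YES
    return Answer.UNKNOWN

record = {"title": "Example title", "creator": "Smith, Jane"}
print(validate_record(record))  # ['date']

graph = {("book1", "title", "Example title"), ("book1", "creator", "Smith, Jane")}
print(has_date(graph, "book1"))  # Answer.UNKNOWN
graph.add(("book1", "date", "1999"))  # another source supplies the date
print(has_date(graph, "book1"))  # Answer.YES
```

Adding the date statement does not invalidate anything already in the graph; it only turns an unknown into a known.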

The notion of constraints takes on significantly different meanings under these two paradigms. Under the Record Paradigm, if the format schema for a metadata record says that the description of a book can have only one subject heading and a description with two subject headings is encountered, a validator will report an error in the record. Under the Graph Paradigm, if an OWL ontology says that a book has only one subject heading, and a description with two subject headings (URIs) is encountered, an OWL reasoner will infer that the two subject-heading URIs identify the same subject.
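The subject-heading example can be simulated in Python. This is a toy illustration, not an OWL reasoner: it reproduces only the single inference a reasoner draws from a "at most one subject heading" axiom (such as a functional property or a maximum cardinality of 1), given that OWL makes no unique name assumption. The URIs are hypothetical. If the two headings had been explicitly declared different (owl:differentFrom), a reasoner would report an inconsistency instead, which the sketch mimics by raising an error.

```python
MAX_SUBJECTS = 1  # the record schema's limit

description = {
    "book": "http://example.org/book/1",
    "subjects": ["http://example.org/subject/A", "http://example.org/subject/B"],
}

def record_validator(desc):
    """Record Paradigm: too many values is an error in the record."""
    n = len(desc["subjects"])
    if n > MAX_SUBJECTS:
        return [f"error: {n} subject headings, at most {MAX_SUBJECTS} allowed"]
    return []

def cardinality_inference(desc, different_from=frozenset()):
    """Graph Paradigm, assuming a maximum of one value: all values must
    denote the same thing, so infer sameAs for every pair. Only an explicit
    differentFrom assertion (pairs given as frozensets) makes it inconsistent."""
    values = desc["subjects"]
    inferred = []
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if frozenset((a, b)) in different_from:
                raise ValueError(f"inconsistent: {a} and {b} are declared different")
            inferred.append((a, "owl:sameAs", b))
    return inferred

print(record_validator(description))
print(cardinality_inference(description))
```

The same data thus yields an error report under one paradigm and a new statement under the other.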

As will be discussed below, the two paradigms may be seen as complementary. The traditional closed-world approach is good for flagging data that is inconsistent with the structure of a metadata record as a document, while OWL ontologies are good for flagging logical inconsistencies with respect to a conceptualization of things in the world. The differences between these two approaches mean that the process of translating library standards and datasets into Linked Data cannot be undertaken mechanically, but requires intellectual effort and modeling skill. Translators, in other words, must acquire some fluency in the language of RDF.

W3C incubator group on LLD – Draft Report for comment
