Publication Date

3-13-2019

Keywords

Digital libraries, Entity alignment, Text mining, Text duplication

Abstract

As digital libraries grow, they prompt new consideration of same-work relationships. These libraries provide unique opportunities for resource discovery, but their scale and aggregated models lead to challenges presented by duplicates and variants. Addressing this problem is complicated by metadata inconsistencies as well as differences in structure and content. Following from work in algorithmically identifying duplicate works in the HathiTrust Digital Library, we present some cases that complicate our existing language for work entity relationships. These serve to contextualize the complexities of same-work alignment in digital libraries, ground future discussion around content similarity, and inform methods to better identify duplicates in large-scale digital libraries.

Publication Statement

The final authenticated version is available online at https://doi.org/10.1007/978-3-030-15742-5_40.

Citation of authenticated version:

Organisciak, P., Shetenhelm, S., Vasques, D. F. A., & Matusiak, K. (2019). Characterizing Same Work Relationships in Large-Scale Digital Libraries. In N. G. Taylor, C. Christian-Lamb, M. H. Martin, & B. Nardi (Eds.), Lecture Notes in Computer Science: Vol. 11420. Information in Contemporary Society: 14th International Conference Proceedings (pp. 419-425). DOI: 10.1007/978-3-030-15742-5_40


