Digital libraries, Entity alignment, Text mining, Text duplication
As digital libraries grow, they prompt renewed consideration of same-work relationships. Digital libraries provide unique opportunities for resource discovery, but their scale and aggregated models lead to challenges from duplicates and variants. Addressing this problem is complicated by metadata inconsistencies as well as structural and content differences. Building on work in algorithmically identifying duplicate works in the HathiTrust Digital Library, we present cases that complicate our existing language for work entity relationships. These cases contextualize the complexities of same-work alignment in digital libraries, ground future discussion of content similarity, and inform methods for better identifying duplicates in large-scale digital libraries.
Organisciak, Peter; Shetenhelm, Summer; Vasques, Danielle Francisco Albuquerque; and Matusiak, Krystyna K., "Characterizing Same Work Relationships in Large-Scale Digital Libraries" (2019). Library and Information Science: Faculty Conference Presentations. 4.
The final authenticated version is available online at https://doi.org/10.1007/978-3-030-15742-5_40.
Citation of authenticated version:
Organisciak, P., Shetenhelm, S., Vasques, D. F. A., & Matusiak, K. (2019). Characterizing Same Work Relationships in Large-Scale Digital Libraries. In N. G. Taylor, C. Christian-Lamb, M. H. Martin, & B. Nardi (Eds.), Lecture Notes in Computer Science: Vol. 11420. Information in Contemporary Society 14th International Conference Proceedings (pp. 419-425). DOI: 10.1007/978-3-030-15742-5_40