Publication Date


Document Type



Digital libraries, Entity alignment, Text mining, Text duplication


As digital libraries grow, they are prompting new consideration of same-work relationships. These libraries provide unique opportunities for resource discovery, but their scale and aggregated models lead to challenges posed by duplicates and variants. Addressing this problem is complicated by metadata inconsistencies as well as structural and content differences. Building on work in algorithmically identifying duplicate works in the HathiTrust Digital Library, we present some cases that complicate our existing language for work entity relationships. These cases serve to contextualize the complexities of same-work alignment in digital libraries, ground future discussion around content similarity, and inform methods to better identify duplicates in large-scale digital libraries.

Publication Statement

The final authenticated version is available online at

Citation of authenticated version:

Organisciak, P., Shetenhelm, S., Vasques, D. F. A., & Matusiak, K. (2019). Characterizing Same Work Relationships in Large-Scale Digital Libraries. In N. G. Taylor, C. Christian-Lamb, M. H. Martin, & B. Nardi (Eds.), Lecture Notes in Computer Science: Vol. 11420. Information in Contemporary Society 14th International Conference Proceedings (pp. 419-425). DOI: 10.1007/978-3-030-15742-5_40

Rights Holder

Peter Organisciak, Summer Shetenhelm, Danielle Francisco Albuquerque Vasques, Krystyna K. Matusiak


Received from author

File Format



English (eng)


7 pgs

File Size

206 KB