Curatr: Exploration and Curation of Historical Texts

About Curatr

Project Overview

Curatr is an online platform that provides access to the British Library Digital Collection. It was developed by the ERC-funded VICTEUR project at the UCD School of English, Drama and Film, in collaboration with researchers at the Insight Research Ireland Centre for Data Analytics, as part of its Cultural Analytics Research Initiative. The platform hosts digitised versions of all English-language books in this collection, corresponding to 35,884 unique titles of both fiction and non-fiction, dating from 1700 to 1899. Taking multi-volume works into account, this amounts to over forty-six thousand unique volumes of text. The platform also incorporates the first digitised version of the topical classification index of books used by the British Library from 1823 to 1985.

The platform includes a searchable index covering the equivalent of over 12 million pages of text, which can be searched and sorted by author, title, year, and the full text of the volumes themselves. Alternatively, concordance analysis can be used to identify every occurrence of a particular word or phrase within the collection, preserving its original context. Researchers can use these tools to find content related to specific themes within little-known texts or very long, unwieldy ones. Additional functionality based on modern natural language processing techniques supports this work, including content-based recommendation and the visualisation of conceptual relationships in the collection using semantic networks.
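
Concordance analysis of the kind described above is usually presented as a keyword-in-context (KWIC) view. The sketch below is a minimal, hypothetical illustration of that idea, not Curatr's own implementation: the function names `concordance` and `format_kwic` and the character-based `width` window are assumptions, and the word-boundary matching assumes the search phrase begins and ends with a letter or digit.

```python
import re


def concordance(text: str, phrase: str, width: int = 30) -> list[tuple[str, str, str]]:
    """Return each occurrence of `phrase` in `text` as a (left, match, right)
    keyword-in-context triple, with up to `width` characters on either side."""
    # Case-insensitive match on word boundaries, so "times" does not hit "sometimes".
    # Assumes the phrase starts and ends with a word character.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Collapse line breaks and runs of whitespace so each hit prints on one line.
        hits.append((re.sub(r"\s+", " ", left), m.group(0), re.sub(r"\s+", " ", right)))
    return hits


def format_kwic(hits: list[tuple[str, str, str]], width: int = 30) -> str:
    """Right-align the left context so every hit lines up on the matched term."""
    return "\n".join(f"{left:>{width}} [{match}] {right}" for left, match, right in hits)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    print(format_kwic(concordance(sample, "times", width=12), width=12))
```

Keeping the matched term separate from its context makes it easy to sort hits by the words to their left or right, which is how concordances are typically browsed.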

Curatr also supports the creation and export of smaller sub-corpora, defined thematically, chronologically, and by classification. This addresses a common need among humanities scholars to curate documents online without extensive technical training. Because building an appropriate lexicon of words for curation can be a tedious and time-consuming process, we expedite it by using a custom word embedding model to identify other potentially relevant words that are semantically similar to the original "seed" words provided by the user. The resulting lexicon can then be used to filter the entire collection down to a much smaller set of texts for closer inspection.
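
The seed-and-expand workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not Curatr's actual method: the platform's custom embedding model and ranking are not described here, so the sketch scores each candidate word by its mean cosine similarity to the seed words, uses hand-made three-dimensional toy vectors in place of a trained model, and treats "filtering" as keeping texts that contain at least `min_hits` lexicon words. The names `expand_lexicon` and `filter_collection` are hypothetical.

```python
import math
import re


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds: list[str], embeddings: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the k non-seed words with the highest mean cosine similarity to the seeds."""
    seed_vecs = [embeddings[w] for w in seeds if w in embeddings]
    if not seed_vecs:
        return []  # none of the seed words is in the embedding vocabulary
    scores = {
        word: sum(cosine(vec, s) for s in seed_vecs) / len(seed_vecs)
        for word, vec in embeddings.items()
        if word not in seeds
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def filter_collection(texts: dict[str, str], lexicon: list[str], min_hits: int = 1) -> list[str]:
    """Return the titles of texts containing at least `min_hits` lexicon tokens."""
    terms = {w.lower() for w in lexicon}
    selected = []
    for title, body in texts.items():
        tokens = re.findall(r"[a-z]+", body.lower())
        if sum(1 for t in tokens if t in terms) >= min_hits:
            selected.append(title)
    return selected


if __name__ == "__main__":
    # Hypothetical toy vectors for illustration only; a real system would load
    # vectors from a word embedding model trained on the collection.
    embeddings = {
        "ship": [0.9, 0.1, 0.0],
        "vessel": [0.85, 0.15, 0.05],
        "boat": [0.8, 0.2, 0.0],
        "sailor": [0.7, 0.3, 0.1],
        "garden": [0.0, 0.2, 0.9],
        "flower": [0.05, 0.1, 0.95],
    }
    seeds = ["ship", "boat"]
    lexicon = seeds + expand_lexicon(seeds, embeddings, k=2)
    print(lexicon)
    texts = {
        "A Voyage": "The vessel left port with every sailor aboard.",
        "Garden Notes": "A treatise on the flower garden.",
    }
    print(filter_collection(texts, lexicon))
```

With these toy vectors the expanded lexicon is `['ship', 'boat', 'vessel', 'sailor']`, and only "A Voyage" survives the filter. In practice the user would review the suggested words before filtering, since nearest neighbours in an embedding space are not always relevant to the intended theme.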

The legibility of earlier texts, and of those in non-standard formats, inevitably varies. A key use of Curatr is therefore to help researchers identify original texts relevant to their work for consultation in situ at the library. The next phase of the project will seek to integrate Curatr with other relevant online cultural resources, such as records from popular nineteenth-century lending libraries.

Tutorials

See here for a series of short instructional videos from VICTEUR project researchers on the use and functionality of Curatr.

Citing Curatr

If you use Curatr in an academic publication, we would appreciate citations to the following paper:

Leavy, S., Meaney, G., Wade, K. and Greene, D. (2019) Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts, in Proceedings of the 13th International Conference on Metadata and Semantics Research (MTSR 2019) (pp. 354-366). Springer International Publishing.

If you wish to cite a specific version of a text identified on the Curatr platform, we recommend the following citation format:

Dickens, C. (1892) The Old Curiosity Shop. Available at: curatr.ucd.ie (Accessed: 28 June 2025).

Acknowledgements

This work is part of the VICTEUR project, which has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 884951), and is being undertaken by members of the UCD School of English, Drama and Film, in collaboration with researchers from the Insight Research Ireland Centre for Data Analytics at the UCD School of Computer Science. For more details, please contact us via e-mail. Curatr by UCD Centre for Cultural Analytics is licensed under a Creative Commons BY-NC-ND 4.0 Licence. Background image created by Steven Cadman.