Transforming Indexes Locorum into Citation Networks
Matteo Romanello - Deutsches Archäologisches Institut / King’s College London
Canonical citations are the standard way of citing primary sources in Classics: the ability to read them, which requires knowing what numerous (and at times cryptic) abbreviations stand for, is part of the early training of any classicist. Such citations play a key role because they signal, in journal articles and other secondary sources, which text passages were studied in relation to one another. Classicists have long exploited these citations for information retrieval by manually compiling indexes of cited passages, the indexes locorum. They now face the challenge of scaling up such indexes to cope with the sheer number of publications available online in digital form (Crane, Seales, and Terras 2009).
This talk is divided into three parts. In the first part I describe the approach I have developed for automatically extracting such citations from text (Romanello 2013; Romanello 2014). This approach treats citation extraction as a named entity extraction problem and results in a three-step process. First, the components of a citation are extracted and classified (named entity recognition); second, the relations between these components are extracted so as to form citations (relation detection); third, the cited authors, works and text passages are unambiguously identified by means of unique identifiers (named entity disambiguation). I shall present the evaluation of the system I have implemented and discuss the challenges that were faced as well as those that remain unsolved.[1]
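The three steps can be illustrated with a deliberately small sketch. It is not the author's implementation: regular expressions stand in for the trained recognition and relation-detection models a real system would need, the labels `REF` and `SCOPE` and the lookup table `KNOWN_WORKS` are hypothetical, and CTS URNs are assumed as the unique identifiers produced in step three.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon mapping citation abbreviations to CTS work URNs.
KNOWN_WORKS = {
    "Hom. Il.": "urn:cts:greekLit:tlg0012.tlg001",
    "Hom. Od.": "urn:cts:greekLit:tlg0012.tlg002",
    "Verg. A.": "urn:cts:latinLit:phi0690.phi003",
}

# Step 1 stand-ins: patterns in place of a trained NER model.
REF_PATTERN = re.compile("|".join(re.escape(abbr) for abbr in KNOWN_WORKS))
SCOPE_PATTERN = re.compile(r"\d+(?:\.\d+)*(?:-\d+(?:\.\d+)*)?")


@dataclass
class Entity:
    label: str  # "REF" (author/work abbreviation) or "SCOPE" (passage)
    text: str
    start: int
    end: int


def recognise(text):
    """Step 1 (NER): find and classify citation components, in text order."""
    found = [Entity("REF", m.group(), m.start(), m.end()) for m in REF_PATTERN.finditer(text)]
    found += [Entity("SCOPE", m.group(), m.start(), m.end()) for m in SCOPE_PATTERN.finditer(text)]
    return sorted(found, key=lambda e: e.start)


def detect_relations(text, entities):
    """Step 2 (relation detection): attach each SCOPE to the preceding REF,
    as long as only spaces, commas or semicolons separate them."""
    pairs, current_ref, last_end = [], None, 0
    for ent in entities:
        gap = text[last_end:ent.start]
        if ent.label == "REF":
            current_ref = ent
        elif current_ref is not None and gap.strip(" ,;") == "":
            pairs.append((current_ref, ent))
        else:
            current_ref = None  # unattached number, e.g. a year
        last_end = ent.end
    return pairs


def normalise_range(scope):
    """Expand abbreviated ranges, e.g. "1.1-10" -> "1.1-1.10"."""
    if "-" not in scope:
        return scope
    start, end = scope.split("-", 1)
    start_parts, end_parts = start.split("."), end.split(".")
    prefix = start_parts[: max(len(start_parts) - len(end_parts), 0)]
    return start + "-" + ".".join(prefix + end_parts)


def disambiguate(ref, scope):
    """Step 3 (NED): map a REF/SCOPE pair to a passage-level CTS URN."""
    return f"{KNOWN_WORKS[ref.text]}:{normalise_range(scope.text)}"


def extract_citations(text):
    pairs = detect_relations(text, recognise(text))
    return [(ref.text, scope.text, disambiguate(ref, scope)) for ref, scope in pairs]


print(extract_citations("Cf. Hom. Il. 1.1-10; 2.484 and Verg. A. 6.851, written in 1990."))
```

In the example sentence, "1990" is recognised as a candidate scope in step 1 but is not linked to any author or work in step 2. Because the toy lexicon knows only three abbreviations and the scope pattern accepts any number, such rules would break down quickly on real publications, which illustrates why hand-written rules alone are unlikely to scale.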
In the second part of my talk I argue that representing indexes of citations as networks is not a mere change of representation format but radically changes the way in which we can access and interact with the information such indexes contain. The main difference, which is especially relevant to the study of intertextuality, is that cited authors, works and passages are no longer shown in isolation, as in an index: the relations between them can be measured, searched for and visualised. I shall first describe the creation of three citation networks (the macro-, meso- and micro-level networks) that can be used to search, browse and visualise the citations automatically extracted from the text.[2] Then I shall illustrate by means of examples how the varying granularity of these networks allows scholars to analyse different aspects of classical texts, such as intertextuality and the history of scholarship.
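The abstract does not specify how the networks are built, so what follows is only one plausible construction, offered as a hypothetical sketch: a co-citation network in which two cited entities are linked whenever the same publication cites both, with the edge weight counting such publications. It assumes that citations have already been disambiguated to CTS URNs and, for illustration only, that the macro, meso and micro levels correspond to authors, works and passages; the two "articles" in the example are invented.

```python
from collections import Counter
from itertools import combinations


def to_passage(urn):
    # Micro level (assumed): the full passage URN, e.g. "...tlg0012.tlg001:1.1".
    return urn


def to_work(urn):
    # Meso level (assumed): drop the passage component -> "...tlg0012.tlg001".
    return urn.rsplit(":", 1)[0]


def to_author(urn):
    # Macro level (assumed): keep only the text-group (author) part -> "...tlg0012".
    prefix, work_id = to_work(urn).rsplit(":", 1)
    return f"{prefix}:{work_id.split('.')[0]}"


LEVELS = {"micro": to_passage, "meso": to_work, "macro": to_author}


def cocitation_network(documents, level):
    """Return a Counter mapping (node_a, node_b) pairs to co-citation counts."""
    project = LEVELS[level]
    edges = Counter()
    for urns in documents.values():
        # Project citations to the chosen level; the set counts each node once per document.
        nodes = sorted({project(u) for u in urns})
        for pair in combinations(nodes, 2):
            edges[pair] += 1
    return edges


# Invented example: citing document ID -> disambiguated citations it contains.
documents = {
    "article-A": [
        "urn:cts:greekLit:tlg0012.tlg001:1.1",
        "urn:cts:greekLit:tlg0012.tlg002:1.1",
        "urn:cts:latinLit:phi0690.phi003:1.1",
    ],
    "article-B": [
        "urn:cts:greekLit:tlg0012.tlg001:2.484",
        "urn:cts:latinLit:phi0690.phi003:1.1",
    ],
}

for level in ("micro", "meso", "macro"):
    print(level, dict(cocitation_network(documents, level)))
```

On these two invented articles, the micro-level network has four edges, each of weight 1, because the two articles cite different Iliad passages; at the meso level the Iliad and Aeneid are linked with weight 2, and at the macro level the single edge between Homer and Vergil has weight 2. This is one way in which coarser levels can surface relations that are invisible at the level of individual passages.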
- [1] The annotated dataset I have created for evaluation purposes is available at http://dx.doi.org/10.5072/zenodo.12762, while the code I have written is available at http://dx.doi.org/10.5281/zenodo.10886.
- [2] A preliminary version of the citation networks that will be presented and discussed can be accessed from https://github.com/mromanello/APh_network_viz.
- Crane, Gregory, Brent Seales, and Melissa Terras. 2009. “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly 3 (1).
- Romanello, Matteo. 2013. “Creating an Annotated Corpus for Extracting Canonical Citations from Classics-Related Texts by Using Active Annotation.” In Computational Linguistics and Intelligent Text Processing. 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I, edited by Alexander Gelbukh, 1:60–76. Berlin/Heidelberg: Springer.
- —. 2014. “Mining Citations, Linking Texts.” ISAW Papers 7 (24).