CCP’s Concept Integration in Comparative Law project has launched a new tool that addresses a core challenge faced by researchers working with large corpora such as constitutions or court rulings: identifying and tracking new topics in a corpus. The tool takes a new approach, combining the efficiency of automated text analysis with the accuracy of domain expertise.
This new Segments-as-Topics tool allows users to assess the best formulation of a potential new topic for a specific corpus, identify that topic in the text, and produce an exhaustive set of segments from the corpus that reference that topic.
The innovation in this framework is not just that it combines automation and human expertise, but how it does this. As the name of the tool implies, it uses segments from the corpus itself to represent topics that researchers seek to find in that corpus. Using corpus segments to identify other segments on the same topic produces more accurate automated matching, with human intervention needed at just a few key points.
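The idea of using corpus segments to find other segments on the same topic can be illustrated with a minimal sketch. This is not the tool's actual implementation, which is documented in the GitHub repository and the paper. The function names, the word-overlap cosine similarity, the sample sentences, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a segment into a bag-of-words term-count vector (illustrative stand-in)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_segments(seed_segments, corpus_segments, threshold=0.5):
    """Return corpus segments whose best similarity to any seed meets the threshold.

    The seeds are segments a researcher has already judged to be on-topic.
    The threshold value is hypothetical and would need tuning per corpus.
    """
    seeds = [vectorize(s) for s in seed_segments]
    scored = []
    for seg in corpus_segments:
        vec = vectorize(seg)
        score = max((cosine(vec, seed) for seed in seeds), default=0.0)
        if score >= threshold:
            scored.append((score, seg))
    # Highest-scoring segments first, so a reviewer sees the strongest matches on top.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored]


# Hypothetical sample data: one seed segment representing the topic.
seeds = ["Everyone has the right to freedom of expression."]
corpus = [
    "Every person has the right to freedom of expression and opinion.",
    "The legislature shall consist of two chambers.",
    "Freedom of the press is guaranteed.",
]
matches = match_segments(seeds, corpus)
```

In this sketch a segment such as the press-freedom sentence falls just below the cutoff, which is the kind of borderline case where a domain expert's review would decide whether it belongs to the topic.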
The research team developed and tested the method as part of its own research to expand the topics it tracks in constitutions. The team is now making the tool publicly available as open-source software for others to use.
Launching today, the tool is available on a GitHub repository that includes instructions, applications, and sample data to help others get started. A video tutorial also walks users through applying the tool in their own analysis, whether in the constitutional domain or beyond.
The methodology behind the Segments-as-Topics tool and an analysis of its application to national constitutions are described in a new paper by Roy Gardner, Matthew Martin, Ashley Moran, Zachary Elkins, Andrés Cruz, and Guillermo Pérez.
The Segments-as-Topics tool is part of a suite of open-source tools the program is developing for public use. The aim is to make these machine-assisted research methods accessible for a wide range of scholars and research applications across constitutional and comparative law.
Funding acknowledgement: The Concept Integration in Comparative Law program is supported by the National Science Foundation under Grant Number 2315189. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF). The research team deeply appreciates NSF’s Accountable Institutions and Behavior program and Human Networks and Data Science program for this support.