New Publication: A Framework for Topic Integration in Texts

The Comparative Constitutions Project is pleased to announce a new publication in Social Science Computer Review: “Expanding Your Vocabulary: A Framework for Topic Integration in Texts.”

The paper introduces the segments-as-topic (SAT) methodology, a four-stage framework that combines automated text analysis with expert judgment to expand domain-specific vocabularies. The approach is designed to address a long-standing dilemma in computational social science: fully automated topic discovery is efficient but often misses domain-specific conceptual nuance, while purely manual approaches are time-intensive and vulnerable to bias.

The SAT framework proceeds in four stages:

  1. Generation — Domain experts define a topic, and a sentence-level semantic similarity model retrieves corpus segments aligned with that topic.
  2. Expansion — The seed set grows iteratively as similar segments are accepted or rejected, building a final segment set.
  3. Review — A panel of scholars evaluates the topic for inclusion in the vocabulary.
  4. Integration — All segments in the final set are automatically tagged with the new topic.
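
The four stages above can be sketched in a few lines of Python. This is only an illustrative toy, not the tooling released with the paper: a bag-of-words cosine similarity stands in for the sentence-level semantic similarity model, and every name here (embed, cosine, top_matches, sat_sketch, the accept callback, the sample corpus and label) is hypothetical. Stage 3, the scholarly review, is a human step and is not modeled.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(queries, corpus, seen, k):
    """Rank unseen segments by their best similarity to any query vector."""
    scored = []
    for i, segment in enumerate(corpus):
        if i in seen:
            continue
        vec = embed(segment)
        score = max(cosine(q, vec) for q in queries)
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def sat_sketch(definition, label, corpus, accept, k=2, max_rounds=5):
    """Return {segment index: label} for the final accepted segment set."""
    accepted, rejected = set(), set()
    # Stage 1 (Generation): start from the expert's topic definition.
    queries = [embed(definition)]
    for _ in range(max_rounds):
        candidates = top_matches(queries, corpus, accepted | rejected, k)
        if not candidates:
            break
        # Stage 2 (Expansion): each candidate is accepted or rejected,
        # and accepted segments become queries for the next round.
        for i in candidates:
            (accepted if accept(corpus[i]) else rejected).add(i)
        queries = [embed(definition)] + [embed(corpus[i]) for i in sorted(accepted)]
    # Stage 3 (Review) happens outside the code.
    # Stage 4 (Integration): tag every segment in the final set.
    return {i: label for i in sorted(accepted)}


corpus = [
    "Everyone has the right to a healthy environment.",
    "The state shall protect the natural environment.",
    "The president is elected for a term of five years.",
    "The state shall protect forests and rivers from pollution.",
]

# The accept callback simulates an expert's judgment on each candidate.
tags = sat_sketch(
    "protection of the environment",
    "environmental protection",
    corpus,
    accept=lambda segment: "president" not in segment,
)
print(tags)
```

In this toy run, the last segment never mentions "environment" and is only reached in the second round, through its overlap with an already accepted segment. That is the point of iterative expansion. The crude similarity measure also surfaces the unrelated presidential clause, which the simulated expert rejects.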

To demonstrate the methodology, we applied SAT to the CCP vocabulary, which tracks more than 330 topics across national constitutions, and successfully integrated three new topics. The result is a systematic, replicable, and user-friendly framework that offers social scientists both computational efficiency and expert insight.

The full paper is available open access in Social Science Computer Review. For a walkthrough of the methodology, watch the tutorial on YouTube. The tools used in the paper are available for download for researchers who want to apply the SAT framework to their own corpora.