The Classical Language Toolkit (CLTK) is software that brings natural language processing (NLP) to the languages of ancient, classical, and medieval Eurasia. This paper chronicles CLTK’s participation in Google Summer of Code (GSoC), a program run by Google to encourage the growth of open source software. Google pays a stipend to student programmers, who in turn contribute code to an approved project between May and August. In 2016, GSoC accepted CLTK and allotted it funding slots for two students. CLTK, having received over 100 student applications for the program, chose Patrick J. Burns (then a doctoral student in Classics at Fordham University) and Suhaib Khan (an undergraduate at the Netaji Subhas Institute of Technology). Burns proposed to write a multiple-pass, rules-based lemmatizer for the CLTK Core tools; Khan proposed to rework the codebase for the Classical Language Archive, a front-end JavaScript application for use as a reading environment by non-programmers. Johnson acted as the supervisor for the former project and Luke Hollis for the latter.
We offer here first a brief introduction to CLTK and then turn to the two summer projects. We conclude with a statement of how we envision the project contributing to the future of ancient literacy, that is, how the entire CLTK ecosystem can work together to offer readers of classical languages a presentation of texts and supporting materials not available in print editions.
The toolkit’s features for Ancient Greek and Latin are the most developed; by design, however, all languages supported by CLTK are “first-class citizens,” in that all language-specific data sets and functions are similarly easy for end users to access.