Greek Heritage Language Corpus


The Greek Heritage Language Corpus (GHLC) is a speech corpus developed at Democritus University of Thrace, Greece, within the framework of the project "Varieties of Heritage Greek: corpus compilation and comparative study" (MIS 5006199), under the supervision of Professor Zoe Gavriilidou. The project aimed to profile Greek heritage speakers living in the USA (Chicago) and Russia (Moscow and Saint Petersburg) in order to gain a clearer understanding of their characteristics.

The GHLC provides a unique and valuable resource and an advanced research tool for analyzing the productions of Greek heritage speakers. It reflects the level of acquisition of the Greek heritage language (GHL), possible subsequent attrition, and interference from the dominant language. Together, these factors gradually lead to the formation of new heritage grammars characterized by innovations at all levels, from phonology and morphology to syntax and semantics.

Features of GHLC

The GHLC contains: (a) audio recordings, (b) transcriptions of the recordings with metadata, and (c) free-access online transcriptions.

It consists of oral productions by first-, second-, and third-generation Greek heritage language speakers. It includes approximately 130,000 words (20,000 from Moscow, 25,000 from Saint Petersburg, and 85,000 from Chicago) and approximately 90 hours of recordings (30 hours each from Moscow, Saint Petersburg, and Chicago).

The transcribed texts of the GHLC use an orthographic representation of spoken language, supplemented with additional symbols that mark overlaps, pauses, intonation, and other features (see Transcription symbols).

This material includes:

  • Narrations in Greek
  • Narrations in English
  • Narrations in Russian
  • Conversations with Greek Heritage Speakers

Access and conditions of use

Access is

  • contingent on a detailed explanation of why the requested type, quantity, and form of material are needed,
  • subject to the Laboratory’s discretion.

For further information, see: Access to GHLC