Celtic Language Technology Workshop 2019

At MT Summit XVII, Dublin City University, Dublin. Monday 19th August 2019


The proceedings are now available on the ACL Anthology: https://www.aclweb.org/anthology/volumes/W19-69/.


(09.05-10.30) Morning Session 1
09.00-09.10 Opening Remarks
Micheál Ó Conaire, Department of Culture, Heritage and the Gaeltacht
09.10-10.05 Invited talk by Kelly Davis
Free(ing) Speech Corpora and STT Models with Common Voice and Deep Speech [slides]
10.05-10.20 Speech technology and Argentinean Welsh [slides]
Elise Bell
(11.00-12.30) Morning Session 2
11.00-11.20 Embedding Welsh to English MT in a private company [slides]
Myfyr Prys and Dewi Bryn Jones
11.20-11.35 Leveraging backtranslation to improve machine translation for Gaelic languages [slides]
Meghan Dowling, Teresa Lynn and Andy Way
11.35-11.55 Improving full-text search results on dúchas.ie using language technology [slides]
Brian Ó Raghallaigh, Kevin Scannell and Meghan Dowling
11.55-12.15 Adapting Term Recognition to an Under-Resourced Language: the Case of Irish [slides]
John P. McCrae and Adrian Doyle
12.15-12.30 Unsupervised multi–word term recognition in Welsh [slides]
Irena Spasić, David Owen, Dawn Knight and Andreas Artemiou
(14.00-15.00) Afternoon Session 3
14.00-14.20 Development of a Universal Dependencies treebank for Welsh [slides]
Johannes Heinecke and Francis M. Tyers
14.20-14.40 Universal dependencies for Scottish Gaelic: syntax [slides]
Colin Batchelor
14.40-15.00 A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles [slides]
Adrian Doyle, John P. McCrae and Clodagh Downey
(15.00-15.30) Afternoon break
(15.30-17.45) Afternoon Session 4
15.30-16.30 Invited talk by Claudia Soria
BLaRKing at minority language speakers: the Digital Language Survival Kit as a speaker-centered approach to digital development of minority languages. [slides]
16.30-16.50 Code-switching in Irish tweets: A preliminary analysis [slides]
Teresa Lynn and Kevin Scannell
16.50-17.10 A Green Approach for an Irish App (Refactor, reuse and keeping it real) [slides]
Monica Ward, Maxim Mozgovy and Marina Purgina
17.10-17.30 Community Discussion

Invited speakers

Claudia Soria is a researcher at CNR-ILC. She has a background in computational linguistics, with a focus on language resources across their entire life-cycle, from creation to representation to evaluation. She is one of the authors of LMF, the Lexical Markup Framework, an ISO standard for the representation of computational lexicons. Her current research interests revolve around the use of technological means, Language Technology in particular, for the protection and valorisation of linguistic diversity. Other current interests are the use and usability of regional and minority languages on social media; the ethnolinguistic vitality of the regional and minority languages of Italy; and the creation of lexico-conceptual resources for archiving traditional knowledge. She coordinated an Erasmus+ project, “The Digital Language Diversity Project”, and a research project in cooperation with the Polish Academy of Sciences, “Protection of the linguistic heritage. A comparison of attitudes towards linguistic diversity in Poland and Italy”. She is currently serving as vice-director of the European Language Equality Network (ELEN), and is part of the UNESCO Board of Experts on Multilingualism in Cyberspace. On the activist side, she is involved in spreading awareness about Italy's linguistic diversity, encouraging the use and re-appropriation of autochthonous languages.

Kelly Davis has many irons in the fire. He studied Mathematics and Physics at MIT, then went on to do graduate work in Superstring Theory/M-Theory. He then jumped ship, coding at a startup that eventually went public in the late 90s. When the bubble burst, he jumped back into an academic setting and joined the Max Planck Institute for Gravitational Physics, where he worked on software systems used to help simulate black hole mergers. Jumping ship yet again, he went back into industry, writing 3D rendering software at Mental Images/NVIDIA. When that lost its charm, he founded an NLU startup, 42, that created a system, based on IBM's Watson, able to answer general knowledge questions. After a brief stint as the Director of Machine Learning at another Berlin startup, he joined Mozilla, where he now leads the machine learning group.

Venue and registration

The Helix, DCU, Dublin.

Call for papers

Research innovations in Language Technology and Computational Linguistics in recent years have given us a great many modern language processing tools and resources for many languages. Basic language tools like spelling and grammar checkers, interactive systems like Siri, and resources like the Trillion Word Corpus all fit together to produce products and services which enhance our daily lives.

Until relatively recently, languages with smaller numbers of speakers have largely not benefited from attention in this field. However, modern techniques are making it easier to create language tools and resources from less data and in less time. In this light, many lesser-spoken languages are making their way into the digital age through the provision of language technologies and resources.

The Celtic Language Technology Workshop (CLTW) series of workshops provides a forum for researchers interested in developing NLP (Natural Language Processing) resources and technologies for Celtic languages. As Celtic languages are under-resourced, our goal is to encourage collaboration and communication between researchers working on language technologies and resources for Celtic languages.

This will be the third Celtic Language Technology Workshop (CLTW), this time co-located with MT Summit XVII in Dublin, Ireland.

Our workshop welcomes theoretical and practical submissions on any Celtic language (Irish, Welsh, Scottish Gaelic, Manx, Cornish or Breton) that contribute to research in machine translation, automated language processing, language/speech technologies or resources for the same. With Ireland’s recent progress in the area of machine translation (particularly in public administration) and steps towards combining speech processing and machine translation for Welsh, there is much scope in this forum for sharing best practices and drawing on lessons learned from working with limited resources. We will particularly encourage studies that address either practical applications with a human in the loop or the lack of resources available for a given language in this field.

Topics of interest for the CLTW include but are not limited to:

Important dates

Instructions for authors

Full papers must not exceed 10 (ten) pages plus unlimited pages for references, and must be formatted according to the MT Summit 2019 style guide (links below).

Short papers should be up to five pages with unlimited pages for references.

All papers will be rigorously reviewed for novelty and impact, and published in the workshop proceedings. The papers that best suit poster presentation will be presented as posters and the rest as talks.

Submitted papers must be in PDF. To allow for blind reviewing, please do not include author names and affiliations within the paper, and avoid obvious self-references. Papers must be submitted through the EasyChair system.


Programme committee

Previous workshops

The first Celtic Language Technology Workshop took place at COLING in 2014, which was also held at DCU. The second was held at JEP-TALN in 2016 in Paris.


We thank Mozilla and the Department of Culture, Heritage and the Gaeltacht for support.

Colin Batchelor