Newsletter #3 – November 2021

Dear reader,

This week’s ELT Newsletter is full of great events, past and present – most importantly the upcoming META-FORUM 2021, which kicks off in less than two weeks. The programme for the 3rd annual ELG conference has been online since last week, including a session in which we present the first results of the European Language Equality project. If you haven’t registered yet, do so now.

Apart from the session at META-FORUM, the ELE project has been and will be a topic in many more circles: from national language institutes to European research libraries and Wikipedia – there are many areas in which digital language plays an increasingly important role and whose representatives are becoming curious about the efforts towards digital language equality. Learn more in our ELE section below.

In other news: our colleagues from the Mediterranean coast of Spain have brought the joint effort of ELG and ELE to a regional level and founded a new forum for Language Technology researchers concentrating on the Catalan language. “NLP CommuniCat” raises interest not only as an interactive Slack chat, but also as a new Twitter channel.

In summary, this newsletter bursts with great news, both about events past and present and about new initiatives to bring digital language equality forward. We hope you enjoy this third edition and have a great week!

With best regards

Georg Rehm

Language Technology and NLP in the news
Social media highlights
General news

Less than two weeks are left until META-FORUM 2021, the 3rd annual ELG conference. Following last year’s procedure and the format everyone has gotten used to, META-FORUM will be held online via Zoom from 15 to 17 November. The international conference on powerful and innovative Language Technologies for the multilingual information society presents and discusses the most recent developments and achievements in European Language Technology from industry and research – including language-centric AI. During the three-day conference, hosted and organized from Berlin, participants will receive news and updates from the ELG pilot projects, see first results and findings from the European Language Equality project and be able to interact with members of the LT community from all over Europe. Further information, the full programme and the option to register free of charge can be found on the ELG website – we hope to see you soon!

As part of the preparation for META-FORUM 2021, the consortium is also finalizing a video tutorial for the European Language Grid, explaining in an easy-to-follow format how to register, use the ELG and provide resources to it. The tutorial will soon appear on the ELG YouTube channel, so make sure to follow it and keep an eye out.

While the annual META-FORUM invites participants from all over Europe and elsewhere for a detailed look at the ELG and ELE projects, our National ELG Workshops continue with a more regional aim: the last workshop on 18 October focused on the Czech Republic and saw contributions by ELG partners from the Charles University in Prague, the University of Pilsen and the German Research Center for Artificial Intelligence (DFKI). All slides from the workshop can be found on the website; if you're familiar with Czech, feel free to check them out! The next National ELG Workshop is hosted by the Centre for Language Technology at the University of Copenhagen and takes place on 16 November.

Selected new tools and resources on the European Language Grid
  • HENSOLDT ANALYTICS Named Entity Detection for Bulgarian – a named entity detection engine that classifies named entities of the following types: Person, Location, Organization and a set of other types still under development. The running tool is available in ELG, so anyone can try it out directly with an example sentence. It was added by HENSOLDT ANALYTICS on 20 September 2021.

  • Python Annotated Code Search (PACS) Datasets & Pretrained Models – The datasets and pretrained models were used for the paper Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent. The code for easily loading these datasets and models will be made available here: http://github.com/nokia/codesearch. There are three types of datasets: snippet collections, code search evaluation data and training data. Each model can embed queries and (annotated) code snippets in the same space. The resources were automatically harvested from Zenodo.

  • Dataset repository from an analysis of syntax-semantics interactions of 38 languages – This repository contains the datasets that accompany the paper 'Syntax-semantics interactions – seeking evidence from a synchronic analysis of 38 languages'. It consists of a subset of the Universal Dependencies Corpora v2.6. 27 European languages are covered in the data. The resource was automatically harvested from Zenodo and is published as open source under the Apache 2.0 License.

Selected new ELG members

  • The Institute of Mathematics and Computer Science (IMCS) is a research institute at the University of Latvia. Its main research directions in Computer Science are Knowledge Engineering, Machine Learning, Computational Linguistics, Real-Time and Autonomous systems, Bioinformatics, and Computer Security. The Artificial Intelligence Laboratory (AiLab) at IMCS is an important language technology research group in Latvia, focusing on natural language understanding (NLU) and generation (NLG). Although AiLab primarily focuses on Latvian, it has successfully participated in international NLU and NLG evaluation campaigns on well-resourced languages as well. AiLab also actively participates in the Universal Dependencies, FrameNet, WordNet, and other international initiatives through the development of advanced language resources. It is also the national coordinator of the CLARIN ERIC European research infrastructure for language resources and technology in Latvia, the National Competence Centre of the European Language Grid platform, technical NAP at ELRC, as well as an observer of the European Lexicographic Infrastructure.

  • Vicomtech is a technological center for applied research specialized in Artificial Intelligence, Visual Computing & Interaction, founded in 2001 and currently formed by nearly 200 research professionals. At Vicomtech, digital technologies are researched and developed, always attending to market demand, and providing innovative solutions to companies that contribute to improving their processes and competitiveness. The Center is a member of BRTA (BASQUE RESEARCH & TECHNOLOGY ALLIANCE) and has its own comprehensive management methodology and benchmark based on proven models that guarantees the optimization of processes and the transfer of technology to companies in a transparent manner. Vicomtech has a successful research group specialized in speech and natural language technologies. They do research and development on speech processing, natural language processing, machine translation and dialogue systems.
General news

As this newsletter finds you, the ELE project consortium finds itself in the midst of a series of events: with three conferences held in October and four more coming up in November, the initiative for digital language equality is presented and discussed all over Europe – and even in Brazil and Russia. In the first week of October, the 18th Conference of the European Federation of National Institutions for Language (EFNIL) saw contributions from ELE coordinator Prof. Andy Way (ADAPT Centre, Dublin City University), who presented the project framework on site, and Dr. Anželika Gaidienė (The Institute of the Lithuanian Language), who joined in via Zoom to discuss the “Lithuanian Language Technology Landscape: from Documents to Language Technologies” based on preliminary results of the ELE research. In her presentation, Dr. Gaidienė shared some early insights such as the large amount of lexical and conceptual resources on Lithuanian, which make up 62% of all resources, and the strong language-dependency of tools, 98% of which are only available in Lithuanian.

Every two years, the Directorate General for Translation of the European Parliament organises a conference. At this year’s DG TRAD Conference, which took place on 27 and 28 October, the main umbrella topic was Machine Translation. Prof. Georg Rehm (German Research Center for Artificial Intelligence, DFKI) was invited to present the projects ELE and ELG in the context of “Language technologies for a multilingual Europe 2020-2030”. Jumping into the future and from a European to a national level, Georg will also present both initiatives at the Symposium 2021: “Machine-Based Cataloguing Processes” of the German National Library (DNB), taking place virtually and “mainly in German” on 18 and 19 November. His contribution focuses on the “European Language Grid: An AI platform for flexible language technologies”, but also touches on ELE.

Maria Heuschkel of Wikimedia Deutschland, on the other hand, has just finished presenting the preliminary ELE project results and discussing the pains of under-resourced language communities that are active in the Wikidata community at WikidataCon, the conference for “everything about Wikidata, the free, collaboratively created database of structured data” organized by Wikimedia Deutschland and Wiki Movimiento Brazil from 29 to 31 October. In the session on ELE, questions about the challenges, needs and expectations for the future of Language Technology for under-resourced languages and the role of policymakers for the preservation of European languages through Wikimedia projects were posed to the conference participants. At the upcoming CEE Wikimedia Conference from 5 to 7 November, Maria will host another session on ELE, presenting the current state of the project such as the preliminary definition of Digital Language Equality, the primary survey results and the languages involved. The conference, organized by members of Wikimedia from Austria, Poland, Greece, Russia and other countries, is the yearly meeting of Wikimedians from Central and Eastern Europe centred on Wikimedia projects. The ELE session will involve group discussions with community members about challenges and problems when working with technologies in and for their languages.

On to further questions: Do you know whether and how your library uses digital search tools, translation technology, speech recognition or spell checkers? And how would language equality affect such services? These and many more questions will be posed at the workshop on “Achieving Digital Language Equality 2030: Implications for Libraries, Collections, and Library Users” hosted by the Association of European Research Libraries (LIBER), a partner of the ELE project. The online workshop takes place on 18 November and presents speakers from ELE and practitioners working with language technology, painting a picture of what a future entailing language equality throughout Europe looks like. Participants can learn about the work of ELE and the potential of digital language equality in the library sector, but also contribute input and discuss how digital language equality would affect them. Registration is now open!

The Catalan language is known for its cultural and political importance, but only two percent of the European Union’s population is able to speak it, and less than one percent are native speakers of this Romance language, which is used along the Spanish and French Mediterranean coast. The gap to the Spanish national language, spoken by 17 percent of the EU’s population, is immense and likely to grow larger in the digital world, where resources are typically focused on larger audiences. To increase the technological support for Catalan, members of the ELE project working at the Barcelona Supercomputing Center (BSC) and other NLP researchers have founded a new community: NLP CommuniCat, which aims to unite the people working with language technology in Catalan and foster exchange between them. More than 40 participants soon gathered in the Slack chat group, while the NLP CommuniCat Twitter account gained more than 220 followers within a single month. The initiative to combine research efforts and find synergies between experts working with Natural Language Processing and Language Technology is a great example of how digital inequality between languages can be fought through a joint effort.
 
Coming up

November 5-7: CEE Wikimedia Conference (online)

November 15-17: META-FORUM 2021 (online)

November 16: National ELG Workshop: Denmark (online)

November 18: LIBER Workshop: Achieving Digital Language Equality 2030 (online)

November 18-19: Symposium 2021: “Machine-Based Cataloguing Processes”, German National Library (online)

The ELE consortium: partner presentation

ILSP / Athena RC

Athena RC is a Research and Innovation Centre in Informatics and Computational Sciences. Athena RC serves the full spectrum of the research lifecycle, starting from basic and applied research, through to system & product development and technology transfer & entrepreneurship. The key value of Athena RC lies in the unique collection of skills and know-how of its more than 300 researchers, associated faculty and collaborating scientists, participating in more than 200 R&D projects and producing more than 1000 publications in the last 5 years.

Athena RC operates in three cities in Greece (Athens, Patras and Xanthi) under the auspices of the General Secretariat for Research & Innovation, Ministry of Development & Investments. It comprises three research institutes and six special purpose units.

The Institute for Language and Speech Processing (ILSP) of the Athena RC is a leading R&D organisation in the area of Language Technologies and a centre of excellence for basic and applied research in the field. ILSP conducts interdisciplinary research, drawing on two traditions, linguistics and information technologies. Its activities cover all aspects of the Language Technologies spectrum, from linguistic studies to signal processing, with special focus on Natural Language Processing, Multilingual Content Processing, Speech and Music Technology, Sign Language Technologies and Embodied Language Processing. To complement and support its research efforts in these areas, ILSP continuously invests in developing its Language Resources Infrastructure. 

The Natural Language Processing and Language Infrastructures (NLPLI) department of ILSP, led by Stelios Piperidis, is a core partner of the European Language Grid (ELG) and European Language Equality (ELE) projects, having assumed, among other things, responsibility for the design, implementation and operation of the ELG Catalogue.

Stelios Piperidis: “At ILSP we feel very proud and honored to participate in and lead important tasks of the ELG and ELE projects. Building on the common achievements, knowledge and wisdom of the language technology community in infrastructural issues, we are developing the ELG platform, its components and underlying policies keeping in focus the evolving requirements of both providers and consumers. ELG aspires to pave the way towards digital language equality and support the ELE agenda by providing, among others, mechanisms to improve and continuously monitor the technological readiness of European languages.”

Next edition

The next ELT newsletter will be sent out on 7 December 2021. Until then, follow our ELT social media accounts (as linked below) for the latest news! 


Want to learn more? Visit https://european-language-technology.eu 
or contact us directly.
Website
YouTube
Twitter
LinkedIn
Copyright © 2021 ELE and ELG Consortium, All rights reserved.
The European Language Grid is an initiative funded by the European Union’s Horizon 2020 programme under grant agreement № 825627 (ELG).
The European Language Equality Project has received funding from the European Union under the grant agreement № LC-01641480 – 101018166 (ELE).