Newsletter #26 – April 2023
Dear reader,

The international Language Technology news landscape continues to be enriched by the growing mainstream presence of competing LLMs, the release of GPT-4, and the many questions regarding their capabilities and potential pitfalls. Still, Europe and its languages remain our focus, with plenty of interesting news to share!

We want to remind you again that the registration for META-FORUM 2023 is open! The conference will take place on 27 June in Brussels and present the final results of the European Language Equality project, among many other things. Please save the date and register your attendance if you want to participate in the final ELE conference.

The Center for Advanced Internet Studies (CAIS) has announced a call for applications (CfA), providing funding for research sabbaticals and interdisciplinary collaborations.

In our ELG Tools and Resources section, we’re introducing a Spanish language tool that is able to detect sensitive information in documents, extract sensitivity indicators and provide a sensitivity score.

We’re also taking a look at two selected FSTP projects: Computing facilities for LT, a project by the Faculty of Humanities and Social Sciences at the University of Zagreb, and Multilingual and Mixed Language Data for Inclusive Speech Technology by the Universiteit Gent.

In the section “From the SRIA”, we present the Text analytics and NLP Recommendations included in the Strategic Research, Innovation and Implementation Agenda and Roadmap.

With best regards

Georg Rehm
 
Language Technology and NLP in the news
Social media highlights
CfA Funding for Research Sabbaticals and Cooperations at CAIS

Looking for funding to support a research sabbatical or an interdisciplinary collaboration for your language technology projects? Consider applying for a fellowship or working group at the Center for Advanced Internet Studies (CAIS) in Bochum, Germany. As a fellow, you'll receive financial support for your sabbatical leave and research expenses, as well as a comfortable apartment during your three- or six-month stay. Alternatively, working groups of up to twelve members can come together for joint projects for up to three weeks, with travel and accommodation expenses covered by CAIS. The next application deadline is April 30, 2023, and the program is open to scholars and practitioners at all career stages and in all disciplines.

For more information and to apply, visit https://www.cais-research.de/en/cais-college/fellowships/ or https://www.cais-research.de/en/cais-college/working-groups/

If you have any questions, contact esther.laufer@cais-research.de.
Selected new tools and resources on the European Language Grid
FARO – FARO is a Spanish language tool that detects sensitive information in documents of an organisation. It is designed to be used by small companies and individuals that want to track sensitive documents inside an organisation but who might lack time and resources to configure complex data protection tools. FARO extracts sensitivity indicators from documents (e.g. document IDs, monetary quantities, personal emails) and provides a sensitivity score for the document (from low to high) using frequency and type of the indicators in the document.
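To make the idea of indicator-based scoring more concrete, here is a minimal Python sketch. It is not FARO's actual implementation: the regular expressions, the weights and the thresholds are all hypothetical and chosen purely for illustration. It only shows how the frequency and type of detected indicators could be combined into a score from low to high.

```python
import re

# Hypothetical indicator patterns (illustrative only, not FARO's real detectors).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "money": re.compile(r"\d+(?:[.,]\d+)*\s?(?:€|EUR|euros)"),
    "document_id": re.compile(r"\b\d{8}[A-Z]\b"),  # e.g. a Spanish DNI-like number
}

# Hypothetical weights: indicator types assumed to be more sensitive count more.
WEIGHTS = {"email": 1, "money": 2, "document_id": 3}


def extract_indicators(text):
    """Count how often each indicator type occurs in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def sensitivity_score(counts):
    """Combine frequency and type of indicators into a coarse label.

    The thresholds (3 and 6) are arbitrary assumptions for this sketch.
    """
    total = sum(WEIGHTS[name] * n for name, n in counts.items())
    if total >= 6:
        return "high"
    if total >= 3:
        return "medium"
    return "low"


if __name__ == "__main__":
    sample = "Contacto: ana@example.org. DNI 12345678Z. Importe: 1.200,50 EUR."
    counts = extract_indicators(sample)
    print(counts, sensitivity_score(counts))
```

A real tool would likely rely on more robust entity detection than regular expressions; the sketch only illustrates the scoring principle described above.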
 
General news

Our next conference – META-FORUM 2023 – will take place on 27 June in Brussels, Belgium. We will present the final results of the European Language Equality project and discuss all kinds of topics touching upon language technologies, language resources, language-centric AI and especially digital language equality. We will talk about the future of the sector and also present the new ELE Book. You can register for free here.

We also want to introduce another two selected FSTP projects:

Computing facilities for LT is a project by the Faculty of Humanities and Social Sciences at the University of Zagreb. It aims to study the feasibility of existing High-Performance Computing services in supporting Language Technology and NLP. It analyses various HPC setups, evaluating factors such as GPU capacity, access protocols, and compatibility with large neural models. The project's output is a detailed report on the HPC infrastructure available in the EU, along with an interactive website that recommends HPC instances based on user requirements. It maps existing HPC initiatives and their technicalities in order to help users understand and choose suitable HPC services. It targets not only SMEs but also European researchers working on NLP/LT problems, promoting digital language equality. The project will be carried out by enumerating various HPC services in the EU, thoroughly analysing each platform's aspects such as capacity, accessibility, hardware specifications, software support, and helpdesk support. Finally, a website will be created to display multiple HPC options based on user requirements, serving as a point of contact for those interested in using HPC platforms.

The project Multilingual and Mixed Language Data for Inclusive Speech Technology by the Universiteit Gent focuses on collecting conversational and mixed language data from a multilingual immigrant community in Belgium, a country with three official languages (Dutch, French, German) and a significant Turkish immigrant population. Most speech technologies are built on monolingual assumptions, which makes them unable to handle multilingual and mixed language communication. By collecting and transcribing multilingual (Turkish, Dutch, English) speech data, the project aims to develop language technologies that serve the needs and preferences of multilingual community members. The collected speech and transcribed textual data can be used to develop new speech technologies, make dialectal comparisons, and set standards for collecting and transcribing multilingual data for immigrant and underrepresented communities across Europe. For the data collection, Turkish-Dutch bilingual students will be recruited and compensated for participating in naturalistic conversations that are audio-recorded. Since no software exists for automatically transcribing the multilingual data, multilingual student assistants will be recruited for manual transcription using the PRAAT software. The project will also develop transcription guidelines for multilingual data and share them with the community as an example for future studies.
From the SRIA
Research Topic: Text Analytics and NLP Recommendations

The SRIA recommendations for Text Analytics and Natural Language Understanding include increasing the adoption of self-supervised, zero-shot, and few-shot learning approaches. They also emphasise supporting research that integrates speech, NLP, contextual information, and additional modes of perception. Strengthening basic research in neurosymbolic approaches to NLP/NLU, such as grounding and using human-understandable databases, is also recommended. Creating large open-access language models for all European languages, datasets, multilingual models, and models incorporating symbolic knowledge and discourse features is encouraged. The recommendations also advocate for progress in reinforcement-based learning, novel dialogue management strategies, situation-aware natural language generation, and interdisciplinary research to better model multimodal environments.

You can read more about all SRIA recommendations here or take a look at the full document.
If you would like to voice your support for the ELE Programme and its goal and vision to achieve digital language equality in Europe by 2030, please consider filling out the endorsement form by clicking the button below and become a listed supporter on the ELE website:
Click here to endorse the ELE SRIA
Upcoming Events

If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.
 

Next edition

The next ELT newsletter will be sent out on 2 May 2023. Until then, follow our ELT social media accounts (as linked below) for the latest news!


Want to learn more? Visit https://european-language-technology.eu 
or contact us directly.
Website
YouTube
Twitter
LinkedIn
Copyright © 2022 ELE and ELG Consortium, All rights reserved.
The European Language Grid is an initiative funded by the European Union’s Horizon 2020 programme under grant agreement № 825627 (ELG).
The European Language Equality Project has received funding from the European Union under the grant agreement № LC-01641480 – 101018166 (ELE)