Newsletter #26 – April 2023
Dear reader,
The international Language Technology news landscape continues to be shaped by the growing mainstream presence of competing LLMs, the release of GPT-4, and the many questions about their capabilities and potential pitfalls. Still, Europe and its languages remain our focus, and we have plenty of interesting news to share!
We want to remind you again that the registration for META-FORUM 2023 is open! The conference will take place on 27 June in Brussels and present the final results of the European Language Equality project, among many other things. Please save the date and register your attendance if you want to participate in the final ELE conference.
The Center for Advanced Internet Studies (CAIS) has announced a call for applications (CfA) that offers funding for research sabbaticals and interdisciplinary collaborations.
In our ELG Tools and Resources section, we’re introducing a Spanish language tool that is able to detect sensitive information in documents, extract sensitivity indicators and provide a sensitivity score.
We’re also taking a look at two selected FSTP projects: Computing facilities for LT, a project by the Faculty of Humanities and Social Sciences at the University of Zagreb, and Multilingual and Mixed Language Data for Inclusive Speech Technology by the Universiteit Gent.
In the section “From the SRIA”, we present the Text analytics and NLP Recommendations included in the Strategic Research, Innovation and Implementation Agenda and Roadmap.
With best regards
Georg Rehm
Language Technology and NLP in the news
- “The inside story of how ChatGPT was built from the people who made it” – MIT Technology Review, 3 March 2023
- “Romania Introduces AI Government Adviser Ion” – Voicebot, 3 March 2023
- “ChatGPT broke the EU plan to regulate AI” – Politico, 3 March 2023
- “Meet the companies trying to keep up with ChatGPT” – The Verge, 5 March 2023
- “EU AI Act: ChatGPT stirs up legal debate on generative models” – TechHQ, 7 March 2023
- “Noam Chomsky: The False Promise of ChatGPT” – The New York Times, 8 March 2023
- “10 unusual facts about European languages you didn't know” – The Brussels Times, 10 March 2023
- “Using AI to predict the Oscars (and maybe even save humanity)” – VentureBeat, 11 March 2023
- “GPT-4 Has Arrived — Here’s What You Should Know” – Medium, 14 March 2023
- “How Our Native Language Shapes Our Brain Wiring” – Neuroscience News, 17 March 2023
- “Google releases Bard, a competitor to ChatGPT, Claude and Bing Chat” – VentureBeat, 21 March 2023
- “Explainer: What is the European Union AI Act?” – Reuters, 22 March 2023
- “The criminal use of ChatGPT – a cautionary tale about large language models” – Europol, 27 March 2023
- “How ChatGPT and Bard Performed as My Executive Assistants” – The New York Times, 29 March 2023
- “Open letter calling for AI ‘pause’ shines light on fierce debate around risks vs. hype” – VentureBeat, 29 March 2023
CfA Funding for Research Sabbaticals and Cooperations at CAIS
Looking for funding to support a research sabbatical or an interdisciplinary collaboration for your language technology projects? Consider applying for a fellowship or working group at the Center for Advanced Internet Studies (CAIS) in Bochum, Germany. As a fellow, you'll receive financial support for your sabbatical leave and research expenses, as well as a comfortable apartment during your three- or six-month stay. Alternatively, working groups of up to twelve members can come together for joint projects for up to three weeks, with travel and accommodation expenses covered by CAIS. The next application deadline is 30 April 2023, and the program is open to scholars and practitioners at all career stages and in all disciplines.
For more information and to apply, visit https://www.cais-research.de/en/cais-college/fellowships/ or https://www.cais-research.de/en/cais-college/working-groups/.
If you have any questions, contact esther.laufer@cais-research.de.
Selected new tools and resources on the European Language Grid
FARO – FARO is a Spanish language tool that detects sensitive information in an organisation's documents. It is designed for small companies and individuals who want to track sensitive documents inside an organisation but who might lack the time and resources to configure complex data protection tools. FARO extracts sensitivity indicators from documents (e.g. document IDs, monetary quantities, personal emails) and assigns each document a sensitivity score (from low to high) based on the frequency and type of the indicators it contains.
Our next conference – META-FORUM 2023 – will take place on 27 June in Brussels, Belgium. We will present the final results of the European Language Equality project and discuss all kinds of topics touching upon language technologies, language resources, language-centric AI and especially digital language equality. We will talk about the future of the sector and also present the new ELE Book. You can register for free here.
We also want to introduce another two selected FSTP projects:
Computing facilities for LT is a project by the Faculty of Humanities and Social Sciences at the University of Zagreb. It studies how well existing High-Performance Computing (HPC) services can support Language Technology and NLP. The project enumerates the HPC services available in the EU and analyses each platform in detail, covering capacity (including GPU capacity), accessibility and access protocols, hardware specifications, software support, helpdesk support, and compatibility with large neural models. Its output is a detailed report on the HPC infrastructure available in the EU, along with an interactive website that recommends HPC instances based on user requirements and serves as a point of contact for those interested in using HPC platforms. By mapping existing HPC initiatives and their technical details, the project helps users understand and choose suitable HPC services. It targets not only SMEs but also European researchers working on NLP/LT problems, promoting digital language equality.
The project Multilingual and Mixed Language Data for Inclusive Speech Technology by the Universiteit Gent focuses on collecting conversational and mixed-language data from a multilingual immigrant community in Belgium, a country with three official languages (Dutch, French, German) and a significant Turkish immigrant population. Most speech technologies are built on monolingual assumptions, which leaves them unable to handle multilingual and mixed-language communication. By collecting and transcribing multilingual (Turkish, Dutch, English) speech data, the project aims to develop language technologies that serve the needs and preferences of multilingual community members. The collected speech and transcribed textual data can be used to develop new speech technologies, make dialectal comparisons, and set standards for collecting and transcribing multilingual data for immigrant and underrepresented communities across Europe. For the data collection, Turkish-Dutch bilingual students are recruited and compensated for participating in naturalistic conversations that are audio-recorded. Since no software exists for automatically transcribing the multilingual data, multilingual student assistants will be recruited to transcribe it manually using the Praat software. The project will also develop transcription guidelines for multilingual data and share them with the community as an example for future studies.
Research Topic: Text Analytics and NLP Recommendations
The SRIA recommendations for Text Analytics and Natural Language Understanding include increasing the adoption of self-supervised, zero-shot, and few-shot learning approaches. They also emphasise supporting research that integrates speech, NLP, contextual information, and additional modes of perception. Strengthening basic research in neurosymbolic approaches to NLP/NLU, such as grounding and using human-understandable databases, is also recommended. Creating large open-access language models for all European languages, datasets, multilingual models, and models incorporating symbolic knowledge and discourse features is encouraged. The recommendations also advocate for progress in reinforcement-based learning, novel dialogue management strategies, situation-aware natural language generation, and interdisciplinary research to better model multimodal environments.
You can read more about all SRIA recommendations here or take a look at the full document.
If you would like to voice your support for the ELE Programme and its vision of achieving digital language equality in Europe by 2030, please consider filling out the endorsement form by clicking the button below to become a listed supporter on the ELE website:
- Workshop on Profiling Second Language Vocabulary and Grammar, 20-21 April, Gothenburg, Sweden
- 10th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 21-23 April, Poznań, Poland
- 3rd International Conference ‘Language in the Human-Machine Era’ (LITHME), 15-16 May, Groningen, Netherlands
- META-FORUM 2023, 27 June, Brussels, Belgium
If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.
The next ELT newsletter will be sent out on 2 May 2023. Until then, follow our ELT social media accounts (as linked below) for the latest news!