
Newsletter #28 – June 2023

Dear reader,

With META-FORUM 2023 just around the corner, we’re excited to meet you in person on 27 June in Brussels to conclude our European Language Equality (ELE) project and to publicly unveil the ELE book. If all goes according to plan, all conference participants will receive a copy of the book.

In this month’s newsletter, we’re providing insights into some of the conference topics. If you haven’t already done so, you can check out the full programme on the ELE website. If you’d like to participate, you can register free of charge.

As our monthly ELG resource, we’re featuring the NGT-Dutch Hotel Review Corpus that provides a parallel corpus of Dutch and Sign Language of the Netherlands.

In the section “From the SRIA”, we’re taking a look at the vision and recommendations for Data and Knowledge.

If you want to stay up to date with the latest developments around generative AI technology in Europe and beyond, also take a look at our curated press review section.

With best regards

Georg Rehm

Subscribe to the Common European Language Data Space (LDS) Newsletter

The European Language Data Space initiative, which started in January 2023, recently launched its monthly newsletter, providing information on the latest developments in secure, privacy-preserving language data sharing and use across Europe.

We’d like to invite you to subscribe to the newsletter for updates on LDS implementation, success stories, events, and more!

Language Technology and NLP in the news

“These are the European companies rivalling ChatGPT to change the face of AI” – Euronews, 8 May 2023
“Who killed the EU’s translators?” – POLITICO, 12 May 2023
“On Device AI – Double-Edged Sword” – SemiAnalysis, 13 May 2023
“Europe takes aim at ChatGPT with what might soon be the West’s first A.I. law. Here’s what it means” – CNBC, 15 May 2023
“The race to bring generative AI to mobile devices” – Financial Times, 15 May 2023
“OpenAI readies new open-source AI model, The Information reports” – Reuters, 16 May 2023
“Why AI’s diversity crisis matters, and how to tackle it” – Nature, 19 May 2023
“G7 calls for developing global technical standards for AI” – Yahoo! Finance, 20 May 2023
“Generative AI: Europe’s glaring absence” – Innovation Origins, 21 May 2023
“The Horrific Content a Kenyan Worker Had to See While Training ChatGPT” – Slate, 21 May 2023
“The Dire Defect of ‘Multilingual’ AI Content Moderation” – Wired, 23 May 2023
“Google to work with Europe on stop-gap ‘AI Pact’” – TechCrunch, 24 May 2023
“OpenAI warns over split with Europe as regulation advances” – Financial Times, 25 May 2023
“Sam Altman shares his optimistic view of our AI future” – TechCrunch, 26 May 2023
“Top AI researchers and CEOs warn against ‘risk of extinction’ in 22-word statement” – The Verge, 30 May 2023

Social media highlights

Are AI ethics dead? Alberto Romero offers some suggestions.
Why OpenAI's 'moat' is a fortress in the world of AI, despite skeptics and leaked memos.
Interesting analysis of Google's AI efforts, its response to competition, and the potential implications of AI for the tech industry as a whole.
A new report from CDT examines new LLMs that companies claim can analyse text across languages.
Joe Scott explores the rapidly changing AI landscape.

Selected new tools and resources on the European Language Grid

NGT-Dutch Hotel Review Corpus – This month’s selected resource is making its second appearance in our newsletter. A few months back, we introduced it as one of the selected FSTP projects. Now that the project is finished, the results are available on the ELG website as a parallel corpus of hotel reviews in written English, written Dutch, and Sign Language of the Netherlands (NGT) videos.

META-FORUM 2023

META-FORUM 2023 will take place on 27 June in Brussels, Belgium. We will present the final results of the European Language Equality (ELE) project and discuss all kinds of topics touching upon language technologies, language resources, language-centric AI and especially digital language equality. We will talk about the future of the sector and also present the new ELE Book. You can register for free here.

The final reports of the finished FSTP projects will be presented in Session 3 of META-FORUM 2023. After a first overview of the pilot projects, each representative is going to present their project’s results individually.

Session 4 will focus on European Large Language Models and will feature several speakers. Jussi Karlgren from Silo.AI (Sweden) will share insights on industrial language models for a multilingual Europe. Pedro Ortiz (DFKI, Germany) will talk about the development of multilingual large language models. The presentation by Barry Haddow (University of Edinburgh, UK) will focus on the EU project High Performance Language Technologies (HPLT). Michael Granitzer (University of Passau, Germany) will discuss European web crawls and Large Language Models in the context of the OpenWebSearch EU project, which aims to promote Europe's independence in web search and create an open and human-centred search engine market. The project seeks to develop a European Open Web Index (OWI) and an open Web Search and Analysis Infrastructure (OWSAI) based on European values, principles, legislation, ethics, and standards. The session will conclude with a question and answer segment, allowing participants to discuss the topics presented.

To have a look at the full programme, featured topics, and speakers for all the sessions, check out the META-FORUM 2023 section on the ELE website.

From the SRIA

Research Topic: Data and Knowledge

The availability of suitable language data is crucial for training and evaluating advanced Language Technology tools, especially in deep-learning paradigms where the size of the training dataset directly affects tool quality. However, the current lack of parity in language resources contributes to digital language inequalities, which vary across the EU due to factors like the number of speakers, commercial interest, and data accessibility restrictions. Untapped potential exists in quality language data within EU public sectors, particularly in domains like medical, health, pharmaceutical, legal, finance, insurance, science, manufacturing, publishing, and others. The scarcity of data, along with the need for annotated and labelled data, poses challenges and costs for both the research and industry communities. Research is needed to develop faster, cheaper, and more reliable methods for generating multilingual datasets. Furthermore, efforts like the FAIR Data Principles movement (Findability, Accessibility, Interoperability, and Reuse of digital assets) and the EU's Data Spaces initiative aim to address data availability issues.

You can read more about all SRIA recommendations here or take a look at the full document.

If you would like to voice your support for the ELE Programme and its goal and vision to achieve digital language equality in Europe by 2030, please consider filling out the endorsement form by clicking the button below and becoming a listed supporter on the ELE website:

Click here to endorse the ELE SRIA

Upcoming Events

Large Language Models: Overview – Limitations – Opportunities, 6 June, Online
META-FORUM 2023, 27 June, Brussels, Belgium
1st European Summer School on Artificial Intelligence (ESSAI) & 20th Advanced Course on Artificial Intelligence (ACAI), 24 July - 28 July, Ljubljana, Slovenia
34th European Summer School in Logic, Language and Information (ESSLLI), 31 July - 11 August, Ljubljana, Slovenia

If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.

Next edition

The next ELT newsletter will be sent out on 4 July 2023. Until then, follow our ELT social media accounts (as linked below) for the latest news!

Want to learn more? Visit https://european-language-technology.eu or contact us directly.

The European Language Grid is an initiative funded by the European Union’s Horizon 2020 programme under grant agreement № 825627 (ELG).

The European Language Equality Project has received funding from the European Union under grant agreement № LC-01641480 – 101018166 (ELE).
