Newsletter #29 – July 2023
Dear reader,
This edition of our newsletter marks several special occasions! META-FORUM 2023 took place last week and we consider it a great success. We would like to thank all attendees, speakers, and everyone who made the event possible.
With the release of the ELE book, which was presented at META-FORUM 2023, this month also marks the end of the ELE 2 project, which started on 1 July 2022. Revisit the project’s results on the ELE website and in the open access ELE book.
Since the end of the project coincides with the anniversary of the end of the ELG project, we’re also taking a look at the current numbers of the European Language Grid and its resources.
Our curated news section is overflowing this month with European AI news, not least because of the debate on AI regulation and the AI Act.
Finally, the European Parliament’s Panel for the Future of Science and Technology published a technical feasibility study for the development of a European streaming platform for European national news accessible in all EU languages. The study was prepared by a team of 21 researchers from six different research labs of the German Research Centre for Artificial Intelligence (DFKI).
With best regards
Georg Rehm
|
|
Subscribe to the Common European Language Data Space (LDS) Newsletter
The European Language Data Space initiative, which started in January 2023, recently launched its monthly newsletter, providing information on the latest developments in secure, privacy-preserving language data sharing and use across Europe.
We’d like to invite you to subscribe to the newsletter for updates on LDS implementation, success stories, events, and more!
In case you missed CONNECT University’s and LDS’s workshop on Large Language Models on 6 June or want to revisit it, you can watch the entire stream on the DigitalEU YouTube channel:
- Introduction to LLM's – Philippe Gelin (Head of Sector "Multilingualism", European Commission)
- EC policies landscape on Language Technologies – Yvo Volman (Director G Data, European Commission)
- Overview, Basic Concepts, Capabilities – Georg Rehm, Pedro Ortiz, Malte Ostendorff (German Research Centre for Artificial Intelligence, DFKI, Germany)
- Limitations and Evaluation – Ondrej Bojar (Charles University, Czech Republic)
- Legal Aspects in a Nutshell – Mickael Rigault (Evaluations and Language Resources Distribution Agency, ELDA, France)
- Data Governance – Anna Rogers (IT University of Copenhagen, Denmark)
- Ethical Aspects and Biases – Karën Fort (LORIA, Sorbonne University, France)
- Enabling New and Boosting Existing Applications – Peter Sarlin (Silo.AI, Sweden)
|
|
Language Technology and NLP in the news
- “The Good, the Bad, and the Unexpected of Machine Translation in BLOOMChat” – Slator, 2 June 2023
- “Romania's prime minister has hired the world's first AI government adviser. What will it do?” – Euronews, 3 June 2023
- “Britain to host first global summit on artificial intelligence safety” – Reuters, 8 June 2023
- “Macron Polishes France’s AI Agenda in Meeting With Meta, Google” – Yahoo! Finance, 9 June 2023
- “UN chief backs idea of global AI watchdog like nuclear agency” – Reuters, 12 June 2023
- “Meta and Deepmind alumni raise €105m seed round to build OpenAI rival Mistral” – Sifted, 13 June 2023
- “Europeans Take a Major Step Toward Regulating A.I.” – New York Times, 14 June 2023
- “The Who, Where, and How of Regulating AI” – IEEE Spectrum, 14 June 2023
- “Germany’s BDÜ Issues Position Paper on AI’s Impact on the Translation Profession” – Slator, 14 June 2023
- “The three challenges of AI regulation” – Brookings, 15 June 2023
- “Google forced to delay Bard AI's EU launch over privacy concerns” – Engadget, 15 June 2023
- “EU Starts Machine Translating Press Content After ‘Little Housekeeping Announcement’” – Slator, 16 June 2023
- “France makes high-profile push to be the A.I. hub of Europe setting up challenge to U.S., China” – CNBC, 18 June 2023
- “Five big takeaways from Europe’s AI Act” – MIT Technology Review, 19 June 2023
- “UK’s ‘early access’ to OpenAI and DeepMind models is a double-edged sword” – The Next Web, 19 June 2023
- “France’s Bid to Become Europe’s AI Hub: A Potential Challenge for the U.S” – Unite.AI, 19 June 2023
- “Consumer group calls on EU to urgently investigate ‘the risks of generative AI’” – TechCrunch, 20 June 2023
- “OpenAI lobbied the EU to avoid harsher AI regulations” – The Verge, 20 June 2023
- “Current AI Language Models Fall Short of EU AI Legislation Requirements” – BlockchainReporter, 21 June 2023
- “Students switch to AI to learn languages” – BBC, 22 June 2023
- “The EU still needs to get its AI Act together” – The Verge, 29 June 2023
|
|
A year ago, on 1 July 2022, the ELG project ended. For this anniversary, we want to take a look at its current numbers. The selection of tools and resources has been growing since, and currently features a total of:
Furthermore, we’re happy to report that ELG has more than 1,000 users – the number is constantly growing.
We’re excited to see the continued use of the platform as an important space for Language Technologies in Europe.
|
|
Selected new tools and resources on the European Language Grid
DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking – This month’s resource is DaMuEL, which contains data in 53 languages. DaMuEL consists of two components. The first is a knowledge base that contains language-agnostic information about entities, including their Wikidata claims and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED). The second comprises Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID serves as a persistent, language-agnostic identifier, which makes it possible to combine the knowledge base with the language-specific texts and information for each entity. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. It is published under the CC BY-SA licence.
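For readers curious how a shared QID lets the two DaMuEL components be combined, here is a minimal Python sketch. It does not use the actual DaMuEL file format or any official loader: the class names, field names, the `combine` function, and the sample records are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical language-agnostic knowledge base record."""
    qid: str
    entity_type: str  # e.g. PER, ORG, LOC


@dataclass(frozen=True)
class LanguageText:
    """Hypothetical language-specific record for one entity."""
    qid: str
    label: str
    description: str


def combine(knowledge_base, language_texts):
    """Join language-specific records with the knowledge base on the QID."""
    combined = {}
    for text in language_texts:
        entity = knowledge_base.get(text.qid)
        if entity is None:
            # No knowledge base entry for this QID, so nothing to combine.
            continue
        combined[text.qid] = {
            "type": entity.entity_type,
            "label": text.label,
            "description": text.description,
        }
    return combined


# Illustrative sample data, not taken from the dataset itself.
KNOWLEDGE_BASE = {
    "Q42": Entity(qid="Q42", entity_type="PER"),
    "Q64": Entity(qid="Q64", entity_type="LOC"),
}

GERMAN_TEXTS = [
    LanguageText(qid="Q42", label="Douglas Adams", description="britischer Schriftsteller"),
    LanguageText(qid="Q64", label="Berlin", description="Hauptstadt von Deutschland"),
    LanguageText(qid="Q0", label="Unbekannt", description="kein Eintrag in der Wissensbasis"),
]

if __name__ == "__main__":
    for qid, record in combine(KNOWLEDGE_BASE, GERMAN_TEXTS).items():
        print(qid, record)
```

Because the QID is the same in every language, the same knowledge base can be joined with the text records of any of the languages without a per-language mapping table.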
|
|
On 27 June, META-FORUM 2023 took place in Brussels, Belgium, with over 120 attendees, bringing together experts, stakeholders, and decision makers from academia, research, public administration and industry. At the conference, the ELE consortium members showcased key aspects of the project, placing emphasis on its outcomes such as the revised Strategic Research and Innovation Agenda (SRIA) and the roadmap aimed at achieving full digital language equality in Europe by 2030.
We want to thank everyone for attending, contributing, and participating in fruitful discussions at this year’s event.
Videos of all presentations and panels will be uploaded to YouTube in the coming weeks, so stay tuned!
In the meantime, if you want to take another look at the full programme, featured topics, and speakers for all the sessions, check out the META-FORUM 2023 section on the ELE website.
|
|
|
Photo: The ELE2 consortium and representatives of the European Commission
(27 June 2023, Brussels)
|
|
Infrastructure Recommendations
The SRIA presents several infrastructure recommendations to pave the way for achieving digital language equality in Europe. One crucial recommendation is to strengthen existing research infrastructures and develop new ones, including language technology platforms. These infrastructures will support research and development activities, facilitate collaboration among stakeholders, foster knowledge sharing, and promote open access to data and technologies. Another important aspect is ensuring sufficient operational capacity, particularly for LLMs. It is crucial to allocate resources and investments to meet the computational requirements of these models effectively. Additionally, addressing the identified gaps in data, language resources, and knowledge graphs is essential. Europe should establish a clear trajectory toward comprehensive and interlinked data infrastructures that can bridge these gaps, promoting accessibility and integration.
To support the infrastructure requirements, flexible access to GPU-based high-performance computing facilities and a more suitable computing infrastructure are needed. These resources will ensure researchers and practitioners have the necessary computational power to develop and deploy advanced language technologies.
Finally, it is necessary to create a European network of centres of excellence in LT. Such a network would serve multiple purposes, including increasing industry visibility, shaping national research agendas, and implementing a European Data Strategy.
You can read more about all SRIA recommendations here or take a look at the full document.
|
|
If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.
|
|
The ELT newsletter will take a short summer break now. In the meantime, follow our ELT social media accounts (as linked below) for the latest news!