“The results of our research are very alarming. Most European languages are under-resourced and some have been completely neglected. This means that a good many of our languages are not future-proof,” says Prof. Hans Uszkoreit, coordinator of META-NET, scientific director at DFKI (German Research Center for Artificial Intelligence) and co-editor of the study. The study's other co-editor, Dr. Georg Rehm (DFKI), adds: “There are enormous differences in the level of language technology support available across the various European languages and technology areas. The gap between the ‘big’ and ‘small’ languages is still widening. We have to make sure that we make the basic technologies available to the under-resourced languages. If we do not, nothing awaits these languages but digital extinction.”
Language technology is the field that produces software for processing spoken and written language. Among the best-known examples of language technology software are spelling and grammar checkers, interactive personal assistants on smartphones (such as Siri on the iPhone), dialogue systems that work over the phone, automatic translation systems, online search engines, and the synthetic voices used in car navigation systems. Today language technology systems rely primarily on statistical methods, which need an enormous body of written or spoken data in order to work. It is very difficult to collect the amount of data required, especially for languages with a small number of native speakers. In addition, the quality of statistical language technology systems is far from perfect, as is often shown by the amusing translations that come from automatic translation systems.
Europe has succeeded in getting rid of almost every border between its countries. One border remains clear, however, and it appears to be impenetrable: the invisible border of language barriers, which hinders the free flow of information. It also harms the long-term goal of creating a single digital market, because it hinders the free flow of goods, products and services. Although language technology could potentially remove language barriers by means of modern automatic translation systems, the results of the META-NET study clearly show that not many of Europe's languages are ready yet. There are significant gaps in the technology because most R&D focuses on English, because commitment and financial resources are lacking, and because the research and technology vision lacks clarity.
A coordinated, large-scale effort is needed in Europe to create the missing technologies from scratch and to transfer existing technologies to the majority of languages. There are strong reasons for tackling this enormous challenge as a community effort involving the European Union, its member states and associated countries, together with industry. These reasons include the high per-capita financial burden on small language communities; the need to transfer technologies between different languages; the lack of interoperability among resources, tools and services; and the fact that language borders and political borders do not always fall in the same place. Europe must take action to prepare its languages for the digital age. They are a precious component of our cultural heritage and, as such, they must be protected for the time ahead. On 26 September the Council of Europe celebrates the European Day of Languages, a day that draws attention to the importance of fostering and developing the valuable linguistic and cultural heritage of our continent. The work being done by META-NET is a strong reminder of the challenges and possibilities that lie ahead for our linguistic heritage in the Information Age.
Language Technology: Background
Language technology already helps us with the daily tasks we carry out, such as writing e-mails or buying tickets. We make use of language technology when we search for and translate web pages, use the spelling and grammar checkers in word processors, give spoken commands to the entertainment systems in our cars or to our mobile phones, get recommendations from online bookshops, or follow the directions that a mobile navigation app calls out. In the time ahead, we will be able to talk to computer programs as well as to machines and other appliances, including the robots that will serve us in our homes and workplaces. Wherever we are, when we need information, all we will have to do is ask for it out loud. The world we live in will change when the communication barrier between people and technology is removed.
Language technology is recognised as one of the main growth sectors in information technology today. Large international corporations such as Google, Microsoft, IBM and Nuance have invested significantly in this sector. In Europe there are hundreds of small and medium-sized enterprises that have focused on particular language technology applications or services. Language technology enables people to collaborate, learn, do business and share knowledge with one another regardless of language borders and independently of their computer skills.
The META-NET White Paper Series
The META-NET White Paper series, entitled “Languages in the European Information Society”, reports on the state of 30 European languages with respect to Language Technology and explains the most urgent risks and opportunities associated with them. The series covers every official language of the EU Member States as well as several other languages spoken in Europe. Although a number of valuable and comprehensive studies have been carried out on particular aspects of languages and technology, there is no accessible summary that presents the research findings and the challenges for each language in the context of a multilingual Europe with appropriate technological support. The META-NET White Paper Series fills this gap. META-NET is now able to show why most of the languages face significant problems and to identify the main threats. More than 200 authors and contributors helped to prepare the Language White Paper series.
The white papers were written for the following European languages: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian (bokmål and nynorsk), Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, and Swedish. Each Language White Paper is written in the language that the report addresses, and all of them include a complete English translation.
About META-NET and META
META-NET, a European Network of Excellence comprising 60 research centres in 34 countries, is dedicated to beginning to build a technological foundation for a multilingual European information society. META-NET is co-funded by the European Commission through four projects in total.
META-NET is shaping META, the Multilingual Europe Technology Alliance. More than 600 organisations from 55 different countries, including research centres, universities, small and medium-sized companies as well as large enterprises, have joined this open technology alliance.
At Least 21 European Languages in Danger of Digital Extinction
Study by Europe’s Leading Language Technology Experts Warns Most European Languages Unlikely to Survive in the Digital Age
Most European languages face digital extinction, a new study by Europe’s leading Language Technology experts finds. Assessing the level of support through language technology for 30 of the approximately 80 European languages, the experts conclude that digital support for 21 of the 30 languages investigated is “non-existent” or “weak” at best. The study was carried out by META-NET, a European network of excellence that consists of 60 research centres in 34 countries.
The study, prepared by more than 200 experts and documented in 30 volumes of the META-NET White Paper Series (available both online and in print), assessed language technology support for each language in four different areas: automatic translation, speech interaction, text analysis and the availability of language resources. The experts placed 21 of the 30 languages (70%) in the lowest category, “support is weak or non-existent”, for at least one area. Several languages, for example Icelandic, Latvian, Lithuanian and Maltese, receive this lowest score in all four areas. The Irish language receives the lowest score for technological readiness in all but one area. At the other end of the spectrum, surprisingly, no language was considered to have “excellent support”. Even English is assessed as having only “good support”, followed by languages such as Dutch, French, German, Italian and Spanish with “moderate support”. Languages such as Basque, Bulgarian, Catalan, Greek, Hungarian and Polish exhibit “fragmentary support”, which also places them in the set of high-risk languages.
“The results of our study are most alarming. The majority of European languages are severely under-resourced and some are almost completely neglected. In this sense, many of our languages are not yet future-proof,” says Prof. Hans Uszkoreit, coordinator of META-NET, scientific director at DFKI (German Research Center for Artificial Intelligence) and co-editor of the study. The study’s other co-editor, Dr. Georg Rehm (DFKI), adds: “There are dramatic differences in language technology support between the various European languages and technology areas. The gap between ‘big’ and ‘small’ languages still keeps widening. We have to make sure that we equip all smaller and under-resourced languages with the needed base technologies, otherwise these languages are doomed to digital extinction.”
The field of language technology produces software that can process spoken or written human language. Well-known examples of language technology software include spell and grammar checkers, interactive personal assistants on smartphones (such as Siri on the iPhone), dialogue systems that work over the phone, automatic translation systems, web search engines, and synthetic voices used in car navigation systems. Today language technology systems rely primarily on statistical methods that require very large amounts of written or spoken data. For languages with relatively few speakers in particular, it is difficult to acquire the needed mass of data. Furthermore, statistical language technology systems have inherent limits in their quality, as can be seen, for example, in the often amusing incorrect translations produced by online machine translation systems.
Europe has succeeded in removing almost all borders between its countries. One border still exists, however, and it seems to be impenetrable: the invisible border of language barriers, which hinders the free flow of knowledge and information. It also harms the long-term goal of establishing a single digital market because it hinders the free flow of goods, products, and services. While language technology has the potential to get rid of language barriers through modern machine translation systems, the results of the META-NET study clearly show that many European languages are not yet ready. There are significant gaps in technology due to the English-language focus of most R&D, a lack of commitment and financial resources, and also the lack of a clear research and technology vision.
A coordinated, large-scale effort has to be made in Europe to create the missing technologies as well as transfer technology to the majority of languages. There are strong reasons for approaching this immense challenge in a community effort involving the European Union, its member states and associated countries, as well as industry. These reasons include the high per-capita financial burden for smaller language communities; the needed transfer of technologies between languages; the lack of interoperability of resources, tools, and services; and the fact that linguistic borders often do not coincide with political borders. Europe must take action to prepare its languages for the digital age. They are a precious component of our cultural heritage and, as such, they deserve future-proofing. On September 26 the Council of Europe marks the European Day of Languages, a day which recognises the importance of fostering and developing the rich linguistic and cultural heritage of our continent. META-NET’s work is a stark reminder of the challenges and possibilities facing our linguistic heritage in the information age.
Language Technology: Background
Language technology already supports us in everyday tasks, such as writing e-mails or buying tickets. We benefit from language technology when searching for and translating web pages, using a word processor’s spell and grammar checking features, operating our car’s entertainment system or our mobile phone with spoken commands, getting recommendations in an online bookstore, or following the instructions spoken by a mobile navigation app. In the near future, we will be able to talk to computer programs as well as machines and appliances, including the long-awaited service robots that will soon enter our homes and workplaces. Wherever we are, when we need information, we will simply ask for it, and, when we need help, we will demand it out loud. Removing the communication barrier between people and technology will change our world.
Language technology is generally acknowledged today as one of the key growth areas in information technology. Large international corporations such as Google, Microsoft, IBM, and Nuance have invested substantially in this area. In Europe, hundreds of small and medium enterprises have specialized in certain language technology applications or services. Language technology allows people to collaborate, learn, do business, and share knowledge across language borders and independently of their computer skills.
The META-NET White Paper Series
The META-NET White Paper series “Languages in the European Information Society” reports on the state of 30 European languages with respect to Language Technology and explains the most urgent risks and opportunities. The series covers all official EU Member State languages and several other languages spoken in Europe. While there have been a number of valuable and comprehensive scientific studies on certain aspects of languages and technology, until now there has been no generally understandable compendium that presents the main findings and challenges for each language with regard to a technology-supported multilingual Europe. The META-NET White Paper Series fills this gap. META-NET can now show why most languages face serious problems and pinpoint the most threatening gaps. In total, more than 200 authors and contributors helped prepare the Language White Papers.
The white papers were written for the following European languages: Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian (bokmål and nynorsk), Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, and Swedish. Each Language White Paper is written in the language it reports upon and includes a complete English translation.
About META-NET and META
META-NET, a Network of Excellence consisting of 60 research centres from 34 countries, is dedicated to building the technological foundations of a multilingual European information society. META-NET is co-funded by the European Commission through a total of four projects.
META-NET is forging META, the Multilingual Europe Technology Alliance. More than 600 organisations from 55 countries, including research centres, universities, small and medium companies as well as several big enterprises, have already joined this open technology alliance.