onlyfacts24

Unlocking data to advance European commerce and culture

July 22, 2025

Editor’s Note: This blog is also available in Italian, Spanish, French, and German.

Europe is home to more than 200 languages and a rich cultural legacy that spans thousands of years, preserved in millions of cultural assets that tell the story of its people. But these languages are more than carriers of heritage and history—they support both culture and commerce by making it possible for people to connect, create, and do business.

Yet, as the world digitizes, much of Europe’s linguistic and cultural diversity risks being left behind. The majority of web content, the primary source of training data for today’s Large Language Models (LLMs), is in English, and much of it reflects an American perspective. The European Commission has warned that the continent’s ambition to digitize its vast cultural corpus remains “significantly out of reach.” As Europe’s leaders have recognized, without urgent action, this imbalance is not just a cultural concern but a commercial one. AI that doesn’t understand Europe’s languages, histories, and values can’t fully serve its people, its businesses, or its future.

That’s why today in Paris, we’re deepening our commitment to Europe’s digital future with two new initiatives focused on making what’s uniquely European more open and accessible—its languages and culture. This builds on our European Digital Commitments, announced earlier this year, to expand AI and cloud infrastructure, strengthen digital resilience and data privacy protections, enhance cybersecurity, and support Europe’s digital sovereignty and broader economy.

First, to support the development of more multilingual LLMs in Europe and for Europe, we’re basing employees from two of our innovation centers in Strasbourg, France—long a crossroads of cultures and now home to key European institutions. These centers will help expand the availability of multilingual data for AI development—leveraging Microsoft Azure, our technical expertise, and partnerships across Europe to promote more inclusive language representation in AI models. As part of this effort, we’re also issuing a call for proposals to help expand the supply of digital content for 10 European languages.

Second, to help ensure Europe’s cultural richness is represented and accessible in the digital realm, we’re expanding Microsoft’s Culture AI initiative, which helps to safeguard languages, landmarks, and artifacts through digital replicas and data collaboration. Since 2019, Microsoft has digitally preserved heritage including Ancient Olympia in Greece, Mont Saint-Michel in France, St. Peter’s Basilica in Rome, and the 80th Anniversary of the Allied Beach Landings in Normandy, to name a few. Today we’re announcing that this fall, Microsoft will begin work with the French Ministry of Culture and the French firm Iconem to create a digital replica of Notre Dame, Paris’ newly restored, 862-year-old Gothic masterpiece.

This type of support for Europe and its diversity is not new to Microsoft. These latest steps to support languages and culture are informed by our more than 40 years of experience serving countries and cultures across Europe and around the world. Early on, we learned that empowering every person on the planet requires that the technologies we offer be available in the languages the world speaks. That is why today Windows supports over 90 languages, including all official European Union languages as well as Basque, Catalan, Galician, Luxembourgish, Valencian, and more. Microsoft 365 also has a broad reach, with support through Office applications in more than 30 European languages, including all official languages of the European Union.

The urgency of bridging the language gap

The European Union has 24 official languages, with dozens more acknowledged at the national or regional level. Yet many of these languages—even those that are part of the official 24, like Danish, Finnish, Swedish, and Greek—represent less than 0.6% of web content. Others, such as Maltese, Irish, Estonian, Latvian, and Slovenian, are barely visible online. While only 5% of the world’s population speaks English as a first language, English text makes up half of web content, dominating the data used to train AI models.

Circular chart illustrating the representation of European languages on the internet. English occupies the largest segment, followed by Spanish, French, German, Russian, Portuguese, and other languages. The chart is titled 'How well represented is each European language on the internet'.

This digital underrepresentation has real consequences, as LLMs rely heavily on web content for training. When a language lacks sufficient online presence, it risks being excluded from future AI services. While larger, general-purpose models can handle multiple languages, they can still miss the linguistic nuance, cultural context, and regional depth needed for truly inclusive applications. LLMs trained on limited data are less accurate, produce more hallucinations and errors, struggle with vocabulary, and reflect more bias.[1]

As an example, Llama 3.1, a popular open source model, shows a performance gap of more than 15 percentage points between answering in English and Greek and a gap of more than 25 points when comparing English to Latvian. This means that if this model were a high school student, she would be at the top of her class in English but in the middle of her class in Greek and at the bottom in Latvian. And this disparity between languages is seen in all major LLM performance tests.[2]

Scatter plot titled 'GSM8K performance vs CommonCrawl for low resource European languages.' The x-axis shows Common Crawl Percentage, and the y-axis shows EU21-GSM8K Performance. Points represent languages such as Swedish, Danish, Romanian, Hungarian, and others. A dotted trend line with R² = 0.8156 indicates a strong positive correlation.
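For readers who want to see what the R² figure in the chart measures, here is a minimal sketch in Python. It computes the coefficient of determination for a straight-line fit, which for a single predictor equals the squared Pearson correlation. The function name `r_squared` and the sample numbers are illustrative assumptions; they are not the data behind the chart, and the chart's authors may have computed their value differently.

```python
def r_squared(xs, ys):
    """Return R² for a least-squares straight-line fit of ys on xs.

    With one predictor, R² equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; R² is undefined")
    return (sxy * sxy) / (sxx * syy)


# Made-up placeholder values, NOT the measurements shown in the chart.
toy_web_share = [0.1, 0.2, 0.4, 0.5, 0.8]       # hypothetical share of crawled text (%)
toy_benchmark = [30.0, 34.0, 41.0, 40.0, 52.0]  # hypothetical benchmark scores

print(f"R² on toy data = {r_squared(toy_web_share, toy_benchmark):.3f}")
```

A value near 1 means a straight line explains most of the variation in the scores; a value near 0 means it explains almost none.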

In many cases, languages with deep cultural heritage, such as Breton, Occitan, and Romansh, which UNESCO classifies as endangered, are largely unsupported in today’s mainstream AI systems.

The economic power of language

This lopsided development of language models has real economic consequences. When AI systems can’t understand or respond in a region’s language, they limit access to services and opportunities, undermining both local businesses and broader economic growth.

Broad AI diffusion—adoption and use across economies—will be one of the most important drivers of innovation and productivity growth over the next decade. Like electricity and other general-purpose technologies in the past, AI represents the next stage of industrialization.

For communities whose languages are underrepresented online, the benefits of AI risk remaining out of reach. Imagine a small business owner in Malta who speaks only Maltese. Currently, the advanced AI tools for tasks like market analysis or content generation likely don’t operate in Maltese, limiting how this entrepreneur can leverage AI. Or consider a Polish-speaking student in a town outside Warsaw who can’t find AI educational resources in his language, potentially impacting learning opportunities. And even when an AI platform nominally supports a language, the experience may be sub-par.

European governments and institutions have recognized the importance of addressing this situation. To drive economic competitiveness in the AI era, Europe will need to break down the language barriers and spur AI diffusion across the continent. According to the European Commission, only 13.5% of EU businesses use AI. The EU AI Continent Action Plan notes that breaking down language barriers in the single market could boost intra-EU trade by up to EUR 360 billion.

New steps to address language gaps

To help bridge this language gap, Microsoft will collaborate with European partners to increase the availability of multilingual data. In partnership with the ICube Laboratory at the University of Strasbourg—an institution dedicated to engineering, computer science, and imaging—we will support AI training efforts by placing personnel from the Microsoft Open Innovation Center (MOIC) and our AI for Good Lab in Strasbourg, France. This team will be backed by a global internal network of more than 70 Microsoft engineers, data scientists, and policy professionals. This collaboration between the MOIC, Microsoft AI for Good Lab, and the University of Strasbourg will also fund two post-doctoral researchers and provide up to US $1 million in Azure credits.

This team will start by tapping into Microsoft’s own store of multilingual data, making it accessible and transparent to the European public, including open source developers. This includes, for example, multilingual text data from GitHub and voice data sets. MOIC and GitHub will partner with Hugging Face, a popular collaboration platform for AI model development, to host and make the data broadly accessible. This builds on our existing relationship with Hugging Face to make a broad range of open models in the Hugging Face model collection available for 1-click deployment in the Azure Model Catalogue. This includes last week’s release of the latest contribution toward multilingual AI: the SmolLM3 model, a highly efficient 3B-parameter multilingual model with support for 6 languages: English, French, Spanish, German, Italian, and Portuguese.

MOIC will also partner with Common Crawl, one of the largest free and open repositories of web crawled data. MOIC will fund work at Common Crawl, leveraging native speakers to annotate and seed European language data in the publicly available Common Crawl data set.

In addition, the MOIC and the AI for Good Lab will issue a call for proposals to help expand the supply of digital content for 10 European languages by making their text collections available responsibly and ethically on their own terms for multilingual AI development and experiences. Applications for grants will be available on the AI for Good Lab website, beginning on 1 September 2025. In selecting recipients, the MOIC and the AI for Good Lab will focus on opportunities to unlock data in languages with relatively low representation in online content, such as Estonian, Alsatian, Slovak, Greek, and Maltese. Grants will provide recipients with Azure credits and engineering and technical support.

While more multilingual data is essential, better technology tools and know-how can also help. For example, many languages use scripts (writing systems) that currently pose challenges for models originally designed for the Latin alphabet. Cyrillic characters, the Greek alphabet, and Arabic’s cursive script each have different properties. Off-the-shelf “tokenizers” often break these scripts in suboptimal ways. This can hurt a model’s ability to learn long-range context or accurate spelling in those languages. New advances in techniques that enable a model to handle any script uniformly can help. Better mechanisms to create synthetic data and to better process and curate that data can also help, especially when they manage privacy and sensitive data concerns effectively.
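One part of the script problem can be shown with a small Python sketch. It assumes a tokenizer that falls back to raw UTF-8 bytes for text it has not learned to merge into larger units, which is a common design but not something this post specifies. Under that assumption, Latin letters cost one byte each, while Greek, Cyrillic, and Arabic letters cost two, so the same number of characters produces a longer input sequence. The helper name `utf8_cost` and the sample words are illustrative choices only.

```python
def utf8_cost(text):
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# Short sample words chosen for illustration; none contains combining marks.
samples = {
    "Latin": "peace",
    "Greek": "και",
    "Cyrillic": "мир",
    "Arabic": "سلام",
}

for script, word in samples.items():
    chars, nbytes = utf8_cost(word)
    print(f"{script:10} {chars} characters -> {nbytes} bytes")
```

Real tokenizers merge frequent byte sequences into single tokens, so the actual gap depends on how much text in each script the tokenizer saw during its own training. The byte counts above only indicate the starting disadvantage.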

The MOIC and the AI for Good Lab will work to facilitate the development and sharing of knowledge, tools, and capabilities to address these issues and empower European developers. The AI for Good Lab will publish a blueprint to detail how to create high-quality language datasets and train local LLMs to get more power out of the data that exists. These two groups will also support relevant research, organize convenings, co-invest in data commons projects, and ensure that knowledge, tools, and capabilities are available where they’re needed most. These teams also will continue to support efforts such as those of the Barcelona Supercomputing Center, Basque Center for Language Technology, and the University of Santiago de Compostela to release AI models trained in Spanish, Catalan, Basque, and Galician on Azure AI Foundry. This initiative empowers developers to build AI systems that operate in Spain’s official languages, fostering innovation and inclusivity.

Finally, to advance responsible AI research and help close the language gap, Microsoft is launching two new academic collaborations in Europe at the University of Strasbourg and IE University School of Science & Technology in Spain. Microsoft’s AI for Good Lab and MOIC will partner with the University of Strasbourg to provide Azure grants to support joint AI research. At IE University School of Science & Technology, the Microsoft AI for Good Lab will provide Azure grants to support joint research targeting low resource languages, including support for related capstone projects to accelerate new solutions focused on language and AI.

New steps to help digitally safeguard Europe’s cultural legacy

Since 2019, Microsoft’s Culture AI initiative has focused on using artificial intelligence around the world to help preserve the languages, places, stories, and artifacts that define human history. Powered by the AI for Good Lab and through partnerships with nonprofits, universities, governments, and cultural institutions, the initiative supports projects that digitize and protect cultural heritage, from endangered languages to iconic landmarks, including in France, Rome, and Greece. Whether it’s creating digital replicas of historic sites or making museum collections more accessible, the goal is to ensure that cultural identity and diversity are not only preserved but made more inclusive and discoverable in the digital age.

Today we are announcing our next project, building a digital replica in partnership with the French Ministry of Culture and the French firm Iconem. The project will create a digital twin of Notre Dame in Paris, an architectural and cultural landmark shaped over centuries. Construction of Notre Dame began in 1163 and continued for nearly 200 years, resulting in a 128-meter-long Gothic masterpiece with twin towers rising 69 meters above the Seine. After a devastating fire in 2019, Notre Dame re-opened to the public at the end of 2024. The project will use the technology and methods we developed with Iconem to create a digital twin of St. Peter’s Basilica last year, which was based on more than 400,000 photos and advanced AI algorithms, in partnership with the Vatican.

Just as last year’s project documented every detail of St. Peter’s for the Vatican, this new project will create a digital replica that permanently preserves every detail of Notre Dame in digital form, ensuring that its structure, story, and symbolism are protected and accessible for generations to come. By combining advanced imaging with AI, we will create and donate to the French State a digital twin that can be used by preservationists and displayed in the future Musée Notre Dame de Paris.

In addition to the project at Notre Dame, we are also announcing today a partnership with the Bibliothèque Nationale de France, in collaboration with Iconem, to digitize nearly 1,500 cinematic model sets from shows at the Opéra National de Paris between 1800 and 1914. The digitized model sets will be presented through interactive, educational experiences and exhibitions, and published as a dataset on the Bibliothèque Nationale de France’s Gallica platform for cultural AI and research projects.

Finally, we are embarking on new work with the Musée des Arts Décoratifs to make publicly accessible the detailed digital descriptions of approximately 1.5 million artifacts from the Middle Ages to the present day. This step will enable researchers in history, art history, and conservation to access this new information for study and use in their own AI-driven research.

Looking ahead: Taking a principled approach

We take these new steps today with humility and respect, recognizing that the preservation of Europe’s linguistic and cultural diversity is a task for Europeans, to be led by Europeans. The European Union has already launched a multi-state effort to pool EU language data and digitize all types of cultural heritage. Our role is to contribute to and support these and similar efforts. None of what we are announcing today will create any proprietary data or technology for Microsoft itself.

Ultimately, the best way to empower more people across Europe to address these needs is to equip them with the AI skills that will enable them to be successful in these fields. As the European Commission recently concluded, a deficit of digital skills in the cultural sector is inhibiting efforts to digitalize cultural heritage works across Europe. To help bridge this skills gap, the MOIC and the AI for Good Lab will share what we know and learn about how to do this critical work.

Technology should reflect the richness of humanity—not strip it away. By taking intentional steps now, we can help ensure that AI doesn’t erase linguistic and cultural diversity but strengthens it.

This is one of the defining equity challenges of the AI era. And if we work together—with purpose and urgency—we can close the gap and build a digital future that honors every language, every culture, and every community across Europe.

[1] P. Rohera, C. Ginimav, G. Sawant, and R. Joshi, “Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages,” Apr. 28, 2025, arXiv:2504.20022. doi: 10.48550/arXiv.2504.20022.

[2] K. Thellmann et al., “Towards Multilingual LLM Evaluation for European Languages,” Oct. 17, 2024, arXiv:2410.08928. doi: 10.48550/arXiv.2410.08928.
