13 December 2023

GPT-NL: Secure and ethical AI to strengthen Dutch society

The Netherlands is to develop its own open language model: GPT-NL. Non-profit parties TNO, NFI and SURF are jointly developing the model to take an important step towards transparent, fair and verifiable use of AI in line with Dutch and European values and guidelines and with respect for data ownership.

The consortium, facilitated by the Netherlands AI Coalition (NL AIC) and HSD, recently received funding of EUR 13.5 million from the Ministry of Economic Affairs and Climate Policy/RVO to implement this project.

With the launch of ChatGPT in 2022, the power of AI and Large Language Models (LLMs) became clear to the general public for the first time. Many discovered the benefits of the technology, but several issues regarding companies like OpenAI and the technology behind their solutions call for caution. For example, these companies are not transparent about the algorithms and datasets they use, making it impossible to monitor the models or hold the companies accountable for possible unethical or harmful results. It is also unclear what happens to the information we enter into the model and who has access to it, so we cannot assume that our privacy is respected.

Moreover, the quality of output depends not only on the quality of the datasets on which a model is trained, but also on the amount of data. This is a problem for languages like Dutch, which is spoken by about 22 million people worldwide. Most, if not all, LLMs are trained on datasets that contain very little Dutch data, which affects the quality of Dutch output. What the Netherlands does have is a strong research and knowledge base in AI on which to build, an excellent network structure with relevant public, private and academic partners and a solid digital infrastructure. In addition, there is a growing need for a strong Dutch-language LLM that complies with Dutch and European privacy and ethics regulations, is transparent about the algorithms and datasets used, and adheres to Dutch cultural norms. This led to the GPT-NL project.

Limitations of current language models

The Netherlands Forensic Institute (NFI), the initiator of the project, has a strong history of using language models. It uses these models for various purposes, such as analysing large amounts of data for evidence of criminal activity. "Language models have been indispensable in investigation work for years," says Erwin van Eijk, head of the Digital and Biometric Traces Department at the NFI. "It is impossible for humans to analyse the huge amounts of data within the limited time frame our work requires. Moreover, AI is used to protect investigators from unnecessary exposure to traumatising content. But our language models have limitations because we do not have sufficient resources to develop more comprehensive technology, which is especially needed as messaging in criminal circuits becomes increasingly cryptic. However, we do have a solid base of available data, algorithms, expertise and experience that we can build on for the GPT-NL project," continues Erwin.

Connecting the AI ecosystem

Using language models like ChatGPT is practically impossible for the NFI: the results of the models are used in criminal cases, so the models must be transparent in their operation and comply with legal requirements. But concerns about existing LLMs apply to a much wider range of organisations and applications. Erwin therefore sees the potential for many organisations in the Netherlands, from the public, private and academic sectors, to benefit from an expanded Dutch language model.

"To access the resources needed for this project, we had to join forces with other organisations and define a common goal," he says. "Security Delta (HSD), the Dutch security cluster, and the Netherlands AI Coalition (NL AIC) saw the urgency and potential of a Dutch AI language model from the beginning. They have the right connections and helped get the relevant organisations on board to make this project a reality."

Snellius: The Dutch National Supercomputer

LLMs require very high computing power and sophisticated hardware infrastructure. "As a security cluster, we knew the perfect partner to facilitate that infrastructure," says Joris den Bruinen, head of Security Delta (HSD) and the NL AIC's Security, Peace and Law working group. "In SURF, educational institutions and research institutes join forces to develop and procure digital services. It is a public organisation built around the need for shared access to digital infrastructure and research data. SURF has the Dutch National Supercomputer Snellius on the one hand, and on the other the confidence needed to find a wide range of partners willing to share their datasets on the platform," says Joris.

How Dutch society will benefit

GPT-NL offers numerous potential benefits for Dutch society. "As Erwin mentioned, there is a large number of potential applications for GPT-NL. To be clear, the project does not involve developing models for specific applications; it focuses on building the structural foundation on which an infinite number of customised models can be built," says Saskia Lensink, NLP specialist at TNO. "Multiple government organisations can benefit from GPT-NL, if only to align their communication with the language used by their citizens," adds Joris den Bruinen. "The language model developed for the GPT-NL project will be made available under a licensing structure, with different rates for academic, non-commercial and commercial use," says Joris. "This allows companies, including start-ups, to develop commercial applications on top of it. This ensures sovereignty in Dutch products and services, resulting in economic added value," he continues.

Some examples can be found in healthcare, where such a model could support medical professionals by, for example, summarising transcripts of conversations with patients, which requires the data to be stored securely according to European privacy laws. In education, we see that current AI models offer an American context and American values in their solutions, something we may not want for our children. While the current models may suffice for now, when GPT-NL becomes available, it may offer a valuable alternative in this segment. "We cannot really predict this, but we have seen with ChatGPT the power of AI and how it can elicit a wide variety of commercial and public applications," Joris concludes.

More information?

The full article is published on the NL AIC website (in Dutch).

Interested in learning more about the GPT-NL project? Then visit the pages below:

GPT-NL strengthens Dutch autonomy, knowledge and technology in AI (tno.nl)
Q&A GPT-NL: Dutch own open AI language model | SURF.nl

Similar news items

September 9

Multilingual organizations risk inconsistent AI responses

AI systems do not always give the same answers across languages. Research from CWI and partners shows that Dutch multinationals may unknowingly face risks, from HR to customer service and strategic decision-making.

Making immunotherapy more effective with AI

Researchers at Sanquin have used an AI-based method to decode how immune cells regulate protein production. This breakthrough could strengthen immunotherapy and improve cancer treatments.

ERC Starting Grant for research on AI’s impact on labor markets and the welfare state

Political scientist Juliana Chueri (Vrije Universiteit Amsterdam) has received an ERC Starting Grant for her research into the political consequences of AI for labor markets and the welfare state.