2 January 2024
Famous AI image bank offline after discovery of child abuse images
A large and widely used AI database has been temporarily taken offline after researchers discovered that it contains more than a thousand child abuse images. That is painful, because AI companies use the image database to train their models.
There are at least a thousand cases of child abuse in the more than five billion images in the AI image database Laion-5B. This is according to new research by the Stanford Internet Observatory. The images came to light by comparing Laion with existing databases of known child abuse imagery.
AI companies can use Laion for free to train their models. The best-known example is Stable Diffusion, which is able to create photorealistic images based on text descriptions. For example, "French cat doing a dance on the moon."
The German non-profit organisation behind Laion collected billions of images from the internet, together with their descriptive texts. The problem is that this automated scouring of the web also brings in problematic material, which AI models can then use to generate new illegal images.
The German organisation has temporarily taken its database offline. AI professor Marcel Worring (UvA) praises this decision, but also thinks the organisation was at fault: 'At least part of Stanford's research they could have done themselves. Namely: comparing Laion with the abuse images already known.'
Other companies do not share database
Stability, the UK company behind the widely used Stable Diffusion, stressed to Bloomberg that it had already taken action to curtail illegal activity.
The research focuses on Stable Diffusion because Stability is one of the few AI companies to be transparent about its training data. This is not the case with other major image generators. Midjourney also presumably uses Laion-5B, but would not comment to Forbes.
Google's Imagen was also partly created using Laion, but during internal research, developers found "a wide range of inappropriate content", including porn and racist comments. Google subsequently deemed Laion unsuitable for public use, the researchers write.
OpenAI did not train Dall-E with Laion. What it did train on is shrouded in mystery, Worring says. 'Perhaps their database also contains problematic material, but researchers cannot access it. Laion is now rightly under the magnifying glass, but that is also due to the fact that they are transparent.'
Gap in EU law
Natali Helberger, professor of law and digital technology at the UvA, calls the research "worrying", also in view of the so-called EU AI Act, European legislation on artificial intelligence. After intensive lobbying by Germany and France, the European Parliament and member states reached a compromise last week: only the largest models, such as those of OpenAI and Google, have to meet the strictest requirements.
'Especially for the smaller models, the free Laion is a realistic choice, but they do not have to prove that their data has been checked for bias, hate or illegal material,' Helberger says. This creates the 'strange situation' that US big tech does have to comply with European values while smaller players do not. The Laion incident is therefore, she says, 'a pointed example of the gap in European legislation'.
This article was published on the Volkskrant website.
© SOPA Images/LightRocket via Getty Images
Similar news items
14 November 2024
The Amsterdam Vision on AI: A Realistic View on Artificial Intelligence
In its new policy, The Amsterdam Vision on AI, the city outlines how artificial intelligence (AI) should be integrated into urban life and how it should influence the city according to its residents. This vision was developed through months of conversations and dialogues with a wide range of Amsterdammers—from festival-goers to schoolchildren, experts to novices—who shared their thoughts on the future role of AI in Amsterdam.
read more >
14 November 2024
Interview: KPN Responsible AI Lab with Gianluigi Bardelloni and Eric Postma
This edition of ICAI's interview series features Gianluigi Bardelloni and Eric Postma, who talk about the developments in their ICAI Lab.
read more >
14 November 2024
AI pilots TLC Science: generative AI in academic education
The University of Amsterdam has launched a new project through its Teaching & Learning Centre Science, exploring how Generative AI, like ChatGPT, can enhance academic education. This pilot program at the Faculty of Science tests and evaluates various applications of GenAI in higher education.
read more >