12,320,916 YouTube tracks are already training an AI. Look yourself up.
Photo · Background: Tom Majric via Pixabay. Treatment: BCKSTG.

By BCKSTG Editorial

12,320,916 YouTube tracks are public in LAION-DISCO-12M, the dataset feeding AI music models. Search yours. Read what Google argued in court.

For the last eighteen months, AI labs have been pulling music in from wherever it played. Some announced their datasets. Others let theirs surface online on their own. The rule is simple. If your music is published publicly, someone treats it as material to train a model on.

The biggest receipt sitting in plain sight is called LAION-DISCO-12M. The methodology started as a 2023 NeurIPS paper, DISCO-10M, from a team at ETH Zürich. The German nonprofit LAION ran the same recipe at larger scale and put the result on Hugging Face in November 2024. A list of 12,320,916 songs referenced by YouTube ID, with title, artist, view count, and duration for each. About 91 years of music. Roughly 750 MB of parquet files, licensed Apache 2.0.
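The "91 years" figure can be sanity-checked with a little arithmetic. A minimal sketch in Python, assuming a 365.25-day year (our assumption, not something the dataset listing states): dividing the total duration by the track count gives the average track length the figure implies.

```python
# Sanity check: what average track length does "about 91 years" imply?
TRACKS = 12_320_916                    # track count quoted for the dataset
YEARS = 91                             # total duration quoted in the article
MINUTES_PER_YEAR = 365.25 * 24 * 60    # assumption: 365.25-day year

total_minutes = YEARS * MINUTES_PER_YEAR
avg_minutes = total_minutes / TRACKS

print(f"Implied average track length: {avg_minutes:.2f} minutes")
```

That comes out a little under four minutes per track, which is about what you would expect from a list of songs.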

DISCO-12M isn’t the only public music dataset in the wild. SLEEPING-DISCO 9M uses the same recipe at a smaller scale. Our lookup only checks DISCO-12M; if your name isn’t in it, you could still be in one of the others.

The dataset doesn’t contain the audio. It contains the receipts that point to the audio on YouTube. Anyone with the IDs can pull the songs and train a model on them. The distinction matters for lawyers. For the artist whose songs are in the list, it matters less. The track is already flagged.

Who takes the hit

Independent artists. Major labels have legal departments. You don’t.

DISCO-12M was pulled from YouTube, which is also the distribution platform, discovery engine, and de facto release valve for most independent music in the first place. There isn’t really an opt-out.

The Atlantic’s AI Watchdog beat has been tracking these releases as they land. Their search runs across more datasets than ours. If your name doesn’t surface in DISCO-12M here, check theirs too.

What Google just argued in court

The case has a backstory worth knowing. In November 2023, Google DeepMind launched Lyria and credited four researchers as core contributors. Within months all four left Google, founded Udio, and got sued by the RIAA in June 2024. Udio settled with Universal Music Group in October 2025, agreeing to rebuild on licensed music only. Google watched the people who built its music AI get sued for exactly that, then released Lyria 3 anyway on February 18, 2026, inside the Gemini app with its 750 million monthly users.

That timeline is what the lawsuit hangs on. Google’s own published research documents at least 44 million clips and 280,000 hours of music used to train its music models, with the actual dataset, per the complaint, substantially larger.

On Monday, June 8, 2026, Google filed a motion to dismiss the class-action lawsuit a group of independent artists brought against it in the US District Court for the Northern District of Illinois (Kogon v. Google LLC, No. 1:26-cv-02582, filed March 6, 2026 by Loevy & Loevy). The suit accuses Google of using their songs without permission to train Lyria 3, its music-generation model. Music Business Worldwide reported the filing on June 10, 2026, in a story by Mandy Dalugdug.

Google’s central argument fits in one line. When the plaintiffs uploaded their songs to YouTube, they granted Google a license that covers, among other things, AI training. If the court agrees, that same argument becomes the template for anyone training a model on music uploaded to a platform with broad terms of service. Not just for DISCO-12M. For everything that comes after.

The legal discussion can take months. The material part will not. The songs are already in the list.

Which parts of the fine print on platforms like YouTube should independent artists read most closely, so they don’t accidentally grant permissions that cover AI training on a service that is all but mandatory for a music career?

Open question. Sent to a handful of music attorneys. We’ll update this story when they reply.

Look yourself up

Open the DISCO-12M lookup, type the artist name, and you will see how many tracks appear in the dataset along with ten samples you can open on YouTube. The search runs against our mirror of the dataset on Hugging Face. We do not log queries. There is no signup.
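Under the hood, a lookup like this boils down to a filter and a count. Here is a minimal sketch in Python, assuming each row carries a YouTube ID, a title, and an artist name. The field names (`video_id`, `title`, `artist`), the sample rows, and the helper `lookup_artist` are hypothetical, made up for illustration; they are not the dataset's actual schema or the code behind our lookup.

```python
from typing import NamedTuple


class Track(NamedTuple):
    # Hypothetical field names; the real dataset's columns may differ.
    video_id: str
    title: str
    artist: str


def lookup_artist(rows, name, max_samples=10):
    """Return (match_count, sample_urls) for a case-insensitive exact artist match."""
    needle = name.strip().casefold()
    matches = [r for r in rows if r.artist.strip().casefold() == needle]
    # A YouTube ID is enough to rebuild the public watch URL.
    samples = [
        f"https://www.youtube.com/watch?v={r.video_id}"
        for r in matches[:max_samples]
    ]
    return len(matches), samples


# Made-up rows standing in for the dataset.
rows = [
    Track("aaaaaaaaaa1", "First Song", "Example Artist"),
    Track("aaaaaaaaaa2", "Second Song", "example artist"),
    Track("bbbbbbbbbb1", "Other Song", "Someone Else"),
]

count, urls = lookup_artist(rows, "Example Artist")
print(count)
for url in urls:
    print(url)
```

Exact matching keeps the sketch short. A real lookup would likely need to handle spelling variants and different artists who share a name, so treat any count as a starting point rather than a final tally.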

Before the Lyria 3 case produces any legal answer, there are three concrete moves.

One. Document. Screenshot the lookup with the date. If a class action moves forward, that count is useful evidence.

Two. Ask the host to pull you. The Hugging Face dataset page carries a community report button and a contact email. Removal is handled there, not at LAION or Google.

Three. Watch the case. Whatever the court decides about the YouTube license will set the template for the next dataset, not this one.

FAQ

What is LAION-DISCO-12M?

LAION-DISCO-12M is a public dataset published on Hugging Face in November 2024 by the German nonprofit LAION. It lists 12,320,916 songs by YouTube ID with title, artist, view count, and duration. The dataset doesn’t contain the audio. It contains the references that point to where the audio sits on YouTube, which is enough for anyone to pull the songs and train a model on them. Released under Apache 2.0.

Was my music used to train AI?

Open the DISCO-12M lookup, type your artist name, and you will see how many tracks appear in the dataset along with samples. Presence in DISCO-12M doesn’t prove a specific AI company trained on your songs, but the dataset has been downloaded thousands of times and is openly used in the AI research community. The track is flagged.

How do I get my music removed from DISCO-12M?

Removal is handled at the dataset host (Hugging Face), not at LAION or Google. The Hugging Face dataset page has a community report button and a contact email. Document your case first by screenshotting the lookup with the date, in case a class action proceeds.

What is Google’s motion to dismiss the Kogon case actually arguing?

Google argues that when the plaintiffs uploaded their songs to YouTube, they granted Google a license that covers, among other things, AI training. If the court agrees, the same argument applies to any company training a model on music uploaded to a platform with broad terms of service. Not just DISCO-12M. The Northern District of Illinois case is Kogon v. Google LLC, No. 1:26-cv-02582, filed March 6, 2026.

What does this mean for independent artists who depend on YouTube?

YouTube is the distribution platform, discovery engine, and de facto release valve for most independent music. The platform’s terms of service are written by Google. If the court accepts Google’s reading of those terms, opting out of AI training and staying on YouTube as a career platform start to look mutually exclusive. The choice isn’t really a choice when leaving means losing the discovery surface that fed the career in the first place. Those are the stakes the artists in Kogon are testing.
