House of Lords Says AI Is Strip-Mining UK Websites. Here's What to Do About It

News 8 March 2026 7 min read

A 180-page Lords report warns that AI companies are scraping UK website content without permission, payment, or disclosure. The committee wants licensing, not opt-out. Here's what it means for the 5.5 million UK businesses publishing content online, and five steps you can take before the law catches up.

Mark McNeece, Founder & Managing Director, 365i

At a Glance

House of Lords published 180-page "AI and Copyright" report on 6 March 2026
Committee recommends licensing-first approach, rejecting opt-out copyright exceptions
UK creative industries contribute £124B and 2.4M jobs, against £12B and 86K jobs for the AI sector
95% of industry respondents rejected the government's original opt-out proposals
Government economic assessment due 18 March; legislation not expected before 2027
Five practical steps UK website owners can take now to protect their content

The House of Lords published a 180-page report on 6 March 2026 warning that AI companies are strip-mining copyrighted content from UK websites without permission, payment, or even disclosure. The Communications and Digital Committee's "AI and Copyright" report calls on the government to reject opt-out copyright exceptions and build a licensing-first framework backed by statutory transparency rules.

Most coverage focuses on the music, film, and publishing industries. That makes sense: they're the loudest voices in the room. But the report's implications run far deeper than Spotify royalties and newspaper paywalls. If you run a UK business website with blog posts, service pages, case studies, or product guides, your content is almost certainly being scraped to train AI models too. And right now, you have no legal mechanism to stop it, no way to know it's happening, and no compensation when it does.

What the Report Actually Says

Image: key recommendations from the House of Lords AI copyright report. The committee's core recommendations would reshape how AI companies interact with UK website content.

The committee spent months hearing evidence from AI companies, publishers, musicians, artists, and legal experts. Their conclusions are blunt.

UK creative industries contribute £124 billion to the economy and employ 2.4 million people. The AI sector, by comparison, generates £12 billion and employs 86,000. The committee's position: don't sacrifice a proven economic powerhouse for speculative AI gains.

Three specific recommendations stand out for website owners:

No opt-out copyright exceptions. The government had floated a text and data mining (TDM) exception that would let AI companies scrape freely unless creators explicitly opted out. The committee says this puts the burden on the wrong side. Instead, AI developers should need a licence before using copyrighted material for commercial training.

Statutory transparency. AI companies would be legally required to disclose what training data they've used. Right now, nobody outside these companies knows which websites are in the training set. This would change that.

Protection against digital replicas. The report calls for stronger rules against AI-generated content that imitates specific creators' styles or identities. If an AI system can write "in the style of" your brand voice after training on your blog, that's a problem the report wants addressed.

Why This Isn't Just About Musicians and Authors

The media framing of "creative industries vs Big Tech" misses a massive audience: the millions of UK small businesses that publish content online every day.

Think about what's on your website. Service descriptions explaining what you do. Blog posts answering customer questions. Case studies showing your results. Product pages with detailed specifications. FAQ sections, guides, how-to content. All of it written by you or your team, often at considerable cost.

AI crawlers from OpenAI, Google, Anthropic, Meta, and dozens of smaller companies are visiting these pages and ingesting the text. That text then trains models that can answer questions about your industry, sometimes replacing the need for a customer to visit your site at all. Your content helped build a tool that competes with you for attention. The flip side is worth holding onto, though: those same crawlers decide whether AI tools ever mention you, so blanket-blocking them carries its own cost.

Baroness Keeley, the committee chair, put it directly in her statement accompanying the report:

"Watering down the protections in our existing copyright regime to lure the biggest US tech companies is a race to the bottom that does not serve UK interests. We should not sacrifice our creative industries for AI jam tomorrow."

Baroness Barbara Keeley, Chair, House of Lords Communications and Digital Committee, The Register

Owen Meredith, chief executive of the News Media Association, was equally direct:

"There is one swift, simple step the Government can take to unlock growth in UK intellectual property: publicly rule out changes to copyright law and allow the market to scale at pace."

Owen Meredith, CEO, News Media Association, NMA statement

The committee found that 95% of industry respondents rejected the government's original opt-out proposals. The government has since abandoned those proposals, but hasn't confirmed what replaces them. An economic impact assessment is due by 18 March 2026.

What UK Website Owners Can Do Right Now

Image: practical content protection steps for website owners, including robots.txt configuration and AI discovery files. You don't have to wait for legislation. Several technical steps can signal your intent to AI systems right now.

Legislation is coming, but it won't arrive until 2027 at the earliest. In the meantime, you're not powerless. Here's what you can do today.

1. Update your robots.txt to block AI crawlers. Most UK websites still allow every bot through the front door. Our analysis with Cloudflare data found that 72% of UK sites have no AI crawler restrictions in their robots.txt files. Add specific rules for GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, and other AI crawlers. Will they all obey? No. But it establishes your intent, and the Lords' report recommends making that intent legally enforceable.

2. Set up AI discovery files. This is different from blocking. AI discovery files tell AI systems who you are, what you do, and how you want to be represented. They don't prevent scraping on their own, but they create a documented record of your terms. If licensing frameworks arrive, you'll want that record in place. The AI Discovery Files Service can set these up for you, or you can create an llms.txt file yourself.

3. Check what AI already knows about your business. Use the AI Visibility Checker to see what AI systems currently see when they visit your site. You might be surprised. Some businesses find AI models are confidently describing their services using outdated or inaccurate information pulled from old web pages.

4. Document your original content. If licensing becomes law, you'll need to prove what's yours. Keep records of publication dates (your sitemap helps), authorship, and original research. Use our Post Sitemap to CSV tool to export a complete inventory of every page on your site, with titles and URLs, as a baseline record.

5. Host with a UK provider. The committee raised concerns about UK content being processed by AI systems in jurisdictions with weaker copyright protections. Hosting your site on UK-based infrastructure keeps your data governed by UK law. That matters more than it used to.
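For step 1, the rules themselves are short. The sketch below is one way to generate them in Python. The crawler tokens are the ones named in step 1; the function name, the constant name, and the blanket `Disallow: /` are illustrative assumptions, not a recommended policy for every site.

```python
# Crawler tokens named in step 1. The list is a starting point, not exhaustive.
AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended"]


def build_ai_block_rules(tokens, disallow_path="/"):
    """Return robots.txt text with one blocking group per crawler token.

    Hypothetical helper: it only builds text. It does not touch your site.
    """
    groups = [
        f"User-agent: {token}\nDisallow: {disallow_path}"
        for token in tokens
    ]
    # A blank line separates each group, as robots.txt files conventionally do.
    return "\n\n".join(groups) + "\n"


# Print the fragment so it can be pasted into an existing robots.txt.
print(build_ai_block_rules(AI_CRAWLER_TOKENS))
```

Append the output to your existing robots.txt rather than replacing the file, so rules you already have for search engines stay in place. Passing a narrower `disallow_path` (for example `/blog/`) blocks only part of the site, which may suit you better if you still want AI tools to see your service pages.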

The Gap Between Now and 2027

Image: timeline of the gap between the Lords report in March 2026 and expected legislation in 2027. The government's economic assessment is due 18 March. Legislation won't follow until 2027 at the earliest.

Here's the uncomfortable reality. The committee's recommendations aren't law yet. They're pressure on the government, and significant pressure at that, but the Financial Times reports that ministers have delayed reforms until 2027. In the meantime, AI companies continue scraping.

Tom Kiehl, CEO of UK Music, summed up the stakes at the report's launch:

"The UK is at a crossroads and the Government has a choice to make: either become a global leader in ethical and transparent AI innovation, or sell our incredible cultural and creative sectors down the river to unscrupulous big tech firms."

Tom Kiehl, CEO, UK Music, UK Music statement

The UK CMA has already forced Google to let websites opt out of AI Overviews. The Online Safety Act now covers AI chatbots. This Lords report adds a third front to the UK's emerging AI regulatory framework. The direction of travel is clear, even if the timetable isn't.

For UK website owners, the practical question isn't whether to act, but how quickly. Every month you wait is another month of your content being ingested without your knowledge, consent, or compensation. The technical tools to signal your intent exist today. The legal framework to enforce it is on its way.

What to Watch Next

18 March 2026: The government's economic impact assessment on AI and copyright is due. This will indicate whether ministers lean toward the Lords' licensing-first approach or try to find a middle ground that AI companies can live with.

Late 2026: The EU's AI Act enforcement deadline, which will create pressure for UK alignment (or deliberate divergence) on AI transparency and data provenance requirements.

2027: Earliest date for UK legislative action on AI copyright, based on the government's stated timeline.

In the meantime, the practical steps above cost nothing and take less than an hour. Block AI crawlers you don't want. Set up discovery files for the ones you do. Document what's yours. When the law catches up, you'll be ready.
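If you'd rather script the "document what's yours" step yourself, a rough sketch follows. It assumes your sitemap follows the standard sitemaps.org `<urlset>` format and records only URLs and last-modified dates, since sitemaps don't carry page titles. The function names and sample URLs are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample. In practice you would read your own sitemap.xml instead.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1/</loc><lastmod>2026-03-08</lastmod></url>
  <url><loc>https://example.com/services/</loc></url>
</urlset>"""


def sitemap_to_rows(sitemap_xml):
    """Extract (url, lastmod) pairs; lastmod is "" when the entry has none."""
    root = ET.fromstring(sitemap_xml)
    rows = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        rows.append((loc, lastmod))
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "lastmod"])
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(sitemap_to_rows(SAMPLE_SITEMAP)))
```

Saving a dated copy of the CSV each month gives you a simple record of what was published and when. Authorship and original research notes would still need recording separately.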

Frequently Asked Questions

Does the Lords report affect my business website, or just creative industries?

It affects any UK website that publishes original content. Blog posts, service pages, product descriptions, case studies, FAQ sections, and guides are all copyrighted works under UK law. AI companies scrape business websites alongside news sites and creative platforms. The report's recommendations would apply to all copyrighted material, not just music, film, or publishing.

How do I know if AI companies are scraping my website?

Check your server logs for user agents like GPTBot, ClaudeBot, ChatGPT-User, and Bytespider. Google-Extended won't show up there: it is a robots.txt control token rather than a separate crawler, so Google's fetches appear under its usual user agents. If you're on managed hosting, your provider may filter these for you. You can also use our free Robots.txt Checker to see which crawlers your current rules allow or block, and the AI Visibility Checker to see what AI systems have already ingested.
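If you have raw access logs, a few lines of Python can do a first pass. This is a minimal sketch: it assumes a plain-text log where the user agent string appears somewhere on each line, and the sample lines and function name are invented for illustration.

```python
from collections import Counter

# Markers to look for in each log line. Google-Extended is left out because it
# is a robots.txt token and does not appear as a user agent in request logs.
AI_AGENT_MARKERS = ["GPTBot", "ClaudeBot", "ChatGPT-User", "Bytespider"]

# Invented sample lines in a common access-log shape, for demonstration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [08/Mar/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [08/Mar/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 900 "-" "ExampleBrowser/1.0"',
    '203.0.113.7 - - [08/Mar/2026:10:00:05 +0000] "GET /faq/ HTTP/1.1" 200 300 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [08/Mar/2026:10:00:09 +0000] "GET /guides/ HTTP/1.1" 200 700 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]


def count_ai_crawler_hits(log_lines, markers=AI_AGENT_MARKERS):
    """Count lines that mention each marker, ignoring letter case."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                counts[marker] += 1
    return counts


for marker, hits in count_ai_crawler_hits(SAMPLE_LOG).most_common():
    print(f"{marker}: {hits}")
```

To run it against a real log, pass an open file handle in place of `SAMPLE_LOG`. Bear in mind that user agent strings can be spoofed, so treat the counts as an indication, not proof.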

Can I block AI crawlers from my website right now?

Yes, using robots.txt directives. Add rules for GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, and other AI-specific user agents. Not all crawlers respect robots.txt, but Cloudflare data shows that sites with explicit AI crawler rules see 43% fewer violations than those without.
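Before adding rules, it helps to see what your current file already does. The sketch below uses Python's standard-library `urllib.robotparser` to report which of the tokens above may fetch a given path. The sample robots.txt and the `audit_robots` name are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended"]

# Invented example of a file that blocks one AI crawler and nothing else.
SAMPLE_ROBOTS = """User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def audit_robots(robots_txt, tokens=AI_CRAWLER_TOKENS, test_url="/"):
    """Map each token to True if the rules let it fetch test_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, test_url) for token in tokens}


for token, allowed in audit_robots(SAMPLE_ROBOTS).items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

In this sample only GPTBot is blocked; the other three fall through to the wildcard group, which restricts `/wp-admin/` but leaves the homepage open.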

What does "licensing-first" mean for website owners?

Under a licensing-first model, AI companies would need permission before using your content for commercial training. The default shifts from "scrape unless told not to" to "ask before you take." This is how stock photography, music licensing, and academic publishing already work. The committee wants the same principle applied to AI training data.

When will new AI copyright laws actually take effect in the UK?

Not before 2027. The government has an economic impact assessment due 18 March 2026, but ministers have delayed legislative action. In the meantime, robots.txt rules and AI discovery files are your best tools for signalling your intent.

What's the difference between robots.txt and AI discovery files?

robots.txt is a blocking mechanism. It tells crawlers "don't access these pages." AI discovery files (llms.txt, ai.json, identity.json) are an identity layer. They tell AI systems "this is who we are, this is what we do, represent us accurately." You need both: robots.txt to control access, and discovery files to control how AI describes your business when it does engage with your content.
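To make the "identity layer" idea concrete, here is a rough sketch that writes a minimal llms.txt. It assumes the commonly proposed Markdown layout (a title, a one-line summary in a blockquote, then sections of links) and may not match every detail of the AI Discovery File Specifications. ai.json and identity.json are not covered because their schemas aren't described here. The business name, URL, and function name are hypothetical.

```python
def build_llms_txt(name, summary, sections):
    """Render a minimal llms.txt: title, blockquote summary, linked sections.

    `sections` maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Finish with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


# Hypothetical business details, for illustration only.
print(build_llms_txt(
    "Example Ltd",
    "Hypothetical UK plumbing firm serving homes and small businesses.",
    {"Services": [
        ("Boiler repair", "https://example.com/boiler-repair/", "What the service covers"),
    ]},
))
```

The resulting text would be saved as `llms.txt` at the root of your site, alongside robots.txt.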

Does hosting in the UK help protect my content from AI scraping?

UK hosting means your data is governed by UK copyright law and any future AI licensing requirements. The committee raised concerns about content being processed in jurisdictions with weaker IP protections. Hosting with a UK-based provider keeps your content under the legal framework most likely to enforce AI licensing rules.

Published: 8 March 2026 · Last reviewed: 22 April 2026 · Written by: Mark McNeece, Founder & Managing Director, 365i

Editorially reviewed by: Mark McNeece on 22 April 2026 · Our editorial standards

About the Author

Mark McNeece is an industry leader in AI Visibility. He developed and published the AI Discovery File Specifications, the emerging open standard for making websites discoverable by large language models such as ChatGPT, Claude, and Gemini. Mark founded 365i in 2002, runs 365i Web Design for sites and AI visibility, and founded Press Forge for specialist WordPress services.

Every article on this site is drawn from real client work across more than 20 years of UK hosting and WordPress experience, not from release-note reruns. His WordPress plugins are published on wordpress.org. Get in touch if you'd like Mark to look at your site.

Mark also reviews every post on this site against our editorial standards before it publishes and again whenever it is substantively updated.
