llms.txt — the new robots.txt for AI engines

llms.txt is an emerging standard for declaring AI-friendly content sources at your domain root. What it does, who reads it, and how to ship one this week.

by Robert Langner · Published: 2026-02-20 · 6 min read

llms.txt · Technical · Standards

robots.txt told search engines what they could crawl. llms.txt tells AI engines what they should *understand*. The standard is young but adoption is accelerating, and shipping one is a 30-minute job that future-proofs your site.

What llms.txt is

llms.txt is a markdown-formatted file at the root of your domain (/llms.txt) that declares the canonical, machine-readable sources of information for AI consumption. Think of it as a curated table of contents for LLMs: instead of crawling 500 pages and guessing what's authoritative, an AI engine can read your llms.txt and know which 20 pages contain your canonical product information, pricing, FAQ, etc.

What it looks like

A small but complete llms.txt (only the H1 is strictly required by the proposal; the summary and sections are optional):

# YourBrand

> One-line description of what your company / product does.

## Documentation
- [Product overview](https://yoursite.com/product): what we do, key features
- [Pricing](https://yoursite.com/pricing): plans and limits
- [API reference](https://yoursite.com/api): full endpoint list

## Comparison
- [vs. Competitor A](https://yoursite.com/vs/comp-a)
- [vs. Competitor B](https://yoursite.com/vs/comp-b)

## Support
- [FAQ](https://yoursite.com/faq)
- [Contact](https://yoursite.com/contact)
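
To make the structure concrete, here is a minimal sketch of how a consumer might read such a file. It assumes only the layout shown above (one H1, an optional blockquote summary, H2 sections with `- [title](url): note` entries). The names `LlmsTxt`, `parse_llms_txt` and `SAMPLE` are hypothetical, and no real engine is claimed to parse the file this way.

```python
import re
from dataclasses import dataclass, field

# Matches an entry such as "- [Pricing](https://yoursite.com/pricing): plans and limits".
# The ": note" part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (link title, url, note)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote summary, and the H2 link lists."""
    doc = LlmsTxt()
    current = None  # name of the H2 section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return doc


SAMPLE = """\
# YourBrand

> One-line description of what your company / product does.

## Documentation
- [Product overview](https://yoursite.com/product): what we do, key features
- [Pricing](https://yoursite.com/pricing): plans and limits

## Support
- [FAQ](https://yoursite.com/faq)
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed.title)
for section, links in parsed.sections.items():
    print(section, [url for _, url, _ in links])
```

The point of the sketch is that everything a reader needs (identity, summary, grouped links) is recoverable with a few string checks, which is why the format stays cheap to consume.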

Who reads llms.txt today

Adoption is partial but growing. Anthropic's Claude has explicit support, Perplexity has signalled that it parses the file, and several smaller research-grade engines read it routinely, as do a growing number of developer tools such as IDE assistants and doc generators. Google and OpenAI haven't formally committed, but neither has signalled it will ignore it. The cost-benefit case favours shipping one: the file is cheap to produce and costs nothing if an engine ignores it.

When to ship llms.txt vs. when to skip it

Ship one if:

You have more than 50 indexable pages (signal-to-noise ratio benefits from curation).
You sell something complex (API, SaaS, professional services) where the wrong page being cited is worse than no citation.
You have multiple language variants (DE/EN/FR) and want AI engines to know which is canonical for your primary market.

Skip if:

You have fewer than 20 pages and your homepage is already a clear summary.
You actively don't want AI engines to consume your content (block via robots.txt instead).

Common mistakes

Listing too many pages. llms.txt is a curation file, not a sitemap. Aim for 10–30 entries grouped by section.
Forgetting the H1. Without # YourBrand at the top, AI engines can't establish entity identity for the file.
Linking to deep blog posts. Link to evergreen pillar content. Time-sensitive blog posts go in your RSS feed or sitemap, not your llms.txt.
Mixing languages without declaration. If you have a multilingual site, serve /llms.txt in the primary language and add one file per locale, such as /en/llms.txt and /de/llms.txt.
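
Several of these mistakes can be caught mechanically before you deploy. The following is a rough sketch of a linter, not an official validator: the function name `lint_llms_txt`, the 10–30 default range (taken from the guideline above), and the "date in the URL path" heuristic for spotting time-sensitive posts are all assumptions made for illustration.

```python
import re

# Captures the URL of an entry such as "- [FAQ](https://yoursite.com/faq)".
LINK_RE = re.compile(r"^-\s*\[[^\]]+\]\(([^)\s]+)\)")
# Heuristic (assumption): a /YYYY/MM/ segment usually marks a dated blog post.
DATED_PATH_RE = re.compile(r"/(19|20)\d{2}/\d{1,2}/")


def lint_llms_txt(text: str, min_links: int = 10, max_links: int = 30) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was found."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    warnings = []

    # The first non-empty line should be the H1 naming the brand or project.
    if not lines or not lines[0].startswith("# "):
        warnings.append("missing H1 on the first line")

    # Count curated entries; too few or too many defeats the purpose of curation.
    urls = [m.group(1) for ln in lines if (m := LINK_RE.match(ln))]
    if not (min_links <= len(urls) <= max_links):
        warnings.append(f"{len(urls)} links; aim for {min_links}-{max_links}")

    # Flag URLs that look like dated posts rather than evergreen pages.
    for url in urls:
        if DATED_PATH_RE.search(url):
            warnings.append(f"possibly time-sensitive page: {url}")

    return warnings


draft = """\
## Documentation
- [Launch post](https://yoursite.com/blog/2025/03/launch)
"""
for warning in lint_llms_txt(draft):
    print(warning)
```

A check like this fits naturally into a pre-deploy step; the thresholds are judgment calls, so treat the warnings as prompts for review rather than hard failures.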

How to verify it's working

Three checks:

`curl -sI https://yourdomain.com/llms.txt` returns 200 with a plain-text Content-Type (not a 404, and not a 301 redirect to some other URL).
Ask Claude or Perplexity about your brand and look at the citation pattern. Pages on your curated llms.txt list should be cited more often than unlisted ones.
Track AI-search visibility over time and compare citation counts before and after deploying llms.txt. Two weeks is usually enough to see a signal.
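
The first check is easy to automate. The sketch below uses only the Python standard library; `evaluate` and `check_llms_txt` are hypothetical names, and accepting `text/markdown` alongside `text/plain` is an assumption on my part, not a requirement stated anywhere above.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 301/302 surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def evaluate(status: int, content_type: str) -> list[str]:
    """Pure check of the status code and the Content-Type header value."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    media_type = content_type.split(";")[0].strip().lower()
    # Assumption: text/markdown is treated as acceptable as well as text/plain.
    if media_type not in ("text/plain", "text/markdown"):
        problems.append(f"expected a plain-text media type, got {media_type or 'none'}")
    return problems


def check_llms_txt(url: str, timeout: float = 10.0) -> list[str]:
    """Fetch the URL without following redirects and report any problems."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return evaluate(resp.status, resp.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # 3xx (redirects are not followed), 4xx and 5xx responses all land here.
        return evaluate(err.code, err.headers.get("Content-Type", ""))
    except urllib.error.URLError as err:
        return [f"request failed: {err.reason}"]


# The pure part can be exercised without any network access:
print(evaluate(200, "text/plain; charset=utf-8"))
print(evaluate(301, "text/html"))
```

Calling `check_llms_txt("https://yourdomain.com/llms.txt")` from a scheduled job would turn the manual check into a regression guard; an empty list means the file is being served as expected.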

Frequently asked questions

Is llms.txt a W3C / IETF standard?

Not yet. It's a community proposal that gained traction in 2024–2025, and no standards body has ratified it. The pragmatic answer: ship it anyway. The cost is trivial, and the engines that adopt it will use yours.

Does llms.txt replace robots.txt?

No. They're complementary. robots.txt declares what AI bots may *crawl*. llms.txt declares what AI bots should *prefer to read* among the crawlable content.
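
Because the two files work together, it is worth checking that nothing you recommend in llms.txt is blocked by robots.txt. Here is a small sketch using Python's standard `urllib.robotparser`; the crawler name `ExampleAIBot`, the rules, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is kept out of /internal/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /internal/

User-agent: *
Allow: /
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# URLs you might list in llms.txt; the second one is deliberately a mistake.
llms_urls = [
    "https://yoursite.com/pricing",
    "https://yoursite.com/internal/roadmap",
]


def blocked_entries(user_agent: str) -> list[str]:
    """Return llms.txt URLs that robots.txt forbids for this crawler."""
    return [url for url in llms_urls if not robots.can_fetch(user_agent, url)]


print(blocked_entries("ExampleAIBot"))
```

Any URL this returns is a contradiction between the two files: llms.txt recommends a page the crawler is not allowed to fetch, so either the rule or the entry should go.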

Should I include sensitive content paths in llms.txt?

No. Only public, canonical, evergreen pages. Login-protected, internal-only or experimental content stays out.