A proposed plain-text file that points AI crawlers at the pages on your site that matter most.
What llms.txt actually is
The idea, proposed by Jeremy Howard of Answer.AI in September 2024, is simple: drop a plain-text file at the root of your domain — yoursite.com/llms.txt — that hands an AI system a short, hand-picked map of the pages that matter most. It is written in Markdown, not XML, precisely because the intended reader is a language model. A conventional file opens with an H1 carrying the site or brand name, a one-line blockquote summary, and then H2 sections that list your priority links, each with a brief description after the URL. A companion file, llms-full.txt, goes further and inlines the full text of those pages into one document an agent can read in a single pass.
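To make that layout concrete, here is a minimal Python sketch that renders a file in the shape described above: an H1 name, a blockquote summary, then H2 sections of described links. The `Link` class, the `render_llms_txt` helper, and the example.com site are hypothetical names used only for illustration; they are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        # Each entry is a Markdown link followed by a short description.
        lines += [f"- [{link.title}]({link.url}): {link.description}" for link in links]
    return "\n".join(lines) + "\n"


# Hypothetical site and pages, for illustration only.
example = render_llms_txt(
    "Example SaaS",
    "Hosted API for sending transactional email.",
    {
        "Docs": [
            Link("Authentication", "https://example.com/docs/auth", "How to create and use API keys"),
            Link("API reference", "https://example.com/docs/api", "Every endpoint and parameter"),
        ],
        "Company": [
            Link("Pricing", "https://example.com/pricing", "Plans and usage limits"),
        ],
    },
)
print(example)
```

Generating the file from structured data, rather than editing it by hand, is one way to keep the shortlist in sync with the pages it points to. A hand-written file works just as well for a small site.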
It helps to be precise about what it is not. Robots.txt controls access — it tells a crawler which paths it may or may not fetch, and says nothing about importance. An XML sitemap is an exhaustive, machine-readable inventory of every indexable URL, optimized for completeness rather than editorial judgment. llms.txt is the opposite of exhaustive on purpose: it is a curated shortlist, a way of saying "if you only read ten things here, read these, and here is how to describe us." That curation layer is the gap the proposal tries to fill.
| File | Primary job | Audience |
|---|---|---|
| robots.txt | Grant or deny crawl access by path | All crawlers |
| sitemap.xml | List every indexable URL | Search engine indexers |
| llms.txt | Highlight the most important pages and how to summarize them | LLMs and AI agents |
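The split between access and importance can be sketched in a few lines of Python. The standard library's `urllib.robotparser` answers the robots.txt question ("may this agent fetch this path?"), while membership in a curated list answers the llms.txt question ("is this page one we want highlighted?"). The robots.txt rules, the `PRIORITY_URLS` set, the `ExampleBot` agent name, and the `describe` helper below are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an access rule per path, nothing about importance.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

# Hypothetical curated shortlist, as it might appear in llms.txt.
PRIORITY_URLS = {
    "https://example.com/docs/auth",
    "https://example.com/pricing",
}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def describe(url: str, agent: str = "ExampleBot") -> dict[str, bool]:
    """Answer the two separate questions: may I fetch it, and is it highlighted?"""
    return {
        "allowed": parser.can_fetch(agent, url),
        "priority": url in PRIORITY_URLS,
    }


for url in (
    "https://example.com/docs/auth",
    "https://example.com/blog/old-post",
    "https://example.com/admin/users",
):
    print(url, describe(url))
```

An old blog post is fetchable but not highlighted, which is exactly the case neither robots.txt nor a sitemap can express.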
Why it matters for AI search — and the honest caveat
The appeal is intuitive in an era of AI Overviews, ChatGPT Search, Perplexity, and Gemini. If models are increasingly the thing standing between your content and a user, a file that spells out your best pages and your preferred framing sounds like an easy win for being cited correctly. A documentation-heavy SaaS site, for example, might list its API reference, pricing, and core guides in llms.txt so an assistant answering "how do I authenticate with X?" is pointed straight at the canonical page rather than a stale forum thread. That kind of curation is part of the broader discipline of generative engine optimization — shaping how AI systems retrieve and represent you.
Here is the nuance that gets lost in the hype: as of early 2026, no major AI provider has publicly confirmed that its retrieval systems actually consume the file. Google has been explicit that it does not support llms.txt. Gary Illyes of its Search Relations team has said there are no plans to, and his colleague John Mueller drew a pointed comparison to the 1990s keywords meta tag, a self-declared signal that died because site owners could game it. Server-log studies through 2025 found AI crawlers rarely, if ever, requesting /llms.txt during normal visits, and one analysis of hundreds of thousands of domains put adoption around 10 percent. Anthropic and Perplexity have shown more interest, and developer-tool sites have embraced the convention, but support remains uneven and unverified.
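You can run the same kind of check against your own server logs. The sketch below counts requests for /llms.txt per AI crawler. It assumes the common "combined" access-log format, and the user-agent substrings in `AI_AGENTS` are an illustrative, incomplete list. The `count_llms_txt_hits` helper and the sample lines are hypothetical, not real traffic.

```python
import re
from collections import Counter

# Illustrative user-agent substrings; extend with the crawlers you care about.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Assumes the common "combined" access-log format; adjust for your server.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(lines):
    """Count /llms.txt requests per AI crawler across access-log lines."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that do not look like combined-format entries.
        path = match["path"].split("?", 1)[0]  # Ignore any query string.
        if path != "/llms.txt":
            continue
        for agent in AI_AGENTS:
            if agent in match["agent"]:
                hits[agent] += 1
    return hits


# Made-up sample lines, not real traffic.
sample = [
    '203.0.113.5 - - [01/Jan/2026:00:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2026:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(count_llms_txt_hits(sample))
```

A count of zero over a few weeks of logs would match what the published studies report. It would not prove the file is never read, since user-agent strings can be missing or spoofed.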
So the common misconception is that publishing llms.txt is a ranking lever, much as submitting a sitemap once moved the needle for traditional indexing. It is not, at least not yet. The realistic posture is that it costs little to maintain, may help agents that choose to read it, and, as far as anyone has confirmed, signals nothing to the engines that currently dominate AI answers. The fundamentals still do the heavy lifting: clean, well-structured, genuinely useful pages are what get surfaced, which is the core argument in our overview of what generative engine optimization is and how it works. Treat llms.txt as a low-risk experiment in a fast-moving space, not a guaranteed path to visibility.