Entry #15 · June 29, 2026

Who actually reads your llms.txt? We logged 461,328 requests to find out

The AI-native files at the center of today’s GEO advice draw under 1% of crawler traffic. The bots that answer users touch them least of all. The main audience turns out to be scrapers and tooling.

What we measured

Open any current guide to GEO/AEO (generative engine optimization and answer engine optimization) and you will find the same checklist item: publish an llms.txt, add structured JSON feeds, hand the machines a clean, machine-readable map of your site. The premise is intuitive. Models are machines, machine-readable files are easy for machines to parse, so machines should prefer them.

We had a rare chance to test that premise against real traffic, and it does not hold.

Across a set of production AI-native sites we operate and instrument, we measured bot access for a 90-day window: 461,328 requests across 31,935 log files. We decoded them user-agent by user-agent and sorted them into more than twenty classes: the AI crawlers (training, indexing, and live citation fetchers from OpenAI, Anthropic, Perplexity, Meta, and others), the traditional search engines, generic scrapers, and human or tooling traffic.
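For readers who want to try the same kind of sorting on their own logs, here is a minimal sketch of user-agent classification. It is not the pipeline behind these numbers. The rule list, the `classify` helper, and the "unclassified" fallback are illustrative assumptions; the substrings are tokens these crawlers advertise in their user-agent strings, and the category labels follow the table later in this piece.

```python
# Illustrative rules only: (substring to look for, crawler label, category).
# Order matters: the first matching rule wins.
RULES = [
    ("oai-searchbot", "OAI-SearchBot", "AI indexing"),
    ("chatgpt-user", "ChatGPT-User", "AI citation"),
    ("gptbot", "GPTBot", "AI training"),
    ("claudebot", "ClaudeBot", "AI training"),
    ("perplexitybot", "PerplexityBot", "AI indexing"),
    ("ccbot", "CCBot", "AI training"),
    ("googlebot", "Googlebot", "search engine"),
    ("bingbot", "Bingbot", "search engine"),
    ("curl/", "curl", "scraper"),
    ("python-requests", "python", "scraper"),
]


def classify(user_agent: str) -> tuple[str, str]:
    """Return (crawler label, category) for an already-decoded user-agent string."""
    ua = user_agent.lower()
    for token, crawler, category in RULES:
        if token in ua:
            return crawler, category
    # Anything we do not recognize stays visible instead of being guessed at.
    return "unclassified", "unknown"


print(classify("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))
# prints ('GPTBot', 'AI training')
```

Substring matching is deliberately crude. User-agent strings can be spoofed, so a stricter setup would also verify the source IP ranges that crawler operators publish.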

These sites are AI-native in that they serve exactly what the industry best practices recommend: an llms.txt, an llms-full.txt, and JSON feeds for search, pages, topics, and Q&A. We wanted to measure which of these files crawlers actually fetch and which they ignore.

To be clear, plenty of published commentary calls llms.txt into question, and plenty recommends it. The whole GEO/AEO space is fluid: everyone wants to do it, but no one knows for sure how. So we measured on our own domains.

Finding one: for everyone, it is a rounding error

Of all 461,328 requests, just under 1% touched the AI-native files at all: llms.txt drew 0.30% and the JSON feeds 0.67%, for 0.97% combined. The other 99% went to ordinary HTML pages, sitemaps, robots.txt, and miscellaneous requests such as redirects and assets.

That pattern holds across every crawler. Here is the share of each one’s own traffic that went to llms.txt or the JSON feeds, highest first:

| Crawler | Category | AI-native reqs | Total reqs | Share |
|---|---|---|---|---|
| OAI-SearchBot | AI indexing | 75 | 3,092 | 2.43% |
| GPTBot | AI training | 140 | 7,131 | 1.96% |
| browser / tooling | human | 2,604 | 215,079 | 1.21% |
| CCBot | AI training | 24 | 2,321 | 1.03% |
| Baiduspider | search engine | 56 | 7,223 | 0.78% |
| Googlebot | search engine | 26 | 4,140 | 0.63% |
| YandexBot | search engine | 8 | 1,500 | 0.53% |
| PerplexityBot | AI indexing | 6 | 1,149 | 0.52% |
| Applebot | search engine | 31 | 6,678 | 0.46% |
| generic scraper | scraper | 380 | 88,946 | 0.43% |
| Meta | AI training | 53 | 13,366 | 0.40% |
| Bingbot | search engine | 57 | 15,024 | 0.38% |
| Amazonbot | AI training | 72 | 22,774 | 0.32% |
| DuckDuckBot | search engine | 7 | 2,206 | 0.32% |
| curl | scraper | 40 | 20,978 | 0.19% |
| python | scraper | 9 | 4,846 | 0.19% |
| ChatGPT-User | AI citation | 21 | 15,092 | 0.14% |
| http-lib | scraper | 14 | 12,171 | 0.12% |
| ByteSpider | AI training | 6 | 5,930 | 0.10% |
| ClaudeBot | AI training | 2 | 11,070 | 0.02% |
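The Share column is simply AI-native requests divided by that crawler's own total. A short sketch that recomputes it for four rows of the table, using counts copied from above (the `share_pct` helper name is ours, for illustration):

```python
# (AI-native requests, total requests) copied from four rows of the table.
COUNTS = {
    "OAI-SearchBot": (75, 3_092),
    "GPTBot": (140, 7_131),
    "ChatGPT-User": (21, 15_092),
    "ClaudeBot": (2, 11_070),
}


def share_pct(ai_native: int, total: int) -> float:
    """Share of a crawler's own requests that hit llms.txt or the JSON feeds, in percent."""
    return round(100 * ai_native / total, 2)


for name, (ai_native, total) in COUNTS.items():
    print(f"{name:<14} {share_pct(ai_native, total):.2f}%")
# OAI-SearchBot  2.43%
# GPTBot         1.96%
# ChatGPT-User   0.14%
# ClaudeBot      0.02%
```

Normalizing by each crawler's own volume is what makes the rows comparable: a bot with 3,000 requests and one with 215,000 are judged on the same scale.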

The single most engaged crawler anywhere is OpenAI’s OAI-SearchBot, and it spends 2.4% of its requests on the structured files. OpenAI’s GPTBot is next at 2.0%. After that it falls off a cliff. Every other bot, every search engine, every scraper sits at or below 1.03%, and most are under half a percent. There is no crawler in the data for which the AI-native layer is anything but a trace.

So the foundational premise of these files, that bots will favor structured feeds over HTML, is not borne out by a single crawler we logged. They all run on HTML.

Finding two: the index crawlers actually use is the sitemap

Here is the part that reframes the exercise. The case for llms.txt is that machines need a clean, machine-readable index of a site instead of parsing cluttered HTML page by page. That argument is half right. Crawlers do lean on a machine-readable index. It just is not llms.txt. It is the sitemap, a standard that has existed since 2005.

Break all 461,328 requests down by what was fetched:

| Path | Requests | Share |
|---|---|---|
| HTML (pages, Q&A, homepage) | 254,752 | 55.2% |
| sitemap.xml | 94,338 | 20.4% |
| robots.txt | 11,082 | 2.4% |
| JSON feeds (/api/*.json) | 3,094 | 0.67% |
| llms.txt | 1,403 | 0.30% |
| other (redirects, assets, scanner noise) | 96,659 | 21.0% |

HTML, sitemap, and robots together are 78% of everything bots fetch. The two AI-native inventions, llms.txt and the JSON feeds, are 0.97% combined. The sitemap alone outdraws llms.txt by roughly sixty-seven to one.
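The three headline figures follow directly from the path counts. A small sketch that reproduces them, with the numbers copied from the breakdown and dictionary keys that are our own shorthand:

```python
TOTAL = 461_328

# Request counts by fetched path, copied from the breakdown above.
REQUESTS = {
    "html": 254_752,
    "sitemap.xml": 94_338,
    "robots.txt": 11_082,
    "json_feeds": 3_094,
    "llms.txt": 1_403,
    "other": 96_659,
}

# Sanity check: the six buckets account for every logged request.
assert sum(REQUESTS.values()) == TOTAL

conventional = REQUESTS["html"] + REQUESTS["sitemap.xml"] + REQUESTS["robots.txt"]
ai_native = REQUESTS["json_feeds"] + REQUESTS["llms.txt"]

conventional_pct = 100 * conventional / TOTAL
ai_native_pct = 100 * ai_native / TOTAL
sitemap_to_llms = REQUESTS["sitemap.xml"] / REQUESTS["llms.txt"]

print(f"HTML + sitemap + robots: {conventional_pct:.2f}%")  # 78.07%
print(f"llms.txt + JSON feeds:   {ai_native_pct:.2f}%")     # 0.97%
print(f"sitemap : llms.txt       {sitemap_to_llms:.1f} : 1")  # 67.2 : 1
```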

So llms.txt fails because it is a second index for a job the sitemap already does.

Finding three: the bots that answer users ignore it hardest

If the structured files were going to matter anywhere, you would expect it to be at the moment of citation, when a model fetches a page to answer a live user. We see the opposite. The closer a bot sits to producing a user-facing answer, the less it touches the AI-native layer.

ChatGPT-User, OpenAI’s live answer-time fetcher, made 15,092 requests in the window. Twenty-one of them were for the AI-native files. That is 0.14%. Anthropic’s Claude-User: one fetch. Anthropic’s ClaudeBot, with more than 11,000 requests, fetched llms.txt zero times and hit the JSON feeds twice, a rate of 0.02%.

These are the bots whose output users actually read. They consume your HTML and effectively skip the layer built specifically for them. And they skip the sitemap too: ChatGPT-User’s sitemap rate is 0.0%. The live fetcher does not browse an index at all, machine-readable or otherwise. It goes straight to the single page the model wants. Indexes are for the crawlers that build a map in advance, not for the agent answering in the moment, which means even the sitemap is an indexing lever, not a citation one.

What this does and does not show

The scope matters, because this is easy to over-read.

What the data shows is narrow and solid: the machine-format add-on files, llms.txt and JSON feeds, earn negligible engagement from every category of crawler and near-zero from the ones that generate answers. If your plan is to bolt these files onto a site and expect the citation pipeline to consume them, the pipeline is not consuming them.

What the data does not show, and what we are not claiming here, is anything about HTML quality, content structure, or whether a dedicated AI site beats an ordinary one. Every site serves HTML; the fact that bots read HTML is not an argument for any particular kind of site. Those are separate questions for a separate piece.

One fair caveat for skeptics. We are measuring fetch volume, not influence. In principle a model could read llms.txt once and let it shape a whole answer, so low volume is not the same as zero value. But the volume for the answer-producing bots is so low, 0.14% and below, that even a generous influence-per-fetch assumption leaves the layer touching a vanishing slice of what those bots do. We cannot see inside the models. We can only see what they request, and what they request is HTML.

Caching is not hiding the number, either. These are CDN edge logs, which record cache hits as well as origin fetches, so a low llms.txt count is a real low count and not traffic quietly absorbed before it reaches the log. One scoping note in the other direction: these are requests to the AI-native sites we serve. If a site also published llms.txt from its own origin, fetches there would not appear here. This is a large, representative sample of how bots treat the files, not a census of the entire web.

A note on our own position: we build and serve these files on the sites we instrument, which is exactly why we can measure them at this resolution. This finding partly critiques our own product surface, and we would rather publish the measurement than the marketing. Whatever the value of an AI-native site turns out to be, it is not the llms.txt.

The takeaway

The AI-native add-on layer does not pay off. Across 461,328 requests and more than twenty crawler types, the files at the center of the standard GEO checklist drew under 1% of traffic, were ignored hardest by the bots that answer users, and found their main audience among scrapers and tooling. The machines you built them for are reading your pages instead.

There is a constructive reading underneath the negative one. Crawlers are not allergic to structure; they consume a machine-readable index of your site all day. It is the sitemap, and it gets sixty-seven times the traffic of llms.txt. So the effort that goes into llms.txt and JSON feeds is better spent on the two things every crawler actually ingests: the HTML pages themselves, and a clean, complete sitemap.

An AI site is more than its AI-discovery files, which are a small part of it. It also produces reformatted, rewritten HTML pages organized in a new sitemap with semantic groupings, alongside user-generated Q&A. In our logs, those pages and that sitemap are what crawlers actually fetch.

Data source: CloudFront edge access logs across the production AI-native sites ROZZ operates, over a 90-day window. 461,328 requests over 31,935 log files, classified by decoded user-agent into 20+ crawler classes and by fetched path. Fetch volume only (cache hits and origin fetches); no claim about per-fetch influence.
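To repeat the measurement on your own logs, a rough sketch of the tallying step follows. It assumes CloudFront's standard log format: a `#Fields:` header naming the columns, tab-separated rows, and a URL-encoded `cs(User-Agent)` value. The `parse_log`, `bucket`, and `tally` helpers and the path-matching rules are our illustration, not the exact logic behind the figures in this piece. In particular, folding llms-full.txt into the llms.txt bucket and matching any XML path containing "sitemap" are assumptions.

```python
from collections import Counter
from urllib.parse import unquote


def parse_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields: header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def bucket(path: str) -> str:
    """Map a request path to a coarse bucket (hypothetical rules)."""
    if path in ("/llms.txt", "/llms-full.txt"):
        return "llms.txt"
    if path.startswith("/api/") and path.endswith(".json"):
        return "JSON feeds"
    if "sitemap" in path and path.endswith(".xml"):
        return "sitemap.xml"
    if path == "/robots.txt":
        return "robots.txt"
    return "HTML or other"


def tally(lines) -> Counter:
    """Count requests per (decoded user-agent, path bucket) pair."""
    counts = Counter()
    for row in parse_log(lines):
        user_agent = unquote(row["cs(User-Agent)"])  # logs store it percent-encoded
        counts[(user_agent, bucket(row["cs-uri-stem"]))] += 1
    return counts
```

Real log files are delivered gzip-compressed, so in practice you would pass `tally` the lines of a file opened with `gzip.open(path, "rt")`. Reading column names from the header instead of hard-coding positions keeps the sketch working if the field list changes.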