Entry #14 · May 19, 2026

What AI bots read, what they ignore, and what an AI site is actually for. Part 1: Citation bots

Name: Citation-Bot Fetch Distribution on rozz.genymotion.com (Apr 29 – May 19, 2026)
Creator: ROZZ

We built rozz.genymotion.com to market to AI agents, the way genymotion.com markets to humans. Let’s look at what the bots actually fetch and what they don’t, then draw some lessons from the data.

Teaser: Q&As are popular (61% of the Q&A corpus fetched at least once in 21 days). llms.txt is not (0 ClaudeBot fetches, versus 209 for sitemap.xml).

What’s an AI site?

rozz.genymotion.com exists alongside genymotion.com. The marketing website serves humans and is organized the standard way: Products, Resources, Pricing, etc. The AI site serves AI agents (ChatGPT, Claude, Perplexity, and the crawlers that feed them) by presenting content differently, in a way we’re researching live in this Insights series.

In this blog series, we’ve been writing up weekly findings from the AI-site logs. This article zooms out. After four months of building and iterating, we’ll use the past three weeks of clean bot-log data (Apr 29 – May 19, 2026, ChatGPT-User and Claude-User, 200-OK only) to try to make sense of what an AI site actually is.

Quick scope note: we have bot logs for rozz.genymotion.com. Everything below is about what bots do on the AI site. We’re not making claims about the human site.
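For readers who want to apply the same filter to their own logs, here is a minimal sketch of the selection step, assuming CloudFront standard (tab-separated) access logs with a `#Fields:` header line. The two bot names and the 200-OK rule come from this article; treating “content” as URIs under /qna/ and /pages/ (excluding the browse-all index.html pages), plus the trimmed header and user-agent strings in the sample, are our assumptions for illustration.

```python
from urllib.parse import unquote

CITATION_BOTS = ("ChatGPT-User", "Claude-User")
# Assumption: "content" means Q&A and page URIs, not the browse-all indexes.
CONTENT_PREFIXES = ("/qna/", "/pages/")


def parse_cloudfront_log(lines):
    """Yield one dict per log record, keyed by the names in the #Fields header."""
    fields = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


def is_citation_content_fetch(rec):
    """True for a 200-OK content fetch by ChatGPT-User or Claude-User."""
    ua = unquote(rec.get("cs(User-Agent)", ""))  # CloudFront URL-encodes the user agent
    uri = rec.get("cs-uri-stem", "")
    return (
        rec.get("sc-status") == "200"
        and any(bot in ua for bot in CITATION_BOTS)
        and uri.startswith(CONTENT_PREFIXES)
        and not uri.endswith("/index.html")
    )


# Illustrative sample with a trimmed #Fields header (real logs carry many more fields).
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem sc-status cs(User-Agent)",
    "\t".join(["2026-05-01", "10:00:00", "/qna/what-pricing-plans-are-available-for-genymotion",
               "200", "Mozilla/5.0%20(compatible;%20ChatGPT-User/1.0)"]),
    "\t".join(["2026-05-01", "10:00:02", "/pages/example-page", "304", "Claude-User"]),
    "\t".join(["2026-05-01", "10:01:00", "/sitemap.xml", "200", "ClaudeBot"]),
    "\t".join(["2026-05-01", "10:02:00", "/pages/example-page", "200", "Claude-User"]),
    "\t".join(["2026-05-01", "10:03:00", "/qna/index.html", "200", "ChatGPT-User"]),
]

hits = [r["cs-uri-stem"] for r in parse_cloudfront_log(sample) if is_citation_content_fetch(r)]
print(hits)
```

Parsing the `#Fields:` header instead of hard-coding column positions keeps the filter working if the log configuration adds or reorders fields.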

What we built

The rozz.genymotion.com AI site structure:

545 content pages (/pages/) imported and rewritten from the existing Genymotion documentation, support articles, and blog tutorials
262 Q&A pages (/qna/) generated from real chatbot conversations — each one corresponds to a question a user actually asked
16 topic listings (/topics/) grouping content by canonical topic
2 runbooks (/runbooks/) written for AI agents with terminal access (gmsaas for cloud, gmtool for desktop)
1 homepage with featured Q&As embedded in FAQPage JSON-LD
AI-native discovery files: llms.txt, llms-full.txt, per-topic sitemaps
JSON APIs: /api/qna.json, /api/pages.json, /api/topics.json, /api/search.json
Markdown alternatives: /index.md for the homepage
Browse-all indexes: /pages/index.html, /qna/index.html, etc.
Traditional SEO infrastructure: robots.txt and sitemap.xml
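The homepage embeds its featured Q&As as FAQPage JSON-LD. Here is a minimal sketch of that kind of markup, built with Python’s standard library. It uses the schema.org FAQPage / Question / Answer vocabulary; the function names and the placeholder answer text are hypothetical and do not reflect our actual generator.

```python
import json


def faq_page_jsonld(qas):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qas
        ],
    }


def jsonld_script_tag(data):
    """Wrap JSON-LD in a script tag, escaping "</" so text cannot close the tag early."""
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


featured = [
    ("What pricing plans are available for Genymotion?", "(answer text from the Q&A page)"),
]
print(jsonld_script_tag(faq_page_jsonld(featured)))
```

Escaping `</` as `<\/` is still valid JSON, so parsers read the original text back unchanged.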

Some of the AI site’s content is for the AI agent making a query on behalf of a human user. Some is for a coding agent itself (the runbooks), since Genymotion is a tool for developers. Some is for the crawler bots that build the search indexes those AI agents draw from. The premise of an AI site is that all these consumers exist; we try to cater to each one and see what sticks.

In this article, we’ll focus on the citation bots, which provide the most direct business value because their fetches happen during actual conversations with users.

What citation bots fetched

In the 21-day window, ChatGPT-User and Claude-User made 1,517 content fetches between them (930 on Q&As, 587 on pages). The shape:

Slice	Share of fetches	Share of corpus (807 Q&As + pages)
Top 1 URI	~8%	0.1%
Top 10	~37%	1.2%
Top 50	~78%	6.2%
Top 100	~93%	12.4%

A small number of URIs do most of the work.
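The head-share table can be computed from per-URI fetch counts. This sketch assumes a dict mapping each fetched URI to its fetch count. The corpus size of 807 is the 262 Q&As plus the 545 pages; the toy counts are made up for illustration.

```python
def head_shares(fetch_counts, corpus_size, cutoffs=(1, 10, 50, 100)):
    """Return (N, share of fetches, share of corpus) for each top-N cutoff."""
    counts = sorted(fetch_counts.values(), reverse=True)
    total = sum(counts)
    return [(n, sum(counts[:n]) / total, n / corpus_size) for n in cutoffs]


# Toy data: 10 URIs fetched, out of a hypothetical 20-URI corpus.
toy = {f"/qna/q{i}": c for i, c in enumerate([40, 20, 10, 8, 6, 5, 4, 3, 2, 2])}
for n, fetch_share, corpus_share in head_shares(toy, corpus_size=20, cutoffs=(1, 3, 10)):
    print(f"top {n}: {fetch_share:.0%} of fetches, {corpus_share:.0%} of corpus")

# Sanity check of the "Share of corpus" column for the 807-URI corpus.
print([f"{n / 807:.1%}" for n in (1, 10, 50, 100)])
```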

What sits in the head? The most-fetched Q&A is the pricing Q&A (123 fetches), followed by the rooted-device Q&A, the system-requirements Q&As, the SaaS-vs-Desktop comparison, and the macOS compatibility Q&A. On the pages side, the Burp Suite security-testing tutorial (130 fetches) is narrowly the single most-fetched content URI. Add /topics/android-version-selection (251 fetches as a topic listing, outside the 1,517 content fetches) and the head of the AI site is clear: pricing, requirements, compatibility, and a security-testing tutorial. The pro-enterprise skew we introduced in Entry #13 seems to be paying off.

Two surfaces, two patterns

We currently have two main types of content pages: Q&As (/qna/) and cleaned-up website pages (/pages/). Separate the two and different shapes appear.

	/qna/	/pages/
Corpus	262	545
Fetched URIs	160	132
Coverage	61%	24%
Total fetches	930	587
Top 1 share	13%	22%
Top 10 share	41%	53%
1-fetch share (of fetched)	35%	51%
Dark inventory (never fetched)	39%	76%

Q&A coverage is 2.5× page coverage (61% vs. 24%). The Q&A distribution is flatter, with fetches spread across more items. The pages Pareto is much steeper: one tutorial (Burp Suite) accounts for 22% of all page fetches, and three-quarters of the page corpus went unread in three weeks.
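Each row of the comparison table is a simple statistic over the same two inputs. Here is a sketch, assuming per-surface fetch counts plus the full set of published URIs. The toy values are not our data; the last line reproduces the 2.5× figure from the coverage row.

```python
def surface_metrics(fetch_counts, corpus):
    """Summarize one surface, e.g. /qna/ or /pages/.

    fetch_counts: {uri: fetches} for URIs fetched at least once
    corpus: every URI published on that surface
    """
    counts = sorted(fetch_counts.values(), reverse=True)
    total = sum(counts)
    fetched = len(counts)
    return {
        "corpus": len(corpus),
        "fetched_uris": fetched,
        "coverage": fetched / len(corpus),
        "total_fetches": total,
        "top1_share": counts[0] / total,
        "top10_share": sum(counts[:10]) / total,
        "single_fetch_share": sum(1 for c in counts if c == 1) / fetched,
        "dark_inventory": 1 - fetched / len(corpus),
    }


# Toy surface: 4 published URIs, 2 of them fetched.
print(surface_metrics({"/qna/a": 3, "/qna/b": 1}, {"/qna/a", "/qna/b", "/qna/c", "/qna/d"}))

# Coverage ratio from the table: (160 / 262) / (132 / 545).
print(round((160 / 262) / (132 / 545), 1))
```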

Let’s try to explain the difference.

Q&As are the answers

Each Q&A in the corpus was generated from a real chatbot conversation. Every Q&A title corresponds to a question someone actually asked, in their own words. The idea is to match, as closely as possible, what users actually ask AI engines.

Top 10 Q&As, last 21 days:

Fetches	Q&A
123	`what-pricing-plans-are-available-for-genymotion`
66	`does-genymotion-provide-rooted-device...`
50	`how-much-memory-do-i-need-to-have-20-virtual-devices`
26	`what-are-genymotion-desktop-s-system-requirements`
24	`what-are-the-system-requirements`
23	`what-are-genymotion-s-pricing-options-for-saas-and-desktop`
19	`what-are-the-costs-for-using-genymotion-saas`
19	`does-the-emulator-work-with-the-latest-mac-os`
18	`how-do-i-install-genymotion-desktop-on-windows-macos-or-linux`
17	`how-to-run-the-emulator-in-the-cloud`

This is what people ask AI about Genymotion when they’re considering whether to use it.
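The URIs in the table appear to be the question titles run through a common slug transformation. The titles are lowercased, and every run of spaces and punctuation is collapsed into a single hyphen, which is how “Desktop’s” becomes `desktop-s`. Here is a minimal sketch that reproduces that pattern. The real pipeline’s exact rules are an assumption, and the titles below are reconstructed from the slugs.

```python
import re
import unicodedata


def slugify(title):
    """Lowercase a title and collapse every run of non-alphanumerics into one hyphen."""
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop combining accents (é -> e); other symbols, like curly apostrophes, become separators.
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "-", plain.lower()).strip("-")


print(slugify("What are Genymotion Desktop's system requirements?"))
print(slugify("Does the emulator work with the latest Mac OS?"))
```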

And there’s a real tail underneath the top 10: 100 Q&As were fetched once or twice each in the window. Examples of single-fetch Q&As:

“I will be having 100 instances within a year — need to discuss”
“I am a reseller who purchased Genymotion Desktop business licenses...”
“I’m using Genymotion SaaS, can I extend the trial beyond the 6 days?”
“Does pay-as-you-go pricing plan allow root access, Google Play...”
“How do I create and configure arm64 cloud SaaS devices...”

Each of these is one specific person evaluating Genymotion for one specific commercial use case, and each Q&A was fetched once, by one AI session, on behalf of that buyer. This is a long-tail content pattern: the top 10 Q&As account for 41% of fetches, and the remaining 150 fetched Q&As account for the other 59%. The tail is bigger than the head.
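The head/tail split can be checked directly against the top-10 table, using only numbers from this article:

```python
top10 = [123, 66, 50, 26, 24, 23, 19, 19, 18, 17]  # fetch counts from the top-10 table
total_qna_fetches = 930                             # from the /qna/ vs /pages/ table
fetched_qnas = 160

head = sum(top10)
tail = total_qna_fetches - head
print(head, tail)  # 385 545
print(f"head {head / total_qna_fetches:.0%}, tail {tail / total_qna_fetches:.0%}")
print(f"tail mean: {tail / (fetched_qnas - len(top10)):.1f} fetches per Q&A")
```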

Pages are the sources

The /pages/ corpus behaves differently. The Burp Suite tutorial is by far the most-fetched page (130 fetches, 22% of all page fetches). After that, the numbers drop fast: the Genymotion documentation hub (48), the Linux install guide (27), and a handful of requirements / install / root-access pages in the 12–17 range. The remaining 122 fetched pages got 10 or fewer fetches each, more than half of them exactly one, and 413 pages (76%) got nothing.

Does this mean most of our /pages/ corpus is unnecessary? What’s the role of a corpus where most items never get fetched directly?

We think the answer lies in this stat: 79% of sessions that fetched a Q&A also fetched at least one of the source pages cited in that Q&A’s “Based on these sources” sidebar. This suggests that the model verifies the Q&A answer by pulling a cited source page in the same session. The /pages/ corpus lends credibility to the Q&As.
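Here is a minimal sketch of how a co-fetch rate like this can be computed. It assumes sessions have already been reconstructed as sets of fetched URIs and that each Q&A’s sidebar sources are available as a mapping. The URIs and the function name are hypothetical, and this is not necessarily the exact procedure behind the 79% figure.

```python
def cofetch_rate(sessions, sources_by_qna):
    """Share of Q&A-fetching sessions that also fetched a source cited by one of those Q&As.

    sessions: iterable of sets of URIs fetched in one session
    sources_by_qna: {qna_uri: set of /pages/ URIs in its "Based on these sources" sidebar}
    """
    with_qna = verified = 0
    for uris in sessions:
        qnas = [u for u in uris if u.startswith("/qna/")]
        if not qnas:
            continue
        with_qna += 1
        if any(sources_by_qna.get(q, set()) & uris for q in qnas):
            verified += 1
    return verified / with_qna if with_qna else 0.0


sources = {"/qna/a": {"/pages/x"}, "/qna/b": {"/pages/y"}}
sessions = [
    {"/qna/a", "/pages/x"},  # Q&A plus its cited source: counts as verified
    {"/qna/b", "/pages/x"},  # fetched a page, but not one /qna/b cites
    {"/pages/y"},            # no Q&A fetched: excluded from the denominator
    {"/qna/a"},              # Q&A alone
]
print(cofetch_rate(sessions, sources))
```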

The 79% figure also means the silent 76% isn’t dead weight. Each page is a potential source citation for a Q&A. When that Q&A gets asked, the page gets fetched; when it isn’t, the page sits idle. Looking ahead, we could trim any pages that no Q&A cites in its sources sidebar.
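Finding trim candidates is a set difference: every published page minus every page cited in some Q&A’s sources sidebar. Here is a sketch with hypothetical URIs:

```python
def uncited_pages(all_pages, sources_by_qna):
    """Return pages that no Q&A lists as a source, sorted for review."""
    cited = set().union(*sources_by_qna.values())
    return sorted(set(all_pages) - cited)


pages = ["/pages/x", "/pages/y", "/pages/z"]
sources = {"/qna/a": {"/pages/x"}, "/qna/b": {"/pages/x", "/pages/y"}}
print(uncited_pages(pages, sources))
```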

Page single-fetch tail samples:

react-native-ui-testing-with-detox-genymotion-saas (CI/CD integration)
how-to-install-magisk-on-genymotion (advanced rooting)
how-many-genymotion-virtual-devices-can-i-launch-per-aws-ec2 (cloud capacity)

A simple way to think of the /pages/ content is as the supply of product and marketing information available to the Q&As. Don’t cut it wholesale; let the market decide which pages to keep.

Summary

Q&As are demand-driven, fetched as primary answers, distributed in a real long tail. Pages are supply-driven, fetched as source citations, distributed with a concentrated head and a large dormant body. They play different roles and are complementary.

What citation bots didn’t fetch

Unsurprisingly, the citation bots rarely fetch the discovery layer: the reference files such as robots.txt, sitemaps, llms.txt, and the various JSON APIs we published. Citation bots don’t browse around the website; they fetch URLs the model already knows. The discovery work is done upstream by crawler bots (ClaudeBot, GPTBot, OAI-SearchBot, PerplexityBot) that build the indexes the citation bots draw from.

The data is consistent with this: 39% of ChatGPT-User sessions start with a direct content-URL fetch. In those sessions, the bot knew the URL it wanted before the session began.
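The article doesn’t spell out how sessions are delimited. One common heuristic, shown here purely as an assumption, groups fetches by client IP and user agent and starts a new session after 30 minutes of inactivity. The first URI of each session then shows whether the bot went straight to content. Treating only /qna/ and /pages/ URIs as content is also an assumption.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff
CONTENT_PREFIXES = ("/qna/", "/pages/")  # assumed definition of "content URL"


def sessionize(fetches, gap=SESSION_GAP):
    """Group (timestamp, client_ip, user_agent, uri) tuples into per-session URI lists."""
    sessions = []
    last_seen = {}  # (client_ip, user_agent) -> (last timestamp, index of its open session)
    for ts, ip, ua, uri in sorted(fetches):
        key = (ip, ua)
        prev = last_seen.get(key)
        if prev is None or ts - prev[0] > gap:
            sessions.append([uri])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx].append(uri)
        last_seen[key] = (ts, idx)
    return sessions


def share_starting_with_content(sessions):
    """Fraction of sessions whose first fetch is a content URL."""
    if not sessions:
        return 0.0
    return sum(s[0].startswith(CONTENT_PREFIXES) for s in sessions) / len(sessions)


t0 = datetime(2026, 5, 1, 10, 0)
fetches = [
    (t0, "203.0.113.1", "ChatGPT-User", "/qna/a"),
    (t0 + timedelta(minutes=5), "203.0.113.1", "ChatGPT-User", "/pages/x"),
    (t0 + timedelta(hours=2), "203.0.113.1", "ChatGPT-User", "/robots.txt"),
    (t0 + timedelta(hours=2, minutes=1), "203.0.113.1", "ChatGPT-User", "/qna/b"),
    (t0, "203.0.113.2", "ChatGPT-User", "/topics/x"),
]
sessions = sessionize(fetches)
print(sessions)
print(share_starting_with_content(sessions))
```

A shorter gap splits more sessions and a longer one merges them, so the resulting percentage depends on this cutoff.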

A teaser for the next articles: the upstream AI crawlers seem to heavily prefer traditional SEO infrastructure over the AI-native discovery standards. In the same 21-day window, ClaudeBot fetched sitemap.xml 209 times and llms.txt zero times. OAI-SearchBot and GPTBot fetched sitemap.xml 16 times combined and llms.txt twice each. We published the AI-native files because the standards exist, but our observations show they’re barely used yet.
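Counting discovery-file fetches per crawler is a small tally over (user agent, URI) pairs. In this sketch, the crawler names and file paths come from the article. The substring matching on the user agent and the sample records are our assumptions, and per-topic sitemaps are not counted.

```python
from collections import Counter

CRAWLERS = ("ClaudeBot", "GPTBot", "OAI-SearchBot", "PerplexityBot")
DISCOVERY_FILES = {"/robots.txt", "/sitemap.xml", "/llms.txt", "/llms-full.txt"}


def discovery_counts(records):
    """Tally (crawler, file) pairs from (user_agent, uri) records of 200-OK requests."""
    counts = Counter()
    for ua, uri in records:
        if uri not in DISCOVERY_FILES:
            continue
        bot = next((b for b in CRAWLERS if b in ua), None)  # substring match on the UA
        if bot is not None:
            counts[(bot, uri)] += 1
    return counts


records = [
    ("ClaudeBot", "/sitemap.xml"),
    ("ClaudeBot", "/sitemap.xml"),
    ("GPTBot", "/llms.txt"),
    ("ChatGPT-User", "/llms.txt"),  # citation bot, not a crawler: ignored
    ("ClaudeBot", "/qna/a"),        # content URI, not a discovery file: ignored
]
for (bot, path), n in sorted(discovery_counts(records).items()):
    print(bot, path, n)
```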

What the fetched content looks like

In our 21-day sample, some consistent patterns emerged in the content that bots fetched most.

Decomposition by product line. The popular Q&As split the answer by product line (Genymotion Desktop: / Genymotion SaaS (Cloud):) or by persona (Ideal for: Occasional users, pilots / Ideal for: Enterprises needing bespoke setups). Our assumption is that this structure lets the LLM pick out exactly the part it needs to answer the question.

Tables for comparisons. The SaaS-vs-Desktop Q&A is an 8-row × 3-column table covering Hosting, Scalability, Collaboration, Automation, Maintenance, Use Cases, Cost Model, Security & Compliance. The rooted-device Q&A uses a 3×3 task / need / how table. LLMs seem to handle tables well.

User-voice question titles. Some Q&A titles preserve typos, missing words, run-ons, and broken grammar, because they are made of real chatbot input. That seems to register with the LLMs.

The pages were rewritten according to a set of fluency rules: bullets-not-paragraphs, tables-not-prose, every-sentence-standalone, no anaphora (“this”, “the above”), no cross-paragraph dependencies. The compressed form is less pleasant for humans to read, but that isn’t the point. Our target reader is an AI agent that extracts content and writes its own prose when chatting with a human.
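Some of these rules can be checked mechanically. As an illustration only (this is not our actual pipeline), here is a naive check for the no-anaphora rule that flags sentences opening with a backward reference such as “This” or “The above”. The word list and the sentence splitter are simplifying assumptions.

```python
import re

# Simplified word list for the "no anaphora" rule; extend as needed.
ANAPHORA_OPENERS = re.compile(r"^(this|that|these|those|the above)\b", re.IGNORECASE)


def flag_anaphora(text):
    """Return sentences that open with a reference back to earlier text."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if ANAPHORA_OPENERS.match(s)]


draft = (
    "Step one installs the tool. This requires admin rights. "
    "The above also applies on Linux. Each command can be rerun safely."
)
print(flag_anaphora(draft))
```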

The human-readable prose still exists. It lives on genymotion.com. The AI site is generated from that material via the rewriting pipeline. Canonical URLs on every AI-site page point back to genymotion.com. The AI site isn’t a replacement for the human site. It’s the same content for a different kind of user.
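A quick way to audit the canonical-URL rule is to parse each AI-site page and check where its `<link rel="canonical">` points. Here is a minimal sketch using Python’s built-in HTML parser. The accepted host names and the sample URLs are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: the human site is served from either of these host names.
HUMAN_HOSTS = {"genymotion.com", "www.genymotion.com"}


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = a.get("href")


def canonical_points_to_human_site(html):
    finder = CanonicalFinder()
    finder.feed(html)
    host = urlparse(finder.canonical or "").hostname or ""
    return host in HUMAN_HOSTS


good = '<head><link rel="canonical" href="https://www.genymotion.com/example-page/"></head>'
self_ref = '<head><link rel="canonical" href="https://rozz.genymotion.com/qna/index.html"></head>'
print(canonical_points_to_human_site(good), canonical_points_to_human_site(self_ref))
```

Matching exact host names, rather than any `*.genymotion.com` suffix, keeps a self-referencing canonical on rozz.genymotion.com from passing the check.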

What this means for a B2B company building an AI site

Three takeaways that hold up against the data.

1. Q&As do more work than pages, and the gap is large: 2.5× higher coverage, a gentler Pareto, a real long tail. Q&As generated from observed user questions earn more bot traffic per page than rewritten existing content. The Q&As represent user demand, meaning what people are actually asking about. The pages are used for verification. You need both, but they aren’t equal contributors to citations.

2. The compressed form is the content; the marketing prose lives elsewhere. An AI site isn’t your human website with structured data added. It’s a parallel surface where the content is rewritten for extraction. The human site keeps doing the human job. The AI site is a different deliverable with a different consumer.

3. Don’t bet on AI-native discovery as your primary channel yet. Publish llms.txt and expose JSON APIs, fine. But in mid-2026 the discovery work is still being done by robots.txt and sitemap.xml, the same as for the last 25 years of web SEO. Use your traditional SEO infrastructure to point bots at your AI site. Bots still use it.
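Pointing crawlers at the AI site through traditional infrastructure takes two small files: a sitemap.xml in the sitemaps.org format and a `Sitemap:` line in robots.txt. Here is a minimal sketch with Python’s standard library. The URL list reuses the homepage and browse-all index paths listed earlier, and the permissive `Allow` rule is an assumption, not our production robots.txt.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize a list of absolute URLs as a sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    # Write the declaration by hand so the declared encoding is always UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def robots_txt(sitemap_url):
    """Allow all crawlers and advertise the sitemap location."""
    return f"User-agent: *\nAllow: /\n\nSitemap: {sitemap_url}\n"


urls = [
    "https://rozz.genymotion.com/",
    "https://rozz.genymotion.com/qna/index.html",
    "https://rozz.genymotion.com/pages/index.html",
]
print(build_sitemap(urls))
print(robots_txt("https://rozz.genymotion.com/sitemap.xml"))
```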

The simpler version of all three: an AI site is a separate property with a different design target than your marketing website. The shape of the content, the navigation, the discovery layer, and the audience all differ. Treating it as “the same content with JSON-LD” misses what it actually is — and what bots actually use.

Get this for your company

Rozz gives you visibility into the AI conversations happening about your product, and the tools to influence what AI recommends.

$997/month | AI site + chatbot + analytics

→ Book a call | → See how it works | → rozz@rozz.site

← Previous Entry

Entry #13: Two channels, two audiences

All Entries

Entry #15: Who reads your llms.txt?

→ Data source: CloudFront access logs for rozz.genymotion.com, April 29 – May 19, 2026 (21 days). ChatGPT-User and Claude-User content fetches only, 200-OK responses. Corpus inventory reconciled against the live AI-site URI registry.