AI Site Structure Matters More than We Thought
Every week we look into the data and share insights on this blog. This week is about the importance of “topics” — the semantic structure we build to organize content for AI agents on the AI site.
The Genymotion AI site, our case study in this weekly series, has 16 topic pages. In the logs, we found that AI platforms and search engines are asking for 61 more that no longer exist. That’s 1,001 requests in 7 days to topic pages we removed when we improved the taxonomy. Why were so many topic index pages being queried repeatedly?
A design choice: structure for machines
So this is what we’ll talk about in this twelfth article in the series: organization by semantic topic, and the stability of that structure. Turns out it matters more than we realized when we made it.
An AI site is a website for AI agents. A regular website for humans, as we are accustomed to using, helps humans browse. They click through menus, usually the same on every B2B site: Products, Solutions, Resources, Pricing, etc. Humans like finding a familiar structure so they can zoom in on the content they’re looking for. They judge a site by what it looks like.
None of that applies when the reader is an AI agent from ChatGPT, Claude, or Perplexity. We’re on a mission to find out what they truly want.
When Rozz builds an AI site, we build it around content taxonomy, not around human navigation. The primary organizational layer is a set of topic hubs. Each hub is named after a set, or cluster, of related content specific to each site. For Genymotion, it’s: CLI Tooling, Cloud Deployment, Virtual Device Management, Licensing…
This is the opposite of how traditional sites are built. Traditional sites are organized around how users browse through them and where they should convert. AI sites are organized around concepts to facilitate content retrieval.
The topic hubs work
We found confirmation of the importance of that design in the logs. Every major AI platform queries the topic listing pages.
In the past week, ChatGPT-User fetched /topics/cli-shell-tooling.html 130 times during live user sessions. PerplexityBot hit /topics/android-os-versions.html six times on a monitoring schedule. ClaudeBot visited /topics/mobile-test-automation.html twice while sampling the site. These aren’t incidental pulls; they recur week over week. We think the topic pages help AI systems understand what content exists before they navigate to specific answers, or filter what they collect before pulling the data.
This isn’t obvious because RAG systems can also query a huge index and find pages using semantic search, which Rozz does too (at a much smaller scale and for each individual site separately). Filtering content increases RAG efficiency. This may be what’s going on here.
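The per-agent counts above come from classifying log lines by User-Agent. A minimal sketch of that tally, assuming a simplified log line of the form `"<agent-label> <path>"` (real CloudFront logs are tab-separated with many more fields, and the agent labels below are the ones named in this post):

```python
from collections import Counter

# AI retrieval agents we classify from the User-Agent field.
AI_AGENTS = ("ChatGPT-User", "PerplexityBot", "ClaudeBot")

def count_topic_fetches(log_lines):
    """Count fetches of /topics/ pages per (agent, path) pair."""
    counts = Counter()
    for line in log_lines:
        agent, _, path = line.partition(" ")
        if agent in AI_AGENTS and path.startswith("/topics/"):
            counts[(agent, path)] += 1
    return counts

# Illustrative log lines, not real traffic.
logs = [
    "ChatGPT-User /topics/cli-shell-tooling.html",
    "ChatGPT-User /topics/cli-shell-tooling.html",
    "PerplexityBot /topics/android-os-versions.html",
    "GoogleBot /robots.txt",  # not an AI agent path pair we track
]
print(count_topic_fetches(logs)[("ChatGPT-User", "/topics/cli-shell-tooling.html")])  # 2
```

The same grouping, run over a full week of logs, produces the fetch counts quoted above.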
The point we’re making here is about the expectation, even the demand, that LLM systems appear to have for structural consistency over time. Whereas individual pages come and go without much consequence, the topical layer was retrieved time and again, over multiple days, as if there were an expectation of its durability.
This leads us to reinforce the attention we place on curating the topics in each AI site, both algorithmically and through tools for human oversight.
What we didn’t anticipate
In our original design, the topic taxonomy on an AI site was generated every week to reflect the site’s new content and the new Q&As coming from the chatbot. Clustering algorithms decide which pieces of content belong together and assign names to the groups. Clusters get split, merged, or renamed. Over 90 days, we’ve run that iteration loop several times. Each iteration improved the taxonomy: topics became more specific, less overlapping, more aligned with how users actually query the content.
We thought that was OK because LLMs like fresh content. But each iteration also changed URLs.
/topics/android-os-versions.html became /topics/android-version-selection.html. /topics/mobile-testing-security.html split into /topics/mobile-test-automation.html and /topics/network-security-config.html. A dozen other topic slugs shifted as the clustering tightened.
The ghost problem
Here is what that turned into, in one week of logs:
| Topic URL | Requests (7 days) | Status |
|---|---|---|
| /topics/android-os-versions.html | 344 | Retired |
| /topics/mobile-testing-security.html | 127 | Retired |
| /topics/root-access-and-tools.html | 69 | Retired |
| /topics/ci-cd-tooling.html | 13 | Retired |
| /topics/arm-apple-silicon.html | 9 | Retired |
| …56 more retired topic URLs | 439 | Retired |
| Total ghost topic requests | 1,001 | — |
61 topic URLs that no longer exist on the site received 1,001 requests in seven days. ChatGPT-User alone contributed 387 of those. Perplexity, Claude, and other retrieval systems added the rest.
The pattern across requesters is clean: each system had learned the old taxonomy at some point, cached the topic URLs, and kept fetching them long after we retired them. ChatGPT remembers topic URLs from when it last indexed the site. Bing remembers sitemap filenames we replaced months ago — six retired sitemap shards are still being polled every 27 minutes. Every retrieval system that ever read the site carries a version of our structure that’s out of date.
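The ghost inventory itself comes from reconciling requested topic URLs against the site’s live topic hubs. A sketch of that reconciliation, with an illustrative live set and request list (not the real Genymotion inventory):

```python
from collections import Counter

# Illustrative subset of the site's current topic hubs.
LIVE_TOPICS = {
    "/topics/cli-shell-tooling.html",
    "/topics/android-version-selection.html",
    "/topics/mobile-test-automation.html",
}

def find_ghosts(requested_paths, live_topics=LIVE_TOPICS):
    """Return {retired_topic_url: request_count} for topic paths not on the site."""
    return dict(Counter(
        p for p in requested_paths
        if p.startswith("/topics/") and p not in live_topics
    ))

requests = [
    "/topics/android-os-versions.html",  # retired slug, still being fetched
    "/topics/android-os-versions.html",
    "/topics/cli-shell-tooling.html",    # live, not a ghost
]
print(find_ghosts(requests))  # {'/topics/android-os-versions.html': 2}
```

Running this over seven days of logs is what produced the 61-URL, 1,001-request table above.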
Stability patterns
Once we understood what was happening, the fix was straightforward. The principle is that algorithm-generated structure needs URL-level stability that the algorithm’s optimization objective doesn’t provide on its own. So we’ve made some changes:
Canonical topics. Topics are still proposed by the algorithm but can be manually curated. They carry detailed descriptions and are kept stable over time, unless they become obsolete enough to retire.
A topic URL registry. Every topic URL the site has ever had is tracked, including retired ones. When the clustering algorithm proposes renaming a topic, the rename is recorded in the registry rather than silently replacing the old URL. This lets us redirect retired topics to active ones.
301 redirects from retired topics to their closest current equivalents. When AI platforms request /topics/android-os-versions.html, they now get a 301 redirect to /topics/android-version-selection.html, the current topic that covers the same content. That way, the LLM still gets the content it needs. Even though it doesn’t learn the new topic name for next time, at least it gets the right data.
The first pass shipped this week with 61 redirect mappings — every retired topic URL we could identify in the logs. More will surface as we continue to watch the traffic.
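Putting the registry and the redirects together: a minimal sketch, assuming renames are recorded as old-slug → new-slug pairs and a retired URL is resolved by following the rename chain to its current target. The slugs are the ones mentioned in this post; the registry structure itself is hypothetical:

```python
# Rename records: old topic URL -> its closest current equivalent.
# A split is recorded as a redirect to the nearest surviving topic.
RENAMES = {
    "/topics/android-os-versions.html": "/topics/android-version-selection.html",
    "/topics/mobile-testing-security.html": "/topics/mobile-test-automation.html",
}

def resolve(url, renames=RENAMES):
    """Follow rename records to the current URL (the 301 target)."""
    seen = set()
    while url in renames and url not in seen:  # guard against rename cycles
        seen.add(url)
        url = renames[url]
    return url

print(resolve("/topics/android-os-versions.html"))
# -> /topics/android-version-selection.html
```

A live URL (or one the registry has never seen) resolves to itself, so the same lookup can front every topic request.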
Why this matters
This is what we think separates a high-performance AI site from a one-shot prototype.
A prototype is easy to build. Generate a taxonomy once, publish the URLs, stop. The site works on the day it ships. It also goes stale the day the first better clustering algorithm comes out, because you either keep the stale structure or break every external cache that learned it.
A high-performance AI site requires the opposite discipline. You iterate on the taxonomy because that’s how the site gets better over time. You also treat the URLs the taxonomy produces as infrastructure that external systems depend on. Every iteration produces a set of changes that need to be governed: which URLs move, which merge, which retire, which redirect to which. The clustering algorithm needed a stability layer.
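The governance step above — deciding per iteration which URLs are kept, renamed, or retired — can be sketched as a diff between the old and new slug sets. This is a simplification under stated assumptions: the clustering algorithm’s proposed mapping is a plain old-slug → new-slug dict, and merges and splits collapse into renames toward the closest equivalent:

```python
def classify_changes(old, new, mapping):
    """old/new: sets of slugs; mapping: old slug -> proposed new slug (if any)."""
    report = {"kept": [], "renamed": [], "retired": []}
    for slug in sorted(old):
        if slug in new:
            report["kept"].append(slug)          # URL survives as-is
        elif mapping.get(slug):
            report["renamed"].append((slug, mapping[slug]))  # needs a 301
        else:
            report["retired"].append(slug)       # needs a redirect target chosen
    return report

# Illustrative iteration, echoing slugs from this post.
old = {"cli-shell-tooling", "android-os-versions", "root-access-and-tools"}
new = {"cli-shell-tooling", "android-version-selection"}
mapping = {"android-os-versions": "android-version-selection"}
print(classify_changes(old, new, mapping))
```

The "renamed" entries feed the registry directly; the "retired" entries are the ones a human picks redirect targets for.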
Most discussions of AI SEO focus on content: write answer-first, use Q&A schema, keep sentences short. That work matters. But below the content layer is the structural layer, and below that is the stability layer. The structural layer is what tells a machine reader what your site is about. The stability layer is what keeps that information useful as you improve the structure.
An AI site without a stability layer performs well at launch and decays over time. Every algorithm upgrade creates another set of ghost URLs. Every ghost URL represents real users asking real AI platforms real questions that arrive at a dead page and get a stub instead of an answer.
We built this site for machines. The machines read it. Then the structure changed, and the machines kept reading the old map. Stability patterns are what close that gap.
Get this for your company
Rozz builds AI sites for B2B companies. Structured for machines. Iterated to stay current. Governed so that iteration doesn’t break what retrieval systems have already learned.
$997/month | AI site + chatbot + analytics
→ Book a call | → See how it works | → rozz@rozz.site
→ Data source: CloudFront access logs for rozz.genymotion.com, April 15 – April 22, 2026 (7 days). Retired topic URL inventory reconciled against the current site’s 16 live topic hubs. Requester breakdown from User-Agent classification.