AI search readiness

AI Search Optimization Audit

Check whether ChatGPT, Gemini, Claude and other AI search systems can understand and cite your website.

Free AI SEO audit for robots.txt, sitemap visibility, structured metadata, homepage crawlability and AI citation readiness.

Scan your site

Technical audit of robots.txt, sitemap, and crawlability.

Homepage + random featured article HTML — 10 checklist items for AI search.

Enter your website's homepage URL, not an article URL. A sample article will be picked at random while the check runs.

After your audit, download the full report as a standalone HTML file (open offline or share with your team).

Results are cached for 24 hours per URL. The article check may take longer when it falls back to a headless browser.

What we check — and why

After a scan you get a score, green/red checklist, crawler table, issue-by-issue recommendations, and a suggested robots.txt for your domain.

  1. robots.txt & crawler rules

    What · We fetch /robots.txt and classify training vs retrieval/search bots.

    Why · Wrong bot rules hide your site from AI answers or allow unwanted training crawlers.

  2. Sitemap & homepage crawlability

    What · We verify the homepage loads and check sitemap availability.

    Why · Crawlers need reachable HTML and URL discovery.

  3. AI search checklist

    What · We parse homepage HTML + one article picked at random from the homepage (browser-like fetch, Playwright fallback) and score 10 must-have signals from our AI search audit framework.

    Why · Measures whether AI systems can understand, extract, summarize, and cite your journalism — entity clarity, structure, schema, trust, freshness, and more.

  4. Why your overall score can change between runs

    What · Each full-site scan picks a different sample story at random (unless you paste an article URL). Article scores reflect that page. Technical Foundation and Homepage AI structure use your site URL and stay the same when the homepage has not changed.

    Why · The overall AI Visibility Score blends site-wide and article checks, so it may move up or down on repeat runs even when your homepage is unchanged. Use a manual article URL if you want a fixed story.

Training bots vs retrieval bots

Block crawlers used for model training (e.g. GPTBot, ClaudeBot, Google-Extended) if you want to limit training use of your content. Allow retrieval and search bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, Googlebot) so AI answers can cite your pages. Never pair "User-agent: *" with "Disallow: /" unless you intend to hide the entire site from every crawler.
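
To see how a given robots.txt treats the two groups of bots, a minimal sketch using Python's standard urllib.robotparser is shown here. The bot names come from the paragraph above; the robots.txt content, the example.com URL, and the crawler_access helper are hypothetical, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Grouping taken from the text above; it is not an official registry.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Googlebot"]

# Hypothetical policy: block training crawlers, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot name: may fetch url} for every bot in both groups."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in TRAINING_BOTS + RETRIEVAL_BOTS}


access = crawler_access(ROBOTS_TXT)
for bot, allowed in access.items():
    print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

Passing "User-agent: *" followed by "Disallow: /" to the same helper reports every bot as blocked, which is the blanket rule to avoid unless you intend it.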

Metadata & structured data for AI

Newsrooms benefit from Organization or NewsMediaOrganization schema, Article/NewsArticle markup, and FAQ blocks where appropriate. Title tags, canonical URLs, and visible trust signals (bylines, dates) improve how LLMs summarize and attribute your reporting.
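
As an illustration of the Article/NewsArticle markup mentioned above, here is a minimal sketch that builds a JSON-LD block in Python. The property names (headline, mainEntityOfPage, author, publisher, datePublished, dateModified) are schema.org terms; the helper name and every value are hypothetical placeholders, and your CMS may already emit similar markup.

```python
import json


def news_article_jsonld(headline, url, author, publisher, published, modified):
    """Build a schema.org NewsArticle object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "mainEntityOfPage": url,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "NewsMediaOrganization", "name": publisher},
        "datePublished": published,
        "dateModified": modified,
    }


# All values below are invented for the example.
markup = news_article_jsonld(
    headline="Council approves 2025 budget",
    url="https://example.com/news/council-budget",
    author="Jane Reporter",
    publisher="Example Daily",
    published="2025-01-15T09:00:00+00:00",
    modified="2025-01-15T12:30:00+00:00",
)

# JSON-LD is embedded in the page head inside a script tag of this type.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(markup, indent=2)
    + "</script>"
)
print(script_tag)
```

If field values come from untrusted input, escape any "</" sequence before embedding the JSON in HTML.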

How the free AI visibility check works

  1. Enter your site URL — we normalize it and fetch robots.txt, the homepage, and your sitemap.
  2. We score crawler rules, crawlability, metadata, sitemap health, and whether key content is available without heavy JavaScript.
  3. We parse one article linked from your homepage and score ten editorial signals for AI search (entities, structure, trust, freshness).
  4. Download a standalone HTML audit report, plus a suggested robots.txt tailored to your domain.
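
Step 1 can be pictured with a short sketch. The tool's actual normalization rules are not published, so the behavior below (default to HTTPS, lowercase the host, drop path and query, assume /sitemap.xml) and both function names are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_site_url(raw: str) -> str:
    """Reduce user input to a site root such as https://example.com/."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # assumption: default to HTTPS when no scheme is given
    parts = urlsplit(raw)
    # Keep scheme and host only; path, query, and fragment are dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), "/", "", ""))


def audit_targets(raw: str) -> dict[str, str]:
    """List the three URLs a scan would request first."""
    root = normalize_site_url(raw)
    return {
        "homepage": root,
        "robots": root + "robots.txt",
        "sitemap": root + "sitemap.xml",  # assumption: conventional default location
    }


print(audit_targets(" Example.com/news/story?id=1 "))
```

A site can also declare its sitemap location with a Sitemap line in robots.txt, so a real check would likely read that before falling back to the default path.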

Frequently asked questions

Common questions about AI visibility, robots.txt for LLM crawlers, and how this free newsroom audit works.

What is an AI visibility check for newsrooms?
It is a free technical audit of your public website: robots.txt rules for AI crawlers, sitemap and homepage crawlability, metadata and schema, JavaScript rendering, and a checklist that parses your homepage plus one sample article for AI-friendly structure and trust signals.
What does the AI visibility score mean?
The primary AI Visibility score (0–100) uses eight weighted sections: robots, crawlability, rendering, sitemap, metadata & schema, homepage AI structure, article AI structure, and entity & trust signals. Technical Foundation is shown separately — it measures infrastructure (can AI access the site?) without implying citation readiness. This is a heuristic snapshot, not a live ranking inside ChatGPT or Perplexity.
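
The blend of eight weighted sections can be sketched as a weighted average. The section names follow the answer above, but the weights are hypothetical (the real ones are not stated here), as are the sample section scores.

```python
# Hypothetical weights for the eight sections; they sum to 100.
WEIGHTS = {
    "robots": 15,
    "crawlability": 15,
    "rendering": 10,
    "sitemap": 10,
    "metadata_schema": 15,
    "homepage_ai_structure": 10,
    "article_ai_structure": 15,
    "entity_trust": 10,
}


def visibility_score(section_scores: dict[str, float]) -> int:
    """Weighted average of per-section scores, each on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * section_scores[name] for name in WEIGHTS)
    return round(weighted / total_weight)


# Two runs on the same site: only the sampled article differs.
site_wide = {name: 80 for name in WEIGHTS}
run_a = {**site_wide, "article_ai_structure": 60}
run_b = {**site_wide, "article_ai_structure": 100}

print(visibility_score(run_a))  # 77
print(visibility_score(run_b))  # 83
```

With these assumed weights, a different sample article alone moves the overall score by six points even though every site-wide section is unchanged.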
Should I block GPTBot and other AI training crawlers?
Many publishers block training crawlers (GPTBot, ClaudeBot, Google-Extended) to limit model training on their content while still allowing retrieval and search bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, Googlebot) so AI answers can cite your reporting. Your policy should drive the choice; the tool shows what your robots.txt currently allows.
What is the difference between AI training bots and retrieval bots?
Training bots collect content to build or improve models. Retrieval and search bots fetch pages when a user asks a question so an AI system can quote or summarize your site. Blocking training bots does not have to block citation if retrieval bots remain allowed.
How often should I run the audit?
Run it after changing robots.txt, sitemap, templates, or paywall/rendering setup, and periodically on your homepage. Results are cached for 24 hours per URL, so a repeat scan within that window returns the same report.
What is the difference between AI citations and brand mentions?
A citation is when an AI answer points to your URL as the source for a specific claim; it can drive traffic, and it credits your reporting much like a footnote. A brand mention is when the AI names your publication in the answer without necessarily linking to you; it builds presence and can influence readers even when there is no click. Mentions in AI responses can outpace referral traffic from those answers, so both metrics matter for newsrooms. This free audit prepares your site for citation (crawlable HTML, extractable structure, schema); it does not measure live mention share across AI prompts.
What are training data and live retrieval — and why does crawl access still matter?
AI answers draw on two pools: training data (a frozen web snapshot baked into the model, where long-established brands often have deeper profiles) and live retrieval (real-time fetches that supplement and verify answers for any site with a crawlable presence). Newer or less historically prominent outlets rely more heavily on live retrieval — so robots.txt, sitemap health, and whether automated clients can actually download your homepage and articles are decisive. Our audit tests that live-retrieval path (including cases where robots.txt looks open but HTML returns HTTP 403). It does not assess how strongly your title already sits in model training memory.
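
The case where robots.txt looks open but the HTML returns HTTP 403 can be sketched as two separate checks combined into one verdict. The function names, the status codes treated as refusals, and the verdict strings are assumptions, not the audit's actual logic; fetch_status makes a real network request and is not called in the example.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError; the code is what we want to inspect.
        return error.code


def classify_access(robots_allows: bool, status: int) -> str:
    """Combine the robots.txt verdict with the observed HTTP status."""
    if not robots_allows:
        return "blocked by robots.txt"
    if status in (401, 403, 429):  # assumption: treat these as server-side refusals
        return "robots.txt open, but server refused the request"
    if 200 <= status < 300:
        return "reachable"
    return f"unexpected status {status}"


print(classify_access(robots_allows=True, status=403))
```

Connection failures raise urllib.error.URLError and are left unhandled in this sketch. A firewall or bot-protection layer is a common reason the two checks disagree.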
What does “being understood” vs “being trusted” mean for AI visibility?
AI visibility is often two sequential layers. Being understood is an on-page problem: a clear topical home base, structured articles, schema, and consistent entity language so AI can place you in the right topic neighborhood. Being trusted is largely off-page — consistent mentions in major publications, knowledge sources, reviews, and industry discussions that validate who you are. You cannot skip the first step, but on-page work alone rarely wins sustained mention share. This audit focuses on being understood: access, extraction, structure, and editorial trust on your own site. It does not score third-party consensus or off-site mention footprint.
What on-page structure does AI need from news articles?
Strong citation-ready pages often share four patterns: answer-first structure (lead with the direct answer, not a long preamble — a large share of AI citations come from the early portion of a page), descriptive headings (H2/H3 that stand alone as questions or answers), topical completeness (coverage across your topic and subtopics, not scattered one-off pages), and consistent entity language (the same publication and topic names across pages). Our checklist scores related signals on one sample article — intro clarity, heading hierarchy, extractable paragraphs, internal links, schema, and freshness. It is a crawl-based heuristic for a single story, not a full-site topical map or competitor gap analysis.
What important AI visibility checks are outside this free audit?

This tool is a crawl-based technical and on-page audit. Several high-value signals used in broader AI search optimization (AEO) are not part of this free scan — fix crawl and structure here first, then pursue the rest with dedicated research and monitoring:

  • Live prompt testing — running branded and topical queries across ChatGPT, Perplexity, Gemini, and similar systems to measure mention rate (your name in the answer) and citation rate (your URL linked as a source). We do not query live models or report share-of-voice by prompt.
  • Off-page footprint — whether you appear on Wikipedia or Wikidata, how often independent publications co-cite you, and third-party mention density across the web. Our trust checklist only reads signals on your own pages (bylines, sourcing, contact).
  • Site-wide topical map — topic silos, hub and section pages, and coverage completeness across your beat (not just one homepage + one sample article). We use internal links on a single story as a weak proxy, not a full information architecture review.
  • Answer-first / first-30% scoring — whether the direct answer sits in the opening portion of the article (where many AI systems extract citations). We score intro clarity and paragraph length but do not yet measure answer placement in the first 30% of page text.
  • Competitor topical gap view — comparing your coverage and AI visibility against peers in the same category (e.g. your outlet vs two rival publications on shared topics). That requires competitive benchmarking outside this tool.
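
The answer-first item in the list above is not measured by this audit, but you can approximate it yourself. The sketch below is one simple way to do so, assuming you already know the sentence or phrase that states the direct answer; the function names, the exact-phrase matching, and the sample text are hypothetical.

```python
from typing import Optional


def answer_position(page_text: str, answer: str) -> Optional[float]:
    """Return where answer starts, as a fraction of the page text, or None if absent."""
    # Collapse whitespace and ignore case so line breaks do not affect matching.
    text = " ".join(page_text.split()).lower()
    needle = " ".join(answer.split()).lower()
    index = text.find(needle)
    if index == -1 or not text:
        return None
    return index / len(text)


def is_answer_first(page_text: str, answer: str, threshold: float = 0.30) -> bool:
    """True when the answer begins within the first `threshold` share of the text."""
    position = answer_position(page_text, answer)
    return position is not None and position <= threshold


lead = "The council approved the budget on Tuesday."
filler = "Background detail. " * 20

print(is_answer_first(lead + " " + filler, "approved the budget"))  # True
print(is_answer_first(filler + lead, "approved the budget"))        # False
```

Exact-phrase matching is crude; a paraphrased answer would be missed, so treat the result as a rough signal rather than a score.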