The AI Visibility Score: A Reproducible Audit Method for Ecommerce Brands (2026)

By · Jun 27, 2026 · 10 min read
Most ecommerce brands don't know their AI Visibility Score because they don't have a reproducible way to measure it.

They run a few queries in ChatGPT, notice they're not mentioned, and conclude they have a problem — but they can't say how big the problem is, where it comes from, or whether last month's schema fix moved the number.

That's the gap a structured audit rubric closes.

This post lays out the scoring method we use internally across every new engagement: a 12-query, 5-engine, 5-dimension rubric that produces a number you can track over time and act on with specificity.


Why "am I cited?" isn't enough

The binary question — cited or not cited — is a starting point, not an audit.

A brand can be cited on one low-intent head query and miss every purchase-intent long-tail query in its category. A brand can appear in Perplexity but not in ChatGPT, which, according to Ahrefs brand tracking data, sends 8–9× the referral traffic of the next leading AI platform.

A score needs to answer three questions, not one:

  1. How often does the brand appear (citation density)?
  2. On which query types does it appear (query distribution)?
  3. Why does it appear or fail to appear (root cause dimension)?

Without all three, you're doing triage, not auditing.


The query framework: 12 queries across three tiers

The rubric uses a fixed 12-query split per audit run:

Tier | Count | Example structure
Head (broad category) | 4 | "best Shopify agency", "AI ecommerce consultant"
Mid-tail (category + qualifier) | 4 | "Shopify Plus B2B agency Vietnam", "ecommerce AI visibility expert"
Long-tail (category + geo or use case) | 4 | "Shopify headless migration agency Ho Chi Minh City", "AI search optimisation for Magento stores"

Why this split matters:

Head queries test whether the brand has made it into the model's parametric memory at all. Mid-tail queries test whether the brand holds a positioned niche. Long-tail queries test whether the brand is cited for specific, high-intent scenarios where a buyer is closer to a decision.

A brand that only appears on head queries is present in the category but not owning a niche. A brand that only appears on long-tail queries has local or niche authority but no category breadth. The 4/4/4 split surfaces both patterns.


The five engines

Run each query on five engines per audit cycle:

  1. Perplexity — highest weight; most consistent commercial-intent citation behaviour
  2. ChatGPT — highest weight; largest share of AI-assisted purchase research
  3. Claude — medium weight; different training data surface, useful for detecting schema blind spots
  4. Google AI Overviews — medium weight; triggers on branded + local queries
  5. Bing Copilot — lower weight; useful for detecting Bing-indexed entity signals

This gives you 60 query-engine pairs per audit run. Citation is binary per pair: cited (1) or not cited (0). Maximum raw score = 60.

Practical note: Running all five engines in one session takes approximately 90 minutes when done manually. For a first baseline, Perplexity + ChatGPT (24 pairs) gives a fast signal. The full 60-pair run is the quarterly deep audit.
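As a rough sketch of the bookkeeping, the pair list and raw score can be generated rather than tracked by hand. The engine names and the two example queries per tier come from this post. The remaining query slots are placeholders to fill in, and the function names are illustrative, not part of any tool.

```python
# Engines in the order listed above.
ENGINES = ["Perplexity", "ChatGPT", "Claude", "Google AI Overviews", "Bing Copilot"]

# Two example queries per tier are from the post; the "TODO" entries are
# placeholders you would replace with your own category queries.
QUERIES = {
    "head": ["best Shopify agency", "AI ecommerce consultant",
             "TODO head query 3", "TODO head query 4"],
    "mid-tail": ["Shopify Plus B2B agency Vietnam", "ecommerce AI visibility expert",
                 "TODO mid-tail query 3", "TODO mid-tail query 4"],
    "long-tail": ["Shopify headless migration agency Ho Chi Minh City",
                  "AI search optimisation for Magento stores",
                  "TODO long-tail query 3", "TODO long-tail query 4"],
}


def build_pairs(queries, engines):
    """Return every (tier, query, engine) combination for one audit run."""
    return [(tier, query, engine)
            for tier, tier_queries in queries.items()
            for query in tier_queries
            for engine in engines]


def raw_score(results):
    """Count cited pairs. `results` maps (tier, query, engine) -> True/False."""
    return sum(1 for cited in results.values() if cited)


full_run = build_pairs(QUERIES, ENGINES)          # quarterly deep audit
fast_baseline = build_pairs(QUERIES, ENGINES[:2])  # Perplexity + ChatGPT only
print(len(full_run), len(fast_baseline))
```

Keeping the pair list in code means every run uses exactly the same queries in the same order, which is what makes later scores comparable.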


The five scoring dimensions

The raw citation count (0–60) feeds only the first of five dimensions. Each dimension is scored 0–20, and the five scores sum to a 0–100 total.

Dimension 1 — Citation density (0–20 pts)

What it measures: How often the brand appears across the 60 query-engine pairs.

How to score:

  • 0–4 citations out of 60 = 0–4 pts
  • 5–12 citations = 5–10 pts
  • 13–24 citations = 11–15 pts
  • 25–40 citations = 16–18 pts
  • 41–60 citations = 19–20 pts

This is the headline number. Everything else explains it.
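The bands above give a point range for each citation range but do not say how to place a count inside a band. One possible reading, shown here as an assumption rather than the rubric's official rule, is to interpolate linearly within each band and round to the nearest point.

```python
# (min citations, max citations, min points, max points), copied from the bands above.
DENSITY_BANDS = [
    (0, 4, 0, 4),
    (5, 12, 5, 10),
    (13, 24, 11, 15),
    (25, 40, 16, 18),
    (41, 60, 19, 20),
]


def density_points(citations: int) -> int:
    """Map a raw citation count (0-60) to Dimension 1 points (0-20).

    Assumption: linear interpolation inside each band, rounded to an integer.
    """
    if not 0 <= citations <= 60:
        raise ValueError("citations must be between 0 and 60")
    for c_min, c_max, p_min, p_max in DENSITY_BANDS:
        if c_min <= citations <= c_max:
            fraction = (citations - c_min) / (c_max - c_min)
            return round(p_min + fraction * (p_max - p_min))
    raise AssertionError("bands cover 0-60, so this line is unreachable")


print(density_points(8))
```

Whatever interpolation rule you pick, fix it once and keep it unchanged across runs so that score movement reflects citations, not scoring drift.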

Dimension 2 — Schema coverage (0–20 pts)

What it measures: Whether the site emits machine-readable structured data that AI crawlers can parse.

How to audit:

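curl -s https://yourdomain.com/blog/your-pillar-post | python3 -c '
import sys, json, re
html = sys.stdin.read()
# Capture the body of every JSON-LD script tag; \x27 is a single quote,
# so both type="..." and the single-quoted form are matched.
pattern = r"<script[^>]*type=[\"\x27]application/ld\+json[\"\x27][^>]*>(.*?)</script>"
for s in re.findall(pattern, html, re.DOTALL | re.IGNORECASE):
    try:
        print(json.dumps(json.loads(s), indent=2))
    except json.JSONDecodeError:
        pass  # skip blocks that are not valid JSON
'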

Scoring:

  • Article schema present with headline, datePublished, dateModified, author, publisher = 5 pts
  • FAQPage schema present with ≥ 5 Question entries on pillar pages = 5 pts
  • Organization schema with name, url, description, address, sameAs array = 5 pts
  • BreadcrumbList or Product schema where relevant = 5 pts

A site with zero structured data scores 0 on this dimension regardless of citation performance. Structured data is not a guarantee of citation, but its absence is a consistent blocker in our audits across 50+ projects.
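The four checks above can be applied mechanically once the JSON-LD blocks are parsed. The sketch below assumes the blocks arrive as a flat list of dictionaries (no `@graph` unwrapping), matches only the exact type names used in the list (so `BlogPosting` would not count as `Article`), and treats "where relevant" as simple presence. The helper names are illustrative.

```python
ARTICLE_FIELDS = {"headline", "datePublished", "dateModified", "author", "publisher"}
ORG_FIELDS = {"name", "url", "description", "address", "sameAs"}


def types_of(node: dict) -> set:
    """Return a node's @type values as a set (JSON-LD allows a string or a list)."""
    value = node.get("@type", [])
    return set(value) if isinstance(value, list) else {value}


def faq_question_count(node: dict) -> int:
    """Count Question entries under mainEntity of an FAQPage node."""
    entries = node.get("mainEntity", [])
    if isinstance(entries, dict):
        entries = [entries]
    return sum(1 for e in entries if isinstance(e, dict) and "Question" in types_of(e))


def schema_points(nodes: list) -> int:
    """Score Dimension 2 (0-20) from a list of parsed JSON-LD objects."""
    points = 0
    if any("Article" in types_of(n) and ARTICLE_FIELDS.issubset(n) for n in nodes):
        points += 5
    if any("FAQPage" in types_of(n) and faq_question_count(n) >= 5 for n in nodes):
        points += 5
    if any("Organization" in types_of(n) and ORG_FIELDS.issubset(n) for n in nodes):
        points += 5
    if any(types_of(n) & {"BreadcrumbList", "Product"} for n in nodes):
        points += 5
    return points


example = [
    {"@type": "Article", "headline": "h", "datePublished": "2026-06-27",
     "dateModified": "2026-06-27", "author": {}, "publisher": {}},
    {"@type": "FAQPage", "mainEntity": [{"@type": "Question"}] * 5},
]
print(schema_points(example))
```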

Dimension 3 — Entity authority (0–20 pts)

What it measures: Whether independent third-party sources confirm the brand exists, is in a specific category, and operates in a specific geography.

What to check:

  • Active directory profiles: Clutch, GoodFirms, DesignRush, Sortlist — each live profile = 2 pts (max 8 pts)
  • Knowledge graph presence (Google Knowledge Panel or Wikidata entity) = 4 pts
  • LinkedIn company page with complete about, specialities, location = 4 pts
  • Wikipedia or Crunchbase citation in same vertical = 4 pts

Why it matters: AI engines treat entity authority as a trust signal. A brand that exists only on its own domain has no independent source cluster to draw from. The model has no corroborating signal to pull a citation from.
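The checklist translates into a small scoring function. This is a sketch only: the inputs are answers you collect by hand, and the function and parameter names are illustrative.

```python
# Directories named in the checklist above; each live profile is worth 2 points.
DIRECTORIES = ("Clutch", "GoodFirms", "DesignRush", "Sortlist")


def entity_points(live_profiles, knowledge_graph: bool,
                  linkedin_complete: bool, wiki_or_crunchbase: bool) -> int:
    """Score Dimension 3 (0-20) from manually collected checks."""
    # Count only the four listed directories, so the directory part caps at 8.
    directory_pts = 2 * len(set(live_profiles) & set(DIRECTORIES))
    return (directory_pts
            + (4 if knowledge_graph else 0)
            + (4 if linkedin_complete else 0)
            + (4 if wiki_or_crunchbase else 0))


print(entity_points({"Clutch", "Sortlist"}, knowledge_graph=False,
                    linkedin_complete=True, wiki_or_crunchbase=False))
```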

Dimension 4 — Listicle presence (0–20 pts)

What it measures: Whether the brand appears on third-party "best X" comparison pages that AI engines preferentially cite.

Ahrefs research across 26,283 ChatGPT source URLs found that "best X" blog lists represented 43.8% of all page types cited by ChatGPT — the single largest category, ahead of landing pages, documentation, and social content.

How to audit:

  • Search site:clutch.co "your brand name" → cited on Clutch category page = 4 pts
  • Search "best [your service] [your geo]" on Google → each top-5 listicle that includes the brand = 4 pts (max 8 pts)
  • Search ChatGPT: "What are the best [category] agencies in [your market]?" → brand appears = 8 pts

Why it compounds: Listicle presence on third-party sites is one of the few citation signals that feeds both traditional search ranking and AI citation simultaneously. In our audit experience, a brand that owns 3+ top-five slots on well-indexed comparison pages is nearly always present in AI responses on mid-tail queries.
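The three audit checks for this dimension add up as follows. The sketch assumes the Google check awards 4 points per qualifying listicle with a cap of 8. The parameter names are illustrative.

```python
def listicle_points(on_clutch_category_page: bool,
                    top5_listicle_slots: int,
                    in_chatgpt_category_answer: bool) -> int:
    """Score Dimension 4 (0-20) from the three manual checks above."""
    if top5_listicle_slots < 0:
        raise ValueError("top5_listicle_slots cannot be negative")
    points = 4 if on_clutch_category_page else 0
    points += min(top5_listicle_slots, 2) * 4  # 4 pts per slot, capped at 8
    points += 8 if in_chatgpt_category_answer else 0
    return points


print(listicle_points(True, 1, False))
```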

Dimension 5 — Source-cluster depth (0–20 pts)

What it measures: How many independent source types reference the brand in the same vertical.

A source cluster is a group of pages on different domains that each confirm the same brand-category-geography combination. AI engines weight citations higher when multiple independent sources agree.

Scoring:

  • 1 source type (own site only) = 0 pts
  • 2 source types (own site + 1 directory) = 4 pts
  • 3 source types (own site + directory + third-party press or comparison) = 8 pts
  • 4 source types = 12 pts
  • 5+ source types = 16–20 pts

Source types that count: own blog, directory profiles, press coverage, comparison listicles, partner case studies (published on partner domain), podcast appearances with show notes, academic or industry report citations.
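Counting distinct source types is easy to get wrong by hand when one domain appears several times. A minimal sketch follows. The type labels are shorthand for the list above. The 5+ band is given only as a 16–20 range, so the rule of 16 points at five types plus 2 per additional type is an assumption.

```python
# Shorthand labels for the source types listed above.
SOURCE_TYPES = {"own_site", "directory", "press", "listicle",
                "partner_case_study", "podcast", "report"}


def cluster_points(references) -> int:
    """Score Dimension 5 (0-20).

    `references` is an iterable of (domain, source_type) tuples. Only the
    number of distinct source types matters, not the number of pages.
    """
    kinds = {source_type for _, source_type in references
             if source_type in SOURCE_TYPES}
    n = len(kinds)
    if n <= 1:
        return 0
    if n <= 4:
        return (n - 1) * 4  # 2 types -> 4, 3 -> 8, 4 -> 12
    # Assumption: 16 at five types, +2 per extra type, capped at 20.
    return min(16 + 2 * (n - 5), 20)


refs = [("example.com", "own_site"), ("clutch.co", "directory"),
        ("goodfirms.co", "directory"), ("news.example.org", "press")]
print(cluster_points(refs))
```

Two directory profiles count once here, which matches the idea that the dimension measures diversity of source types rather than volume.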


How to read your score

Total (0–100) | Interpretation
0–20 | No citation presence. AI engines have no reliable signal to pull from. Start with entity authority (Dimension 3): profiles first, schema second.
21–40 | Present but not positioned. Appears on 1–2 head queries, absent on mid and long-tail. Fix: deepen listicle presence + source-cluster diversity.
41–60 | Positioned in niche. Holds mid-tail citations, inconsistent on head queries. Fix: schema coverage review + entity authority extension beyond primary directory.
61–80 | Strong niche presence. Appears consistently on mid and long-tail, beginning to compete on head. Fix: source-cluster depth + freshness cadence on pillar content.
81–100 | Category authority. Cited across all three query tiers on multiple engines. Maintain: monthly delta checks + freshness signals on high-performing pages.
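Summing the five dimensions and looking up the band can be sketched in a few lines. The band labels are taken from the table above. The dictionary keys and function names are illustrative.

```python
# (upper bound of band, label), from the interpretation table above.
SCORE_BANDS = [
    (20, "No citation presence"),
    (40, "Present but not positioned"),
    (60, "Positioned in niche"),
    (80, "Strong niche presence"),
    (100, "Category authority"),
]


def total_score(dimensions: dict) -> int:
    """Sum five dimension scores, each of which must be within 0-20."""
    if len(dimensions) != 5:
        raise ValueError("expected exactly five dimensions")
    for name, points in dimensions.items():
        if not 0 <= points <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    return sum(dimensions.values())


def interpret(total: int) -> str:
    """Return the band label for a 0-100 total."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    for upper, label in SCORE_BANDS:
        if total <= upper:
            return label
    raise AssertionError("bands cover 0-100, so this line is unreachable")


scores = {"citation_density": 7, "schema_coverage": 10, "entity_authority": 8,
          "listicle_presence": 4, "source_cluster_depth": 8}
print(total_score(scores), interpret(total_score(scores)))
```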

The delta cadence: tracking improvement over time

A single score is a snapshot. The rubric is designed for cadence.

Baseline run: Full 60-pair sweep, all five dimensions scored. Record to a citation log with date.

Delta run (2× per month): Top 12 pairs only (2 queries per tier, on Perplexity + ChatGPT). Note any new citations and whether the source type changed (a move from own-site to directory to third-party listicle represents progression, not just volume).

Index lag: Schema changes and new directory profiles typically take 7–21 days to propagate into AI engine indexes via their live-retrieval bots. Don't measure a schema fix the same week you deploy it. The delta check at day 10 and day 21 post-fix tells you whether the crawl landed.
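Two small helpers cover the cadence: one schedules the day-10 and day-21 checks after a fix, and one lists the pairs that became cited between two runs. This is a sketch under the assumption that each run is stored as a mapping from a (query, engine) pair to a cited flag. The names are illustrative.

```python
from datetime import date, timedelta


def delta_check_dates(deployed: date) -> tuple:
    """Return the day-10 and day-21 check dates for a fix deployed on `deployed`."""
    return deployed + timedelta(days=10), deployed + timedelta(days=21)


def new_citations(previous: dict, current: dict) -> list:
    """Pairs cited in the current run that were not cited in the previous run.

    Both arguments map (query, engine) -> True/False. A pair missing from the
    previous run is treated as not cited.
    """
    return sorted(pair for pair, cited in current.items()
                  if cited and not previous.get(pair, False))


baseline = {("best Shopify agency", "ChatGPT"): False,
            ("best Shopify agency", "Perplexity"): True}
delta = {("best Shopify agency", "ChatGPT"): True,
         ("best Shopify agency", "Perplexity"): True}

print(delta_check_dates(date(2026, 6, 1)))
print(new_citations(baseline, delta))
```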


The most common zero-score pattern

In our audits across 50+ ecommerce projects, the most common root cause of a near-zero score is not technical. It is the absence of entity signals.

The pattern: a brand with a well-built Shopify store, active blog, and good product pages scores at or near 0 because:

  • No directory profiles (Dimension 3 = 0)
  • No third-party listicle presence (Dimension 4 = 0)
  • Schema present but no FAQPage on pillar pages (Dimension 2 partial)
  • Source cluster = own site only (Dimension 5 = 0)

The result: citation density is 0–2 across 60 pairs. The model knows the brand exists (it may appear if prompted by name), but has no independent source cluster to pull from when a buyer asks a category question.

The fix sequence is consistent:

  1. Week 1–2: Submit directory profiles (Clutch, GoodFirms, Sortlist). Each is free at the base tier.
  2. Week 2–3: Publish one pillar blog in your category with FAQPage schema (5+ entries). Ping IndexNow.
  3. Week 3–4: Identify 3–5 comparison listicles in your vertical. Reach out for inclusion or publish your own.
  4. Month 2: Re-run full 60-pair baseline. Expect 5–15 point movement if the changes from steps 1–3 indexed correctly.

Applying this to your store

The rubric is designed to be self-serviceable. You don't need our audit to run it — every step above uses public tools (ChatGPT, Perplexity, Google search, curl).

What you do need is the discipline to run the same 12 queries on the same 5 engines and log the result to the same file, every two weeks.

The brands that move from 0–20 to 41–60 in 90 days are not doing anything exotic. They're executing the same four-step sequence above and running the delta cadence consistently enough to catch when something indexed and when something didn't.

If you want to see what this looks like for your store specifically: drop your URL in the comments on this post or on our LinkedIn company page. The first 10 brands this week get a written AI Visibility Score using this exact rubric, scored by the same method we use internally — no call, no pitch.

