The 12-Page AI Visibility Audit Workbook: How to Self-Score Your Store in 60 Minutes

By leo-nguyen · Jul 4, 2026 · 9 min read

Most AI visibility audits sold to ecommerce brands start with a paid engagement, a discovery call, and a report that lands in an inbox three weeks later. That model works when the score is already known to be low and the buyer wants a plan. It is the wrong tool when the founder just wants to know whether they have a problem worth paying to solve.

The 12-page AI Visibility Audit Workbook is what we run internally at LUMA-E before any client engagement. It is designed to be finished by one person in about 60 minutes, using free-tier tools, and to end with a defensible score, a bottleneck flag, and a 30-day fix sequence. This piece walks through the workbook page by page — what each page is for, what it produces, and where the honest limits are. At the end, we run our own site through the method so you can see the shape of the output before you try it on yours.

#Why we built a self-score workbook

We spent the first month of the LUMA-E rebuild running full audits on our own site. We tested Perplexity, ChatGPT, and Claude across nine category queries. We logged every citation. We found the four-file entity-signal fix that moved our own Perplexity score from 0 clean cites to 2 clean cites in 28 days.

That audit log was useful internally and worth publishing. But it was not a workbook. It was a running journal of one brand's specific fixes. The version we hand founders now is boiled down: same method, half the pages, no site-specific narrative. The 60-minute self-score is the version we would have wanted before we started, and it is the version that lets a founder answer "should I spend on an agency audit" without spending on an agency audit.

#The 12 pages

Page 1 — Cover and scope declaration. Name the brand, the primary category you want to be cited for, the geography that matters, and the three engines you will score against. Locking scope on page 1 stops the audit from drifting into general SEO territory once you find something interesting.

Page 2 — Method summary. One paragraph explaining the 9-query sweep: nine prompts spread across three query types (head, mid-tail, long-tail), each run once on three engines (Perplexity, ChatGPT, Claude). Total: 9 queries per engine, 27 queries overall. Free tiers cover this if you spread the queries across a single 60-minute session.

Page 3 — The 12-query framework detail. Twelve prompt templates covering brand cite ("who is [brand]"), category recommendation ("best brands for X"), comparison ("[brand A] vs [brand B]"), and use-case fit ("brands for [use-case]"). You pick 9 of the 12 based on your category shape — DTC brands lean on category-recommendation, direct-reply leans on brand cite, wholesale leans on comparison. The workbook explains which to pick.
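
A minimal sketch of how the sweep log could be laid out before you start querying, assuming you record results by hand. The four template strings echo the examples above; the placeholder names, the `build_sweep` helper, and the row fields (`cited`, `urls`) are illustrative assumptions, not part of the workbook.

```python
from itertools import product

ENGINES = ["Perplexity", "ChatGPT", "Claude"]

# One example per template family named on page 3 (placeholder names are hypothetical).
TEMPLATE_EXAMPLES = {
    "brand_cite": "who is {brand}",
    "category_recommendation": "best brands for {category}",
    "comparison": "{brand} vs {competitor}",
    "use_case_fit": "brands for {use_case}",
}


def build_sweep(prompts, engines=ENGINES):
    """Pair every chosen prompt with every engine, one round each.

    Each row starts empty; fill in `cited` and `urls` by hand as you run it.
    """
    return [
        {"engine": engine, "prompt": prompt, "cited": None, "urls": []}
        for prompt, engine in product(prompts, engines)
    ]


if __name__ == "__main__":
    example = TEMPLATE_EXAMPLES["comparison"].format(
        brand="Brand A", competitor="Brand B"
    )
    print(example)
    # Nine placeholder prompts stand in for the 9 you pick from the 12 templates.
    rows = build_sweep([f"prompt {i + 1}" for i in range(9)])
    print(len(rows), "rows to fill in")
```

Nine prompts across three engines yields 27 rows, which is the same log page 6 later reads its cited URLs from.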

Page 4 — 5-schema audit. Article, FAQPage, Product, Organization, and BreadcrumbList. For each, a one-line description of what the engine expects, plus the JSON-LD emission-gap check we walk through in the 5 schema types blog. This is the only page that requires reading source HTML, and it is the fastest way to fail if you skip it.
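
The emission-gap check can be roughed out with Python's standard library alone. This is a sketch under stated assumptions, not the workbook's own tooling: it only sees JSON-LD present in the HTML you pass in (schema injected later by client-side JavaScript will not show up), it looks at top-level nodes and `@graph` members only, and the helper names are hypothetical.

```python
import json
from html.parser import HTMLParser

EXPECTED_TYPES = {"Article", "FAQPage", "Product", "Organization", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def _top_level_nodes(doc):
    """Yield top-level JSON-LD nodes, including members of an @graph array."""
    if isinstance(doc, list):
        for item in doc:
            yield from _top_level_nodes(item)
    elif isinstance(doc, dict):
        yield doc
        yield from _top_level_nodes(doc.get("@graph", []))


def emitted_types(html):
    """Return the set of @type values found in parseable JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a block that does not parse is treated as not emitted
        for node in _top_level_nodes(doc):
            node_type = node.get("@type")
            if isinstance(node_type, str):
                found.add(node_type)
            elif isinstance(node_type, list):
                found.update(t for t in node_type if isinstance(t, str))
    return found


def missing_types(html):
    """Return which of the five audited schema types the page does not emit."""
    return EXPECTED_TYPES - emitted_types(html)
```

Run `missing_types` against the saved source of each page template (home, category, product, blog post) rather than one page, since not every schema type belongs on every template. Presence of a type says nothing about whether its fields are correct, so the manual read of each block still matters.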

Page 5 — Entity authority self-check. A checklist reconciling Organization JSON-LD, llms-full.txt (or llms.txt if that is your version), footer trustLine copy, and directory profile bios. All four should agree on headquarters city, founding year, portfolio scale claim, and category positioning. This is the page where most of our audits find the actual bottleneck.
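
One way to make the reconciliation concrete is to type the four claims from each surface into a small table and let a script flag the fields that disagree. The sketch below assumes you transcribe the values by hand; the field names, surface names, and sample values are hypothetical placeholders.

```python
from collections import defaultdict

FIELDS = (
    "headquarters_city",
    "founding_year",
    "portfolio_scale",
    "category_positioning",
)


def find_disagreements(surfaces):
    """Return {field: {claimed value: [surfaces making that claim]}}.

    Only fields where the surfaces disagree are included. A surface that
    omits a field is grouped under None, which counts as a disagreement.
    """
    conflicts = {}
    for field in FIELDS:
        seen = defaultdict(list)
        for surface_name, claims in surfaces.items():
            value = claims.get(field)
            # Ignore case and stray whitespace so "HCMC " and "hcmc" match.
            key = value.strip().casefold() if isinstance(value, str) else value
            seen[key].append(surface_name)
        if len(seen) > 1:
            conflicts[field] = dict(seen)
    return conflicts


if __name__ == "__main__":
    # Placeholder values for an imaginary brand.
    surfaces = {
        "organization_jsonld": {"headquarters_city": "City A", "founding_year": "2020"},
        "llms_full_txt": {"headquarters_city": "City A", "founding_year": "2020"},
        "footer_trustline": {"headquarters_city": "City A", "founding_year": "2020"},
        "directory_bio": {"headquarters_city": "City B", "founding_year": "2020"},
    }
    for field, claims in find_disagreements(surfaces).items():
        print(field, claims)
```

Exact string matching is deliberately strict here: "50+ projects" and "over 50 projects" will be flagged as a conflict, which is arguably the right default when the goal is identical wording across surfaces.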

Page 6 — Listicle sweep. Copy every URL cited across your 27-query sweep, sort by domain, and identify the 6–8 domains that repeat. Those are your category's citation surface. If your brand is not mentioned on any of them, that is a placement problem, not a content problem. We covered why this happens for direct-reply and wholesale brands in the listicle gap piece.
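
The sort-by-domain step is easy to script once the cited URLs are pasted into a list. A minimal sketch, assuming full URLs with a scheme; the `repeated_domains` name and the `min_count` threshold of 2 are illustrative choices, not workbook rules.

```python
from collections import Counter
from urllib.parse import urlparse


def repeated_domains(cited_urls, min_count=2):
    """Tally cited URLs by domain and return those seen at least min_count times.

    Results are ordered from most to least cited. URLs without a scheme
    (for example "example.com/page") have no parseable hostname and are skipped.
    """
    domains = []
    for url in cited_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    counts = Counter(domains)
    return [(domain, n) for domain, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    urls = [
        "https://www.example-roundup.com/best-brands",
        "https://example-roundup.com/gift-guide",
        "https://example-directory.com/listing/1",
        "https://example-roundup.com/top-10",
        "https://example-directory.com/listing/2",
        "https://one-off-blog.com/post",
    ]
    for domain, n in repeated_domains(urls):
        print(f"{domain}: {n}")
```

The repeating domains at the top of that output are the candidate citation surface; the single-mention domains are usually noise for this purpose.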

Page 7 — Source-cluster mapping. Group the domains from page 6 into three clusters: editorial roundups (best-of lists, gift guides), directories (industry-specific listings, agency comparison sites), and comparison content (head-to-head reviews). Each cluster needs a different placement strategy, and the workbook has a one-page template for each.

Page 8 — Scoring rubric. Score five sub-scores on 0–20: cite presence (does any engine return your brand at all), entity alignment (do the four signal surfaces agree), schema completeness (are the 5 schemas emitted correctly), listicle coverage (are you on the top 3 category-surface domains), and multi-engine parity (do all three engines return your brand). Sum for total score 0–100. Anchor points are in the FAQ below.
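
The arithmetic is simple enough to do by hand, but a small sketch shows how the total and its band fit together. The band labels come from the anchor points in the FAQ; how to treat a score that lands exactly on a boundary (20, 40, 60) is an assumption here, since the published ranges overlap at their edges. The example values are the Day 25 LUMA-E breakdown reported later in this piece.

```python
SUB_SCORES = (
    "cite_presence",
    "entity_alignment",
    "schema_completeness",
    "listicle_coverage",
    "multi_engine_parity",
)


def total_score(scores):
    """Sum the five 0-20 sub-scores into a 0-100 total."""
    missing = [name for name in SUB_SCORES if name not in scores]
    if missing:
        raise ValueError(f"missing sub-scores: {missing}")
    for name in SUB_SCORES:
        if not 0 <= scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    return sum(scores[name] for name in SUB_SCORES)


def band(total):
    """Map a total to its anchor band.

    Assumption: a boundary value belongs to the higher band below 80
    (40 reads as "legible"), and exactly 80 still reads as "defensible".
    """
    if total < 20:
        return "invisible"
    if total < 40:
        return "fragile"
    if total < 60:
        return "legible"
    if total <= 80:
        return "defensible"
    return "above 80"


if __name__ == "__main__":
    day_25 = {
        "cite_presence": 2,
        "entity_alignment": 0,
        "schema_completeness": 6,
        "listicle_coverage": 4,
        "multi_engine_parity": 0,
    }
    total = total_score(day_25)
    print(total, band(total))  # 12 invisible
```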

Page 9 — 30-day fix sequence. Bottleneck-specific: if entity alignment scores under 10, run the four-file fix first. If schema completeness is under 10, emit the missing schemas. If listicle coverage is under 10, prioritise the top 3 category-surface domains from page 6. The sequence is week-by-week so you do not try to fix all five sub-scores at once.
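
The "one bottleneck first" rule can be sketched as a short lookup. The threshold of 10 and the three actions come from the paragraph above; treating the order in which they are listed as a priority order is an assumption, though it matches how the LUMA-E walkthrough later in this piece picked entity alignment when three sub-scores were under 10.

```python
THRESHOLD = 10

# Checked in this order; the ordering itself is an assumption (see lead-in).
FIX_ORDER = [
    ("entity_alignment", "run the four-file fix"),
    ("schema_completeness", "emit the missing schemas"),
    ("listicle_coverage", "prioritise the top 3 category-surface domains from page 6"),
]


def first_fix(scores):
    """Return (sub_score, action) for the first sub-score under the threshold.

    Returns None when none of the three sub-scores is under the threshold.
    """
    for name, action in FIX_ORDER:
        if scores[name] < THRESHOLD:
            return name, action
    return None


if __name__ == "__main__":
    day_25 = {
        "cite_presence": 2,
        "entity_alignment": 0,
        "schema_completeness": 6,
        "listicle_coverage": 4,
        "multi_engine_parity": 0,
    }
    print(first_fix(day_25))
```

Returning a single fix rather than a list is the point of the design: it mirrors the workbook's advice not to work on all five sub-scores at once.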

Page 10 — 60-day validation loop. Re-run the 9-query sweep 30 days after the fix cycle closes, log deltas per sub-score, and decide whether to run a second fix cycle. Most brands need 60 days before the first real score movement shows up in retrieval, longer for training-data effects.
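
Logging deltas is a subtraction per sub-score; the sketch below only compares keys present in both score sheets, so a partial re-score still works. The `score_deltas` helper is hypothetical, and the example uses only the LUMA-E figures this piece actually reports (cite presence, entity alignment, and total).

```python
def score_deltas(before, after):
    """Return after-minus-before for every score recorded in both sheets."""
    return {name: after[name] - before[name] for name in before if name in after}


def unmoved(deltas):
    """List the scores that did not improve, as candidates for the next cycle."""
    return sorted(name for name, change in deltas.items() if change <= 0)


if __name__ == "__main__":
    day_25 = {"cite_presence": 2, "entity_alignment": 0, "total": 12}
    day_55 = {"cite_presence": 10, "entity_alignment": 16, "total": 42}
    deltas = score_deltas(day_25, day_55)
    for name, change in deltas.items():
        print(f"{name}: {change:+d}")
    print("unmoved:", unmoved(deltas))
```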

Page 11 — Common self-audit failure modes. The four ways this audit fails when a founder runs it alone: (1) scoping page 1 too broadly, so the fix sequence has no shape, (2) using paid-tier queries and burning session limits before finishing the sweep, (3) skipping page 4 schema audit because it feels technical, (4) reading page 9 as a recipe instead of picking the single bottleneck fix first.

Page 12 — Back cover. Sources, next-step decision tree (self-fix vs paid engagement), and a link back to the reproducible-method blog for readers who want the full behind-the-scenes.

#What LUMA-E's own self-score looked like at day 25

We ran this exact workbook on luma-e.com at Day 25 of the rebuild, before any fix cycle. Page 1 scope: "AI Visibility consultancy for ecommerce", HCMC-based, English-primary. Page 8 total came in at 12 out of 100. The breakdown told the whole story:

  • Cite presence: 2/20. One engine (Perplexity) returned a LUMA-E cite, but the answer described "Malta-based" and "200+ projects", both wrong. ChatGPT and Claude returned nothing.
  • Entity alignment: 0/20. Organization JSON-LD said HCMC. llms-full.txt still had the legacy "200+ stores over 10 years" copy. The trustLine copy in the i18n messages said "Solo+AI". Directory bios were mid-migration and inconsistent. Four surfaces, four different claims.
  • Schema completeness: 6/20. Article and FAQPage were emitting correctly on new blog templates, but Organization schema had a wrong address field and Product schema was not emitting at all on category pages.
  • Listicle coverage: 4/20. LUMA-E appeared on one directory (GoodFirms) out of the top 6 domains cited across our category queries. Sortlist and Clutch profiles existed but were mid-fix.
  • Multi-engine parity: 0/20. One engine cited, two returned nothing.

Total: 12/100. That is invisible, per our own anchor table.

The page 9 fix sequence picked "entity alignment" as the single bottleneck. Four files changed over one afternoon: lib/schema-org.tsx (address to HCMC), public/llms-full.txt ("50+ projects" and "AI-first"), and both locales of messages/[en|vi].json (trustLine copy). Total diff: 43 lines. No new content.

At the Day 55 re-score, cite presence had moved from 2 to 10 (Perplexity now returns a clean cite, and ChatGPT returns a brand mention on the brand-cite query), entity alignment from 0 to 16, and total score from 12 to 42. That is legible, not defensible yet, but the delta came entirely from signal alignment, not from the seven blogs shipped in the same window.

The point of that walkthrough is not the score itself. The point is that we would not have known which of the five sub-scores to fix without running page 8. Every founder we have handed the workbook to has flagged a different bottleneck once they scored honestly. The workbook does not tell you what to do; it tells you what your bottleneck is, and page 9 tells you what to do about that one.

#When self-scoring is not enough

Two cases where a self-audit will underread the problem.

First, if your category is highly consolidated and the top 3 listicle domains all charge for placement, the page 9 fix sequence will identify the surface but not the path onto it. That is a paid distribution question, not a workbook question.

Second, if you sell into a language market the engines do not cite fluently — this hits SEA-focused brands especially — the 9-query sweep may return generic answers that do not mention any regional brand. In that case, the workbook needs to run twice, once in English, once in the target language, and the score reconciled. The V2 workbook handles this; the current 12-page version does not.

For everything else — DTC in English-primary markets, direct-reply brands with a category, wholesale brands with a comparable-set — 60 minutes is enough to know where you sit.

#Sources

The downloadable 12-page PDF ships when our newsletter form goes live. Until then, the method above is complete on its own — run pages 1 through 12 in order, score honestly, and pick the single bottleneck fix. That is the same output the paid version produces, minus the branded cover.

#Frequently asked
What score counts as "good" AI visibility for an ecommerce brand?
There is no calibrated industry benchmark yet, so we use an internal 0–100 scale with anchor points. Under 20 is invisible (no cite on head, mid-tail, or long-tail category queries across three engines). 20–40 is fragile (one engine cites, others do not, and entity signals disagree). 40–60 is legible (at least two engines return your name, one with a clean brand-URL cite). 60–80 is defensible (all three engines return your name, at least one cites a category-recommendation query). Above 80 requires listicle placement plus multi-engine parity plus stable entity signals — that is a full-quarter target, not a first-audit target.
Why 60 minutes instead of a full agency audit?
A first-pass self-score is enough to tell you which of the five audit dimensions is your bottleneck. Full agency audits add competitor category maps, retrieval-index heat maps, and multi-language parity checks that a founder usually does not need before their first fix cycle. Start with 60 minutes, run one 30-day fix, re-score, and only escalate to a paid audit if the self-score plateaus below 40 after two fix cycles.
Do I need Perplexity Pro or ChatGPT Plus to run this?
No. The 9-query sweep works on free tiers of Perplexity, ChatGPT, and Claude — the constraint is total query volume per day per engine, which the workbook is scoped to fit inside. Pro tiers help if you want engine-diverse retrieval or need to bypass throttling on longer sessions, but they are not required for a first score.
Can I skip the 9-query sweep and just check my Organization schema?
You can, but the workbook exists because schema-only audits mislead. We have seen brands with technically correct Organization JSON-LD score under 20 on visibility, because the schema was correct in isolation while llms-full.txt, directory profiles, and legacy trustLine copy all pointed at different entities. The 9-query sweep is the only step that surfaces that disagreement — you do not see it by reading your own schema.
What is the fastest single fix if my self-score comes in under 30?
Align the four entity-signal surfaces on identical positioning, headquarters city, and portfolio scale claim. Those four are your Organization JSON-LD, your llms.txt or llms-full.txt file, your trustLine or footer copy across every locale, and your top two industry directory profiles. Our own audit log shows this one alignment step moved LUMA-E from 0 clean cites to 2 clean cites in Perplexity within 28 days, driven entirely by the four-file fix — no new content shipped in that window.
When should I re-score, and how often?
Re-score 30 days after the fix cycle closes, and again at 90 days. AI retrieval indexes update on a longer cadence than search indexes, so a re-score at day 7 or day 14 will underread lift. If nothing moves at 30 days, run the schema audit and listicle sweep first before touching content — the bottleneck is almost always signal alignment or third-party surface, not net-new pages.