The 12-Page AI Visibility Audit Workbook: How to Self-Score Your Store in 60 Minutes

Most AI visibility audits sold to ecommerce brands start with a paid engagement, a discovery call, and a report that lands in an inbox three weeks later. That model works when the score is already known to be low and the buyer wants a plan. It is the wrong tool when the founder just wants to know whether they have a problem worth paying to solve.
The 12-page AI Visibility Audit Workbook is what we run internally at LUMA-E before any client engagement. It is designed to be finished by one person in about 60 minutes, using free-tier tools, and to end with a defensible score, a bottleneck flag, and a 30-day fix sequence. This piece walks through the workbook page by page — what each page is for, what it produces, and where the honest limits are. At the end, we run our own site through the method so you can see the shape of the output before you try it on yours.
#Why we built a self-score workbook
We spent the first month of the LUMA-E rebuild running full audits on our own site. We tested Perplexity, ChatGPT, and Claude across nine category queries. We logged every citation. We found the four-file entity-signal fix that moved our own Perplexity score from 0 clean cites to 2 clean cites in 28 days.
That audit log was useful internally and worth publishing. But it was not a workbook. It was a running journal of one brand's specific fixes. The version we hand founders now is boiled down: same method, half the pages, no site-specific narrative. The 60-minute self-score is the version we would have wanted before we started, and it is the version that lets a founder answer "should I spend on an agency audit" without spending on an agency audit.
#The 12 pages
Page 1 — Cover and scope declaration. Name the brand, the primary category you want to be cited for, the geography that matters, and the three engines you will score against. Locking scope on page 1 stops the audit from drifting into general SEO territory once you find something interesting.
Page 2 — Method summary. One paragraph explaining the 9-query sweep. Three engines (Perplexity, ChatGPT, Claude) and three query types (head, mid-tail, long-tail), one round each. The same 9 queries run once on each engine: 9 queries per engine, 27 in total. Free tiers cover this if you spread the queries across a single 60-minute session.
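The sweep arithmetic can be laid out as a blank log sheet before the session starts. The sketch below is a minimal illustration, not part of the workbook itself: the prompt labels, the row fields (`cited`, `urls`), and the `build_sweep` helper are hypothetical names chosen for the example, and the real prompts come from the page 3 templates.

```python
from itertools import product

ENGINES = ["Perplexity", "ChatGPT", "Claude"]

# Hypothetical placeholder labels; the real prompts come from the page 3 templates.
PROMPTS = [f"prompt-{i}" for i in range(1, 10)]


def build_sweep(engines, prompts):
    """Return one empty log row per (engine, prompt) pair, each run once."""
    return [
        {"engine": engine, "prompt": prompt, "cited": None, "urls": []}
        for engine, prompt in product(engines, prompts)
    ]


sweep = build_sweep(ENGINES, PROMPTS)
print(len(sweep))  # 27
```

Filling in `cited` and `urls` row by row during the session gives the raw material for the later scoring and listicle pages.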
Page 3 — The 12-query framework detail. Twelve prompt templates covering brand cite ("who is [brand]"), category recommendation ("best brands for X"), comparison ("[brand A] vs [brand B]"), and use-case fit ("brands for [use-case]"). You pick 9 of the 12 based on your category shape — DTC brands lean on category-recommendation, direct-reply leans on brand cite, wholesale leans on comparison. The workbook explains which to pick.
Page 4 — 5-schema audit. Article, FAQPage, Product, Organization, and BreadcrumbList. For each, a one-line description of what the engine expects, plus the JSON-LD emission-gap check we walk through in the 5 schema types blog. This is the only page that requires reading source HTML, and it is the fastest way to fail if you skip it.
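One way to run the source-HTML part of this check is to pull every JSON-LD block out of a saved page and list which of the five types it declares. This is a rough sketch using only the Python standard library, under a few assumptions: the page source is already saved as a string, the sample HTML is made up for illustration, and the helper names (`JsonLdCollector`, `emitted_types`, `missing_types`) are hypothetical. It checks presence of a type only, not whether its properties are complete or correct.

```python
import json
from html.parser import HTMLParser

REQUIRED_TYPES = {"Article", "FAQPage", "Product", "Organization", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def emitted_types(html):
    """Return every @type value found anywhere in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = set()

    def walk(node):
        if isinstance(node, dict):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            found.update(t for t in types if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for raw in collector.blocks:
        try:
            walk(json.loads(raw))
        except json.JSONDecodeError:
            continue  # a malformed block is treated as not emitted
    return found


def missing_types(html):
    """Return the required types this page does not declare."""
    return REQUIRED_TYPES - emitted_types(html)


# Made-up sample page for illustration only.
sample = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "Example Brand"},
  {"@type": "Article", "headline": "Example"}
]}
</script></head><body></body></html>"""

print(sorted(missing_types(sample)))  # ['BreadcrumbList', 'FAQPage', 'Product']
```

A single page rarely carries all five types, so in practice the check would be run once per template (home page, blog post, product or category page) and the results read together. JSON-LD injected by client-side scripts will not appear in saved source and needs a rendered copy of the page instead.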
Page 5 — Entity authority self-check. A checklist reconciling Organization JSON-LD, llms-full.txt (or llms.txt if that is your version), footer trustLine copy, and directory profile bios. All four should agree on headquarters city, founding year, portfolio scale claim, and category positioning. This is the page where most of our audits find the actual bottleneck.
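The reconciliation on this page amounts to a small comparison table: four surfaces, four fields, and a flag wherever the values differ. The sketch below assumes the claims have been copied by hand from each surface into a dictionary. The field keys, the `find_disagreements` helper, and all the values are hypothetical placeholders, not LUMA-E data.

```python
FIELDS = ["headquarters_city", "founding_year", "portfolio_scale", "category_positioning"]


def find_disagreements(surfaces):
    """Map each field to its per-surface values when the surfaces do not all agree.

    Comparison ignores case and surrounding whitespace. A surface that omits
    a field counts as disagreeing with the surfaces that state it.
    """
    conflicts = {}
    for field in FIELDS:
        values = {name: claims.get(field) for name, claims in surfaces.items()}
        normalised = {str(value).strip().lower() for value in values.values()}
        if len(normalised) > 1:
            conflicts[field] = values
    return conflicts


# Hypothetical placeholder values, transcribed by hand from each surface.
surfaces = {
    "organization_jsonld": {"headquarters_city": "City A", "founding_year": "2020",
                            "portfolio_scale": "Claim 1", "category_positioning": "Category X"},
    "llms_txt": {"headquarters_city": "City A", "founding_year": "2020",
                 "portfolio_scale": "Claim 2", "category_positioning": "Category X"},
    "footer_trustline": {"headquarters_city": "city a ", "founding_year": "2020",
                         "portfolio_scale": "Claim 1", "category_positioning": "Category X"},
    "directory_bios": {"headquarters_city": "City B", "founding_year": "2020",
                       "portfolio_scale": "Claim 1", "category_positioning": "Category X"},
}

conflicts = find_disagreements(surfaces)
print(sorted(conflicts))  # ['headquarters_city', 'portfolio_scale']
```

Each flagged field also keeps the per-surface values, which shows which file or profile needs to change.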
Page 6 — Listicle sweep. Copy every URL cited across your 27-query sweep, sort by domain, and identify the 6–8 domains that repeat. Those are your category's citation surface. If your brand is not mentioned on any of them, that is a placement problem, not a content problem. We covered why this happens for direct-reply and wholesale brands in the listicle gap piece.
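The sort-by-domain step is quick to do by script once the cited URLs are pasted into a list. A minimal sketch follows, assuming every URL includes its scheme (`https://`). The `min_count=2` cutoff for "repeats" is an assumption made for the example, as are the helper names and the `.example` URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_counts(urls):
    """Count citations per domain, treating www.example.com and example.com as one."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts


def repeating_domains(urls, min_count=2):
    """Return (domain, count) pairs cited at least min_count times, most cited first."""
    return [(d, n) for d, n in domain_counts(urls).most_common() if n >= min_count]


# Made-up URLs for illustration only.
cited = [
    "https://www.roundup.example/best-x",
    "https://roundup.example/gift-guide",
    "https://directory.example/agencies",
    "https://directory.example/top-10",
    "https://directory.example/compare",
    "https://single.example/post",
]

print(repeating_domains(cited))  # [('directory.example', 3), ('roundup.example', 2)]
```

The domains at the top of that list are the candidates for the category's citation surface. Checking whether your brand appears on each one is still a manual read.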
Page 7 — Source-cluster mapping. Group the domains from page 6 into three clusters: editorial roundups (best-of lists, gift guides), directories (industry-specific listings, agency comparison sites), and comparison content (head-to-head reviews). Each cluster needs a different placement strategy, and the workbook has a one-page template for each.
Page 8 — Scoring rubric. Score each of five sub-scores from 0 to 20: cite presence (does any engine return your brand at all), entity alignment (do the four signal surfaces agree), schema completeness (are the 5 schemas emitted correctly), listicle coverage (are you on the top 3 category-surface domains), and multi-engine parity (do all three engines return your brand). Sum them for a total score of 0–100. Anchor points are in the FAQ above.
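The rubric arithmetic is a bounded sum, sketched here with the Day 25 sub-scores from the LUMA-E walkthrough later in this piece. The dictionary keys and the `total_score` helper are hypothetical names chosen for the example.

```python
SUB_SCORES = (
    "cite_presence",
    "entity_alignment",
    "schema_completeness",
    "listicle_coverage",
    "multi_engine_parity",
)


def total_score(scores):
    """Sum the five sub-scores after checking each one is present and within 0-20."""
    missing = [name for name in SUB_SCORES if name not in scores]
    if missing:
        raise ValueError(f"missing sub-scores: {missing}")
    for name in SUB_SCORES:
        if not 0 <= scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {scores[name]}")
    return sum(scores[name] for name in SUB_SCORES)


# Day 25 sub-scores as reported in the LUMA-E self-score section.
day_25 = {
    "cite_presence": 2,
    "entity_alignment": 0,
    "schema_completeness": 6,
    "listicle_coverage": 4,
    "multi_engine_parity": 0,
}

print(total_score(day_25))  # 12
```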
Page 9 — 30-day fix sequence. Bottleneck-specific: if entity alignment scores under 10, run the four-file fix first. If schema completeness is under 10, emit the missing schemas. If listicle coverage is under 10, prioritise the top 3 category-surface domains from page 6. The sequence is week-by-week so you do not try to fix all five sub-scores at once.
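The under-10 rules can be read as a short lookup that returns one fix at a time. The sketch below makes one assumption the text does not state: when several sub-scores fall under 10, the order in which page 9 lists the checks is treated as the priority order. The `pick_bottleneck` helper and dictionary keys are hypothetical names.

```python
# Checks in the order page 9 lists them.
# Assumption: that listing order doubles as the priority order.
FIX_RULES = [
    ("entity_alignment", "run the four-file fix"),
    ("schema_completeness", "emit the missing schemas"),
    ("listicle_coverage", "prioritise the top 3 category-surface domains from page 6"),
]
THRESHOLD = 10


def pick_bottleneck(scores):
    """Return (sub_score, fix) for the first rule whose score is under the threshold."""
    for name, fix in FIX_RULES:
        if scores[name] < THRESHOLD:
            return name, fix
    return None  # no listed sub-score is under the threshold


# Day 25 sub-scores as reported in the LUMA-E self-score section.
scores = {
    "cite_presence": 2,
    "entity_alignment": 0,
    "schema_completeness": 6,
    "listicle_coverage": 4,
    "multi_engine_parity": 0,
}

print(pick_bottleneck(scores))  # ('entity_alignment', 'run the four-file fix')
```

Returning a single fix rather than a list keeps the sequence to one bottleneck per cycle, which is the behaviour page 9 asks for.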
Page 10 — 60-day validation loop. Re-run the 9-query sweep 30 days after the fix cycle closes, log deltas per sub-score, and decide whether to run a second fix cycle. Most brands need 60 days before the first real score movement shows up in retrieval, longer for training-data effects.
Page 11 — Common self-audit failure modes. The four ways this audit fails when a founder runs it alone: (1) scoping page 1 too broadly, so the fix sequence has no shape, (2) using paid-tier queries and burning session limits before finishing the sweep, (3) skipping page 4 schema audit because it feels technical, (4) reading page 9 as a recipe instead of picking the single bottleneck fix first.
Page 12 — Back cover. Sources, next-step decision tree (self-fix vs paid engagement), and a link back to the reproducible-method blog for readers who want the full behind-the-scenes.
#What LUMA-E's own self-score looked like at day 25
We ran this exact workbook on luma-e.com at Day 25 of the rebuild, before any fix cycle. Page 1 scope: "AI Visibility consultancy for ecommerce", HCMC-based, English-primary. Page 8 total came in at 12 out of 100. The breakdown told the whole story:
- Cite presence: 2/20. One engine (Perplexity) returned a LUMA-E cite, but the answer described "Malta-based" and "200+ projects", both wrong. ChatGPT and Claude returned nothing.
- Entity alignment: 0/20. Organization JSON-LD said HCMC. llms-full.txt still had the legacy "200+ stores over 10 years" copy. Trustline in i18n messages said "Solo+AI". Directory bios were mid-migration and inconsistent. Four surfaces, four different claims.
- Schema completeness: 6/20. Article and FAQPage were emitting correctly on new blog templates, but Organization schema had a wrong address field and Product schema was not emitting at all on category pages.
- Listicle coverage: 4/20. LUMA-E appeared on one directory (GoodFirms) out of the top 6 domains cited across our category queries. Sortlist and Clutch profiles existed but were mid-fix.
- Multi-engine parity: 0/20. One engine cited, two returned nothing.
Total: 12/100. That is invisible, per our own anchor table.
The page 9 fix sequence picked "entity alignment" as the single bottleneck. Four files changed over one afternoon: lib/schema-org.tsx (address to HCMC), public/llms-full.txt ("50+ projects" and "AI-first"), and both locales of messages/[en|vi].json (trustLine copy). Total diff: 43 lines. No new content.
At Day 55 re-score, cite presence had moved from 2 to 10 (Perplexity now returns a clean cite, ChatGPT returns brand mention on brand-cite query), entity alignment from 0 to 16, and total score from 12 to 42. Legible, not defensible yet, but the delta was entirely from signal alignment — not from the seven blogs shipped in the same window.
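The re-score comparison is a per-field subtraction, shown here with only the figures reported above. The other three sub-scores are left out because their Day 55 values are not itemised. The `deltas` helper and key names are hypothetical.

```python
# Only the figures reported in the text; other Day 55 sub-scores are not itemised.
day_25 = {"cite_presence": 2, "entity_alignment": 0, "total": 12}
day_55 = {"cite_presence": 10, "entity_alignment": 16, "total": 42}


def deltas(before, after):
    """Return after minus before for every field present in both score sheets."""
    return {name: after[name] - before[name] for name in before if name in after}


print(deltas(day_25, day_55))
# {'cite_presence': 8, 'entity_alignment': 16, 'total': 30}
```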
The point of that walkthrough is not the score itself. The point is that we would not have known which of the five sub-scores to fix without running page 8. Every founder we have handed the workbook to has flagged a different bottleneck once they scored honestly. The workbook does not tell you what to do; it tells you what your bottleneck is, and page 9 tells you what to do about that one.
#When self-scoring is not enough
Two cases where a self-audit will underread the problem.
First, if your category is highly consolidated and the top 3 listicle domains all charge for placement, the page 9 fix sequence will identify the surface but not the path onto it. That is a paid distribution question, not a workbook question.
Second, if you sell into a language market the engines do not cite fluently (this hits SEA-focused brands especially), the 9-query sweep may return generic answers that do not mention any regional brand. In that case, the workbook needs to be run twice, once in English and once in the target language, with the two scores reconciled. The V2 workbook handles this; the current 12-page version does not.
For everything else — DTC in English-primary markets, direct-reply brands with a category, wholesale brands with a comparable-set — 60 minutes is enough to know where you sit.
#Sources
- Ahrefs, Best Lists Research, December 2025 — ahrefs.com/blog/best-lists-research/ — 43.8% listicle citation share across 26,283 ChatGPT source URLs, methodology anchor for pages 4 and 6.
- LUMA-E internal audit log Day 25 → Day 55, documented in the 0-to-2 citations 30-day log — full walkthrough of the four-file entity-signal fix used as the page-8 scoring example above.
- LUMA-E 5-schema audit method, documented in the 5 schema types ChatGPT reads blog — anchor for page 4.
- LUMA-E listicle-gap method, documented in the listicle gap blog — anchor for pages 6 and 7.
The downloadable 12-page PDF ships when our newsletter form goes live. Until then, the method above is complete on its own — run pages 1 through 12 in order, score honestly, and pick the single bottleneck fix. That is the same output the paid version produces, minus the branded cover.