Structured Data and Entity Authority: The 200-Word Rule for AI Citations

By Leo Nguyen · Jun 9, 2026 · 6 min read

Short answer

Three structural choices decide whether AI engines cite you or just rank you on Google: answer-first top 200 words, FAQPage schema emitted as JSON-LD, and a named author with sameAs links. If any one of these is missing, AI citation rate drops materially — even when SEO traffic looks healthy. This post walks through each, with the diagnostic checks I run on client stores at LUMA-E.

Quick diagnosis

  • Open your page in view-source. Search for "@type":"FAQPage" (or "@type": "FAQPage" with a space after the colon, if the JSON-LD is pretty-printed). If absent, your FAQs aren't reachable as schema. That's gap one.
  • Read your first 200 words aloud. If the direct answer to the query isn't in sentence one, that's gap two.
  • Search the source for sameAs. If your author block has no LinkedIn or YouTube link, that's gap three.

Three gaps. Three fixes. Each takes under an hour.

Why ranking and citation diverged in 2026

Per the Tinuiti Q1 2026 AI Citations Trends Report, Reddit citation share peaked above 9% in January 2026: AI engines now weight third-party platforms more heavily than they did 12 months ago. Per SEMrush's September 2025 Mention-Source Divide study, 61.7% of AI citations are "ghost" citations: the engine cites the domain but never mentions the brand name in the answer.

What that combination means in practice: your content can rank well on Google, get crawled by every major AI engine, and still produce zero brand recall for the reader. The engine pulls your facts, omits your name, and the user attributes the answer to ChatGPT or Perplexity. The traffic doesn't convert because the brand never registered.
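A rough sketch of how the ghost-citation idea could be checked against your own logged AI answers. This is a simplification of the definition above, not SEMrush's methodology: the function names are hypothetical, the brand match is a plain case-insensitive substring test, and each answer is assumed to arrive as its text plus the list of URLs it cited.

```python
from urllib.parse import urlparse


def cites_domain(cited_urls, domain):
    """True if any cited URL is on the domain or one of its subdomains."""
    domain = domain.lower()
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def classify_answer(answer_text, cited_urls, brand, domain):
    """Label one AI answer as "none", "ghost", or "named" for a brand."""
    if not cites_domain(cited_urls, domain):
        return "none"  # The domain was not cited at all.
    mentioned = brand.lower() in answer_text.lower()
    return "named" if mentioned else "ghost"


def ghost_share(labels):
    """Fraction of citing answers that never name the brand."""
    cited = [label for label in labels if label != "none"]
    if not cited:
        return 0.0
    return cited.count("ghost") / len(cited)
```

Note that a substring match will also count an answer that only spells out the domain name in its text, so treat the resulting share as a lower bound on ghost citations.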

The fix isn't more content. It's the structural choices that turn ranked pages into cited entities.

Gap 1: Answer-first top 200 words

AI engines extract self-contained passages that directly answer a query, typically 134-167 words each per Frase's 2026 GEO research. If your top 200 words are a warm-up intro, the engine has nothing clean to lift. It either pulls something further down (less likely) or skips your page (more likely).

The pattern that works:

  1. Sentences 1-2: Direct answer. Lead with the entity, not the topic. Not "When evaluating Shopify Plus and Magento 2 for B2B…" but "Shopify Plus wins for fast launch under 50k SKUs. Magento 2 wins for 200k+ SKUs with deep ERP."
  2. Sentences 3-4: The driver. What's the one variable that decides the answer? Catalog size, budget, timeline: name it.
  3. 3-5 bullets: Quick decision matrix. Each bullet is a one-line scenario plus the answer. Self-contained.

Total: 150-200 words. That's the citation sweet spot. Everything below this block can be 4,000 words of depth — AI engines won't extract from there unless the top block fails.
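The length target is easy to check mechanically. The sketch below is a minimal Python helper, assuming you have already pulled the top block out of the page as plain text. The function names are hypothetical, the thresholds are the 150-200 word target from this section, and whether sentence one actually answers the query still needs a human read.

```python
import re

# Thresholds taken from the 150-200 word target described in this section.
MIN_WORDS = 150
MAX_WORDS = 200


def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def first_sentence(text):
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def check_answer_block(block):
    """Report word count, range fit, and the opening sentence for review."""
    words = count_words(block)
    return {
        "words": words,
        "in_range": MIN_WORDS <= words <= MAX_WORDS,
        "first_sentence": first_sentence(block),
    }
```

Printing first_sentence next to the target query makes the "does sentence one answer it" review quick to do by eye.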

I use this pattern on every pillar at luma-e.com. The two posts I'm restructuring this week — the Shopify vs Magento comparison and the AI search playbook — both had perfectly good intros that read like introductions. They're being rewritten to read like answers.

Gap 2: FAQPage schema emitted as JSON-LD

Frontmatter FAQs in your CMS aren't enough. Visual FAQ accordions aren't enough. The engine needs to see "@type":"FAQPage" in your page source as actual JSON-LD.

The diagnostic:

curl -s https://yoursite.com/blog/your-post | grep -oE '"@type"[[:space:]]*:[[:space:]]*"FAQPage"'

If empty, you have an emission gap. Common causes:

  • Your CMS stores FAQs as a content type but the page template doesn't render them into structured data.
  • Your headless frontend renders FAQs as React components but never emits a separate <script type="application/ld+json"> tag with FAQPage schema.
  • You're using a generic SEO plugin that handles Article but not FAQPage.
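A string search can be brittle when the JSON-LD is pretty-printed, wrapped in an @graph array, or declares several types at once. A more tolerant check, sketched here with only the Python standard library, parses every application/ld+json script and looks for the type anywhere in the structure. The names JsonLdCollector, schema_types, and has_schema_type are hypothetical, and the page HTML is assumed to be fetched already.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(node):
    """Yield every @type value found anywhere in parsed JSON-LD."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            yield declared
        elif isinstance(declared, list):
            yield from (t for t in declared if isinstance(t, str))
        for value in node.values():
            yield from schema_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from schema_types(item)


def has_schema_type(html, wanted="FAQPage"):
    """True if any JSON-LD block in the HTML declares the wanted type."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Malformed JSON-LD is treated as not emitted.
        if wanted in schema_types(data):
            return True
    return False
```

A page that only renders a visual accordion returns False here, which is the emission gap this section describes.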

The fix is usually 10-20 lines of code in your page template. Map your frontmatter FAQs array into FAQPage schema and push it into JSON-LD alongside your Article schema. The JsonLd component should accept an array of schemas, not just one.
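As a language-neutral illustration of that mapping, here is a minimal Python sketch. It assumes each frontmatter FAQ is a dict with "question" and "answer" keys, which is a guess at the data shape rather than any particular CMS format, and the function names are hypothetical. The Question, acceptedAnswer, and Answer structure follows the schema.org FAQPage vocabulary.

```python
import json


def faq_page_schema(faqs):
    """Map a list of {"question": ..., "answer": ...} dicts to FAQPage JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": item["question"],
                "acceptedAnswer": {"@type": "Answer", "text": item["answer"]},
            }
            for item in faqs
        ],
    }


def render_json_ld(schemas):
    """Render one <script type="application/ld+json"> tag per schema."""
    tags = []
    for schema in schemas:
        # Escape "<" so a "</script>" inside an answer cannot close the tag early.
        payload = json.dumps(schema, ensure_ascii=False).replace("<", "\\u003c")
        tags.append(f'<script type="application/ld+json">{payload}</script>')
    return "\n".join(tags)
```

Passing a list to render_json_ld is the point of the design: the Article schema and the FAQPage schema go out side by side instead of one overwriting the other.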

This is the single highest-leverage schema for AI search. ChatGPT, Perplexity, Claude, and Google AI Overviews all parse it. Your FAQs become quote-eligible answer text the moment the schema lands.

Gap 3: Named author with sameAs

AI engines extract entities. An entity is a thing that can be linked to other things. "Leo" is a name. "Leo Nguyen, Founder at LUMA-E, sameAs LinkedIn + YouTube" is an entity. The second one can be cited as a person; the first usually gets stripped.

The schema pattern:

{
  "@type": "Person",
  "name": "Leo Nguyen",
  "jobTitle": "Founder & Senior Ecommerce Engineer",
  "url": "https://luma-e.com/about",
  "sameAs": [
    "https://www.linkedin.com/in/leonguyen-luma/",
    "https://www.youtube.com/channel/UCo6_YvZbik6ZsMo6OClnJRA"
  ],
  "knowsAbout": [
    "Shopify Plus",
    "Magento 2",
    "AI Search Visibility",
    "Headless Commerce"
  ],
  "worksFor": {"@type": "Organization", "name": "LUMA-E"}
}

Three things this enables:

  • Name recognition in answer text. AI engines lift "Leo Nguyen, founder at LUMA-E" instead of just citing the domain.
  • Cross-reference verification. The sameAs links let engines verify the author exists on LinkedIn and YouTube, which raises the trust score.
  • Topical authority binding. The knowsAbout array tells engines what areas this author is credible in.

Pair this with a visible author byline at the top of the article and an author bio block at the bottom. Schema alone isn't enough; visual signals reinforce the entity for human readers and crawlers.
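A small check can flag an author block that is still just a name. The sketch below assumes the Person schema has already been parsed into a Python dict. The function name author_gaps and the choice of required fields are assumptions drawn from the pattern above, not a schema.org validation rule.

```python
def author_gaps(person):
    """Return the entity signals missing from a schema.org Person dict."""
    gaps = []
    if person.get("@type") != "Person":
        gaps.append("@type is not Person")
    if not str(person.get("name", "")).strip():
        gaps.append("name")
    # schema.org allows sameAs to be a single URL or a list of URLs.
    same_as = person.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    if not any(str(url).startswith(("http://", "https://")) for url in same_as):
        gaps.append("sameAs")
    if not person.get("knowsAbout"):
        gaps.append("knowsAbout")
    return gaps
```

An empty list means the block carries a name, at least one external profile link, and at least one declared topic.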

What this looks like in practice

I'm running this exact diagnosis on two posts at luma-e.com this week:

  • M1: shopify-plus-vs-magento-2-b2b — a comparison post that ranks but doesn't get cited by Perplexity or ChatGPT for the query "shopify b2b vs magento for wholesale."
  • M2: ecommerce-ai-search-optimization-2026 — a pillar with 2,400 words of depth that ghost-cites at the domain level but never surfaces the brand name in answers.

Both have all three gaps. Both will be restructured tomorrow (Day 13 of the 21-day plan) with the answer-first top block, FAQPage schema pushed into JSON-LD, and the personSchema upgraded with sameAs and knowsAbout.

I'll report results on this same blog in two weeks. If the tactics work, citation share moves measurably. If they don't, the diagnosis was incomplete and I'll publish what I missed.

What to do this week

If you run an ecommerce site or content publishing operation:

  1. Today: Open one of your highest-ranking pages. Read the top 200 words. Does sentence 1 directly answer the query? If not, that's gap one.
  2. Today: View source. Search for FAQPage. Missing? Gap two.
  3. Today: Search for sameAs in your author or organization schema. Empty? Gap three.

Each diagnosis is two minutes. Each fix is under an hour. The effect compounds: the brands that ship these structural changes in 2026 own citation share for the next decade, quietly, while everyone else argues about title tags.

Sources

  • Tinuiti — "AI Citations Trends Report Q1 2026" (Reddit citation share peaked above 9% in January 2026).
  • SEMrush — "Mention-Source Divide" study, September 2025 (61.7% of AI citations are ghost citations).
  • Frase — 2026 GEO research on citation passage length (134-167 word self-contained passages).

Frequently asked

Why does my content rank on Google but never get cited by ChatGPT or Perplexity?
Google's algorithm and AI search engines score content differently. Google rewards depth, backlinks, and topical authority. AI engines extract self-contained passages, usually 134-167 words, that directly answer a query. If your top 200 words are a warm-up intro instead of a direct answer, the engine has nothing clean to lift. Combine that with missing FAQPage schema and an unnamed author, and you've built content that ranks but doesn't get quoted.

How long should the answer-first block at the top of a page be?
2-4 sentences for the direct answer, followed by 3-5 bullets for supporting context. Total around 150-200 words. AI engines like Perplexity and ChatGPT preferentially pull self-contained passages in the 134-167 word range, the citation sweet spot per Frase's 2026 GEO research. Anything longer gets truncated; anything shorter lacks enough context to stand alone.

Does FAQPage schema actually move citations?
Yes. It's the highest-leverage schema for AI search in 2026. ChatGPT, Perplexity, Claude, and Google AI Overviews all parse FAQPage schema cleanly and lift question-answer pairs nearly verbatim. The catch: your CMS or framework has to push the FAQ data into actual JSON-LD on the page. Many sites display FAQs visually but never emit the schema. Check your page source for "@type":"FAQPage". If it's missing, you have a free win.

What's a 'ghost' AI citation and how do I prevent it?
Per SEMrush's September 2025 Mention-Source Divide study, 61.7% of AI citations are ghost citations: the engine cites your domain but never says your brand name in the answer. The reader follows the link but doesn't remember who you are. The fix: a named-author byline with sameAs links to LinkedIn and YouTube, plus an Organization schema with founder and sameAs properties. AI engines lift named entities into answer text more reliably than bare domain references.

Is dateModified just a recency signal or does it matter for citations specifically?
Both. Recency signals tell AI engines your content is current, which matters for queries about 2026 pricing, tools, or tactics. But there's a second mechanism: when AI engines re-crawl your page and see a refreshed dateModified, they re-evaluate citation eligibility. A page last touched 12 months ago competes against pages touched last week. Refresh dateModified when you make substantive updates, not just typo fixes.