Implementation guide
Three paths to publishing your entitymap.json — from generator to hand-written file.
The reference implementation extracts entities, selects evidence chunks, and generates both files automatically.
Go to waikay.io/entitymap and connect your domain. The generator crawls your content, runs entity extraction, and identifies candidate chunks.
Review the extracted entities and evidence. Adjust descriptions, prune low-relevance chunks, and add any missing relations. Export the two files.
Upload entitymap.json and entitymap.html to the root of your domain. Add the discovery hints below.
A well-scoped site with 10–30 entities can be hand-written in a few hours. This is also the best way to deeply understand the spec before building a generator.
Start with the concepts, products, people, or places your site most authoritatively covers. Aim for 10–50 entities to begin. Don't try to be exhaustive — prioritise depth over breadth.
For each entity you need: a stable ID, a name, a @type, a description in your own terms, and at least one chunk.
{
"version": "0.2",
"schema": "https://entitymap.org/spec/v0.2",
"publisher": {
"name": "Your Site Name",
"url": "https://yoursite.com",
"sameAs": "https://www.wikidata.org/wiki/Q..."
},
"generated": "2026-03-27T00:00:00Z",
"entities": []
}
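If you build the file with a script rather than by hand, a minimal Python sketch of the root object might look like this. It stamps generated with the current UTC time in the same Z-suffixed format as the example. PUBLISHER_NAME, PUBLISHER_URL and build_root are hypothetical names for illustration. sameAs is left out here and can be added to the publisher object when a Wikidata item exists.

```python
import json
from datetime import datetime, timezone

# Hypothetical constants: replace with your own site's values.
PUBLISHER_NAME = "Your Site Name"
PUBLISHER_URL = "https://yoursite.com"


def utc_now_iso() -> str:
    """Current UTC time in seconds precision, e.g. 2026-03-27T09:00:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_root(entities: list) -> dict:
    """Assemble the root object with its five top-level fields."""
    return {
        "version": "0.2",
        "schema": "https://entitymap.org/spec/v0.2",
        "publisher": {"name": PUBLISHER_NAME, "url": PUBLISHER_URL},
        "generated": utc_now_iso(),  # refreshed on every build
        "entities": entities,
    }


if __name__ == "__main__":
    print(json.dumps(build_root([]), indent=2, ensure_ascii=False))
```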
The publisher.name value must match the publisher field on every chunk exactly. This is the primary attribution mechanism — treat it as a constant.
For each entity, follow this structure:
{
"entityId": "e_001",
"@type": "DefinedTerm",
"name": "Your Entity Name",
"alternateName": "Abbreviation if any",
"description": "1–3 sentences defining this concept as your site uses it. Be specific to your context.",
"sameAs": "https://www.wikidata.org/wiki/Q...",
"relations": [],
"hasChunks": []
}
| Field | Req | Notes |
|---|---|---|
| entityId | MUST | Stable. Use a simple prefix: e_001, e_002. Never reuse a retired ID. |
| @type | MUST | From schema.org. Most concepts use DefinedTerm. See §4.5 for the full list. |
| name | MUST | Your site's label — not a generic Wikipedia title. |
| description | MUST | Your definition. SHOULD be extractive from your own content. |
| sameAs | SHOULD | Wikidata URI if one exists. Links your entity to the open knowledge graph. |
| alternateName | MAY | Abbreviation or common variant. Key for disambiguation. |
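The MUST rules in the table can be enforced at build time. The sketch below is a hypothetical make_entity helper, not part of the spec. It assumes the e_001 ID style suggested above and accepts an optional set of retired IDs so that one is never reused. SHOULD and MAY fields are written only when a value is supplied.

```python
import re
from typing import Optional

# Assumption: IDs follow the "e_001" prefix style suggested in the table.
ENTITY_ID_PATTERN = re.compile(r"e_\d{3,}")


def make_entity(
    entity_id: str,
    type_: str,
    name: str,
    description: str,
    chunks: list,
    *,
    same_as: Optional[str] = None,
    alternate_name: Optional[str] = None,
    relations: Optional[list] = None,
    retired_ids: frozenset = frozenset(),
) -> dict:
    """Build one entity dict, rejecting missing MUST fields and reused IDs."""
    if not ENTITY_ID_PATTERN.fullmatch(entity_id):
        raise ValueError(f"unexpected entityId format: {entity_id!r}")
    if entity_id in retired_ids:
        raise ValueError(f"entityId {entity_id} was retired and must not be reused")
    for field, value in (("@type", type_), ("name", name), ("description", description)):
        if not value or not value.strip():
            raise ValueError(f"{field} is required")
    if not chunks:
        raise ValueError("each entity needs at least one chunk")

    entity = {"entityId": entity_id, "@type": type_, "name": name, "description": description}
    # SHOULD / MAY fields are emitted only when there is a value.
    if same_as:
        entity["sameAs"] = same_as
    if alternate_name:
        entity["alternateName"] = alternate_name
    entity["relations"] = relations or []
    entity["hasChunks"] = chunks
    return entity
```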
Each entity needs 1–5 evidence chunks. Select your best, most specific passages — not introductory sentences. Each chunk must carry the publisher name.
"hasChunks": [
{
"chunkId": "c_001",
"text": "A 1–5 sentence passage from your content. Max 500 characters. Extractive preferred.",
"sourceUrl": "https://yoursite.com/page-url",
"pageTitle": "Title of the source page",
"publisher": "Your Site Name",
"retrieved": "2026-03-27T09:00:00Z",
"relevanceScore": 0.92
}
]
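A hypothetical make_chunk helper can enforce two of these rules mechanically: the 500-character cap, and a publisher value copied from a single constant rather than retyped. top_chunks is likewise illustrative; it assumes a higher relevanceScore means a stronger passage and keeps at most five.

```python
from datetime import datetime, timezone

# Hypothetical constant: must be the exact string used in the root publisher.name.
PUBLISHER_NAME = "Your Site Name"
MAX_CHUNK_CHARS = 500


def make_chunk(chunk_id: str, text: str, source_url: str, page_title: str,
               relevance_score: float) -> dict:
    """Build one evidence chunk, enforcing the 500-character limit."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    if not text:
        raise ValueError(f"{chunk_id}: empty text")
    if len(text) > MAX_CHUNK_CHARS:
        raise ValueError(f"{chunk_id}: {len(text)} characters exceeds {MAX_CHUNK_CHARS}")
    return {
        "chunkId": chunk_id,
        "text": text,
        "sourceUrl": source_url,
        "pageTitle": page_title,
        "publisher": PUBLISHER_NAME,  # copied from the constant, never retyped
        "retrieved": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "relevanceScore": relevance_score,
    }


def top_chunks(candidates: list, limit: int = 5) -> list:
    """Keep the highest-scoring chunks (assumes a higher relevanceScore is better)."""
    return sorted(candidates, key=lambda c: c["relevanceScore"], reverse=True)[:limit]
```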
The publisher field on every chunk MUST exactly match publisher.name in the root object. It is the field that carries your brand attribution through downstream AI aggregation.
Relations are optional but valuable. Even a sparse relation graph significantly improves how AI systems traverse your knowledge. Use predicates from the standard vocabulary.
"relations": [
{
"predicate": "ENABLES",
"targetId": "e_004",
"targetName": "AI Share of Voice"
},
{
"predicate": "INSTANCE_OF",
"targetUri": "https://www.wikidata.org/wiki/Q1163385",
"targetName": "Herfindahl-Hirschman Index"
}
]
For internal targets (entities in your own EntityMap), use targetId. For external concepts, use targetUri pointing to Wikidata or schema.org. targetName is required in both cases.
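A minimal sketch of these rules as a hypothetical make_relation helper. It assumes each relation carries exactly one of targetId or targetUri. Predicates are passed through unchecked because the standard vocabulary is not reproduced here.

```python
from typing import Optional


def make_relation(predicate: str, target_name: str, *,
                  target_id: Optional[str] = None,
                  target_uri: Optional[str] = None) -> dict:
    """Build one relation; internal targets use targetId, external ones targetUri."""
    if (target_id is None) == (target_uri is None):
        raise ValueError("set exactly one of target_id (internal) or target_uri (external)")
    if not target_name:
        raise ValueError("targetName is required in both cases")
    relation = {"predicate": predicate}
    if target_id is not None:
        relation["targetId"] = target_id
    else:
        relation["targetUri"] = target_uri
    relation["targetName"] = target_name
    return relation
```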
The entitymap.html file is a crawlable, human-readable rendering of the same data. It MUST NOT be maintained separately from the JSON. Use the Waikay viewer to generate it from your JSON, or build a generator following §6 of the spec.
The HTML file must embed per-entity JSON-LD, render relations as internal hyperlinks, and carry data-publisher attributes on every chunk blockquote.
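§6 of the spec defines the actual generator requirements; the sketch below only illustrates the three properties named here, under stated assumptions. The JSON-LD property set (a schema.org @context and an @id of #entityId), the section-per-entity markup and the cite attribute are illustrative choices, not taken from the spec. Internal relations link to #entityId anchors, external ones to their targetUri, and every chunk blockquote gets a data-publisher attribute.

```python
import html
import json


def _json_ld(entity: dict) -> str:
    """Per-entity JSON-LD; the property set here is an assumption."""
    data = {
        "@context": "https://schema.org",
        "@type": entity["@type"],
        "@id": "#" + entity["entityId"],
        "name": entity["name"],
        "description": entity["description"],
    }
    if "sameAs" in entity:
        data["sameAs"] = entity["sameAs"]
    # Escape "</" so no value can close the <script> element early.
    return json.dumps(data, ensure_ascii=False).replace("</", "<\\/")


def render_entity(entity: dict) -> str:
    esc = html.escape
    parts = [f'<section id="{esc(entity["entityId"])}">']
    parts.append(f'<script type="application/ld+json">{_json_ld(entity)}</script>')
    parts.append(f"<h2>{esc(entity['name'])}</h2>")
    parts.append(f"<p>{esc(entity['description'])}</p>")
    for rel in entity.get("relations", []):
        # Internal targets become in-page anchors; external ones link to their URI.
        href = "#" + rel["targetId"] if "targetId" in rel else rel["targetUri"]
        parts.append(f'<p>{esc(rel["predicate"])}: <a href="{esc(href)}">{esc(rel["targetName"])}</a></p>')
    for chunk in entity.get("hasChunks", []):
        parts.append(
            f'<blockquote data-publisher="{esc(chunk["publisher"])}" '
            f'cite="{esc(chunk["sourceUrl"])}">{esc(chunk["text"])}</blockquote>'
        )
    parts.append("</section>")
    return "\n".join(parts)


def render_entitymap_html(root: dict) -> str:
    """Render the whole page from the parsed JSON root, the single source of truth."""
    title = html.escape(root["publisher"]["name"]) + " EntityMap"
    body = "\n".join(render_entity(e) for e in root["entities"])
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{title}</title>\n</head>\n<body>\n{body}\n</body>\n</html>\n"
    )
```

Because the renderer reads only the JSON, rerunning it after every change keeps the two files from drifting apart.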
https://yourdomain.com/entitymap.json
https://yourdomain.com/entitymap.html
Both files must be publicly accessible without authentication. No subdirectory paths.
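A quick pre-launch check, sketched in Python. is_root_entitymap_url rejects subdirectory paths; requiring https is an assumption based on the examples. is_publicly_reachable is a hypothetical helper that sends an unauthenticated HEAD request. Some servers reject HEAD, and a redirect to a login page would still pass, so treat it as a first-line check only.

```python
import urllib.request
from urllib.parse import urlsplit

ROOT_PATHS = {"/entitymap.json", "/entitymap.html"}


def is_root_entitymap_url(url: str) -> bool:
    """True only for https URLs whose path is exactly /entitymap.json or /entitymap.html."""
    parts = urlsplit(url)
    return (parts.scheme == "https" and bool(parts.netloc)
            and parts.path in ROOT_PATHS and not parts.query)


def is_publicly_reachable(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: an unauthenticated HEAD request must return 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError/HTTPError (e.g. 401, 403, 404) and timeouts
        return False
```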
Add an EntityMap line to robots.txt:
# EntityMap
EntityMap: https://yourdomain.com/entitymap.json
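To confirm the robots.txt line is picked up, a consumer could read it roughly as follows. entitymap_urls_from_robots is a hypothetical sketch, and matching the EntityMap field name case-insensitively is an assumption.

```python
def entitymap_urls_from_robots(robots_txt: str) -> list:
    """Return every URL declared on an EntityMap: line of a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments such as "# EntityMap"
        if not line:
            continue
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "entitymap":
            urls.append(value.strip())
    return urls
```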
Add a <link> tag to your HTML <head>:
<link rel="entitymap" type="application/json"
href="https://yourdomain.com/entitymap.json" />
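A sketch of how a crawler, or your own test suite, might find that tag using Python's standard html.parser module. find_entitymap_links is a hypothetical name; rel is treated as a space-separated list of tokens, as in HTML.

```python
from html.parser import HTMLParser


class EntityMapLinkFinder(HTMLParser):
    """Collects href values of <link rel="entitymap"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "entitymap" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_entitymap_links(page_html: str) -> list:
    finder = EntityMapLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.hrefs
```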
List entitymap.html in your sitemap with priority: 0.9 and changefreq: weekly. This signals freshness to crawlers and surfaces the file to AI systems that follow sitemaps.
<url>
  <loc>https://yourdomain.com/entitymap.html</loc>
  <priority>0.9</priority>
  <changefreq>weekly</changefreq>
</url>
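If your sitemap is generated, the entry can be emitted with the standard library's xml.etree.ElementTree. build_sitemap is a hypothetical helper: only the EntityMap entry gets the changefreq and priority hints, and the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls: list, entitymap_url: str) -> str:
    """Return sitemap XML listing page_urls plus the EntityMap HTML entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in page_urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    # The EntityMap entry carries the weekly / 0.9 hints suggested above.
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = entitymap_url
    ET.SubElement(entry, "changefreq").text = "weekly"
    ET.SubElement(entry, "priority").text = "0.9"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```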
Add a visible hyperlink to entitymap.html in the footer of your home page — or better, in a sitewide footer so it appears on every page. This is the most reliable discovery mechanism available today, because every AI crawler that follows HTML links will find it without requiring any special standard support.
<footer>
  <a href="https://yourdomain.com/entitymap.html">EntityMap</a>
</footer>
Use entitymap.html as the link target — not the JSON file. The HTML version is designed for crawlers: it renders entity definitions, relations, and attribution as readable text with embedded JSON-LD, so any system that fetches and parses HTML will extract structured, publisher-attributed content.
For the JSON file, use the machine-readable <link> tag in the <head> (shown above) rather than a visible footer link.
Before publishing, check that:
- version, schema, publisher, generated, and entities are present at the root
- every entity has entityId, @type, name, description, and at least one chunk
- the publisher field on every chunk matches publisher.name exactly
- all targetId values resolve to a valid entityId in this file
- entitymap.html does not carry a noindex directive
- discovery hints are in place in robots.txt, the HTML <head>, and sitemap.xml
Publisher name mismatch. This is the most frequent error. Setting publisher.name to "Acme Corp" in the root and "publisher": "Acme" on a chunk will fail validation. Copy-paste the value; don't retype it.
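The file-level checklist items, including the publisher-name comparison, can be automated. validate_entitymap is a hypothetical sketch that returns a list of readable errors (an empty list means the checks pass); it also flags duplicate entityId values. The noindex and discovery-hint items need the live pages and are not covered.

```python
def validate_entitymap(doc: dict) -> list:
    """Return a list of error strings; an empty list means the file-level checks pass."""
    errors = []
    for key in ("version", "schema", "publisher", "generated", "entities"):
        if key not in doc:
            errors.append(f"root: missing {key}")

    publisher_name = (doc.get("publisher") or {}).get("name")
    entities = doc.get("entities") or []
    known_ids = set()
    for entity in entities:
        entity_id = entity.get("entityId")
        if entity_id is not None and entity_id in known_ids:
            errors.append(f"{entity_id}: duplicate entityId")
        known_ids.add(entity_id)

    for index, entity in enumerate(entities):
        label = entity.get("entityId") or f"entities[{index}]"
        for key in ("entityId", "@type", "name", "description"):
            if not entity.get(key):
                errors.append(f"{label}: missing {key}")
        chunks = entity.get("hasChunks") or []
        if not chunks:
            errors.append(f"{label}: needs at least one chunk")
        for chunk in chunks:
            # Exact string comparison: "Acme" does not match "Acme Corp".
            if chunk.get("publisher") != publisher_name:
                errors.append(
                    f"{label}/{chunk.get('chunkId')}: publisher "
                    f"{chunk.get('publisher')!r} != {publisher_name!r}"
                )
        for relation in entity.get("relations") or []:
            target_id = relation.get("targetId")
            if target_id is not None and target_id not in known_ids:
                errors.append(f"{label}: targetId {target_id} does not resolve")
    return errors
```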
Stale generated timestamp. The generated field must update on every rebuild. A file with a timestamp months old signals to consumers that the EntityMap is unmaintained.
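One way to keep generated current is to restamp it in the build and fail a scheduled check when it ages. The 30-day threshold and both helper names are assumptions for illustration; the parser expects the seconds-precision Z format used in the examples.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
MAX_AGE = timedelta(days=30)  # hypothetical threshold; match it to your publishing cadence


def stamp_generated(doc: dict, now: Optional[datetime] = None) -> dict:
    """Set generated to the current UTC time; call this on every rebuild."""
    now = now or datetime.now(timezone.utc)
    doc["generated"] = now.strftime(TIMESTAMP_FORMAT)
    return doc


def is_stale(doc: dict, now: Optional[datetime] = None, max_age: timedelta = MAX_AGE) -> bool:
    """True if generated is older than max_age."""
    generated = datetime.strptime(doc["generated"], TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - generated > max_age
```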
Too many entities, too few chunks. It is better to have 15 well-evidenced entities than 80 with one weak chunk each. Depth signals authority. Breadth without evidence does not.
Generic descriptions. The description field should define the concept as your site uses it — not a Wikipedia summary. "AI Share of Voice is a metric that measures the proportion of AI-generated answers in which a brand appears" is specific. "Share of voice is a marketing metric" is not.
Maintaining JSON and HTML separately. The HTML file must be generated from the JSON. If you edit one manually, they will diverge. Treat the JSON as the source of truth at all times.