Implementation guide

How to implement EntityMap

Three paths to publishing your entitymap.json — from generator to hand-written file.

Option A — Generator

Use the Waikay generator to produce a conforming entitymap.json and entitymap.html automatically from your existing site content.

Option B — Hand-write

Write your entitymap.json directly from the spec and minimal example. Practical for small sites with a defined entity set.

Option C — Build your own generator

Implement a generator for your platform against the spec and JSON schema. Register it in the implementations registry when ready.


Option A — Waikay generator

The reference implementation extracts entities, selects evidence chunks, and generates both files automatically.

1.

Connect your site

Go to waikay.io/entitymap and connect your domain. The generator crawls your content, runs entity extraction, and identifies candidate chunks.

2.

Review and publish

Review the extracted entities and evidence. Adjust descriptions, prune low-relevance chunks, and add any missing relations. Export the two files.

3.

Deploy to your root

Upload entitymap.json and entitymap.html to the root of your domain. Add the discovery hints below.


Option B — Hand-write your file

A well-scoped site with 10–30 entities can be hand-written in a few hours. This is also the best way to deeply understand the spec before building a generator.

1.

List your entities

Start with the concepts, products, people, or places your site most authoritatively covers. Aim for 10–50 entities to begin. Don't try to be exhaustive — prioritise depth over breadth.

For each entity you need: a stable ID, a name, a @type, a description in your own terms, and at least one chunk.

2.

Write the root object

{
  "version": "0.2",
  "schema": "https://entitymap.org/spec/v0.2",
  "publisher": {
    "name": "Your Site Name",
    "url": "https://yoursite.com",
    "sameAs": "https://www.wikidata.org/wiki/Q..."
  },
  "generated": "2026-03-27T00:00:00Z",
  "entities": []
}

The publisher.name value must match the publisher field on every chunk exactly. This is the primary attribution mechanism — treat it as a constant.
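One way to treat the publisher name as a constant is to build the root object in code. The sketch below is illustrative only: `PUBLISHER_NAME` and `build_root` are hypothetical names, the optional `sameAs` key is left out because the Wikidata ID is site-specific, and the timestamp is refreshed on every call so `generated` stays current.

```python
import json
from datetime import datetime, timezone

# Hypothetical constant: define the publisher name once and reuse it for
# the root object and for every chunk.
PUBLISHER_NAME = "Your Site Name"


def build_root(entities, now=None):
    """Return the root object; `now` should be a timezone-aware datetime."""
    now = now or datetime.now(timezone.utc)
    return {
        "version": "0.2",
        "schema": "https://entitymap.org/spec/v0.2",
        "publisher": {
            "name": PUBLISHER_NAME,
            "url": "https://yoursite.com",
        },
        # Regenerated on every build so the timestamp never goes stale.
        "generated": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": entities,
    }


if __name__ == "__main__":
    print(json.dumps(build_root([]), indent=2, ensure_ascii=False))
```

Writing the file with `json.dumps` also means the output is always parseable JSON.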

3.

Add entity objects

For each entity, follow this structure:

{
  "entityId": "e_001",
  "@type": "DefinedTerm",
  "name": "Your Entity Name",
  "alternateName": "Abbreviation if any",
  "description": "1–3 sentences defining this concept as your site uses it. Be specific to your context.",
  "sameAs": "https://www.wikidata.org/wiki/Q...",
  "relations": [],
  "hasChunks": []
}

| Field | Req | Notes |
|---|---|---|
| entityId | MUST | Stable. Use a simple prefix: e_001, e_002. Never reuse a retired ID. |
| @type | MUST | From schema.org. Most concepts use DefinedTerm. See §4.5 for full list. |
| name | MUST | Your site's label, not a generic Wikipedia title. |
| description | MUST | Your definition. SHOULD be extractive from your own content. |
| sameAs | SHOULD | Wikidata URI if one exists. Links your entity to the open knowledge graph. |
| alternateName | MAY | Abbreviation or common variant. Key for disambiguation. |

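The MUST fields above can be checked mechanically before publishing. This is a minimal sketch, not a replacement for validating against the JSON schema; `entity_problems` is a hypothetical helper, and the duplicate-ID check is an assumption based on IDs needing to be stable and resolvable.

```python
REQUIRED_ENTITY_FIELDS = ("entityId", "@type", "name", "description")


def entity_problems(entities):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, entity in enumerate(entities):
        label = entity.get("entityId") or f"entity #{index}"
        # Every MUST field has to be a non-empty string.
        for field in REQUIRED_ENTITY_FIELDS:
            value = entity.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: missing or empty {field}")
        # Assumption: an entityId may appear only once per file.
        entity_id = entity.get("entityId")
        if isinstance(entity_id, str):
            if entity_id in seen_ids:
                problems.append(f"{label}: duplicate entityId")
            seen_ids.add(entity_id)
    return problems
```

For example, `entity_problems(entitymap["entities"])` can run as part of a build script and fail the build when the returned list is not empty.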
4.

Add chunks

Each entity needs 1–5 evidence chunks. Select your best, most specific passages — not introductory sentences. Each chunk must carry the publisher name.

"hasChunks": [
  {
    "chunkId": "c_001",
    "text": "A 1–5 sentence passage from your content. Max 500 characters. Extractive preferred.",
    "sourceUrl": "https://yoursite.com/page-url",
    "pageTitle": "Title of the source page",
    "publisher": "Your Site Name",
    "retrieved": "2026-03-27T09:00:00Z",
    "relevanceScore": 0.92
  }
]
The publisher field on every chunk MUST exactly match publisher.name in the root object. It is the field that carries your brand attribution through downstream AI aggregation.
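The chunk rules in this step (1 to 5 chunks per entity, at most 500 characters of text, and an exact publisher match) lend themselves to an automated check. The sketch below assumes the file has already been parsed into a dict; `chunk_problems` is a hypothetical helper, and counting characters with `len()` (Unicode code points) is an assumption about how the 500-character limit is measured.

```python
MAX_CHUNKS_PER_ENTITY = 5
MAX_CHUNK_CHARS = 500


def chunk_problems(entitymap):
    """Return a list of chunk-level problems found in a parsed EntityMap."""
    publisher_name = entitymap["publisher"]["name"]
    problems = []
    for entity in entitymap["entities"]:
        entity_id = entity.get("entityId", "?")
        chunks = entity.get("hasChunks", [])
        if not 1 <= len(chunks) <= MAX_CHUNKS_PER_ENTITY:
            problems.append(
                f"{entity_id}: has {len(chunks)} chunks, "
                f"expected 1 to {MAX_CHUNKS_PER_ENTITY}"
            )
        for chunk in chunks:
            chunk_id = chunk.get("chunkId", "?")
            # Exact string comparison: "Acme" does not match "Acme Corp".
            if chunk.get("publisher") != publisher_name:
                problems.append(
                    f"{entity_id}/{chunk_id}: publisher does not match publisher.name"
                )
            if len(chunk.get("text", "")) > MAX_CHUNK_CHARS:
                problems.append(
                    f"{entity_id}/{chunk_id}: text exceeds {MAX_CHUNK_CHARS} characters"
                )
    return problems
```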
5.

Add relations

Relations are optional but valuable. Even a sparse relation graph gives AI systems explicit links to follow between your entities, instead of leaving them to infer those connections. Use predicates from the standard vocabulary.

"relations": [
  {
    "predicate": "ENABLES",
    "targetId": "e_004",
    "targetName": "AI Share of Voice"
  },
  {
    "predicate": "INSTANCE_OF",
    "targetUri": "https://www.wikidata.org/wiki/Q1163385",
    "targetName": "Herfindahl-Hirschman Index"
  }
]

For internal targets (entities in your own EntityMap), use targetId. For external concepts, use targetUri pointing to Wikidata or schema.org. targetName is required in both cases.
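The targeting rules above can be verified with a short script. This sketch checks only what this step states: `targetName` is present, each relation has a `targetId` or a `targetUri`, and every `targetId` resolves to an entity in the same file. It does not check predicates, because the standard vocabulary is not reproduced here. `relation_problems` is a hypothetical helper.

```python
def relation_problems(entities):
    """Return a list of relation-level problems for a list of entity dicts."""
    known_ids = {e["entityId"] for e in entities if "entityId" in e}
    problems = []
    for entity in entities:
        source = entity.get("entityId", "?")
        for relation in entity.get("relations", []):
            where = f"{source} {relation.get('predicate', '?')}"
            if not relation.get("targetName"):
                problems.append(f"{where}: targetName is missing")
            target_id = relation.get("targetId")
            target_uri = relation.get("targetUri")
            if target_id is None and target_uri is None:
                problems.append(f"{where}: needs targetId or targetUri")
            elif target_id is not None and target_id not in known_ids:
                # Internal targets must point at an entity in this file.
                problems.append(f"{where}: targetId {target_id} does not resolve")
    return problems
```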

6.

Generate the HTML companion

The entitymap.html file is a crawlable, human-readable rendering of the same data. It MUST NOT be maintained separately from the JSON. Use the Waikay viewer to generate it from your JSON, or build a generator following §6 of the spec.

The HTML file must embed per-entity JSON-LD, render relations as internal hyperlinks, and carry data-publisher attributes on every chunk blockquote.
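To illustrate the three requirements above, here is a rough sketch of rendering one entity to HTML. The markup layout (a `section` per entity, fragment links by `entityId`, a `cite` attribute, and linking external relations to their `targetUri`) is an assumption for illustration; §6 of the spec defines the actual structure, and `render_entity` is a hypothetical name.

```python
import html
import json


def render_entity(entity):
    """Render one entity dict to an HTML fragment (illustrative layout only)."""
    json_ld = {
        "@context": "https://schema.org",
        "@type": entity["@type"],
        "name": entity["name"],
        "description": entity["description"],
    }
    # Escape "</" so text inside the JSON cannot close the script element early.
    json_ld_text = json.dumps(json_ld, ensure_ascii=False).replace("</", "<\\/")
    parts = [
        f'<section id="{html.escape(entity["entityId"], quote=True)}">',
        f'<script type="application/ld+json">{json_ld_text}</script>',
        f"<h2>{html.escape(entity['name'])}</h2>",
        f"<p>{html.escape(entity['description'])}</p>",
    ]
    for relation in entity.get("relations", []):
        # Internal targets become fragment links within the same page.
        if "targetId" in relation:
            href = "#" + relation["targetId"]
        else:
            href = relation["targetUri"]
        parts.append(
            f'<a href="{html.escape(href, quote=True)}">'
            f"{html.escape(relation['predicate'])}: "
            f"{html.escape(relation['targetName'])}</a>"
        )
    for chunk in entity.get("hasChunks", []):
        parts.append(
            f'<blockquote data-publisher="{html.escape(chunk["publisher"], quote=True)}" '
            f'cite="{html.escape(chunk["sourceUrl"], quote=True)}">'
            f"{html.escape(chunk['text'])}</blockquote>"
        )
    parts.append("</section>")
    return "\n".join(parts)
```

Because the function reads only from the parsed JSON, rerunning it after each JSON change keeps the two files from diverging.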


Deploy and discover

1.

Serve both files at root

https://yourdomain.com/entitymap.json
https://yourdomain.com/entitymap.html

Both files must be publicly accessible without authentication. No subdirectory paths.

2.

Add robots.txt hint

# EntityMap
EntityMap: https://yourdomain.com/entitymap.json
3.

Add HTML head link to every page

<link rel="entitymap" type="application/json"
      href="https://yourdomain.com/entitymap.json" />
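Both hints from steps 2 and 3 can be verified against the text of your robots.txt and the HTML of a page. The sketch below works on strings you have already fetched, so it makes no network calls. The function names are hypothetical, matching the `EntityMap` key case-insensitively is an assumption, and the link check does not confirm that the tag sits inside `<head>`.

```python
from html.parser import HTMLParser


def robots_has_hint(robots_txt, json_url):
    """True if robots.txt contains an 'EntityMap: <json_url>' line."""
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "entitymap" and value.strip() == json_url:
            return True
    return False


class _LinkFinder(HTMLParser):
    """Collects href values of <link rel="entitymap"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "entitymap":
            self.hrefs.append(attributes.get("href"))


def page_has_link(page_html, json_url):
    """True if the page has a <link rel="entitymap"> pointing at json_url."""
    finder = _LinkFinder()
    finder.feed(page_html)
    return json_url in finder.hrefs
```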
4.

Add entitymap.html to sitemap.xml

List entitymap.html in your sitemap with priority: 0.9 and changefreq: weekly. This signals freshness to crawlers and surfaces the file to AI systems that follow sitemaps.

<url>
  <loc>https://yourdomain.com/entitymap.html</loc>
  <priority>0.9</priority>
  <changefreq>weekly</changefreq>
</url>
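If your sitemap is produced by a build script, the entry above can be added programmatically. This is a sketch using Python's standard `xml.etree.ElementTree`, assuming the sitemap uses the standard sitemaps.org namespace; `add_entitymap_url` is a hypothetical helper, and it skips the insert when the URL is already listed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def add_entitymap_url(sitemap_xml, html_url):
    """Return sitemap XML with an entry for html_url added if it is missing."""
    # Keep the sitemap namespace as the default (unprefixed) namespace on output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    existing = {
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    }
    if html_url not in existing:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = html_url
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = "0.9"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = "weekly"
    return ET.tostring(root, encoding="unicode")
```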
5.

Link to entitymap.html from your site footer

Add a visible hyperlink to entitymap.html in the footer of your home page — or better, in a sitewide footer so it appears on every page. This is the most reliable discovery mechanism available today, because every AI crawler that follows HTML links will find it without requiring any special standard support.

<footer>
  <a href="https://yourdomain.com/entitymap.html">EntityMap</a>
</footer>

Use entitymap.html as the link target — not the JSON file. The HTML version is designed for crawlers: it renders entity definitions, relations, and attribution as readable text with embedded JSON-LD, so any system that fetches and parses HTML will extract structured, publisher-attributed content.

For the JSON file, use the machine-readable <link> tag in the <head> (step 3 above) rather than a visible footer link.


Pre-publish checklist

Valid JSON — parseable without error
version, schema, publisher, generated, entities present at root
Every entity has entityId, @type, name, description, and at least one chunk
publisher field on every chunk matches publisher.name exactly
No entity has more than 5 chunks
All internal targetId values resolve to a valid entityId in this file
All predicates are from the standard vocabulary or a declared custom vocabulary
Both files accessible at root without authentication
entitymap.html does not carry a noindex directive
Discovery hints added to robots.txt, HTML <head>, sitemap.xml, and the site footer
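Two of the checklist items, the root keys and the noindex directive, can be automated with a few lines. The sketch below uses hypothetical helper names and only inspects a `<meta name="robots">` tag; a noindex sent through an HTTP response header would need a separate check.

```python
import json
from html.parser import HTMLParser

REQUIRED_ROOT_KEYS = ("version", "schema", "publisher", "generated", "entities")


def missing_root_keys(json_text):
    """Parse the file and return the required root keys that are absent."""
    # json.loads raises json.JSONDecodeError if the file is not valid JSON.
    document = json.loads(json_text)
    return [key for key in REQUIRED_ROOT_KEYS if key not in document]


class _RobotsMetaFinder(HTMLParser):
    """Flags a <meta name="robots"> tag whose content includes noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True


def has_noindex_meta(page_html):
    """True if the HTML carries a robots meta tag with a noindex directive."""
    finder = _RobotsMetaFinder()
    finder.feed(page_html)
    return finder.noindex
```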

Common mistakes

Publisher name mismatch. The most frequent error. "publisher": "Acme Corp" in the root and "publisher": "Acme" on a chunk will fail validation. Copy and paste the value rather than retyping it.

Stale generated timestamp. The generated field must update on every rebuild. A file with a timestamp months old signals to consumers that the EntityMap is unmaintained.

Too many entities, too few chunks. It is better to have 15 well-evidenced entities than 80 with one weak chunk each. Depth signals authority. Breadth without evidence does not.

Generic descriptions. The description field should define the concept as your site uses it — not a Wikipedia summary. "AI Share of Voice is a metric that measures the proportion of AI-generated answers in which a brand appears" is specific. "Share of voice is a marketing metric" is not.

Maintaining JSON and HTML separately. The HTML file must be generated from the JSON. If you edit one manually, they will diverge. Treat the JSON as the source of truth at all times.