Implementation guide

How to implement EntityMap

Three paths to publishing your entitymap.json — from generator to hand-written file.

Option A — Write manually

Write your entitymap.json directly from the spec and minimal example. Claude can help you draft entities, chunks, and relations from your existing content.

Option B — Build your own generator

Implement a generator for your platform against the spec and JSON schema on GitHub. Register it in the implementations registry when ready.

Option C — Waikay generator

The reference implementation generates a conforming entitymap.json and entitymap.html automatically from your site. Requires creating a free account and project at waikay.io/entitymap.


Option A — Write manually

A well-scoped site with 10–30 entities can be hand-written in a few hours. Use the GitHub repo for the JSON schema and minimal example. Claude can help you draft entities, descriptions, chunks, and relations directly from your content — paste a page and ask it to produce conforming entity objects.

1.

List your entities

Start with the concepts, products, people, or places your site most authoritatively covers. Aim for 10–50 entities to begin. Don't try to be exhaustive — prioritise depth over breadth.

For each entity you need: a stable ID, a name, a @type, a description in your own terms, and at least one chunk.

2.

Write the root object

{
  "version": "1.0",
  "schema": "https://entitymap.org/spec/v1.0",
  "publisher": {
    "name": "Your Site Name",
    "url": "https://yoursite.com",
    "sameAs": "https://www.wikidata.org/wiki/Q..."
  },
  "generated": "2026-04-07T00:00:00Z",
  "entities": []
}

The publisher.name value must match the publisher field on every chunk exactly. This is the primary attribution mechanism — treat it as a constant.
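One way to treat the publisher name as a constant is to define it once in whatever script assembles the file. The sketch below is illustrative only: `PUBLISHER_NAME`, `build_root`, and `make_chunk` are hypothetical names, not part of the spec, and the root keys simply mirror the example above.

```python
import json
from datetime import datetime, timezone

# Hypothetical constant: the single place the publisher name is written down.
PUBLISHER_NAME = "Your Site Name"


def build_root(entities, site_url="https://yoursite.com", same_as=None):
    """Return a root object with the same keys as the example above."""
    publisher = {"name": PUBLISHER_NAME, "url": site_url}
    if same_as:
        publisher["sameAs"] = same_as
    return {
        "version": "1.0",
        "schema": "https://entitymap.org/spec/v1.0",
        "publisher": publisher,
        # Stamped at build time, in UTC, in the same format as the example.
        "generated": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": entities,
    }


def make_chunk(**fields):
    """Build a chunk; the publisher always comes from the shared constant."""
    return {**fields, "publisher": PUBLISHER_NAME}


if __name__ == "__main__":
    print(json.dumps(build_root([]), indent=2))
```

Because `make_chunk` overwrites any `publisher` value passed in, a retyped or abbreviated name cannot reach the output.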

3.

Add entity objects

For each entity, follow this structure:

{
  "entityId": "e_001",
  "@type": "Concept",
  "name": "Your Entity Name",
  "alternateName": "Abbreviation if any",
  "description": "1–3 sentences defining this concept as your site uses it. Be specific to your context.",
  "sameAs": "https://www.wikidata.org/wiki/Q...",
  "relations": [],
  "hasChunks": []
}

Field | Req | Notes
entityId | MUST | Stable. Use a simple prefix: e_001, e_002. Never reuse a retired ID.
@type | MUST | v1.0 core type. General concepts use Concept; publisher-coined terms use ProprietaryTerm. See §4 for the full list.
name | MUST | Your site's label, not a generic Wikipedia title.
description | MUST | Your definition. SHOULD be extractive from your own content.
sameAs | SHOULD | Wikidata URI if one exists. Links your entity to the open knowledge graph.
alternateName | MAY | Abbreviation or common variant. Key for disambiguation.
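
The MUST fields listed above lend themselves to a quick mechanical check before publishing. The following is a minimal sketch, not the official validator: the function names are hypothetical, and it only tests for presence and for duplicate IDs, not for valid @type values.

```python
REQUIRED_ENTITY_FIELDS = ("entityId", "@type", "name", "description")


def missing_entity_fields(entity):
    """Return the MUST fields that are absent or empty on one entity."""
    missing = [field for field in REQUIRED_ENTITY_FIELDS if not entity.get(field)]
    # Each entity also needs at least one chunk.
    if not entity.get("hasChunks"):
        missing.append("hasChunks")
    return missing


def duplicate_entity_ids(entities):
    """Return entity IDs that appear more than once, in sorted order."""
    seen, dupes = set(), set()
    for entity in entities:
        entity_id = entity.get("entityId")
        if entity_id in seen:
            dupes.add(entity_id)
        seen.add(entity_id)
    return sorted(d for d in dupes if d is not None)
```

Running `missing_entity_fields` over every entity and requiring an empty list for each is enough to catch entities that were only partly filled in.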

4.

Add chunks

Each entity needs 1–5 evidence chunks. Select your best, most specific passages — not introductory sentences. Each chunk must carry the publisher name.

"hasChunks": [
  {
    "chunkId": "c_001",
    "text": "A 1–5 sentence passage from your content. Max 600 characters. Extractive preferred.",
    "sourceUrl": "https://yoursite.com/page-url",
    "pageTitle": "Title of the source page",
    "publisher": "Your Site Name",
    "retrieved": "2026-03-27T09:00:00Z",
    "relevanceScore": 0.92
  }
]

The publisher field on every chunk MUST exactly match publisher.name in the root object. It is the field that carries your brand attribution through downstream AI aggregation.
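
The chunk rules in this step (1 to 5 chunks per entity, at most 600 characters of text, and an exact publisher match) can be checked with a short script. This is a sketch under assumptions: `chunk_problems` is a hypothetical helper, and it counts characters with Python's `len`, which may differ from how the spec counts them.

```python
MAX_CHUNK_CHARS = 600
MAX_CHUNKS_PER_ENTITY = 5


def chunk_problems(root):
    """Return human-readable problems found in the chunks of an EntityMap dict."""
    expected = root["publisher"]["name"]
    problems = []
    for entity in root.get("entities", []):
        entity_id = entity.get("entityId", "?")
        chunks = entity.get("hasChunks", [])
        if not 1 <= len(chunks) <= MAX_CHUNKS_PER_ENTITY:
            problems.append(f"{entity_id}: has {len(chunks)} chunks, expected 1 to 5")
        for chunk in chunks:
            chunk_id = chunk.get("chunkId", "?")
            # Exact string comparison: no trimming and no case folding.
            if chunk.get("publisher") != expected:
                problems.append(f"{entity_id}/{chunk_id}: publisher does not match {expected!r}")
            if len(chunk.get("text", "")) > MAX_CHUNK_CHARS:
                problems.append(f"{entity_id}/{chunk_id}: text exceeds {MAX_CHUNK_CHARS} characters")
    return problems
```

An empty list means every chunk passed these three checks; it says nothing about whether the passages are well chosen.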

5.

Add relations

Relations are optional but valuable. Even a sparse relation graph significantly improves how AI systems traverse your knowledge. Use predicates from the standard vocabulary.

"relations": [
  {
    "predicate": "ENABLES",
    "targetId": "e_004",
    "targetName": "AI Share of Voice"
  },
  {
    "predicate": "INSTANCE_OF",
    "targetUri": "https://www.wikidata.org/wiki/Q1163385",
    "targetName": "Herfindahl-Hirschman Index"
  }
]

For internal targets (entities in your own EntityMap), use targetId. For external concepts, use targetUri pointing to Wikidata or schema.org. targetName is required in both cases.
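
The targeting rules above can be verified mechanically. The sketch below assumes relations sit in each entity's `relations` array as in the example; `relation_problems` is a hypothetical name, and the check does not validate predicates against the standard vocabulary.

```python
def relation_problems(root):
    """Return problems with relation targets in an EntityMap dict."""
    known_ids = {entity.get("entityId") for entity in root.get("entities", [])}
    problems = []
    for entity in root.get("entities", []):
        for rel in entity.get("relations", []):
            label = f'{entity.get("entityId", "?")} {rel.get("predicate", "?")}'
            # targetName is required for internal and external targets alike.
            if not rel.get("targetName"):
                problems.append(f"{label}: targetName is missing")
            target_id = rel.get("targetId")
            target_uri = rel.get("targetUri")
            if target_id is None and target_uri is None:
                problems.append(f"{label}: needs targetId or targetUri")
            elif target_id is not None and target_id not in known_ids:
                problems.append(f"{label}: targetId {target_id} does not resolve")
    return problems
```

Dangling `targetId` values are easy to introduce when an entity is retired, so a check like this is worth running on every rebuild.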

6.

Generate the HTML companion

The entitymap.html file is a crawlable, human-readable rendering of the same data. It MUST NOT be maintained separately from the JSON. Use the Waikay viewer to generate it from your JSON, or build a generator following §6 of the spec.

The HTML file must embed per-entity JSON-LD, render relations as internal hyperlinks, and carry data-publisher attributes on every chunk blockquote.
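
For anyone building their own generator, the three requirements above might be rendered along these lines. This is a rough sketch only: the markup layout, the in-page anchor scheme, and the JSON-LD shape are all assumptions made for illustration, and §6 of the spec remains the authority on the real structure.

```python
import html
import json


def render_entity(entity):
    """Render one entity as an HTML section (hypothetical layout)."""
    entity_id = html.escape(entity["entityId"], quote=True)
    # Placeholder JSON-LD shape; the spec defines the actual properties.
    json_ld = json.dumps({
        "@id": f"#{entity['entityId']}",
        "@type": entity["@type"],
        "name": entity["name"],
        "description": entity["description"],
    }).replace("</", "<\\/")  # stops "</script>" in the data from closing the tag
    parts = [
        f'<section id="{entity_id}">',
        f"<h2>{html.escape(entity['name'])}</h2>",
        f"<p>{html.escape(entity['description'])}</p>",
        f'<script type="application/ld+json">{json_ld}</script>',
    ]
    for rel in entity.get("relations", []):
        # Internal targets become in-page anchors; external ones link to their URI.
        href = f"#{rel['targetId']}" if "targetId" in rel else rel["targetUri"]
        parts.append(
            f'<p>{html.escape(rel["predicate"])} '
            f'<a href="{html.escape(href, quote=True)}">{html.escape(rel["targetName"])}</a></p>'
        )
    for chunk in entity.get("hasChunks", []):
        parts.append(
            f'<blockquote data-publisher="{html.escape(chunk["publisher"], quote=True)}">'
            f'{html.escape(chunk["text"])}</blockquote>'
        )
    parts.append("</section>")
    return "\n".join(parts)
```

Because the function reads only from the entity dict, the HTML cannot drift from the JSON as long as it is regenerated on every build.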


Option C — Waikay generator

The reference implementation extracts entities, selects evidence chunks, and generates both files automatically. Requires creating a free account and project at waikay.io/entitymap.

1.

Create an account and project

Go to waikay.io/entitymap, create a free account, and set up a project for your domain.

2.

Connect your site

The generator crawls your content, runs entity extraction, and identifies candidate chunks automatically.

3.

Review and export

Review the extracted entities and evidence. Adjust descriptions, prune low-relevance chunks, and add any missing relations. Export both files — then deploy and add discovery hints as below.

Generator output MUST be published with verificationStatus: "generator-draft" unless you have reviewed and approved every entity and relation. The confidence: "declared" designation and the ProprietaryTerm type require explicit human review.

Deploy and discover

1.

Serve both files at root

https://yourdomain.com/entitymap.json
https://yourdomain.com/entitymap.html

Both files must be publicly accessible without authentication. No subdirectory paths.
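
Both conditions can be spot-checked from a script. The sketch below is an assumption-laden illustration: `is_root_path` and `is_publicly_readable` are hypothetical helpers, and a plain unauthenticated GET returning HTTP 200 is taken here as a stand-in for "publicly accessible".

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def is_root_path(url, filename):
    """Return True if the URL serves `filename` directly at the site root."""
    parts = urlsplit(url)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and parts.path == f"/{filename}"
    )


def is_publicly_readable(url, timeout=10):
    """Return True if a GET without credentials answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        # Covers 401/403 responses, DNS failures, and timeouts.
        return False
```

For example, `is_root_path("https://yourdomain.com/data/entitymap.json", "entitymap.json")` returns `False` because the file sits in a subdirectory.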

2.

Add robots.txt hint

# EntityMap
EntityMap: https://yourdomain.com/entitymap.json

3.

Add HTML head link to every page

<link rel="entitymap" type="application/json"
      href="https://yourdomain.com/entitymap.json" />

4.

Add entitymap.html to sitemap.xml

List entitymap.html in your sitemap with priority: 0.9 and changefreq: weekly. This signals freshness to crawlers and surfaces the file to AI systems that follow sitemaps.

<url>
  <loc>https://yourdomain.com/entitymap.html</loc>
  <priority>0.9</priority>
  <changefreq>weekly</changefreq>
</url>

5.

Link to entitymap.html from your site footer

Add a visible hyperlink to entitymap.html in the footer of your home page — or better, in a sitewide footer so it appears on every page. This is the most reliable discovery mechanism available today, because every AI crawler that follows HTML links will find it without requiring any special standard support.

<footer>
  <a href="https://yourdomain.com/entitymap.html">EntityMap</a>
</footer>

Use entitymap.html as the link target — not the JSON file. The HTML version is designed for crawlers: it renders entity definitions, relations, and attribution as readable text with embedded JSON-LD, so any system that fetches and parses HTML will extract structured, publisher-attributed content.

For the JSON file, use the machine-readable <link> tag in the <head> (step 3 above) rather than a visible footer link.


Pre-publish checklist

Valid JSON — parseable without error
version, schema, publisher, generated, entities present at root
Every entity has entityId, @type, name, description, and at least one chunk
publisher field on every chunk matches publisher.name exactly
No entity has more than 5 chunks
All internal targetId values resolve to a valid entityId in this file
All predicates are from the standard vocabulary or a declared custom vocabulary
Tier 3 predicates (IMPROVES, DEGRADES, LEADS_TO, SUITED_FOR, TARGETS, ACHIEVES) have a confidence field
Both files accessible at root without authentication
entitymap.html does not carry a noindex directive
Discovery hints added to robots.txt, HTML <head>, and sitemap.xml
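
Several checklist items can be automated. The sketch below covers three of them: JSON parseability, root keys, and the confidence field on Tier 3 predicates. It assumes `confidence` is a key on the relation object itself, which this guide does not state explicitly; `checklist_problems` is a hypothetical helper, not the official validator.

```python
import json

ROOT_KEYS = ("version", "schema", "publisher", "generated", "entities")
TIER_3_PREDICATES = {"IMPROVES", "DEGRADES", "LEADS_TO", "SUITED_FOR", "TARGETS", "ACHIEVES"}


def checklist_problems(raw_json):
    """Return problems found in the raw text of an entitymap.json file."""
    try:
        root = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(root, dict):
        return ["root is not a JSON object"]
    problems = [f"root key missing: {key}" for key in ROOT_KEYS if key not in root]
    for entity in root.get("entities", []):
        for rel in entity.get("relations", []):
            # Assumption: confidence is stored on the relation object.
            if rel.get("predicate") in TIER_3_PREDICATES and "confidence" not in rel:
                problems.append(
                    f'{entity.get("entityId", "?")}: {rel["predicate"]} relation has no confidence'
                )
    return problems
```

Running this in the build step and failing on a non-empty list keeps these items from depending on a manual review.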

Common mistakes

Publisher name mismatch. The most frequent error. "publisher": "Acme Corp" in the root and "publisher": "Acme" on a chunk will fail validation. Copy and paste the value rather than retyping it.

Stale generated timestamp. The generated field must update on every rebuild. A file with a timestamp months old signals to consumers that the EntityMap is unmaintained.

Too many entities, too few chunks. It is better to have 15 well-evidenced entities than 80 with one weak chunk each. Depth signals authority. Breadth without evidence does not.

Generic descriptions. The description field should define the concept as your site uses it — not a Wikipedia summary. "AI Share of Voice is a metric that measures the proportion of AI-generated answers in which a brand appears" is specific. "Share of voice is a marketing metric" is not.

Maintaining JSON and HTML separately. The HTML file must be generated from the JSON. If you edit one manually, they will diverge. Treat the JSON as the source of truth at all times.
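
The stale-timestamp mistake is easy to guard against in a build or monitoring script. A minimal sketch follows; the 30-day threshold is an arbitrary assumption, not a value from the spec, and `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def is_stale(generated, max_age=timedelta(days=30), now=None):
    """Return True if the `generated` timestamp is older than `max_age`."""
    # Swap the trailing "Z" for an explicit offset so older Python versions parse it.
    stamp = datetime.fromisoformat(generated.replace("Z", "+00:00"))
    current = now or datetime.now(timezone.utc)
    return current - stamp > max_age
```

Passing `now` explicitly makes the check deterministic in tests; in production it defaults to the current UTC time.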