Open standard · Technical specification
A structured, entity-first index of website content for AI agent and LLM consumption
sitemap.xml tells crawlers what pages exist. entitymap.json aims to do for AI agents what the Sitemaps protocol did for search crawlers: provide a predictable, machine-readable discovery layer for site knowledge.

Throughout this document:

- A conforming file is an entitymap.json or entitymap.html that satisfies all MUST requirements in this specification.

Large language models build knowledge from training data. At inference time, retrieval-augmented systems supplement that knowledge by fetching relevant content from the web. Current retrieval mechanisms operate at the page level - fetching HTML documents and extracting text without structured awareness of the entities, concepts, or relationships those pages contain.
This creates three problems:
Disambiguation loss. The same concept may appear under many surface forms across a site ("AI SOV", "AI Share of Voice", "artificial intelligence share of voice"). A page-level retriever treats these as separate signals rather than a single entity.
Attribution loss. When retrieved content is incorporated into an LLM response, the publisher that produced it may not be credited. A URL may be used as a source while the publisher's name never appears in the answer - the "ghost citation" problem.
Reasoning loss. Relationships between concepts are buried in prose. A model must reconstruct that "AI Topical Presence explains AI Share of Voice" from unstructured text rather than reading it as an explicit, typed relation.
EntityMap allows publishers to pre-solve these problems by publishing a structured index alongside their existing content. The index aggregates evidence by entity rather than by page, carries explicit publisher attribution on every evidence chunk, and encodes relationships between entities using a controlled predicate vocabulary.
A well-formed EntityMap enables consumers to:
EntityMap uses an application-specific JSON structure rather than strict JSON-LD. This choice prioritises implementation simplicity and broad compatibility. The HTML companion file (entitymap.html) exposes equivalent schema.org / JSON-LD representations per entity for broader interoperability with existing structured data consumers. Publishers who require full JSON-LD compliance may embed the EntityMap vocabulary within a JSON-LD context document; this specification does not preclude that approach.
An EntityMap consists of two files served at predictable URLs:
| File | URL pattern | Purpose |
|---|---|---|
| entitymap.json | https://example.com/entitymap.json | Machine-readable primary file |
| entitymap.html | https://example.com/entitymap.html | Crawler and human readable rendered view |
Both files MUST be served from the root of the domain or subdomain they describe, without authentication.
Publishers MAY expose EntityMap discovery hints using one or more of the following mechanisms. Consumers MAY support one or more of these mechanisms. None of these mechanisms are currently recognized standards - they are publisher-side conventions proposed by this specification for future adoption.
robots.txt directive:
# EntityMap
EntityMap: https://example.com/entitymap.json
HTML link element:
<link rel="entitymap" type="application/json"
href="https://example.com/entitymap.json" />
sitemap.xml entry:
Publishers MAY list entitymap.html in sitemap.xml as an additional URL entry with high priority and changefreq: weekly to signal freshness.
entitymap.html MUST NOT carry a noindex directive. It is the primary discovery surface for AI crawlers that consume HTML. Publishers MAY use a rel="canonical" to manage search engine indexing according to their SEO strategy; self-canonicalization is the recommended default.
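As a minimal sketch of the consumer side of discovery, the function below looks for the proposed EntityMap: directive in robots.txt text and otherwise falls back to the predictable root URL. The function name, the case-insensitive matching, and the comment stripping are assumptions of this sketch, not requirements of the specification.

```python
from urllib.parse import urljoin


def find_entitymap_url(site_root: str, robots_txt: str = "") -> str:
    """Return the EntityMap URL hinted in robots.txt, or the default root location."""
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace before parsing the line.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Assumption: the directive name is matched case-insensitively.
        if sep and key.strip().lower() == "entitymap" and value.strip():
            return value.strip()
    # No hint found: fall back to the root-level file location.
    return urljoin(site_root, "/entitymap.json")
```

The caller is expected to fetch robots.txt itself and pass its text in, which keeps the sketch free of network code.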
For sites with more than 200 entities, the EntityMap SHOULD be sharded. The root entitymap.json becomes a manifest pointing to typed shard files:
/entitymap.json ← manifest only
/entitymap/concepts.json
/entitymap/people.json
/entitymap/products.json
/entitymap/places.json
The manifest MUST list all shards with their entity counts and lastModified timestamps. A topEntities array of up to 20 entries MAY be included to provide consumers with a fast entry point to the most important entities.
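The sharding rule above could be sketched as follows. This is illustrative only: the specification names lastModified and topEntities but does not show the manifest layout here, so the shards, url, and entityCount field names, the type-to-shard mapping, the "other" shard, and the use of entityId values in topEntities are all assumptions.

```python
from collections import defaultdict

SHARD_THRESHOLD = 200  # sites with more entities than this SHOULD shard

# Hypothetical mapping from @type to shard name; unmapped types go to "other".
SHARD_BY_TYPE = {
    "DefinedTerm": "concepts",
    "Person": "people",
    "Product": "products",
    "Place": "places",
}


def should_shard(entities: list) -> bool:
    """Apply the more-than-200-entities rule."""
    return len(entities) > SHARD_THRESHOLD


def build_manifest(entities: list, last_modified: str) -> dict:
    """Group entities by @type and describe each resulting shard."""
    groups = defaultdict(list)
    for entity in entities:
        groups[SHARD_BY_TYPE.get(entity["@type"], "other")].append(entity)
    shards = [
        {
            "url": f"/entitymap/{name}.json",  # assumed field name
            "entityCount": len(members),       # assumed field name
            "lastModified": last_modified,
        }
        for name, members in sorted(groups.items())
    ]
    # Assumption: the caller passes entities ordered by importance.
    top = [entity["entityId"] for entity in entities[:20]]
    return {"shards": shards, "topEntities": top}
```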
{
  "version": "0.2",
  "schema": "https://entitymap.org/spec/v0.2",
  "publisher": { ... },
  "generated": "2026-03-27T00:00:00Z",
  "vocabulary": { ... },
  "entities": [ ... ]
}
| Field | Type | Conformance | Description |
|---|---|---|---|
| version | string | MUST | Spec version this file conforms to |
| schema | string | MUST | URI of the EntityMap spec used |
| publisher | object | MUST | Identity of the site publisher. See §4.2 |
| generated | string (ISO 8601) | MUST | Timestamp of last full generation |
| vocabulary | object | MAY | Custom predicate declarations. If omitted, standard vocabulary applies |
| entities | array | MUST | List of entity objects. See §4.3 |
{
  "name": "Acme Corp",
  "url": "https://acme.com",
  "sameAs": "https://www.wikidata.org/wiki/Q..."
}
| Field | Type | Conformance | Description |
|---|---|---|---|
| name | string | MUST | Human-readable publisher name. MUST match the publisher field on all chunks. |
| url | string | MUST | Canonical URL of the publisher |
| sameAs | string | MAY | Wikidata or schema.org URI anchoring publisher to open knowledge graph |
{
  "entityId": "e_001",
  "topicID": 456,
  "@type": "DefinedTerm",
  "name": "AI Share of Voice",
  "alternateName": "AI SOV",
  "canonicalLabel": "share of voice",
  "description": "A metric measuring...",
  "sameAs": "https://www.wikidata.org/wiki/Q...",
  "relations": [ ... ],
  "hasChunks": [ ... ]
}
| Field | Type | Conformance | Description |
|---|---|---|---|
| entityId | string | MUST | Stable unique identifier within this EntityMap. Used as the reference target in relations. |
| topicID | integer | MAY | Proprietary entity resolution ID. Implementation-specific; omit without affecting conformance. |
| @type | string | MUST | Schema.org type. See §4.5 |
| name | string | MUST | Publisher-specific label as used in this site's content (e.g. "AI Share of Voice") |
| alternateName | string | MAY | Abbreviation or common alternative surface form |
| canonicalLabel | string | MAY | General concept label from entity resolver (e.g. "share of voice"). Distinct from publisher-specific name. |
| description | string | MUST | 1–3 sentence definition as this publisher uses the concept. SHOULD be extractive or minimally normalised from source content. |
| sameAs | string | SHOULD | Wikidata or schema.org URI anchoring entity to open knowledge graph |
| relations | array | MAY | Typed relationships to other entities. See §4.4 |
| hasChunks | array | MUST | Evidence chunks. 1–5 per entity. See §4.6 |
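A validator might check the MUST-level entity fields in the table above along these lines. This is a sketch, not a reference validator: the function name and error wording are invented for illustration, and it checks only field presence and the 1–5 chunk count.

```python
# Fields marked MUST in the entity table above.
REQUIRED_ENTITY_FIELDS = ("entityId", "@type", "name", "description", "hasChunks")


def entity_errors(entity: dict) -> list:
    """Return a list of MUST-level problems found in one entity object."""
    errors = [f"missing {field}" for field in REQUIRED_ENTITY_FIELDS if field not in entity]
    if "hasChunks" in entity:
        count = len(entity["hasChunks"])
        # The table allows between one and five evidence chunks per entity.
        if not 1 <= count <= 5:
            errors.append(f"hasChunks has {count} items, expected 1-5")
    return errors
```

An empty list means no problem was found by these particular checks; it does not establish full conformance.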
Relations are directional: the subject is the entity containing the relation array, the object is the target. Three target patterns are supported depending on whether the target is internal, external, or both:
{
  "predicate": "ENABLES",
  "targetId": "e_012",
  "targetName": "AI Topical Presence"
}

{
  "predicate": "INSTANCE_OF",
  "targetUri": "https://www.wikidata.org/wiki/Q1163385",
  "targetName": "Herfindahl-Hirschman Index"
}
| Field | Type | Conformance | Description |
|---|---|---|---|
| predicate | string | MUST | From standard vocabulary (§5) or declared custom vocabulary (§5.3) |
| targetId | string | SHOULD | The entityId of the target entity. Required for internal relations. |
| targetName | string | MUST | Human-readable name of the target entity. Required in all cases for readability. |
| targetUri | string | MAY | URI for external entities (Wikidata, schema.org, etc.) |
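One way to check relations against the fields above is to collect every entityId first and then confirm that each targetId resolves and each relation carries a targetName. The sketch below assumes all entities (including those loaded from shards) have already been merged into a single list; the function name and error messages are illustrative.

```python
def relation_errors(entities: list) -> list:
    """Report relations with a missing targetName or an unresolvable targetId."""
    known_ids = {entity["entityId"] for entity in entities}
    errors = []
    for entity in entities:
        for relation in entity.get("relations", []):
            label = f'{entity["entityId"]} {relation.get("predicate", "?")}'
            if not relation.get("targetName"):
                errors.append(f"{label}: targetName is required")
            target_id = relation.get("targetId")
            # External relations may omit targetId and use targetUri instead.
            if target_id is not None and target_id not in known_ids:
                errors.append(f"{label}: targetId {target_id} does not resolve")
    return errors
```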
targetId MUST match a valid entityId within the same EntityMap or shard set. targetName MUST be present in all cases - it is the human-readable anchor that survives aggregation and redistribution.

Permitted @type values are drawn from schema.org:
| Type | Use for |
|---|---|
| DefinedTerm | Concepts, metrics, methodologies, standards |
| Person | Named individuals |
| Organization | Companies, institutions, bodies |
| Product | Named products or services |
| Place | Locations, regions, venues |
| Event | Named events, conferences, occurrences |
| ScholarlyArticle | Research, studies, reports |
| CreativeWork | Books, guides, courses |
| Regulation | Laws, policies, standards bodies |
Publishers MAY use additional schema.org types. Non-schema.org types MUST be prefixed with a declared namespace (e.g. waikay:MetricComponent).
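The type rules above can be approximated offline as shown below. Because this section does not say how a namespace is declared, the declared_namespaces parameter is a hypothetical input, and the sketch cannot confirm whether an unlisted, unprefixed name is a genuine schema.org type.

```python
# Types listed in the table above.
LISTED_TYPES = frozenset({
    "DefinedTerm", "Person", "Organization", "Product", "Place",
    "Event", "ScholarlyArticle", "CreativeWork", "Regulation",
})


def type_status(type_name: str, declared_namespaces: set) -> str:
    """Classify an @type value as listed, namespaced, undeclared-namespace, or unlisted."""
    if type_name in LISTED_TYPES:
        return "listed"
    prefix, sep, local = type_name.partition(":")
    if sep:
        # Prefixed types need a declared namespace and a non-empty local name.
        if prefix in declared_namespaces and local:
            return "namespaced"
        return "undeclared-namespace"
    # May still be an additional schema.org type; not verifiable in this sketch.
    return "unlisted"
```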
{
  "chunkId": "c_001",
  "text": "AI Share of Voice measures...",
  "sourceUrl": "https://acme.com/ai-share-of-voice",
  "pageTitle": "What is AI Share of Voice?",
  "publisher": "Acme Corp",
  "retrieved": "2026-03-27T09:14:00Z",
  "relevanceScore": 0.97
}
| Field | Type | Conformance | Description |
|---|---|---|---|
| chunkId | string | MUST | Unique identifier within this EntityMap |
| text | string | MUST | The evidence passage. 1–5 sentences. Max 500 characters. SHOULD be extractive. MUST preserve the original meaning of the source. |
| sourceUrl | string | MUST | Canonical URL of the source page |
| pageTitle | string | MUST | Title of the source page at time of retrieval |
| publisher | string | MUST | Publisher name. MUST match publisher.name in root object. Primary brand attribution mechanism. |
| retrieved | string (ISO 8601) | SHOULD | Timestamp when the source page was fetched to produce this chunk. Signals freshness to consumers. |
| relevanceScore | float 0.0–1.0 | MAY | Publisher-assigned relevance of this chunk to its entity. Scoring method is implementation-specific. |
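The measurable chunk constraints in the table above (the 500-character cap, the publisher match, and the score range) lend themselves to a simple check. The sketch below is one possible shape; it does not attempt to judge whether text is extractive or preserves the source's meaning, and its names are illustrative.

```python
MAX_CHUNK_CHARS = 500


def chunk_constraint_errors(chunk: dict, publisher_name: str) -> list:
    """Check the length, attribution, and score constraints on one evidence chunk."""
    errors = []
    if len(chunk.get("text", "")) > MAX_CHUNK_CHARS:
        errors.append("text exceeds 500 characters")
    # Attribution: the chunk-level publisher must equal the root publisher.name.
    if chunk.get("publisher") != publisher_name:
        errors.append("publisher does not match publisher.name")
    score = chunk.get("relevanceScore")
    if score is not None and not 0.0 <= score <= 1.0:
        errors.append("relevanceScore outside 0.0-1.0")
    return errors
```

Passing the root publisher.name in explicitly keeps the check usable on chunks that have been separated from their original file.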
The publisher field MUST be present on every chunk - it is the primary mechanism for brand attribution in downstream AI consumption and MUST survive aggregation by third-party systems.

All predicates are uppercase. The vocabulary is tiered: Core predicates are the minimum set for interoperability; Extended predicates are recommended for richer graphs; Custom predicates require explicit declaration.
Implementations SHOULD support all core predicates. A consumer that claims EntityMap compatibility MUST be able to process core predicates without error.
Extended predicates MAY be used by publishers. Consumers MUST NOT reject a conforming file that contains extended predicates, but MAY ignore them if not supported.
"vocabulary": {
"predicates": ["POLLINATES", "ZONES_AS", "SEASONALLY_OPERATES"],
"namespace": "https://acme.com/entitymap/vocab/v1"
}
Custom predicates MUST be uppercase. They MUST NOT conflict with standard predicate names. They MUST be documented at the declared namespace URI. Consumers MUST NOT reject a conforming file that contains undeclared custom predicates, but MAY ignore them.
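A tolerant consumer could classify each predicate by tier rather than rejecting files, roughly as follows. The predicate sets are copied from the vocabulary listing in this specification; the function names, the returned labels, and the treatment of a missing namespace as an error are assumptions of this sketch.

```python
CORE = frozenset("""
RELATES_TO INCLUDES PART_OF DEPENDS_ON CONFLICTS_WITH ENABLES REQUIRES
IMPROVES DEGRADES PRODUCES PREVENTS LEADS_TO DESCRIBES MEASURES REFERENCES
AUTHORED_BY AFFILIATED_WITH INSTANCE_OF
""".split())

EXTENDED = frozenset("""
EXCLUDES SUITED_FOR MAINTAINS PRECEDES LACKS TRANSFORMS RESTRICTS REMOVES
RESTORES CONVERTS ALLOWS RECOMMENDS PROVIDES PUBLISHES IDENTIFIES DIAGNOSES
COMPARES MONITORS BENCHMARKS PASSES_THROUGH NAVIGATES_TO REGULATES PROTECTS
CREATES TARGETS ACHIEVES
""".split())


def classify_predicate(predicate: str, vocabulary: dict = None) -> str:
    """Return the tier of a predicate: core, extended, custom, or undeclared."""
    if predicate in CORE:
        return "core"
    if predicate in EXTENDED:
        return "extended"
    if predicate in (vocabulary or {}).get("predicates", []):
        return "custom"
    return "undeclared"  # consumers MAY ignore these but must not reject the file


def custom_vocabulary_errors(vocabulary: dict) -> list:
    """Check a custom vocabulary declaration against the rules above."""
    errors = []
    for predicate in vocabulary.get("predicates", []):
        if predicate != predicate.upper():
            errors.append(f"{predicate}: custom predicates MUST be uppercase")
        if predicate in CORE or predicate in EXTENDED:
            errors.append(f"{predicate}: conflicts with a standard predicate")
    if not vocabulary.get("namespace"):
        errors.append("namespace URI is missing")  # assumed to be required
    return errors
```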
entitymap.html is a rendered, crawlable view of the same data as entitymap.json. It is generated from the JSON and MUST NOT be maintained independently.
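To illustrate generating the HTML view from the JSON rather than maintaining it by hand, the sketch below renders a single entity with a JSON-LD block and attributed evidence blockquotes. The markup layout, the use of entityId as the anchor slug, and the simplified JSON-LD mapping are assumptions; only the data-publisher attribute and the application/ld+json script type come from this specification.

```python
import html
import json


def render_entity(entity: dict) -> str:
    """Render one entity as an HTML section with JSON-LD and attributed evidence."""
    slug = html.escape(entity["entityId"], quote=True)  # assumed anchor slug
    json_ld = {
        "@context": "https://schema.org",
        "@type": entity["@type"],
        "name": entity["name"],
        "description": entity["description"],
    }
    if entity.get("sameAs"):
        json_ld["sameAs"] = entity["sameAs"]
    # Escape "</" so the JSON cannot close the script element early.
    ld_text = json.dumps(json_ld).replace("</", "<\\/")
    quotes = "\n".join(
        '<blockquote data-publisher="{}" cite="{}">{}</blockquote>'.format(
            html.escape(chunk["publisher"], quote=True),
            html.escape(chunk["sourceUrl"], quote=True),
            html.escape(chunk["text"]),
        )
        for chunk in entity["hasChunks"]
    )
    return (
        f'<section id="{slug}">\n'
        f'<h2>{html.escape(entity["name"])}</h2>\n'
        f'<script type="application/ld+json">{ld_text}</script>\n'
        f"{quotes}\n"
        "</section>"
    )
```

Namespaced custom types would need their own JSON-LD context; this sketch passes @type through unchanged.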
A conforming entitymap.html MUST:

- Reference entitymap.json via <link rel="alternate" type="application/json" href="/entitymap.json" />
- Embed JSON-LD per entity in <script type="application/ld+json"> blocks
- Use <a href="#entity-slug"> internal hyperlinks where targets exist in the same file
- Carry publisher attribution on every evidence blockquote via a data-publisher attribute
- Not carry a noindex directive

A conforming entitymap.html SHOULD:

- Declare rel="canonical" for SEO management. Self-canonicalization is the recommended default.
- Include a <link rel="entitymap"> pointing back to entitymap.json

A conforming entitymap.json MUST:

- Carry a publisher field on every chunk matching publisher.name
- Use permitted @type values or namespaced type extensions
- Ensure all targetId values resolve to a valid entityId within the EntityMap or its shard set

The spec version is declared in the root version field and MUST match the version of the schema URI used.
Minor versions (0.x) MAY add optional fields without breaking conformance of existing files. Major versions (x.0) MAY introduce breaking changes and MUST be announced with a minimum 6-month deprecation window for the previous version.
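The version rules above could be checked as follows. This sketch assumes the schema URI ends in a /v<major>.<minor> segment, as it does in the examples in this document; the specification does not state that URI pattern as a rule, and the helper names are illustrative.

```python
import re
from typing import Optional


def schema_version(schema_uri: str) -> Optional[str]:
    """Extract "0.2" from a URI ending in /v0.2 (assumed URI convention)."""
    match = re.search(r"/v(\d+\.\d+)/?$", schema_uri)
    return match.group(1) if match else None


def version_matches_schema(root: dict) -> bool:
    """True when the root version field equals the version in the schema URI."""
    declared = root.get("version")
    return declared is not None and declared == schema_version(root.get("schema", ""))


def is_major_change(old: str, new: str) -> bool:
    """True when the part before the first dot differs, e.g. 0.2 to 1.0."""
    return old.split(".")[0] != new.split(".")[0]
```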
Publishers MUST update generated on every rebuild. Consumers SHOULD treat files with a generated timestamp older than 30 days as potentially stale.
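A consumer might apply the 30-day guideline like this. The sketch treats a timestamp without a timezone as UTC, which is an assumption, and accepts an optional now argument so the check can be tested deterministically.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=30)


def is_potentially_stale(generated: str, now: Optional[datetime] = None) -> bool:
    """Return True when the generated timestamp is more than 30 days old."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    generated_at = datetime.fromisoformat(generated.replace("Z", "+00:00"))
    if generated_at.tzinfo is None:
        generated_at = generated_at.replace(tzinfo=timezone.utc)  # assumed UTC
    current = now or datetime.now(timezone.utc)
    return current - generated_at > STALE_AFTER
```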
EntityMap files are public by definition. Publishers MUST NOT include:
The topicID field is optional and implementation-specific. Publishers using proprietary entity resolution systems MAY omit it without affecting conformance. If included, it MUST NOT expose information that would compromise the publisher's systems or data.

Sitemaps describe pages. EntityMap describes knowledge. Both SHOULD be present and are complementary, not competing.
EntityMap uses schema.org @type values and is designed for compatibility. The HTML companion embeds valid JSON-LD per entity.
EntityMap discovery MAY be declared via an EntityMap: directive. This is a proposed convention, not yet a recognised robots.txt standard.
EntityMap uses application-specific JSON for implementation simplicity. JSON-LD representations are exposed in the HTML companion for broader interoperability.
sameAs fields SHOULD use Wikidata URIs as canonical entity anchors, linking site knowledge to the open knowledge graph.
Conceptual analogy: EntityMap is to AI agents as RSS is to feed readers - a structured, subscribable content layer with predictable discovery.
The initial reference implementation was developed by Waikay and consists of:
- Generation of entitymap.json and entitymap.html files
- Rendering of entitymap.html dynamically from entitymap.json

The reference implementation is available at waikay.io/entitymap. Third-party implementations are welcomed. To register an implementation, open an issue at the specification repository.
{
  "version": "0.2",
  "schema": "https://entitymap.org/spec/v0.2",
  "publisher": {
    "name": "Acme Gardens",
    "url": "https://acmegardens.com"
  },
  "generated": "2026-03-27T00:00:00Z",
  "entities": [
    {
      "entityId": "e_001",
      "@type": "DefinedTerm",
      "name": "Companion Planting",
      "description": "The practice of growing different plants in proximity for mutual benefit, including pest control, pollination support, and improved yield.",
      "sameAs": "https://www.wikidata.org/wiki/Q905413",
      "relations": [
        {
          "predicate": "IMPROVES",
          "targetId": "e_002",
          "targetName": "Crop Yield"
        },
        {
          "predicate": "PREVENTS",
          "targetId": "e_003",
          "targetName": "Pest Damage"
        }
      ],
      "hasChunks": [
        {
          "chunkId": "c_001",
          "text": "Companion planting pairs plants that benefit each other - for example, growing basil near tomatoes to repel aphids and improve fruit flavour.",
          "sourceUrl": "https://acmegardens.com/companion-planting-guide",
          "pageTitle": "The Complete Companion Planting Guide",
          "publisher": "Acme Gardens",
          "retrieved": "2026-03-27T09:14:00Z",
          "relevanceScore": 0.95
        }
      ]
    }
  ]
}
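On the consumer side, a parsed file like the example above could be used to resolve surface forms to a single entity and to keep the publisher attached to quoted evidence. The sketch below assumes the JSON has already been loaded into a dict; the function names and the citation format are illustrative, not part of the specification.

```python
def build_surface_index(entitymap: dict) -> dict:
    """Map each lower-cased surface form to the entityId it belongs to."""
    index = {}
    for entity in entitymap["entities"]:
        for key in ("name", "alternateName", "canonicalLabel"):
            form = entity.get(key)
            if form:
                # First entity wins if two entities share a surface form.
                index.setdefault(form.lower(), entity["entityId"])
    return index


def cite(chunk: dict) -> str:
    """Format an evidence chunk so the publisher name travels with the text."""
    return '"{}" ({}, {})'.format(chunk["text"], chunk["publisher"], chunk["sourceUrl"])
```

Because canonicalLabel is a general concept label, collisions across entities are plausible; a production consumer would probably want a one-to-many index instead.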
CORE STRUCTURAL: RELATES_TO, INCLUDES, PART_OF, DEPENDS_ON, CONFLICTS_WITH, ENABLES, REQUIRES
CORE CAUSATION: IMPROVES, DEGRADES, PRODUCES, PREVENTS, LEADS_TO
CORE INFORMATION: DESCRIBES, MEASURES, REFERENCES
CORE METADATA: AUTHORED_BY, AFFILIATED_WITH, INSTANCE_OF
EXT STRUCTURAL: EXCLUDES, SUITED_FOR
EXT STATE: MAINTAINS, PRECEDES, LACKS
EXT CAUSATION: TRANSFORMS, RESTRICTS, REMOVES, RESTORES, CONVERTS, ALLOWS
EXT INFORMATION: RECOMMENDS, PROVIDES, PUBLISHES
EXT ANALYTICAL: IDENTIFIES, DIAGNOSES, COMPARES, MONITORS, BENCHMARKS
EXT SEQUENTIAL: PASSES_THROUGH, NAVIGATES_TO
EXT AGENCY: REGULATES, PROTECTS, CREATES, TARGETS, ACHIEVES
Core: 18 predicates · Extended: 26 predicates · Total: 44 standard predicates
| Version | Date | Notes |
|---|---|---|
| 0.2 | 2026-03-27 | RFC 2119 normative language throughout. Relation model updated: targetId + targetName + targetUri replacing name-only target. retrieved field added to chunk. Predicate vocabulary tiered into Core and Extended. Discovery language hedged as publisher-side conventions. Opening claims moderated. Chunk text extractive requirement added. Reference implementation framing softened. |
| 0.1 | 2026-03-27 | Initial draft |