Open standard · Technical specification

EntityMap specification

A structured, entity-first index of website content for AI agent and LLM consumption

Version: 0.2
Status: Draft
Date: 2026-03-27
Authors: Waikay / InLinks Optimization Ltd
License: CC BY 4.0
Files: entitymap.json · entitymap.html
Contents
  Abstract
  1. Conventions and terminology
  2. Motivation
  3. File conventions
  4. JSON structure
  5. Standard predicate vocabulary
  6. The HTML companion file
  7. Validation
  8. Versioning and evolution
  9. Privacy and security
  10. Relationship to existing standards
  11. Reference implementation
  Appendix A - Minimal valid example
  Appendix B - Complete predicate reference
  Appendix C - Changelog

Abstract

EntityMap is an open standard for publishing a structured, entity-first index of a website's content, designed for consumption by AI agents, large language models, and RAG (Retrieval-Augmented Generation) pipelines.

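Where sitemap.xml tells crawlers what pages exist, entitymap.json tells AI agents which entities a site covers, how those entities relate, and which passages on the site support them. It aims to do for AI agents what the Sitemaps protocol did for search crawlers: provide a predictable, machine-readable discovery layer for site knowledge.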

EntityMap is designed to reduce disambiguation loss, attribution loss, and reasoning loss in AI retrieval systems - three structural problems that page-level retrieval does not solve.

1. Conventions and terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.

Throughout this document:


2. Motivation

2.1 The retrieval problem

Large language models build knowledge from training data. At inference time, retrieval-augmented systems supplement that knowledge by fetching relevant content from the web. Current retrieval mechanisms operate at the page level - fetching HTML documents and extracting text without structured awareness of the entities, concepts, or relationships those pages contain.

This creates three problems:

Disambiguation loss. The same concept may appear under many surface forms across a site ("AI SOV", "AI Share of Voice", "artificial intelligence share of voice"). A page-level retriever treats these as separate signals rather than a single entity.

Attribution loss. When retrieved content is incorporated into an LLM response, the publisher that produced it may not be credited. A URL may be used as a source while the publisher's name never appears in the answer - the "ghost citation" problem.

Reasoning loss. Relationships between concepts are buried in prose. A model must reconstruct that "AI Topical Presence explains AI Share of Voice" from unstructured text rather than reading it as an explicit, typed relation.

2.2 The EntityMap solution

EntityMap allows publishers to pre-solve these problems by publishing a structured index alongside their existing content. The index aggregates evidence by entity rather than by page, carries explicit publisher attribution on every evidence chunk, and encodes relationships between entities using a controlled predicate vocabulary.

A well-formed EntityMap enables consumers to:

2.3 Design rationale: application JSON over JSON-LD

EntityMap uses an application-specific JSON structure rather than strict JSON-LD. This choice prioritises implementation simplicity and broad compatibility. The HTML companion file (entitymap.html) exposes equivalent schema.org / JSON-LD representations per entity for broader interoperability with existing structured data consumers. Publishers who require full JSON-LD compliance may embed the EntityMap vocabulary within a JSON-LD context document; this specification does not preclude that approach.


3. File conventions

3.1 Location

An EntityMap consists of two files served at predictable URLs:

File | URL pattern | Purpose
entitymap.json | https://example.com/entitymap.json | Machine-readable primary file
entitymap.html | https://example.com/entitymap.html | Crawler- and human-readable rendered view

Both files MUST be served from the root of the domain or subdomain they describe, without authentication.

3.2 Discovery

Publishers MAY expose EntityMap discovery hints using one or more of the following mechanisms, and consumers MAY support any of them. None of these mechanisms is a recognised standard; they are publisher-side conventions proposed by this specification for future adoption.

In robots.txt

# EntityMap
EntityMap: https://example.com/entitymap.json

In the HTML <head> of every page

<link rel="entitymap" type="application/json"
      href="https://example.com/entitymap.json" />
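A consumer could look for both hints with the Python standard library alone. The sketch below is illustrative, not normative: the function and class names are hypothetical, the EntityMap field name is matched case-insensitively (an assumption), and any <link> whose rel tokens include entitymap is accepted.

```python
from html.parser import HTMLParser


def entitymap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return URLs declared on 'EntityMap:' lines of a robots.txt file."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments such as "# EntityMap"
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "entitymap" and value.strip():
            urls.append(value.strip())
    return urls


class EntityMapLinkFinder(HTMLParser):
    """Collect href values from <link rel="entitymap"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # ("<link ... />") reach this method via the default handle_startendtag.
        if tag != "link":
            return
        attributes = {name: value or "" for name, value in attrs}
        rel_tokens = attributes.get("rel", "").lower().split()
        if "entitymap" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def entitymap_urls_from_html(document: str) -> list[str]:
    finder = EntityMapLinkFinder()
    finder.feed(document)
    finder.close()
    return finder.hrefs
```

Because neither hint is standardised, a consumer would typically fall back to requesting /entitymap.json at the site root when no hint is found.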

In sitemap.xml

Publishers MAY list entitymap.html as an additional <url> entry with a high <priority> value and a <changefreq> of weekly to signal freshness.
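A sketch of such an entry, built with Python's xml.etree.ElementTree. The priority value 0.8 is an assumed stand-in for "high priority", since this specification gives no number, and the helper name is hypothetical. Google's sitemap documentation states that it ignores both <priority> and <changefreq>, so these values are weak hints at most.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialise with a default xmlns, no prefix


def entitymap_url_entry(loc: str, lastmod: str, priority: str = "0.8") -> ET.Element:
    """Return a sitemap <url> element for entitymap.html (priority is illustrative)."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    for tag, text in (
        ("loc", loc),
        ("lastmod", lastmod),
        ("changefreq", "weekly"),
        ("priority", priority),
    ):
        ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = text
    return url


# Usage: append the entry to an existing <urlset> (a fresh one here).
urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(entitymap_url_entry("https://example.com/entitymap.html", "2026-03-27"))
sitemap_xml = ET.tostring(urlset, encoding="unicode")
```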

3.3 Indexability

entitymap.html MUST NOT carry a noindex directive. It is the primary discovery surface for AI crawlers that consume HTML. Publishers MAY use a rel="canonical" link to manage search engine indexing according to their SEO strategy; a self-referencing canonical is the recommended default.

3.4 Large sites - sharding

For sites with more than 200 entities, the EntityMap SHOULD be sharded. The root entitymap.json becomes a manifest pointing to typed shard files:

/entitymap.json              ← manifest only
/entitymap/concepts.json
/entitymap/people.json
/entitymap/products.json
/entitymap/places.json

The manifest MUST list all shards with their entity counts and lastModified timestamps. A topEntities array of up to 20 entries MAY be included to provide consumers with a fast entry point to the most important entities.
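This specification does not yet define the manifest's field names or how entity types map to shard files. The Python sketch below therefore assumes hypothetical keys (shards, url, count, alongside the specified lastModified and topEntities), a hypothetical @type-to-shard mapping with an "other" fallback, and topEntities expressed as entityId strings. Other root fields are omitted.

```python
from collections import defaultdict

SHARD_THRESHOLD = 200  # spec: sites with more than 200 entities SHOULD shard
MAX_TOP_ENTITIES = 20  # spec: topEntities holds up to 20 entries

# Hypothetical @type -> shard-name mapping; unmapped types fall into "other".
SHARD_FOR_TYPE = {
    "DefinedTerm": "concepts",
    "Person": "people",
    "Product": "products",
    "Place": "places",
}


def needs_sharding(entities: list) -> bool:
    return len(entities) > SHARD_THRESHOLD


def build_manifest(entities: list, base_url: str, last_modified: str, ranked_ids=()):
    """Split entities into typed shards; return (manifest, shards_by_name)."""
    shards = defaultdict(list)
    for entity in entities:
        shards[SHARD_FOR_TYPE.get(entity["@type"], "other")].append(entity)

    manifest = {
        # Hypothetical key names: every shard is listed with its count and lastModified.
        "shards": [
            {
                "url": f"{base_url}/entitymap/{name}.json",
                "count": len(members),
                "lastModified": last_modified,
            }
            for name, members in sorted(shards.items())
        ]
    }
    known_ids = {entity["entityId"] for entity in entities}
    top = [eid for eid in ranked_ids if eid in known_ids][:MAX_TOP_ENTITIES]
    if top:  # topEntities is optional
        manifest["topEntities"] = top
    return manifest, dict(shards)
```

A single lastModified value is passed for simplicity; a real generator would more likely track one per shard, so that consumers can refetch only the shards that changed.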


4. JSON structure

4.1 Root object

{
  "version": "0.2",
  "schema": "https://entitymap.org/spec/v0.2",
  "publisher": { ... },
  "generated": "2026-03-27T00:00:00Z",
  "vocabulary": { ... },
  "entities": [ ... ]
}

Field | Type | Conformance | Description
version | string | MUST | Spec version this file conforms to
schema | string | MUST | URI of the EntityMap spec used
publisher | object | MUST | Identity of the site publisher. See §4.2
generated | string (ISO 8601) | MUST | Timestamp of last full generation
vocabulary | object | MAY | Custom predicate declarations (see §5.3). If omitted, only the standard vocabulary applies
entities | array | MUST | List of entity objects. See §4.3
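A generator might assemble the root object as in this Python sketch. The constant and function names are hypothetical, and emitting generated as second-precision UTC with a Z suffix is a choice of this example; ISO 8601 also permits other forms.

```python
from datetime import datetime, timezone
from typing import Optional

SPEC_VERSION = "0.2"
SCHEMA_URI = "https://entitymap.org/spec/v0.2"


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC, e.g. 2026-03-27T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_root(publisher: dict, entities: list,
               vocabulary: Optional[dict] = None,
               now: Optional[datetime] = None) -> dict:
    """Assemble the root object, refreshing 'generated' on every build."""
    root = {
        "version": SPEC_VERSION,
        "schema": SCHEMA_URI,
        "publisher": publisher,
        "generated": utc_timestamp(now or datetime.now(timezone.utc)),
    }
    if vocabulary:  # optional; without it only the standard vocabulary applies
        root["vocabulary"] = vocabulary
    root["entities"] = entities
    return root
```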

4.2 Publisher object

{
  "name": "Acme Corp",
  "url": "https://acme.com",
  "sameAs": "https://www.wikidata.org/wiki/Q..."
}

Field | Type | Conformance | Description
name | string | MUST | Human-readable publisher name. MUST match the publisher field on every chunk.
url | string | MUST | Canonical URL of the publisher
sameAs | string | MAY | Wikidata or schema.org URI anchoring the publisher to an open knowledge graph

4.3 Entity object

{
  "entityId": "e_001",
  "topicID": 456,
  "@type": "DefinedTerm",
  "name": "AI Share of Voice",
  "alternateName": "AI SOV",
  "canonicalLabel": "share of voice",
  "description": "A metric measuring...",
  "sameAs": "https://www.wikidata.org/wiki/Q...",
  "relations": [ ... ],
  "hasChunks": [ ... ]
}

Field | Type | Conformance | Description
entityId | string | MUST | Stable unique identifier within this EntityMap. Used as the reference target in relations.
topicID | integer | MAY | Proprietary entity resolution ID. Implementation-specific; may be omitted without affecting conformance.
@type | string | MUST | Schema.org type. See §4.5
name | string | MUST | Publisher-specific label as used in this site's content (e.g. "AI Share of Voice")
alternateName | string | MAY | Abbreviation or common alternative surface form
canonicalLabel | string | MAY | General concept label from the entity resolver (e.g. "share of voice"). Distinct from the publisher-specific name.
description | string | MUST | 1–3 sentence definition as this publisher uses the concept. SHOULD be extractive or minimally normalised from source content.
sameAs | string | SHOULD | Wikidata or schema.org URI anchoring the entity to an open knowledge graph
relations | array | MAY | Typed relationships to other entities. See §4.4
hasChunks | array | MUST | Evidence chunks, 1–5 per entity. See §4.6
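The split between name, alternateName, and canonicalLabel is what lets a consumer fold several surface forms into one entity, addressing the disambiguation loss described in §2.1. A consumer-side Python sketch, assuming case-folded, whitespace-collapsed matching (a rule this specification does not define). A label may map to more than one entity, since canonicalLabel is deliberately general.

```python
def normalise_label(label: str) -> str:
    """Case-fold and collapse whitespace (an assumed matching rule)."""
    return " ".join(label.casefold().split())


def build_alias_index(entities: list) -> dict:
    """Map each normalised surface form to the set of entityIds that use it."""
    index = {}
    for entity in entities:
        for field in ("name", "alternateName", "canonicalLabel"):
            label = entity.get(field)
            if label:
                index.setdefault(normalise_label(label), set()).add(entity["entityId"])
    return index
```

Returning sets rather than single IDs leaves the final choice to the consumer, for example preferring the entity whose name matched over one whose canonicalLabel matched.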

4.4 Relation object

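Relations are directional: the subject is the entity that contains the relations array, and the object is the target. Three target patterns are supported, depending on whether the target is internal, external, or both: an internal target carries targetId, an external target carries targetUri, and a target that is defined in this EntityMap and also anchored externally carries both.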

Internal target (entity within this EntityMap)

{
  "predicate": "ENABLES",
  "targetId": "e_012",
  "targetName": "AI Topical Presence"
}

External target (entity outside this EntityMap)

{
  "predicate": "INSTANCE_OF",
  "targetUri": "https://www.wikidata.org/wiki/Q1163385",
  "targetName": "Herfindahl-Hirschman Index"
}

Field | Type | Conformance | Description
predicate | string | MUST | From the standard vocabulary (§5) or a declared custom vocabulary (§5.3)
targetId | string | SHOULD | The entityId of the target entity. MUST be present for internal relations.
targetName | string | MUST | Human-readable name of the target entity. Required in all cases for readability.
targetUri | string | MAY | URI for external entities (Wikidata, schema.org, etc.)

For internal relations, targetId MUST match a valid entityId within the same EntityMap or shard set. targetName MUST be present in all cases - it is the human-readable anchor that survives aggregation and redistribution.
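A consumer might classify each relation by target pattern and check internal references as in this Python sketch. The function name, its return shape, and the "name-only" label for a target carrying neither identifier are assumptions. For sharded maps, entity_ids should hold every entityId across the whole shard set.

```python
def classify_relation(relation: dict, entity_ids: set) -> tuple:
    """Return (pattern, problems) for one relation object.

    pattern is "internal" (targetId), "external" (targetUri), "both",
    or "name-only" when the target carries neither identifier.
    """
    problems = []
    if not relation.get("predicate"):
        problems.append("predicate is missing")
    if not relation.get("targetName"):
        problems.append("targetName is missing")

    has_id, has_uri = "targetId" in relation, "targetUri" in relation
    if has_id and relation["targetId"] not in entity_ids:
        problems.append(f"targetId {relation['targetId']!r} matches no entityId")

    if has_id and has_uri:
        pattern = "both"
    elif has_id:
        pattern = "internal"
    elif has_uri:
        pattern = "external"
    else:
        pattern = "name-only"
    return pattern, problems
```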

4.5 Entity types

Permitted @type values are drawn from schema.org:

Type | Use for
DefinedTerm | Concepts, metrics, methodologies, standards
Person | Named individuals
Organization | Companies, institutions, bodies
Product | Named products or services
Place | Locations, regions, venues
Event | Named events, conferences, occurrences
ScholarlyArticle | Research, studies, reports
CreativeWork | Books, guides, courses
Regulation | Laws, policies, standards bodies

Publishers MAY use additional schema.org types. Non-schema.org types MUST be prefixed with a declared namespace (e.g. waikay:MetricComponent).
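Telling an allowed schema.org type apart from an unprefixed custom type requires the schema.org vocabulary itself, so a lightweight checker can only triage. This Python sketch assumes the caller supplies the set of declared namespace prefixes, because this specification does not yet say where type namespaces are declared; the function name and result labels are hypothetical.

```python
# The types listed in the table above.
LISTED_TYPES = {
    "DefinedTerm", "Person", "Organization", "Product", "Place",
    "Event", "ScholarlyArticle", "CreativeWork", "Regulation",
}


def classify_type(type_value: str, declared_prefixes: set) -> str:
    """Triage @type: 'listed', 'namespaced', 'undeclared-namespace' or 'check-schema.org'."""
    if type_value in LISTED_TYPES:
        return "listed"
    prefix, sep, local_name = type_value.partition(":")
    if sep and prefix and local_name:
        return "namespaced" if prefix in declared_prefixes else "undeclared-namespace"
    # Either another schema.org type (allowed) or an unprefixed custom type
    # (not allowed); deciding needs a lookup against the schema.org vocabulary.
    return "check-schema.org"
```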

4.6 Chunk object

{
  "chunkId": "c_001",
  "text": "AI Share of Voice measures...",
  "sourceUrl": "https://acme.com/ai-share-of-voice",
  "pageTitle": "What is AI Share of Voice?",
  "publisher": "Acme Corp",
  "retrieved": "2026-03-27T09:14:00Z",
  "relevanceScore": 0.97
}

Field | Type | Conformance | Description
chunkId | string | MUST | Unique identifier within this EntityMap
text | string | MUST | The evidence passage. 1–5 sentences. Max 500 characters. SHOULD be extractive. MUST preserve the original meaning of the source.
sourceUrl | string | MUST | Canonical URL of the source page
pageTitle | string | MUST | Title of the source page at time of retrieval
publisher | string | MUST | Publisher name. MUST match publisher.name in root object. Primary brand attribution mechanism.
retrieved | string (ISO 8601) | SHOULD | Timestamp when the source page was fetched to produce this chunk. Signals freshness to consumers.
relevanceScore | float 0.0–1.0 | MAY | Publisher-assigned relevance of this chunk to its entity. Scoring method is implementation-specific.

Maximum 5 chunks per entity. Implementations SHOULD select the highest-relevance chunks and discard the remainder. The publisher field MUST be present on every chunk - it is the primary mechanism for brand attribution in downstream AI consumption and MUST survive aggregation by third-party systems.
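A generator could apply the five-chunk cap as in this Python sketch. Two choices here are assumptions of the example, not rules of the specification: chunks lacking a relevanceScore rank last, and over-length passages are dropped rather than truncated, since cutting them could change their meaning. Every kept chunk is stamped with the publisher name.

```python
MAX_CHUNKS = 5
MAX_TEXT_CHARS = 500


def select_chunks(candidates: list, publisher_name: str) -> list:
    """Keep up to five highest-relevance chunks, each stamped with the publisher name."""
    # Assumption: passages over the limit are skipped, not truncated.
    eligible = [c for c in candidates if len(c["text"]) <= MAX_TEXT_CHARS]
    # Assumption: chunks without a relevanceScore rank below every scored chunk.
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(eligible, key=lambda c: c.get("relevanceScore", -1.0), reverse=True)
    return [{**c, "publisher": publisher_name} for c in ranked[:MAX_CHUNKS]]
```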

5. Standard predicate vocabulary

All predicates are uppercase. The vocabulary is tiered: Core predicates are the minimum set for interoperability; Extended predicates are recommended for richer graphs; Custom predicates require explicit declaration.

5.1 Core predicates

Implementations SHOULD support all core predicates. A consumer that claims EntityMap compatibility MUST be able to process core predicates without error.

Structural: RELATES_TO, INCLUDES, PART_OF, DEPENDS_ON, CONFLICTS_WITH, ENABLES, REQUIRES
Causation: IMPROVES, DEGRADES, PRODUCES, PREVENTS, LEADS_TO
Information: DESCRIBES, MEASURES, REFERENCES
Metadata: AUTHORED_BY, AFFILIATED_WITH, INSTANCE_OF

5.2 Extended predicates

Extended predicates MAY be used by publishers. Consumers MUST NOT reject a conforming file that contains extended predicates, but MAY ignore them if not supported.

Structural: EXCLUDES, SUITED_FOR
State: MAINTAINS, PRECEDES, LACKS
Causation: TRANSFORMS, RESTRICTS, REMOVES, RESTORES, CONVERTS, ALLOWS
Information: RECOMMENDS, PROVIDES, PUBLISHES
Analytical: IDENTIFIES, DIAGNOSES, COMPARES, MONITORS, BENCHMARKS
Sequential / spatial: PASSES_THROUGH, NAVIGATES_TO
Agency: REGULATES, PROTECTS, CREATES, TARGETS, ACHIEVES

5.3 Declaring custom predicates

"vocabulary": {
  "predicates": ["POLLINATES", "ZONES_AS", "SEASONALLY_OPERATES"],
  "namespace": "https://acme.com/entitymap/vocab/v1"
}

Custom predicates MUST be uppercase. They MUST NOT conflict with standard predicate names. They MUST be documented at the declared namespace URI. A file that uses undeclared custom predicates does not fully conform, but consumers MUST NOT reject it on that basis alone; they MAY ignore the affected relations.
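The following Python sketch checks a vocabulary declaration and sorts predicates into tiers. The two sets transcribe §5.1 and §5.2. Reading "uppercase" as A to Z, digits, and underscores, and requiring a namespace whenever predicates are declared, are interpretations made for this example; the function names are hypothetical.

```python
import re

CORE = {
    "RELATES_TO", "INCLUDES", "PART_OF", "DEPENDS_ON", "CONFLICTS_WITH", "ENABLES", "REQUIRES",
    "IMPROVES", "DEGRADES", "PRODUCES", "PREVENTS", "LEADS_TO",
    "DESCRIBES", "MEASURES", "REFERENCES",
    "AUTHORED_BY", "AFFILIATED_WITH", "INSTANCE_OF",
}
EXTENDED = {
    "EXCLUDES", "SUITED_FOR",
    "MAINTAINS", "PRECEDES", "LACKS",
    "TRANSFORMS", "RESTRICTS", "REMOVES", "RESTORES", "CONVERTS", "ALLOWS",
    "RECOMMENDS", "PROVIDES", "PUBLISHES",
    "IDENTIFIES", "DIAGNOSES", "COMPARES", "MONITORS", "BENCHMARKS",
    "PASSES_THROUGH", "NAVIGATES_TO",
    "REGULATES", "PROTECTS", "CREATES", "TARGETS", "ACHIEVES",
}
UPPERCASE_NAME = re.compile(r"[A-Z][A-Z0-9_]*")  # assumed reading of "uppercase"


def vocabulary_errors(vocabulary: dict) -> list:
    """List problems with a custom 'vocabulary' declaration."""
    errors = []
    for name in vocabulary.get("predicates", []):
        if not UPPERCASE_NAME.fullmatch(name):
            errors.append(f"{name}: custom predicates must be uppercase")
        if name in CORE or name in EXTENDED:
            errors.append(f"{name}: conflicts with a standard predicate")
    if vocabulary.get("predicates") and not vocabulary.get("namespace"):
        errors.append("custom predicates declared without a namespace URI")
    return errors


def predicate_tier(predicate: str, vocabulary=None) -> str:
    """Return 'core', 'extended', 'custom' or 'undeclared'."""
    if predicate in CORE:
        return "core"
    if predicate in EXTENDED:
        return "extended"
    if predicate in (vocabulary or {}).get("predicates", []):
        return "custom"
    # Consumers must not reject the file on this basis; they may ignore the relation.
    return "undeclared"
```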


6. The HTML companion file

entitymap.html is a rendered, crawlable view of the same data as entitymap.json. It is generated from the JSON and MUST NOT be maintained independently.
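One way to keep the two files in sync is to render the HTML from the parsed JSON on every build. The Python sketch below is illustrative only: the page layout, the JSON-LD property mapping (name, alternateName, description, sameAs), and the function names are assumptions. It escapes every text value and rewrites "</" inside the JSON-LD so that entity content cannot close the <script> element early.

```python
import html
import json


def entity_jsonld(entity: dict) -> dict:
    """Map an EntityMap entity to a schema.org JSON-LD object (assumed mapping)."""
    doc = {"@context": "https://schema.org", "@type": entity["@type"], "name": entity["name"]}
    for field in ("alternateName", "description", "sameAs"):
        if field in entity:
            doc[field] = entity[field]
    return doc


def render_entity(entity: dict) -> str:
    # "<\/" is a valid JSON escape for "</", so the data round-trips unchanged.
    jsonld = json.dumps(entity_jsonld(entity), ensure_ascii=False).replace("</", "<\\/")
    quotes = "".join(
        f'<blockquote cite="{html.escape(c["sourceUrl"])}">'
        f'{html.escape(c["text"])} <cite>{html.escape(c["publisher"])}</cite>'
        "</blockquote>"
        for c in entity.get("hasChunks", [])
    )
    return (
        f'<section id="{html.escape(entity["entityId"])}">'
        f"<h2>{html.escape(entity['name'])}</h2>"
        f"<p>{html.escape(entity['description'])}</p>{quotes}"
        f'<script type="application/ld+json">{jsonld}</script></section>'
    )


def render_page(entitymap: dict) -> str:
    title = html.escape(f"{entitymap['publisher']['name']} EntityMap")
    body = "".join(render_entity(e) for e in entitymap["entities"])
    return f"<!DOCTYPE html><html><head><title>{title}</title></head><body>{body}</body></html>"
```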

A conforming entitymap.html MUST:

A conforming entitymap.html SHOULD:


7. Validation

A conforming entitymap.json MUST:
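A minimal Python sketch of a validator for the MUST-level requirements stated in §4: required fields, 1 to 5 chunks per entity, the 500-character text limit, unique identifiers, publisher attribution on every chunk, and resolvable internal targetId values. It checks a single file, so sharded maps would need the entity IDs of the whole shard set; predicate vocabulary and timestamp-format checks are omitted for brevity. The function name and message wording are hypothetical.

```python
REQUIRED_ROOT = ("version", "schema", "publisher", "generated", "entities")
REQUIRED_PUBLISHER = ("name", "url")
REQUIRED_ENTITY = ("entityId", "@type", "name", "description", "hasChunks")
REQUIRED_CHUNK = ("chunkId", "text", "sourceUrl", "pageTitle", "publisher")
MAX_CHUNKS, MAX_TEXT_CHARS = 5, 500


def validate(em: dict) -> list:
    """Return human-readable MUST-level violations for one parsed entitymap.json."""
    errors = [f"root: missing {f}" for f in REQUIRED_ROOT if f not in em]
    if errors:
        return errors  # later checks depend on these fields
    publisher_name = em["publisher"].get("name")
    errors += [f"publisher: missing {f}" for f in REQUIRED_PUBLISHER if f not in em["publisher"]]

    entity_ids = [e.get("entityId") for e in em["entities"]]
    if len(entity_ids) != len(set(entity_ids)):
        errors.append("entityId values are not unique")
    known_ids, seen_chunk_ids = set(entity_ids), set()

    for e in em["entities"]:
        where = f"entity {e.get('entityId')}"
        errors += [f"{where}: missing {f}" for f in REQUIRED_ENTITY if f not in e]
        chunks = e.get("hasChunks", [])
        if not 1 <= len(chunks) <= MAX_CHUNKS:
            errors.append(f"{where}: {len(chunks)} chunks (expected 1 to {MAX_CHUNKS})")
        for c in chunks:
            cid = c.get("chunkId")
            errors += [f"{where}/{cid}: missing {f}" for f in REQUIRED_CHUNK if f not in c]
            if cid in seen_chunk_ids:
                errors.append(f"{where}/{cid}: duplicate chunkId")
            seen_chunk_ids.add(cid)
            if len(c.get("text", "")) > MAX_TEXT_CHARS:
                errors.append(f"{where}/{cid}: text exceeds {MAX_TEXT_CHARS} characters")
            if "publisher" in c and c["publisher"] != publisher_name:
                errors.append(f"{where}/{cid}: publisher does not match publisher.name")
        for r in e.get("relations", []):
            if "predicate" not in r or "targetName" not in r:
                errors.append(f"{where}: relation lacks predicate or targetName")
            if "targetId" in r and r["targetId"] not in known_ids:
                errors.append(f"{where}: targetId {r['targetId']} matches no entityId")
    return errors
```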


8. Versioning and evolution

The spec version is declared in the root version field and MUST match the version of the schema URI used.

Minor versions (for example, 0.2 to 0.3) MAY add optional fields without breaking the conformance of existing files. Major versions (for example, 1.x to 2.0) MAY introduce breaking changes and MUST be announced with a deprecation window of at least 6 months for the previous version.

Publishers MUST update the generated field on every rebuild. Consumers SHOULD treat files with a generated timestamp older than 30 days as potentially stale.
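Both rules in this section can be checked mechanically. The Python sketch assumes that schema URIs end in /v<version>, as in https://entitymap.org/spec/v0.2, and treats a generated value without a timezone offset as UTC; neither assumption is stated by this specification, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)


def parse_generated(value: str) -> datetime:
    """Parse the root 'generated' timestamp into an aware datetime."""
    if value.endswith("Z"):  # fromisoformat() before Python 3.11 rejects "Z"
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: a timestamp without an offset is treated as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def is_stale(generated: str, now: datetime) -> bool:
    """True when the file was generated more than 30 days before 'now'."""
    return now - parse_generated(generated) > STALE_AFTER


def version_matches_schema(root: dict) -> bool:
    """Assumes schema URIs end in '/v<version>', like https://entitymap.org/spec/v0.2."""
    return root["schema"].rstrip("/").endswith("/v" + root["version"])
```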


9. Privacy and security

EntityMap files are public by definition. Publishers MUST NOT include:

The topicID field is optional and implementation-specific. Publishers using proprietary entity resolution systems MAY omit it without affecting conformance. If included, it MUST NOT expose information that would compromise the publisher's systems or data.

10. Relationship to existing standards

sitemap.xml

Sitemaps describe pages. EntityMap describes knowledge. Both SHOULD be present and are complementary, not competing.

schema.org

EntityMap uses schema.org @type values and is designed for compatibility. The HTML companion embeds valid JSON-LD per entity.

robots.txt

EntityMap discovery MAY be declared via an EntityMap: directive. This is a proposed convention, not yet a recognised robots.txt standard.

JSON-LD

EntityMap uses application-specific JSON for implementation simplicity. JSON-LD representations are exposed in the HTML companion for broader interoperability.

Wikidata

sameAs fields SHOULD use Wikidata URIs as canonical entity anchors, linking site knowledge to the open knowledge graph.

RSS / Atom

Conceptual analogy: EntityMap is to AI agents as RSS is to feed readers - a structured, subscribable content layer with predictable discovery.


11. Reference implementation

The initial reference implementation was developed by Waikay and consists of:

The reference implementation is available at waikay.io/entitymap. Third-party implementations are welcomed. To register an implementation, open an issue at the specification repository.


Appendix A - Minimal valid example

{
  "version": "0.2",
  "schema": "https://entitymap.org/spec/v0.2",
  "publisher": {
    "name": "Acme Gardens",
    "url": "https://acmegardens.com"
  },
  "generated": "2026-03-27T00:00:00Z",
  "entities": [
    {
      "entityId": "e_001",
      "@type": "DefinedTerm",
      "name": "Companion Planting",
      "description": "The practice of growing different plants in proximity for mutual benefit, including pest control, pollination support, and improved yield.",
      "sameAs": "https://www.wikidata.org/wiki/Q905413",
      "relations": [
        {
          "predicate": "IMPROVES",
          "targetId": "e_002",
          "targetName": "Crop Yield"
        },
        {
          "predicate": "PREVENTS",
          "targetId": "e_003",
          "targetName": "Pest Damage"
        }
      ],
      "hasChunks": [
        {
          "chunkId": "c_001",
          "text": "Companion planting pairs plants that benefit each other - for example, growing basil near tomatoes to repel aphids and improve fruit flavour.",
          "sourceUrl": "https://acmegardens.com/companion-planting-guide",
          "pageTitle": "The Complete Companion Planting Guide",
          "publisher": "Acme Gardens",
          "retrieved": "2026-03-27T09:14:00Z",
          "relevanceScore": 0.95
        }
      ]
    }
  ]
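}

The relations above reference e_002 (Crop Yield) and e_003 (Pest Damage), which are omitted for brevity. Because every internal targetId MUST match an entityId in the same EntityMap or shard set, a complete file would also define those two entities.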

Appendix B - Complete predicate reference

CORE STRUCTURAL:   RELATES_TO, INCLUDES, PART_OF, DEPENDS_ON,
                   CONFLICTS_WITH, ENABLES, REQUIRES

CORE CAUSATION:    IMPROVES, DEGRADES, PRODUCES, PREVENTS, LEADS_TO

CORE INFORMATION:  DESCRIBES, MEASURES, REFERENCES

CORE METADATA:     AUTHORED_BY, AFFILIATED_WITH, INSTANCE_OF

EXT STRUCTURAL:    EXCLUDES, SUITED_FOR

EXT STATE:         MAINTAINS, PRECEDES, LACKS

EXT CAUSATION:     TRANSFORMS, RESTRICTS, REMOVES, RESTORES,
                   CONVERTS, ALLOWS

EXT INFORMATION:   RECOMMENDS, PROVIDES, PUBLISHES

EXT ANALYTICAL:    IDENTIFIES, DIAGNOSES, COMPARES,
                   MONITORS, BENCHMARKS

EXT SEQUENTIAL:    PASSES_THROUGH, NAVIGATES_TO

EXT AGENCY:        REGULATES, PROTECTS, CREATES, TARGETS, ACHIEVES

Core: 18 predicates · Extended: 26 predicates · Total: 44 standard predicates


Appendix C - Changelog

Version | Date | Notes
0.2 | 2026-03-27 | RFC 2119 normative language throughout. Relation model updated: targetId + targetName + targetUri replacing name-only target. retrieved field added to chunk. Predicate vocabulary tiered into Core and Extended. Discovery language hedged as publisher-side conventions. Opening claims moderated. Chunk text extractive requirement added. Reference implementation framing softened.
0.1 | 2026-03-27 | Initial draft

EntityMap is an open standard published by Waikay / InLinks Optimization Ltd under CC BY 4.0. Contributions and implementations are welcomed. Feedback via waikay.io/entitymap.