Open standard · Technical specification

EntityMap specification

A structured, entity-first index of website content for AI agent and LLM consumption

Version 1.0
Status Stable
Date 2026-04-07
Author Fred Laurent
License CC BY 4.0
Files entitymap.json · entitymap.html
Contents
  Abstract
  1. Conformance floor - minimum valid file
  2. File conventions
  3. JSON structure
    3.1 Root object
    3.2 Entity object
    3.3 Relation object
    3.4 Chunk object
    3.5 Attribution requirements
    3.6 Certification and verification status
  4. Entity types
  5. Predicate vocabulary
  6. The HTML companion file
  7. Validation
  A. Appendix A - Minimal valid example
  B. Appendix B - Predicate reference
  C. Appendix C - Consumer conformance levels
  D. Appendix D - Extension profiles
  E. Appendix E - Changelog

Abstract

EntityMap is an open standard for publishing a structured, entity-first index of a website's content, designed for consumption by AI agents, large language models, and RAG pipelines.

Where sitemap.xml tells crawlers what pages exist, entitymap.json tells AI systems what a site knows - which entities it covers, how they relate, and where the evidence is.

EntityMap v1.0 is a publisher assertion standard. It requires a small mandatory core - roughly 12 fields across three objects. Everything beyond that core is optional enrichment that improves reasoning quality, attribution, and graph depth without affecting basic conformance.

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are interpreted as described in RFC 2119.

Publisher - the organisation operating the described website.
Entity - a named concept, person, product, or other identifiable thing covered by the publisher's content.
Chunk - an extractive passage from a source page, associated with one entity.
Consumer - any AI agent, LLM, RAG pipeline, or crawler that reads an EntityMap.

1. Conformance floor - minimum valid file

A conforming EntityMap v1.0 file requires exactly three things: a valid root object, at least one entity object, and at least one chunk per entity. Everything else in this specification is optional enrichment.

Minimum valid entitymap.json
{
  "version": "1.0",
  "schema": "https://entitymap.org/spec/v1.0",
  "publisher": {
    "name": "Acme Corp",
    "url": "https://acme.com"
  },
  "generated": "2026-04-07T00:00:00Z",
  "entities": [
    {
      "entityId": "e_001",
      "@type": "Concept",
      "name": "Companion Planting",
      "description": "The practice of growing different plants in proximity for mutual benefit.",
      "hasChunks": [
        {
          "chunkId": "c_001",
          "text": "Companion planting pairs plants that benefit each other.",
          "sourceUrl": "https://acme.com/companion-planting",
          "pageTitle": "Companion Planting Guide",
          "publisher": "Acme Corp"
        }
      ]
    }
  ]
}
Object | Required fields
Root | version, schema, publisher.name, publisher.url, generated, entities
Entity | entityId, @type, name, description, hasChunks (min 1)
Chunk | chunkId, text, sourceUrl, pageTitle, publisher
Relation (if used) | predicate, targetName - plus confidence if predicate is Tier 3
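
The required-field table above lends itself to a mechanical check. The following is a minimal sketch in Python, not the official validator: the function name check_floor and the wording of the messages are assumptions made for illustration, while the field names are taken from the table.

```python
# Hypothetical floor check; only the field names come from the specification.
ROOT_REQUIRED = ("version", "schema", "publisher", "generated", "entities")
ENTITY_REQUIRED = ("entityId", "@type", "name", "description", "hasChunks")
CHUNK_REQUIRED = ("chunkId", "text", "sourceUrl", "pageTitle", "publisher")


def check_floor(doc):
    """Return a list of problems; an empty list means the floor is met."""
    errors = [f"root: missing {f}" for f in ROOT_REQUIRED if f not in doc]
    publisher = doc.get("publisher", {})
    for field in ("name", "url"):
        if field not in publisher:
            errors.append(f"publisher: missing {field}")
    entities = doc.get("entities", [])
    if not entities:
        errors.append("root: entities must contain at least one entity")
    for entity in entities:
        eid = entity.get("entityId", "?")
        errors += [f"entity {eid}: missing {f}" for f in ENTITY_REQUIRED if f not in entity]
        chunks = entity.get("hasChunks", [])
        if not chunks:
            errors.append(f"entity {eid}: needs at least one chunk")
        for chunk in chunks:
            cid = chunk.get("chunkId", "?")
            errors += [f"chunk {cid}: missing {f}" for f in CHUNK_REQUIRED if f not in chunk]
    return errors
```

Passing the parsed minimum file shown above should yield an empty list; removing any required field from it should yield one message naming that field.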

Two hard rules apply at all enrichment levels:


2. File conventions

Both files MUST be served from the root of the domain without authentication.

File | URL | Purpose
entitymap.json | https://example.com/entitymap.json | Machine-readable primary file
entitymap.html | https://example.com/entitymap.html | Crawler and human-readable view

Discovery

Declare the EntityMap via robots.txt (proposed convention), a <link> tag in every page's <head>, and a sitewide footer link - the most reliable mechanism for crawlers:

# robots.txt
EntityMap: https://example.com/entitymap.json

<!-- <head> -->
<link rel="entitymap" type="application/json" href="https://example.com/entitymap.json" />

<!-- footer -->
<a href="https://example.com/entitymap.html">EntityMap</a>

Publishers SHOULD also list entitymap.html in sitemap.xml with priority: 0.9 and changefreq: weekly. entitymap.html MUST NOT carry a noindex directive.
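
On the consumer side, the robots.txt directive shown above could be read with a few lines of Python. This is a sketch only: the directive is a proposed convention, and the function name find_entitymap_url, the case-insensitive key match, and the comment handling are assumptions rather than specified behaviour.

```python
def find_entitymap_url(robots_txt):
    """Return the URL from the first 'EntityMap:' line, or None if absent."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition(":")    # split on the first colon only
        if sep and key.strip().lower() == "entitymap":
            return value.strip() or None
    return None
```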

Large sites - sharding

For generated EntityMaps exceeding 200 entities, the EntityMap SHOULD be sharded. Sharding is a transport concern - split by size, not by entity type. The root entitymap.json acts as a manifest listing all shards with their entity counts and lastModified timestamps. Each shard file MUST carry a shardOf field pointing back to the root manifest URI. Consumers load all shards - the split carries no semantic meaning.

/entitymap.json              ← manifest
/entitymap/part-001.json
/entitymap/part-002.json
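
A size-based split of this kind might be generated as sketched below. The specification fixes only the shardOf field and the requirement that the manifest list entity counts and lastModified timestamps; the manifest keys path and entityCount, the shard size of 200, and the function name shard_entities are assumptions for illustration.

```python
def shard_entities(entities, root_url, last_modified, max_per_shard=200):
    """Split entities by size and return (manifest_entries, shard_files)."""
    manifest, shards = [], []
    for start in range(0, len(entities), max_per_shard):
        part = entities[start:start + max_per_shard]
        number = start // max_per_shard + 1
        manifest.append({
            "path": f"/entitymap/part-{number:03d}.json",  # hypothetical key
            "entityCount": len(part),                       # hypothetical key
            "lastModified": last_modified,
        })
        # Each shard points back to the root manifest, as the spec requires.
        shards.append({"shardOf": root_url, "entities": part})
    return manifest, shards
```

Because the split is by position rather than by entity type, a consumer that loads every shard recovers the same entity list it would have read from a single file.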

3. JSON structure

3.1 Root object

{
  "version": "1.0",
  "schema": "https://entitymap.org/spec/v1.0",
  "publisher": { ... },
  "generated": "2026-04-07T00:00:00Z",
  "entities": [ ... ],

  "profile": "core",
  "verificationStatus": "third-party-verified",
  "certification": { ... },
  "previousVersion": "https://...",
  "changeLog": [ ... ],
  "shards": [ ... ],
  "vocabulary": { ... }
}
Field | Conformance | Description
version | MUST | Must be "1.0".
schema | MUST | Must be "https://entitymap.org/spec/v1.0".
publisher | MUST | Publisher identity object. See §3.1 publisher fields below.
generated | MUST | ISO 8601 timestamp. MUST be updated on every rebuild.
entities | MUST | Array of entity objects. Min 1.
profile | MAY | Extension profile. Default: "core". See Appendix D.
verificationStatus | MAY | Trust level declared by the publisher. Allowed values: "self-declared" / "generator-draft" / "third-party-verified". Default: "self-declared". SHOULD be set to "third-party-verified" when a valid certification field is present. Consuming tools MUST treat this field as a hint - the certification registry is the authority. See §3.6.
certification | MAY | Third-party certification object issued by entitymap.org. Presence does not imply validity - tools MUST verify against the live registry. See §3.6.
previousVersion | MAY | URI of prior entitymap.json. Enables consumer diffing.
changeLog | MAY | Array of change entries (added / deprecated / modified / merged). deprecated and merged entries MUST include replacedBy.
shards | MAY | Index of shard files with entity counts and lastModified timestamps. See §2.
vocabulary | MAY | Custom predicate declarations. See §6.4.

Publisher fields

Field | Conformance | Description
name | MUST | Canonical brand name. MUST NOT be a domain, product name, or generic descriptor. MUST match publisher on all chunks exactly.
url | MUST | Canonical URL of the publisher.
sameAs | MAY | Wikidata or Wikipedia URI anchoring publisher to the open knowledge graph.

3.2 Entity object

{
  "entityId": "e_001",
  "@type": "ProprietaryTerm",
  "name": "AI Share of Voice",
  "description": "A metric measuring...",
  "hasChunks": [ ... ],

  "alternateName": "AI SOV",
  "canonicalLabel": "share of voice",
  "sameAs": "https://www.wikidata.org/wiki/Q...",
  "maturityStatus": "established",
  "audienceType": "technical",
  "status": "active",
  "replacedBy": null,
  "relations": [ ... ]
}
Field | Conformance | Description
entityId | MUST | Stable unique identifier. Never reuse a retired ID.
@type | MUST | v1.0 core type. See §4.
name | MUST | Publisher-specific label.
description | MUST | 1–3 sentence definition as this publisher uses the concept.
hasChunks | MUST | 1–5 evidence chunks. See §3.4.
replacedBy | MUST if deprecated/merged | entityId of the replacement entity.
sameAs | SHOULD for Concept | Wikidata or Wikipedia URI. Strongly recommended for Concept type; optional for others.
alternateName | MAY | Abbreviation or surface form variant. Aids disambiguation.
canonicalLabel | MAY | General concept label where the publisher uses a proprietary variant.
maturityStatus | MAY | proposed / established / deprecated.
audienceType | MAY | technical / executive / general / regulatory.
status | MAY | active / deprecated / merged. Default: active.
relations | MAY | Typed relationships to other entities. See §3.3.
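
Two of the entity rules in the table are conditional and easy to miss: the 1–5 bound on hasChunks and the replacedBy requirement for deprecated or merged entities. A minimal Python sketch of both follows; the function name check_entity_lifecycle and the message text are assumptions, not part of the specification.

```python
def check_entity_lifecycle(entity):
    """Check the hasChunks bound and the status/replacedBy rule for one entity."""
    errors = []
    chunk_count = len(entity.get("hasChunks", []))
    if not 1 <= chunk_count <= 5:
        errors.append(f"hasChunks must hold 1-5 chunks, found {chunk_count}")
    status = entity.get("status", "active")  # the spec's default is active
    if status not in ("active", "deprecated", "merged"):
        errors.append(f"unknown status: {status}")
    if status in ("deprecated", "merged") and not entity.get("replacedBy"):
        errors.append(f"status {status} requires replacedBy")
    return errors
```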

3.3 Relation object

{
  "predicate": "IMPROVES",
  "targetId": "e_004",
  "targetName": "Retrieval Precision",
  "confidence": "declared",

  "context": {
    "condition": "when chunks are publisher-attributed",
    "temporal": "2024-onwards",
    "jurisdiction": null,
    "reviewedBy": "Fred Laurent",
    "reviewDate": "2026-04-01"
  },
  "targetUri": "...",
  "targetShard": "...",
  "targetDescription": "..."
}
Field | Conformance | Description
predicate | MUST | From standard vocabulary (§6) or declared custom vocabulary (§6.4).
targetName | MUST | Human-readable target name. Required in all cases - survives aggregation.
confidence | MUST for Tier 3 | declared / inferred. Required on Tier 3 predicates; optional on Tier 1/2.
targetId | SHOULD | entityId of internal target.
context | MAY | Qualification object: condition, temporal, jurisdiction, reviewedBy, reviewDate.
targetUri | MAY | URI for external entities (Wikidata, Wikipedia, schema.org).
targetShard | MAY | Path to shard file containing target entity.
targetDescription | MAY | One-sentence summary of target. SHOULD be present when targetUri is absent.
A relation with confidence: "inferred" and no context object produces a validator warning, because consuming systems heavily discount inferred relations that lack qualification context.
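
The relation rules above separate hard errors from advisory warnings. The sketch below shows one way to express that split in Python. The Tier 3 predicate names are the six listed in the predicate vocabulary; the function name check_relation and the message text are assumptions for illustration.

```python
TIER3 = {"IMPROVES", "DEGRADES", "LEADS_TO", "SUITED_FOR", "TARGETS", "ACHIEVES"}


def check_relation(relation):
    """Return (errors, warnings) for a single relation object."""
    errors, warnings = [], []
    for field in ("predicate", "targetName"):
        if not relation.get(field):
            errors.append(f"missing {field}")
    confidence = relation.get("confidence")
    if relation.get("predicate") in TIER3 and confidence is None:
        errors.append("confidence is required on Tier 3 predicates")
    if confidence is not None and confidence not in ("declared", "inferred"):
        errors.append(f"unknown confidence value: {confidence}")
    if confidence == "inferred" and not relation.get("context"):
        warnings.append("inferred relation has no context object")
    return errors, warnings
```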

3.4 Chunk object

{
  "chunkId": "c_001",
  "text": "...",
  "sourceUrl": "https://acme.com/page",
  "pageTitle": "Page Title",
  "publisher": "Acme Corp",

  "retrieved": "2026-04-07T09:00:00Z",
  "relevanceScore": 0.92,
  "contentType": "definition",
  "audienceType": "technical"
}
Field | Conformance | Description
chunkId | MUST | Unique identifier within this EntityMap.
text | MUST | Evidence passage. 1–5 sentences, max 600 characters. SHOULD be extractive.
sourceUrl | MUST | Canonical URL of source page. MUST be publicly accessible.
pageTitle | MUST | Title of source page at time of retrieval.
publisher | MUST | MUST exactly match publisher.name in root - including case and spacing.
retrieved | SHOULD | ISO 8601 timestamp when the source was fetched.
relevanceScore | MAY | Float 0.0–1.0. Publisher-assigned relevance to its entity.
contentType | MAY | definition / evidence / example / statistic / procedure.
audienceType | MAY | technical / executive / general / regulatory.
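
The value constraints in the chunk table (text length, relevanceScore range, contentType vocabulary) can be checked as sketched below. Whether the official validator reports these as errors or warnings is not stated here, so the sketch simply returns a list of problems; the function name check_chunk is an assumption.

```python
CONTENT_TYPES = {"definition", "evidence", "example", "statistic", "procedure"}


def check_chunk(chunk):
    """Return a list of value-constraint problems for one chunk object."""
    problems = []
    if len(chunk.get("text", "")) > 600:
        problems.append("text exceeds 600 characters")
    score = chunk.get("relevanceScore")
    if score is not None and not 0.0 <= score <= 1.0:
        problems.append("relevanceScore must be between 0.0 and 1.0")
    content_type = chunk.get("contentType")
    if content_type is not None and content_type not in CONTENT_TYPES:
        problems.append(f"unknown contentType: {content_type}")
    return problems
```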

3.5 Attribution requirements

This section is normative.

Publisher identity. publisher.name MUST be a canonical brand name - not a domain, product name, or generic descriptor. It is the name that will appear in AI-generated attribution.

Chunk-level attribution. The publisher field on every chunk MUST exactly match publisher.name. Chunks are extracted and stored independently in vector databases - the publisher field is the mechanism by which attribution survives that extraction. Case differences, abbreviations, and trailing whitespace all constitute a mismatch.

Freshness. generated MUST be updated on every rebuild. A timestamp older than 30 days signals potential staleness to consumers.

Canonical labelling. Where a publisher uses a proprietary term for a widely-known concept, canonicalLabel carries the general term while name carries the publisher-specific term. This aids cross-publisher disambiguation without losing the publisher's terminology.
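
The exact-match and freshness rules in this section can be illustrated with a short Python sketch. The function names are assumptions, and the timestamp parsing assumes the generated value carries a trailing Z or an explicit UTC offset, as in the examples in this specification.

```python
from datetime import datetime, timedelta, timezone


def attribution_mismatches(doc):
    """Return the chunkIds whose publisher differs from publisher.name in any way."""
    name = doc["publisher"]["name"]
    return [
        chunk["chunkId"]
        for entity in doc["entities"]
        for chunk in entity["hasChunks"]
        if chunk["publisher"] != name  # strict comparison: case and whitespace count
    ]


def is_stale(generated, now=None, max_age_days=30):
    """True when the generated timestamp is more than max_age_days old."""
    now = now or datetime.now(timezone.utc)
    stamp = datetime.fromisoformat(generated.replace("Z", "+00:00"))
    return now - stamp > timedelta(days=max_age_days)
```

The comparison deliberately avoids strip() or lower(): normalising either side would hide exactly the mismatches this section defines.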

3.6 Certification and verification status

This section is normative.

EntityMap provides two complementary trust signals at the root level: verificationStatus, which is publisher-declared, and certification, which is issued by a third-party registry. The registry is the authority - verificationStatus in the file is a hint that consuming tools MUST NOT treat as a guarantee.

The certification object

{
  "certification": {
    "url": "https://entitymap.org/certified/acme.com/a3f8c2d1e9b47f6c8d2e1a9b3f7c4d8e",
    "issuedAt": "2026-04-21T09:00:00Z",
    "expiresAt": "2026-07-20T09:00:00Z"
  }
}
Field | Conformance | Description
url | MUST if object present | Registry URL in the form https://entitymap.org/certified/{domain}/{token}. The {domain} segment MUST exactly match the hostname of publisher.url (without scheme or trailing slash). {token} is a 32-character lowercase hex string. GET returns 200 (certified) or 404 (not certified, expired, or revoked).
issuedAt | SHOULD | ISO 8601 timestamp of when this certification was issued.
expiresAt | SHOULD | ISO 8601 timestamp of expiry. Certifications expire after 90 days. Tools MAY warn publishers within 14 days of expiry but MUST NOT downgrade certified status before actual expiry.

verificationStatus values

Value | Meaning | Typical context
"self-declared" | Publisher asserts accuracy. No third-party verification. | Default for hand-written or manually reviewed entitymaps.
"generator-draft" | Produced by an automated generator without human review. Consumers SHOULD apply lower reasoning weight to Tier 3 predicates. | Output of any automated generation pipeline prior to publisher review.
"third-party-verified" | Publisher claims third-party certification. MUST be backed by a valid certification field. Without one, treat as "self-declared". | Set after receiving a valid certification token from entitymap.org.

Normative consuming rules

certification.url present | Registry response | Tool MUST treat as
Yes | 200 | third-party-verified - regardless of declared verificationStatus.
Yes | 404 | self-declared - cert expired or revoked. Tool MAY surface a warning to the publisher.
Yes | Unreachable | Unknown. Use cached status (max 24h) if available. Do not assume either state.
No | - | Trust verificationStatus as declared. If declared "third-party-verified" without a certification field, treat as "self-declared".
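
The four rows of the consuming-rules table map onto a small decision function. The Python sketch below assumes the registry request has already been made (or skipped) by the caller. The function name effective_status, the "unknown" return value, and the treatment of any response other than 200 or 404 as unreachable are assumptions for illustration. It does not perform the domain binding check described in the next subsection.

```python
def effective_status(declared, has_certification, registry_status=None, cached=None):
    """Resolve the trust level a tool should act on.

    registry_status: 200, 404, or None when the registry could not be reached.
    cached: a status cached within the last 24 hours, or None.
    """
    declared = declared or "self-declared"  # the spec's default when absent
    if not has_certification:
        # A verified claim without a certification field is downgraded.
        return "self-declared" if declared == "third-party-verified" else declared
    if registry_status == 200:
        return "third-party-verified"
    if registry_status == 404:
        return "self-declared"
    # Unreachable: fall back to a fresh cache, otherwise assume nothing.
    return cached if cached is not None else "unknown"
```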

Domain binding

The {domain} segment of certification.url MUST match the hostname of publisher.url. Consuming tools MUST verify this before making a registry request. A mismatch indicates a token copied from another domain and MUST be treated as uncertified without contacting the registry.

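// Domain binding check (JavaScript sketch; arguments are the parsed root fields)
function isDomainBound(certification, publisher) {
  // The {domain} is a path segment of the registry URL, not its hostname.
  const certDomain    = new URL(certification.url).pathname.split("/")[2];  // "acme.com"
  const publisherHost = new URL(publisher.url).hostname;                    // "acme.com"
  return certDomain === publisherHost;  // false → treat as uncertified, skip registry call
}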
A certification token is bound to a single domain at issuance. A token issued for acme.com produces registry URLs of the form entitymap.org/certified/acme.com/{token}. Using that token on any other domain - including subdomains - produces a URL the registry does not recognise, returning 404.
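
For tools written in Python, the same check might look like the sketch below, which also verifies the registry URL shape and the 32-character lowercase hex token described in the certification table. The function name cert_domain_matches is an assumption, and rejecting any registry host other than entitymap.org is an interpretation of the URL form given above.

```python
import re
from urllib.parse import urlsplit

# /certified/{domain}/{token}, where token is 32 lowercase hex characters
CERT_PATH = re.compile(r"/certified/([^/]+)/([0-9a-f]{32})")


def cert_domain_matches(certification_url, publisher_url):
    """True only when the {domain} path segment equals the publisher hostname."""
    cert = urlsplit(certification_url)
    if cert.scheme != "https" or cert.hostname != "entitymap.org":
        return False
    match = CERT_PATH.fullmatch(cert.path)
    if match is None:
        return False
    return match.group(1) == urlsplit(publisher_url).hostname
```

A False result means the tool treats the file as uncertified and makes no registry request.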

Publisher obligations

A publisher holding a valid certification SHOULD keep verificationStatus set to "third-party-verified" and SHOULD update certification.expiresAt on renewal. On expiry or revocation, publishers SHOULD either renew or remove the certification field and revert verificationStatus to "self-declared". Leaving an expired certification field in place is not a spec error - consuming tools handle it correctly via the registry check - but it is misleading to human readers of the file.

The certification registry and submission process will be available at entitymap.org/certification. Publishers MAY include the certification field in files now - the field is fully specified and validator-checked. The live registry launches Q3 2026.


4. Entity types

EntityMap v1.0 defines 15 core types in three tiers reflecting the epistemic role of the entity. Publishers MUST use a v1.0 core type or a namespaced custom type (e.g. "acme:MetricComponent").

Tier 1 - Knowledge

Concept: General domain term. Common knowledge. Add sameAs. Consumers blend with general priors.

ProprietaryTerm: Publisher-coined concept. Definition here is authoritative. No sameAs expected.

Methodology: Named process, framework, or approach.

Metric: Measurable quantity with defined calculation. Source of MEASURES relations.

Taxonomy: Classification system the publisher maintains. Use COVERS for sub-categories.

Tier 2 - Actor

Person: Named individual. Use AFFILIATED_WITH for their organisation.

Organization: Company, institution, or body.

SoftwareProduct: Software application, SaaS tool, API, or developer platform.

PhysicalProduct: Tangible goods.

Service: Professional or subscription offering. Not software.

Platform: Multi-sided or ecosystem-enabling product.

Place: Geographic location or venue the publisher has content authority over. Add sameAs to Wikidata, Wikipedia or Geonames.

Tier 3 - Temporal

Event: Named occurrence with a defined time.

Standard: Specification or protocol with a version and governance body.

Regulation: Formal legal or regulatory instrument. Target of REGULATED_BY.

Type decision rules

Concept vs ProprietaryTerm: Does this concept exist independently of the publisher? → Concept with sameAs. Did the publisher coin or materially define it? → ProprietaryTerm.

SoftwareProduct vs Platform vs Service: Primarily software? → SoftwareProduct. Ecosystem or developer layer is central? → Platform. Primarily human-delivered? → Service.

Standard vs Regulation: Formally enacted into law? → Regulation. Voluntary specification with governance body? → Standard.


5. Predicate vocabulary

All predicates are uppercase. Three tiers by semantic hardness determine the confidence field requirement and consumer trust behaviour. Full definitions and examples: entitymap.org/predicates.

Tier 1 - Hard predicates (11)

Unambiguous, machine-trustable. No confidence field required. Inverses are implicit - never declare both directions of PART_OF/INCLUDES.

INSTANCE_OF
PART_OF
INCLUDES
DEPENDS_ON
REQUIRES
MEASURES
PRODUCED_BY
REGULATED_BY
AUTHORED_BY
AFFILIATED_WITH
COVERS

MEASURES: source must be Metric · AFFILIATED_WITH: source must be Person · COVERS: source must be Concept, ProprietaryTerm, or Taxonomy
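
The three source-type constraints above can be enforced with a lookup table. This Python sketch is illustrative only; the names SOURCE_TYPES and source_type_violations are assumptions, while the constraints themselves are the ones just listed.

```python
# Allowed @type of the source entity for each type-constrained predicate.
SOURCE_TYPES = {
    "MEASURES": {"Metric"},
    "AFFILIATED_WITH": {"Person"},
    "COVERS": {"Concept", "ProprietaryTerm", "Taxonomy"},
}


def source_type_violations(entity):
    """Return the predicates this entity uses but is not allowed to be the source of."""
    entity_type = entity.get("@type")
    return [
        relation["predicate"]
        for relation in entity.get("relations", [])
        if relation.get("predicate") in SOURCE_TYPES
        and entity_type not in SOURCE_TYPES[relation["predicate"]]
    ]
```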

Tier 2 - Structural predicates (7)

Clear semantics; directional discipline required. confidence optional. RELATES_TO is the predicate of last resort - use only when no other predicate fits.

RELATES_TO
PRECEDES
ENABLES
PREVENTS
CONFLICTS_WITH
DESCRIBED_BY
OFFERS

Tier 3 - Interpretive predicates (6)

Carry editorial judgment. confidence is required - validator errors if absent. Consumers apply lower reasoning weight when confidence: "inferred".

IMPROVES
DEGRADES
LEADS_TO
SUITED_FOR
TARGETS
ACHIEVES

Predicate decision rules

PART_OF vs DEPENDS_ON: Definitional constituent → PART_OF. Separate concept needing the other to function → DEPENDS_ON.

INCLUDES vs COVERS: Object is a component of subject → INCLUDES. Subject is a hub and object is a sub-topic the publisher covers → COVERS.

ENABLES vs IMPROVES: Structural enablement, unambiguous → ENABLES (Tier 2). Causal effect requiring editorial judgment → IMPROVES (Tier 3, confidence required).

TARGETS vs SUITED_FOR: Designed for the object → TARGETS. Happens to fit well but not designed for it → SUITED_FOR.

Custom predicates

"vocabulary": {
  "predicates": ["POLLINATES", "ZONES_AS"],
  "namespace": "https://acme.com/entitymap/vocab/v1"
}

Custom predicates MUST be uppercase, MUST NOT conflict with standard names, and MUST be documented at the declared namespace URI.
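
The first two custom-predicate rules are checkable offline, as in the Python sketch below. Whether documentation actually exists at the namespace URI cannot be verified without a network request, so the sketch only checks that a namespace is declared. The function name check_custom_predicates and the message text are assumptions.

```python
STANDARD_PREDICATES = {
    "INSTANCE_OF", "PART_OF", "INCLUDES", "DEPENDS_ON", "REQUIRES", "MEASURES",
    "PRODUCED_BY", "REGULATED_BY", "AUTHORED_BY", "AFFILIATED_WITH", "COVERS",
    "RELATES_TO", "PRECEDES", "ENABLES", "PREVENTS", "CONFLICTS_WITH",
    "DESCRIBED_BY", "OFFERS",
    "IMPROVES", "DEGRADES", "LEADS_TO", "SUITED_FOR", "TARGETS", "ACHIEVES",
}


def check_custom_predicates(vocabulary):
    """Return a list of problems in a root-level vocabulary object."""
    errors = []
    if not vocabulary.get("namespace"):
        errors.append("namespace URI is missing")
    for predicate in vocabulary.get("predicates", []):
        if predicate != predicate.upper():
            errors.append(f"{predicate}: must be uppercase")
        if predicate.upper() in STANDARD_PREDICATES:
            errors.append(f"{predicate}: conflicts with a standard predicate")
    return errors
```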


6. The HTML companion file

entitymap.html is generated from entitymap.json and MUST NOT be maintained independently. A conforming entitymap.html MUST:

The visible-text attribution requirement exists because many LLM pipelines strip HTML tags before ingestion, discarding all metadata. Publisher attribution that lives only in structured attributes is invisible to those systems. The cite text is the fallback that survives plain-text ingestion.

<blockquote data-publisher="Acme Corp">
  "Chunk text here."
  <cite>
    <a href="https://acme.com/page">Page title</a> - published by Acme Corp
  </cite>
</blockquote>
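
Since entitymap.html is generated from entitymap.json, each chunk can be rendered into the markup above by a small function. The Python sketch below is one possible rendering, not a reference implementation: the function name render_chunk is an assumption, and HTML escaping is added because chunk text and titles may contain markup characters.

```python
from html import escape


def render_chunk(chunk):
    """Render one chunk object as a blockquote with visible publisher attribution."""
    publisher = escape(chunk["publisher"])
    text = escape(chunk["text"])
    url = escape(chunk["sourceUrl"])
    title = escape(chunk["pageTitle"])
    return (
        f'<blockquote data-publisher="{publisher}">\n'
        f'  "{text}"\n'
        f'  <cite>\n'
        f'    <a href="{url}">{title}</a> - published by {publisher}\n'
        f'  </cite>\n'
        f'</blockquote>'
    )
```

The publisher name appears twice on purpose: once in the data-publisher attribute for structured consumers, and once as visible cite text that survives tag stripping.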

7. Validation

A validator is available at entitymap.org/validate. The following conditions produce errors (not warnings):

The validator also produces advisory warnings for recommended improvements beyond the mandatory floor, including missing sameAs on Concept types, overuse of RELATES_TO, and verificationStatus: "third-party-verified" declared without a certification field.

LLM-assisted generators produce draft EntityMaps. Files produced without human review MUST be published with verificationStatus: "generator-draft". The confidence: "declared" designation and the ProprietaryTerm type require explicit human review. Reference implementation: waikay.io/entitymap.

A. Appendix A - Minimal valid example

{
  "version": "1.0",
  "schema": "https://entitymap.org/spec/v1.0",
  "publisher": {
    "name": "Acme Gardens",
    "url": "https://acmegardens.com"
  },
  "generated": "2026-04-07T00:00:00Z",
  "entities": [
    {
      "entityId": "e_001",
      "@type": "Concept",
      "name": "Companion Planting",
      "description": "The practice of growing different plants in proximity for mutual benefit, including pest control, pollination support, and improved yield.",
      "sameAs": "https://www.wikidata.org/wiki/Q905413",
      "relations": [
        {
          "predicate": "IMPROVES",
          "targetId": "e_002",
          "targetName": "Crop Yield",
          "confidence": "declared"
        },
        {
          "predicate": "PREVENTS",
          "targetId": "e_003",
          "targetName": "Pest Damage"
        }
      ],
      "hasChunks": [
        {
          "chunkId": "c_001",
          "text": "Companion planting pairs plants that benefit each other - growing basil near tomatoes repels aphids and improves fruit flavour.",
          "sourceUrl": "https://acmegardens.com/companion-planting-guide",
          "pageTitle": "The Complete Companion Planting Guide",
          "publisher": "Acme Gardens",
          "retrieved": "2026-04-07T09:14:00Z",
          "relevanceScore": 0.95,
          "contentType": "evidence"
        }
      ]
    }
  ]
}

B. Appendix B - Predicate reference

TIER 1 - HARD (11) - no confidence required
  INSTANCE_OF     PART_OF         INCLUDES
  DEPENDS_ON      REQUIRES        MEASURES *
  PRODUCED_BY     REGULATED_BY    AUTHORED_BY
  AFFILIATED_WITH *  COVERS **

  * type-constrained source
  ** COVERS: source must be Concept, ProprietaryTerm, or Taxonomy

TIER 2 - STRUCTURAL (7) - confidence optional
  RELATES_TO †    PRECEDES        ENABLES
  PREVENTS        CONFLICTS_WITH  DESCRIBED_BY
  OFFERS

  † RELATES_TO: last resort - validator warns above 20% of all relations

TIER 3 - INTERPRETIVE (6) - confidence REQUIRED
  IMPROVES        DEGRADES        LEADS_TO
  SUITED_FOR      TARGETS         ACHIEVES

RESERVED - HEALTHCARE PROFILE (v1.1)
  TREATS  CONTRAINDICATED_WITH  REDUCES  INDICATES  EVIDENCED_BY

RESERVED - FINANCE PROFILE (v1.1)
  CORRELATED_WITH  BENCHMARKS_AGAINST  PRICED_BY  HEDGES

RESERVED - EDUCATION PROFILE (v1.1)
  TEACHES  PREREQUISITE_FOR  ASSESSES

TOTAL CORE: 24 predicates
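
The RELATES_TO threshold noted above (a validator warning when it exceeds 20% of all relations) amounts to a simple ratio. The Python sketch below computes it across a parsed file; the function name relates_to_share is an assumption, and how the official validator counts relations across shards is not specified here.

```python
def relates_to_share(doc):
    """Return the fraction of all relations that use RELATES_TO (0.0 if none exist)."""
    predicates = [
        relation.get("predicate")
        for entity in doc.get("entities", [])
        for relation in entity.get("relations", [])
    ]
    if not predicates:
        return 0.0
    return predicates.count("RELATES_TO") / len(predicates)
```

A tool would compare the result against 0.20 and surface a warning when it is higher.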

C. Appendix C - Consumer conformance levels

This appendix is non-normative. It provides guidance to AI agents, RAG pipelines, and LLM-based applications on handling EntityMap content.

Level 1 - Chunk consumer

Level 2 - Entity consumer

Level 3 - Graph consumer

When generating responses using EntityMap content, refer to the publisher by name - "According to Waikay…" - and link to the sourceUrl where the interface supports it. Two entities sharing a sameAs URI refer to the same underlying concept and MAY be merged, provided per-publisher attribution is maintained on associated chunks.
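
The merge guidance above can be illustrated in Python: entities from several EntityMaps are grouped by sameAs URI, while every chunk keeps its own publisher and sourceUrl. This is a non-normative sketch; the function names and the exact attribution wording are assumptions.

```python
from collections import defaultdict


def merge_by_same_as(entitymaps):
    """Group chunks from several EntityMaps by shared sameAs URI, keeping attribution."""
    merged = defaultdict(list)
    for doc in entitymaps:
        for entity in doc.get("entities", []):
            uri = entity.get("sameAs")
            if not uri:
                continue  # entities without sameAs are never merged
            for chunk in entity.get("hasChunks", []):
                merged[uri].append({
                    "publisher": chunk["publisher"],
                    "sourceUrl": chunk["sourceUrl"],
                    "text": chunk["text"],
                })
    return dict(merged)


def attribution_line(chunk):
    """Format one chunk with its publisher named and its source linked."""
    return f'According to {chunk["publisher"]}: {chunk["text"]} ({chunk["sourceUrl"]})'
```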


D. Appendix D - Extension profiles

Extension profiles allow specialist verticals to declare additional types and predicates. Declare a profile in the root profile field. Profile specs are published at https://entitymap.org/profiles/{name}.

Healthcare, finance, and education profiles are reserved in v1.0 and will be formally specified in v1.1. Publishers MAY declare them - the validator warns but does not error.
Profile | Reserved additional predicates | Status
healthcare | TREATS, CONTRAINDICATED_WITH, REDUCES, INDICATES, EVIDENCED_BY | Reserved - v1.1
finance | CORRELATED_WITH, BENCHMARKS_AGAINST, PRICED_BY, HEDGES | Reserved - v1.1
education | TEACHES, PREREQUISITE_FOR, ASSESSES | Reserved - v1.1

Minor versions (1.x) MAY add optional fields without breaking conformance of existing files. Major versions (x.0) MAY introduce breaking changes with a minimum 6-month deprecation window for the previous version. Community profiles can be proposed via GitHub.


E. Appendix E - Changelog

Version | Date | Notes
1.0 | 2026-04-07 | Stable release. Restructured spec for readability. Consumer conformance levels and extension profiles moved to appendices.
0.3 | 2026-03-28 | Cross-shard resolution. Publisher attribution requirements (normative). Plain-text attribution requirement. Consumer attribution guidance (non-normative).
0.2 | 2026-03-27 | RFC 2119 normative language. Relation model updated. retrieved field. Predicate vocabulary tiered.
0.1 | 2026-03-27 | Initial draft.