
Node files

A Knowledge Node is a set of static files served from a CDN. This page documents every file a published node contains: its purpose, its shape, and a real example, plus the stable-id namespaces and the storage layout.

The entry point is always manifest.json. A consumer reads manifest.json, then ai.json, then the record files it needs, branching on each record's schema_version. See Consuming a node for the full read order, and Knowledge graph for the model these files are projected from.

Two version axes

Each file embeds a per-schema schema_version (frozen at 1.0.0). The contract document that defines these schemas carries its own contractVersion (0.3.0). They move independently: a breaking change to one file bumps only that file's schema and its schema_version const.
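A consumer that branches on schema_version might look like the sketch below. It assumes semver semantics, where only a major bump is breaking. This page implies that rule but does not spell it out. SUPPORTED_MAJORS and the function names are hypothetical.

```python
from __future__ import annotations

import json

# Hypothetical: the schema majors this consumer knows how to read.
SUPPORTED_MAJORS = {1}


def parse_semver(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def read_record(line: str) -> dict | None:
    """Parse one record line; return None when its schema major is unknown.

    Assumes semver: only a major bump is breaking, so any 1.x.y record is
    read with the 1.0.0 rules and any other major is skipped, not misread.
    """
    record = json.loads(line)
    major, _, _ = parse_semver(record["schema_version"])
    if major not in SUPPORTED_MAJORS:
        return None
    return record
```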

Storage layout

Publishing writes to object storage under a per-site root, bainquet/{siteHash}/ where siteHash = sha256(websiteId)[0:16]:

```text
bainquet/{siteHash}/
  versions/{ulid}/...      # immutable, per-version copies of every file
  cas/{sha[0:2]}/{sha}     # content-addressed dedupe store
  latest/...               # mutable serving copies
  latest/manifest.json     # THE pointer, flipped LAST
```
  • Immutable files (.jsonl, schema.jsonld, openapi.json) go to versions/{ulid}/ with a one-year immutable cache header and are deduped through the content-addressed cas/ store.
  • Mutable files (manifest.json, llms.txt, ai.json, sitemap-ai.xml, trust.json) go to latest/ with a short max-age plus stale-while-revalidate header.
  • Atomic flip: the publisher uploads all immutable files, patches the mutable files, and HEAD-verifies that every manifest-referenced file is present, failing with publish.atomic_conflict if any is missing. It then writes the mutable files to latest/ and, finally, writes latest/manifest.json as the single last mutating PUT. Any error before the flip leaves the previously published latest/ fully intact. A per-site Redis lock prevents concurrent publishes (publish.conflict).
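The publish order above can be sketched against an in-memory store. Store, AtomicConflict, and publish are hypothetical stand-ins for object storage and the publish.atomic_conflict error. The sketch leaves out cas/ dedupe, the mutable-file patch step, and the Redis lock.

```python
from __future__ import annotations


class AtomicConflict(Exception):
    """Stand-in for the publish.atomic_conflict error."""


class Store:
    """Hypothetical in-memory object store: key -> bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.puts: list[str] = []  # keys in the order they were written

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body
        self.puts.append(key)

    def head(self, key: str) -> bool:
        return key in self.objects


def publish(store: Store, root: str, ulid: str, immutable: dict[str, bytes],
            mutable: dict[str, bytes], manifest: bytes, referenced: list[str]) -> None:
    """Simplified flip: cas/ dedupe, the patch step and the site lock are omitted."""
    version_dir = f"{root}/versions/{ulid}"
    # 1. Upload every immutable file under versions/{ulid}/.
    for name, body in immutable.items():
        store.put(f"{version_dir}/{name}", body)
    # 2. HEAD-verify the referenced files; nothing under latest/ is touched yet.
    for name in referenced:
        if not store.head(f"{version_dir}/{name}"):
            raise AtomicConflict(name)
    # 3. Write the mutable serving copies (everything except the manifest).
    for name, body in mutable.items():
        store.put(f"{root}/latest/{name}", body)
    # 4. The flip: latest/manifest.json is the single last mutating PUT.
    store.put(f"{root}/latest/manifest.json", manifest)
```

In this sketch, a failed HEAD check raises before any latest/ key is written, so the previously published latest/ stays as it was.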

The ulid is the monotonic ordering key; the human label (2026-06-12.1) is display only. To pin a build, read from versions/{ulid}/; to always get the live build, read from latest/.
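The sketch below derives the per-site root and chooses between a pinned key and a live key. It assumes that [0:16] slices the hex digest rather than the raw digest bytes, and that websiteId is hashed as UTF-8. storage_key and newest are hypothetical helper names. The results are object-storage keys, not public serving URLs; this page does not describe how keys map to URLs.

```python
from __future__ import annotations

import hashlib


def site_hash(website_id: str) -> str:
    """siteHash = sha256(websiteId)[0:16], read here as the first 16 hex digits."""
    return hashlib.sha256(website_id.encode("utf-8")).hexdigest()[:16]


def storage_key(website_id: str, path: str, ulid: str | None = None) -> str:
    """Object key of a node file: pinned under versions/{ulid}/, or live under latest/."""
    root = f"bainquet/{site_hash(website_id)}"
    if ulid is not None:
        return f"{root}/versions/{ulid}/{path}"
    return f"{root}/latest/{path}"


def newest(ulids: list[str]) -> str:
    """ULIDs sort lexicographically in creation order, so the max is the newest build."""
    return max(ulids)
```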

Stable-id namespaces

Every exported record carries a stable, namespace-prefixed string id. Internal Postgres uuid primary keys never appear in node files. Each id is <prefix>_<hash>, where the hash is a website-scoped hash of the record's natural key, so the id is stable across versions for the same logical record.

| Prefix | Record | Natural key |
| --- | --- | --- |
| entity_ | entity | (entityType, canonicalKey) |
| fact_ | fact | (subject, predicate, language) |
| rel_ | relationship | (fromStableId, relationType, toRef) |
| chunk_ | AIChunk | canonical chunkId |
| qa_ | synthetic Q&A | (question, language) |
| src_ | source | url |
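The sketch below shows how a website-scoped hash of a natural key can yield a stable id. This page does not specify the hash function, how the website scope and the key parts are combined, or the hash length. Using SHA-256 over the website id and the key parts joined with U+001F, truncated to 16 hex digits, is an assumption made only to show the shape.

```python
from __future__ import annotations

import hashlib

PREFIXES = {"entity": "entity_", "fact": "fact_", "relationship": "rel_",
            "chunk": "chunk_", "qa": "qa_", "source": "src_"}


def stable_id(kind: str, website_id: str, *natural_key: str) -> str:
    """Return '<prefix>_<hash>' for a record's natural key.

    Assumption: SHA-256 over website id + key parts joined with U+001F,
    truncated to 16 hex digits. Only the id shape follows the documented scheme.
    """
    material = "\x1f".join((website_id, *natural_key)).encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()[:16]
    return PREFIXES[kind] + digest
```

The same site and natural key always produce the same id, so a record keeps its id across versions. Including the website id keeps identical keys on different sites apart.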

Cross-file references use these stable ids, never the uuid. For example facts.jsonl.subject_entity_id points to entities.jsonl.id, and chunks.jsonl.source_id points to sources.jsonl.id. A value matching a uuid where a stable id is required is a publish-integrity failure (node.uuid_leak).
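A publish-integrity check for node.uuid_leak could scan the reference fields for the canonical 8-4-4-4-12 uuid form. This is a minimal sketch. The field lists are assembled from the record schemas on this page and may not be exhaustive.

```python
from __future__ import annotations

import re

# Canonical 8-4-4-4-12 hex uuid form, case-insensitive.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)

# Fields that must hold stable ids (collected from this page; may be incomplete).
ID_FIELDS = ("id", "subject_entity_id", "from_entity_id", "to_entity_id", "source_id")
ID_LIST_FIELDS = ("entity_ids", "fact_ids")


def uuid_leaks(record: dict) -> list[str]:
    """Return the names of stable-id fields whose value looks like a uuid."""
    leaks = [f for f in ID_FIELDS
             if isinstance(record.get(f), str) and UUID_RE.match(record[f])]
    for f in ID_LIST_FIELDS:
        if any(isinstance(v, str) and UUID_RE.match(v) for v in record.get(f, [])):
            leaks.append(f)
    return leaks
```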

Universal record fields

Every derived-knowledge record line (entities, facts, relationships) carries this envelope. Purely structural files (sources, sitemap) omit confidence and provenance_method; chunks may carry source_id but not confidence or provenance_method.

| Field | Type | Notes |
| --- | --- | --- |
| id | string | stable, namespace-prefixed |
| schema_version | string (semver) | per-schema frozen version (1.0.0); a const per schema |
| language | string (BCP-47) | per-language files repeat the language they belong to |
| confidence | number in [0, 1] | provenance-derived (see the confidence ladder) |
| provenance_method | enum | one of cms_field, schema_org, seo_meta, text_extraction, llm_inferred |
| source_url | string (URI) | provenance |
| source_id | string | points to sources.jsonl.id |

Encoding for all files: UTF-8, LF line endings. Each .jsonl file is exactly one JSON object per line, no array wrapper, no blank lines, final line ends with \n. Monetary value is always numeric; the formatted string lives in display. Timestamps are RFC 3339 UTC.
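The encoding rules translate directly into a byte-level check. This is a minimal sketch: it reports problems rather than raising, and the function name is hypothetical.

```python
from __future__ import annotations

import json


def jsonl_problems(data: bytes) -> list[str]:
    """Check the documented .jsonl encoding rules on raw file bytes."""
    problems = []
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    if "\r" in text:
        problems.append("CR found; line endings must be LF")
    if not text.endswith("\n"):
        problems.append("final line must end with \\n")
    # After the last "\n" there is either nothing or an unterminated line
    # (already flagged above), so [:-1] keeps only the terminated lines.
    for n, line in enumerate(text.split("\n")[:-1], start=1):
        if not line.strip():
            problems.append(f"line {n}: blank line")
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            continue
        if not isinstance(value, dict):  # catches an array wrapper, too
            problems.append(f"line {n}: must be a JSON object")
    return problems
```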

Files produced today Stable

Each of these has a built exporter module in apps/api/src/modules/exporters/internal/exporters/. The free exporters always run with zero LLM. The advanced exporters (trust, openapi) gate on plan-derived capabilities.

manifest.json

The root index and the single entry point. It lists every file present (path, content type, checksum, byte size, and record count for .jsonl), so a consumer can fetch only what it needs and skip unchanged files between versions. It is emitted last; it references only files already uploaded in the same version directory. capabilities is plan-derived (never hand-set); trust_tier mirrors trust.json.trust_tier when present.

Required fields: schema_version, node_version (ULID sort key), brand, domain, website_id, generated_at, capabilities, languages, default_language, files. Each files[] entry has path, content_type, checksum (sha256:<64 hex>), bytes, immutable, and optionally records, language, produced_by.

```json
{"schema_version":"1.0.0","node_version":"01J9Z2K7QF8MABCD3","node_version_label":"2026-06-12.1","brand":"bainquet","domain":"bainquet.online","website_id":"acme-cafe","generated_at":"2026-06-12T08:30:00Z","capabilities":["facts","entities","relationships","chunks","qa","sources","sitemap"],"languages":["en","de"],"default_language":"en","trust_tier":"dns_txt","files":[{"path":"facts.en.jsonl","content_type":"application/x-ndjson","checksum":"sha256:3f1c0000000000000000000000000000000000000000000000000000000003f1","bytes":40213,"records":118,"language":"en","produced_by":"facts-jsonl@1.0.0","immutable":true},{"path":"manifest.json","content_type":"application/json","checksum":"sha256:aa100000000000000000000000000000000000000000000000000000000000aa","bytes":912,"immutable":false}]}
```
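A consumer can check a downloaded file against its files[] entry and skip files whose checksum did not change between versions. This sketch assumes the checksum is SHA-256 over the exact served bytes and that records equals the number of newline-terminated lines. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def verify_entry(entry: dict, body: bytes) -> list[str]:
    """Compare downloaded bytes against one manifest files[] entry."""
    problems = []
    if len(body) != entry["bytes"]:
        problems.append(f"{entry['path']}: expected {entry['bytes']} bytes, got {len(body)}")
    if "sha256:" + hashlib.sha256(body).hexdigest() != entry["checksum"]:
        problems.append(f"{entry['path']}: checksum mismatch")
    # records is only present for .jsonl files; one record per "\n"-terminated line.
    if "records" in entry and body.count(b"\n") != entry["records"]:
        problems.append(f"{entry['path']}: record count mismatch")
    return problems


def changed_paths(previous: dict, current: dict) -> list[str]:
    """Paths whose checksum differs from, or is absent in, the previous manifest."""
    old = {f["path"]: f["checksum"] for f in previous.get("files", [])}
    return sorted(f["path"] for f in current["files"] if old.get(f["path"]) != f["checksum"])
```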

ai.json

A machine-readable "about this node" descriptor: brand, website identity, the entry path to manifest.json, a description, the languages, license, contact, a refresh hint, and consumes, an ordered list of files that hints at a consumption order. Required: schema_version, brand, website_id, entry, license.

```json
{"schema_version":"1.0.0","brand":"bainquet","website_id":"acme-cafe","entry":"manifest.json","description":"Structured AI node for Acme Cafe","languages":["en","de"],"license":"CC-BY-4.0","contact":"hello@acme-cafe.example","refresh_hint_seconds":86400,"consumes":["facts.en.jsonl","entities.en.jsonl","chunks.en.jsonl"]}
```

llms.txt

A plain-text profile following the llms.txt convention: an H1 title, a blockquote summary, and a markdown link list to the key node files. It is described structurally, not JSON-Schema-validated.

```text
# Acme Cafe — AI Node
> Specialty coffee roaster and cafe in Lisbon. Structured for AI consumption by bAInquet.
## Files
- [Facts](facts.en.jsonl): typed facts
- [Entities](entities.en.jsonl): people, products, places
```
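Because llms.txt is described structurally rather than schema-validated, a consumer might extract the link list with a regular expression. This is a rough sketch. It handles only the `- [Title](path): description` form shown above, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# One link-list item: "- [Title](path)" with an optional ": description".
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def llms_links(text: str) -> list[tuple[str, str, str]]:
    """Extract (title, path, description) triples from llms.txt link items."""
    links = []
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:
            links.append((m.group(1), m.group(2), m.group(3) or ""))
    return links
```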

entities.jsonl (per-language: entities.<lang>.jsonl)

One entity per line: a thing the site is about. Required: id (entity_*), schema_version, type, name, language, confidence, provenance_method. The open attributes bag is the only field that permits extension.

```json
{"id":"entity_017","schema_version":"1.0.0","type":"Product","name":"Single-Origin Ethiopia","language":"en","confidence":0.95,"provenance_method":"schema_org","source_id":"src_004","attributes":{"roast":"light"}}
```

facts.jsonl (per-language: facts.<lang>.jsonl)

One typed subject-predicate-value statement per line. The value stays typed (string, number, boolean, or object); monetary values stay numeric, with the formatted string in display. Required: id (fact_*), schema_version, subject, predicate, value, language, confidence, provenance_method. Optional: unit, currency, display, subject_entity_id (entity_*), importance (1-5), freshness (stable/seasonal/frequent/real_time), valid_from, valid_until.

Editorial and internal fields (editorialStatus, pinned, reviewedBy, originalValue) are never exported. A hidden fact is never emitted. Only the surviving fact per (subject, predicate) is emitted; superseded and merged facts are dropped.

```json
{"id":"fact_042","schema_version":"1.0.0","subject":"Single-Origin Ethiopia","predicate":"price","value":16.99,"unit":"EUR","currency":"EUR","display":"16.99 EUR","subject_entity_id":"entity_017","importance":4,"freshness":"seasonal","language":"en","confidence":0.98,"provenance_method":"cms_field","source_id":"src_004"}
```

relationships.jsonl (per-language: relationships.<lang>.jsonl)

One typed edge per line between two entities. Required: id (rel_*), schema_version, type, from_entity_id (entity_*), to_entity_id (entity_*), resolution_status, language, confidence, provenance_method. Only resolved edges are emitted; dangling edges are suppressed until the relationship-resolution job resolves them.

```json
{"id":"rel_009","schema_version":"1.0.0","type":"PRODUCED_BY","from_entity_id":"entity_017","to_entity_id":"entity_003","resolution_status":"resolved","language":"en","confidence":0.9,"provenance_method":"text_extraction","source_id":"src_004"}
```

chunks.jsonl (per-language: chunks.<lang>.jsonl)

One retrieval-sized text chunk per line. Required: id (chunk_*), schema_version, text, tokens_estimate, language. Optional: entity_ids (referenced entity_* ids), source_id. The token estimate is computed once at creation. Embedding vectors are never inlined here.

```json
{"id":"chunk_125","schema_version":"1.0.0","text":"Acme Cafe roasts single-origin Ethiopian beans...","tokens_estimate":58,"entity_ids":["entity_017"],"language":"en","source_id":"src_004"}
```

qa.jsonl (per-language: qa.<lang>.jsonl)

One synthetic question/answer pair per line, generated deterministically on the free tier. Required: id (qa_*), schema_version, question, answer, language. Optional: fact_ids (the fact_* rows the answer was derived from), entity_ids, confidence, provenance_method. Stale Q&A is excluded entirely; the internal stale flag is never serialized.

```json
{"id":"qa_031","schema_version":"1.0.0","question":"How much is the Single-Origin Ethiopia?","answer":"16.99 EUR.","fact_ids":["fact_042"],"language":"en","confidence":0.98,"provenance_method":"cms_field"}
```

sources.jsonl

One source per line: the provenance record that every source_id points back to. The file is not split per language (sources are shared across languages), though each row carries its own language. Required: id (src_*), schema_version, url, content_type, last_seen_at, checksum, language, trust. Optional: canonical_url, title. The trust value is derived from the strength of the verification method.

```json
{"id":"src_004","schema_version":"1.0.0","url":"https://acme-cafe.example/menu","canonical_url":"https://acme-cafe.example/menu","title":"Menu","content_type":"page","last_seen_at":"2026-06-12T07:55:00Z","checksum":"sha256:9b2e0000000000000000000000000000000000000000000000000000000009b2","language":"en","trust":0.9}
```

schema.jsonld

A schema.org JSON-LD projection of high-importance entities and facts for general crawlers. Required: @context, @graph. Each @graph member has @type (a schema.org type) and @id (resolving to a node-local entity_* or fact_* id).

```json
{"@context":"https://schema.org","@graph":[{"@type":"Product","@id":"#entity_017","name":"Single-Origin Ethiopia","offers":{"@type":"Offer","price":"16.99","priceCurrency":"EUR"}}]}
```

sitemap-ai.xml

An AI-oriented sitemap (urlset) listing node files and key pages with lastmod and an ai:importance extension (1-5). The default namespace is the sitemaps schema; the bAInquet extension namespace is https://bainquet.online/ns/ai. Described structurally, not JSON-Schema-validated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:ai="https://bainquet.online/ns/ai">
  <url><loc>https://acme-cafe.example/.well-known/ai/facts.en.jsonl</loc><lastmod>2026-06-12</lastmod><ai:importance>4</ai:importance></url>
</urlset>
```
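Reading the sitemap requires namespace-aware lookups for both the sitemaps namespace and the bAInquet extension namespace. This sketch uses the standard library's ElementTree. Treating a missing ai:importance as 0 is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "ai": "https://bainquet.online/ns/ai",
}


def ranked_urls(xml_data: str | bytes) -> list[tuple[str, int]]:
    """Return (loc, ai:importance) pairs, most important first."""
    root = ET.fromstring(xml_data)
    pairs = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        # Assumption: a missing ai:importance counts as 0.
        importance = int(url.findtext("ai:importance", default="0", namespaces=NS))
        pairs.append((loc, importance))
    # sorted() is stable, so entries with equal importance keep document order.
    return sorted(pairs, key=lambda p: -p[1])
```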

trust.json Stable (advanced, gated)

The node's verification and trust posture, used by consumers to weight a node. Required: schema_version, verification_method, trust_tier, verification_state, trust_score. Optional: last_checked_at, failure_count. The tier ranking, highest to lowest, is dns_txt, plugin_signed, well_known, meta_tag, manual. verification_state is one of pending, verified, failed, grace, revoked. The tier is mirrored into manifest.json.trust_tier.

```json
{"schema_version":"1.0.0","verification_method":"dns_txt","trust_tier":"dns_txt","verification_state":"verified","last_checked_at":"2026-06-12T06:00:00Z","failure_count":0,"trust_score":0.95}
```
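The tier ranking can be encoded as an ordered list. The usable_trust policy below counts trust_score only for a verified node. It is a hypothetical consumer choice, not part of the trust.json contract.

```python
from __future__ import annotations

# Highest to lowest, as documented for trust.json.trust_tier.
TIER_ORDER = ["dns_txt", "plugin_signed", "well_known", "meta_tag", "manual"]


def tier_rank(tier: str) -> int:
    """0 is the strongest tier; unknown tiers sort after all known ones."""
    return TIER_ORDER.index(tier) if tier in TIER_ORDER else len(TIER_ORDER)


def usable_trust(trust: dict) -> float:
    """Hypothetical policy: only a verified node contributes its trust_score."""
    return trust["trust_score"] if trust["verification_state"] == "verified" else 0.0
```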

openapi.json Stable (advanced, gated)

A standard OpenAPI 3.1 document describing the website's own API surface, when present. Validated against the OpenAPI 3.1 meta-schema, not a bAInquet record schema. The only bAInquet constraints are a thin wrapper: openapi must be 3.1.x, and info.x-bainquet-node-version should echo the node version label.

```json
{"openapi":"3.1.0","info":{"title":"Acme Cafe API","version":"1.0.0","x-bainquet-node-version":"2026-06-12.1"},"paths":{}}
```
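The two wrapper constraints can be checked separately from full OpenAPI 3.1 meta-schema validation. This is a minimal sketch, and the function name is hypothetical.

```python
from __future__ import annotations

import re


def openapi_wrapper_problems(doc: dict, node_version_label: str) -> list[str]:
    """Check only the bAInquet wrapper; meta-schema validation is a separate step."""
    problems = []
    if not re.fullmatch(r"3\.1\.\d+", str(doc.get("openapi", ""))):
        problems.append("openapi must be 3.1.x")
    echoed = doc.get("info", {}).get("x-bainquet-node-version")
    if echoed != node_version_label:
        # The rule says "should", so a caller may treat this entry as a warning.
        problems.append("info.x-bainquet-node-version does not echo the node version label")
    return problems
```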

Advanced exporters Stable

These files are built and emitted when the plan-derived capability is present (advanced/business; embeddings-manifest.json additionally requires the Q-11 embeddings flag). llms-full.txt is a free, always-emitted file.

| File | Purpose |
| --- | --- |
| llms-full.txt | Expanded, deterministic human-narrative dump of high-importance facts and entities (free, zero LLM). |
| embeddings-manifest.json | Maps chunk_* ids to engine-stored pgvector rows or a sidecar; vectors are never inlined. |
| priority-feed.jsonl | Ranked feed of the highest-importance records for crawlers that fetch a subset first. |
| changes.jsonl | Change feed versus the previous published version. |
| versions.json | History index of published versions, ULID-ordered. |
| citations.jsonl | Per-record citation provenance for attribution-aware consumers. |
| mcp.json | A Model Context Protocol manifest advertising the node as an MCP resource surface. |

Documented limits

A few advanced files carry limits the publish pipeline tightens over time: versions.json lists the current version (full persisted history and prior manifest checksums are backfilled by the publisher), and changes.jsonl reports file-level changes versus the previous version (record-level diffing is a planned refinement). The files are emitted and schema-valid today.

Determinism

Given identical graph content and a frozen build timestamp, exporter output is byte-identical. Exporters never use the wall clock, RNG, or map-iteration order; they sort by stable keys. This is what lets the cas/ store copy unchanged files forward by reference and lets golden-file tests pin the output.
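One way to get byte-identical output is to sort records by a stable key and serialize with fixed options. In this sketch, alphabetical key order stands in for whatever fixed field order the real exporters use; the examples on this page list id first, so they are evidently not alphabetical. The cas/ key is assumed to be the SHA-256 hex of the file bytes. Both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def serialize_jsonl(records: list[dict]) -> bytes:
    """Byte-stable JSONL: records sorted by id, keys sorted, fixed separators, LF endings."""
    lines = [
        json.dumps(r, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for r in sorted(records, key=lambda r: r["id"])
    ]
    return "".join(line + "\n" for line in lines).encode("utf-8")


def cas_key(root: str, body: bytes) -> str:
    """Content-addressed key cas/{sha[0:2]}/{sha}, assuming sha = SHA-256 hex of the body."""
    sha = hashlib.sha256(body).hexdigest()
    return f"{root}/cas/{sha[:2]}/{sha}"
```

Because equal content yields equal bytes, it also yields an equal cas/ key, which is what lets an unchanged file be carried forward by reference.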

Hard rules enforced on export

  • A hidden fact is never emitted.
  • Only the surviving fact per (subject, predicate) is emitted; superseded and merged facts are dropped.
  • Only resolved relationships are emitted; dangling edges are suppressed.
  • Stale Q&A is excluded entirely.
  • Editorial and internal fields (editorialStatus, pinned, reviewedBy, originalValue) are never serialized.
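The hard rules can be read as a filter pass that runs just before serialization. The internal flags hidden and status, and their values, are hypothetical stand-ins for the engine's real columns. The stale flag is documented, but its exact shape is assumed. Only the outcomes come from the rules above.

```python
from __future__ import annotations

# Documented never-serialized fields, plus the internal flags used below.
NEVER_EXPORTED = {"editorialStatus", "pinned", "reviewedBy", "originalValue",
                  "stale", "hidden", "status"}


def strip_internal(row: dict) -> dict:
    return {k: v for k, v in row.items() if k not in NEVER_EXPORTED}


def export_facts(rows: list[dict]) -> list[dict]:
    # Hypothetical flags: 'hidden' (bool) and 'status' ('active', 'superseded', 'merged').
    # Assumes supersede/merge resolution already marked every non-surviving fact.
    return [strip_internal(r) for r in rows
            if not r.get("hidden") and r.get("status", "active") == "active"]


def export_relationships(rows: list[dict]) -> list[dict]:
    # Dangling (unresolved) edges are suppressed.
    return [strip_internal(r) for r in rows if r.get("resolution_status") == "resolved"]


def export_qa(rows: list[dict]) -> list[dict]:
    # Stale Q&A is excluded entirely; strip_internal also drops the 'stale' flag.
    return [strip_internal(r) for r in rows if not r.get("stale")]
```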

Owner-controlled structured data for AI.