Node files
A Knowledge Node is a set of static files served from a CDN. This page documents every file a published node contains: its purpose, its shape, and a real example, plus the stable-id namespaces and the storage layout.
The entry point is always manifest.json. A consumer reads manifest.json, then ai.json, then the record files it needs, branching on each record's schema_version. See Consuming a node for the full read order, and Knowledge graph for the model these files are projected from.
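As a rough illustration of that read order, the sketch below walks a node through a caller-supplied fetch function. The read_node name, the fetch callable, and the decision to skip records whose major version is unknown are assumptions of this example, not part of the contract.

```python
import json

SUPPORTED_MAJOR = 1  # assumption: this consumer only understands 1.x.y schemas


def read_node(fetch):
    """Read a node in the documented order.

    fetch(path) -> str is a hypothetical callable returning a file's text,
    for example an HTTP GET against latest/ or versions/{ulid}/.
    """
    manifest = json.loads(fetch("manifest.json"))  # 1. the entry point
    about = json.loads(fetch("ai.json"))           # 2. the node descriptor
    records, skipped = [], 0
    for entry in manifest["files"]:                # 3. the record files
        if not entry["path"].endswith(".jsonl"):
            continue
        for line in fetch(entry["path"]).rstrip("\n").split("\n"):
            if not line:
                continue
            record = json.loads(line)
            major = int(record["schema_version"].split(".")[0])
            if major == SUPPORTED_MAJOR:
                records.append(record)
            else:
                skipped += 1  # unknown major version: skip instead of guessing
    return about, records, skipped
```

Passing a dictionary's lookup method as fetch is enough to exercise the flow without a network.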
Two version axes
Each file embeds a per-schema schema_version (frozen at 1.0.0). The contract document that defines these schemas carries its own contractVersion (0.3.0). They move independently: a breaking change to one file bumps only that file's schema and its schema_version const.
Storage layout
Publishing writes to object storage under a per-site root, bainquet/{siteHash}/ where siteHash = sha256(websiteId)[0:16]:
bainquet/{siteHash}/
versions/{ulid}/... # immutable, per-version copies of every file
cas/{sha[0:2]}/{sha} # content-addressed dedupe store
latest/... # mutable serving copies
latest/manifest.json # THE pointer, flipped LAST
- Immutable files (.jsonl, schema.jsonld, openapi.json) go to versions/{ulid}/ with a one-year immutable cache header and are deduped through the content-addressed cas/ store.
- Mutable files (manifest.json, llms.txt, ai.json, sitemap-ai.xml, trust.json) go to latest/ with a short max-age plus stale-while-revalidate header.
- Atomic flip: the publisher uploads all immutable files, patches the mutable files, HEAD-verifies that every manifest-referenced file is present (publish.atomic_conflict otherwise), writes the mutable files to latest/, and finally writes latest/manifest.json as the single last mutating PUT. Any error before the flip leaves the previously published latest/ fully intact. A per-site Redis lock prevents concurrent publishes (publish.conflict).
The ulid is the monotonic ordering key; the human label (2026-06-12.1) is display only. To pin a build, read from versions/{ulid}/; to always get the live build, read from latest/.
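The path rules above can be sketched in a few lines. This is a minimal illustration that assumes the sha256 is taken over the UTF-8 bytes of websiteId and hex-encoded, so [0:16] means the first 16 hex characters, and that {sha} in the cas/ path is the hex sha256 of the file bytes. The function names are hypothetical.

```python
import hashlib


def site_root(website_id: str) -> str:
    # Assumption: sha256 over the UTF-8 bytes of websiteId, hex-encoded,
    # truncated to the first 16 hex characters.
    site_hash = hashlib.sha256(website_id.encode("utf-8")).hexdigest()[:16]
    return f"bainquet/{site_hash}/"


def cas_key(root: str, content: bytes) -> str:
    # Assumption: {sha} is the hex sha256 of the file bytes.
    sha = hashlib.sha256(content).hexdigest()
    return f"{root}cas/{sha[:2]}/{sha}"


def pinned_path(root: str, ulid: str, path: str) -> str:
    # Immutable copy of one specific build.
    return f"{root}versions/{ulid}/{path}"


def live_path(root: str, path: str) -> str:
    # Mutable serving copy of whatever build is currently live.
    return f"{root}latest/{path}"
```

A consumer that needs reproducible reads would resolve files with pinned_path; one that always wants the current build would use live_path.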
Stable-id namespaces
Every exported record carries a stable, namespace-prefixed string id. Internal Postgres uuid primary keys never appear in node files. Each id is <prefix>_<hash>, where the hash is a website-scoped hash of the record's natural key, so the id is stable across versions for the same logical record.
| Prefix | Record | Natural key |
|---|---|---|
| entity_ | entity | (entityType, canonicalKey) |
| fact_ | fact | (subject, predicate, language) |
| rel_ | relationship | (fromStableId, relationType, toRef) |
| chunk_ | AIChunk | canonical chunkId |
| qa_ | synthetic Q&A | (question, language) |
| src_ | source | url |
Cross-file references use these stable ids, never the uuid. For example facts.jsonl.subject_entity_id points to entities.jsonl.id, and chunks.jsonl.source_id points to sources.jsonl.id. A value matching a uuid where a stable id is required is a publish-integrity failure (node.uuid_leak).
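A consumer or a test can approximate the uuid-leak check with a pattern match over the id-bearing fields. The sketch below is an assumption about how such a check might look, not the publisher's actual integrity code. The field list is collected from the record descriptions on this page, and find_uuid_leaks is a hypothetical name.

```python
import re

# Canonical 8-4-4-4-12 hex form of a uuid.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

# Fields that hold one stable id or a list of stable ids.
ID_FIELDS = (
    "id", "subject_entity_id", "from_entity_id", "to_entity_id",
    "source_id", "entity_ids", "fact_ids",
)


def find_uuid_leaks(record: dict) -> list:
    """Return the names of id fields whose value looks like a raw uuid."""
    leaks = []
    for field in ID_FIELDS:
        value = record.get(field)
        values = value if isinstance(value, list) else [value]
        if any(isinstance(v, str) and UUID_RE.fullmatch(v) for v in values):
            leaks.append(field)
    return leaks
```

An empty list means no id field in the record matched the uuid shape.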
Universal record fields
Every derived-knowledge record line (entities, facts, relationships) carries this envelope. Purely structural files (sources, sitemap) omit confidence/provenance_method; chunks may carry source_id but not confidence/provenance_method.
| Field | Type | Notes |
|---|---|---|
| id | string | stable, namespace-prefixed |
| schema_version | string (semver) | per-schema frozen version (1.0.0); a const per schema |
| language | string (BCP-47) | per-language files repeat the language they belong to |
| confidence | number [0,1] | provenance-derived (see the confidence ladder) |
| provenance_method | enum | cms_field, schema_org, seo_meta, text_extraction, llm_inferred |
| source_url | string (uri) | provenance |
| source_id | string | points to sources.jsonl.id |
Encoding for all files: UTF-8, LF line endings. Each .jsonl file is exactly one JSON object per line, no array wrapper, no blank lines, final line ends with \n. Monetary value is always numeric; the formatted string lives in display. Timestamps are RFC 3339 UTC.
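The encoding rules translate into a strict reader fairly directly. The following is a minimal sketch of such a reader, with parse_jsonl as a hypothetical name. It rejects anything that breaks the stated rules instead of trying to recover.

```python
import json


def parse_jsonl(data: bytes) -> list:
    """Parse a .jsonl file, enforcing the encoding rules stated above."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    if "\r" in text:
        raise ValueError("CR found: LF line endings are required")
    if not text.endswith("\n"):
        raise ValueError("final line must end with a newline")
    records = []
    # Drop the final newline, then every remaining line must be one object.
    for number, line in enumerate(text[:-1].split("\n"), start=1):
        if not line.strip():
            raise ValueError(f"line {number}: blank lines are not allowed")
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        records.append(record)
    return records
```

Every failure surfaces as a ValueError (UnicodeDecodeError and json.JSONDecodeError are both subclasses), so a caller can catch a single exception type.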
Files produced today Stable
Each of these has a built exporter module in apps/api/src/modules/exporters/internal/exporters/. The free exporters always run with zero LLM. The advanced exporters (trust, openapi) gate on plan-derived capabilities.
manifest.json
The root index and the single entry point. It lists every file present (path, content type, checksum, byte size, and record count for .jsonl), so a consumer can fetch only what it needs and skip unchanged files between versions. It is emitted last; it references only files already uploaded in the same version directory. capabilities is plan-derived (never hand-set); trust_tier mirrors trust.json.trust_tier when present.
Required fields: schema_version, node_version (ULID sort key), brand, domain, website_id, generated_at, capabilities, languages, default_language, files. Each files[] entry has path, content_type, checksum (sha256:<64 hex>), bytes, immutable, and optionally records, language, produced_by.
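The checksum and bytes fields are what make selective fetching safe. The sketch below shows one way a consumer might verify a downloaded file against its files[] entry and work out which paths changed between two manifests. It assumes the checksum is the hex sha256 of the raw file bytes, and both function names are hypothetical.

```python
import hashlib


def verify_file(entry: dict, body: bytes) -> bool:
    """Check a downloaded body against its manifest files[] entry."""
    algorithm, _, expected = entry["checksum"].partition(":")
    if algorithm != "sha256" or len(expected) != 64:
        return False
    if len(body) != entry["bytes"]:
        return False
    # Assumption: the digest covers the raw bytes exactly as served.
    return hashlib.sha256(body).hexdigest() == expected


def changed_paths(old_manifest: dict, new_manifest: dict) -> list:
    """Paths that are new, or whose checksum differs, in the newer manifest."""
    old = {f["path"]: f["checksum"] for f in old_manifest["files"]}
    return [
        f["path"]
        for f in new_manifest["files"]
        if old.get(f["path"]) != f["checksum"]
    ]
```

A consumer holding the previous manifest would fetch only the paths returned by changed_paths and verify each body before using it.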
{"schema_version":"1.0.0","node_version":"01J9Z2K7QF8MABCD3","node_version_label":"2026-06-12.1","brand":"bainquet","domain":"bainquet.online","website_id":"acme-cafe","generated_at":"2026-06-12T08:30:00Z","capabilities":["facts","entities","relationships","chunks","qa","sources","sitemap"],"languages":["en","de"],"default_language":"en","trust_tier":"dns_txt","files":[{"path":"facts.en.jsonl","content_type":"application/x-ndjson","checksum":"sha256:3f1c0000000000000000000000000000000000000000000000000000000003f1","bytes":40213,"records":118,"language":"en","produced_by":"facts-jsonl@1.0.0","immutable":true},{"path":"manifest.json","content_type":"application/json","checksum":"sha256:aa100000000000000000000000000000000000000000000000000000000000aa","bytes":912,"immutable":false}]}
ai.json
A machine "about this node" descriptor: brand, website identity, the entry path to manifest.json, a description, the languages, license, contact, a refresh hint, and an ordered consumes consumption hint. Required: schema_version, brand, website_id, entry, license.
{"schema_version":"1.0.0","brand":"bainquet","website_id":"acme-cafe","entry":"manifest.json","description":"Structured AI node for Acme Cafe","languages":["en","de"],"license":"CC-BY-4.0","contact":"hello@acme-cafe.example","refresh_hint_seconds":86400,"consumes":["facts.en.jsonl","entities.en.jsonl","chunks.en.jsonl"]}
llms.txt
A plain-text profile following the llms.txt convention: an H1 title, a blockquote summary, and a markdown link list to the key node files. It is described structurally, not JSON-Schema-validated.
# Acme Cafe — AI Node
> Specialty coffee roaster and cafe in Lisbon. Structured for AI consumption by bAInquet.
## Files
- [Facts](facts.en.jsonl): typed facts
- [Entities](entities.en.jsonl): people, products, places
entities.jsonl (per-language: entities.<lang>.jsonl)
One entity per line: a thing the site is about. Required: id (entity_*), schema_version, type, name, language, confidence, provenance_method. The open attributes bag is the only field that permits extension.
{"id":"entity_017","schema_version":"1.0.0","type":"Product","name":"Single-Origin Ethiopia","language":"en","confidence":0.95,"provenance_method":"schema_org","source_id":"src_004","attributes":{"roast":"light"}}
facts.jsonl (per-language: facts.<lang>.jsonl)
One typed subject-predicate-value statement per line. The value stays typed (string, number, boolean, or object); monetary values stay numeric, with the formatted string in display. Required: id (fact_*), schema_version, subject, predicate, value, language, confidence, provenance_method. Optional: unit, currency, display, subject_entity_id (entity_*), importance (1-5), freshness (stable/seasonal/frequent/real_time), valid_from, valid_until.
Editorial and internal fields (editorialStatus, pinned, reviewedBy, originalValue) are never exported. A hidden fact is never emitted. Only the surviving fact per (subject, predicate) is emitted; superseded and merged facts are dropped.
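The field rules for a fact line can be checked with a small validator. This is a hedged sketch built only from the constraints listed above. It is not the JSON Schema the exporter validates against, and fact_problems is a hypothetical helper name.

```python
REQUIRED = (
    "id", "schema_version", "subject", "predicate", "value",
    "language", "confidence", "provenance_method",
)
PROVENANCE = {"cms_field", "schema_org", "seo_meta", "text_extraction", "llm_inferred"}
FRESHNESS = {"stable", "seasonal", "frequent", "real_time"}


def fact_problems(fact: dict) -> list:
    """Return human-readable problems; an empty list means the line looks valid."""
    problems = [f"missing {name}" for name in REQUIRED if name not in fact]
    if "id" in fact and not str(fact["id"]).startswith("fact_"):
        problems.append("id must start with fact_")
    confidence = fact.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, (int, float)) and 0 <= confidence <= 1
    ):
        problems.append("confidence must be a number in [0, 1]")
    if "provenance_method" in fact and fact["provenance_method"] not in PROVENANCE:
        problems.append("unknown provenance_method")
    value = fact.get("value")
    if "currency" in fact and (isinstance(value, bool) or not isinstance(value, (int, float))):
        problems.append("monetary value must be numeric; the formatted string belongs in display")
    if "importance" in fact and fact["importance"] not in range(1, 6):
        problems.append("importance must be 1-5")
    if "freshness" in fact and fact["freshness"] not in FRESHNESS:
        problems.append("unknown freshness")
    return problems
```

Treating the presence of currency as the signal that a value is monetary is an assumption of this sketch.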
{"id":"fact_042","schema_version":"1.0.0","subject":"Single-Origin Ethiopia","predicate":"price","value":16.99,"unit":"EUR","currency":"EUR","display":"16.99 EUR","subject_entity_id":"entity_017","importance":4,"freshness":"seasonal","language":"en","confidence":0.98,"provenance_method":"cms_field","source_id":"src_004"}
relationships.jsonl (per-language: relationships.<lang>.jsonl)
One typed edge per line between two entities. Required: id (rel_*), schema_version, type, from_entity_id (entity_*), to_entity_id (entity_*), resolution_status, language, confidence, provenance_method. Only resolved edges are emitted; dangling edges are suppressed until the relationship-resolution job resolves them.
{"id":"rel_009","schema_version":"1.0.0","type":"PRODUCED_BY","from_entity_id":"entity_017","to_entity_id":"entity_003","resolution_status":"resolved","language":"en","confidence":0.9,"provenance_method":"text_extraction","source_id":"src_004"}
chunks.jsonl (per-language: chunks.<lang>.jsonl)
One retrieval-sized text chunk per line. Required: id (chunk_*), schema_version, text, tokens_estimate, language. Optional: entity_ids (referenced entity_* ids), source_id. The token estimate is computed once at creation. Embedding vectors are never inlined here.
{"id":"chunk_125","schema_version":"1.0.0","text":"Acme Cafe roasts single-origin Ethiopian beans...","tokens_estimate":58,"entity_ids":["entity_017"],"language":"en","source_id":"src_004"}
qa.jsonl (per-language: qa.<lang>.jsonl)
One synthetic question/answer pair per line, generated deterministically on the free tier. Required: id (qa_*), schema_version, question, answer, language. Optional: fact_ids (the fact_* rows the answer was derived from), entity_ids, confidence, provenance_method. Stale Q&A is excluded entirely; the internal stale flag is never serialized.
{"id":"qa_031","schema_version":"1.0.0","question":"How much is the Single-Origin Ethiopia?","answer":"16.99 EUR.","fact_ids":["fact_042"],"language":"en","confidence":0.98,"provenance_method":"cms_field"}
sources.jsonl
One source per line: the provenance record that every source_id points back to. The file is language-agnostic (sources are shared across languages), though each row carries its own language. Required: id (src_*), schema_version, url, content_type, last_seen_at, checksum, language, trust. Optional: canonical_url, title. The trust value is derived from the strength of the verification method.
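Because every source_id must resolve to a row in sources.jsonl, a consumer can sanity-check a node by looking for references that point nowhere. The helper below is an illustrative sketch. Its name and signature are assumptions, and it works for any single-id or list-of-ids field.

```python
def dangling_references(records: list, field: str, targets: list) -> list:
    """Return (record id, missing ref) pairs for refs absent from targets."""
    known = {target["id"] for target in targets}
    missing = []
    for record in records:
        value = record.get(field, [])
        # Normalize single-id fields (source_id) and list fields (fact_ids).
        refs = value if isinstance(value, list) else [value]
        missing.extend((record["id"], ref) for ref in refs if ref not in known)
    return missing
```

The same call shape covers the other cross-file links, for example checking subject_entity_id on facts against the parsed entities.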
{"id":"src_004","schema_version":"1.0.0","url":"https://acme-cafe.example/menu","canonical_url":"https://acme-cafe.example/menu","title":"Menu","content_type":"page","last_seen_at":"2026-06-12T07:55:00Z","checksum":"sha256:9b2e0000000000000000000000000000000000000000000000000000000009b2","language":"en","trust":0.9}
schema.jsonld
A schema.org JSON-LD projection of high-importance entities and facts for general crawlers. Required: @context, @graph. Each @graph member has @type (a schema.org type) and @id (resolving to a node-local entity_* or fact_* id).
{"@context":"https://schema.org","@graph":[{"@type":"Product","@id":"#entity_017","name":"Single-Origin Ethiopia","offers":{"@type":"Offer","price":"16.99","priceCurrency":"EUR"}}]}
sitemap-ai.xml
An AI-oriented sitemap (urlset) listing node files and key pages with lastmod and an ai:importance extension (1-5). The default namespace is the sitemaps schema; the bAInquet extension namespace is https://bainquet.online/ns/ai. Described structurally, not JSON-Schema-validated.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:ai="https://bainquet.online/ns/ai">
<url><loc>https://acme-cafe.example/.well-known/ai/facts.en.jsonl</loc><lastmod>2026-06-12</lastmod><ai:importance>4</ai:importance></url>
</urlset>
trust.json Stable (advanced, gated)
The node's verification and trust posture, used by consumers to weight a node. Required: schema_version, verification_method, trust_tier, verification_state, trust_score. Optional: last_checked_at, failure_count. The tier ranking, highest to lowest, is dns_txt, plugin_signed, well_known, meta_tag, manual. verification_state is one of pending, verified, failed, grace, revoked. The tier is mirrored into manifest.json.trust_tier.
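One way a consumer might apply the tier ranking is to require a minimum tier before trusting a node. The sketch below encodes the documented order. Counting only nodes whose verification_state is verified is a policy assumption of this example, not something the contract prescribes, and meets_tier is a hypothetical name.

```python
# Documented ranking, highest to lowest.
TIER_ORDER = ("dns_txt", "plugin_signed", "well_known", "meta_tag", "manual")


def meets_tier(trust: dict, minimum: str) -> bool:
    """True when the node is verified and its tier ranks at or above minimum."""
    # Assumption: pending, failed, grace and revoked nodes never qualify.
    if trust.get("verification_state") != "verified":
        return False
    # A lower index in TIER_ORDER means a higher-ranked tier.
    return TIER_ORDER.index(trust["trust_tier"]) <= TIER_ORDER.index(minimum)
```

A stricter or looser consumer would change only the state check; the ordering itself comes from the list above.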
{"schema_version":"1.0.0","verification_method":"dns_txt","trust_tier":"dns_txt","verification_state":"verified","last_checked_at":"2026-06-12T06:00:00Z","failure_count":0,"trust_score":0.95}
openapi.json Stable (advanced, gated)
A standard OpenAPI 3.1 document describing the website's own API surface, when present. Validated against the OpenAPI 3.1 meta-schema, not a bAInquet record schema. The only bAInquet constraints are a thin wrapper: openapi must be 3.1.x, and info.x-bainquet-node-version should echo the node version label.
{"openapi":"3.1.0","info":{"title":"Acme Cafe API","version":"1.0.0","x-bainquet-node-version":"2026-06-12.1"},"paths":{}}
Advanced exporters Stable
These files are built and emitted when the plan-derived capability is present (advanced/business; embeddings-manifest.json additionally requires the Q-11 embeddings flag). The exception is llms-full.txt, which is free and always emitted.
| File | Purpose |
|---|---|
| llms-full.txt | Expanded, deterministic human-narrative dump of high-importance facts and entities (free, zero LLM). |
| embeddings-manifest.json | Maps chunk_* ids to engine-stored pgvector rows or a sidecar; vectors are never inlined. |
| priority-feed.jsonl | Ranked feed of the highest-importance records for crawlers that fetch a subset first. |
| changes.jsonl | Change feed versus the previous published version. |
| versions.json | History index of published versions, ULID-ordered. |
| citations.jsonl | Per-record citation provenance for attribution-aware consumers. |
| mcp.json | A Model Context Protocol manifest advertising the node as an MCP resource surface. |
Documented limits
A few advanced files carry limits the publish pipeline tightens over time: versions.json lists the current version (full persisted history and prior manifest checksums are backfilled by the publisher), and changes.jsonl reports file-level changes versus the previous version (record-level diffing is a planned refinement). The files are emitted and schema-valid today.
Determinism
Given identical graph content and a frozen build timestamp, exporter output is byte-identical. Exporters never use the wall clock, RNG, or map-iteration order; they sort by stable keys. This is what lets the cas/ store copy unchanged files forward by reference and lets golden-file tests pin the output.
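To make the idea concrete, the sketch below serializes records so that neither input order nor key order can leak into the bytes. Sorting by id and using sorted keys with compact separators are assumptions chosen for illustration; the real exporters may use a different field order, and export_jsonl is a hypothetical name.

```python
import json


def export_jsonl(records: list) -> bytes:
    """Serialize records so equal content always yields equal bytes."""
    # Sort by the stable id so input order cannot affect the output.
    ordered = sorted(records, key=lambda record: record["id"])
    lines = [
        # sort_keys removes any dependence on dict insertion order.
        json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for record in ordered
    ]
    # One object per line, LF endings, final line terminated.
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

Two inputs holding the same records in different orders produce identical bytes, and therefore the same sha256, which is the property a content-addressed store relies on.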
Hard rules enforced on export
- A hidden fact is never emitted.
- Only the surviving fact per (subject, predicate) is emitted; superseded and merged facts are dropped.
- Only resolved relationships are emitted; dangling edges are suppressed.
- Stale Q&A is excluded entirely.
- Editorial and internal fields (editorialStatus, pinned, reviewedBy, originalValue) are never serialized.
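The fact-related rules can be pictured as a filter applied just before serialization. The sketch below is illustrative only. How the internal model marks a fact as hidden, superseded, or merged is not documented here, so the editorialStatus value "hidden" and the lifecycle field are hypothetical stand-ins.

```python
INTERNAL_FIELDS = {"editorialStatus", "pinned", "reviewedBy", "originalValue"}


def exportable_facts(rows: list) -> list:
    """Apply the fact-related export rules to internal fact rows."""
    exported = []
    for row in rows:
        # Hypothetical encoding: a hidden fact has editorialStatus == "hidden".
        if row.get("editorialStatus") == "hidden":
            continue
        # Hypothetical field: lifecycle marks superseded or merged facts.
        if row.get("lifecycle") in {"superseded", "merged"}:
            continue
        # Strip editorial and internal fields before serialization.
        exported.append({
            key: value
            for key, value in row.items()
            if key not in INTERNAL_FIELDS and key != "lifecycle"
        })
    return exported
```

The relationship and Q&A rules follow the same pattern: drop the row when its status disqualifies it, then strip internal fields from what remains.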
Related
- Knowledge graph: the model these files project from.
- Consuming a node: the read order for an AI consumer.
- Status and scope: what is shipped versus planned.