Consuming a node
This page is for the read side: how an LLM, agent, crawler, or MCP client reads a published Knowledge Node and gets accurate, owner-verified data without scraping HTML.
A node is a set of static files on a CDN. There is exactly one entry point, a fixed recommended read order, and a clear choice between pinning an immutable version and following the live pointer.
Start at the manifest
manifest.json is the entry point. It lists every file present with its checksum, byte size, and record count, so a consumer can decide what to fetch before fetching it. Read the manifest first, then branch on what you need.
When the node publishes ai.json, it is the friendlier starting descriptor. Its entry field points to manifest.json, and its consumes field is an ordered hint of which record files to read.
{
  "schema_version": "1.0.0",
  "brand": "bainquet",
  "website_id": "acme-cafe",
  "entry": "manifest.json",
  "description": "Structured AI node for Acme Cafe",
  "languages": ["en", "de"],
  "license": "CC-BY-4.0",
  "contact": "hello@acme-cafe.example",
  "refresh_hint_seconds": 86400,
  "consumes": ["facts.en.jsonl", "entities.en.jsonl", "chunks.en.jsonl"]
}
Two different "manifests"
The per-site node manifest.json described here is the entry point for one website's published node. It is distinct from the platform discovery manifest at https://api.bainquet.online/.well-known/bainquet/manifest.json, which is for connector authors and links to the integration recipe, OpenAPI, and schemas. If you are reading a website's knowledge, you want the node manifest.json.
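The following is a minimal sketch of starting from ai.json. It assumes the descriptor is served from the same base URL as the rest of the node, and that entry is a path relative to that base. The helpers entry_url and is_stale are illustrative, not part of the node format. So is the policy of treating refresh_hint_seconds as a cache lifetime.

```python
import json
from urllib.parse import urljoin

# The sample ai.json descriptor from this page, trimmed to the fields used here.
AI_JSON = """{"schema_version": "1.0.0", "website_id": "acme-cafe",
  "entry": "manifest.json", "languages": ["en", "de"],
  "refresh_hint_seconds": 86400,
  "consumes": ["facts.en.jsonl", "entities.en.jsonl", "chunks.en.jsonl"]}"""


def entry_url(base: str, ai: dict) -> str:
    """Resolve ai.json's entry field against the node base URL (hypothetical helper)."""
    return urljoin(base, ai["entry"])


def is_stale(fetched_at: float, now: float, ai: dict) -> bool:
    """Assumed policy: refetch once refresh_hint_seconds have passed since fetched_at."""
    return now - fetched_at >= ai["refresh_hint_seconds"]


ai = json.loads(AI_JSON)
base = "https://cdn.bainquet.online/bainquet/<siteHash>/latest/"
manifest_url = entry_url(base, ai)   # the node manifest to fetch next
read_order = ai["consumes"]          # suggested record-file order
```

The refresh hint is advisory, so a consumer can pick a shorter or longer lifetime than the one this sketch uses.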
Recommended read order
1. ai.json (or manifest.json directly): identify the node, its languages, default language, license, and refresh hint. Read ai.json.consumes for the suggested file order.
2. manifest.json: get the authoritative files[] list. Pick the per-language variants you need (facts.en.jsonl, not facts.jsonl, when the node is multilingual). Use each entry's checksum to skip files you already have, and its records and bytes to budget the fetch.
3. trust.json (when present) or manifest.json.trust_tier: decide how much to trust this node before using its data. See trust weighting below.
4. The record files you need, branching on each record's schema_version:
   - facts.jsonl for typed claims (price, hours, attributes), each with confidence, provenance_method, and a source_id.
   - entities.jsonl for the things facts are about; join facts.jsonl.subject_entity_id to entities.jsonl.id.
   - chunks.jsonl for retrieval-sized text passages, each linking to entity_ids and a source_id.
   - qa.jsonl for ready-made question/answer pairs with fact_ids provenance.
   - relationships.jsonl for typed edges between entities (only resolved edges are present).
   - sources.jsonl to resolve any source_id back to its URL, checksum, and per-source trust.
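Step 2 can be sketched as a pure function over the manifest's files[] list. The sketch assumes each entry carries the path and bytes fields described above. The following are all hypothetical:

- the fallback from facts.en.jsonl to an unsuffixed facts.jsonl
- the byte-budget policy
- the names pick_variant and plan_fetch
- the sample entries, which are made-up values

```python
from typing import Optional


def pick_variant(files: list, kind: str, lang: Optional[str]) -> Optional[dict]:
    """Return the manifest entry for a record kind such as "facts".

    Prefers the per-language file (facts.en.jsonl) and falls back to the
    unsuffixed one (facts.jsonl); returns None if the node lists neither.
    """
    by_path = {f["path"]: f for f in files}
    if lang is not None and f"{kind}.{lang}.jsonl" in by_path:
        return by_path[f"{kind}.{lang}.jsonl"]
    return by_path.get(f"{kind}.jsonl")


def plan_fetch(files: list, kinds: list, lang: Optional[str], max_bytes: int):
    """Choose files in priority order, stopping before the byte budget is exceeded."""
    plan, total = [], 0
    for kind in kinds:
        entry = pick_variant(files, kind, lang)
        if entry is None:
            continue  # this node does not publish that record type
        if total + entry["bytes"] > max_bytes:
            break  # over budget: stop rather than skip ahead to later kinds
        plan.append(entry["path"])
        total += entry["bytes"]
    return plan, total


# Made-up manifest entries for illustration only.
files = [
    {"path": "facts.en.jsonl", "bytes": 12_000, "records": 80},
    {"path": "facts.de.jsonl", "bytes": 11_500, "records": 80},
    {"path": "entities.en.jsonl", "bytes": 4_000, "records": 12},
    {"path": "sources.jsonl", "bytes": 2_000, "records": 5},
    {"path": "chunks.en.jsonl", "bytes": 500_000, "records": 900},
]
plan, total = plan_fetch(files, ["facts", "entities", "sources", "chunks"], "en", 100_000)
```

Here the large chunks file would exceed the budget, so the plan stops after facts, entities, and sources.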
A minimal consumer can answer most questions from facts.jsonl + entities.jsonl + sources.jsonl. A retrieval pipeline adds chunks.jsonl. A Q&A assistant can serve qa.jsonl directly.
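The minimal facts + entities + sources consumer comes down to two in-memory joins. The page documents the join from facts.jsonl.subject_entity_id to entities.jsonl.id, but not the key field in sources.jsonl. This sketch therefore assumes sources.jsonl records are keyed by an id field that source_id refers to. The sample records, including the url field name, are hypothetical.

```python
import json


def index_by_id(lines):
    """Parse JSONL lines into a dict keyed by each record's id field."""
    index = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            index[record["id"]] = record
    return index


def resolve_fact(fact, entities, sources):
    """Attach the subject entity and the source record to one fact (None if missing)."""
    return {
        "fact": fact,
        "entity": entities.get(fact["subject_entity_id"]),
        "source": sources.get(fact["source_id"]),
    }


# Hypothetical sample lines standing in for entities.jsonl and sources.jsonl.
entities = index_by_id(['{"id": "ent_1", "name": "Acme Cafe"}'])
sources = index_by_id(['{"id": "src_1", "url": "https://acme-cafe.example/menu", "trust": 0.8}'])

fact = {"subject_entity_id": "ent_1", "source_id": "src_1",
        "predicate": "price", "value": "3.50", "confidence": 0.95}
resolved = resolve_fact(fact, entities, sources)
```

In a real consumer the lines would be streamed from the CDN rather than held in lists. Entities and sources are usually small enough to index fully, while facts are processed one at a time.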
Read records as line-delimited JSON
Every .jsonl file is one JSON object per line, no array wrapper. Stream it line by line. Each line is independent and validates against its own schema, so a consumer never has to load a whole file into memory.
import json
import urllib.request

base = "https://cdn.bainquet.online/bainquet/<siteHash>/latest/"
with urllib.request.urlopen(base + "manifest.json") as resp:
    manifest = json.load(resp)

# Pick the facts file for the default language from the manifest,
# falling back to the unsuffixed facts.jsonl on a single-language node.
lang = manifest["default_language"]
paths = {f["path"] for f in manifest["files"]}
facts_path = f"facts.{lang}.jsonl"
if facts_path not in paths:
    facts_path = "facts.jsonl"

# Stream it line by line; each line is one complete JSON record.
with urllib.request.urlopen(base + facts_path) as resp:
    for raw in resp:
        if not raw.strip():
            continue  # skip blank lines
        fact = json.loads(raw)
        if fact["confidence"] >= 0.9:
            print(fact["subject"], fact["predicate"], fact.get("display", fact["value"]))

Branch on schema_version per record. A node may carry files at different schema versions during a rollout, and unknown fields must be ignored for forward compatibility.
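One way to honor both rules is to dispatch on the major version of each record's schema_version, and keep only the fields the consumer understands. This is a sketch under the following assumptions:

- schema_version is a semver string like the 1.0.0 in ai.json.
- A change of major version is treated as breaking.
- SUPPORTED_MAJOR, KNOWN_FACT_FIELDS, and read_fact are hypothetical names.

```python
import json
from typing import Optional

SUPPORTED_MAJOR = 1  # assumption: the schema generation this consumer was written for
KNOWN_FACT_FIELDS = {"subject", "subject_entity_id", "predicate", "value", "display",
                     "confidence", "provenance_method", "source_id"}


def major(version: str) -> int:
    """Major component of a semver string, e.g. 1 for "1.0.0"."""
    return int(version.split(".", 1)[0])


def read_fact(raw: str) -> Optional[dict]:
    """Parse one facts.jsonl line; return None if its schema is not supported."""
    record = json.loads(raw)
    version = record.get("schema_version")
    if version is None or major(version) != SUPPORTED_MAJOR:
        return None  # skip records from a schema generation we do not understand
    # Ignore unknown fields instead of failing on them (forward compatibility).
    return {k: v for k, v in record.items() if k in KNOWN_FACT_FIELDS}


newer_minor = read_fact('{"schema_version": "1.3.0", "predicate": "hours", '
                        '"value": "08:00-18:00", "confidence": 0.92, "new_field": true}')
next_major = read_fact('{"schema_version": "2.0.0", "predicate": "hours"}')
```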
Trust weighting
Before acting on a node's data, weight it:
- trust.json.verification_state should be verified. A node in the grace, failed, or revoked state has lapsed or lost its ownership proof; treat its data with caution or skip it.
- trust.json.trust_tier (mirrored in manifest.json.trust_tier) ranks how ownership was proven, highest to lowest: dns_txt, plugin_signed, well_known, meta_tag, manual.
- Per record, provenance_method and confidence tell you the source quality. On the free tier everything is deterministic (cms_field, schema_org, seo_meta, text_extraction). On paid tiers, llm_inferred records are model-generated and carry the lower confidence band; separate them if your use case needs only owner-supplied data.
- Per source, sources.jsonl.trust carries a derived score you can use to rank competing claims.
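The checks above might be combined as follows. The verification gate and the tier order come straight from this page, and the sketch takes the strict option of skipping any node that is not verified. The following are assumptions for illustration:

- A higher sources.jsonl.trust is better.
- Claims are ranked by source trust first and record confidence second.
- The helper names are hypothetical.

```python
TIER_ORDER = ["dns_txt", "plugin_signed", "well_known", "meta_tag", "manual"]  # strongest first


def node_usable(trust: dict) -> bool:
    """Gate on trust.json: only a verified node is used at all (strict policy)."""
    return trust.get("verification_state") == "verified"


def tier_rank(tier: str) -> int:
    """0 for the strongest proof; unknown tiers rank below every known one."""
    return TIER_ORDER.index(tier) if tier in TIER_ORDER else len(TIER_ORDER)


def rank_claims(facts: list, source_trust: dict, owner_only: bool = False) -> list:
    """Order competing facts by source trust, then record confidence, highest first.

    With owner_only=True, model-generated (llm_inferred) records are dropped.
    """
    pool = [f for f in facts
            if not (owner_only and f.get("provenance_method") == "llm_inferred")]
    return sorted(pool,
                  key=lambda f: (source_trust.get(f["source_id"], 0.0), f["confidence"]),
                  reverse=True)


# Made-up competing claims for the same predicate.
facts = [
    {"source_id": "src_a", "confidence": 0.70, "provenance_method": "cms_field"},
    {"source_id": "src_b", "confidence": 0.95, "provenance_method": "llm_inferred"},
    {"source_id": "src_a", "confidence": 0.90, "provenance_method": "schema_org"},
]
source_trust = {"src_a": 0.9, "src_b": 0.5}
usable = node_usable({"verification_state": "verified", "trust_tier": "dns_txt"})
ranked = rank_claims(facts, source_trust)
owner_only = rank_claims(facts, source_trust, owner_only=True)
```

Weighting source trust above record confidence is only one reasonable choice. A consumer could instead multiply the two, or fold in tier_rank for the node as a whole.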
Immutable versions versus the live pointer
The CDN serves two views of every node under bainquet/{siteHash}/:
- latest/ is the mutable serving copy. latest/manifest.json is the pointer flipped last on every publish, with a short cache plus stale-while-revalidate. Read from latest/ to always get the current node.
- versions/{ulid}/ holds an immutable, per-version copy of every file, served with a one-year immutable cache header. Read the node_version ULID from the manifest, then read from versions/{ulid}/ to pin a build that will never change under you. This is the right choice for reproducible runs, caching, and citations.
The ulid in node_version is the monotonic ordering key; the node_version_label (2026-06-12.1) is display only. Compare ULIDs, never labels, to tell which of two builds is newer.
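Choosing between two builds, and pinning one, can be sketched like this. The sketch assumes the ULIDs are canonical 26-character Crockford base32 strings, whose lexicographic order follows creation time. The URL layout mirrors the bainquet/{siteHash}/versions/{ulid}/ path described above. In practice the two ULIDs would come from the node_version of two manifests. newer_version and pinned_base are hypothetical helpers, and the sample ULIDs are made up.

```python
def newer_version(a: str, b: str) -> str:
    """Return whichever of two node_version ULIDs is newer.

    Canonical ULIDs are 26 Crockford base32 characters with the timestamp first,
    so comparing them as uppercase strings orders them by creation time.
    Labels such as 2026-06-12.1 must not be compared this way.
    """
    for u in (a, b):
        if len(u) != 26:
            raise ValueError(f"not a 26-character ULID: {u!r}")
    return a if a.upper() > b.upper() else b


def pinned_base(site_hash: str, ulid: str) -> str:
    """Base URL for the immutable copy of one build under versions/{ulid}/."""
    return f"https://cdn.bainquet.online/bainquet/{site_hash}/versions/{ulid}/"


older = "01HZX3K8QF" + "0" * 16
newer = "01J0A2B3C4" + "0" * 16
latest = newer_version(older, newer)
url = pinned_base("<siteHash>", latest) + "facts.en.jsonl"
```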
Because immutable files are content-addressed and deduped, an unchanged file keeps the same checksum across versions. A consumer that caches by manifest.json.files[].checksum can skip refetching any file whose checksum has not moved.
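A checksum-keyed cache makes that skip explicit. This sketch keeps file bytes in a dict keyed by manifest checksum. It takes the fetch function as a parameter so it can run without network access. sync and http_fetcher are hypothetical names, and the checksums in the demo are placeholders rather than the real checksum format.

```python
import urllib.request
from typing import Callable, Dict, List


def sync(files: List[dict], cache: Dict[str, bytes],
         fetch: Callable[[str], bytes]) -> List[str]:
    """Download only files whose checksum is not cached; return the fetched paths.

    Because cache is keyed by checksum, a file that did not change between two
    node versions is reused whichever versions/{ulid}/ directory it came from.
    """
    fetched = []
    for entry in files:
        if entry["checksum"] in cache:
            continue
        cache[entry["checksum"]] = fetch(entry["path"])
        fetched.append(entry["path"])
    return fetched


def http_fetcher(base: str) -> Callable[[str], bytes]:
    """Fetcher for a real node base URL, e.g. a versions/{ulid}/ directory (unused below)."""
    def fetch(path: str) -> bytes:
        with urllib.request.urlopen(base + path) as resp:
            return resp.read()
    return fetch


def fake_fetch(path: str) -> bytes:
    """Offline stand-in for http_fetcher, used only in this demo."""
    return f"contents of {path}".encode()


cache: Dict[str, bytes] = {}
v1 = [{"path": "facts.en.jsonl", "checksum": "c-facts-1"},
      {"path": "entities.en.jsonl", "checksum": "c-ent-1"}]
v2 = [{"path": "facts.en.jsonl", "checksum": "c-facts-2"},   # changed
      {"path": "entities.en.jsonl", "checksum": "c-ent-1"}]  # unchanged
first = sync(v1, cache, fake_fetch)
second = sync(v2, cache, fake_fetch)
```

The second sync downloads only the changed facts file; the entities file is served from the cache.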
Reading through an MCP server
For paid websites, bAInquet runs an MCP server that exposes one website's verified graph to MCP clients (Claude Desktop, IDEs, agents) as read-only tools over stdio JSON-RPC. Instead of fetching files, the client calls tools:
| Tool | Arguments | Returns |
|---|---|---|
| search_website_knowledge | websiteId, query, limit? (1-50, default 10) | ranked entity, fact, and chunk hits |
| get_entity_facts | websiteId, entityId | the merge-resolved facts for one entity (hidden facts excluded) |
| list_node_files | websiteId | the file list of the latest published node version |
Every tool requires a websiteId and returns only that website's data. All tools are read-only; none mutate the graph. A limit out of range is clamped, not rejected.
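For orientation, here is roughly what one tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool is invoked with the tools/call method, whose params carry the tool name and its arguments. Over stdio, each message is a single line of JSON. The sketch omits the initialize handshake a client performs first. tool_call is a hypothetical helper, and the websiteId value is a placeholder borrowed from the ai.json sample; whether MCP uses that same identifier is not stated here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must not repeat within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits no newlines, as stdio framing requires.
    return json.dumps(message)


line = tool_call("search_website_knowledge",
                 {"websiteId": "acme-cafe", "query": "opening hours", "limit": 5})
```

A real client would normally rely on an MCP client library for the handshake and framing rather than building messages by hand.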
A typical Claude Desktop configuration points at the server binary and a database URL:
{
"mcpServers": {
"bainquet": {
"command": "node",
"args": ["/absolute/path/to/bainquet/apps/mcp/dist/server.js"],
"env": {
"DATABASE_URL": "postgres://bainquet:bainquet_local_pw@localhost:5432/bainquet"
}
}
}
}
In production, each paid website gets its own logical MCP endpoint; the cap.mcp plan gate is checked before any tool runs.
Files or MCP
Use the CDN files for broad, cacheable, public consumption (crawlers, batch pipelines, citations). Use MCP for interactive, query-driven access where the client wants to search and drill into one website's graph live.
Related
- Node files: every file's shape and example.
- Knowledge graph: the provenance and trust model behind the data.
- Status and scope: which read paths are shipped.