
Consuming a node

This page covers the read side: how an LLM, agent, crawler, or MCP client reads a published Knowledge Node and gets accurate, owner-verified data without scraping HTML.

A node is a set of static files on a CDN. There is exactly one entry point, a fixed recommended read order, and a clear choice between pinning an immutable version and following the live pointer.

Start at the manifest

manifest.json is the entry point. It lists every file present with its checksum, byte size, and record count, so a consumer can decide what to fetch before fetching it. Read the manifest first, then branch on what you need.

If the node publishes ai.json, it is the friendlier starting descriptor: its entry field points to manifest.json, and its consumes field is an ordered hint of which record files to read.

json
{
  "schema_version": "1.0.0",
  "brand": "bainquet",
  "website_id": "acme-cafe",
  "entry": "manifest.json",
  "description": "Structured AI node for Acme Cafe",
  "languages": ["en", "de"],
  "license": "CC-BY-4.0",
  "contact": "hello@acme-cafe.example",
  "refresh_hint_seconds": 86400,
  "consumes": ["facts.en.jsonl", "entities.en.jsonl", "chunks.en.jsonl"]
}
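
As a rough sketch of following the descriptor, the snippet below parses an ai.json payload, resolves its entry field against the node's base URL, and lists the consumes files in order. The base URL and the plan_reads helper are illustrative assumptions, not part of the published format.

```python
import json
from urllib.parse import urljoin

# Sample descriptor, trimmed from the ai.json example above.
AI_JSON = (
    '{"entry":"manifest.json",'
    '"consumes":["facts.en.jsonl","entities.en.jsonl","chunks.en.jsonl"]}'
)

def plan_reads(ai_json_text, base_url):
    """Return (manifest_url, ordered record-file URLs) for a descriptor.

    plan_reads is a hypothetical helper; base_url must end with '/'.
    """
    descriptor = json.loads(ai_json_text)
    manifest_url = urljoin(base_url, descriptor["entry"])
    # consumes is only a hint, so tolerate its absence.
    record_urls = [urljoin(base_url, name)
                   for name in descriptor.get("consumes", [])]
    return manifest_url, record_urls

# Hypothetical base URL, used only for illustration.
manifest_url, record_urls = plan_reads(AI_JSON, "https://cdn.example/node/latest/")
print(manifest_url)
for url in record_urls:
    print(url)
```

The manifest remains authoritative: treat consumes as a suggested order and confirm each file against manifest.json before fetching it.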

Two different "manifests"

The per-site node manifest.json described here is the entry point for one website's published node. It is distinct from the platform discovery manifest at https://api.bainquet.online/.well-known/bainquet/manifest.json, which is for connector authors and links to the integration recipe, OpenAPI, and schemas. If you are reading a website's knowledge, you want the node manifest.json.

Recommended read order

  1. ai.json (or manifest.json directly): identify the node, its languages, default language, license, and refresh hint. Read ai.json.consumes for the suggested file order.
  2. manifest.json: get the authoritative files[] list. Pick the per-language variants you need (facts.en.jsonl, not facts.jsonl, when the node is multilingual). Use each entry's checksum to skip files you already have, and records and bytes to budget the fetch.
  3. trust.json (when present) or manifest.json.trust_tier: decide how much to trust this node before using its data. See trust weighting below.
  4. The record files you need, branching on each record's schema_version:
    • facts.jsonl for typed claims (price, hours, attributes), each with confidence, provenance_method, and a source_id.
    • entities.jsonl for the things facts are about; join facts.jsonl.subject_entity_id to entities.jsonl.id.
    • chunks.jsonl for retrieval-sized text passages, each linking to entity_ids and a source_id.
    • qa.jsonl for ready-made question/answer pairs with fact_ids provenance.
    • relationships.jsonl for typed edges between entities (only resolved edges are present).
    • sources.jsonl to resolve any source_id back to its URL, checksum, and per-source trust.

A minimal consumer can answer most questions from facts.jsonl + entities.jsonl + sources.jsonl. A retrieval pipeline adds chunks.jsonl. A Q&A assistant can serve qa.jsonl directly.
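
The minimal consumer described above could be sketched as an in-memory join. The sample records are hypothetical, and the name and url fields and the use of id as the key in sources.jsonl are assumptions made for illustration; only subject_entity_id, source_id, and entities.jsonl.id come from the text.

```python
import json

# Hypothetical sample lines; real records carry more fields.
ENTITIES = '{"id":"ent_1","name":"Acme Cafe"}\n'
SOURCES = '{"id":"src_1","url":"https://acme-cafe.example/menu"}\n'
FACTS = (
    '{"subject_entity_id":"ent_1","predicate":"price","value":"3.50","source_id":"src_1"}\n'
    '{"subject_entity_id":"ent_9","predicate":"hours","value":"9-17","source_id":"src_1"}\n'
)

def index_by_id(jsonl_text):
    """Parse JSONL text into a dict keyed by each record's id."""
    records = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return {record["id"]: record for record in records}

def join_facts(facts_text, entities, sources):
    """Yield (entity, fact, source) for facts whose entity is loaded."""
    for line in facts_text.splitlines():
        if not line.strip():
            continue
        fact = json.loads(line)
        entity = entities.get(fact["subject_entity_id"])
        if entity is None:
            continue  # fact about an entity this consumer did not load
        yield entity, fact, sources.get(fact["source_id"])

entities = index_by_id(ENTITIES)
sources = index_by_id(SOURCES)
joined = list(join_facts(FACTS, entities, sources))
for entity, fact, source in joined:
    print(entity["name"], fact["predicate"], fact["value"],
          source["url"] if source else None)
```

Entities and sources are usually much smaller than facts, so indexing them first and streaming facts past the indexes keeps memory use modest.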

Read records as line-delimited JSON

Every .jsonl file is one JSON object per line, no array wrapper. Stream it line by line. Each line is independent and validates against its own schema, so a consumer never has to load a whole file into memory.

python
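import json
import urllib.request

# <siteHash> is a placeholder; substitute the hash of the node you are reading.
base = "https://cdn.bainquet.online/bainquet/<siteHash>/latest/"
with urllib.request.urlopen(base + "manifest.json") as resp:
    manifest = json.load(resp)

# Pick the facts file for the default language from the manifest.
lang = manifest["default_language"]
facts_file = next((f for f in manifest["files"]
                   if f["path"] == f"facts.{lang}.jsonl"), None)
if facts_file is None:
    raise SystemExit(f"no facts.{lang}.jsonl listed in the manifest")

# Stream it line by line; the response yields one bytes line at a time.
with urllib.request.urlopen(base + facts_file["path"]) as resp:
    for raw in resp:
        if not raw.strip():
            continue  # tolerate a trailing blank line
        fact = json.loads(raw)
        if fact["confidence"] >= 0.9:
            print(fact["subject"], fact["predicate"], fact.get("display", fact["value"]))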

Branch on schema_version per record. A node may carry files at different schema versions during a rollout, and unknown fields must be ignored for forward compatibility.
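
One way to branch per record is a small dispatch table keyed on the version, as sketched below. Keying on the major component of schema_version and skipping unknown versions are assumptions about a reasonable policy, and parse_fact_v1 is a hypothetical parser; neither is prescribed by the format.

```python
import json

def parse_fact_v1(record):
    """Keep only the fields this consumer understands; ignore the rest."""
    wanted = ("subject", "predicate", "value")
    return {key: record[key] for key in wanted if key in record}

# Map a major version to its parser (assumed policy, see lead-in).
PARSERS = {"1": parse_fact_v1}

def parse_line(raw):
    """Parse one JSONL line, or return None for an unsupported version."""
    record = json.loads(raw)
    major = str(record.get("schema_version", "")).split(".")[0]
    parser = PARSERS.get(major)
    if parser is None:
        return None  # unknown version: skip rather than guess
    return parser(record)

lines = [
    '{"schema_version":"1.0.0","subject":"Acme Cafe","predicate":"price",'
    '"value":"3.50","new_field":true}',
    '{"schema_version":"9.0.0","subject":"Acme Cafe"}',
]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

The first record carries an unrecognized new_field, which the parser drops silently; the second has a version the consumer does not know, so it is skipped.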

Trust weighting

Before acting on a node's data, weight it:

  • trust.json.verification_state should be verified. A node in the grace, failed, or revoked state has a lapsed or lost ownership proof; treat its data with caution or skip it.
  • trust.json.trust_tier (mirrored in manifest.json.trust_tier) ranks how ownership was proven, highest to lowest: dns_txt, plugin_signed, well_known, meta_tag, manual.
  • Per record, provenance_method and confidence tell you the source quality. On the free tier everything is deterministic (cms_field, schema_org, seo_meta, text_extraction). On paid tiers, llm_inferred records are model-generated and carry the lower confidence band; separate them if your use case needs only owner-supplied data.
  • Per source, sources.jsonl.trust carries a derived score you can use to rank competing claims.
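
The bullets above could be turned into a simple weighting policy like the following. The linear scale, the choice to give any non-verified node a weight of zero, and both helper names are illustrative assumptions; the tier order and field names are taken from the text.

```python
# Tier order from the text, highest to lowest.
TIER_ORDER = ["dns_txt", "plugin_signed", "well_known", "meta_tag", "manual"]

def node_weight(trust):
    """Return a weight in [0, 1] for a node; 0 means skip it.

    The linear scale is an illustrative policy, not part of the format.
    """
    if trust.get("verification_state") != "verified":
        return 0.0  # grace, failed, or revoked: this sketch skips the node
    tier = trust.get("trust_tier")
    if tier not in TIER_ORDER:
        return 0.0  # unknown tier: be conservative
    rank = TIER_ORDER.index(tier)
    return 1.0 - rank / len(TIER_ORDER)

def drop_llm_inferred(facts):
    """Keep only records whose provenance is not model-generated."""
    return [f for f in facts if f.get("provenance_method") != "llm_inferred"]

trust = {"verification_state": "verified", "trust_tier": "well_known"}
facts = [
    {"predicate": "price", "provenance_method": "cms_field", "confidence": 0.95},
    {"predicate": "vibe", "provenance_method": "llm_inferred", "confidence": 0.6},
]
print(node_weight(trust), drop_llm_inferred(facts))
```

A stricter consumer might also multiply the node weight by each record's confidence, or down-weight rather than skip nodes in the grace state.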

Immutable versions versus the live pointer

The CDN serves two views of every node under bainquet/{siteHash}/:

  • latest/ is the mutable serving copy. latest/manifest.json acts as the pointer: it is flipped last on every publish and is served with a short cache plus stale-while-revalidate. Read from latest/ to always get the current node.
  • versions/{ulid}/ holds an immutable, per-version copy of every file, served with a one-year immutable cache header. Read the node_version ULID from the manifest, then read from versions/{ulid}/ to pin a build that will never change under you. This is the right choice for reproducible runs, caching, and citations.

The ULID in node_version is the monotonic ordering key; the node_version_label (for example, 2026-06-12.1) is for display only. Compare ULIDs, never labels, to tell which of two builds is newer.
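
Because canonical ULIDs sort lexicographically in time order, comparing two builds can be a plain string comparison, as in this sketch. The two ULIDs are the sample values from the public ULID specification, the site hash abc123 is a placeholder, and both helper names are hypothetical; the URL layout follows the bainquet/{siteHash}/versions/{ulid}/ pattern described above.

```python
def newer_version(ulid_a, ulid_b):
    """Return whichever of two ULIDs identifies the later build.

    Canonical ULIDs sort lexicographically in time order, so string
    comparison is enough once case is normalized.
    """
    return ulid_a if ulid_a.upper() >= ulid_b.upper() else ulid_b

def pinned_url(site_hash, ulid, path):
    """Build the immutable URL for one file of one pinned version."""
    return (f"https://cdn.bainquet.online/bainquet/"
            f"{site_hash}/versions/{ulid}/{path}")

older = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
newer = "01BX5ZZKBKACTAV9WEVGEMMVRZ"
latest = newer_version(older, newer)
print(latest)
print(pinned_url("abc123", latest, "manifest.json"))
```

Storing the pinned URL alongside any answer derived from the node gives a citation that keeps resolving to the same bytes.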

Because immutable files are content-addressed and deduplicated, an unchanged file keeps the same checksum across versions. A consumer that caches by manifest.json.files[].checksum can skip refetching any file whose checksum has not changed.
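
A checksum-keyed cache check might look like the sketch below. The manifest excerpt is hypothetical and the checksum strings are placeholders whose format is an assumption; the path, checksum, bytes, and records fields are the ones the manifest is described as carrying.

```python
# Hypothetical manifest excerpt; checksum strings are placeholders.
manifest = {"files": [
    {"path": "facts.en.jsonl", "checksum": "sha256:bbb", "bytes": 2048, "records": 40},
    {"path": "entities.en.jsonl", "checksum": "sha256:aaa", "bytes": 1024, "records": 12},
]}

# Checksums of files already held locally from an earlier version.
cached = {"sha256:aaa"}

def files_to_fetch(manifest, cached_checksums):
    """Return manifest entries whose content is not already cached."""
    return [entry for entry in manifest["files"]
            if entry["checksum"] not in cached_checksums]

pending = files_to_fetch(manifest, cached)
total_bytes = sum(entry["bytes"] for entry in pending)
print([entry["path"] for entry in pending], total_bytes)
```

Summing bytes over the pending entries gives the download budget for the refresh before any record file is requested.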

Reading through an MCP server

For paid websites, bAInquet runs an MCP server that exposes one website's verified graph to MCP clients (Claude Desktop, IDEs, agents) as read-only tools over stdio JSON-RPC. Instead of fetching files, the client calls tools:

| Tool | Arguments | Returns |
| --- | --- | --- |
| search_website_knowledge | websiteId, query, limit? (1-50, default 10) | ranked entity, fact, and chunk hits |
| get_entity_facts | websiteId, entityId | the merge-resolved facts for one entity (hidden facts excluded) |
| list_node_files | websiteId | the file list of the latest published node version |

Every tool requires a websiteId and returns only that website's data. All tools are read-only; none mutate the graph. A limit out of range is clamped, not rejected.
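
For orientation, a tool invocation over stdio is a JSON-RPC 2.0 tools/call request, which MCP clients normally build for you. The sketch below constructs one by hand; the websiteId value reuses the website_id from the ai.json example as an assumption, the query is made up, and clamp_limit only mirrors on the client side the clamping the server is described as doing. A real session also needs the MCP initialize handshake first, which is omitted here.

```python
import json

def clamp_limit(limit, low=1, high=50):
    """Mirror the documented 1-50 clamp on the client side."""
    return max(low, min(high, limit))

def tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 tools/call request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

request = tool_call(1, "search_website_knowledge", {
    "websiteId": "acme-cafe",   # assumed to match website_id in ai.json
    "query": "opening hours",   # illustrative query
    "limit": clamp_limit(200),  # out of range, so clamped to 50
})
line = json.dumps(request)  # stdio transport sends one JSON message per line
print(line)
```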

A typical Claude Desktop configuration points at the server binary and a database URL:

json
{
  "mcpServers": {
    "bainquet": {
      "command": "node",
      "args": ["/absolute/path/to/bainquet/apps/mcp/dist/server.js"],
      "env": {
        "DATABASE_URL": "postgres://bainquet:bainquet_local_pw@localhost:5432/bainquet"
      }
    }
  }
}

In production, each paid website gets its own logical MCP endpoint; the cap.mcp plan gate is checked before any tool runs.

Files or MCP

Use the CDN files for broad, cacheable, public consumption (crawlers, batch pipelines, citations). Use MCP for interactive, query-driven access where the client wants to search and drill into one website's graph live.
