# graffiti — full AI ingestion corpus

graffiti turns a code repository into a directed knowledge graph an AI coding assistant can query instead of grepping. Single static Go binary. Zero API calls, $0, fully offline, byte-deterministic.

Author: Yevgeniy Achin. License: Source-Available (reuse-by-permission).
Repo: https://github.com/amazopic/graffiti — Site: https://amazopic.github.io/graffiti/

## What it produces

`graffiti .` writes into `/.graffiti/`:

- `map.json` — the graph itself: nodes (files, functions, methods, types, modules) and edges (calls, definitions, imports), plus detected communities. Schema-checked against a published contract (`schema/map.schema.json`). This is what an assistant reads and what `query` and the MCP server traverse.
- `MAP.md` — a human-readable digest: top modules, the most-connected nodes, and the three most interesting questions the map can answer.
- `map.html` — a single self-contained, offline, interactive force-directed graph rendered in the browser. CSP-safe; no CDN, no server, no network. It has a 2D/3D toggle (hover lifts a node and its neighbours), node search, click-to-copy `file:line`, sector zones, client/tests/external toggles, and a resizable project → directory → file tree. Just open the file.
- `cache/` — a per-file content-hash cache so re-runs only re-parse what changed.

## How it works (no LLM, no network)

It parses each source file with tree-sitter grammars compiled into the binary via the pure-Go runtime `github.com/odvcencio/gotreesitter` (no CGO, no WASM), gated by `grammar_subset_*` build tags so only the supported grammars ship (~10 MB binary). It extracts definitions and references, resolves edges, clusters nodes into modules/communities, runs lightweight analysis (e.g. god-node detection), and serializes deterministically (everything sorted). No model, no embeddings, no API calls. That is why it is free, private, and reproducible.
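
The "everything sorted" step can be illustrated with a minimal Go sketch. It is not graffiti's actual serializer: the `Node`, `Edge`, and `Graph` types, their field names, and the `Serialize` function are hypothetical, and the real layout is whatever `schema/map.schema.json` defines. The sketch only shows the general idea of sorting into a total order before encoding, so that parse order cannot change the output bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Node, Edge and Graph are hypothetical shapes used for illustration only;
// the real map.json contract is schema/map.schema.json and may differ.
type Node struct {
	ID   string `json:"id"`
	Kind string `json:"kind"`
}

type Edge struct {
	From string `json:"from"`
	To   string `json:"to"`
	Kind string `json:"kind"`
}

type Graph struct {
	Nodes []Node `json:"nodes"`
	Edges []Edge `json:"edges"`
}

// Serialize copies the slices, sorts nodes by ID and edges by
// (From, To, Kind), then encodes. The caller's graph is left untouched,
// and the bytes no longer depend on the order files were visited in.
func Serialize(g Graph) ([]byte, error) {
	nodes := append([]Node(nil), g.Nodes...)
	edges := append([]Edge(nil), g.Edges...)

	sort.Slice(nodes, func(i, j int) bool { return nodes[i].ID < nodes[j].ID })
	sort.Slice(edges, func(i, j int) bool {
		a, b := edges[i], edges[j]
		if a.From != b.From {
			return a.From < b.From
		}
		if a.To != b.To {
			return a.To < b.To
		}
		return a.Kind < b.Kind
	})

	return json.MarshalIndent(Graph{Nodes: nodes, Edges: edges}, "", "  ")
}

func main() {
	g := Graph{
		Nodes: []Node{{ID: "pkg/b.go", Kind: "file"}, {ID: "pkg/a.go", Kind: "file"}},
		Edges: []Edge{{From: "pkg/b.go", To: "pkg/a.go", Kind: "imports"}},
	}
	out, err := Serialize(g)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Sorting a copy rather than the input keeps the in-memory graph usable for later passes. Sorting edges on all three fields gives a total order as long as no two edges are exact duplicates.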
## Supported languages

Go gets full call resolution (functions, methods by receiver, types, imports, resolved calls). Python, JavaScript, TypeScript, Rust, Java, and PHP capture files, functions, classes/structs/interfaces/enums/traits, methods (`Class.method`), imports, and intra-repo calls. Markdown contributes doc nodes. Non-Go extraction intentionally under-extracts exotic constructs (decorators, generics, nested definitions, dynamic dispatch) rather than emitting guesses.

## Commands

- `graffiti .` or `graffiti build ` — build the map (default `.`).
- `graffiti ` — shorthand for `build ` when `` is a directory.
- `graffiti update [path]` — rebuild the map (full rebuild for now; cache-aware).
- `graffiti query "" [path]` — LLM-free scoped subgraph retrieval with a soft ~2000-token node budget. Quote the question.
- `graffiti serve [path]` — MCP server over stdio (JSON-RPC 2.0).
- `graffiti init [--user] [--hook]` — install Claude Code integration: a skill, a `CLAUDE.md` block (between ``/``) telling the assistant to prefer `graffiti query` over grep when a map exists, and with `--hook` a PreToolUse entry that nudges before `Grep`/`Glob`. `--user` installs into `~/.claude` instead of the repo. Idempotent; the hook never blocks a tool.
- `graffiti link ... [--name n]` — federate projects into a workspace (builds members if needed). Writes a committable registry (`.graffiti-workspace/workspace.json`) and a derived, gitignorable overlay (`.graffiti-workspace/overlay.json`). Each repo's own map.json is unchanged.
- `graffiti workspace ` — manage the workspace; `render` writes `workspace.html` (the same viewer with projects as the tree's top level).
- `graffiti links check` — validate explicit cross-project links resolve.
- `graffiti federate --explain` — print the computed cross-project links.
- `graffiti query --workspace ""` / `serve --workspace` / `update --workspace` — federated retrieval, MCP, and rebuild across the workspace.
- `graffiti publish --to [--as name]` — publish this service's built map into a shared system store (git-as-registry). Builds the map first if needed.
- `graffiti system ` — federate the published service maps and auto-discover cross-service links (HTTP, gRPC, queues) from each service's contract surface. `build` discovers the links; `render` writes `.graffiti-system/system.html`; `impact ` lists who breaks; `audit` reports dangling/orphan/ambiguous (non-zero exit gates CI); `status` shows drift; `query ""` retrieves across the whole system.
- `graffiti version` — print the version.

## Install

```bash
curl -fsSL https://raw.githubusercontent.com/amazopic/graffiti/main/scripts/install.sh | sh
```

The installer picks the right static binary for your OS/arch, verifies its SHA256 against the signed release manifest, and installs it. Pin a version or directory with `GRAFFITI_VERSION` and `INSTALL_DIR`. Or build from source with `make build` (CGO-free, ~10 MB). Verify with `graffiti version`.

## Quickstart

```bash
curl -fsSL https://raw.githubusercontent.com/amazopic/graffiti/main/scripts/install.sh | sh
graffiti .                                        # build the map for the current repo
open .graffiti/map.html                           # see the graph (xdg-open / start elsewhere)
graffiti query "where is the user authenticated"  # LLM-free scoped retrieval
graffiti init --hook                              # wire into Claude Code (skill + CLAUDE.md + nudge)
```

## Multi-service system orchestration

A microservice system is many independent repos that together form one product — often a single parent folder with one subdirectory per service. graffiti maps each service, then discovers the edges between them.
From the parent folder:

```bash
# 1 — build + publish every service into a shared store
for d in */; do graffiti build "$d" && graffiti publish "$d" --to .; done

# 2 — federate + auto-discover the cross-service links
graffiti system build

# 3 — explore / guard
graffiti system impact orders   # which services break if orders changes
graffiti system audit           # dangling consumers · orphan providers · ambiguous (CI gate)
```

This writes `.graffiti-system/` at the parent folder: `system.json` (the committable registry of services), `services//map.json` (each published map), `overlay.json` (the discovered links, derived/gitignorable), and `system.html` (the visual system map with services as the top tree level).

Cross-service links are discovered from each map's contract surface — extracted from `openapi.json`, `.proto`, framework routes (Go net/http, gin/chi/echo, Flask, FastAPI, Django/DRF, Spring, NestJS, ASP.NET, Ktor), gRPC server registration, queue calls (Kafka/NATS), and frontend HTTP clients (React/Vue/Angular/Svelte), or declared explicitly in `graffiti.contract.json`. Matches are confidence-scored; ambiguous and dangling (dead-endpoint) consumers are reported, never silently dropped.

`publish` reuses an existing map, so rebuild a service (`graffiti build`) before re-publishing it after code changes. The system store is just a directory or git repo — $0, offline, recomputable.

## Frequently asked questions

Q: What is graffiti?
A: A single static command-line binary that turns a code repository into a directed knowledge graph — nodes for files, functions, types and modules; edges for calls, definitions and imports. It writes map.json, MAP.md and an interactive offline map.html so an AI coding assistant reads structure instead of grepping blind.

Q: How does it build the graph without an LLM?
A: tree-sitter parsing (pure-Go, no CGO) + edge resolution + clustering + deterministic serialization. No model, no embeddings, no network — just static analysis.
That is why it is free and deterministic.

Q: Which languages does it support?
A: Go, Python, JavaScript, TypeScript, Rust, Java, PHP, plus Markdown doc nodes. Go gets full call resolution; the others capture the common high-value structure and under-extract exotic constructs rather than guess.

Q: Does it really need no API key or network?
A: Correct — zero API calls, $0, fully offline. Everything runs locally in one binary. Nothing about your code leaves your machine.

Q: What does byte-deterministic mean here?
A: The same repository always produces a byte-identical map.json modulo a single generated_at timestamp and the root folder name. Everything is sorted, so you can commit the map and review changes to it in a diff like any other file.

Q: How does the Claude Code integration work?
A: graffiti init installs a skill plus a CLAUDE.md block telling the assistant to prefer graffiti query over grep when a map exists. With --hook it adds a PreToolUse nudge before Grep/Glob. It is idempotent and never blocks a tool.

Q: Can my assistant query it over MCP?
A: Yes. graffiti serve exposes the map over an MCP stdio server (JSON-RPC 2.0). There is also an LLM-free graffiti query that returns a scoped subgraph within a soft token budget.

Q: What is map.html?
A: A single self-contained interactive force-directed graph: 2D/3D toggle, node search, click-to-copy file:line, sector zones, client/tests/external toggles, a resizable project→directory→file tree. CSP-safe, offline; just open it.

Q: Will it handle a big repository?
A: Yes. Parsing is fast static analysis and a per-file content-hash cache means re-runs only re-parse what changed.

Q: Is it free? Can I use it commercially?
A: Building and running graffiti on your own repositories is free under the Source-Available License. Any reuse, redistribution, fork, or inclusion in another project requires prior written permission from the author.
Reasonable requests for personal, educational, and non-commercial use are typically granted.

## Common user questions

Q: How do I reduce the tokens my AI assistant burns reading a large codebase?
A: Instead of having your assistant grep and read whole files, graffiti turns the repo into a directed code graph and its LLM-free graffiti query returns only a scoped subgraph of the relevant callers, callees, and definitions within a soft ~2,000-token budget. That keeps each answer's context small and cheap — illustratively up to ~50% fewer tokens, though actual savings vary by repo and task. There is no model and no embeddings in the loop, so the retrieval itself costs $0.

Q: How can I give my AI assistant context about a whole large codebase?
A: graffiti builds a directed knowledge graph of the entire repository — nodes for files, functions, methods, types, and modules, and edges for calls, definitions, and imports — and writes it as map.json plus a human MAP.md and an offline map.html. Your assistant reads that map (via graffiti query or the MCP server) so it sees the shape of the code, including which module is the load-bearing wall, instead of guessing from a few opened files. One graffiti . produces the map for any repo it supports.

Q: How do I stop my AI coding assistant from grepping and reading whole files?
A: graffiti precomputes a directed graph of calls, definitions, and imports that the assistant traverses instead of grepping line by line. graffiti init --hook wires Claude Code with a skill, a CLAUDE.md instruction to prefer graffiti query over grep when a map exists, and a PreToolUse nudge before Grep/Glob (the hook never blocks a tool). The result is a scoped subgraph of the actual callers and callees rather than full-file reads.

Q: How do I find out which microservices break if I change an API endpoint?
A: graffiti's system orchestration maps each service repo and auto-discovers the cross-service links between them — HTTP, gRPC, and queues — from each service's contract surface (openapi.json, .proto, framework routes, or an explicit graffiti.contract.json). After graffiti system build, run graffiti system impact to list who breaks, direct and transitive, and graffiti system audit to report dangling consumers, orphan providers, and ambiguous links — and to fail CI (non-zero exit) when a consumer points at an endpoint nothing provides. It runs fully offline at $0.

Q: Is there a free, offline alternative to cloud code indexing, embeddings, or RAG?
A: graffiti is a single static Go binary that builds a code graph entirely on your machine with 0 API calls and $0 cost — no model, no embeddings, no vector database, and no network, so nothing about your code leaves your machine. That makes it a fully offline alternative to cloud code-search and indexing services that require an account and bill per token. Building and running graffiti on your own repositories is free under its Source-Available license.

Q: How is graffiti different from RAG or embeddings for code?
A: Embedding-based RAG converts code into vectors and retrieves by approximate semantic similarity, usually against a cloud vector store; graffiti instead builds an exact directed graph of calls, definitions, and imports with tree-sitter static analysis and retrieves by following real code structure. Because graffiti uses no embedding model, no vector database, and no API calls, its retrieval is offline, $0, and byte-deterministic — the same repo yields a byte-identical map.json you can commit and diff.

Q: Does graffiti work with Cursor, Copilot, and ChatGPT, or only Claude Code?
A: graffiti exposes its map over an MCP stdio server (JSON-RPC 2.0) via graffiti serve, so any MCP-capable client can traverse the graph, and graffiti query prints a scoped subgraph as plain text you can paste into any assistant, including ChatGPT. Only Claude Code has first-class automated wiring today: graffiti init --hook installs a skill, a CLAUDE.md block, and a grep→query nudge. For Cursor, Copilot, or other tools, you connect them as an MCP client or paste query output manually — the map itself is editor-agnostic.

## graffiti vs embeddings/RAG vs Cursor

graffiti, embedding-based RAG, and AI editors like Cursor are three ways to give an AI coding assistant context about code — they are complementary, not mutually exclusive.

- vs embeddings/RAG: RAG embeds code into vectors and retrieves chunks by approximate semantic similarity (often via a cloud API plus a vector database); graffiti builds an exact directed graph of calls, definitions, and imports with tree-sitter and retrieves by following real structure. graffiti is fully offline, $0, uses no model or vector database, and is byte-deterministic. Use RAG for fuzzy natural-language recall; use graffiti for structural questions (who calls what, change impact, dependencies). They combine well.
- vs Cursor: Cursor is an AI code editor that indexes your codebase (embeddings, cloud-assisted; a privacy mode is available; paid for full features) to feed its built-in assistant. graffiti is an editor-agnostic CLI that builds a local, deterministic code-graph context layer usable from any tool. They combine: add graffiti as an MCP server (graffiti serve) so Cursor's AI can read the graph.

Full comparison: https://amazopic.github.io/graffiti/compare.html

## Guarantees

- 0 API calls, $0, fully offline.
- Deterministic: same repo → byte-identical map.json modulo generated_at + root.
- Single static binary, no runtime dependencies, no C toolchain.
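
One way to check the determinism guarantee yourself is to compare two map.json files after dropping the fields that are allowed to differ. The Go sketch below is not part of graffiti. It assumes `generated_at` and `root` are top-level keys of map.json (the text names them but does not say where they sit), and `Normalize` and `SameModuloVolatile` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Normalize decodes a map.json document, removes the fields allowed to vary
// between runs, and re-encodes it. ASSUMPTION: "generated_at" and "root" are
// top-level keys; adjust if the published schema places them elsewhere.
func Normalize(raw []byte) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	delete(doc, "generated_at")
	delete(doc, "root")
	// encoding/json emits map keys in sorted order, so the re-encoded
	// top level is stable regardless of the key order in the input.
	return json.Marshal(doc)
}

// SameModuloVolatile reports whether two map.json documents are identical
// once the volatile fields have been removed.
func SameModuloVolatile(a, b []byte) (bool, error) {
	na, err := Normalize(a)
	if err != nil {
		return false, err
	}
	nb, err := Normalize(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(na, nb), nil
}

func main() {
	// Two made-up documents standing in for builds of the same repository
	// from different checkout folders at different times.
	runA := []byte(`{"generated_at":"2024-01-01T00:00:00Z","root":"checkout-a","nodes":[{"id":"main.go"}]}`)
	runB := []byte(`{"generated_at":"2024-06-01T12:00:00Z","root":"checkout-b","nodes":[{"id":"main.go"}]}`)

	same, err := SameModuloVolatile(runA, runB)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("identical modulo volatile fields:", same)
}
```

The nested values are kept as raw JSON and compared as bytes, so any reordering inside `nodes` or `edges` would still be reported as a difference. That matches the stated guarantee.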
## Topics

code-graph, knowledge-graph, ai-coding-assistant, claude-code, mcp, model-context-protocol, tree-sitter, static-analysis, code-map, repo-graph, code-intelligence, golang, developer-tools, offline-first, deterministic-build, dependency-graph, call-graph, codebase-visualization, dotfiles.