Architecture audit: LSP / salsa / rowan

An audit comparing Panache’s incremental language-server design against rust-analyzer and salsa, recording the gaps worth adopting.

Audit date: 2026-06-09. Compares panache’s incremental + language-server architecture against rust-analyzer’s source and salsa’s documented patterns, and records which patterns are worth adopting. This is a reference, not a mandate — the gaps in §3 are recommendations, ordered by leverage.

Panache is modeled, like rust-analyzer, on a single-writer salsa database with snapshot reads over a lossless rowan CST. The headline finding is that the core is already closely aligned — in several dimensions panache is more aligned than a from-scratch port would be: it assigns durability tiers, keeps its symbol/project index inside salsa, uses a salsa accumulator, interns paths/labels, and already ships an experimental incremental reparse. The genuine gaps relative to rust-analyzer are therefore narrower and different from what a generic checklist would flag: the two with real leverage are a filesystem-inside-the-database boundary violation and no_eq firewall coverage, not “add durability” or “pull the index into salsa.”

Reference sources


1. Current architecture (verified)

Language server — src/lsp.rs, src/lsp/

lsp-server 0.7 (synchronous) driving a crossbeam_channel::select! event loop (main_loop, src/lsp.rs:87). The main loop is the sole salsa writer: all mutations flow through did_open/did_change/did_save/did_close on the main thread, which call SalsaDb::update_file_text* (src/salsa.rs:1793). Worker threads receive a StateSnapshot (src/lsp/global_state.rs:101) carrying a cloned SalsaDb handle (an Arc bump — Send, !Sync) plus an Arc<DocumentMap>. Crucially the snapshot never carries a rowan::SyntaxNode (a !Send cursor); only GreenNodes travel, and workers root their own cursors locally.

Two thread pools (src/lsp/global_state.rs:205-210): a main pool sized to the physical core count, and a dedicated single-thread fmt_pool so external formatters (which can block for hundreds of milliseconds) never starve hover/completion latency — the same split rust-analyzer makes.

Scheduling combines three mechanisms (src/lsp/global_state.rs:36,220-225, src/lsp/dispatch.rs:469-493):

  • DebounceDIAGNOSTICS_DEBOUNCE = 200ms; each did_change resets the per-URI deadline, so a keystroke burst collapses into one lint pass.
  • Generation counterslint_generations is bumped on schedule and again on dispatch; a result tagged with a stale generation is dropped in on_task, backstopping the finish-during-edit race.
  • Coalescing — latest version per URI wins.

$/cancelRequest is honored, and read handlers are wrapped in catch_cancelled (src/lsp/helpers.rs:22), which catches a salsa Cancelled unwind and maps it to an LSP ContentModified response. Document sync is incremental, with UTF-16↔︎UTF-8 position conversion in src/lsp/conversions.rs. Diagnostics are published keyed by (uri, generation); the LSP version field is left None (generation counters serve the same role internally).

Incremental layer — src/salsa.rs

salsa 0.26 (src/salsa.rs:14). The database trait is Db: salsa::Database (:1755) with one concrete SalsaDb { storage, file_cache } (:1762).

InputsFileText { text } (:16) and FileConfig { config } (:22), keyed per file. InternedInternedPath<'db> and InternedLabel<'db> (:28-38) deduplicate path and label allocations across the project graph and definition indexes.

Firewall queryrefdef_set(file, config) (:71) lifts the reference-definition label set out of the parse. It returns a RefdefMap, and salsa value-equality on the underlying set is order-independent, so a paragraph edit that doesn’t change any [label]: url definition short-circuits here and downstream consumers see no invalidation pulse. This is panache’s one true rust-analyzer-style firewall.

Parseparsed_document(file, config) (:103, no_eq, unsafe(non_update_types)) caches ParsedDocument { green, errors }: the GreenNode (Arc-backed, Send + Sync) plus the embedded-sublanguage (malformed YAML) syntax errors, parsed once and shared by the whole per-document pipeline. no_eq is correct here — GreenNode is not Eq — and invalidation rides purely on the input text via refdef_set.

Derived queries (≈17 total) include per-file symbol_usage_index, definition_index, heading_outline, citation_definition_index, bibliography_index, the consolidated built_in_lint_plan (:266), and the multi-file project_graph (:1631). Cross-file diagnostics are surfaced through a #[salsa::accumulator] GraphDiagnostic (:1627).

Durability — all three tiers are in use: LOW for open-buffer edits, MEDIUM for config, HIGH for files loaded from disk (src/salsa.rs:1794,1840, documented in Language Server).

A file_cache: Arc<Mutex<HashMap<PathBuf, FileText>>> (:1764) maps path → input entity so sibling files can be looked up by path. See §3 (G2) — this side-table is also where the filesystem leaks into the database.

rowan usage

GreenNodeBuilder drives a single forward pass in the parser (crates/panache-parser/src/parser/core.rs). Typed AST is newtype AstNode wrappers over SyntaxNode (crates/panache-parser/src/syntax/), the rust-analyzer pattern. Panache goes beyond whole-file reparse with an experimental incremental suffix reparse (parse_incremental_suffix_with_refdefs, src/lsp/documents.rs:216), gated by runtime_settings.experimental_incremental_parsing, with section-window and restart-offset strategies. There are no SyntaxNodePtr/AstPtr (stable cross-edit pointers) — none are needed yet.

Linter & formatter

The linter walks the tree once into a LintIndex bucketed by SyntaxKind, then runs every rule against that index (src/linter/). built_in_lint_plan (src/salsa.rs:266) is a salsa query that returns built-in diagnostics plus staged ExternalLintJobs; external linters run on demand off-thread, never inside a salsa query (queries must stay pure).

The formatter core lives in crates/panache-formatter/ behind a host bridge (src/formatter.rs). The LSP format_document handler calls crate::format (src/lsp/handlers/formatting.rs:34), which re-parses the text fresh rather than reading the cached parsed_document — unlike hover/symbols, which use the cached green tree from the snapshot.

Disk cache — src/cache.rs

A persistent CLI-side lint/format cache keyed on file, config, and tool fingerprints. It is CLI-only (the LSP relies on salsa’s in-memory memoization). The tool_fingerprint = panache@{version} does not change on a dev rebuild — a known footgun captured in AGENTS.md (“Disk lint cache trap”).


2. Already aligned with rust-analyzer (no action)

Pattern rust-analyzer panache
Single-writer db + snapshot reads AnalysisHost::apply_change is the only writer; many Analysis snapshots main loop sole writer; StateSnapshot clones for workers
Cancel-on-edit a write bumps the revision → in-flight reads unwind Cancelled; ide boundary catches newer edit cancels in-flight reads → catch_cancelledContentModified
Debounce alternative latency intents + cancel/retry 200ms debounce + generation counters + coalescing
Firewall + backdating thin derived queries stable under churn refdef_set (set-equal → backdates) firewalls the parse
Lossless CST + on-demand typed AST rowan green nodes + zero-cost AstNode identical model
Parse never fails (tree, Vec<SyntaxError>) ParsedDocument { green, errors }
Format isolated from latency-sensitive reads dedicated handling dedicated single-thread fmt_pool

Three further patterns put panache ahead of a minimal port — do not churn them:

  • Durability tiers already assigned (LOW/MEDIUM/HIGH). Many salsa ports never set durability; panache does.
  • Symbol/project index lives inside salsa (symbol_usage_index, definition_index, project_graph) rather than as an external Arc-swapped structure — so it participates in incremental invalidation for free.
  • Accumulator + interning are both in idiomatic use (GraphDiagnostic, InternedPath/InternedLabel).

3. Gaps to adopt (ordered by leverage)

3.1 Type-level read/write split — highest leverage, lowest risk

NoteAdopted

StateSnapshot now carries a read-only crate::salsa::Analysis newtype instead of a raw SalsaDb, exposing workers only snap.db() -> &dyn Db. Because every salsa mutation (set_text/set_config, update_file_text, evict_file_text, ensure_file_text_cached) requires &mut, the shared-only view removes all &mut reachability from worker threads at compile time — the single-writer rule is now a type-level guarantee on the read path. The &mut-capable handle stays private to the main loop’s GlobalState.salsa. (Input creation via &dyn Db remains possible but is benign — a fresh orphan id that mutates nothing — mirroring rust-analyzer, whose with_db does not block it either.)

  • rust-analyzer: AnalysisHost is the only writer (apply_change); Analysis is an immutable snapshot exposing only &self reads, each routed through with_db(|db| …)Cancelled::catch. The single-writer rule is a compile-time guarantee.
  • panache: “the main loop is the sole writer” is only a convention. StateSnapshot.db (src/lsp/global_state.rs:102) is a full SalsaDb, so a worker owns a clone and could call &mut setters or create inputs — nothing in the types prevents it.
  • Recommend: an Analysis-style newtype wrapping SalsaDb and exposing only the read queries (parsed_tree_root, symbol_usage_index, definition_index, project_graph, …) handed to workers; keep the &mut-capable handle private to the main loop. This encodes the invariant the snapshot doc comment already relies on. Files: src/salsa.rs (newtype + read API), src/lsp/global_state.rs (StateSnapshot), src/lsp/handlers/*.

3.2 Filesystem access inside the database — panache-specific, high impact

NoteAdopted

Db::file_text is now a pure cache lookup: it returns None for any path the writer has not loaded and never touches std::fs or creates an input off a &self read path. Discovery-and-load moved to the writer in SalsaDb::load_referenced_files — a fixpoint that runs project_graph, loads every referenced include/sibling/bibliography file from disk, and repeats until the referenced set is stable. The LSP drives it from did_open/did_save/ did_close, the debounced lint dispatch (so includes added mid-edit load on the writer, not lazily on a worker), and the file watcher; the CLI lint path calls the same method up front.

Because file_text is a plain trait method (not a tracked query), an absent lookup records no salsa dependency, so loading a file would otherwise leave project_graph/metadata memos stale. A CacheGeneration salsa input that both queries read, bumped by the writer after each load, supplies the missing invalidation. project_graph also now records project-walk siblings with add_document before the file_text check so unloaded siblings stay visible to the fixpoint. The residual filesystem reads inside project_graph (the find_project_documents walk and collect_includes existence checks) are unchanged and remain tracked by G3.

  • rust-analyzer: the vfs crate provides consistent file-system snapshots; all file contents enter the database as inputs via apply_change on the writer. Queries never touch the filesystem — the database is a pure function of its inputs.
  • panache: SalsaDb::get_or_load_file_text (src/salsa.rs:1777) reads std::fs and creates a FileText input from a &self read path. It is reached through Db::file_text (:1880) from inside project_graph and definition_index_with_includes (src/lsp/global_state.rs:169) — i.e. on worker threads, during what is nominally a read. Three consequences:
    1. queries become non-deterministic w.r.t. the filesystem (a sibling file changing on disk is observed without going through a tracked input write);
    2. salsa inputs are created off the single writer;
    3. file contents live in a Mutex side-table outside salsa’s dependency graph.
  • Recommend: discover-and-load includes/bibliography files on the writer (main thread) when the project graph first references them, then make Db::file_text a pure lookup that returns None for unloaded paths. This restores query purity and the single-writer invariant. Dovetails with G3 (the file_cache becomes the FileId/VFS map). Files: src/salsa.rs, src/lsp/global_state.rs, src/lsp/documents.rs.

3.3 FileId / SourceRoot — promoted by G2: enables per-file invalidation

NoteAdopted (core)

A FileId newtype plus a vfs-style path↔︎id map (src/salsa.rs) subsumes the former file_cache. file_text is now a #[salsa::input] whose value is Option<Arc<str>> — an absent (referenced-but-unloaded) file is the input set to None, so a query reading it takes a real per-file dependency on that file’s presence, and absent (None) stays distinguishable from present-but-empty (Some("")) for the bibliography “failed to read” diagnostic.

The global CacheGeneration counter is deleted. Discovery convergence now rides a FileSet salsa input (the set of interned ids): project_graph reads it, and the writer’s load_referenced_files fixpoint interns each referenced path (adding its id to the set — re-running project_graph) then loads it from disk (flipping NoneSome — re-running through the per-file dependency). Both channels live inside the dependency graph. The key granularity win: metadata reads no global firewall (its bibliography dependency is the per-file bib input), so a sibling load no longer invalidates unrelated documents’ metadata memos.

The <memory> sentinel is retired: the path-keyed tracked queries (project_graph, metadata, built_in_lint_plan, definition_index, citation_definition_index, symbol_usage_index, heading_outline, yaml_*) no longer take a PathBuf; they resolve the document path from the FileText identity via a VFS reverse map (Db::path_of). An in-memory buffer is a distinct FileId with path_of == None (fixing a latent collision where two untitled buffers shared one "<memory>" key); a pathless buffer yields an empty project_graph.

Deferred (bullet 3 / bullet 4 below): grouping inputs into SourceRoots as the durability unit (durability stays per-input — HIGH for disk loads, MEDIUM for FileSet/config, LOW for open buffers), and folding the residual find_project_documents / collect_includes filesystem reads behind a VFS directory-listing input. The file watcher still bridges those residual fs reads, but via a targeted intern_file (re-running only project_graph) rather than the old global bump.

  • rust-analyzer: an opaque FileId plus a vfs; handlers convert URI → FileId at the boundary so paths never leak into analysis, and SourceRoots are the unit durability is assigned to. Critically, file_text is an input keyed by FileId whose value is Option<Arc<str>> — an absent file is an input set to None, so a query that reads it takes a real salsa dependency on that file’s presence.
  • panache: keys directly on PathBuf; in-memory buffers fall back to a <memory> sentinel path (src/lsp/global_state.rs:161); durability is set per-input ad hoc. Db::file_text is a plain trait method (a cache lookup), not a tracked input — so an absent lookup records no dependency. G2 worked around this with a single global CacheGeneration input that project_graph/metadata read and the writer bumps after every load.

Why this is now the highest-value next step

The CacheGeneration firewall G2 introduced is correct but coarse: every load bumps one global counter, invalidating all project_graph/metadata memos, not just the consumers of the file that changed. Loads are infrequent today (open/save/watcher/debounce, never per-keystroke) and the tool is single-root, so the blast radius is tolerable — but it scales with (open documents × files loaded), and it is a manual invalidation channel sitting outside the dependency graph, which is exactly the smell §3.2 set out to remove. Adopting a FileId-keyed tracked file_text input replaces it with salsa’s own per-file invalidation:

  • An unloaded path is an input set to None; project_graph reading it takes a dependency on that file’s presence. The writer setting it to Some(text) invalidates only the queries that read that file — no global pulse.
  • CacheGeneration and its read-sites in project_graph/metadata, plus the manual bump_cache_generation calls in load_referenced_files and the file watcher, can then be deleted.
  • The load_referenced_files fixpoint stays (discovery is still parse-driven — a VFS does not enumerate includes for us), but its passes converge through ordinary input writes rather than a generation counter.

Design

  • A thin FileId newtype + a vfs-style path↔︎id map subsuming file_cache. Handlers convert URI → FileId at the boundary so PathBuf stops leaking into analysis (retires the <memory> sentinel: in-memory buffers are just a FileId with no backing path).
  • file_text becomes #[salsa::input] keyed by FileId, value Option<Arc<str>>. The writer creates the (possibly None) input on first reference and sets contents on load; absent-vs-empty stays distinguishable (preserving the bibliography “failed to read” diagnostic that a plain empty string would lose).
  • Group inputs into SourceRoots so library/installed/bibliography files get Durability::HIGH as a unit instead of per-input ad hoc.
  • Optionally fold the residual filesystem reads §3.2 left in project_graph (the find_project_documents walk and collect_includes existence checks) behind the VFS snapshot, making project_graph a pure function of its inputs — the last piece of the §3.2 goal.
  • Files: src/salsa.rs (FileId, tracked file_text, delete CacheGeneration), src/lsp/global_state.rs, src/lsp/documents.rs, src/main.rs (load_referenced_files caller). Also the foundation for future multi-root workspaces.

3.4 Firewall coverage / no_eq overuse — medium leverage

  • rust-analyzer: a core invariant is that “typing inside a function’s body never invalidates global derived data.” Thin, Eq-returning firewall queries backdate and stop invalidation from propagating across files.
  • panache: many tracked queries are no_eq, unsafe(non_update_types)parsed_document, metadata, the yaml_* family, built_in_lint_plan, citation_definition_index, bibliography_index. no_eq defeats backdating: even when the structured output is unchanged, an edit pulses invalidation into every downstream consumer. Only refdef_set currently acts as a backdating firewall, and it sits above the parse — so cross-file queries that depend on a sibling’s definition_index/heading_outline may re-run on edits that don’t change the relevant structure.
  • Recommend: add an incremental-invalidation test (a panache analog of ravel’s tests/salsa_incremental.rs) asserting that a paragraph-body edit in one file reuses the project_graph / definition_index / heading_outline memos of other files. Where the assertion fails, interpose a thin Eq-returning firewall query the way refdef_set already firewalls the parse (e.g. a heading_anchors(file) returning a BTreeSet that backdates when the anchor set is unchanged). Files: src/salsa.rs, new test under tests/.
  • Status (addressed for the cross-file leak): measurement showed the per-file indices the recommendation names — definition_index, heading_outline, symbol_usage_index — already return Eq types and backdate, and an edit to one file never touches another’s FileText input, so their sibling memos were never the leak. The one real cross-file over-invalidation was project_graph: it read each member’s full parse (parsed_tree_root -> no_eq parsed_document), so a body edit in any member re-ran the entire graph (and pulsed the writer load fixpoint + navigation). Fixed by lifting the range-free edge data into an Eq firewall query project_edges(file) (plus a backdating file_is_present(file) presence flag) and adding project_structure(root) — the structural graph the writer fixpoint, definition_index_with_includes, and the navigation/workspace-symbol handlers now consume. project_graph is unchanged and remains the GraphDiagnostic accumulator source, because include/duplicate diagnostics carry byte ranges that must track edits (and the cross-doc “first defined at path:line” message couples files through line numbers, so a clean diagnostics split is deferred). The incremental-invalidation test lives in src/salsa.rs (body_edit_in_member_reuses_structural_and_sibling_memos plus a structural_edit_in_member_reexecutes_project_structure over-suppression guard), built on a salsa WillExecute event-log harness, rather than a separate tests/ file — it reuses the existing salsa test fixtures and the private logging constructor.
  • Remaining: two follow-ups are deliberately open. (1) project_graph still re-runs on a member’s body edit (it is the diagnostics source), so the per-file diagnostics firewall above is unstarted. (2) The wider no_eq family the section opens with — metadata, the yaml_* queries, built_in_lint_plan, citation_definition_index, bibliography_index — was not swept: the new test only asserts the project_structure / sibling per-file-index axis. Extending the harness with cases like “a body edit in sibling A reuses B’s built_in_lint_plan” and firewalling any that fail is the natural next increment.

3.5 Structural incremental reparse + stable pointers — highest effort, gate on benchmarks

  • rust-analyzer: incremental_reparse tries reparse_token (edit inside one token), then reparse_block (edit inside one subtree → re-parse just that block and splice the new GreenNode), then full reparse. Immutable, structurally-shared green nodes make splicing reuse untouched subtrees. SyntaxNodePtr/AstPtr survive reparses for source maps.
  • panache: has an experimental suffix reparse (re-parse from a restart offset to end-of-document); no structural block splice, no stable pointers.
  • Recommend: (a) benchmark the current suffix reparse first — it may already be enough; the structural reparse_token/reparse_block splice is strictly more work and only pays off if profiling shows suffix reparse dominating on large documents. (b) Add SyntaxNodePtr/AstPtr only when a feature needs a stable cross-edit reference — none does today, so adopt-when-needed. Files: crates/panache-parser/src/parser/, hooked into src/salsa.rs.

4. Secondary observations (no separate gap)

  • No thread-intent priority. panache isolates formatting on its own pool but has no ThreadIntent::{Worker, LatencySensitive} analog on the main pool, so a heavy lint can still delay a latency-sensitive hover. Measured 2026-06 on a 12-physical-core box with temporary per-read wait=/run= instrumentation: read wait≈0 across hover/symbol/definition/formatting even with concurrent lint, and the contended burst (many docs sharing a .bib linting at once) never pushed in-flight lints near the pool size. The portable ThreadIntent analog (shared pool + priority queue + lint cap) is therefore deferred until a real burst, or a low-core machine, shows wait tracking lint duration.
  • Format path re-parses. Resolved. format_document and format_range (src/lsp/handlers/formatting.rs) now format via crate::format_with_tree, reusing the salsa-cached parse through StateSnapshot::parsed_tree (the same parse hover/symbols read) instead of parsing fresh — saving a parse per format request (two per range format). Each falls back to a fresh parse only if the document isn’t open. Idempotency is unaffected.
  • Disk-cache fingerprint trap. Already captured in AGENTS.md: the dev-build tool fingerprint doesn’t change on a code edit, so the CLI can serve stale cached diagnostics. A development footgun, not an architecture gap — restated here for completeness.
  • Mixed diagnostics plumbing. built_in_lint_plan threads diagnostics through its return value while project_graph uses a #[salsa::accumulator]. Both are idiomatic; the split is explicit and fine — noted as a difference, not a gap.