Architecture audit: LSP / salsa / rowan
Audit date: 2026-06-09. Compares panache’s incremental + language-server architecture against rust-analyzer’s source and salsa’s documented patterns, and records which patterns are worth adopting. This is a reference, not a mandate — the gaps in §3 are recommendations, ordered by leverage.
Panache is modeled, like rust-analyzer, on a single-writer salsa database with snapshot reads over a lossless rowan CST. The headline finding is that the core is already closely aligned — in several dimensions panache is more aligned than a from-scratch port would be: it assigns durability tiers, keeps its symbol/project index inside salsa, uses a salsa accumulator, interns paths/labels, and already ships an experimental incremental reparse. The genuine gaps relative to rust-analyzer are therefore narrower and different from what a generic checklist would flag: the two with real leverage are a filesystem-inside-the-database boundary violation and no_eq firewall coverage, not “add durability” or “pull the index into salsa.”
Reference sources
- salsa book — overview and durability: https://salsa-rs.netlify.app/overview.html, https://salsa-rs.netlify.app/reference/durability.html
- salsa
Cancelled: https://docs.rs/salsa/latest/salsa/enum.Cancelled.html - rust-analyzer architecture guide: https://rust-analyzer.github.io/book/contributing/architecture.html
- rust-analyzer
AnalysisHost/Analysis(crates/ide/src/lib.rs): https://github.com/rust-lang/rust-analyzer/blob/master/crates/ide/src/lib.rs - rust-analyzer
main_loop.rs/global_state.rs: https://github.com/rust-lang/rust-analyzer/tree/master/crates/rust-analyzer/src - rowan incremental reparse (
reparse_token/reparse_block): https://github.com/rust-lang/rust-analyzer/blob/master/crates/syntax/src/parsing/reparsing.rs
1. Current architecture (verified)
Language server — src/lsp.rs, src/lsp/
lsp-server 0.7 (synchronous) driving a crossbeam_channel::select! event loop (main_loop, src/lsp.rs:87). The main loop is the sole salsa writer: all mutations flow through did_open/did_change/did_save/did_close on the main thread, which call SalsaDb::update_file_text* (src/salsa.rs:1793). Worker threads receive a StateSnapshot (src/lsp/global_state.rs:101) carrying a cloned SalsaDb handle (an Arc bump — Send, !Sync) plus an Arc<DocumentMap>. Crucially the snapshot never carries a rowan::SyntaxNode (a !Send cursor); only GreenNodes travel, and workers root their own cursors locally.
Two thread pools (src/lsp/global_state.rs:205-210): a main pool sized to the physical core count, and a dedicated single-thread fmt_pool so external formatters (which can block for hundreds of milliseconds) never starve hover/completion latency — the same split rust-analyzer makes.
Scheduling combines three mechanisms (src/lsp/global_state.rs:36,220-225, src/lsp/dispatch.rs:469-493):
- Debounce —
DIAGNOSTICS_DEBOUNCE = 200ms; eachdid_changeresets the per-URI deadline, so a keystroke burst collapses into one lint pass. - Generation counters —
lint_generationsis bumped on schedule and again on dispatch; a result tagged with a stale generation is dropped inon_task, backstopping the finish-during-edit race. - Coalescing — latest version per URI wins.
$/cancelRequest is honored, and read handlers are wrapped in catch_cancelled (src/lsp/helpers.rs:22), which catches a salsa Cancelled unwind and maps it to an LSP ContentModified response. Document sync is incremental, with UTF-16↔︎UTF-8 position conversion in src/lsp/conversions.rs. Diagnostics are published keyed by (uri, generation); the LSP version field is left None (generation counters serve the same role internally).
Incremental layer — src/salsa.rs
salsa 0.26 (src/salsa.rs:14). The database trait is Db: salsa::Database (:1755) with one concrete SalsaDb { storage, file_cache } (:1762).
Inputs — FileText { text } (:16) and FileConfig { config } (:22), keyed per file. Interned — InternedPath<'db> and InternedLabel<'db> (:28-38) deduplicate path and label allocations across the project graph and definition indexes.
Firewall query — refdef_set(file, config) (:71) lifts the reference-definition label set out of the parse. It returns a RefdefMap, and salsa value-equality on the underlying set is order-independent, so a paragraph edit that doesn’t change any [label]: url definition short-circuits here and downstream consumers see no invalidation pulse. This is panache’s one true rust-analyzer-style firewall.
Parse — parsed_document(file, config) (:103, no_eq, unsafe(non_update_types)) caches ParsedDocument { green, errors }: the GreenNode (Arc-backed, Send + Sync) plus the embedded-sublanguage (malformed YAML) syntax errors, parsed once and shared by the whole per-document pipeline. no_eq is correct here — GreenNode is not Eq — and invalidation rides purely on the input text via refdef_set.
Derived queries (≈17 total) include per-file symbol_usage_index, definition_index, heading_outline, citation_definition_index, bibliography_index, the consolidated built_in_lint_plan (:266), and the multi-file project_graph (:1631). Cross-file diagnostics are surfaced through a #[salsa::accumulator] GraphDiagnostic (:1627).
Durability — all three tiers are in use: LOW for open-buffer edits, MEDIUM for config, HIGH for files loaded from disk (src/salsa.rs:1794,1840, documented in Language Server).
A file_cache: Arc<Mutex<HashMap<PathBuf, FileText>>> (:1764) maps path → input entity so sibling files can be looked up by path. See §3 (G2) — this side-table is also where the filesystem leaks into the database.
rowan usage
GreenNodeBuilder drives a single forward pass in the parser (crates/panache-parser/src/parser/core.rs). Typed AST is newtype AstNode wrappers over SyntaxNode (crates/panache-parser/src/syntax/), the rust-analyzer pattern. Panache goes beyond whole-file reparse with an experimental incremental suffix reparse (parse_incremental_suffix_with_refdefs, src/lsp/documents.rs:216), gated by runtime_settings.experimental_incremental_parsing, with section-window and restart-offset strategies. There are no SyntaxNodePtr/AstPtr (stable cross-edit pointers) — none are needed yet.
Linter & formatter
The linter walks the tree once into a LintIndex bucketed by SyntaxKind, then runs every rule against that index (src/linter/). built_in_lint_plan (src/salsa.rs:266) is a salsa query that returns built-in diagnostics plus staged ExternalLintJobs; external linters run on demand off-thread, never inside a salsa query (queries must stay pure).
The formatter core lives in crates/panache-formatter/ behind a host bridge (src/formatter.rs). The LSP format_document handler calls crate::format (src/lsp/handlers/formatting.rs:34), which re-parses the text fresh rather than reading the cached parsed_document — unlike hover/symbols, which use the cached green tree from the snapshot.
Disk cache — src/cache.rs
A persistent CLI-side lint/format cache keyed on file, config, and tool fingerprints. It is CLI-only (the LSP relies on salsa’s in-memory memoization). The tool_fingerprint = panache@{version} does not change on a dev rebuild — a known footgun captured in AGENTS.md (“Disk lint cache trap”).
2. Already aligned with rust-analyzer (no action)
| Pattern | rust-analyzer | panache |
|---|---|---|
| Single-writer db + snapshot reads | AnalysisHost::apply_change is the only writer; many Analysis snapshots |
main loop sole writer; StateSnapshot clones for workers |
| Cancel-on-edit | a write bumps the revision → in-flight reads unwind Cancelled; ide boundary catches |
newer edit cancels in-flight reads → catch_cancelled → ContentModified |
| Debounce alternative | latency intents + cancel/retry | 200ms debounce + generation counters + coalescing |
| Firewall + backdating | thin derived queries stable under churn | refdef_set (set-equal → backdates) firewalls the parse |
| Lossless CST + on-demand typed AST | rowan green nodes + zero-cost AstNode |
identical model |
| Parse never fails | (tree, Vec<SyntaxError>) |
ParsedDocument { green, errors } |
| Format isolated from latency-sensitive reads | dedicated handling | dedicated single-thread fmt_pool |
Three further patterns put panache ahead of a minimal port — do not churn them:
- Durability tiers already assigned (LOW/MEDIUM/HIGH). Many salsa ports never set durability; panache does.
- Symbol/project index lives inside salsa (
symbol_usage_index,definition_index,project_graph) rather than as an externalArc-swapped structure — so it participates in incremental invalidation for free. - Accumulator + interning are both in idiomatic use (
GraphDiagnostic,InternedPath/InternedLabel).
3. Gaps to adopt (ordered by leverage)
3.1 Type-level read/write split — highest leverage, lowest risk
StateSnapshot now carries a read-only crate::salsa::Analysis newtype instead of a raw SalsaDb, exposing workers only snap.db() -> &dyn Db. Because every salsa mutation (set_text/set_config, update_file_text, evict_file_text, ensure_file_text_cached) requires &mut, the shared-only view removes all &mut reachability from worker threads at compile time — the single-writer rule is now a type-level guarantee on the read path. The &mut-capable handle stays private to the main loop’s GlobalState.salsa. (Input creation via &dyn Db remains possible but is benign — a fresh orphan id that mutates nothing — mirroring rust-analyzer, whose with_db does not block it either.)
- rust-analyzer:
AnalysisHostis the only writer (apply_change);Analysisis an immutable snapshot exposing only&selfreads, each routed throughwith_db(|db| …)→Cancelled::catch. The single-writer rule is a compile-time guarantee. - panache: “the main loop is the sole writer” is only a convention.
StateSnapshot.db(src/lsp/global_state.rs:102) is a fullSalsaDb, so a worker owns a clone and could call&mutsetters or create inputs — nothing in the types prevents it. - Recommend: an
Analysis-style newtype wrappingSalsaDband exposing only the read queries (parsed_tree_root,symbol_usage_index,definition_index,project_graph, …) handed to workers; keep the&mut-capable handle private to the main loop. This encodes the invariant the snapshot doc comment already relies on. Files:src/salsa.rs(newtype + read API),src/lsp/global_state.rs(StateSnapshot),src/lsp/handlers/*.
3.2 Filesystem access inside the database — panache-specific, high impact
Db::file_text is now a pure cache lookup: it returns None for any path the writer has not loaded and never touches std::fs or creates an input off a &self read path. Discovery-and-load moved to the writer in SalsaDb::load_referenced_files — a fixpoint that runs project_graph, loads every referenced include/sibling/bibliography file from disk, and repeats until the referenced set is stable. The LSP drives it from did_open/did_save/ did_close, the debounced lint dispatch (so includes added mid-edit load on the writer, not lazily on a worker), and the file watcher; the CLI lint path calls the same method up front.
Because file_text is a plain trait method (not a tracked query), an absent lookup records no salsa dependency, so loading a file would otherwise leave project_graph/metadata memos stale. A CacheGeneration salsa input that both queries read, bumped by the writer after each load, supplies the missing invalidation. project_graph also now records project-walk siblings with add_document before the file_text check so unloaded siblings stay visible to the fixpoint. The residual filesystem reads inside project_graph (the find_project_documents walk and collect_includes existence checks) are unchanged and remain tracked by G3.
- rust-analyzer: the
vfscrate provides consistent file-system snapshots; all file contents enter the database as inputs viaapply_changeon the writer. Queries never touch the filesystem — the database is a pure function of its inputs. - panache:
SalsaDb::get_or_load_file_text(src/salsa.rs:1777) readsstd::fsand creates aFileTextinput from a&selfread path. It is reached throughDb::file_text(:1880) from insideproject_graphanddefinition_index_with_includes(src/lsp/global_state.rs:169) — i.e. on worker threads, during what is nominally a read. Three consequences:- queries become non-deterministic w.r.t. the filesystem (a sibling file changing on disk is observed without going through a tracked input write);
- salsa inputs are created off the single writer;
- file contents live in a
Mutexside-table outside salsa’s dependency graph.
- Recommend: discover-and-load includes/bibliography files on the writer (main thread) when the project graph first references them, then make
Db::file_texta pure lookup that returnsNonefor unloaded paths. This restores query purity and the single-writer invariant. Dovetails with G3 (thefile_cachebecomes the FileId/VFS map). Files:src/salsa.rs,src/lsp/global_state.rs,src/lsp/documents.rs.
3.3 FileId / SourceRoot — promoted by G2: enables per-file invalidation
A FileId newtype plus a vfs-style path↔︎id map (src/salsa.rs) subsumes the former file_cache. file_text is now a #[salsa::input] whose value is Option<Arc<str>> — an absent (referenced-but-unloaded) file is the input set to None, so a query reading it takes a real per-file dependency on that file’s presence, and absent (None) stays distinguishable from present-but-empty (Some("")) for the bibliography “failed to read” diagnostic.
The global CacheGeneration counter is deleted. Discovery convergence now rides a FileSet salsa input (the set of interned ids): project_graph reads it, and the writer’s load_referenced_files fixpoint interns each referenced path (adding its id to the set — re-running project_graph) then loads it from disk (flipping None→Some — re-running through the per-file dependency). Both channels live inside the dependency graph. The key granularity win: metadata reads no global firewall (its bibliography dependency is the per-file bib input), so a sibling load no longer invalidates unrelated documents’ metadata memos.
The <memory> sentinel is retired: the path-keyed tracked queries (project_graph, metadata, built_in_lint_plan, definition_index, citation_definition_index, symbol_usage_index, heading_outline, yaml_*) no longer take a PathBuf; they resolve the document path from the FileText identity via a VFS reverse map (Db::path_of). An in-memory buffer is a distinct FileId with path_of == None (fixing a latent collision where two untitled buffers shared one "<memory>" key); a pathless buffer yields an empty project_graph.
Deferred (bullet 3 / bullet 4 below): grouping inputs into SourceRoots as the durability unit (durability stays per-input — HIGH for disk loads, MEDIUM for FileSet/config, LOW for open buffers), and folding the residual find_project_documents / collect_includes filesystem reads behind a VFS directory-listing input. The file watcher still bridges those residual fs reads, but via a targeted intern_file (re-running only project_graph) rather than the old global bump.
- rust-analyzer: an opaque
FileIdplus avfs; handlers convert URI →FileIdat the boundary so paths never leak into analysis, andSourceRoots are the unit durability is assigned to. Critically,file_textis an input keyed byFileIdwhose value isOption<Arc<str>>— an absent file is an input set toNone, so a query that reads it takes a real salsa dependency on that file’s presence. - panache: keys directly on
PathBuf; in-memory buffers fall back to a<memory>sentinel path (src/lsp/global_state.rs:161); durability is set per-input ad hoc.Db::file_textis a plain trait method (a cache lookup), not a tracked input — so an absent lookup records no dependency. G2 worked around this with a single globalCacheGenerationinput thatproject_graph/metadataread and the writer bumps after every load.
Why this is now the highest-value next step
The CacheGeneration firewall G2 introduced is correct but coarse: every load bumps one global counter, invalidating all project_graph/metadata memos, not just the consumers of the file that changed. Loads are infrequent today (open/save/watcher/debounce, never per-keystroke) and the tool is single-root, so the blast radius is tolerable — but it scales with (open documents × files loaded), and it is a manual invalidation channel sitting outside the dependency graph, which is exactly the smell §3.2 set out to remove. Adopting a FileId-keyed tracked file_text input replaces it with salsa’s own per-file invalidation:
- An unloaded path is an input set to
None;project_graphreading it takes a dependency on that file’s presence. The writer setting it toSome(text)invalidates only the queries that read that file — no global pulse. CacheGenerationand its read-sites inproject_graph/metadata, plus the manualbump_cache_generationcalls inload_referenced_filesand the file watcher, can then be deleted.- The
load_referenced_filesfixpoint stays (discovery is still parse-driven — a VFS does not enumerate includes for us), but its passes converge through ordinary input writes rather than a generation counter.
Design
- A thin
FileIdnewtype + avfs-style path↔︎id map subsumingfile_cache. Handlers convert URI →FileIdat the boundary soPathBufstops leaking into analysis (retires the<memory>sentinel: in-memory buffers are just aFileIdwith no backing path). file_textbecomes#[salsa::input]keyed byFileId, valueOption<Arc<str>>. The writer creates the (possiblyNone) input on first reference and sets contents on load; absent-vs-empty stays distinguishable (preserving the bibliography “failed to read” diagnostic that a plain empty string would lose).- Group inputs into
SourceRoots so library/installed/bibliography files getDurability::HIGHas a unit instead of per-input ad hoc. - Optionally fold the residual filesystem reads §3.2 left in
project_graph(thefind_project_documentswalk andcollect_includesexistence checks) behind the VFS snapshot, makingproject_grapha pure function of its inputs — the last piece of the §3.2 goal. - Files:
src/salsa.rs(FileId, trackedfile_text, deleteCacheGeneration),src/lsp/global_state.rs,src/lsp/documents.rs,src/main.rs(load_referenced_filescaller). Also the foundation for future multi-root workspaces.
3.4 Firewall coverage / no_eq overuse — medium leverage
- rust-analyzer: a core invariant is that “typing inside a function’s body never invalidates global derived data.” Thin,
Eq-returning firewall queries backdate and stop invalidation from propagating across files. - panache: many tracked queries are
no_eq, unsafe(non_update_types)—parsed_document,metadata, theyaml_*family,built_in_lint_plan,citation_definition_index,bibliography_index.no_eqdefeats backdating: even when the structured output is unchanged, an edit pulses invalidation into every downstream consumer. Onlyrefdef_setcurrently acts as a backdating firewall, and it sits above the parse — so cross-file queries that depend on a sibling’sdefinition_index/heading_outlinemay re-run on edits that don’t change the relevant structure. - Recommend: add an incremental-invalidation test (a panache analog of ravel’s
tests/salsa_incremental.rs) asserting that a paragraph-body edit in one file reuses theproject_graph/definition_index/heading_outlinememos of other files. Where the assertion fails, interpose a thinEq-returning firewall query the wayrefdef_setalready firewalls the parse (e.g. aheading_anchors(file)returning aBTreeSetthat backdates when the anchor set is unchanged). Files:src/salsa.rs, new test undertests/. - Status (addressed for the cross-file leak): measurement showed the per-file indices the recommendation names —
definition_index,heading_outline,symbol_usage_index— already returnEqtypes and backdate, and an edit to one file never touches another’sFileTextinput, so their sibling memos were never the leak. The one real cross-file over-invalidation wasproject_graph: it read each member’s full parse (parsed_tree_root->no_eqparsed_document), so a body edit in any member re-ran the entire graph (and pulsed the writer load fixpoint + navigation). Fixed by lifting the range-free edge data into anEqfirewall queryproject_edges(file)(plus a backdatingfile_is_present(file)presence flag) and addingproject_structure(root)— the structural graph the writer fixpoint,definition_index_with_includes, and the navigation/workspace-symbol handlers now consume.project_graphis unchanged and remains theGraphDiagnosticaccumulator source, because include/duplicate diagnostics carry byte ranges that must track edits (and the cross-doc “first defined at path:line” message couples files through line numbers, so a clean diagnostics split is deferred). The incremental-invalidation test lives insrc/salsa.rs(body_edit_in_member_reuses_structural_and_sibling_memosplus astructural_edit_in_member_reexecutes_project_structureover-suppression guard), built on a salsaWillExecuteevent-log harness, rather than a separatetests/file — it reuses the existing salsa test fixtures and the private logging constructor. - Remaining: two follow-ups are deliberately open. (1)
project_graphstill re-runs on a member’s body edit (it is the diagnostics source), so the per-file diagnostics firewall above is unstarted. (2) The widerno_eqfamily the section opens with —metadata, theyaml_*queries,built_in_lint_plan,citation_definition_index,bibliography_index— was not swept: the new test only asserts theproject_structure/ sibling per-file-index axis. Extending the harness with cases like “a body edit in sibling A reuses B’sbuilt_in_lint_plan” and firewalling any that fail is the natural next increment.
3.5 Structural incremental reparse + stable pointers — highest effort, gate on benchmarks
- rust-analyzer:
incremental_reparsetriesreparse_token(edit inside one token), thenreparse_block(edit inside one subtree → re-parse just that block and splice the newGreenNode), then full reparse. Immutable, structurally-shared green nodes make splicing reuse untouched subtrees.SyntaxNodePtr/AstPtrsurvive reparses for source maps. - panache: has an experimental suffix reparse (re-parse from a restart offset to end-of-document); no structural block splice, no stable pointers.
- Recommend: (a) benchmark the current suffix reparse first — it may already be enough; the structural
reparse_token/reparse_blocksplice is strictly more work and only pays off if profiling shows suffix reparse dominating on large documents. (b) AddSyntaxNodePtr/AstPtronly when a feature needs a stable cross-edit reference — none does today, so adopt-when-needed. Files:crates/panache-parser/src/parser/, hooked intosrc/salsa.rs.
4. Secondary observations (no separate gap)
- No thread-intent priority. panache isolates formatting on its own pool but has no
ThreadIntent::{Worker, LatencySensitive}analog on the main pool, so a heavy lint can still delay a latency-sensitive hover. Measured 2026-06 on a 12-physical-core box with temporary per-readwait=/run=instrumentation: readwait≈0across hover/symbol/definition/formatting even with concurrent lint, and the contended burst (many docs sharing a.biblinting at once) never pushed in-flight lints near the pool size. The portableThreadIntentanalog (shared pool + priority queue + lint cap) is therefore deferred until a real burst, or a low-core machine, showswaittracking lint duration. - Format path re-parses. Resolved.
format_documentandformat_range(src/lsp/handlers/formatting.rs) now format viacrate::format_with_tree, reusing the salsa-cached parse throughStateSnapshot::parsed_tree(the same parse hover/symbols read) instead of parsing fresh — saving a parse per format request (two per range format). Each falls back to a fresh parse only if the document isn’t open. Idempotency is unaffected. - Disk-cache fingerprint trap. Already captured in
AGENTS.md: the dev-build tool fingerprint doesn’t change on a code edit, so the CLI can serve stale cached diagnostics. A development footgun, not an architecture gap — restated here for completeness. - Mixed diagnostics plumbing.
built_in_lint_planthreads diagnostics through its return value whileproject_graphuses a#[salsa::accumulator]. Both are idiomatic; the split is explicit and fine — noted as a difference, not a gap.