darcsweb - reviews/post-claude-efficiency.md

[root] / reviews / post-claude-efficiency.md
# Post-change efficiency review — darcsweb

Scope: the uncommitted darcs changes on `app/Main.hs`, `src/DarcsWeb/Config.hs`,
`src/DarcsWeb/Darcs.hs`, `src/DarcsWeb/Html.hs`, `test/Properties/Clone.hs`,
`test/Properties/Html.hs` as captured in `/tmp/darcs-diff.txt`.

The diff contains three clear efficiency wins (one-pass summary, hash-first
patch lookup, native `Text` `esc`) plus a few new allocation hotspots
introduced by the breadcrumb refactor. Findings are ordered by severity.

---

## Findings

### F1 — MED: `esc` emits one `Text` per character on the render hot path

1. Severity: MED
2. File and line: `src/DarcsWeb/Html.hs:464`-`src/DarcsWeb/Html.hs:472`
3. Inefficiency: The new `esc = T.concatMap escChar` is strictly faster than
   the previous `T.pack . HtmlPure.esc . T.unpack` bridge (no `String`
   intermediate, no `unpack/pack`, no per-character `(:)` list cells on the
   boxed heap), so the change is a net win — but the chosen implementation
   still allocates one fresh `Text` per non-special character (`escChar c =
   T.singleton c`) and `concatMap` glues them together. `Data.Text`'s
   `concatMap` is internally `concat . map`, so escaping a 1 MiB diff
   produces ~1 M single-character `Text` values and then a concat. That is
   why `esc` remains the dominant cost on the patch-detail page even after
   this rewrite.
4. When it becomes a real problem: any page that HTML-escapes large `Text`,
   in particular `renderPatchDetail` → `highlightDiff (psDiff ps)` at
   `src/DarcsWeb/Html.hs:255` and `src/DarcsWeb/Html.hs:476`-`src/DarcsWeb/Html.hs:483`.
   For a repository with multi-megabyte diffs this quickly dominates render
   time and transient heap residency.
5. Optimization direction: replace `T.concatMap escChar` with a single
   pass that finds the next escapable char via `T.break (\c -> c == '<' ||
   c == '>' || c == '&' || c == '"' || c == '\'')`, appends the unmodified
   prefix in one go, then the escape, and recurses. Equivalence with
   `HtmlPure.esc` is already pinned by
   `test/Properties/Html.hs:39`-`test/Properties/Html.hs:41`, so the swap is
   safe. Alternatively build into a `Data.Text.Lazy.Builder` and force once
   at the end; this would also let `highlightDiff` stream without
   `T.concat . map ... . T.lines` at `src/DarcsWeb/Html.hs:477`.

   Verdict on the original question "is native `esc` actually faster than
   the `String` round-trip": yes, clearly, but the remaining per-char
   allocation keeps `esc` on the hot path.

---

### F2 — MED: `getRepoSummary` still materialises every patch's summary text just to count them

1. Severity: MED
2. File and line: `src/DarcsWeb/Darcs.hs:200`-`src/DarcsWeb/Darcs.hs:222`
   (specifically `allSummaries = mapRL extractPatchListing patchRL` at 205
   and `count = length allSummaries` forced at 210).
3. Inefficiency: `getRepoSummary` computes `allSummaries` via
   `mapRL extractPatchListing patchRL`, where `extractPatchListing` calls
   `renderString $ summary p` per patch (`src/DarcsWeb/Darcs.hs:385`-`src/DarcsWeb/Darcs.hs:392`).
   Then `evaluate count` forces the full list, so every patch — not just
   the first `maxPatchCount` — is rendered to produce a summary `Text`,
   even though the summary page only displays `take 10 recent` (see
   `app/Main.hs:257`) and the tag list. Compared with the old
   `listRepos + getRepoPatches + getRepoTags` trio this is still a net win
   because the old code did two full traversals (`getRepoPatches` rendered
   500 listings, `getRepoTags` rendered `mapRL extractPatchListing` over
   the full set), but the new hot path still pays for rendering a listing
   for every patch — i.e. `count` and `tags` on a 50 k‑patch repo force
   50 k `renderString` calls per summary request.
4. When it becomes a real problem: summary requests on large repos (tens
   of thousands of patches) or under load. Users hitting `/repo/X/summary`
   pay full O(N) summary rendering for every request.
5. Optimization direction:
   - For `count`, use `lengthRL patchRL` (or equivalently
     `length (mapRL info patchRL)`) — it avoids calling
     `extractPatchListing`, which is what makes the per-element work
     expensive. `PatchInfo` is already cheap.
   - For `tags`, filter on `info` first (using `isTag`) and only run
     `extractPatchListing` on the tags. The equivalent already exists in
     `getBasicRepoInfo` at `src/DarcsWeb/Darcs.hs:115`-`src/DarcsWeb/Darcs.hs:127`
     (maps `info` only).
   - `recent` already uses `take maxPatchCount`, which is fine.
   - The intent "one traversal instead of three" is preserved; the change
     is just to separate the cheap `PatchInfo` traversal from the
     expensive summary rendering.

   Verdict on the original question "does `getRepoSummary` actually save
   work vs the old 3-call summary": yes, it removes the `listRepos` scan
   of every repository in `cfgRepoDir` and deduplicates one
   `readPatches`, which are the big-O wins. But it regresses one sub-axis
   (all patches are now rendered to listings rather than just
   `maxPatchCount + all-tags`), which is why the above tightening
   matters.

---

### F3 — LOW/MED: Tree- and blob-breadcrumb builders are quadratic in path depth

1. Severity: LOW→MED (depth-dependent)
2. File and line: `src/DarcsWeb/Html.hs:322`-`src/DarcsWeb/Html.hs:326`
   and `src/DarcsWeb/Html.hs:344`-`src/DarcsWeb/Html.hs:347`.
3. Inefficiency: Both breadcrumb builders use
   `scanl (\acc p -> acc ++ [p]) [] parts` to build the list of cumulative
   prefixes, then call `encodePathList prefix` (which re-encodes every
   segment from scratch via `T.intercalate "/" . map encodePathSegment`,
   `src/DarcsWeb/Html.hs:363`-`src/DarcsWeb/Html.hs:364`) on each prefix.
   `acc ++ [p]` is O(n) in the prefix length, so `scanl` is O(k²) in list
   cells for a path of depth k, and the per-crumb re-encoding via
   `encodePathList` makes the total work O(k² · avg_segment_encode_cost).
4. When it becomes a real problem: deep tree/blob paths (say 10+
   directory components) or adversarial requests that push the depth
   high. For typical repos this is dwarfed by the `esc` cost, but the
   algorithmic shape is still worse than the old linear `buildCrumbs`
   recursion that extended `acc` by appending `Text` on each step. This
   is a regression in asymptotic shape introduced by the refactor —
   small in absolute terms for realistic depths, but worth fixing while
   the code is being touched.
5. Optimization direction: build the href incrementally by carrying the
   already-encoded prefix as `Text` instead of a `[Text]`, e.g. using
   `foldl'` with an accumulator of type `(Text, Text)` where the first
   component is the cumulative percent-encoded path and the second is
   the rendered breadcrumb `Text`. That drops the per-step `acc ++ [p]`
   and the repeated `map encodePathSegment` over the entire prefix.

---

### F4 — LOW: `renderTreeRow` recomputes `pathSegments subPath` and
`encodePathSegment repoName` per entry

1. Severity: LOW
2. File and line: `src/DarcsWeb/Html.hs:388`-`src/DarcsWeb/Html.hs:406`.
3. Inefficiency: Inside `map (renderTreeRow repoName subPath) entries`
   (called from `renderTreeTable` at `src/DarcsWeb/Html.hs:384`), every
   row recomputes `nameSeg = encodePathSegment repoName` and
   `entrySegs = pathSegments subPath ++ [name]`. Both are constant in
   `repoName`/`subPath` across the whole page, but are re-evaluated per
   entry because they are bound inside `renderTreeRow`. For a directory
   with N files, that is N redundant path-segment splits plus N
   redundant percent-encodings of `repoName`.
4. When it becomes a real problem: large directories (hundreds to
   thousands of entries). The cost is small per row, but it is pure
   duplicated work introduced by the refactor — the pre-change code
   called `esc repoName` per row as well, so this is a wash in number
   of per-row string walks but strictly more work than necessary.
5. Optimization direction: hoist `nameSeg` and `pathSegments subPath`
   to `renderTreeTable` and pass them in. The same pattern applies (to
   a smaller degree) to `renderShortLogRow`, `renderFullLogEntry`,
   `renderTagRow`, and `renderRepoRow`, which each re-run
   `encodePathSegment repoName` per row — acceptable, but if
   `renderTreeTable` is fixed the others become mechanical to match.

---

### F5 — LOW: Matched patch hashes its `PatchInfo` twice in `getRepoPatch`

1. Severity: LOW
2. File and line: `src/DarcsWeb/Darcs.hs:163`-`src/DarcsWeb/Darcs.hs:166`
   together with `src/DarcsWeb/Darcs.hs:406`-`src/DarcsWeb/Darcs.hs:412`.
3. Inefficiency: `selectMatching` calls `patchHashText (info piap)` to
   test against `targetHash`, and `extractPatchFull`→`patchInfoToSummary`
   stores the same hash in `psHash` via another `patchHashText pinfo`
   call. For the matched patch, the SHA-1 is computed twice. This is
   not hot — it is O(1) per request — but the allocator-visible call
   pattern was changed in this diff and the duplication is new.
4. When it becomes a real problem: essentially never on its own; noted
   for completeness because the efficiency review was asked to focus on
   `getRepoPatch`'s short-circuit.
5. Optimization direction: thread the already-computed hash `Text`
   into `extractPatchFull`, or have `selectMatching` return a
   pre-filled `PatchSummary` with `psHash = targetHash`.

   Verdict on the original question "does `getRepoPatch` short-circuit
   correctly so that non-matching patches never trigger diff
   rendering": **yes**. `selectMatching` computes the SHA-1 only and
   wraps the expensive `extractPatchFull` inside the `Just` branch of
   the hash equality test, so non-matching patches cost one SHA-1 per
   `PatchInfo` and never enter `renderString $ showPatch ForDisplay p`.
   This is the main correctness win of the refactor.

---

## Things that were checked and found OK

- `safeCanonicalize` at `src/DarcsWeb/Darcs.hs:241`-`src/DarcsWeb/Darcs.hs:246`
  adds one `try` per resolution; the IO cost is negligible next to the
  underlying `canonicalizePath` syscall chain.
- Strict `ByteString` blob read + `decodeUtf8'` at
  `src/DarcsWeb/Darcs.hs:365`-`src/DarcsWeb/Darcs.hs:373` eliminates the
  previous lazy-handle leak and is a strict efficiency win.
- The NUL-byte sniff at `src/DarcsWeb/Darcs.hs:377`-`src/DarcsWeb/Darcs.hs:378`
  uses `BS.take 8192` before `BS.elem`, so it is bounded regardless of
  file size.
- `isCloneSubPath` at `src/DarcsWeb/Darcs.hs:283`-`src/DarcsWeb/Darcs.hs:293`
  runs a small `splitPath` + pattern match; O(1) in the request shape.
- `parsePort` at `app/Main.hs:127`-`app/Main.hs:133` runs once at startup.
- `cfgLookupDefault` now uses `fromMaybe` at
  `src/DarcsWeb/Config.hs:40`; semantically identical, marginally less
  allocation, not performance-relevant.
- The Scotty route changes in `app/Main.hs` do not change the number of
  passes over the repository.

---

## Paths worth benchmarking

- `/repo/:name/summary` on a repository with a large patch count — to
  validate both the net win over the old 3-call version and the
  remaining summary-render cost flagged in F2.
- `/repo/:name/patch/:hash` on a repository with many large patches —
  to confirm that non-matching requests no longer render every diff
  (F5 context) and to measure `esc`/`highlightDiff` cost (F1).
- `/repo/:name/tree/<deep-path>` — to confirm whether F3's O(k²) shape
  is observable at realistic depths.

## Load assumptions behind the review

- Typical request rate is modest (single-user or small-team darcsweb
  host), so per-request constant factors matter more than throughput.
- Repositories can be large (thousands of patches; individual diffs
  up to a few MiB). This is where F1 and F2 dominate.
- No reverse-proxy cache is assumed in front of darcsweb; every
  request hits the handlers.

## Scalability questions not verifiable from code alone

- Whether `mapRL` in the installed `darcs-2.18.5` is strict or lazy in
  its output list. If it is lazy, `getRepoPatch`'s list comprehension
  `[p | Just p <- selected]` with `(p:_)` already stops at the first
  match with no further work. If it is strict, all N `PatchInfo`s are
  walked and hashed regardless — still vastly cheaper than the old
  "render all diffs" code, but not constant time.
- Whether `renderString $ summary p` in `extractPatchListing` is a
  real hot allocator on the measured repo sizes, or whether darcs
  caches it internally via the `hopefullyM` branch. F2's fix is safe
  either way, but its size depends on this.