# Post-change efficiency review — darcsweb
Scope: the uncommitted darcs changes on `app/Main.hs`, `src/DarcsWeb/Config.hs`,
`src/DarcsWeb/Darcs.hs`, `src/DarcsWeb/Html.hs`, `test/Properties/Clone.hs`,
`test/Properties/Html.hs` as captured in `/tmp/darcs-diff.txt`.
The diff contains three clear efficiency wins (one-pass summary, hash-first
patch lookup, native `Text` `esc`) plus a few new allocation hotspots
introduced by the breadcrumb refactor. Findings are ordered by severity.
---
## Findings
### F1 — MED: `esc` emits one `Text` per character on the render hot path
1. Severity: MED
2. File and line: `src/DarcsWeb/Html.hs:464`-`src/DarcsWeb/Html.hs:472`
3. Inefficiency: The new `esc = T.concatMap escChar` is strictly faster than
the previous `T.pack . HtmlPure.esc . T.unpack` bridge (no `String`
intermediate, no `unpack/pack`, no per-character `(:)` list cells on the
boxed heap), so the change is a net win — but the chosen implementation
still allocates one fresh `Text` per non-special character (`escChar c =
T.singleton c`) and `concatMap` glues them together. `Data.Text`'s
`concatMap` is internally `concat . map`, so escaping a 1 MiB diff
produces ~1 M single-character `Text` values and then a concat. That is
why `esc` remains the dominant cost on the patch-detail page even after
this rewrite.
4. When it becomes a real problem: any page that HTML-escapes large `Text`,
in particular `renderPatchDetail` → `highlightDiff (psDiff ps)` at
`src/DarcsWeb/Html.hs:255` and `src/DarcsWeb/Html.hs:476`-`src/DarcsWeb/Html.hs:483`.
For a repository with multi-megabyte diffs this quickly dominates render
time and transient heap residency.
5. Optimization direction: replace `T.concatMap escChar` with a single
pass that finds the next escapable char via `T.break (\c -> c == '<' ||
c == '>' || c == '&' || c == '"' || c == '\'')`, appends the unmodified
prefix in one go, then the escape, and recurses. Equivalence with
`HtmlPure.esc` is already pinned by
`test/Properties/Html.hs:39`-`test/Properties/Html.hs:41`, so the swap is
safe. Alternatively build into a `Data.Text.Lazy.Builder` and force once
at the end; this would also let `highlightDiff` stream without
`T.concat . map ... . T.lines` at `src/DarcsWeb/Html.hs:477`.
Verdict on the original question "is native `esc` actually faster than
the `String` round-trip": yes, clearly, but the remaining per-char
allocation keeps `esc` on the hot path.
---
### F2 — MED: `getRepoSummary` still materialises every patch's summary text just to count them
1. Severity: MED
2. File and line: `src/DarcsWeb/Darcs.hs:200`-`src/DarcsWeb/Darcs.hs:222`
(specifically `allSummaries = mapRL extractPatchListing patchRL` at 205
and `count = length allSummaries` forced at 210).
3. Inefficiency: `getRepoSummary` computes `allSummaries` via
`mapRL extractPatchListing patchRL`, where `extractPatchListing` calls
`renderString $ summary p` per patch (`src/DarcsWeb/Darcs.hs:385`-`src/DarcsWeb/Darcs.hs:392`).
Then `evaluate count` forces the full list, so every patch — not just
the first `maxPatchCount` — is rendered to produce a summary `Text`,
even though the summary page only displays `take 10 recent` (see
`app/Main.hs:257`) and the tag list. Compared with the old
`listRepos + getRepoPatches + getRepoTags` trio this is still a net win
because the old code did two full traversals (`getRepoPatches` rendered
500 listings, `getRepoTags` rendered `mapRL extractPatchListing` over
the full set), but the new hot path still pays for rendering a listing
for every patch — i.e. `count` and `tags` on a 50 k‑patch repo force
50 k `renderString` calls per summary request.
4. When it becomes a real problem: summary requests on large repos (tens
of thousands of patches) or under load. Users hitting `/repo/X/summary`
pay full O(N) summary rendering for every request.
5. Optimization direction:
- For `count`, use `lengthRL patchRL` (or equivalently
`length (mapRL info patchRL)`) — it avoids calling
`extractPatchListing`, which is what makes the per-element work
expensive. `PatchInfo` is already cheap.
- For `tags`, filter on `info` first (using `isTag`) and only run
`extractPatchListing` on the tags. The equivalent already exists in
`getBasicRepoInfo` at `src/DarcsWeb/Darcs.hs:115`-`src/DarcsWeb/Darcs.hs:127`
(maps `info` only).
- `recent` already uses `take maxPatchCount`, which is fine.
- The intent "one traversal instead of three" is preserved; the change
is just to separate the cheap `PatchInfo` traversal from the
expensive summary rendering.
Verdict on the original question "does `getRepoSummary` actually save
work vs the old 3-call summary": yes, it removes the `listRepos` scan
of every repository in `cfgRepoDir` and deduplicates one
`readPatches`, which are the big-O wins. But it regresses one sub-axis
(all patches are now rendered to listings rather than just
`maxPatchCount + all-tags`), which is why the above tightening
matters.
---
### F3 — LOW/MED: Tree- and blob-breadcrumb builders are quadratic in path depth
1. Severity: LOW→MED (depth-dependent)
2. File and line: `src/DarcsWeb/Html.hs:322`-`src/DarcsWeb/Html.hs:326`
and `src/DarcsWeb/Html.hs:344`-`src/DarcsWeb/Html.hs:347`.
3. Inefficiency: Both breadcrumb builders use
`scanl (\acc p -> acc ++ [p]) [] parts` to build the list of cumulative
prefixes, then call `encodePathList prefix` (which re-encodes every
segment from scratch via `T.intercalate "/" . map encodePathSegment`,
`src/DarcsWeb/Html.hs:363`-`src/DarcsWeb/Html.hs:364`) on each prefix.
`acc ++ [p]` is O(n) in the prefix length, so `scanl` is O(k²) in list
cells for a path of depth k, and the per-crumb re-encoding via
`encodePathList` makes the total work O(k² · avg_segment_encode_cost).
4. When it becomes a real problem: deep tree/blob paths (say 10+
directory components) or adversarial requests that push the depth
high. For typical repos this is dwarfed by the `esc` cost, but the
algorithmic shape is still worse than the old linear `buildCrumbs`
recursion that extended `acc` by appending `Text` on each step. This
is a regression in asymptotic shape introduced by the refactor —
small in absolute terms for realistic depths, but worth fixing while
the code is being touched.
5. Optimization direction: build the href incrementally by carrying the
already-encoded prefix as `Text` instead of a `[Text]`, e.g. using
`foldl'` with an accumulator of type `(Text, Text)` where the first
component is the cumulative percent-encoded path and the second is
the rendered breadcrumb `Text`. That drops the per-step `acc ++ [p]`
and the repeated `map encodePathSegment` over the entire prefix.
---
### F4 — LOW: `renderTreeRow` recomputes `pathSegments subPath` and
`encodePathSegment repoName` per entry
1. Severity: LOW
2. File and line: `src/DarcsWeb/Html.hs:388`-`src/DarcsWeb/Html.hs:406`.
3. Inefficiency: Inside `map (renderTreeRow repoName subPath) entries`
(called from `renderTreeTable` at `src/DarcsWeb/Html.hs:384`), every
row recomputes `nameSeg = encodePathSegment repoName` and
`entrySegs = pathSegments subPath ++ [name]`. Both are constant in
`repoName`/`subPath` across the whole page, but are re-evaluated per
entry because they are bound inside `renderTreeRow`. For a directory
with N files, that is N redundant path-segment splits plus N
redundant percent-encodings of `repoName`.
4. When it becomes a real problem: large directories (hundreds to
thousands of entries). The cost is small per row, but it is pure
duplicated work introduced by the refactor — the pre-change code
called `esc repoName` per row as well, so this is a wash in number
of per-row string walks but strictly more work than necessary.
5. Optimization direction: hoist `nameSeg` and `pathSegments subPath`
to `renderTreeTable` and pass them in. The same pattern applies (to
a smaller degree) to `renderShortLogRow`, `renderFullLogEntry`,
`renderTagRow`, and `renderRepoRow`, which each re-run
`encodePathSegment repoName` per row — acceptable, but if
`renderTreeTable` is fixed the others become mechanical to match.
---
### F5 — LOW: Matched patch hashes its `PatchInfo` twice in `getRepoPatch`
1. Severity: LOW
2. File and line: `src/DarcsWeb/Darcs.hs:163`-`src/DarcsWeb/Darcs.hs:166`
together with `src/DarcsWeb/Darcs.hs:406`-`src/DarcsWeb/Darcs.hs:412`.
3. Inefficiency: `selectMatching` calls `patchHashText (info piap)` to
test against `targetHash`, and `extractPatchFull`→`patchInfoToSummary`
stores the same hash in `psHash` via another `patchHashText pinfo`
call. For the matched patch, the SHA-1 is computed twice. This is
not hot — it is O(1) per request — but the allocator-visible call
pattern was changed in this diff and the duplication is new.
4. When it becomes a real problem: essentially never on its own; noted
for completeness because the efficiency review was asked to focus on
`getRepoPatch`'s short-circuit.
5. Optimization direction: thread the already-computed hash `Text`
into `extractPatchFull`, or have `selectMatching` return a
pre-filled `PatchSummary` with `psHash = targetHash`.
Verdict on the original question "does `getRepoPatch` short-circuit
correctly so that non-matching patches never trigger diff
rendering": **yes**. `selectMatching` computes the SHA-1 only and
wraps the expensive `extractPatchFull` inside the `Just` branch of
the hash equality test, so non-matching patches cost one SHA-1 per
`PatchInfo` and never enter `renderString $ showPatch ForDisplay p`.
This is the main correctness win of the refactor.
---
## Things that were checked and found OK
- `safeCanonicalize` at `src/DarcsWeb/Darcs.hs:241`-`src/DarcsWeb/Darcs.hs:246`
adds one `try` per resolution; the IO cost is negligible next to the
underlying `canonicalizePath` syscall chain.
- Strict `ByteString` blob read + `decodeUtf8'` at
`src/DarcsWeb/Darcs.hs:365`-`src/DarcsWeb/Darcs.hs:373` eliminates the
previous lazy-handle leak and is a strict efficiency win.
- The NUL-byte sniff at `src/DarcsWeb/Darcs.hs:377`-`src/DarcsWeb/Darcs.hs:378`
uses `BS.take 8192` before `BS.elem`, so it is bounded regardless of
file size.
- `isCloneSubPath` at `src/DarcsWeb/Darcs.hs:283`-`src/DarcsWeb/Darcs.hs:293`
runs a small `splitPath` + pattern match; O(1) in the request shape.
- `parsePort` at `app/Main.hs:127`-`app/Main.hs:133` runs once at startup.
- `cfgLookupDefault` now uses `fromMaybe` at
`src/DarcsWeb/Config.hs:40`; semantically identical, marginally less
allocation, not performance-relevant.
- The Scotty route changes in `app/Main.hs` do not change the number of
passes over the repository.
---
## Paths worth benchmarking
- `/repo/:name/summary` on a repository with a large patch count — to
validate both the net win over the old 3-call version and the
remaining summary-render cost flagged in F2.
- `/repo/:name/patch/:hash` on a repository with many large patches —
to confirm that non-matching requests no longer render every diff
(F5 context) and to measure `esc`/`highlightDiff` cost (F1).
- `/repo/:name/tree/<deep-path>` — to confirm whether F3's O(k²) shape
is observable at realistic depths.
## Load assumptions behind the review
- Typical request rate is modest (single-user or small-team darcsweb
host), so per-request constant factors matter more than throughput.
- Repositories can be large (thousands of patches; individual diffs
up to a few MiB). This is where F1 and F2 dominate.
- No reverse-proxy cache is assumed in front of darcsweb; every
request hits the handlers.
## Scalability questions not verifiable from code alone
- Whether `mapRL` in the installed `darcs-2.18.5` is strict or lazy in
its output list. If it is lazy, `getRepoPatch`'s list comprehension
`[p | Just p <- selected]` with `(p:_)` already stops at the first
match with no further work. If it is strict, all N `PatchInfo`s are
walked and hashed regardless — still vastly cheaper than the old
"render all diffs" code, but not constant time.
- Whether `renderString $ summary p` in `extractPatchListing` is a
real hot allocator on the measured repo sizes, or whether darcs
caches it internally via the `hopefullyM` branch. F2's fix is safe
either way, but its size depends on this.