# Post-change bug review (Claude)
Scope: uncommitted darcs diff against `app/Main.hs`, `src/DarcsWeb/Config.hs`,
`src/DarcsWeb/Darcs.hs`, `src/DarcsWeb/Html.hs`, `test/Properties/Clone.hs`,
`test/Properties/Html.hs`. Every finding below is checked against the 11-item
`reviews/codex-reconciliation.md` fix list, the current working tree, and the
formally-verified helpers in `gen/` / `verified/`.
## Findings
### 1. LOW — Missing regression test for tree-subpath safety gate
- **File / line**: `src/DarcsWeb/Darcs.hs:307-309`, `test/Properties/Clone.hs` (absent)
- **What is wrong**: The reconciliation list (item 2, `HIGH`) explicitly
requested "a small property test for representative unsafe cases" to pin the
new `isSafeSubPath` guard inside `getRepoTree`. No such test was added; the
test module still only covers `isSafeSubPath` / `isCloneSubPath` directly,
not the `getRepoTree`-level early rejection.
- **Why it can fail in practice**: A future refactor that inlines or reorders
the guard (for example, moving the `isDir <- doesDirectoryExist fullDir`
call above the safety check) would silently reintroduce the
`_darcs`/dotfile disclosure path. There is no automated signal to catch that
regression.
- **Minimal fix**: Add a `Properties.Clone` (or new `Properties.Tree`)
property such as
getRepoTree repoPath "_darcs/prefs" === Nothing
getRepoTree repoPath "../etc" === Nothing
getRepoTree repoPath ".ssh/id_rsa" === Nothing
against a temporary repo fixture (or a pure helper that exposes just the
`isSafeSubPath`-based precondition).
### 2. LOW — Missing regression test for tree / blob breadcrumb encoding
- **File / line**: `src/DarcsWeb/Html.hs:318-354`, `test/Properties/Html.hs` (absent)
- **What is wrong**: The reconciliation list (item 3, `MED`) rewrote
`renderTreeBreadcrumb` and `renderBlobBreadcrumb` to fix the
HTML-injection / URL-injection sink documented in codex Q1, introducing
the new `pathSegments` / `encodePathList` helpers. No property in
`test/Properties/Html.hs` asserts that a label containing `<`, `"`, `?`,
`#` is HTML-escaped in the visible part and percent-encoded in the
`href`. The only test touching the rewrite is `prop_esc_matches_verified`,
which covers `esc` alone.
- **Why it can fail in practice**: The breadcrumb functions are not exported,
and a future simplification that returns to raw concatenation in either
function would not be caught by any test. Given that this code was the
stored-HTML-injection sink that motivated the rewrite, it should be pinned.
- **Minimal fix**: Export `renderTreeBreadcrumb` / `renderBlobBreadcrumb`
(or a thin wrapper) from `DarcsWeb.Html` behind an
`-- Exported for testing` block and add properties asserting
`not (T.isInfixOf "<script" (renderTreeBreadcrumb "r" "<script>"))` and
`T.isInfixOf "%3F" (renderTreeBreadcrumb "r" "a?b")`.
### 3. LOW — `renderBlobBreadcrumb` degrades to a dangling trailing " / " on an empty sub-path
- **File / line**: `src/DarcsWeb/Html.hs:337-354`
- **What is wrong**: When `subPath` is empty, `pathSegments subPath` returns
`[]`, the `case reverse parts` branch binds `(dirParts, fileName) = ([], "")`,
and the function renders `… / <a …>[root]</a> / </div>`. The final
`" / " <> esc ""` leaves a visible trailing separator with no file name.
- **Why it can fail in practice**: The blob route is
`^/repo/([^/]+)/blob/(.+)$`, so Scotty will not normally pass an empty
sub-path. However, the helper is total and could be reused from a future
call site (e.g. a new blob-root handler) where the input is empty; the
current shape produces subtly broken markup without any error signal.
Not security-relevant because `esc ""` is inert.
- **Minimal fix**: Guard the empty case explicitly, e.g.
renderBlobBreadcrumb _ subPath | T.null subPath =
"<div class=\"tree-path\"><a href=\"/repo/" <> … <> "/tree\">[root]</a></div>\n"
or, equivalently, omit the `" / " <> esc fileName` tail when `fileName` is
empty.
### 4. LOW — `getRepoSummary` forces full `PatchSummary` rendering for every patch
- **File / line**: `src/DarcsWeb/Darcs.hs:196-222`
- **What is wrong**: `allSummaries = mapRL extractPatchListing patchRL` builds
one `PatchSummary` per patch, and `PatchSummary`'s fields are all strict
(`src/DarcsWeb/Types.hs:12-22`). `count = length allSummaries` forces only
the list spine, but the subsequent `tags = filter psIsTag allSummaries`
forces each element to WHNF, which in turn forces
`psSummary = T.pack $ renderString $ summary p` for every single patch in
the repository, regardless of whether it is a tag or ever displayed.
- **Why it can fail in practice**: This changes the asymptotic work of the
`summary` route compared to the previous `getBasicRepoInfo` (which used
only `mapRL info`) plus bounded `getRepoPatches`. On repos with tens of
thousands of patches the summary page becomes noticeably slower and uses
more memory, in exchange for a single-pass API. This is functionally
equivalent to the old `getRepoTags` cost (which also rendered summaries
for every patch), so it is not a regression versus the combined old
path — but the reconciliation list (item 7) framed this as a PERF
improvement and it only partially delivers one.
- **Minimal fix**: Compute tags with `mapRL info` first, then call
`extractPatchListing` only on the recent / tag subset, e.g.
let infos = mapRL info patchRL
count = length infos
lastDate = case infos of (i:_) -> T.pack (piDateString i); _ -> ""
tagInfos = filter isTag infos
tags <- …render on demand…
recent <- …render for take maxPatchCount only…
so the `O(n)` summary rendering work is avoided when the user never opens
log / patch detail.
### 5. LOW — Lazy fields inside returned `PatchSummary` lists can still throw outside `try`
- **File / line**: `src/DarcsWeb/Darcs.hs:130-140` (`getRepoPatches`),
`src/DarcsWeb/Darcs.hs:174-188` (`getRepoTags`),
`src/DarcsWeb/Darcs.hs:200-222` (`getRepoSummary`)
- **What is wrong**: All three helpers call `withRepositoryLocation … $
RepoJob $ \repository -> do` inside `try`, force only the list spine with
`evaluate (length …)`, and return the list. Patch-content access goes
through `hopefullyM piap` which reads patch data from disk; with strict
`PatchSummary` fields this data is forced per element when the *renderer*
walks the list, i.e. after the `try` frame has popped and the repository
handle has been released. An `IOException` inside the darcs patch reader
escapes to Scotty as a 500.
- **Why it can fail in practice**: A race with external `darcs optimize`,
filesystem ENOSPC / permissions change, or a truncated patch file on disk
results in an uncaught exception on the summary / log / patch routes.
This is a pre-existing pattern, not introduced by this diff, but the diff
widened the surface (new `getRepoSummary`) without tightening the
evaluation story.
- **Minimal fix**: Fully evaluate the returned `PatchSummary` list inside
the `try` frame, for example with
`let r = …; evaluate (rnf r)` (adding `Control.DeepSeq.NFData` instances
to `PatchSummary`) or with `mapM evaluate` at the call site before
returning.
### 6. LOW — Host header fallthroughs and port parsing are not covered by tests beyond what existed before
- **File / line**: `app/Main.hs:127-133` (`parsePort`)
- **What is wrong**: The new `parsePort` helper has no QuickCheck or unit
coverage. The patch deleted the partial `read` but did not add any test
that pins the accepted / rejected set (empty string, trailing comment,
negative, 0, 65536, leading whitespace, hex).
- **Why it can fail in practice**: A future refactor to `parsePort` (e.g.
allowing a hex prefix, or accepting port 0 for OS-assigned ports) would
not be detected. Low severity because the function is startup-only and
the failure mode is a clean `exitFailure`, not a crash at request time.
- **Minimal fix**: Add a pure helper
parsePortPure :: String -> Maybe Int
parsePortPure s = readMaybe s >>= \p -> guard (p > 0 && p < 65536) >> pure p
and cover it with a QuickCheck property in `test/Properties/Clone.hs`
(or a new `Properties.Config` module).
## Residual risks
- **`isSafeSubPath` does not reject HTML-special characters** (e.g. `<`,
`>`, `"`, `&`, `?`, `#`). Defense against those classes depends on
`encodePathSegment` / `esc` at render time and on `jailCheck` at IO time.
This is a layered defense, not a bug, but it means future renderers must
be careful to route new URL construction through `encodePathSegment`
rather than `esc`.
- **TOCTOU between `doesFileExist canonical` and `file canonical`** in
`serveStatic` (`app/Main.hs:382-388`) and `serveClone`
(`app/Main.hs:447-453`). The `safeCanonicalize` addition does not close
this window; symlink replacement between the check and the serve is still
possible. Pre-existing and flagged only as a residual risk.
- **`getRepoBlob` does not reject subpaths starting with `/`** via
`isSafeSubPath` (the verified helper filters empty segments before
checking). Safety relies entirely on `jailCheck` catching the absolute
path after `System.FilePath.</>` replaces the first operand. This is
correct today, but the layered defense is not expressed in tests.
- **Scotty pathParam is URL-decoded**; a NUL byte in a repo name is not
explicitly rejected by `withRepo` (only `/`, `\`, `..`, leading `.` are).
If a reverse proxy allowed `%00` through, `isDarcsRepo` would raise an
`IOException` that escapes to a framework 500. Pre-existing.
## Missing test coverage
- No test that `getRepoTree` rejects `_darcs`, `../..`, `.git`,
dot-prefixed, or absolute sub-paths (covers reconciliation item 2).
- No test that `renderTreeBreadcrumb` / `renderBlobBreadcrumb`
percent-encode `?`, `#`, space in hrefs and HTML-escape `<` in labels
(covers reconciliation item 3).
- No test that `getRepoBlob` returns the `binaryPlaceholder` for a file
containing a NUL byte in the first 8 KiB, or for a file whose bytes are
not valid UTF-8.
- No test that `isCloneSubPath` rejects `_darcs`-nested payloads such as
`inventories/_darcs/config` — the `prop_clone_allowlist_refines_safe`
property is a tautology because `isCloneSubPath` short-circuits on
`isSafeSubPath`.
- No test for `parsePort`'s accepted / rejected set.
- No test for `safeCanonicalize` returning `Nothing` on IO errors (e.g. a
NUL byte in the candidate path) — the whole fix for reconciliation
item 5 rests on that behavior.
## Changed behavior that remains unverified
- The new one-pass `getRepoSummary` (reconciliation item 7) changes the
order in which `withRepositoryLocation`, `readPatches`, tag filtering,
and description reading run. No regression test asserts that the
returned `RepoInfo.riLastChange` and `riPatchCount` match what
`listRepos` + `getRepoPatches` + `getRepoTags` produced before.
- The native `esc` rewrite is pinned by `prop_esc_matches_verified`, but
the callers that previously used `HtmlPure.esc` indirectly (e.g. via
`highlightDiff`) have no separate property asserting their output is
unchanged byte-for-byte.
- `isCloneSubPath`'s use of `splitPath` (which does not filter empty
segments) means a subpath containing `//` or a leading `/` may be
rejected even if the same data would pass `isSafeSubPath`. Whether that
asymmetry is intentional is not captured by any property.