darcsweb - reviews/post-claude-bugs.md

[root] / reviews / post-claude-bugs.md
# Post-change bug review (Claude)

Scope: uncommitted darcs diff against `app/Main.hs`, `src/DarcsWeb/Config.hs`,
`src/DarcsWeb/Darcs.hs`, `src/DarcsWeb/Html.hs`, `test/Properties/Clone.hs`,
`test/Properties/Html.hs`. Every finding below is checked against the 11-item
`reviews/codex-reconciliation.md` fix list, the current working tree, and the
formally-verified helpers in `gen/` / `verified/`.

## Findings

### 1. LOW — Missing regression test for tree-subpath safety gate
- **File / line**: `src/DarcsWeb/Darcs.hs:307-309`, `test/Properties/Clone.hs` (absent)
- **What is wrong**: The reconciliation list (item 2, `HIGH`) explicitly
  requested "a small property test for representative unsafe cases" to pin the
  new `isSafeSubPath` guard inside `getRepoTree`. No such test was added; the
  test module still only covers `isSafeSubPath` / `isCloneSubPath` directly,
  not the `getRepoTree`-level early rejection.
- **Why it can fail in practice**: A future refactor that inlines or reorders
  the guard (for example, moving the `isDir <- doesDirectoryExist fullDir`
  call above the safety check) would silently reintroduce the
  `_darcs`/dotfile disclosure path. There is no automated signal to catch that
  regression.
- **Minimal fix**: Add a `Properties.Clone` (or new `Properties.Tree`)
  property such as

      getRepoTree repoPath "_darcs/prefs"      === Nothing
      getRepoTree repoPath "../etc"            === Nothing
      getRepoTree repoPath ".ssh/id_rsa"       === Nothing

  against a temporary repo fixture (or a pure helper that exposes just the
  `isSafeSubPath`-based precondition).

### 2. LOW — Missing regression test for tree / blob breadcrumb encoding
- **File / line**: `src/DarcsWeb/Html.hs:318-354`, `test/Properties/Html.hs` (absent)
- **What is wrong**: The reconciliation list (item 3, `MED`) rewrote
  `renderTreeBreadcrumb` and `renderBlobBreadcrumb` to fix the
  HTML-injection / URL-injection sink documented in codex Q1, introducing
  the new `pathSegments` / `encodePathList` helpers. No property in
  `test/Properties/Html.hs` asserts that a label containing `<`, `"`, `?`,
  `#` is HTML-escaped in the visible part and percent-encoded in the
  `href`. The only test touching the rewrite is `prop_esc_matches_verified`,
  which covers `esc` alone.
- **Why it can fail in practice**: The breadcrumb functions are not exported,
  and a future simplification that returns to raw concatenation in either
  function would not be caught by any test. Given that this code was the
  stored-HTML-injection sink that motivated the rewrite, it should be pinned.
- **Minimal fix**: Export `renderTreeBreadcrumb` / `renderBlobBreadcrumb`
  (or a thin wrapper) from `DarcsWeb.Html` behind an
  `-- Exported for testing` block and add properties asserting
  `not (T.isInfixOf "<script" (renderTreeBreadcrumb "r" "<script>"))` and
  `T.isInfixOf "%3F" (renderTreeBreadcrumb "r" "a?b")`.

### 3. LOW — `renderBlobBreadcrumb` degrades to a dangling trailing " / " on an empty sub-path
- **File / line**: `src/DarcsWeb/Html.hs:337-354`
- **What is wrong**: When `subPath` is empty, `pathSegments subPath` returns
  `[]`, the `case reverse parts` branch binds `(dirParts, fileName) = ([], "")`,
  and the function renders `… / <a …>[root]</a> / </div>`. The final
  `" / " <> esc ""` leaves a visible trailing separator with no file name.
- **Why it can fail in practice**: The blob route is
  `^/repo/([^/]+)/blob/(.+)$`, so Scotty will not normally pass an empty
  sub-path. However, the helper is total and could be reused from a future
  call site (e.g. a new blob-root handler) where the input is empty; the
  current shape produces subtly broken markup without any error signal.
  Not security-relevant because `esc ""` is inert.
- **Minimal fix**: Guard the empty case explicitly, e.g.

      renderBlobBreadcrumb _ subPath | T.null subPath =
        "<div class=\"tree-path\"><a href=\"/repo/" <> … <> "/tree\">[root]</a></div>\n"

  or, equivalently, omit the `" / " <> esc fileName` tail when `fileName` is
  empty.

### 4. LOW — `getRepoSummary` forces full `PatchSummary` rendering for every patch
- **File / line**: `src/DarcsWeb/Darcs.hs:196-222`
- **What is wrong**: `allSummaries = mapRL extractPatchListing patchRL` builds
  one `PatchSummary` per patch, and `PatchSummary`'s fields are all strict
  (`src/DarcsWeb/Types.hs:12-22`). `count = length allSummaries` forces only
  the list spine, but the subsequent `tags = filter psIsTag allSummaries`
  forces each element to WHNF, which in turn forces
  `psSummary = T.pack $ renderString $ summary p` for every single patch in
  the repository, regardless of whether it is a tag or ever displayed.
- **Why it can fail in practice**: This changes the asymptotic work of the
  `summary` route compared to the previous `getBasicRepoInfo` (which used
  only `mapRL info`) plus bounded `getRepoPatches`. On repos with tens of
  thousands of patches the summary page becomes noticeably slower and uses
  more memory, in exchange for a single-pass API. This is functionally
  equivalent to the old `getRepoTags` cost (which also rendered summaries
  for every patch), so it is not a regression versus the combined old
  path — but the reconciliation list (item 7) framed this as a PERF
  improvement and it only partially delivers one.
- **Minimal fix**: Compute tags with `mapRL info` first, then call
  `extractPatchListing` only on the recent / tag subset, e.g.

      let infos        = mapRL info patchRL
          count        = length infos
          lastDate     = case infos of (i:_) -> T.pack (piDateString i); _ -> ""
          tagInfos     = filter isTag infos
      tags   <- …render on demand…
      recent <- …render for take maxPatchCount only…

  so the `O(n)` summary rendering work is avoided when the user never opens
  log / patch detail.

### 5. LOW — Lazy fields inside returned `PatchSummary` lists can still throw outside `try`
- **File / line**: `src/DarcsWeb/Darcs.hs:130-140` (`getRepoPatches`),
  `src/DarcsWeb/Darcs.hs:174-188` (`getRepoTags`),
  `src/DarcsWeb/Darcs.hs:200-222` (`getRepoSummary`)
- **What is wrong**: All three helpers call `withRepositoryLocation … $
  RepoJob $ \repository -> do` inside `try`, force only the list spine with
  `evaluate (length …)`, and return the list. Patch-content access goes
  through `hopefullyM piap` which reads patch data from disk; with strict
  `PatchSummary` fields this data is forced per element when the *renderer*
  walks the list, i.e. after the `try` frame has popped and the repository
  handle has been released. An `IOException` inside the darcs patch reader
  escapes to Scotty as a 500.
- **Why it can fail in practice**: A race with external `darcs optimize`,
  filesystem ENOSPC / permissions change, or a truncated patch file on disk
  results in an uncaught exception on the summary / log / patch routes.
  This is a pre-existing pattern, not introduced by this diff, but the diff
  widened the surface (new `getRepoSummary`) without tightening the
  evaluation story.
- **Minimal fix**: Fully evaluate the returned `PatchSummary` list inside
  the `try` frame, for example with
  `let r = …; evaluate (rnf r)` (adding `Control.DeepSeq.NFData` instances
  to `PatchSummary`) or with `mapM evaluate` at the call site before
  returning.

### 6. LOW — Host header fallthroughs and port parsing are not covered by tests beyond what existed before
- **File / line**: `app/Main.hs:127-133` (`parsePort`)
- **What is wrong**: The new `parsePort` helper has no QuickCheck or unit
  coverage. The patch deleted the partial `read` but did not add any test
  that pins the accepted / rejected set (empty string, trailing comment,
  negative, 0, 65536, leading whitespace, hex).
- **Why it can fail in practice**: A future refactor to `parsePort` (e.g.
  allowing a hex prefix, or accepting port 0 for OS-assigned ports) would
  not be detected. Low severity because the function is startup-only and
  the failure mode is a clean `exitFailure`, not a crash at request time.
- **Minimal fix**: Add a pure helper

      parsePortPure :: String -> Maybe Int
      parsePortPure s = readMaybe s >>= \p -> guard (p > 0 && p < 65536) >> pure p

  and cover it with a QuickCheck property in `test/Properties/Clone.hs`
  (or a new `Properties.Config` module).

## Residual risks

- **`isSafeSubPath` does not reject HTML-special characters** (e.g. `<`,
  `>`, `"`, `&`, `?`, `#`). Defense against those classes depends on
  `encodePathSegment` / `esc` at render time and on `jailCheck` at IO time.
  This is a layered defense, not a bug, but it means future renderers must
  be careful to route new URL construction through `encodePathSegment`
  rather than `esc`.
- **TOCTOU between `doesFileExist canonical` and `file canonical`** in
  `serveStatic` (`app/Main.hs:382-388`) and `serveClone`
  (`app/Main.hs:447-453`). The `safeCanonicalize` addition does not close
  this window; symlink replacement between the check and the serve is still
  possible. Pre-existing and flagged only as a residual risk.
- **`getRepoBlob` does not reject subpaths starting with `/`** via
  `isSafeSubPath` (the verified helper filters empty segments before
  checking). Safety relies entirely on `jailCheck` catching the absolute
  path after `System.FilePath.</>` replaces the first operand. This is
  correct today, but the layered defense is not expressed in tests.
- **Scotty pathParam is URL-decoded**; a NUL byte in a repo name is not
  explicitly rejected by `withRepo` (only `/`, `\`, `..`, leading `.` are).
  If a reverse proxy allowed `%00` through, `isDarcsRepo` would raise an
  `IOException` that escapes to a framework 500. Pre-existing.

## Missing test coverage

- No test that `getRepoTree` rejects `_darcs`, `../..`, `.git`,
  dot-prefixed, or absolute sub-paths (covers reconciliation item 2).
- No test that `renderTreeBreadcrumb` / `renderBlobBreadcrumb`
  percent-encode `?`, `#`, space in hrefs and HTML-escape `<` in labels
  (covers reconciliation item 3).
- No test that `getRepoBlob` returns the `binaryPlaceholder` for a file
  containing a NUL byte in the first 8 KiB, or for a file whose bytes are
  not valid UTF-8.
- No test that `isCloneSubPath` rejects `_darcs`-nested payloads such as
  `inventories/_darcs/config` — the `prop_clone_allowlist_refines_safe`
  property is a tautology because `isCloneSubPath` short-circuits on
  `isSafeSubPath`.
- No test for `parsePort`'s accepted / rejected set.
- No test for `safeCanonicalize` returning `Nothing` on IO errors (e.g. a
  NUL byte in the candidate path) — the whole fix for reconciliation
  item 5 rests on that behavior.

## Changed behavior that remains unverified

- The new one-pass `getRepoSummary` (reconciliation item 7) changes the
  order in which `withRepositoryLocation`, `readPatches`, tag filtering,
  and description reading run. No regression test asserts that the
  returned `RepoInfo.riLastChange` and `riPatchCount` match what
  `listRepos` + `getRepoPatches` + `getRepoTags` produced before.
- The native `esc` rewrite is pinned by `prop_esc_matches_verified`, but
  the callers that previously used `HtmlPure.esc` indirectly (e.g. via
  `highlightDiff`) have no separate property asserting their output is
  unchanged byte-for-byte.
- `isCloneSubPath`'s use of `splitPath` (which does not filter empty
  segments) means a subpath containing `//` or a leading `/` may be
  rejected even if the same data would pass `isSafeSubPath`. Whether that
  asymmetry is intentional is not captured by any property.