darcsweb - reviews/codex-reconciliation.md

[root] / reviews / codex-reconciliation.md
# Codex/Claude Reconciliation

## Q1. Tree-breadcrumb XSS (`codex` security H1 vs Claude)

Verdict: the sink is real, but my original H1 phrasing was too strong. `renderTree` appends raw `subPath` into `breadcrumbs` at [src/DarcsWeb/Html.hs:287](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:287)-[288](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:288), and `renderPage` injects `breadcrumbs` verbatim into `<nav>` at [src/DarcsWeb/Html.hs:43](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:43). A committed directory name can reach this path: the tree route passes the URL subpath through to `getRepoTree`, and `doesDirectoryExist` / `jailCheck` only verify existence and containment, not escaping or re-encoding.

The percent-decoding point matters in favor of exploitability, not against it: Scotty regex routes operate on decoded path text, so a browser request for `%3Cimg...%3E` becomes `<img...>` before `subPath` is rendered. That said, the site-wide CSP at [app/Main.hs:195](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:195)-[214](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:214) is strict enough that I would not currently call this a reliable script-execution XSS; it is a reachable stored HTML-injection bug that should be fixed before any CSP relaxation.

The right fix is still: escape each displayed segment and percent-encode href path segments. `renderTreeBreadcrumb` at [src/DarcsWeb/Html.hs:297](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:297)-[308](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:308) already escapes each visible `p`, but `newAcc` is only HTML-escaped in the `href`, not URL-encoded. `renderBlobBreadcrumb` at [src/DarcsWeb/Html.hs:367](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:367)-[381](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:381) does **not** have the same raw-HTML bug because the top-level breadcrumb uses `esc subPath` at [src/DarcsWeb/Html.hs:356](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:356), and `buildCrumbs` escapes each label; its remaining issue is the same href-encoding problem, plus the low-risk partial `init` / `last`.

## Q2. Clone endpoint `_darcs/` over-exposure (`codex` M3 vs Claude)

Verdict: real. `isSafeSubPath` in [gen/PathPure.hs:70](/home/fritjof/repositories/playground/darcsweb/gen/PathPure.hs:70)-[97](/home/fritjof/repositories/playground/darcsweb/gen/PathPure.hs:97) only rejects empty, dot, hidden, `_darcs`, and `.git` segments. `serveClone` at [app/Main.hs:419](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:419)-[436](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:436) then serves any existing regular file under `repoPath/_darcs/` whose subpath passes that predicate. So `_darcs/prefs/defaultrepo`, `_darcs/prefs/motd`, `_darcs/format`, and optional pack files all pass if present.

I would keep this as `MED` and fix it in the one-PR pass because the diff is small and the policy becomes explicit. Your four-class allowlist matches the base hashed HTTP-clone surface and the current QuickCheck expectations in [test/Properties/Clone.hs:131](/home/fritjof/repositories/playground/darcsweb/test/Properties/Clone.hs:131)-[139](/home/fritjof/repositories/playground/darcsweb/test/Properties/Clone.hs:139): `hashed_inventory`, `inventories/*`, `patches/*`, `pristine.hashed/*`.

There is one caveat: darcs also supports HTTP packs from `darcs optimize http`. If you want to preserve that optimization path, also allow `_darcs/packs/basic.tar.gz` and `_darcs/packs/patches.tar.gz`. I did **not** find evidence that `_darcs/prefs/*` or `_darcs/format` need to be public for clone.

## Q3. `read` on port (Claude H1 vs codex hardening note)

Verdict: fix now, but not as `HIGH`. The bug is at [app/Main.hs:96](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:96), but it is startup-only and only triggered by an operator-written config file. `readMaybe` + a clear error on stderr + `exitFailure` is a good tiny robustness fix, not over-engineering.

## Q4. Blob endpoint lazy `readFile` (Claude H2 vs codex M2)

Verdict: Claude’s remediation is directionally right, but I would separate must-have from nice-to-have. The must-have part is replacing [src/DarcsWeb/Darcs.hs:249](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:249) with strict `BS.readFile` under `try`, followed by explicit UTF-8 handling (`decodeUtf8'`). That removes the lazy-handle leak and locale-sensitive decoding, which are the two clearly real bugs.

The NUL-byte sniff in the first 8 KiB plus a `"binary file, cannot display"` placeholder is a good small extra and I would include it if touching this code anyway, because it avoids misclassifying obvious binary files as 404s. If scope gets tight, the 80% reduction is: strict bytes, `try`, explicit UTF-8 decode; the NUL sniff is cheap but not the core fix. Also note that a text file full of `<` still expands during HTML escaping, so the `esc` hotspot is a separate performance fix, not solved by binary detection.

## Q5. `canonicalizePath` exception escape (Claude H3)

Verdict: yes, the request-time `canonicalizePath` call sites should be wrapped. The affected sites are [app/Main.hs:365](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:365), [400](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:400), [425](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:425), and [src/DarcsWeb/Darcs.hs:178](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:178).

On the local Scotty 0.22 install, the uncaught-exception path is a plain-text `Internal Server Error: ...` response, so Claude’s leak concern is credible; even without pinning the exact Scotty version, uncaught exceptions definitely escape as framework 500s today. `try` + `Nothing => 404` is the right behavior for request-derived candidate paths. This does **not** mask legitimate startup config failures as long as the startup canonicalization in `main` at [app/Main.hs:102](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:102)-[103](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:103) stays fail-fast.

## Q6. Single ordered fix list for one PR

1. [src/DarcsWeb/Darcs.hs:134-145](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:134) — Patch detail renders full diffs for every patch before checking the requested hash. Compare `targetHash` against `PatchInfo` first and only call `extractPatchFull` for the first match. Tag: `CRITICAL`.

2. [src/DarcsWeb/Darcs.hs:196-213](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:196), [app/Main.hs:298-309](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:298) — Tree browsing bypasses the verified safe-subpath policy and exposes `_darcs` / hidden directory structure. Reject non-empty tree subpaths unless `isSafeSubPath` succeeds, and add a small property test for representative unsafe cases. Tag: `HIGH`.

3. [src/DarcsWeb/Html.hs:287-304](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:287), [319-335](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:319), [356-381](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:356) — Tree/blob breadcrumbs and tree/blob hrefs mix raw path text into HTML and use HTML escaping where URL encoding is required. Replace ad hoc string assembly with a segment-based helper that `esc`s labels, percent-encodes href segments, and makes blob breadcrumb handling total at the same time. Tag: `MED`.

4. [src/DarcsWeb/Darcs.hs:98-104](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:98), [229-250](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:229) — Repo-controlled file reads still use lazy, locale-sensitive `String` IO on the request path. Switch to strict `ByteString` reads under `try`, decode explicitly as UTF-8, return empty description on bad repo descriptions, and show a binary placeholder for obvious non-text blobs. Tag: `MED`.

5. [app/Main.hs:365](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:365), [400](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:400), [425](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:425), [src/DarcsWeb/Darcs.hs:178](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:178) — Request-derived `canonicalizePath` failures currently escape as framework 500s. Add a tiny `safeCanonicalize` helper for request-time path resolution and treat `Left` as 404 while logging server-side if desired. Tag: `MED`.

6. [app/Main.hs:421-436](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:421) — Clone serving is broader than the darcs HTTP protocol actually needs. Add an explicit allowlist for `hashed_inventory`, `inventories/*`, `patches/*`, `pristine.hashed/*`, and optionally `_darcs/packs/{basic,patches}.tar.gz` if you want to preserve `optimize http` pack support. Tag: `MED`.

7. [app/Main.hs:236-249](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:236), [src/DarcsWeb/Darcs.hs:63-81](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:63), [121-160](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Darcs.hs:121) — The summary page rescans all repos and rereads the target repo twice. Introduce a single-repo summary helper that gathers `RepoInfo`, recent patches, and tags in one pass instead of `listRepos` + `getRepoPatches` + `getRepoTags`. Tag: `PERF`.

8. [src/DarcsWeb/Html.hs:413-425](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:413) — `esc` and `highlightDiff` still round-trip through `String` on the hottest render path. Replace `esc` with a Text-native implementation and add a QuickCheck equivalence property against `HtmlPure.esc`. Tag: `PERF`.

9. [app/Main.hs:96](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:96) — Port parsing still uses partial `read`. Replace it with `readMaybe`, a port-range check, and an explicit diagnostic before exit. Tag: `LOW`.

10. [src/DarcsWeb/Html.hs:37](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:37), [39-48](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:39), [56-64](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:56) — The page chrome is missing favicon and landmark semantics, and the index page has no visible `<h1>`. Add `<link rel="icon">`, swap the wrapper `div`s to `<header>/<main>/<footer>` while keeping classes, and add an index-page heading. Tag: `LOW`.

11. [app/Main.hs:74](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:74), [413-414](/home/fritjof/repositories/playground/darcsweb/app/Main.hs:413), [src/DarcsWeb/Config.hs:39](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Config.hs:39), [src/DarcsWeb/Html.hs:109](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:109), [274](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:274) — The remaining idiom cleanups are low-risk readability wins. Replace `foldl (flip id)`, `findRepo`, `maybe x id`, and `length > 0` with the obvious forms while touching nearby code anyway. Tag: `READ`.

## Out of scope for this pass

- Rewriting [src/DarcsWeb/Html.hs:31-53](/home/fritjof/repositories/playground/darcsweb/src/DarcsWeb/Html.hs:31) and the Scotty call sites to use lazy builders / streaming responses throughout. That is worthwhile, but it touches every renderer and is larger than the rest of this pass.
- Adding cross-request caches, ETags, or 304 handling for repo metadata and HTML pages. Good next step, but stateful and bigger than a “small diff” cleanup PR.
- Splitting `Html.hs` / `Main.hs`, extracting handler modules, or introducing new dependencies. Those are structural refactors, not one-pass corrections.