darcsweb - reviews/codex-security.md

[root] / reviews / codex-security.md
# Security Review — darcsweb

## Summary
The pure extracted helpers are narrow and mostly used correctly, but the highest-risk problems sit outside the proved core: expensive `darcs` traversals on unauthenticated routes, one stored XSS in the tree view, and several filesystem/decoding edges that can still throw. The worst issue is the patch-detail endpoint, which appears to render full diffs for the entire repository history before it knows whether the requested hash matches. The other recurring theme is that the Coq-backed path/escaping guarantees are not applied consistently at the HTTP boundary.

## Critical findings

### C1. Patch detail requests do O(history × diff-size) work before hash filtering
File: `src/DarcsWeb/Darcs.hs:134`, `src/DarcsWeb/Darcs.hs:139`, `src/DarcsWeb/Darcs.hs:263`

Why it is dangerous: `getRepoPatch` builds `allPatches = mapRL extractPatchFull patchRL` and only then filters by `psHash`. `extractPatchFull` renders the full diff with `showPatch ForDisplay` for each patch. On a large repository, a single request for a missing hash or an old patch can force diff generation across the entire patch history, which is an unauthenticated CPU and memory exhaustion primitive.

Concrete exploit scenario: an attacker repeatedly requests `/repo/large-repo/patch/does-not-exist` or a known oldest patch hash. Each request walks the whole patch set and renders every diff before returning `404` or the target patch, tying up the process and making the site unavailable.

Proposed fix: compare hashes using `PatchInfo` before rendering any diff, and only call `extractPatchFull` for the first matching patch. If the darcs library can query by patch identity directly, use that. Also add an explicit cap on rendered diff size and fail closed when the patch body exceeds it.

## High findings

### H1. Stored XSS in tree breadcrumbs via unescaped `subPath`
File: `src/DarcsWeb/Html.hs:287`, `src/DarcsWeb/Html.hs:288`, `src/DarcsWeb/Html.hs:295`, `src/DarcsWeb/Html.hs:43`

Why it is dangerous: `renderTree` appends raw `subPath` into the breadcrumb HTML (`" / " <> subPath`) and `renderPage` injects `breadcrumbs` directly into `<nav>` without escaping. A repository directory name containing HTML markup therefore becomes active HTML on the tree page.

Concrete exploit scenario: a repository contains a directory named `<img src=x onerror=alert(1)>`. Visiting `/repo/<repo>/tree/%3Cimg%20src=x%20onerror=alert(1)%3E` renders that payload inside the breadcrumb bar and executes script in the site's origin.

Proposed fix: stop passing pre-rendered breadcrumb HTML around. Represent breadcrumbs as structured `(label, href)` data, escape every label with `esc`, and percent-encode every path segment when building `href` values.

### H2. Index and summary routes recompute repository metadata across the entire repo set on every request
File: `app/Main.hs:228`, `app/Main.hs:238`, `src/DarcsWeb/Darcs.hs:63`, `src/DarcsWeb/Darcs.hs:80`, `src/DarcsWeb/Darcs.hs:107`

Why it is dangerous: both `/` and `/repo/:name/summary` call `listRepos`, and `listRepos` calls `getBasicRepoInfo` for every repository. `getBasicRepoInfo` opens each repo with `readPatches` and counts the whole patch list just to display last-change metadata. This makes a single summary page request proportional to all hosted repositories, not just the one being viewed.

Concrete exploit scenario: on an instance hosting many large repos, an attacker can repeatedly hit `/repo/<any>/summary` and force full-history scans across the entire repository set. That is a low-effort unauthenticated availability attack.

Proposed fix: cache repo metadata outside the request path, or at minimum load summary-page metadata only for the requested repo. If last-change and patch-count must be shown on `/`, compute them asynchronously and serve stale-but-safe cached values.

## Medium / Low findings

### M1. Tree browsing bypasses the verified safe-subpath policy and exposes hidden directory structure
File: `app/Main.hs:298`, `src/DarcsWeb/Darcs.hs:196`, `src/DarcsWeb/Darcs.hs:208`, `src/DarcsWeb/Darcs.hs:231`

Why it is dangerous: the blob and clone paths call the Coq-extracted `isSafeSubPath`, but the tree path does not. `getRepoTree` accepts any in-jail subdirectory and then lists its contents, only filtering hidden names in the final directory entries. Direct requests to `.git`, `.ssh`, or other dot-directories under the working tree are therefore possible.

Concrete exploit scenario: `/repo/<repo>/tree/.git` or `/repo/<repo>/tree/.ssh` will list non-hidden files inside those directories when they exist. Even without direct file reads, that leaks internal layout and filenames that the UI otherwise tries to hide.

Proposed fix: use the same validated path type for tree/blob/clone access. Root can remain a separate case, but any non-empty tree subpath should pass `isSafeSubPath` before touching the filesystem.

### M2. Repository-controlled files can trigger uncaught filesystem and decoding exceptions
File: `src/DarcsWeb/Darcs.hs:98`, `src/DarcsWeb/Darcs.hs:103`, `src/DarcsWeb/Darcs.hs:208`, `src/DarcsWeb/Darcs.hs:245`, `src/DarcsWeb/Darcs.hs:249`

Why it is dangerous: `readRepoDescription`, `listDirectory`, `getFileSize`, and `Prelude.readFile` are used on repo-controlled paths without a `try` boundary. `Prelude.readFile` also depends on locale decoding. A malformed or unreadable file can therefore escape `IO` as a 500 instead of degrading safely.

Concrete exploit scenario: a hosted repo contains a non-UTF-8 `_darcs/prefs/repo_description` or a blob file that decodes badly under the server locale. Requests for `/` or `/repo/<repo>/blob/...` then throw and return 500; if the bad description is hit through `listRepos`, it can take down the index and summary pages.

Proposed fix: wrap all repo filesystem reads in `try`, use strict `ByteString` reads with explicit UTF-8 handling, and treat unreadable or undecodable files as missing or binary rather than letting exceptions escape.

### M3. Medium — needs confirmation: the clone endpoint exposes every non-hidden file under `_darcs/`, not just clone protocol objects
File: `app/Main.hs:419`, `app/Main.hs:423`, `gen/PathPure.hs:70`, `gen/PathPure.hs:92`

Why it is dangerous: `serveClone` relies on `isSafeSubPath`, but that predicate only forbids dot-prefixed segments, `_darcs`, and `.git`. It does not whitelist the specific objects needed for read-only HTTP clone, so files such as `_darcs/prefs/defaultrepo` remain fetchable if present.

Concrete exploit scenario: if repository-local darcs preferences contain upstream URLs, usernames, or other deployment-specific metadata, an unauthenticated client can retrieve them directly through `/clone/<repo>/_darcs/...` even though they are not needed to serve inventories and patches.

Proposed fix: replace the generic safe-subpath check with a protocol-level allowlist for cloneable objects such as `hashed_inventory`, `inventories/*`, `patches/*`, and `pristine.hashed/*`. Keep the current path-safety check as a second line of defense, not the only one.

## Verification gaps

The extracted modules in `gen/` look like direct mechanical extractions, so the main divergence is not inside `HtmlPure`, `PathPure`, or `CspPure`; it is in the handwritten wrappers and routes that decide when those helpers are used.

`HtmlPure.esc` is proved, but `src/DarcsWeb/Html.hs:287` bypasses it and reintroduces raw HTML injection in the tree breadcrumb path. There is no property test today that says "all route-derived text rendered into HTML stays escaped in every context".

`PathPure.is_safe_sub_path` is proved and tested in `test/Properties/Clone.hs:119`, but the tree route ignores it at `app/Main.hs:298` and `src/DarcsWeb/Darcs.hs:196`. That means the proof boundary is drawn around blob/clone only, while the directory-listing edge remains unverified handwritten `FilePath` logic.

The pure modules do not cover the expensive `darcs` IO boundary at all. There are no properties around `getRepoPatch`, `getBasicRepoInfo`, or `getRepoTags` that constrain work done per request, stop early on a match, or assert graceful failure on unreadable repositories.

Additional QuickCheck-style properties I would add:

- For arbitrary repo/path names, rendered HTML never contains unescaped raw `<`, `>`, `"`, or `'` from user-controlled labels in breadcrumb, table, and link contexts.
- Tree, blob, and clone routes enforce the same hidden-path policy for every non-root subpath.
- Filesystem failures on repo-controlled paths degrade to `Nothing` or empty metadata instead of throwing.
- Clone serving only accepts the minimal darcs HTTP object set, not arbitrary `_darcs` files.

I could not execute the test suite in this environment: with GHC 8.8.4 and cabal 3.0.0.0, `cabal test darcsweb-test` failed dependency resolution after selecting `scotty-0.30`, which requires `base >= 4.14`. That makes the current bounds in `darcsweb.cabal:42` too loose to guarantee reproducible verification here.

## Hardening recommendations

Introduce validated types for untrusted path fragments. `RepoName`, `TreeSubPath`, `BlobSubPath`, and `CloneObjectPath` should be smart constructors, not raw `Text`/`FilePath` values threaded through handlers. Right now the wrong state is still representable in `app/Main.hs:385` and `src/DarcsWeb/Darcs.hs:196`.

Replace partial and locale-sensitive text handling. `app/Main.hs:96` should use `readMaybe` for `port`, and repo/config file reads should move from lazy `String` IO (`src/DarcsWeb/Config.hs:20`, `src/DarcsWeb/Darcs.hs:103`, `src/DarcsWeb/Darcs.hs:249`) to strict `ByteString` plus explicit decoding.

Put caches or precomputed indexes in front of `readPatches` on hot routes. The current request path repeatedly opens repositories and walks entire histories for metadata and tag views (`src/DarcsWeb/Darcs.hs:107`, `src/DarcsWeb/Darcs.hs:155`).

Tighten dependency bounds or pin a tested solver plan. `darcsweb.cabal:44` is broad enough that the build becomes environment-sensitive; a `cabal.project.freeze`, an explicit upper bound known to work with GHC 8.8, or CI-tested matrix bounds would make verification repeatable.

Reconsider `-rtsopts` for production binaries in `darcsweb.cabal:41` and `darcsweb.cabal:64`. If runtime RTS tuning is not an explicit deployment requirement, disabling or restricting it is the safer default.