# Security Review — darcsweb
## Summary
Overall posture is reasonably careful: the jail checks canonicalize paths,
CSP / security headers are applied by middleware, and the high-risk string
functions (`esc`, `is_safe_sub_path`, `build_csp`, `add_trailing_slash`) are
extracted from Coq so the hot path is formally verified. However, the tree
endpoint bypasses the verified `isSafeSubPath` predicate and therefore
exposes the whole `_darcs/` directory through the normal browsing UI, which
is the single biggest concrete exposure. Several partial functions (`read`,
`init`, `last`, `head`), un-caught `canonicalizePath` exceptions, and lazy
`readFile` in the blob endpoint give an outside attacker multiple ways to
turn crafted URLs / headers into 500s or lingering file handles. The
verification boundary is drawn correctly for the three `gen/` modules but
does not cover `encodePathSegment`, `sanitizeHost`, or the whole IO layer;
those rely on QuickCheck only and the property sets have gaps.
## Critical findings
### C1. `_darcs/` internals reachable through `/repo/:name/tree/_darcs` (path-traversal-equivalent via missing safety predicate)
- **Location:** `src/DarcsWeb/Darcs.hs:196-213` (`getRepoTree`), `app/Main.hs:298-309` (tree subdirectory route).
- **Why it is dangerous:** `getRepoBlob` (`src/DarcsWeb/Darcs.hs:232`) correctly rejects unsafe sub-paths with `isSafeSubPath`, but `getRepoTree` does **not** call `isSafeSubPath` at all. The regex route `"^/repo/([^/]+)/tree/(.+)$"` accepts any `subPath`, including `_darcs`, `.git`, `..`, `.ssh`, etc. `jailCheck` only verifies the canonical path stays inside `repoPath`, which is true for `repoPath </> "_darcs"`. The visibility filter at `src/DarcsWeb/Darcs.hs:209` only strips `_darcs` / `.git` / dot-files from the *listing of children*, not from the directory that is being listed, so the attacker can list `_darcs`'s own children (e.g. `patches/`, `inventories/`, `hashed_inventory`) and navigate deeper.
- **Exploit scenario:** `GET /repo/foo/tree/_darcs` returns a directory listing of `patches`, `inventories`, `pristine.hashed`, `hashed_inventory`, `prefs`, etc. The rendered `<a>` links plus user-supplied URLs like `/repo/foo/tree/_darcs/prefs` and `/repo/foo/blob/_darcs/prefs/author` then leak internal state. Even though `getRepoBlob` is protected, the tree endpoint alone discloses internal file names, sizes, and existence (a reconnaissance oracle). `GET /repo/foo/tree/.git` similarly exposes any accidentally-committed `.git/` directory.
- **Fix:** In `getRepoTree`, reject before doing anything else:
```haskell
getRepoTree repoPath subPath
| not (null subPath) && not (isSafeSubPath subPath) = return Nothing
| otherwise = ...
```
(Empty `subPath` is the root listing and must stay allowed; `isSafeSubPath ""` returns False by design.) Add a QuickCheck property in `test/Properties/Clone.hs` that `getRepoTree` rejects `"_darcs"`, `".git"`, `".."`, `".hidden"`, and any `a/../b`. Also harden the existing root-listing filter to skip entries whose name fails `is_safe_segment`, so that a future bug reintroducing the hole still doesn't let hidden files become clickable.
## High findings
### H1. `read` on the config port is a partial function and crashes the process on any non-numeric value
- **Location:** `app/Main.hs:96` — `port = read (cfgLookupDefault "port" "3000" cfgMap)`.
- **Why it is dangerous:** `Text.Read.read` throws a bottom (`Prelude.error "no parse"`) on any unparseable input. An operator mistyping `port = 3000 # production` (trailing comment) or `port = 3000\r` (stray CR from Windows line endings is stripped by `lines`, so mostly OK, but `port = 8080 ` with trailing garbage, or hex, or empty value `port =`) will take the server down at boot with an error thrown from pure code. The error is not logged through syslog because it happens before `withSyslog`, so in daemon mode it is silently invisible.
- **Exploit scenario:** Not directly remotely exploitable, but a supply-chain attacker with write access to the config file gets an unrecoverable crash-loop (reliability attack). More importantly, the same pattern will likely be copied for future options, and once an attacker-controlled config value enters via any channel (e.g. an admin UI, env var) it becomes an unauthenticated crash.
- **Fix:** Replace `read` with `Text.Read.readMaybe` and fail cleanly:
```haskell
import Text.Read (readMaybe)
port <- case readMaybe (cfgLookupDefault "port" "3000" cfgMap) of
Just p | p > 0 && p < 65536 -> return (p :: Int)
_ -> hPutStrLn stderr "Invalid 'port' in config" >> exitFailure
```
Apply the same pattern to every `read` call added later. Consider enabling `-Wincomplete-uni-patterns` and banning `read` project-wide with a `hlint` rule.
### H2. Blob endpoint uses lazy `Prelude.readFile` + locale encoding on untrusted content up to 10 MiB
- **Location:** `src/DarcsWeb/Darcs.hs:249` — `txt <- T.pack <$> Prelude.readFile fullPath`.
- **Why it is dangerous:** Three problems stacked:
1. `Prelude.readFile` is lazy: it returns a `String` backed by a still-open file handle. If the HTML renderer short-circuits (exception in `esc`, connection drop mid-response), the handle leaks until the next GC. Repeated requests can exhaust file descriptors.
2. `readFile` decodes using the process locale, not UTF-8. A non-ASCII file is silently mangled depending on `$LC_ALL`, which both corrupts the rendered page and can change byte length unpredictably — relevant because `getFileSize` is checked against `maxBlobSize` in bytes, but `T.pack` counts Unicode code points, so the limit is not a true cap on the resulting `Text` memory after decoding.
3. Binary files (images, tarballs) get HTML-escaped character-by-character through the Coq-extracted `HtmlPure.esc` — which operates on `[Char]` — producing a huge transient `String` and a large HTML page (a 10 MiB binary blob can easily render to 40–60 MiB of HTML). An attacker that can commit a 10 MiB binary file and then triggers N parallel `/blob/...` requests can exhaust memory.
- **Exploit scenario:** Commit a 10 MiB file full of `<` characters (each expands to `<`, 4× growth). Fire `curl` in parallel against `/repo/x/blob/big`. Each in-flight request allocates ~40 MiB of `String` plus a `Text` copy. The process OOMs long before the per-request cap is reached. Separately, a file containing a sequence of bytes invalid in the current locale triggers `hGetContents`'s decoder to throw an `IOError` from within the lazy stream, which escapes `getRepoBlob` as an unhandled exception because the `try` wrapper is only around the darcs calls, not around `readFile`.
- **Fix:**
1. Switch to strict byte-level IO: `BS.readFile` + explicit UTF-8 validation via `TE.decodeUtf8'`. Reject non-UTF-8 blobs with a "binary file, cannot display" placeholder.
2. Detect binary content (e.g. any NUL byte in the first 8 KiB) and refuse to render.
3. Wrap the read in `try :: IO (Either SomeException Text)` and return `Nothing` on failure.
4. Enforce the size cap *after* decoding, or in terms of `Text.length`, to prevent locale-based amplification.
### H3. `canonicalizePath` exceptions escape `IO` and surface as 500 responses / HTML-in-plain-text leaks
- **Location:** `app/Main.hs:365` (serveStatic), `app/Main.hs:400` (withRepo), `app/Main.hs:425` (serveClone), `src/DarcsWeb/Darcs.hs:178` (jailCheck).
- **Why it is dangerous:** `System.Directory.canonicalizePath` is documented to throw `IOError` on permission denied, invalid path bytes (NUL), loop in symlinks, and other non-`ENOENT` failures. None of the call sites are wrapped in `try`, so the exception propagates to Scotty's default error handler, which returns a 500 containing the exception's `show` (leaks server paths to the client and bypasses the `render404` path which would have applied the CSP via middleware — actually middleware still applies, but the body is an exception message).
- **Exploit scenario:**
1. NUL injection: if `T.unpack name` contains a NUL (not rejected by `withRepo` — see also M3), `candidate` contains NUL, and the `isDarcsRepo` / `canonicalizePath` chain throws `InvalidArgument` from the FFI layer. Attacker controls the URL path, gets a stack-trace-like 500.
2. Symlink loop: an operator-accidentally-created symlink loop inside `cfgRepoDir` makes every request for that repo throw `too many levels of symbolic links`.
3. Permission denied: a directory inside the repo tree with mode 0000 returns a permission error during the tree walk, leaking disclosure of its existence via the error message.
- **Fix:** Add a small helper:
```haskell
safeCanonicalize :: FilePath -> IO (Maybe FilePath)
safeCanonicalize p = either (\(_ :: SomeException) -> Nothing) Just <$> try (canonicalizePath p)
```
and route all canonicalization calls through it, treating `Nothing` as "404 Not Found". Also wrap `doesFileExist` / `doesDirectoryExist` / `listDirectory` the same way — they can also throw on permission errors.
### H4. `renderBlobBreadcrumb` uses `init` / `last` on a `splitOn` result — crashes on empty `subPath`
- **Location:** `src/DarcsWeb/Html.hs:369-371` — `let parts = T.splitOn "/" subPath; dirParts = init parts; fileName = last parts`.
- **Why it is dangerous:** Both `init` and `last` are partial on `[]`. The blob endpoint is guarded upstream by a `regex "^/repo/([^/]+)/blob/(.+)$"` which requires at least one character, so the current caller always supplies a non-empty string and `T.splitOn "/" nonEmpty` returns a non-empty list. But this is a tight coupling between a route pattern and a render function two modules away; adding an `/repo/:name/blob` alias, or calling `renderBlob` from a future feature with an empty path, crashes the rendering thread. The CSP middleware already runs, so the client would see a 500 with no body; the crash is logged only via syslog if you are in daemon mode.
- **Exploit scenario:** A straightforward refactor adding `get "/repo/:name/blob" ...` would silently introduce a crash reachable by `GET /repo/foo/blob`. Also, if `T.splitOn` semantics change in a future `text` release (the cabal bound allows up to `text < 3`), the invariants no longer hold.
- **Fix:** Make the function total:
```haskell
renderBlobBreadcrumb repoName subPath =
case reverse (T.splitOn "/" subPath) of
[] -> renderRootOnly repoName
[fileName] -> renderRootOnly repoName <> " / " <> esc fileName
(fileName:revDirs) -> ...
```
Or require `subPath` to be non-empty by construction via a newtype. Enable `-Wincomplete-uni-patterns` / `-Wmissing-methods` to catch future regressions.
## Medium / Low findings
### M1. `withRepo` does not reject NUL bytes, control characters, or Windows-reserved names in repo names
- **Location:** `app/Main.hs:385-406`. The guard is `T.any (== '/') name || T.any (== '\\') name || ".." `T.isInfixOf` name || T.isPrefixOf "." name`.
- **Risk:** A repo name of `foo\0bar` passes the guard, is concatenated with `</>`, and the C-based `doesDirectoryExist` truncates at NUL on POSIX (then throws, see H3). A name containing `\n` survives into log messages and HTML output (the `esc` function only escapes `<>&"'` — a newline in a log line allows log-injection if an operator greps logs).
- **Fix:** Require `T.all isSafeRepoChar name` where `isSafeRepoChar c = c > '\x20' && c /= '/' && c /= '\\' && c /= '\x7f'`. Also lower-bound length ≥ 1 and upper-bound (e.g. 255). Consider a single verified predicate `is_safe_repo_name` and add a Coq lemma that every character is printable ASCII.
### M2. `serveStatic` treats a URL-decoded path that resolves *to* the jail directory itself as "inside" because of the trailing-slash check, but `doesFileExist` on a directory returns False — benign, but the logic is duplicated
- **Location:** `app/Main.hs:366-380`. The jail comparison `jailDir `isPrefixOf` canonical` (where `jailDir = addTrailingSlash (cfgStaticDir cfg)`) can *never* accept `canonical == cfgStaticDir cfg` because the canonical path for a file inside the dir always has the trailing slash absent. That's fine — but the same logic in `withRepo` (`app/Main.hs:401`) applies the same check to the repo directory, and there the `canonical` for the repo IS just `cfgRepoDir/<name>` with no trailing slash — also fine. The inconsistency with `jailCheck` in `src/DarcsWeb/Darcs.hs:178-181` which additionally accepts `canonical == init jailSlash` is a code-smell. Consolidate on a single verified helper (see `PathPure.add_trailing_slash` — add a `PathPure.is_inside : String -> String -> Bool` with a proof).
### M3. `Prelude.readFile` is used in `Config.parseConfigFile` (lazy + locale-sensitive)
- **Location:** `src/DarcsWeb/Config.hs:20`.
- **Risk:** Startup-only, so low risk, but if the config path is a FIFO or a slow NFS mount, startup hangs silently. Also the config parser does `break (== '=')` without supporting quoted strings, so a value containing `=` is truncated at the first one (`parseLine` in `src/DarcsWeb/Config.hs:29` — documented behavior, but not obvious to operators).
- **Fix:** Use `withFile`/`hGetContents'` (strict) or `System.IO.readFile'` (if base ≥ 4.15), which avoids the lazy-handle hazard. For the grammar, document the limitation in the config file example or switch to a real parser.
### M4. Integer overflow on `getFileSize` → `Int` on 32-bit targets
- **Location:** `src/DarcsWeb/Darcs.hs:219` — `size <- if isDir then return 0 else fromIntegral <$> getFileSize path`.
- **Risk:** `getFileSize` returns `Integer`, and `teSize :: Int`. On a 32-bit build (`base >= 4.12` does not require 64-bit), a 5 GiB file silently becomes a wrong value (including negative). Not exploitable on the usual 64-bit deployment target, but the cabal dependency bounds do not force 64-bit.
- **Fix:** Either widen `teSize` to `Int64` / `Integer`, or clamp: `size = min (fromIntegral (maxBound :: Int)) (fromIntegral rawSize)`. Also note `maxBlobSize :: Integer` is compared with `getFileSize` (Integer) directly — that part is correct, but the `Int`-valued `teSize` is the hole.
### M5. Lazy IO in `readRepoDescription`
- **Location:** `src/DarcsWeb/Darcs.hs:99-104` — `T.strip . T.pack <$> readFile descFile`.
- **Risk:** Same class as H2 (lazy handle, locale decoding). Called once per repo per request for the index page, so a malicious description file full of invalid-UTF-8 bytes can throw mid-render and 500 the index. Easy DoS against the root page.
- **Fix:** `BS.readFile` + `TE.decodeUtf8With T.lenientDecode`. Also cap at e.g. 16 KiB.
### M6. `listRepos` / `getRepoTree` do not bound the number of entries
- **Location:** `src/DarcsWeb/Darcs.hs:63-67`, `src/DarcsWeb/Darcs.hs:208-213`.
- **Risk:** An operator who points `cfgRepoDir` at a directory with millions of subdirectories (accidental or adversarial) makes every index-page request iterate them all. `getRepoTree` has the same shape at repo-subdirectory level.
- **Fix:** Add a cap (e.g. 10 000 entries) analogous to `maxPatchCount`, and render "N more entries, listing truncated" if exceeded.
### M7. `encodePathSegment` and `sanitizeHost` are not formally verified
- **Location:** `src/DarcsWeb/Clone.hs:63-133`.
- **Risk:** The percent-encoder and host validator are security-critical (they are the reason a crafted `Host` header cannot inject a path or protocol). They have good QuickCheck coverage (`test/Properties/Clone.hs:58-116`), but not the same assurance as `PathPure`. In particular, `encodePathSegment` implements UTF-8 re-encoding by hand (`Data.Bits` / `shiftR` / hex table) — a prime target for an off-by-one that accidentally emits a literal `/`. The QuickCheck prop `prop_clone_encoded_has_no_reserved` only tests a handful of reserved characters.
- **Fix:** Either (a) replace with `Network.HTTP.Types.URI.encodePathSegments` from `http-types` (already a dependency) — this is the standard, battle-tested encoder — or (b) port the logic into Coq (`ClonePure`) with a proof that for every non-unreserved byte the output has the shape `%HH` and that no output character is in the reserved set.
### M8. CSP builder strips only `;` — the directive *name* is not sanitized
- **Location:** `gen/CspPure.hs:37-39` (`build_directive`), all call sites in `app/Main.hs:195-202`.
- **Risk:** Today all directive names are hard-coded string literals in `Main.hs`, so there is no live risk. But `build_directive` applies `sanitize_value` only to the value half, so a future contributor who adds `("script-src", userControlled)` — or worse, `(userControlled, value)` — gets a CSP-injection foot-gun. The API shape makes the insecure path the default.
- **Fix:** Change the type signature so directive names are an enum / newtype with a finite set of allowed values, or sanitize names the same way as values (strip `;`, whitespace, commas, newlines). Add a Coq proof that for any (name, value) pair the output contains at most `(length directives - 1)` semicolons.
### M9. Lazy `T.lines` in `highlightDiff` drops trailing blank lines (correctness, not security) and rebuilds the diff with O(n²) `T.concat` of small chunks
- **Location:** `src/DarcsWeb/Html.hs:418-425`.
- **Risk:** Diff output with a trailing newline loses it in the rendered page. Combined with H2's 10 MiB blob risk, a 10 MiB patch with 10⁶ lines becomes 10⁶ `Text` fragments; `T.concat` is linear in total length but allocates per-fragment. Medium DoS.
- **Fix:** Use a `Builder` (`Data.Text.Lazy.Builder`) or stream directly into the response. Pair with a per-patch size cap.
### M10. `parseDarcsDate` silently accepts only 14-digit strings; `formatAbsolute` produces garbage for shorter / non-digit input
- **Location:** `src/DarcsWeb/Html.hs:431-458`.
- **Risk:** Dates not in `YYYYMMDDHHMMSS` format are returned verbatim from `relativeDate`, then HTML-escaped, so there's no XSS risk. But `formatAbsolute` on a non-digit 14-character string returns something like `alph-ab-et -ic:al:ly` (garbage), which then ends up in the page. Cosmetic / robustness. Consider validating that the input is 14 digits before splitting.
### M11. The daemonization routine does not `chdir("/")`, does not `umask(0)`, and does not close inherited fds beyond 0/1/2
- **Location:** `app/Main.hs:151-159`, `app/Main.hs:161-176`.
- **Risk:** Low. A daemon started from a CWD that is later removed keeps that directory busy. Open fds inherited from the parent (stderr from the shell, the config file handle) remain open.
- **Fix:** After the second fork, do `setCurrentDirectory "/"` (from `System.Directory`), set umask to `0o077`, and close every fd `> 2` (portably via `/proc/self/fd` on Linux, or `closefrom(3)`). Also add `O_CLOEXEC` (`0x80000` on Linux) to the `c_open` flags in `openDevNull` to prevent leaking `/dev/null` to children, and `close` the original `nullFd` after dupping.
### M12. No rate limiting, no request-size limit, no graceful handling of slow clients
- **Location:** `app/Main.hs:133-139`.
- **Risk:** Expected trade-off for a read-only mirror behind Caddy, but the code relies on Caddy providing this. If the README's recommended deployment changes, or someone exposes port 3000 directly (the `bind` setting accepts `0.0.0.0`!), the server is trivially slow-loris-able and memory-exhaustible.
- **Fix:** Document the requirement explicitly, and/or set `setMaximumBodyFlush`, `setTimeout`, `setSlowlorisSize` on the Warp settings. Reject binding to `0.0.0.0` unless an explicit `--allow-public` flag is set.
### L1. `openDevNull` hardcodes `O_RDWR = 2` and never closes `nullFd`
- **Location:** `app/Main.hs:161-176`. Portable on Linux, but not on every POSIX variant. Missing `O_CLOEXEC`.
### L2. `cfgLookupDefault` uses `String` keys and silently ignores unknown config keys
- **Location:** `src/DarcsWeb/Config.hs:38-39`. Typos like `binnd = 0.0.0.0` fall through to the default with no warning.
- **Fix:** Track the set of known keys at boot; warn about any unrecognized key.
### L3. `findRepo` uses `foldr` with accumulator that overwrites earlier matches with later ones
- **Location:** `app/Main.hs:413-414`. `foldr (\r acc -> if riName r == name then Just r else acc) Nothing repos` actually takes the *first* match (because `foldr`'s `acc` is the result of the tail), so behaviour is correct, but this is non-obvious. Prefer `Data.List.find` for clarity.
### L4. `cspHeader` is recomputed never (CAF) but encoded via `BC.pack` from `String` — bytes are ISO-8859-1, not UTF-8
- **Location:** `app/Main.hs:194-202`. Today all directive names/values are ASCII, so there is no observable difference. Worth a comment or a conversion via `TE.encodeUtf8 . T.pack` to be explicit.
### L5. `logSyslog` uses `withCStringLen` which encodes via the current locale, not UTF-8
- **Location:** `app/Main.hs:147-148`. Non-ASCII log messages (e.g. a repo name with an accented character appearing in a future log line) may be mangled in syslog. Use `withCAStringLen` + explicit UTF-8 encoding.
### L6. `head stripped` in `Config.meaningful` is partial but guarded
- **Location:** `src/DarcsWeb/Config.hs:26`. Guard `not (null stripped)` is syntactically adjacent — safe today — but future refactoring can separate them. Prefer `case stripped of (c:_) -> c /= '#'; [] -> False`.
### L7. `splitPath` exported from `DarcsWeb.Darcs` is unused outside tests
- **Location:** `src/DarcsWeb/Darcs.hs:189-191`. The cabal file already scopes it with a comment, but exposing unnecessary internals widens the attack surface for future misuse. Move behind a `DarcsWeb.Internal` module or use Haddock `#hide`.
## Verification gaps
1. **`getRepoTree` diverges from the verified path model.** `PathPure.is_safe_sub_path` guarantees rejection of `_darcs`, `.git`, `..`, hidden segments — but the Haskell caller only applies the predicate in `getRepoBlob`. Every IO-edge that maps a URL sub-path to a filesystem path MUST route through `isSafeSubPath`; add a lint rule / code review checklist. A natural formalization is a small wrapper type (e.g. `newtype SafeSubPath = SafeSubPath FilePath`) whose only constructor is `mkSafeSubPath :: FilePath -> Maybe SafeSubPath` using the verified predicate.
2. **`CspPure.build_csp` is proven not to emit `;` inside a value, but not inside a *directive name*.** The test suite’s `prop_csp_no_value_semicolons` only checks values. Add a proof / property that `count ';' (build_csp ds) <= length ds - 1`.
3. **`sanitizeHost`, `validAuthority`, `encodePathSegment` are not in Coq.** They are the second-most-critical string transformations in the project (after `esc`). At minimum, add QuickCheck properties stating:
- `validAuthority h ==> all (`elem` allowedAuthorityChars) h` (currently implicit via structural check);
- `encodePathSegment t` contains no character in `"/?#& \t\r\n"`;
- `encodePathSegment t` is idempotent modulo case of the hex digits when fed ASCII-only input of unreserved chars.
Long term, port them to Coq alongside `PathPure`.
4. **No property exercises the end-to-end IO layer.** All QuickCheck tests target the pure modules. There is no test that (a) sets up a temp directory, (b) throws symlinks / hidden files at `serveStatic` / `withRepo` / `serveClone` / `getRepoTree` / `getRepoBlob`, and asserts the 404 / 403 / content outcome. These functions are exactly where the verified pieces get wired into IO; the coverage gap is precisely the one that lets C1 and H3 exist.
- Recommended properties:
- `forall p in generator(unsafe), getRepoTree repoRoot p == Nothing`
- `forall symlink escaping repoRoot, getRepoBlob repoRoot symlink == Nothing`
- `forall config without "port", main exits cleanly with a diagnostic` (property over `parseOpts` + port parsing)
- `buildCloneUrl` + `withRepo` round-trip: the clone URL for a validated repo name always percent-decodes back to that same name.
5. **`parseConfigFile` is not modeled at all.** It accepts arbitrary UTF-8 and could benefit from a tiny parser spec + QuickCheck.
## Hardening recommendations
### Compiler flags (darcsweb.cabal)
Replace `ghc-options: -Wall` in each stanza with:
```
ghc-options:
-Wall
-Wcompat
-Widentities
-Wincomplete-record-updates
-Wincomplete-uni-patterns
-Wpartial-fields
-Wredundant-constraints
-Wmissing-export-lists
-Wmissing-home-modules
-Wno-unused-do-bind
-fwarn-tabs
```
For the executable, additionally: `-rtsopts=-N -A64m -RTS` is fine, but consider capping heap: `-with-rtsopts="-N -M512M"` to convert OOM into a clean process death instead of host-memory exhaustion.
### Dependency bounds
- `base >= 4.12` admits GHC 8.6 (2018) which has known codegen bugs. Raise to `base >= 4.16` (GHC 9.2+, LTS).
- `text >= 1.2 && < 3` spans a breaking change around `text-2.0` (UTF-8 by default). Pin to `text >= 2.0 && < 3` and update code that assumes per-byte semantics.
- `directory >= 1.3 && < 2` — OK, but document that `canonicalizePath` semantics changed in `directory-1.3.2`; require at least that version.
### Safer defaults
- Ban `read` and `readFile` in the codebase (hlint rule). Use `readMaybe`, `BS.readFile`, `TE.decodeUtf8'`.
- Wrap every `canonicalizePath` / `doesFileExist` / `doesDirectoryExist` / `listDirectory` call in `try`; treat any exception as "not found".
- Introduce newtype wrappers for security-relevant strings:
```haskell
newtype SafeSubPath = SafeSubPath FilePath
newtype SafeRepoName = SafeRepoName Text
newtype JailedPath = JailedPath FilePath
```
Only smart constructors (`mkSafeSubPath`, etc.) in their defining modules; every IO function that touches a filesystem path takes a newtype argument. This makes C1 a compile-time error.
- Replace the ad-hoc bind-address default `"127.0.0.1"` with a type: `data BindAddr = Loopback | Public IPv4` and require a `--allow-public` flag to construct the second.
- Move all extracted pure modules (`HtmlPure`, `PathPure`, `CspPure`) behind a consolidated `DarcsWeb.Verified` re-export with `Haddock` pointers to the corresponding `.v` file, and enforce via `cabal-fmt` / CI that no other module directly imports them — only through the wrappers that give them domain types.
### Middleware / HTTP
- Add `Strict-Transport-Security` (if you know TLS will be terminated upstream — safe to always send).
- Add `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Resource-Policy: same-origin`.
- Drop `X-Robots-Tag` to `noindex` only (the additional `nofollow` makes sense only on pages with outbound links you don't want followed; the rendered diff/log pages are a reasonable target for search engines on an intentionally-public mirror — re-evaluate).
- Consider adding a `Cache-Control: private, max-age=60` for non-static endpoints, both for performance and to reduce load (DoS mitigation) — today every repo-list request re-walks the repo tree.
### Tests
- Add `test/Properties/Path.hs` covering `jailCheck`, `getRepoTree`, `getRepoBlob` against symlinked and hidden-file temp-dir fixtures.
- Add a QuickCheck property that `GET /repo/:name/tree/:subpath` returns 404 whenever `isSafeSubPath subpath == False` — a simple integration test using `scotty` + `wai-extra`'s `runSession`.
- Add a property that the index page is O(1) HTML output size regardless of the number of dot-files inside `cfgRepoDir` (uses M6 cap).
- Require 100% branch coverage on `app/Main.hs`'s `withRepo`, `serveStatic`, `serveClone` (they are the security perimeter).