# Migration Plan: Refactoring `Stats.hs` HTML Generation
This document outlines a comprehensive migration plan for refactoring `build/Stats.hs` from manual string concatenation to a type-safe HTML combinator library, specifically `blaze-html`.
## Current Architecture and Issues
Currently, `build/Stats.hs` generates the HTML for the `/build/` and `/stats/` telemetry pages by manually concatenating raw string literals with `++`.
This approach has several drawbacks:
1. **Security (XSS):** It is trivial to introduce Cross-Site Scripting (XSS) vulnerabilities if dynamic content (like post titles) is not manually escaped before being interpolated into the HTML string. The audit report specifically flagged the `link` function for this.
2. **Correctness:** It is easy to produce malformed HTML (e.g., missing closing tags, improperly nested elements, unescaped attributes) because the compiler cannot verify the structure of the string.
3. **Maintainability:** Complex HTML structures (like the 52-week activity heatmap) become difficult to read, modify, and debug when buried within string interpolation logic.
4. **Elegance:** It goes against the functional paradigm of building type-safe abstractions.
## Proposed Solution: `blaze-html`
`blaze-html` is a fast, mature, type-safe HTML combinator library for Haskell. It allows you to construct HTML documents using native Haskell functions and operators. By ensuring text and attribute values are escaped by default, it substantially reduces XSS risk. Furthermore, it improves structural correctness and reduces malformed markup by constructing HTML through typed combinators instead of ad hoc string concatenation.
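As a rough illustration of the escape-by-default behaviour described above, the sketch below renders a hostile title through a typed combinator. `titleItem` and the `post` class are invented for this example and are not taken from `Stats.hs`.

```haskell
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A
import Text.Blaze.Html.Renderer.String (renderHtml)

-- Hypothetical helper (not from Stats.hs): one list item per post title.
titleItem :: String -> H.Html
titleItem title = H.li H.! A.class_ (H.stringValue "post") $ H.toHtml title

-- The markup characters in the title are escaped by the String renderer,
-- so the payload ends up as inert text inside the <li>.
rendered :: String
rendered = renderHtml (titleItem "<script>alert(1)</script>")
-- rendered == "<li class=\"post\">&lt;script&gt;alert(1)&lt;/script&gt;</li>"
```

The caller never invokes an escaping function; escaping happens because the title enters the document through `H.toHtml`.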
**Scope:** This migration covers `build/Stats.hs` only. The separate `Site.hs` JSON-string-concat issue from the audit report is a distinct fix and is not addressed here.
For SVG generation (the heatmap), we will **not** add `blaze-svg` as a dependency. It is not currently in `cabal.project.freeze` and adding it would risk the dependency-resolution instability the audit already flagged. Instead, SVG elements will be emitted via blaze-html's custom-element facility (`Text.Blaze.Internal.customParent` / `customAttribute`), or via a small local helper module. This achieves type-safe SVG emission without a new dependency.
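One possible shape for the local SVG helpers is sketched below, assuming they are thin wrappers over `customParent` and `customAttribute`. The names `svgEl`, `svgAttr`, and `heatCell`, and the choice of attributes, are placeholders for illustration; the real heatmap may need different elements.

```haskell
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Internal as BI
import Text.Blaze.Html.Renderer.String (renderHtml)

-- Hypothetical helper: an SVG element with children, built from a tag name.
svgEl :: String -> H.Html -> H.Html
svgEl name = BI.customParent (BI.stringTag name)

-- Hypothetical helper: an SVG attribute; the value is escaped like any
-- other blaze attribute value.
svgAttr :: String -> String -> H.Attribute
svgAttr name value = BI.customAttribute (BI.stringTag name) (H.stringValue value)

-- Hypothetical heatmap cell: an empty <rect> positioned at (x, y).
heatCell :: Int -> Int -> String -> H.Html
heatCell x y cls =
  svgEl "rect" mempty
    H.! svgAttr "x" (show x)
    H.! svgAttr "y" (show y)
    H.! svgAttr "class" cls

cellMarkup :: String
cellMarkup = renderHtml (heatCell 1 2 "c")
```

Because `customParent` always emits an explicit closing tag, the cell is written as `<rect ...></rect>` rather than as a self-closing element.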
### 1. Dependency Updates
`blaze-html 0.9.2.0` is already pinned in `cabal.project.freeze` as a transitive dependency of Hakyll/Pandoc. The only required change is to declare it explicitly in `levineuwirth.cabal`.
* **Modify `levineuwirth.cabal`:** Add `blaze-html >= 0.9 && < 0.10` to the `build-depends` section of the `site` executable.
* **No freeze update required.** The package is already resolved; no `cabal freeze` run is needed.
### 2. Module Imports
In `build/Stats.hs`, import the core `blaze-html` modules:
```haskell
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A
import Text.Blaze.Html.Renderer.String (renderHtml)
```
For SVG custom elements (heatmap), use blaze-html's internal custom-element facility:
```haskell
import qualified Text.Blaze.Internal as BI
```
The page body handed to Hakyll's `makeItem` is a `String`, so `renderHtml :: Html -> String` is the correct renderer. Use it and stop there: the stats page is a few dozen KB at most, so performance is not a concern.
### 3. Refactoring Strategy
The refactoring process should be approached incrementally, function by function. **Crucially, intermediate functions must return `H.Html`, with rendering to `String` occurring only at the absolute outer boundary.**
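The boundary rule can be pictured with the following minimal sketch. All names here (`summary`, `statsPage`, `renderStatsPage`) are hypothetical stand-ins, and the Hakyll wiring is left out so the snippet depends only on `blaze-html`.

```haskell
import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html.Renderer.String (renderHtml)

-- Inner builders stay in H.Html so they compose without re-escaping.
summary :: Int -> H.Html
summary n = H.p (H.toHtml ("Total posts: " ++ show n))

-- Composition happens on H.Html values, never on rendered strings.
statsPage :: Int -> H.Html
statsPage n = do
  H.h1 (H.toHtml "Stats")
  summary n

-- The single place where the typed document becomes a String.
renderStatsPage :: Int -> String
renderStatsPage = renderHtml . statsPage
```

During the incremental migration, a function that has not been converted yet still returns a raw `String`; how to bridge such a value into `H.Html` without double-escaping is a decision this sketch does not cover.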
#### Phase 1: URL Sanitization and Core Helpers
While `blaze-html` escapes text and attributes, it **does not validate URLs**. An attacker could still inject `javascript:alert(1)` into an `href` attribute. We must introduce URL validation alongside our typed HTML helpers.
* **URL Validation:**
`isSafeUrl` is defense-in-depth: in current code every URL is produced by Hakyll's `getRoute` or constructed as a `/tag/` string, so there is no live XSS surface. Nevertheless, include it to prevent regressions.
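A naive case-sensitive prefix check on the raw string (for example, one that only rejects URLs starting with `javascript:`) misses `JavaScript:` (case), `\tjavascript:` (leading whitespace), and `data:text/html` payloads. Use a case-insensitive allowlist, applied after stripping leading whitespace, instead: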
```haskell
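import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- | True when the URL, lowercased and with leading whitespace dropped,
-- starts with one of the allowed prefixes.
isSafeUrl :: String -> Bool
isSafeUrl u =
  let norm = map toLower (dropWhile isSpace u)
  in  any (`isPrefixOf` norm) ["/", "https://", "mailto:", "#"]

-- | Attribute value for an href; unsafe URLs fall back to "#".
safeHref :: String -> H.AttributeValue
safeHref u
  | isSafeUrl u = H.stringValue u
  | otherwise   = H.stringValue "#"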
```
Note: `http://` is intentionally excluded (mixed-content over HTTPS).
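The table of sample inputs below is one way to pin the allowlist's behaviour down as a regression check. The `isSafeUrl` definition is repeated only so the snippet compiles on its own; `urlCases` and `allCasesHold` are hypothetical names, and the example URLs are made up.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- Same check as in the plan, repeated so this snippet is self-contained.
isSafeUrl :: String -> Bool
isSafeUrl u =
  let norm = map toLower (dropWhile isSpace u)
  in  any (`isPrefixOf` norm) ["/", "https://", "mailto:", "#"]

-- Sample inputs paired with the result the allowlist gives for each.
urlCases :: [(String, Bool)]
urlCases =
  [ ("/stats/", True)
  , ("HTTPS://example.com/", True)
  , ("#top", True)
  , ("JavaScript:alert(1)", False)
  , ("\tjavascript:alert(1)", False)
  , ("data:text/html,<script>alert(1)</script>", False)
  , ("http://example.com/", False)
  , ("//example.com/", True) -- protocol-relative URL matches the "/" prefix
  ]

-- True when every sample input produces its listed result.
allCasesHold :: Bool
allCasesHold = all (\(u, expected) -> isSafeUrl u == expected) urlCases
```

The last case shows that a protocol-relative URL passes the `/` prefix. It cannot execute script, but it can point off-site; whether that needs an extra guard is left as a design decision.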
* **`link`:**
* *New:*
```haskell
link :: String -> String -> H.Html
link url title = H.a H.! A.href (safeHref url) $ H.toHtml title
```
* **`section`:**
* *New:*
```haskell
section :: String -> String -> H.Html -> H.Html
section id_ title body = do
  -- The heading carries the anchor id; the body follows as a sibling.
  H.h2 H.! A.id (H.stringValue id_) $ H.toHtml title
  body
```
* **`table` and `dl`:**
These will utilize monadic `do` notation or `mapM_` over lists to generate rows and cells, returning `H.Html` natively.
* **Static TOC builders (`statsTOC`, `pageTOC`):** These also emit string-concat HTML and must be migrated here alongside the other primitives, not left for later.
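A sketch of how `table` and `dl` might look once they return `H.Html` follows. The argument shapes (a header list plus rows of cells, and a list of term/definition pairs) are assumptions made for this example; the existing signatures in `Stats.hs` may differ.

```haskell
import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html.Renderer.String (renderHtml)

-- Assumed shape: one header row, then one <tr> per row of cell strings.
table :: [String] -> [[String]] -> H.Html
table headers rows = H.table $ do
  H.thead $ H.tr $ mapM_ (H.th . H.toHtml) headers
  H.tbody $ mapM_ (\cells -> H.tr $ mapM_ (H.td . H.toHtml) cells) rows

-- Assumed shape: a list of (term, definition) pairs.
dl :: [(String, String)] -> H.Html
dl pairs = H.dl $ mapM_ item pairs
  where
    -- Each pair becomes a <dt> followed by its <dd>.
    item (term, def) = do
      H.dt (H.toHtml term)
      H.dd (H.toHtml def)

tableMarkup :: String
tableMarkup = renderHtml (table ["H"] [["x"]])

dlMarkup :: String
dlMarkup = renderHtml (dl [("a", "b")])
```

Every cell passes through `H.toHtml`, so cell text containing `<` or `&` is escaped without any per-call effort.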
#### Phase 2: Structural Components
Tackle the larger layout functions once the basic primitives are type-safe.
* **`renderContent`, `renderPages`, `renderDistribution`, `renderTagsSection`, `renderLinks`, `renderEpistemic`, `renderOutput`, `renderRepository`, `renderBuild`, `renderCorpus`, `renderNotable`, `renderMonthlyVolume`, `renderStatsTags`:**
All of these return `String` today and must be updated to return `H.Html`. They will compose the newly typed helper functions (`section`, `table`, `dl`).
*Example logic for a table row:*
```haskell
H.tr $ mapM_ (H.td . H.toHtml) cells
```
#### Phase 2.5: Lift the Heatmap's Inline `