KDE defaults bandwidth to Silverman's rule per axis; multi-cluster football data over-smooths as a known trade-off
Zero-config KDE applies Silverman's rule (`h = σ × n^(-1/6)`, its two-dimensional normal-reference form) independently on x and y. The rule is derived assuming the underlying density is Gaussian, so it over-smooths multi-modal data, and multi-cluster football patterns (wing + centre + striker zones) come out blurred together. That's an accepted limitation. The only automatic signal is a `[kde.low-confidence]` warning, which is emitted on sparse inputs.
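A minimal TypeScript sketch of the per-axis rule. It assumes σ is the sample standard deviation (n − 1 denominator); the source does not say which estimator the compute layer uses. The helper names `sampleStd`, `silvermanAxis` and `silvermanBandwidth` are hypothetical.

```typescript
// Per-axis bandwidth rule h = σ × n^(-1/6).
// Assumption: σ is the sample standard deviation (n − 1 denominator).

/** Sample standard deviation; returns 0 when there are fewer than two values. */
function sampleStd(values: readonly number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(sumSq / (n - 1));
}

/** Bandwidth for one axis: σ × n^(-1/6); 0 for degenerate input (n < 2). */
function silvermanAxis(values: readonly number[]): number {
  if (values.length < 2) return 0;
  return sampleStd(values) * Math.pow(values.length, -1 / 6);
}

/** Applies the rule to x and y independently (hypothetical helper). */
function silvermanBandwidth(
  xs: readonly number[],
  ys: readonly number[],
): [number, number] {
  return [silvermanAxis(xs), silvermanAxis(ys)];
}

// Usage: ys is xs stretched by a factor of 2.
const xs = [2, 4, 4, 4, 5, 5, 7, 9];
const ys = xs.map((v) => v * 2);
const [bwX, bwY] = silvermanBandwidth(xs, ys);
console.log(bwX.toFixed(3), bwY.toFixed(3));
```

Because each axis is scaled by its own σ, doubling the spread of `ys` doubles `bwY` and leaves `bwX` alone, so the kernel follows the shape of the data instead of being forced circular.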
Context
Bandwidth selection is the load-bearing parameter for a KDE. Four plausible defaults:
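- Silverman: closed-form, deterministic, per-axis. Smooths too much on multi-modal data.
- Scott: another closed-form normal-reference rule. In 1D it smooths marginally less than Silverman; in 2D its factor, n^(-1/6), is the same as Silverman's, so for per-axis use here the two coincide.
- Cross-validated: data-driven and a better fit, but expensive (leave-one-out evaluates O(n²) kernel terms per candidate bandwidth) and unstable: small tweaks to the data can shift the selected bandwidth noticeably.
- Fixed constant: simple but arbitrary; it ignores both sample size and spread.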
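To make the Scott-versus-Silverman comparison concrete, here are the two normal-reference factors for a d-dimensional Gaussian KDE, written in the form SciPy's `gaussian_kde` uses (each factor multiplies the per-axis σ). This illustrates the standard formulas; it is not code from this project.

```typescript
// Normal-reference bandwidth factors for a d-dimensional Gaussian KDE
// (the forms SciPy's gaussian_kde uses); each multiplies the per-axis σ.

/** Scott's factor: n^(-1/(d+4)). */
function scottFactor(n: number, d: number): number {
  return Math.pow(n, -1 / (d + 4));
}

/** Silverman's factor: (n(d+2)/4)^(-1/(d+4)). */
function silvermanFactor(n: number, d: number): number {
  return Math.pow((n * (d + 2)) / 4, -1 / (d + 4));
}

const n = 200;
for (const d of [1, 2]) {
  const scott = scottFactor(n, d).toFixed(4);
  const silverman = silvermanFactor(n, d).toFixed(4);
  console.log(`d=${d}: scott=${scott}, silverman=${silverman}`);
}
```

For d = 2 Silverman's factor becomes (n · 4/4)^(-1/6) = n^(-1/6), exactly Scott's, which is why swapping rules would not change the default here.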
Football events are typically multi-modal (wingers vs centre-backs vs strikers operate in distinct zones). A Silverman-smoothed heat blob over all of them is visually clean but analytically blurry. The alternatives each have their own cost (Scott is barely different; CV is expensive and jittery; fixed is arbitrary).
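A small worked example of the over-smoothing, on one axis only (the x-marginal of a per-axis product kernel is a 1-D KDE with bandwidth bwX). The two clusters are made-up values standing in for, say, left-wing and right-wing zones; they are not real event data, and σ is again assumed to be the sample standard deviation.

```typescript
// Made-up illustrative data: two tight clusters on the x axis.

/** Sample standard deviation (n − 1 denominator); 0 for fewer than two values. */
function sampleStd(values: readonly number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(sumSq / (n - 1));
}

/** 1-D Gaussian KDE evaluated at x with bandwidth h. */
function kde1d(values: readonly number[], h: number, x: number): number {
  const norm = 1 / (values.length * h * Math.sqrt(2 * Math.PI));
  let sum = 0;
  for (const v of values) {
    const z = (x - v) / h;
    sum += Math.exp(-0.5 * z * z);
  }
  return sum * norm;
}

const left = [18, 19, 20, 21, 22];
const right = [78, 79, 80, 81, 82];
const xs = [...left, ...right];

const h = sampleStd(xs) * Math.pow(xs.length, -1 / 6); // per-axis rule
const withinCluster = sampleStd(left);

const atCluster = kde1d(xs, h, 20); // centre of the left cluster
const atGap = kde1d(xs, h, 50); // empty space between the clusters

console.log({ h, withinCluster, gapToCluster: atGap / atCluster });
```

With these ten points the bandwidth comes out at more than ten times the within-cluster spread, and the empty gap at x = 50 keeps roughly three quarters of the density seen at a cluster centre: the "analytically blurry" effect in concrete numbers.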
Decision
`bandwidth: "auto"` defaults to Silverman, applied per-axis. `bandwidth` also accepts a number (in the same units as the canonical pitch frame) or a `[bwX, bwY]` tuple for asymmetric smoothing. Sparse inputs (< 10 events) emit `[kde.low-confidence]`. Non-finite or negative bandwidths fall back with a warning.
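One way the Decision's option handling could look, as a sketch. `BandwidthOption`, `resolveBandwidth` and the `[kde.invalid-bandwidth]` warning code are hypothetical names. The Decision says invalid values "fall back" without naming a target, so the sketch assumes each invalid axis falls back to its `"auto"` (Silverman) value; treat that as an assumption, not the spec.

```typescript
// Hypothetical names throughout; invalid axes fall back to the auto value (assumption).

type BandwidthOption = "auto" | number | readonly [number, number];

interface ResolvedBandwidth {
  bw: [number, number]; // bwX, bwY actually used
  warnings: string[];
}

const MIN_EVENTS = 10; // Decision: fewer than 10 events counts as sparse

function sampleStd(values: readonly number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(sumSq / (n - 1));
}

/** σ × n^(-1/6) for one axis; degenerate input (n < 2) yields 0. */
function silvermanAxis(values: readonly number[]): number {
  if (values.length < 2) return 0;
  return sampleStd(values) * Math.pow(values.length, -1 / 6);
}

/** Decision: non-finite or negative bandwidths are rejected. */
function isValidBandwidth(v: number): boolean {
  return isFinite(v) && v >= 0;
}

function resolveBandwidth(
  option: BandwidthOption,
  xs: readonly number[],
  ys: readonly number[],
): ResolvedBandwidth {
  const warnings: string[] = [];
  if (xs.length < MIN_EVENTS) {
    warnings.push(`[kde.low-confidence] ${xs.length} events (< ${MIN_EVENTS})`);
  }

  const auto: [number, number] = [silvermanAxis(xs), silvermanAxis(ys)];
  if (option === "auto") return { bw: auto, warnings };

  // A single number applies to both axes; a tuple sets them separately.
  const requested: readonly [number, number] =
    typeof option === "number" ? [option, option] : option;

  // Assumption: each invalid axis falls back to its auto value.
  const pick = (v: number, axis: 0 | 1): number => {
    if (isValidBandwidth(v)) return v;
    const name = axis === 0 ? "x" : "y";
    warnings.push(`[kde.invalid-bandwidth] ${name}=${v}; using auto value`);
    return auto[axis];
  };

  return { bw: [pick(requested[0], 0), pick(requested[1], 1)], warnings };
}

// Usage: a valid x bandwidth with an invalid (negative) y bandwidth.
const sample = [12, 15, 18, 22, 30, 41, 44, 47, 52, 60];
console.log(resolveBandwidth([4, -2], sample, sample));
```

Resolving per axis keeps a valid half of a `[bwX, bwY]` tuple instead of discarding the whole request.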
Consequences
- Default output is deterministic and reproducible in tests.
- Consumers analysing multi-cluster patterns pass a tighter manual bandwidth; the spec documents this as the canonical override.
- Cross-validated bandwidth is deferred: it adds cost and instability without offering a clearly better editorial default.
- The compute layer exposes the resolved bandwidth on the model so consumers can inspect what was used. This matters whenever a fallback replaces the requested value, since a warning alone does not show which value was applied.
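A sketch of a compute step that evaluates a per-axis product Gaussian KDE on a grid and returns the bandwidth it actually used alongside the density. `KdeModel`, `computeKde` and the 100 × 100 extent in the usage lines are hypothetical; the real pitch frame and model shape may differ. The sketch assumes both bandwidths are positive.

```typescript
// Hypothetical model shape; positive bandwidths assumed.

interface KdeModel {
  bandwidth: [number, number]; // bwX, bwY actually used
  cols: number;
  rows: number;
  density: number[]; // row-major, rows × cols, evaluated at cell centres
}

function computeKde(
  xs: readonly number[],
  ys: readonly number[],
  bandwidth: readonly [number, number],
  extent: { width: number; height: number },
  cols: number,
  rows: number,
): KdeModel {
  const [bwX, bwY] = bandwidth;
  // Product of two 1-D Gaussian kernels, normalised to integrate to 1.
  const norm = 1 / (xs.length * 2 * Math.PI * bwX * bwY);
  const density: number[] = [];
  for (let r = 0; r < rows; r++) {
    const gy = ((r + 0.5) / rows) * extent.height;
    for (let c = 0; c < cols; c++) {
      const gx = ((c + 0.5) / cols) * extent.width;
      let sum = 0;
      for (let i = 0; i < xs.length; i++) {
        const zx = (gx - xs[i]) / bwX;
        const zy = (gy - ys[i]) / bwY;
        sum += Math.exp(-0.5 * (zx * zx + zy * zy));
      }
      density.push(sum * norm);
    }
  }
  // Copy the bandwidth so the model records exactly what was used.
  return { bandwidth: [bwX, bwY], cols, rows, density };
}

const eventsX = [20, 22, 25, 80, 82];
const eventsY = [30, 35, 32, 60, 62];
const extent = { width: 100, height: 100 }; // assumed frame size
const a = computeKde(eventsX, eventsY, [8, 6], extent, 10, 10);
const b = computeKde(eventsX, eventsY, [8, 6], extent, 10, 10);
console.log(a.bandwidth, JSON.stringify(a) === JSON.stringify(b));
```

Nothing in the computation is random, so two calls with the same inputs produce identical models, which is the reproducibility the first consequence relies on.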