accepted

KDE defaults bandwidth to Silverman per axis; football multi-clusters over-smooth as a known trade-off

Zero-config KDE uses Silverman's rule in its two-dimensional form (`h = σ × n^(-1/6)`), applied independently on x and y. Silverman is designed for unimodal distributions, so multi-cluster football patterns (wing + centre + striker zones) come out over-smoothed. This is a known limitation. Sparse inputs additionally surface a `[kde.low-confidence]` warning.

KDE, algorithm, default-behaviour

Context

Bandwidth is the load-bearing parameter of a KDE. Four plausible defaults:

  • Silverman: closed-form, deterministic, per-axis. Smooths too much on multi-modal data.
  • Scott: another closed-form rule; barely different from Silverman (the standard multivariate forms of the two rules coincide in two dimensions).
  • Cross-validated: data-driven and better fitting, but expensive and sensitive to small changes in the data.
  • Fixed constant: simple but arbitrary.

Football events are typically multi-modal (wingers vs centre-backs vs strikers operate in distinct zones). A Silverman-smoothed heat blob over all of them is visually clean but analytically blurry. The alternatives each have their own cost (Scott is barely different; CV is expensive and jittery; fixed is arbitrary).
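To make the over-smoothing concrete, here is a minimal sketch of the per-axis rule applied to two tight clusters. The coordinates are made up, the helper names are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption; this record does not say which estimator the library uses.

```typescript
// Sample standard deviation (n - 1 denominator). The estimator is an assumption.
function sampleStd(values: number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const ss = values.reduce((acc, v) => acc + (v - mean) * (v - mean), 0);
  return Math.sqrt(ss / (n - 1));
}

// Silverman's rule for one axis of a 2D KDE: h = sigma * n^(-1/6).
function silvermanAxis(values: number[]): number {
  return sampleStd(values) * Math.pow(values.length, -1 / 6);
}

// Two tight clusters along x (illustrative coordinates, not real event data).
const deep = [18, 19, 20, 21, 22];
const high = [78, 79, 80, 81, 82];

// Pooled over both clusters, sigma is dominated by the gap between them.
const pooledH = silvermanAxis(deep.concat(high));
// Within a single cluster, sigma reflects the actual local spread.
const clusterH = silvermanAxis(deep);

console.log({ pooledH, clusterH, ratio: pooledH / clusterH });
```

On this toy input the pooled bandwidth comes out at roughly 21.6 against roughly 1.2 for a single cluster, so the kernel is far wider than either cluster and the two zones blur into each other.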

Decision

`bandwidth: "auto"` defaults to Silverman, applied per-axis. `bandwidth` accepts a number (same units as the canonical pitch frame) or a `[bwX, bwY]` tuple for asymmetric smoothing. Sparse inputs (< 10 events) emit `[kde.low-confidence]`. Non-finite or negative bandwidths fall back with a warning.
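The resolution rules above could be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `resolveBandwidth`, `ResolvedBandwidth`, and the `[kde.bandwidth-fallback]` warning code are hypothetical names, and falling back to the Silverman value is an assumption (the decision only says that invalid values fall back with a warning).

```typescript
type BandwidthOption = "auto" | number | [number, number];

interface ResolvedBandwidth {
  bandwidth: [number, number]; // [bwX, bwY] actually used
  warnings: string[];
}

const SPARSE_THRESHOLD = 10; // "< 10 events" from the decision

function sampleStd(values: number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const ss = values.reduce((acc, v) => acc + (v - mean) * (v - mean), 0);
  return Math.sqrt(ss / (n - 1));
}

// Silverman's rule for one axis of a 2D KDE: h = sigma * n^(-1/6).
function silvermanAxis(values: number[]): number {
  return sampleStd(values) * Math.pow(values.length, -1 / 6);
}

function resolveBandwidth(
  xs: number[],
  ys: number[],
  option: BandwidthOption = "auto"
): ResolvedBandwidth {
  const warnings: string[] = [];
  if (xs.length < SPARSE_THRESHOLD) warnings.push("[kde.low-confidence]");

  const auto: [number, number] = [silvermanAxis(xs), silvermanAxis(ys)];
  if (option === "auto") return { bandwidth: auto, warnings };

  // A single number means the same bandwidth on both axes.
  const requested: [number, number] =
    typeof option === "number" ? [option, option] : option;

  // Only non-finite or negative values are rejected, as the decision states.
  // Zero is not addressed by the decision and is passed through unchanged here.
  const invalid = requested.some((v) => !isFinite(v) || v < 0);
  if (invalid) {
    warnings.push("[kde.bandwidth-fallback]"); // hypothetical warning code
    return { bandwidth: auto, warnings }; // assumed fallback target: Silverman
  }
  return { bandwidth: requested, warnings };
}

// Usage with made-up coordinates.
const xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120];
const ys = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60];
console.log(resolveBandwidth(xs, ys, [4, 6]));
console.log(resolveBandwidth(xs, ys, -1));
```

Returning the warnings next to the resolved value keeps the function pure, which fits the determinism the decision is aiming for.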

Consequences

  • Default output is deterministic and reproducible in tests.
  • Consumers analysing multi-cluster patterns pass a tighter manual bandwidth; the spec documents this as the canonical override.
  • Cross-validated bandwidth is deferred: it adds cost and non-determinism without being a clearly better editorial default.
  • The compute layer exposes the resolved bandwidth on the model so consumers can inspect what was used, which matters most when a fallback has replaced the requested value.
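The manual override and the exposed bandwidth can be illustrated with a small product-kernel Gaussian KDE. This is a sketch only: `fitKde` and `KdeModel` are hypothetical names, the coordinates are made up, and `[20, 20]` is an assumed stand-in for a wide automatic bandwidth rather than a value computed by the library.

```typescript
type Point = [number, number];

interface KdeModel {
  bandwidth: [number, number]; // resolved [bwX, bwY], exposed for inspection
  density(x: number, y: number): number;
}

// Standard normal density.
function gaussian(z: number): number {
  return Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math.PI);
}

// Product-kernel Gaussian KDE with a separate bandwidth per axis.
function fitKde(points: Point[], bandwidth: [number, number]): KdeModel {
  const hx = bandwidth[0];
  const hy = bandwidth[1];
  return {
    bandwidth,
    density(x: number, y: number): number {
      let sum = 0;
      for (const p of points) {
        sum += (gaussian((x - p[0]) / hx) / hx) * (gaussian((y - p[1]) / hy) / hy);
      }
      return sum / points.length;
    },
  };
}

// Two tight clusters centred on (20, 34) and (80, 34); illustrative values only.
const offsets = [-2, -1, 0, 1, 2];
const points: Point[] = [];
for (const dx of offsets) {
  for (const dy of offsets) {
    points.push([20 + dx, 34 + dy]);
    points.push([80 + dx, 34 + dy]);
  }
}

const wide = fitKde(points, [20, 20]); // assumed stand-in for a wide default
const tight = fitKde(points, [3, 3]); // manual override for multi-cluster data

// Density in the empty gap between the clusters, relative to a cluster centre.
const gapRatio = (m: KdeModel): number => m.density(50, 34) / m.density(20, 34);

console.log(wide.bandwidth, gapRatio(wide));
console.log(tight.bandwidth, gapRatio(tight));
```

With the wide bandwidth the empty gap keeps a large share of the cluster-centre density, whereas with the tight override it drops to effectively zero. Reading `bandwidth` off the model shows which case a given render is in.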