A Hidden Markov Model does not observe the market regime directly. It infers it — from a sequence of observable returns — by assuming that the visible data was generated by an unobserved latent process switching between a finite number of hidden states. The inference is probabilistic: the model produces, for each trading day, a posterior probability distribution across states rather than a definitive label. That distinction — probability distribution rather than point estimate — turns out to be the source of most of the edge the HMM provides.

Stack$Trader's Phase 2 deployed a 3-state Gaussian HMM trained on SPY daily returns over a five-year window (2021-03-30 through 2026-02-27, 1,235 trading days). The implementation uses the hmmlearn library's GaussianHMM class with covariance type 'full', trained via the Baum-Welch expectation-maximization algorithm to convergence. The model has three hidden states, each parametrized by a Gaussian emission distribution — mean μ and variance σ² — and a 3×3 transition probability matrix A governing how likely the process is to remain in or exit each state on any given day.

1 The Three-State Solution

After training, the Viterbi algorithm decodes the most likely hidden state sequence given the observed returns. The three emergent states were labeled by their statistical properties, not by assumption:

| State | Label | Days | % of time | Mean daily return | Ann. volatility | Theta edge |
|---|---|---|---|---|---|---|
| 0 | Grind | 812 | 66% | +0.088% | 11.3% | Real & scalable |
| 1 | Mean-Reversion | 213 | 17% | +0.055% | 23.2% | Zero at all DTE |
| 2 | Crisis | 210 | 17% | −0.071% | 23.2% | Negative |

Two observations from this decomposition are immediately striking. First, Mean-Reversion and Crisis share the same volatility level: the HMM's emission distributions have nearly identical σ² for States 1 and 2, which means a volatility filter alone cannot distinguish between them. The critical discriminant is the mean return: +0.055% per day in Mean-Reversion versus −0.071% in Crisis. A simple VIX threshold would classify both as "elevated volatility" and either block all trades or allow all of them indiscriminately. The HMM separates them.

Second, although Mean-Reversion sits in the same volatility band as Crisis, it carries a positive, if statistically insignificant, mean return. This state represents the market's decompression phase: volatility is elevated, participants are adjusting positions, but the directional panic has subsided. It is not a premium-selling environment, but it is not a crisis either. Treating it as either would be wrong.
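To make the volatility comparison concrete, here is a minimal sketch that converts the reported annualized volatilities into daily standard deviations and runs each state through a volatility-only filter. The 252-day annualization convention and the 15% threshold are assumptions chosen for illustration; neither appears in the Phase 2 report.

```python
import math

TRADING_DAYS = 252  # assumed annualization convention

# (mean daily return %, annualized volatility %) as reported for each state
STATES = {
    "Grind": (0.088, 11.3),
    "Mean-Reversion": (0.055, 23.2),
    "Crisis": (-0.071, 23.2),
}


def daily_sigma_pct(annual_vol_pct: float) -> float:
    """Convert an annualized volatility to a daily standard deviation (both in %)."""
    return annual_vol_pct / math.sqrt(TRADING_DAYS)


def vol_filter_allows(annual_vol_pct: float, threshold_pct: float = 15.0) -> bool:
    """Hypothetical volatility-only filter: trade when vol is below the threshold."""
    return annual_vol_pct < threshold_pct


for name, (mean_pct, vol_pct) in STATES.items():
    print(
        f"{name:15s} mean={mean_pct:+.3f}%  "
        f"daily sigma={daily_sigma_pct(vol_pct):.2f}%  "
        f"vol filter allows: {vol_filter_allows(vol_pct)}"
    )
```

Whatever threshold is chosen, Mean-Reversion and Crisis receive the same verdict, because the filter only sees the 23.2% figure they share. The mean return that separates them is also small relative to the daily standard deviation, which is why it takes a model of the full sequence to pick it up.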

Baum-Welch in brief

The EM algorithm iterates between an E-step — computing the expected number of state transitions and time spent in each state given current parameters — and an M-step that updates the emission distributions and transition matrix to maximize the data likelihood. Convergence is to a local maximum, so multiple restarts with different initializations were used.

The HMM's output at inference time is not a state label but a forward-backward posterior vector: model.predict_proba(obs) returns for each observation day a three-element vector [P(Grind), P(MeanRev), P(Crisis)] that sums to 1.0. It is the treatment of this vector — how it is consumed downstream — that separates the Phase 2 approach from naive regime filtering.
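For readers who want to see what the posterior vector is, the following is a small pure-Python sketch of the scaled forward-backward recursion for a univariate Gaussian HMM. It is not the hmmlearn implementation, and every parameter value below (start probabilities, transition matrix, variances, and the toy return series) is a hypothetical placeholder rather than a fitted value from Phase 2.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_backward(obs, start, trans, means, variances):
    """Return gamma[t][k] = P(S_t = k | O_{1:T}) using scaled recursions."""
    n, k = len(obs), len(start)
    emit = [[gaussian_pdf(x, means[j], variances[j]) for j in range(k)] for x in obs]

    # Forward pass, normalizing each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            row = [start[j] * emit[0][j] for j in range(k)]
        else:
            row = [
                emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
                for j in range(k)
            ]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(k)
            ) / scale[t + 1]

    # Combine and renormalize so every row sums to 1.
    gamma = []
    for t in range(n):
        row = [alpha[t][j] * beta[t][j] for j in range(k)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return gamma


# Hypothetical parameters, state order [Grind, Mean-Reversion, Crisis], returns in %.
START = [1 / 3, 1 / 3, 1 / 3]
TRANS = [[0.97, 0.03, 0.00], [0.05, 0.90, 0.05], [0.00, 0.10, 0.90]]
MEANS = [0.088, 0.055, -0.071]
VARIANCES = [0.5, 2.1, 2.1]
TOY_RETURNS = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.1, 0.0, -2.5, 3.0, -3.5, 2.0]

gamma = forward_backward(TOY_RETURNS, START, TRANS, MEANS, VARIANCES)
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

Each printed row is the per-day analogue of the [P(Grind), P(MeanRev), P(Crisis)] vector described above. On the toy series, the Grind probability falls once the large moves begin.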

2 Binary Filter vs. Posterior Weighting

The naive approach to HMM-based regime filtering is binary: if the decoded state is Grind, trade; otherwise, abstain. This was Experiment A's baseline, and it improved Sharpe from 1.93 to 2.67 — a genuine improvement, but not the dominant result. The key insight was that binary decoding discards information. On a day where P(Grind) = 0.51, the binary decoder says "trade" with the same conviction as a day where P(Grind) = 0.97. These are not equivalent situations.

The posterior confidence gate replaces the binary threshold with a continuous filter on P(Grind)². The squared posterior amplifies the discrimination: a day with P(Grind) = 0.9 scores 0.81, while a day with P(Grind) = 0.6 scores 0.36 — a much starker separation than the linear posterior produces. The gate condition is P(Grind)² > 0.25, equivalent to P(Grind) > 0.5, but the squaring serves a second function: it feeds directly into a position-sizing multiplier in the PortfolioModel.

Posterior Confidence Gate (agents.py · HMMAgent.score())

    γ_t = P(S_t = Grind | O_{1:T})        ← forward-backward posterior
    confidence = γ_t²                     ← squared to amplify discrimination
    entry_allowed ⟺ confidence > 0.25     (i.e. γ_t > 0.5)
    size_multiplier = confidence          ← scales contract count continuously

where O_{1:T} is the observed return sequence and S_t is the latent state at time t.
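The gate and the sizing multiplier can be sketched as a pair of small functions. This is an illustration of the rule as stated, not the PortfolioModel code: the function names and the `base_contracts` parameter are hypothetical, and rounding the contract count down is an assumption.

```python
def confidence_gate(p_grind: float, threshold: float = 0.25):
    """Return (entry_allowed, size_multiplier) for a Grind posterior in [0, 1]."""
    if not 0.0 <= p_grind <= 1.0:
        raise ValueError("p_grind must be a probability")
    confidence = p_grind ** 2              # squared posterior
    entry_allowed = confidence > threshold
    size_multiplier = confidence if entry_allowed else 0.0
    return entry_allowed, size_multiplier


def contracts(base_contracts: int, p_grind: float) -> int:
    """Scale a hypothetical base contract count by the confidence multiplier."""
    allowed, multiplier = confidence_gate(p_grind)
    return int(base_contracts * multiplier) if allowed else 0


for p in (0.97, 0.60, 0.51, 0.40):
    print(p, confidence_gate(p), contracts(10, p))
```

Both P(Grind) = 0.51 and P(Grind) = 0.97 pass the gate, but they no longer carry the same conviction: with a base of 10 contracts the first sizes to 2 and the second to 9.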

In practice, the gate admits exactly 812 days — those where the Viterbi state label is Grind — because when the HMM is in the Grind state it is typically very confident. The posterior distribution collapses near [1, 0, 0] in the heart of a stable Grind regime and spreads across states near regime transitions. The gate effect is therefore most powerful at transition boundaries, where the binary decoder is most likely to issue a false entry signal just as the regime is deteriorating.

| Strategy | Entries | Edge/trade | Sharpe | p-value |
|---|---|---|---|---|
| Baseline (always in) | 1,235 | 0.272% | 1.93 | 1.000 |
| Binary HMM (Grind = trade) | 1,025 | 0.337% | 2.67 | 0.015 |
| Posterior > 0.5 | 816 | 0.402% | 4.03 | 0.005 |
| P(Grind)² > 0.25 | 812 | 0.403% | 4.07 | 0.004 |
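The p-values in the table come from permutation testing. The exact procedure is not spelled out here, so the following is only a sketch of one common construction: compare the mean return of the selected days against random selections of the same number of days. The function name, the one-sided comparison, the add-one correction, and the synthetic demo data are all assumptions.

```python
import math
import random


def permutation_p_value(returns, selected, n_perm=2000, seed=0):
    """One-sided p-value: how often does a random choice of the same number of
    days match or beat the mean return of the selected days?"""
    rng = random.Random(seed)
    chosen = [r for r, keep in zip(returns, selected) if keep]
    if not chosen:
        raise ValueError("no days selected")
    k = len(chosen)
    # math.fsum is exactly rounded, so the mean does not depend on summation order.
    observed = math.fsum(chosen) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(returns, k)
        if math.fsum(sample) / k >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


# Synthetic demo data only; not SPY returns.
demo_rng = random.Random(1)
demo_returns = [demo_rng.gauss(0.0, 1.0) for _ in range(200)]
cutoff = sorted(demo_returns)[-50]
demo_selected = [r >= cutoff for r in demo_returns]

print(permutation_p_value(demo_returns, [True] * len(demo_returns)))
print(permutation_p_value(demo_returns, demo_selected))
```

Under this construction, an always-in strategy selects every day, so every permutation reproduces the same mean and the p-value is 1.0, which is consistent with the baseline row. With 2,000 permutations and the add-one correction, the smallest attainable value is 1/2001, roughly 0.0005.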

The implementation of the HMMAgent inside agents.py exposes the posterior score on the [-1, +1] scale the Supervisor expects. A P(Grind) of 1.0 maps to +1.0; P(Crisis) of 1.0 maps to −1.0; the Mean-Reversion state maps to 0.0 (neutral — don't trade, but don't panic). The continuous mapping preserves the full information content of the posterior for the Supervisor's weighted aggregation.

agents.py — HMMAgent.score() (simplified)
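import pandas as pd
from hmmlearn.hmm import GaussianHMM

# Method of HMMAgent; MarketConditions is the project's own context type.
def score(self, data: pd.DataFrame, ctx: MarketConditions) -> float: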
    returns = data['close'].pct_change().dropna().values.reshape(-1, 1)

    # Fit or reuse cached model
    if self._model is None:
        self._model = GaussianHMM(
            n_components=3,
            covariance_type='full',
            n_iter=200,
            random_state=42
        ).fit(returns)

    # Posterior probabilities for every bar; keep the most recent one
    posteriors = self._model.predict_proba(returns)
    latest     = posteriors[-1]          # one probability per state; index order depends on the fit

    grind_idx  = self._identify_grind_state()
    crisis_idx = self._identify_crisis_state()

    p_grind  = latest[grind_idx]
    p_crisis = latest[crisis_idx]

    # Map to [-1, +1]: grind → positive, crisis → negative, MR → near 0
    raw_score = p_grind - p_crisis

    # Confidence gate: attenuate near-boundary posteriors
    confidence = p_grind ** 2
    return raw_score * min(1.0, confidence / 0.25)

One implementation detail worth noting: _identify_grind_state() does not hard-code state index 0 as Grind. It inspects the trained model's emission parameters after fitting and assigns labels by comparing mean returns and volatility levels: the state with the highest mean daily return and lowest variance is Grind, and the state with the most negative mean is Crisis. This makes the agent robust to HMM initialization variance. Different random seeds can produce different state orderings, but the labels follow the fitted parameters rather than the index order, so they stay consistent as long as the three states remain separated on mean and variance.
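A label-assignment rule of this kind might look like the sketch below. The helper name `label_states` and the exact tie-breaking order (Crisis first by lowest mean, then Grind by lowest variance among the remaining states) are assumptions; the actual _identify_grind_state() and _identify_crisis_state() are not shown in this section.

```python
def label_states(means, variances):
    """Map fitted state indices to regime names using emission parameters.

    Assumed rule: Crisis is the state with the lowest mean; Grind is the
    lowest-variance state among the other two; Mean-Reversion is what remains.
    """
    if len(means) != 3 or len(variances) != 3:
        raise ValueError("expected exactly three states")
    crisis = min(range(3), key=lambda i: means[i])
    rest = [i for i in range(3) if i != crisis]
    grind = min(rest, key=lambda i: variances[i])
    mean_rev = next(i for i in rest if i != grind)
    return {"grind": grind, "mean_rev": mean_rev, "crisis": crisis}


# Same parameters under two different index orderings (illustrative values).
print(label_states([0.088, 0.055, -0.071], [0.5, 2.1, 2.1]))
print(label_states([-0.071, 0.088, 0.055], [2.1, 0.5, 2.1]))
```

With a fitted hmmlearn model, the inputs would plausibly come from `model.means_.ravel()` and `model.covars_.reshape(-1)` for a single-feature model. Both orderings above resolve to the same regime names, which is the property the paragraph describes.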

3 The Holding Period Scaling Law

The most consequential finding of Phase 2 was not the HMM filter itself but what it revealed about the temporal structure of the Grind regime's edge. The experiment varied holding period from 1 to 20 days while conditioning on the same Grind regime filter, measuring per-trade edge and Sharpe at each horizon. The result was a near-perfect linear scaling that maps directly to the mechanics of theta decay.

Grind Regime: Edge vs. Holding Period (5y SPY, 2,000 permutations)

| Holding period | Edge | p-value |
|---|---|---|
| 1d | 0.088% | 0.06 |
| 3d | 0.257% | 0.01 |
| 5d | 0.400% | 0.006 |
| 10d | 0.747% | 0.001 |
| 15d | 1.127% | 0.0005 |
| 20d | 1.502% | 0.0005 |

The scaling is approximately 0.075% per additional day of holding in the Grind regime. This is not a coincidence. Under Black-Scholes, daily theta for an at-the-money option is approximately proportional to implied volatility divided by twice the square root of time to expiration. In a Grind regime — low realized vol, stable drift, no gap risk — the collected premium decays toward zero along precisely this schedule, and none of it is given back via assignment or stop-out. The holding period analysis is empirical confirmation that the theoretical theta decay schedule is being harvested in practice, without contamination from the regime-driven volatility events that would interrupt it.
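The per-day figure can be checked with an ordinary least-squares line through the six reported points. This is a quick arithmetic check on the published numbers and not part of the original experiment code; the helper name is hypothetical.

```python
HOLDING_DAYS = [1, 3, 5, 10, 15, 20]
EDGE_PCT = [0.088, 0.257, 0.400, 0.747, 1.127, 1.502]  # reported edge per trade, in %


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(HOLDING_DAYS, EDGE_PCT)
print(f"slope = {slope:.4f}% per day, intercept = {intercept:.4f}%")
```

The fitted slope lands between 0.07% and 0.08% per day, in line with the approximate 0.075% quoted above.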

The Mean-Reversion regime shows zero edge at every holding period. This result is not simply that the edge is smaller in Mean-Rev — it is genuinely absent. The elevated implied volatility during Mean-Rev periods is correctly priced: the market is charging fair value for the insurance, and the premium seller receives no risk-adjusted excess return. Selling premium into elevated Mean-Rev vol is not a conservative strategy; it is a structurally zero-edge trade that exposes the portfolio to the very tail risk the premium is supposed to compensate for.

"Selling premium into an elevated Mean-Reversion regime is not conservative. The IV is correctly priced. You are paid nothing for the tail risk you are accepting."

Phase 2 Report, Stack$Trader Development Journal

4 The Transition Topology

Experiment B tested whether regime transition probabilities — the entries of the HMM's learned A matrix — could provide entry signals at regime inflection points. The hypothesis was that the beginning of a new Grind regime, identified by a high transition probability from Crisis or Mean-Reversion into Grind, should be an attractive entry point: regime just turned favorable, premium still elevated from the prior stress period.

The data refuted this hypothesis completely, but in a way that revealed something structurally important about how market regimes sequence:

Observed Regime Transition Topology: 5y SPY (1,235 days)

    Crisis (210 days) → Mean-Rev (213 days) → Grind (812 days)
    ✕ Direct Crisis → Grind transitions: 0 observed in 5 years

In five years of SPY daily data, there were zero direct transitions from Crisis to Grind. The process always routes through Mean-Reversion first — a structural decompression phase where volatility remains elevated but the directional collapse has stopped. This is not an artifact of the HMM's training. It is a real feature of how equity market stress resolves: volatility does not snap back to 11% annualized from 23%+; it decays through a transitional period of elevated-but-declining uncertainty before regime stabilization.
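A check like this can be run directly on any decoded state sequence by counting day-over-day regime changes. The sketch below uses a short made-up label sequence purely for illustration; with the real model, the sequence would be the Viterbi path after mapping state indices to regime names.

```python
from collections import Counter


def transition_counts(states):
    """Count day-over-day regime changes as (from_state, to_state) pairs.

    Days on which the regime does not change are ignored.
    """
    return Counter((a, b) for a, b in zip(states, states[1:]) if a != b)


# Hypothetical decoded sequence, not the Phase 2 Viterbi path.
decoded = (
    ["Grind"] * 5
    + ["Mean-Rev"] * 2
    + ["Crisis"] * 3
    + ["Mean-Rev"] * 2
    + ["Grind"] * 4
)

counts = transition_counts(decoded)
for (src, dst), n in sorted(counts.items()):
    print(f"{src} -> {dst}: {n}")
print("Crisis -> Grind:", counts[("Crisis", "Grind")])
```

A `Counter` returns 0 for pairs that never occur, so the absence of a direct Crisis to Grind move shows up as an explicit zero count.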

The practical consequence is that any entry strategy that attempts to catch the "inflection point" at the start of a Grind regime will always enter during the tail of the Mean-Reversion phase, where the theta edge is zero. The only consistently correct behavior is to wait for the HMM to achieve a high posterior confidence in the Grind state before entering — which is precisely what the P(Grind)² gate implements.

5 Ticker-Level HMM and the Federated Approach

Phase 3 generalized the HMM result beyond SPY by testing whether individual ticker regimes are coherent with — or independent of — the market-level SPY HMM classification. The Multi-Ticker Coherence test ran two configurations for each of AAPL, NVDA, and TSLA: trading on SPY's Grind regime (Market HMM) and trading on a per-ticker HMM trained on the individual stock's own return series (Self HMM).

| Ticker | Market HMM edge | Market p | Self HMM edge | Self p | Verdict |
|---|---|---|---|---|---|
| AAPL | 1.84% | 0.07 | 1.77% | 0.07 | Coherent |
| TSLA | 3.98% | 0.11 | 5.80% | 0.06 | Self-dominant |
| NVDA | 0.88% | 0.36 | 0.75% | 0.38 | No edge |

AAPL is the canonical coherent asset: its return distribution closely tracks SPY's regime states, and the Market HMM and Self HMM produce nearly identical results. When SPY grinds, AAPL grinds. The systematic premium-selling edge generalizes directly.

TSLA demonstrated idiosyncratic regime structure: its own HMM produces a 5.80% per-trade edge that the market-level HMM understates at 3.98%. TSLA experiences quiet windows — periods of low volatility and directional drift — that are independent of SPY's macro regime. Forcing a market-level regime label onto TSLA misses these idiosyncratic Grind periods entirely. The result established the default behavior in supervisor_config.json: mce_hmm_ref_ticker = current_ticker, so each stock's HMM is trained on its own return series.

NVDA was the critical failure. Both Market HMM and Self HMM produced p-values above 0.30 — statistically indistinguishable from random timing. The 3-state Gaussian HMM is a good model for index-like instruments where volatility clustering is the dominant dynamic. For a hyper-growth semiconductor stock with frequent earnings-driven gap events and momentum-driven volatility regimes, the model is misspecified. NVDA's return distribution is too fat-tailed and too asymmetric for a Gaussian emission model to capture the structure that matters.

Implementation note — supervisor_config.json

Per-ticker HMM configuration is stored as "hmm_ref_ticker": "TICKER" in each ticker's config block. Setting this to the ticker itself enables the Federated HMM. Setting it to "SPY" forces Market HMM coherence. For index ETFs (SPY, QQQ, IWM), both configurations produce equivalent results.
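As an illustration of how such a setting could be resolved at runtime, here is a minimal sketch. Only the "hmm_ref_ticker" key comes from the note above; the surrounding "tickers" layout, the example values, and the helper name are assumptions, and the real supervisor_config.json may be organized differently.

```python
import json

# Hypothetical config layout; only "hmm_ref_ticker" is taken from the note above.
EXAMPLE_CONFIG = json.loads(
    """
    {
      "tickers": {
        "TSLA": {"hmm_ref_ticker": "TSLA"},
        "AAPL": {"hmm_ref_ticker": "SPY"}
      }
    }
    """
)


def hmm_reference_ticker(config: dict, ticker: str) -> str:
    """Return the ticker whose returns train the HMM for `ticker`.

    Falls back to the ticker itself (the per-ticker, federated behavior)
    when no override is configured.
    """
    block = config.get("tickers", {}).get(ticker, {})
    return block.get("hmm_ref_ticker", ticker)


for symbol in ("TSLA", "AAPL", "NVDA"):
    print(symbol, "->", hmm_reference_ticker(EXAMPLE_CONFIG, symbol))
```

Falling back to the ticker itself mirrors the default described earlier, where each stock's HMM is trained on its own return series unless it is pointed at SPY.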

6 The HMM Agent in the Full Ensemble

By Phase 7, the HMMAgent carried a weight between 0.20 and 0.40 in each of the five character presets: never the highest-weighted agent, but never absent. Its role in the ensemble shifted over the project's development from primary signal to regime filter, a component whose low score (negative under high P(Crisis), near zero under high P(MeanRev)) carries veto weight regardless of what the technical or microstructure agents say.

The Supervisor's disagreement penalty amplifies this effect. When the HMMAgent scores −0.85 (high crisis posterior) and the TechnicalAgent scores +0.7 (bullish price momentum), the high variance across agent scores triggers a disagreement penalty that reduces the composite below entry threshold. This behavior is correct: a technically bullish picture during a statistical crisis regime is precisely the setup most likely to produce an abrupt reversal and assignment event. The HMM catches it; the technical analysis misses it.
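The Supervisor's actual penalty formula is not given here, so the sketch below shows only one plausible form: a weighted mean of agent scores minus a multiple of their dispersion. The function name, the weights, the penalty coefficient, and the 0.5 entry threshold are all hypothetical.

```python
import statistics

ENTRY_THRESHOLD = 0.5  # hypothetical


def composite_score(scores: dict, weights: dict, penalty: float = 1.0) -> float:
    """Weighted mean of agent scores, reduced by the spread across agents.

    Assumed form only: composite = weighted_mean - penalty * population_stdev.
    """
    total_weight = sum(weights[name] for name in scores)
    weighted_mean = sum(scores[name] * weights[name] for name in scores) / total_weight
    dispersion = statistics.pstdev(list(scores.values()))
    return weighted_mean - penalty * dispersion


weights = {"hmm": 0.30, "technical": 0.40}  # hypothetical weights

disagree = composite_score({"hmm": -0.85, "technical": 0.70}, weights)
agree = composite_score({"hmm": 0.70, "technical": 0.70}, weights)

print(f"disagreement: {disagree:+.3f}  entry: {disagree > ENTRY_THRESHOLD}")
print(f"agreement:    {agree:+.3f}  entry: {agree > ENTRY_THRESHOLD}")
```

Under these assumptions, the bullish technical reading cannot carry the composite over the threshold while the HMM is signalling crisis. The same technical score passes once the HMM agrees.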

The holding period analysis from Phase 2 also directly shaped the DTE selection logic wired into the engine. A confirmed Grind regime with high P(Grind) sets DTE targets in the 15–20 day range to maximize the linear theta scaling demonstrated in Experiment E. A deteriorating Grind — P(Grind) declining from prior session — shifts the DTE recommendation shorter, reducing gamma exposure before the potential regime transition materializes. The HMM is not just a binary gate on entries; it is a continuous signal that modulates position structure throughout the holding period.

The final architecture point is the most important one: the HMM provides no directional alpha. Phase 6's theta-normalized permutation test confirmed this at p=1.000 — the system's timing edge is zero when the theta component is removed. What the HMM provides is a filter that concentrates trading activity in the 66% of days where theta decay operates without structural interruption, and eliminates exposure during the 34% of days where it does not. The 66% contain all the edge. The 34% contain all the risk. Knowing which is which, reliably and in real time, is the entire contribution of the Hidden Markov Model to the Stack$Trader ensemble.

· · ·