The simplest summary of the leaderboard data is also the most unsettling: the best covered call strategy on Lockheed Martin outperforms the best covered call strategy on Apple by a factor of nine. The same system, the same entry logic, the same statistical validation framework, applied to two of America's largest companies, produces a Sharpe ratio of 4.51 in one case and 0.50 in the other. Understanding why requires setting aside the usual explanations — better signals, smarter exits, more sophisticated agents — and confronting the more fundamental question of which assets are structurally suited to premium harvesting and which are not.

923 strategies evaluated
4.51 best OOS Sharpe (LMT)
59% positive alpha rate
3.89× Walk-Forward Efficiency

I The Shape of the Distribution

Nine hundred and twenty-three strategies were evaluated across 41 tickers and 22 distinct entry signal variants. The distribution of out-of-sample Sharpe ratios is neither bell-shaped nor random. It is bimodal, with a significant mass of strongly positive results clustered between Sharpe 2.0 and 4.5, a large body of strategies near zero, and a long left tail of negative outcomes that extends — in one extraordinary case — to −39.95.

[Figure: OOS Sharpe distribution, 923 walk-forward validated strategies. Horizontal axis: OOS Sharpe, from −40 to +4.51. Legend: positive Sharpe (544 strategies, 59%); negative Sharpe (379 strategies, 41%); near zero (±0.5).]

The five-tier classification system — S, A, B, C, F — was defined by Sharpe thresholds set before the search began, not fitted to the results afterwards. Tier S requires an out-of-sample Sharpe above 2.0, a standard that most professional systematic strategies would be satisfied to meet across an entire fund. The system produced 192 such strategies, representing 21% of everything tested.

| Tier | OOS Sharpe threshold | Count | % of total | Interpretation |
|------|----------------------|-------|------------|----------------|
| S | Sharpe > 2.0 | 192 | 20.8% | Elite. Deploy with full confidence. |
| A | 1.5 – 2.0 | 97 | 10.5% | Strong. Portfolio candidates. |
| B | 0.5 – 1.5 | 124 | 13.4% | Viable. Monitor closely in production. |
| C | 0.0 – 0.5 | 131 | 14.2% | Marginal. Structural edge unclear. |
| F | Sharpe < 0 | 379 | 41.1% | Failed. Active destruction of capital. |
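
The tier thresholds can be written down as a small classifier, which also makes it easy to recompute the percentage column from the counts. This is only a sketch: the function name `classify_tier` is hypothetical, and the handling of exact boundary values (2.0, 1.5, 0.5, 0.0) is an assumption, chosen so that a Sharpe of 0.50 lands in Tier B as it does for AAPL in this dataset.

```python
def classify_tier(oos_sharpe: float) -> str:
    """Map an out-of-sample Sharpe ratio to a tier letter.

    Boundary handling is an assumption: Tier S is strictly above 2.0,
    Tier F is strictly below 0, and each other tier includes its lower bound.
    """
    if oos_sharpe > 2.0:
        return "S"
    if oos_sharpe >= 1.5:
        return "A"
    if oos_sharpe >= 0.5:
        return "B"
    if oos_sharpe >= 0.0:
        return "C"
    return "F"


# Tier counts as reported in the table above.
TIER_COUNTS = {"S": 192, "A": 97, "B": 124, "C": 131, "F": 379}
TOTAL = sum(TIER_COUNTS.values())

# Share of all strategies in each tier, in percent, rounded to one decimal.
TIER_SHARES = {
    tier: round(100 * count / TOTAL, 1) for tier, count in TIER_COUNTS.items()
}

if __name__ == "__main__":
    for tier, share in TIER_SHARES.items():
        print(f"Tier {tier}: {TIER_COUNTS[tier]} strategies ({share}%)")
```

Summing the counts gives 923, and the four non-failing tiers together account for the 544 strategies with positive Sharpe.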

The 41% failure rate is not a cause for alarm; it is a cause for reflection. A well-designed search should produce many failures — they are the evidence that the positive results are not artefacts of a permissive filtering process. If every strategy had shown positive Sharpe, the correct inference would be that the test framework was contaminated. Failure is the price of a credible pass.

II The Top Twenty
What is WFE?

Walk-Forward Efficiency is the ratio of out-of-sample Sharpe to in-sample Sharpe. WFE > 1.0 means the strategy performs better on data it never saw than on data it was trained on — the hallmark of a structural rather than fitted edge.
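
As a minimal sketch, the WFE ratio defined above is a single division. The function name is hypothetical, and returning `None` when the in-sample Sharpe is zero or negative is an assumed convention, since a ratio with a non-positive denominator has no useful reading.

```python
from typing import Optional


def walk_forward_efficiency(oos_sharpe: float, is_sharpe: float) -> Optional[float]:
    """Return OOS Sharpe divided by IS Sharpe.

    Returns None when the in-sample Sharpe is zero or negative (assumed
    convention): the ratio would be undefined or negative and could not be
    compared with the WFE of other strategies.
    """
    if is_sharpe <= 0:
        return None
    return oos_sharpe / is_sharpe


if __name__ == "__main__":
    # SLV/cc_bbsq from the leaderboard: OOS 4.27, IS 3.23.
    print(walk_forward_efficiency(4.27, 3.23))
    # HON/cc_vrp: negative in-sample Sharpe, so no WFE is reported.
    print(walk_forward_efficiency(3.77, -0.13))
```

Recomputing WFE from the two-decimal Sharpe values printed in the leaderboard can differ from the published figure in the second decimal place, presumably because the published ratios were computed from unrounded inputs.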

The top twenty strategies by out-of-sample Sharpe span just six tickers: LMT, SLV, XOM, CVX, HON, and GLD. The concentration is not accidental. It reflects a structural property of the underlying assets that will be examined in detail in the next section. First, the rankings.

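| # | Ticker | Strategy | OOS Sharpe | IS Sharpe | WFE |
|---|--------|----------|------------|-----------|-----|
| 1 | LMT | cc_vix | 4.51 | 1.03 | 4.36× |
| 2 | SLV | cc_bbsq | 4.27 | 3.23 | 1.32× |
| 3 | SLV | cc_vrp | 4.27 | 3.56 | 1.20× |
| 4 | LMT | cc_earn | 4.25 | 0.80 | 5.32× |
| 5 | XOM | cc_bbsq | 4.17 | 1.08 | 3.87× |
| 6 | LMT | cc_vrp | 4.11 | 0.93 | 4.40× |
| 7 | XOM | cc_vrp | 4.02 | 0.57 | 7.02× |
| 8 | XOM | cc_sink | 3.98 | 0.92 | 4.32× |
| 9 | XOM | cc_rsi | 3.97 | 1.08 | 3.67× |
| 10 | LMT | cc_bbsq | 3.97 | 0.84 | 4.71× |
| 11 | CVX | cc_sink | 3.88 | 0.83 | 4.69× |
| 12 | HON | cc_bbsq | 3.86 | 0.41 | 9.41× |
| 13 | XOM | cc_macd | 3.84 | 1.02 | 3.75× |
| 14 | GLD | cc_bbsq | 3.80 | 1.82 | 2.09× |
| 15 | HON | cc_vrp | 3.77 | −0.13 | undefined |
| 16 | LMT | cc_ma200 | 3.77 | 0.74 | 5.12× |
| 17 | XOM | cc_term | 3.72 | 1.14 | 3.27× |
| 18 | XOM | cc_vix | 3.72 | 1.14 | 3.27× |
| 19 | XOM | cc_always | 3.72 | 1.14 | 3.27× |
| 20 | GLD | cc_vol | 3.70 | 1.39 | 2.67× |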

Row 15 is the one that demands explanation. HON/cc_vrp achieves a Sharpe of 3.77 out-of-sample while posting an in-sample Sharpe of −0.13: negative during training, exceptional during testing. With a negative denominator, the Walk-Forward Efficiency ratio (3.77 / −0.13 ≈ −29) has no sensible interpretation, so it is listed as undefined. This result is not a data error. It is the most extreme instance of a pattern visible throughout the top of the table: out-of-sample performance consistently and substantially exceeding in-sample performance. A WFE of 4.36 for the top-ranked strategy means that on data the model never saw during optimisation, it returned more than four times the risk-adjusted performance it showed during training. This is the opposite of what the overfitting hypothesis predicts.

"The portfolio actually performs better out-of-sample than in-sample. This is characteristic of a structural edge — like theta decay — rather than a fitted pattern."

Phase 6 Statistical Validation Report  ·  Stack$Trader Documentation
III The Ticker Divide

The most striking pattern in the full 923-strategy dataset is not the magnitude of the best results — it is the consistency of the split between winning and losing ticker universes. There are two worlds in this data, and the dividing line runs directly between the old economy and the new.

Winners — Structural Edge Present
Low-volatility, low-narrative, high-yield underlyings

Energy majors (XOM, CVX), aerospace and defence (LMT, HON), precious metals ETFs (GLD, SLV), and diversified equity income (VYM). These assets share three properties: moderate implied volatility relative to realised, low susceptibility to overnight gap risk from earnings surprises or product launches, and stable option premium as a fraction of share price.

LMT 4.51  ·  SLV 4.27  ·  XOM 4.17  ·  CVX 3.88  ·  HON 3.86  ·  GLD 3.80  ·  VYM 3.50
Losers — Structural Edge Absent
High-volatility, high-narrative, momentum-driven underlyings

Big tech (MSFT, AAPL, AMZN, GOOG), high-beta growth (TSLA, SOFI, PLTR), crypto proxies (COIN, IBIT), and speculative small-caps. These assets share: high realised volatility that frequently exceeds implied, sustained upward price momentum that forces covered calls to be assigned or rolled at a loss, and binary event risk (product launches, analyst upgrades, macro betas) that overwhelms the theta premium.

MSFT −2.94  ·  TSLA −1.43  ·  AMZN −1.87  ·  AAPL −1.64  ·  IBIT −1.53

The intuition is this: a covered call earns money when the underlying moves less than implied volatility suggests it will. In stable underlyings such as an oil major, a defence contractor, or a gold ETF, this condition is reliably met. Implied volatility for these assets incorporates a structural risk premium that consistently exceeds realised volatility over a cycle. In technology stocks, the opposite holds: the stocks actually move as much as or more than options markets price in, and they additionally tend to move directionally upward, so the seller keeps the call premium but takes an assignment loss in the form of truncated upside.
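
The payoff asymmetry behind this intuition can be illustrated with the standard covered call profit at expiry: the share gain is capped at the strike, and the premium is kept in every scenario. The sketch below is illustrative only; the prices, strike, and premium are made-up numbers rather than values from the dataset, and fees, dividends, and early assignment are ignored.

```python
def covered_call_pnl(entry_price: float, final_price: float,
                     strike: float, premium: float) -> float:
    """Per-share profit at expiry of long stock plus one short call.

    The stock leg is capped at the strike (shares are called away above it);
    the premium is collected regardless of where the stock finishes.
    """
    return min(final_price, strike) - entry_price + premium


def buy_and_hold_pnl(entry_price: float, final_price: float) -> float:
    """Per-share profit of simply holding the stock over the same period."""
    return final_price - entry_price


if __name__ == "__main__":
    entry, strike, premium = 100.0, 105.0, 2.0  # hypothetical values

    # Quiet underlying: the stock drifts to 101, the call expires worthless.
    print(covered_call_pnl(entry, 101.0, strike, premium),
          buy_and_hold_pnl(entry, 101.0))

    # Strong rally: the stock runs to 130, but gains above 105 are given up.
    print(covered_call_pnl(entry, 130.0, strike, premium),
          buy_and_hold_pnl(entry, 130.0))
```

In the quiet scenario the covered call beats buy-and-hold by the premium; in the rally it earns 7 per share against 30 for the stock alone, which is the truncated-upside cost described above.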

The data make this concrete. Ranked by their single best strategy across all entry variants:

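| Ticker | Tier | Best OOS Sharpe |
|--------|------|-----------------|
| LMT | S | 4.51 |
| SLV | S | 4.27 |
| XOM | S | 4.17 |
| CVX | S | 3.88 |
| HON | S | 3.86 |
| GLD | S | 3.80 |
| IWM | S | 3.09 |
| NVDA | B | 0.80 |
| SPY | B | 0.86 |
| AAPL | B | 0.50 |
| TSLA | F | −1.43 |
| MSFT | F | −2.94 |
| AMZN | F | −1.87 |
| IBIT | F | −1.53 |
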
IV What Works: A Strategy Taxonomy

The leaderboard encompasses not just the choice of underlying but also the choice of entry strategy — the signal logic that determines when the covered call position is opened. Twenty-two distinct entry strategy variants were tested, ranging from the simplest possible rule (cc_always: enter on every available date) to complex multi-signal composites. The results across entry strategies are, by comparison to the ticker variation, surprisingly uniform — but not entirely without structure.

| Variant | Description | Result |
|---------|-------------|--------|
| cc_always | Unconditional entry. Sell a covered call whenever no position is open. The no-skill baseline. | Works on good tickers |
| cc_bbsq | Bollinger Band squeeze filter. Enter when realised volatility is contracting, so that implied premium is relatively rich to near-term realised. | Top-ranked variant |
| cc_vrp | Volatility Risk Premium gate. Enter when implied volatility exceeds recent realised volatility by a threshold. Explicitly targets the VRP structural edge. | Consistent top-5 |
| cc_vix | VIX-regime filter. Enter when market-wide fear is elevated; premium is structurally richer during high-VIX environments. | Best for LMT (#1) |
| cc_earn | Earnings adjacency filter. Enters specifically around earnings calendar windows when implied vol is inflated above realised. | Best for LMT (#4) |
| cc_preern | Pre-earnings premium capture. Enters the week before earnings when implied vol is rising into the event, exits before the event resolves. | High WFE (7.92×) |
| strangle | Short OTM call + short OTM put. Collects premium on both sides. Unlimited downside risk if the underlying makes a large directional move. | Catastrophically fails |
| iron_condor | Four-leg defined-risk structure. Short OTM put spread + short OTM call spread. Theoretically superior capital efficiency. | Fails universally |
| vertical_put | Bull put spread. Short OTM put + long lower-strike put. Defined risk, lower premium than naked put. | Fails universally |
| cc_gold | Gold standard consensus: requires agreement across all major directional agents. Extremely selective entry, so few trades fire. | Mixed (high WFE) |
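
To make one of these entry rules concrete, here is a minimal sketch of a volatility-risk-premium gate in the spirit of cc_vrp: estimate recent realised volatility from daily closes and enter only when implied volatility exceeds it by a margin. Everything specific here is an assumption for illustration: the function names, the 20-day window, the 252-day annualisation, and the 0.02 threshold are not taken from the Stack$Trader implementation.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualisation factor


def realised_vol(closes: list, window: int = 20) -> float:
    """Annualised standard deviation of the last `window` daily log returns."""
    recent = closes[-(window + 1):]
    if len(recent) < 3:
        raise ValueError("need at least three closes to estimate volatility")
    log_returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return statistics.stdev(log_returns) * math.sqrt(TRADING_DAYS)


def vrp_gate(implied_vol: float, closes: list,
             threshold: float = 0.02, window: int = 20) -> bool:
    """True when implied vol exceeds recent realised vol by more than threshold.

    Both volatilities are annualised decimals (0.25 means 25%).
    The 0.02 default is a hypothetical value, not the system's setting.
    """
    return implied_vol - realised_vol(closes, window) > threshold


if __name__ == "__main__":
    # Toy price series oscillating between 100 and 101 (about 16% realised vol).
    closes = [100.0 if i % 2 == 0 else 101.0 for i in range(21)]
    print(round(realised_vol(closes), 3))
    print(vrp_gate(0.25, closes))  # implied vol well above realised
    print(vrp_gate(0.17, closes))  # gap too small to clear the threshold
```

The other filters in the table follow the same shape: a boolean condition on market state that decides whether the otherwise identical covered call is opened.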

The structure results deserve separate attention. Iron condors, strangles, and vertical put spreads failed uniformly across all tickers — not merely underperforming but actively destroying capital in both in-sample and out-of-sample periods. The worst single result in the entire dataset, ONDS/strangle at −39.95, is a multi-leg structure. This is counterintuitive: iron condors and spreads are frequently advocated as superior to covered calls because they offer defined risk and capital efficiency. In the Stack$Trader dataset, they do not outperform — they catastrophically underperform, even on the tickers where the simple covered call excels.

The most likely explanation is execution realism. Multi-leg structures incur double the bid-ask spread friction, require simultaneous fills across multiple legs, and create complex delta-management problems when the underlying moves against one leg. The synthetic option model used here, which constructs positions from yfinance data rather than live quotes, may not capture the full cost of entering and exiting four-leg structures. The negative results should be treated as a warning flag rather than a definitive verdict on multi-leg strategies as a category.

V The Bottom of the Table

The bottom five results are as instructive as the top five. They fall into two categories: assets with structural incompatibility with premium selling, and structures with execution flaws that no amount of signal optimisation can overcome.

| Rank | Ticker | Strategy | OOS Sharpe | IS Sharpe | Diagnosis |
|------|--------|----------|------------|-----------|-----------|
| 923 | ONDS | strangle | −39.95 | −7.07 | Low-float biotech + naked strangle = extreme gap risk |
| 922 | SLV | strangle | −10.30 | −5.50 | Even a top cc ticker fails with wrong structure |
| 921 | SLV | iron_condor | −4.01 | −7.03 | Defined risk but negative in both periods |
| 920 | TSM | iron_condor | −3.75 | −7.57 | Semiconductor volatility incompatible with condor |
| 919 | LRCX | strangle | −3.57 | −25.58 | LRCX IS Sharpe of −25.58 suggests structural data issue |

The ONDS/strangle result at −39.95 is the statistical equivalent of a car crash: it tells you something important happened, and the investigation is as informative as the result itself. ONDS is a small-cap biotech — a category with routine binary event risk from FDA approvals, clinical trial announcements, and financing events. Selling a naked strangle on such an asset creates a payoff profile that is short enormous gap risk on both sides for a premium that is structurally insufficient to compensate. The in-sample Sharpe of −7.07 confirms this is not an OOS anomaly; the strategy was losing money during training as well and should never have been deployed.

The SLV/strangle result at −10.30 makes the more interesting point. SLV ranks second overall at Sharpe 4.27 when a covered call strategy is applied, and third at 4.27 with a different entry variant. The same asset, on the same data period, produces a top-two result and a near-bottom result depending solely on the choice of structure. Asset selection and structure selection are not independent decisions — they interact multiplicatively. The worst thing a researcher can do is identify a strong underlying and then choose the wrong structure for it.

VI Parameter Convergence: The Hidden Message
Convergence

When every bucket of a grid search converges on identical optimal parameters, the grid search has found a structural optimum — not a local one fitted to the bucket's specific data. This is the strongest possible evidence that the parameters are real.
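
A convergence check of this kind can be sketched as follows: find the best-scoring parameter combination in each bucket independently, then test whether every bucket picked the same one. The bucket names, parameter tuples, and scores below are toy values invented for illustration; they are not results from the Phase 6 search.

```python
def best_params(grid_scores: dict) -> tuple:
    """Return the parameter tuple with the highest score within one bucket."""
    return max(grid_scores, key=grid_scores.get)


def converged(buckets: dict) -> bool:
    """True when every bucket's grid search selects the same parameter tuple."""
    optima = {best_params(scores) for scores in buckets.values()}
    return len(optima) == 1


# Hypothetical grid: keys are (delta_target, dte_minimum), values are Sharpe.
TOY_BUCKETS = {
    "XOM_2019_2021": {(0.15, 14): 1.1, (0.20, 14): 1.9, (0.30, 14): 1.4},
    "LMT_2019_2021": {(0.15, 14): 0.8, (0.20, 14): 1.5, (0.30, 14): 1.2},
    "GLD_2021_2023": {(0.15, 14): 0.9, (0.20, 14): 1.3, (0.30, 14): 1.0},
}

if __name__ == "__main__":
    for name, scores in TOY_BUCKETS.items():
        print(name, best_params(scores))
    print("converged:", converged(TOY_BUCKETS))
```

A stricter variant might also require that the winning combination sits away from the edge of the grid, since an optimum on the boundary may only indicate that the grid was too narrow.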

One of the most significant findings of the Phase 6 search is not in the Sharpe ratios but in the parameters. Across all tickers, all time buckets, and all strategy variants, the grid search converged on essentially identical optimal values:

Optimal parameter convergence: global grid search, N = 923 strategies, 77,760 folds

delta_target = 0.20 (OTM call strike selection)
DTE_minimum = 14 (minimum days to expiry at entry)
entry_threshold = −0.15 (supervisor score floor for entry)
close_fraction = 0.25 (close at 25% of max premium received)

All four parameters converge independently across every ticker bucket and time period tested. No bucket produces a meaningfully different optimum.

This convergence across independent subsets of the data is the statistical equivalent of replication. The optimal parameters are not the product of fitting to a particular market regime or ticker microstructure — they appear to describe an invariant of the strategy itself. Delta 0.20 balances premium income against assignment risk at the point where the expected value of the trade is maximised across a wide range of volatility environments. DTE 14 captures the steepest region of the theta decay curve while staying far enough from expiry to avoid gamma risk. A threshold of −0.15 (entry when the ensemble score is above this floor) provides meaningful filtering without excessive selectivity.
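
The four universal parameters can be collected into a single configuration object with simple entry and exit checks, sketched below. The class and function names are hypothetical. The close rule is one possible reading of "close at 25% of max premium received", namely buying the call back once its price has decayed to 25% of the premium collected; the source does not spell out the exact mechanics, so treat that interpretation as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversalParams:
    """Converged parameter values reported by the grid search."""
    delta_target: float = 0.20      # OTM call strike selection
    dte_minimum: int = 14           # minimum days to expiry at entry
    entry_threshold: float = -0.15  # supervisor score floor for entry
    close_fraction: float = 0.25    # close at 25% of max premium received


PARAMS = UniversalParams()


def should_enter(score: float, dte: int, p: UniversalParams = PARAMS) -> bool:
    """Enter when the ensemble score is above the floor and expiry is far enough."""
    return score > p.entry_threshold and dte >= p.dte_minimum


def should_close(premium_received: float, option_price: float,
                 p: UniversalParams = PARAMS) -> bool:
    """Assumed reading: close once the call is worth <= 25% of premium collected."""
    return option_price <= p.close_fraction * premium_received


if __name__ == "__main__":
    print(should_enter(score=-0.10, dte=21))  # above floor, enough time left
    print(should_enter(score=-0.20, dte=21))  # score below the floor
    print(should_close(premium_received=2.00, option_price=0.50))
```

Because the dataclass is frozen, the same parameter set can be shared across tickers without risk of one backtest silently mutating it.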

The implication is practical and important: the system does not require per-ticker parameter optimisation. A researcher who discovers a new candidate ticker can apply the universal parameters as a starting point with high confidence, reserving the grid search for validation rather than exploration. The variation between tickers is not in how the strategy is parameterised — it is in whether the underlying asset is structurally suited to the strategy at all.

VII What the Leaderboard Actually Proves

The WFE of 3.886, the portfolio-level ratio of OOS to IS Sharpe, is the most important single number in the entire dataset. It is commonly assumed in the backtesting literature that sophisticated strategies will overfit: that as the number of parameters and tested variants increases, the gap between in-sample and out-of-sample performance will widen. In the Stack$Trader leaderboard, the opposite occurs. Adding complexity (more agents, more entry variants, more tickers) consistently improved out-of-sample performance relative to in-sample performance rather than degrading it.

This is explicable only if the edge being captured is structural rather than statistical. The permutation test result, reported in the companion article "Selling Insurance at Scale," supports this directly: theta-normalised permutation p = 1.000, meaning that the strategy's actual entry timing performed no better than randomly permuted timing. On that reading, the returns are attributable to structural theta decay and the volatility risk premium, with no detectable contribution from entry timing. A structural edge does not overfit because it does not depend on fitting. The calendar does not care how many parameter combinations were tested; time passes, options decay, and the VRP pays its premium whether the researcher tested 923 configurations or nine.

The leaderboard is, in the end, an asset selection tool as much as a strategy validation tool. It establishes which tickers carry the structural properties that allow a systematic covered call strategy to function, and which do not. The list of winners (defence, energy, metals, income) is not a commentary on those companies' business prospects. It is a commentary on their option market dynamics: moderate and persistent volatility risk premia, low directional momentum (so the short call rarely caps a large rally), and stable implied-to-realised volatility ratios that make premium income predictable across market regimes. These are the properties that the leaderboard selects for, and they turn out to concentrate in exactly the sectors that sophisticated options traders have identified empirically over decades of practice. The system, in nine hundred and twenty-three trials, arrived at the same answer.

· · ·