There is a particular kind of engineering error that punishes optimism. Not the careless mistake — the forgotten semicolon, the off-by-one index — but the well-reasoned, carefully designed decision that turns out to be catastrophically wrong about one critical assumption. The Databento incident was that kind of error.
The project was less than a week old. The goal was straightforward: download a comprehensive historical archive of U.S. options data, build a backtester, and begin evaluating systematic premium-selling strategies. The data vendor was Databento, a professional-grade market data provider whose OPRA feed — Options Price Reporting Authority — carries every options quote from every registered U.S. exchange, at every strike, at every expiration, for every underlying, updated in real time and archived historically at tick resolution.
The initialization parameters were set with the confidence of someone who had spent two decades specifying process parameters in an Intel cleanroom: 2020–2024 options chains, forty-five tickers, one-minute resolution. Comprehensive. Defensible. Exactly what a rigorous backtesting effort would need.
The resulting download order was thirty-five terabytes. The pending bill: $99,000.
"We've cancelled all jobs on this account."
a Databento support engineer, via support ticket

A support ticket was opened immediately. Databento's support engineering team cancelled the pending download jobs within hours. The five-figure charge was reversed. No money changed hands.
But the experience had already done something more consequential than generate an invoice. It had exposed a single-point-of-failure dependency at the foundation of the entire project: a trading system whose research pipeline requires access to expensive proprietary data is a system that can be held hostage, accidentally bankrupted, or simply priced out of operation whenever a vendor changes its pricing model. For a private researcher working without institutional backing, that dependency is existential.
broker/synthetic_options.py written
Black-Scholes replication engine generates synthetic options chains from free yfinance OHLCV data. No external API required.
broker/data_providers.py protocol established
DataProvider abstraction decouples the engine from any specific data source, permanently.
The solution was architectural rather than tactical. Instead of finding a cheaper data vendor or negotiating a research tier with Databento, the system was redesigned from scratch around data independence. The result was broker/synthetic_options.py, a Black-Scholes replication engine that generates historically accurate options chains from free, publicly available daily price data.
The logic is more elegant than it might sound. Black-Scholes option pricing requires five inputs: underlying price, strike price, time to expiration, risk-free rate, and implied volatility. The first four are either directly observed or trivially constructed. For implied volatility, the synthetic engine uses a thirty-day rolling realized volatility calculated from the free price history — not the true implied volatility, but a statistically valid proxy that produces theoretical prices consistent with actual historical option premiums for backtesting purposes. Each ticker generates roughly fifty-four thousand synthetic contracts per run, covering the full expiration calendar across a five-year historical window.
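The volatility proxy described above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation; the function name `realized_vol_proxy` and its signature are hypothetical, and only the standard annualized rolling standard deviation of log returns is shown.

```python
import numpy as np
import pandas as pd

def realized_vol_proxy(closes: pd.Series, window: int = 30) -> pd.Series:
    """Annualized rolling realized volatility from daily closing prices.

    Hypothetical stand-in for the synthetic engine's implied-volatility
    proxy: a 30-day rolling standard deviation of log returns, scaled
    by sqrt(252) to annualize.
    """
    log_returns = np.log(closes / closes.shift(1))
    return log_returns.rolling(window).std() * np.sqrt(252)

# Toy price series: 60 days drifting from 100 to 110.
closes = pd.Series(np.linspace(100.0, 110.0, 60))
vol = realized_vol_proxy(closes)
```

The first `window` values are NaN by construction, which mirrors the practical constraint that the proxy needs a month of free price history before it can price the first synthetic contract.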
For each ticker and each trading day, generate_synthetic_chain() iterates over the full range of realistic strikes and expirations, computing bs_call() and bs_put() prices, delta, gamma, theta, and vega for each contract. The output is a Parquet file containing ~54,000 rows per ticker — a complete synthetic options market available for any backtesting scenario, generated in minutes from data anyone can access for free.
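The pricing loop at the core of that process can be sketched as follows. The names `bs_call()`, `bs_put()`, and `generate_synthetic_chain()` come from the text, but everything else here is an assumption: the signatures are hypothetical, greeks and the Parquet output are omitted, and the chain is reduced to a single day with a handful of strikes and expirations.

```python
from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)

def bs_put(S, K, T, r, sigma):
    """Black-Scholes European put price via put-call parity."""
    return bs_call(S, K, T, r, sigma) - S + K * exp(-r * T)

def generate_synthetic_chain(S, r, sigma, strikes, expirations_yrs):
    """Hypothetical sketch: one row per (strike, expiry) for a single
    trading day. The real engine would add greeks and write Parquet."""
    rows = []
    for T in expirations_yrs:
        for K in strikes:
            rows.append({
                "strike": K, "T": T,
                "call": bs_call(S, K, T, r, sigma),
                "put": bs_put(S, K, T, r, sigma),
            })
    return rows

chain = generate_synthetic_chain(
    S=100.0, r=0.05, sigma=0.25,
    strikes=[90.0, 100.0, 110.0],
    expirations_yrs=[30 / 365, 60 / 365],
)
```

Iterating the same loop over every trading day in a five-year window, across a realistic strike and expiration grid, is what takes the row count per ticker into the tens of thousands.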
The DataProvider protocol, built alongside the synthetic engine, enforces a strict interface between the trading engine and whatever data source is currently active. Three implementations exist: YFinanceProvider for free historical OHLCV, VaultExecutionModel for empirical pricing from the live tick archive, and SchwabOptionProvider for live broker connectivity. The engine queries whichever provider is configured and receives standardized data structures regardless of source. The abstraction layer means that swapping from synthetic backtesting to live trading requires changing a configuration parameter, not rewriting any logic in the core engine.
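The decoupling idea can be illustrated with a structural-typing sketch. Only the name `DataProvider` appears in the source; the `Quote` dataclass, the `get_quote` method, the `YFinanceProviderStub`, and `run_engine` are all invented here for illustration and do not reflect the project's actual interface.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Quote:
    """Illustrative standardized structure (hypothetical fields)."""
    symbol: str
    bid: float
    ask: float

class DataProvider(Protocol):
    """The engine depends only on this interface, never on a
    concrete vendor class."""
    def get_quote(self, symbol: str) -> Quote: ...

class YFinanceProviderStub:
    """Stand-in for a concrete provider; returns canned data."""
    def get_quote(self, symbol: str) -> Quote:
        return Quote(symbol=symbol, bid=99.9, ask=100.1)

def run_engine(provider: DataProvider, symbol: str) -> float:
    # Core logic sees standardized structures regardless of source,
    # so swapping providers is configuration, not a rewrite.
    q = provider.get_quote(symbol)
    return (q.bid + q.ask) / 2

mid = run_engine(YFinanceProviderStub(), "SPY")
```

Because `Protocol` uses structural typing, any class exposing the right methods satisfies the interface without inheriting from it, which is what makes synthetic, archival, and live-broker providers interchangeable behind one configuration switch.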
The architectural irony is not subtle. The system was nearly bankrupted by a single data call in its first week, and the direct consequence of that near-failure is that it now generates its own data, owns it completely, and owes nothing to any vendor for its ability to backtest any strategy on any ticker at any historical depth.
- A synthetic options engine generating 54,000 contracts per ticker from free public data — the primary backtesting infrastructure for all seven development phases.
- A DataProvider protocol that permanently decouples the trading engine from any specific data vendor, enabling a clean transition to live broker data when Phase 7 ships.
- A proprietary tick-level data archive accumulated daily by standalone_recorder.py, which grows in value the longer the system runs: a compounding data asset with zero ongoing acquisition cost.
- An architectural instinct, never trust a single data dependency, that shaped every subsequent infrastructure decision in the project.
There is one additional thing the incident produced that does not appear in any configuration file or module: an acute awareness of the difference between a research project and a production system. A research project can afford to be surprised by a $99,000 bill. A production system that manages real capital cannot. The rigor applied to data architecture in Phase 1 — the abstraction layers, the fallback mechanisms, the independence from vendor lock-in — was the first expression of an engineering discipline that would later manifest in kill switches, unit test suites, position-size hard limits, and paper trading ledgers. The Databento incident did not just produce good code. It calibrated the standard of care that the rest of the system would need to meet.
A Databento support engineer reversed the charges. The project got to continue. The lesson cost nothing in the end, except a few hours of elevated concern and a permanent change in how the system thinks about infrastructure dependencies.
That seems like a reasonable price.