The Honest Backtest — TradeGladiator

Why this document exists

The standard playbook in retail-trading-signal marketing is to show one chart with a perfect entry, attach a 90% win rate that quietly omits sample size, and point you at a checkout. We've watched enough of those companies disappear to know it works once and corrodes everything after.

The alternative is harder and slower: run the same strategy through realistic costs, count every trade, publish the failures by name, and only deploy what survives. This is that document, for our first audit cycle.

6 tested. 1 deployed. 5 retired with names and reasons. That ratio is the contract. Most strategies don't survive realistic costs, and that's the entire point of running the audit.

Methodology

Every strategy was run through three passes under two cost regimes. v1 applies no costs and only confirms the code compiles and fires. v2 introduces realistic commission, slippage, and small sizing. v3 keeps the v2 costs and layers in bug fixes where v2's audit revealed actual code defects.

The promote gate

PF ≥ 1.2 across ≥ 30 trades after realistic costs · drawdown ≤ 5% of equity. A strategy that misses any one of these conditions is not deployed, even if v1 looked promising.
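As a rough illustration, the gate can be written as a single predicate. This is a minimal sketch, not our production code: the function name, parameter names, and the convention of passing drawdown as a percent of equity are assumptions made for the example.

```python
def passes_promote_gate(profit_factor, trade_count, drawdown_pct,
                        min_pf=1.2, min_trades=30, max_drawdown_pct=5.0):
    """Return True only if all three gate conditions hold at once.

    drawdown_pct is the maximum drawdown as a percent of equity
    (an assumed convention for this sketch).
    """
    return (profit_factor >= min_pf
            and trade_count >= min_trades
            and drawdown_pct <= max_drawdown_pct)


# Hypothetical inputs: a high PF on a tiny sample is still rejected.
print(passes_promote_gate(7.12, 9, 1.0))    # fails on trade count
print(passes_promote_gate(1.25, 40, 3.0))   # clears all three conditions
```

The conditions are joined with `and` on purpose: a strong profit factor cannot buy back a small sample or a deep drawdown.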

Cost regimes

v1: 10% sizing, 0% commission, 0 ticks slippage, default targets.
v2/v3: 2% sizing, 0.05% commission, 2 ticks slippage, custom R:R.
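To make the difference between the two regimes concrete, here is a minimal sketch of how they could be applied to one long round trip. Several details are assumptions made for illustration only: commission is charged as a percent of notional on each side, slippage is taken adversely on both entry and exit, the tick size is $0.01, and the prices and the $100,000 equity are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRegime:
    sizing_pct: float       # percent of equity committed per trade
    commission_pct: float   # percent of notional, per side (assumption)
    slippage_ticks: int     # adverse ticks per side (assumption)


V1 = CostRegime(sizing_pct=10.0, commission_pct=0.0, slippage_ticks=0)
V2 = CostRegime(sizing_pct=2.0, commission_pct=0.05, slippage_ticks=2)


def net_pnl_long(entry, exit_, equity, regime, tick_size=0.01):
    """Net P&L of one long round trip under the given cost regime."""
    notional = equity * regime.sizing_pct / 100.0
    shares = notional / entry
    slip = regime.slippage_ticks * tick_size
    fill_in = entry + slip     # buy fills higher than the signal price
    fill_out = exit_ - slip    # sell fills lower than the signal price
    gross = (fill_out - fill_in) * shares
    commission = (fill_in + fill_out) * shares * regime.commission_pct / 100.0
    return gross - commission


# Same hypothetical 1-point move, two regimes.
print(net_pnl_long(500.0, 501.0, 100_000, V1))
print(net_pnl_long(500.0, 501.0, 100_000, V2))
```

With these hypothetical inputs the move nets $20.00 under v1 and about $1.84 under v2. Part of that gap is simply the smaller position, but the return per dollar committed also drops from 0.20% to roughly 0.09%, which is the part that decides whether a thin edge survives.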

Results — six strategies, three passes

PF = profit factor (gross profit / gross loss). N = trade count. The first row, LIQUIDITY_SWEEP_REVERSAL, is the only strategy that cleared the promote gate, and it did so at v2.

Strategy	TF	v1 PF / N	v2 PF / N	v3 PF / N	Decision
LIQUIDITY_SWEEP_REVERSAL	15m	7.12 / 9	1.21 / 43	1.07 / 57	Live · v2
TREND_PULLBACK_EMA	15m	0.93 / 521	0.73 / 261	0.76 / 243	Retired
SMC_FVG_CONTINUATION	15m	1.07 / 1237	0.65 / 779	0.59 / 944	Retired
SMC_BREAKER_RETEST	15m	33.49 / 3	0.59 / 9	0.18 / 6	Retired
OPENING_RANGE_BREAKOUT	5m	0.82 / 101	0.20 / 9	0.31 / 39	Retired
MEAN_REVERSION_VWAP	15m	0.02 / 20	0.68 / 3	0.15 / 567	Retired
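The profit factor definition and the PF and N parts of the gate can be checked mechanically against the table. The sketch below is illustrative: `profit_factor`, `RESULTS`, and the threshold names are hypothetical, the figures are copied from the table above, and drawdown is left out because the table does not report it.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss over a list of per-trade P&Ls."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


# strategy -> pass -> (profit factor, trade count), copied from the table
RESULTS = {
    "LIQUIDITY_SWEEP_REVERSAL": {"v1": (7.12, 9), "v2": (1.21, 43), "v3": (1.07, 57)},
    "TREND_PULLBACK_EMA": {"v1": (0.93, 521), "v2": (0.73, 261), "v3": (0.76, 243)},
    "SMC_FVG_CONTINUATION": {"v1": (1.07, 1237), "v2": (0.65, 779), "v3": (0.59, 944)},
    "SMC_BREAKER_RETEST": {"v1": (33.49, 3), "v2": (0.59, 9), "v3": (0.18, 6)},
    "OPENING_RANGE_BREAKOUT": {"v1": (0.82, 101), "v2": (0.20, 9), "v3": (0.31, 39)},
    "MEAN_REVERSION_VWAP": {"v1": (0.02, 20), "v2": (0.68, 3), "v3": (0.15, 567)},
}

MIN_PF = 1.2
MIN_TRADES = 30

# Every (strategy, pass) pair that meets both the PF and the sample-size bar.
survivors = [
    (name, run)
    for name, runs in RESULTS.items()
    for run, (pf, n) in runs.items()
    if pf >= MIN_PF and n >= MIN_TRADES
]
print(survivors)  # [('LIQUIDITY_SWEEP_REVERSAL', 'v2')]
```

Note that the two largest v1 profit factors (33.49 and 7.12) are filtered out by trade count alone, before costs enter the picture.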

Promoted · Live in production

LIQUIDITY_SWEEP_REVERSAL v2

Profit factor 1.21 across 43 trades on SPY 15m · Drawdown $1,720 against the 2% sizing equity curve · Win rate 35% × R:R 2.5. v3 attempted to improve via tighter stop and adjusted R:R; PF dropped to 1.07. The v2 settings remain the production version.
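For readers who want to sanity-check how a 35% win rate can be profitable, the relationship between win rate, R:R, and profit factor can be sketched as follows. This is an idealized model, not our backtest engine: it assumes every winner earns the full R multiple, every loser loses exactly 1× risk, and costs are zero. The function names are hypothetical.

```python
def expected_profit_factor(win_rate, reward_to_risk):
    """Cost-free PF if winners earn R x risk and losers lose 1 x risk."""
    return (win_rate * reward_to_risk) / (1.0 - win_rate)


def breakeven_win_rate(reward_to_risk):
    """Win rate at which the idealized profit factor equals exactly 1."""
    return 1.0 / (1.0 + reward_to_risk)


print(round(expected_profit_factor(0.35, 2.5), 2))  # 1.35
print(round(breakeven_win_rate(2.5), 3))            # 0.286
```

Under those assumptions, 35% at 2.5R gives roughly 1.35 before costs, against a break-even win rate of about 28.6%. The reported 1.21 sits below the idealized figure, which is what you would expect once commission, slippage, and trades that exit short of the full target are included, though the numbers here do not break that gap down.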

Production scope: SPY 15m only · 2% sizing · 0.05% commission baked in. Cross-symbol generalization (NQ, ES, individual momentum stocks) is Phase 5 work.

The bug fixes that didn't save anything

A v2 audit pass surfaced three real defects in the underlying code. We fixed all three. v3 measures whether those fixes turned losing strategies into winners.

Trend_PullbackEMA — pullbackTouchedBull and pullbackTouchedBear evaluated identically despite their directional names. Fixed with a proper directional check.
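The kind of directional check involved can be sketched like this. The actual strategy code is not reproduced here, so the conditions below are one common way to define an EMA pullback and should be read as an assumption; only the two flag names come from the strategy.

```python
def pullback_touched_bull(low, close, ema):
    """Bullish pullback: the bar dips to or below the EMA but closes above it."""
    return low <= ema and close > ema


def pullback_touched_bear(high, close, ema):
    """Bearish pullback: the bar rallies to or above the EMA but closes below it."""
    return high >= ema and close < ema


# Hypothetical bar that dips to the EMA and closes back above it.
print(pullback_touched_bull(low=99.0, close=101.0, ema=100.0))   # True
print(pullback_touched_bear(high=101.5, close=101.0, ema=100.0)) # False
```

Because one check requires a close above the EMA and the other a close below it, the two flags can never both be true on the same bar, which is the property the buggy version lacked.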

SMC_BreakerRetest — OB retry inside the chochLookback window read the current bar's structure, not the structure that existed at CHoCH time. Fixed by snapshotting the structure at trigger time.
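Snapshotting at trigger time can be illustrated with a small state holder. Everything here is a hypothetical sketch: the `Structure` fields, the `BreakerState` class, and the way the lookback window is counted in bars are assumptions, not the strategy's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    ob_high: float   # order-block high (hypothetical field)
    ob_low: float    # order-block low (hypothetical field)


class BreakerState:
    def __init__(self, lookback):
        self.lookback = lookback   # bars the retest stays valid for
        self.snapshot = None
        self.trigger_bar = None

    def on_choch(self, bar_index, structure):
        """Freeze the structure as it stood when the CHoCH fired."""
        self.snapshot = structure
        self.trigger_bar = bar_index

    def retest_zone(self, bar_index):
        """Return the frozen structure while inside the window, else None."""
        if self.snapshot is None:
            return None
        if bar_index - self.trigger_bar > self.lookback:
            return None
        return self.snapshot


state = BreakerState(lookback=5)
state.on_choch(10, Structure(ob_high=101.0, ob_low=100.0))
print(state.retest_zone(12))  # the structure captured at bar 10
print(state.retest_zone(16))  # None: outside the lookback window
```

The frozen dataclass matters: later bars can build new structure objects, but they cannot mutate the one captured at trigger time.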

OpeningRange_Breakout — the breakout level check fired on every bar that held above the ORB high, with no "entered this break" guard. Fixed with explicit entry flags reset on isNewDay.
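The entry guard described above can be sketched as a flag that is set on the first qualifying bar and cleared when the date changes. This is an illustration under stated assumptions: bars are `(timestamp, close)` tuples, the opening-range high per day is precomputed, only the long side is shown, and the function name is hypothetical.

```python
from datetime import datetime


def orb_long_entries(bars, orb_high_by_day):
    """Return timestamps where a long ORB entry fires, at most once per day.

    bars: (timestamp, close) tuples in chronological order.
    orb_high_by_day: dict mapping date -> opening-range high for that day.
    """
    entries = []
    current_day = None
    entered_long = False
    for ts, close in bars:
        if ts.date() != current_day:   # new session: reset the guard
            current_day = ts.date()
            entered_long = False
        if not entered_long and close > orb_high_by_day[current_day]:
            entries.append(ts)
            entered_long = True        # ignore later bars holding above the level
    return entries


# Hypothetical data: three bars above the level on day one, one break on day two.
bars = [
    (datetime(2024, 1, 2, 9, 45), 101.0),
    (datetime(2024, 1, 2, 9, 50), 102.0),
    (datetime(2024, 1, 2, 9, 55), 103.0),
    (datetime(2024, 1, 3, 9, 45), 99.0),
    (datetime(2024, 1, 3, 9, 50), 106.0),
]
levels = {datetime(2024, 1, 2).date(): 100.0, datetime(2024, 1, 3).date(): 105.0}
print(orb_long_entries(bars, levels))
```

Without the flag, all three day-one bars would have fired an entry; with it, the sample data produces one entry per day.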

None of the fixes flipped a losing strategy into a winning one. Bug fixes can stop a strategy from over-firing — saving costs — but they can't manufacture an edge that isn't there.

Five lessons we're keeping

Realistic-cost backtests are a binary edge filter.

v1 surfaced what looked like three winners. v2 — the same code with realistic commission, slippage, and 2% sizing — left exactly one. If a strategy's profitability lives or dies at the cost layer, it has no edge to begin with.

Bug fixes don't manufacture edge that isn't there.

The v2 audit found real defects in three strategies. We fixed all three in v3. None of the fixes turned a losing strategy into a winning one.

Don't over-optimize a working strategy.

The v3 pass of LIQUIDITY_SWEEP_REVERSAL tightened the stop and adjusted R:R, hoping to improve on v2's PF of 1.21. Result: 1.07. We reverted. If a strategy is producing edge with a healthy sample size, the next move is out-of-sample testing, not parameter tuning.

Sample size of ≥30 trades is the gate.

v1 saw "PF 33.49" (three trades) and "PF 7.12" (nine trades) and called both winners. Under v2 costs the first fell to 0.59 on nine trades and the second to 1.21 on 43. We hold strategies to ≥30 realistic-cost trades before assigning meaning to the profit factor.

Strategy class determines R:R.

Mean-reversion strategies that target VWAP-mid (~1× risk) fare better than the same strategy class targeting 2× risk. Don't treat R:R as a universal knob; derive it from the empirical winner distribution of the strategy class.

Want the next audit when it ships?

Phase 5 expands the universe (NQ / ES / momentum stocks) and adds walk-forward out-of-sample validation. Drop your email and we'll send the next issue when it's published.

No spam. One audit per quarter. Unsubscribe in one click.