Does the scoring actually work?

The tool used to score uptrend-continuation setups highest. Backtesting showed that this underperformed: BUY/STRONG BUY signals lost to stocks the tool labeled AVOID, badly enough that the highest-confidence tier averaged a −0.90% return. We flipped the one variable responsible (trend direction preference, nothing else) and re-validated on a held-out test period the new formula was never tuned against. The result was a consistent +0.22% to +0.24% edge of BUY/STRONG BUY over AVOID, in both halves of a 7-year, 285-ticker, 294k-sample run.

We then tested whether all six original indicators were earning their weight. Volume and Stochastic weren't, so the live formula now uses just trend, RSI, MACD, and Bollinger Bands, which matched or slightly beat the original on the same held-out split: +0.192% to +0.250%.

The tables below show the live scoring's current numbers. We're leaving this page up because a tool that hides its own backtest, including the part where it was wrong, isn't one you should trust.

Samples analyzed: 13,920

Baseline avg. 20-day return (any day, any stock): +0.27%

BUY/STRONG BUY avg. return (4,226 signals): +0.35%

Edge over baseline: +0.07%

By recommendation tier

Tier	Samples	Win rate*	Avg return (20d)
STRONG BUY	2,230	41.0%	+0.43%
BUY	1,996	40.5%	+0.25%
HOLD	2,147	42.6%	+0.53%
AVOID	7,547	34.8%	+0.16%
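
The summary figures can be cross-checked against the tier table above by taking sample-weighted averages. Here is a minimal TypeScript sketch; `TierRow` and `weightedAvgReturn` are names made up for this illustration and are not taken from the tool's source. Because the table's returns are already rounded to two decimals, the recomputed values land close to the published ones rather than exactly on them.

```typescript
// Rows copied from the "By recommendation tier" table (returns in percent).
interface TierRow {
  tier: string;
  samples: number;
  avgReturnPct: number;
}

const tiers: TierRow[] = [
  { tier: "STRONG BUY", samples: 2230, avgReturnPct: 0.43 },
  { tier: "BUY", samples: 1996, avgReturnPct: 0.25 },
  { tier: "HOLD", samples: 2147, avgReturnPct: 0.53 },
  { tier: "AVOID", samples: 7547, avgReturnPct: 0.16 },
];

// Sample-weighted mean of the per-tier average returns.
function weightedAvgReturn(rows: TierRow[]): number {
  const totalSamples = rows.reduce((sum, r) => sum + r.samples, 0);
  const weighted = rows.reduce((sum, r) => sum + r.samples * r.avgReturnPct, 0);
  return weighted / totalSamples;
}

// Baseline: every sample, regardless of tier.
const baseline = weightedAvgReturn(tiers);

// BUY-side: only the BUY and STRONG BUY tiers.
const buySide = weightedAvgReturn(
  tiers.filter((r) => r.tier === "BUY" || r.tier === "STRONG BUY"),
);

// Edge over baseline, in percentage points.
const edge = buySide - baseline;
```

With the rounded inputs this gives a baseline of about 0.273, a BUY/STRONG BUY average of about 0.345, and an edge of about 0.072, in line with the +0.27%, +0.35%, and +0.07% reported above.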

By swing score (is a higher score actually better?)

Score range	Samples	Win rate*	Avg return (20d)
0-39	7,584	34.8%	+0.16%
40-59	2,110	42.7%	+0.55%
60-74	1,996	40.5%	+0.25%
75-89	2,106	41.2%	+0.42%
90-100	124	38.4%	+0.67%
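
The starred win-rate columns and the average-return columns use different denominators: win rate skips samples that hit neither the target nor the stop within 20 trading days, while average return includes every sample. The sketch below shows one way that could be computed. The `Sample` shape, the `summarize` helper, and the demo values are all hypothetical and exist only to illustrate the definition; they are not the tool's real data model or results.

```typescript
// Hypothetical per-sample record: how the 20-day window resolved, and its return.
type Outcome = "target" | "stop" | "neither";

interface Sample {
  outcome: Outcome;
  returnPct: number; // 20-day return in percent
}

function summarize(samples: Sample[]) {
  // Win rate: only samples that clearly hit the target or the stop.
  const resolved = samples.filter((s) => s.outcome !== "neither");
  const wins = resolved.filter((s) => s.outcome === "target").length;
  const winRate = resolved.length > 0 ? wins / resolved.length : NaN;

  // Average return: every sample, including "neither".
  const avgReturnPct =
    samples.length > 0
      ? samples.reduce((sum, s) => sum + s.returnPct, 0) / samples.length
      : NaN;

  return { winRate, avgReturnPct, resolved: resolved.length, total: samples.length };
}

// Made-up demo values, chosen only to show the two denominators diverging.
const demo: Sample[] = [
  { outcome: "target", returnPct: 4.0 },
  { outcome: "stop", returnPct: -2.0 },
  { outcome: "stop", returnPct: -2.0 },
  { outcome: "neither", returnPct: 1.0 },
  { outcome: "neither", returnPct: 0.5 },
];

const result = summarize(demo);
```

In this toy input the win rate is 1 of 3 resolved samples while the average return over all 5 samples is still positive, which is why a tier can show a win rate below 50% alongside a positive average return.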

How to read this, and where it can mislead you:

* Win rate only counts samples where price clearly hit the target or the stop within 20 trading days. Samples where neither happened ("neither") are excluded from win rate but still count toward average return.
This tests the current S&P 500 constituent list against its own recent history. It doesn't include stocks that were dropped from the index, which can make the past look better than it actually was (survivorship bias).
Two years is one market regime. A model that looks good here can still fail in a different one (e.g. a sustained downturn). The validation numbers above used a 7-year window split into train/test halves specifically to check this, but even 7 years is not every regime.
No fees, slippage, or capital constraints are modeled. Samples overlap by design (this measures statistical association, not a portfolio you could have actually run).
The current scoring went through two rounds of held-out testing: first flipping trend direction (one specific, principled hypothesis, not a parameter search), then checking whether all six original indicators were earning their weight, which is how Volume and Stochastic got dropped (see src/lib/scoring-variants.ts). That's safer than fitting parameters directly to a backtest, but it still tests only a small number of hypotheses out of many possible ones.
Want the larger run yourself? /api/backtest?universe=all&years=5 runs the live scoring across our full ~285-ticker universe. /api/research-backtest?scorer=trend-rsi-only&universe=all&years=7 re-runs the train/test validation against a 2-indicator version we tested but didn't adopt (it was less consistent from train to test than the 4-indicator version in use). Both take about a minute and aren't cached on the first call with these parameters.
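
If you'd rather script those two requests than paste them into a browser, the URLs can be assembled as below. This is only a sketch: `buildUrl` and its `base` argument are hypothetical helpers, the paths and query parameters are the ones quoted above, and nothing here assumes a particular response format.

```typescript
type Params = Record<string, string | number>;

// Join a path and query parameters into a URL string.
// `base` is a placeholder for whatever origin serves this page.
function buildUrl(path: string, params: Params, base = ""): string {
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(String(params[k]))}`)
    .join("&");
  return `${base}${path}?${query}`;
}

// Live scoring across the full universe, 5 years.
const liveRun = buildUrl("/api/backtest", { universe: "all", years: 5 });

// Train/test validation of the 2-indicator variant, 7 years.
const researchRun = buildUrl("/api/research-backtest", {
  scorer: "trend-rsi-only",
  universe: "all",
  years: 7,
});
```

Since an uncached call takes about a minute, any HTTP client you pass these URLs to should be given a generous timeout.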

Last computed: 6/20/2026, 1:09:24 PM · recomputed roughly once a day.