Methodology

Data source

All backtests referenced in public CRYPTYX documentation are generated by the composite metric slicer at /api/metrics/slicer/composite. The slicer takes a set of two to four metric conditions (z-score thresholds across any of the eight factor classes) and returns, per asset, the set of trigger days on which every condition held simultaneously, together with forward log returns at eight holding horizons: 1d, 7d, 14d, 30d, 60d, 90d, 180d, and 365d.
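As a rough illustration of what the slicer's output represents, the sketch below finds the days on which every condition holds and computes forward log returns for them. It is not the slicer's implementation: the function names, the use of daily closes, and the representation of each z-score condition as a precomputed list of booleans are all assumptions made for the example.

```python
import math

# Holding horizons in days, as listed in the text.
HORIZONS = [1, 7, 14, 30, 60, 90, 180, 365]

def trigger_days(conditions):
    """Return the day indices on which every condition series is True.

    `conditions` is a list of equal-length boolean lists, one per metric
    condition (hypothetical representation of a z-score threshold test).
    """
    n = len(conditions[0])
    return [t for t in range(n) if all(c[t] for c in conditions)]

def forward_log_returns(closes, triggers, horizon):
    """Return ln(P[t + horizon] / P[t]) for each trigger day t.

    Triggers without a full forward window inside `closes` are skipped.
    """
    return [
        math.log(closes[t + horizon] / closes[t])
        for t in triggers
        if t + horizon < len(closes)
    ]
```

Calling `forward_log_returns` once per entry in `HORIZONS` gives one return distribution per holding horizon for a single asset.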

No filtering, adjustment, or editorial smoothing is applied on the publication side. The numbers shown in integration scorecards are the numbers the slicer returns.

Sharpe, hit rate, drawdown

Sharpe is computed from the distribution of forward log returns at the stated horizon: the mean divided by the standard deviation, annualised by the square root of the number of trading days in a year. No risk-free rate is subtracted, because holding periods are short relative to cash alternatives in the asset class.
Hit rate is the fraction of episodes with positive forward return at the stated horizon. For short-side strategies, short win rate is reported instead — the fraction of episodes with negative forward return.
DD P95 is the 95th percentile of intra-episode drawdown across the episode set. Each episode's drawdown is the worst peak-to-trough fall between trigger day and horizon end, measured against daily highs and lows rather than closes. This is the single statistic we use for position sizing — it answers “across historical triggers, what was the drawdown that only 5% of episodes exceeded?”
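The three definitions above can be sketched as follows. Several details are assumptions rather than documented behaviour: 365 trading days per year, the sample (n-1) standard deviation, a nearest-rank percentile, and the conservative convention that a day's low may come after that day's high. All function names are hypothetical.

```python
import math
import statistics

def sharpe(returns, days_per_year=365):
    """Mean over sample standard deviation, scaled by sqrt(days_per_year).

    days_per_year=365 is an assumption; the text says only "trading-day count".
    """
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(days_per_year)

def hit_rate(returns):
    """Fraction of episodes with a positive forward return."""
    return sum(r > 0 for r in returns) / len(returns)

def short_win_rate(returns):
    """Fraction of episodes with a negative forward return."""
    return sum(r < 0 for r in returns) / len(returns)

def episode_drawdown(highs, lows):
    """Worst peak-to-trough fall over one episode, as a positive fraction.

    `highs` and `lows` are daily values from trigger day to horizon end.
    The running peak uses highs and the trough uses lows.
    """
    peak = highs[0]
    worst = 0.0
    for high, low in zip(highs, lows):
        peak = max(peak, high)
        worst = max(worst, (peak - low) / peak)
    return worst

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def dd_p95(episode_drawdowns):
    """Drawdown that only about 5% of episodes exceeded."""
    return percentile_nearest_rank(episode_drawdowns, 95)
```

With 20 episodes, `dd_p95` returns the 19th-largest drawdown, so exactly one episode (5%) exceeded it.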

Horizon selection

A single strategy produces eight rows of statistics — one per holding horizon. Scorecards quote one row per strategy, selected by best risk-adjusted return (highest Sharpe for long strategies, most-negative Sharpe for short strategies), not by highest hit rate.
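A minimal sketch of this selection rule, assuming each of the eight rows is a dictionary with hypothetical `horizon` and `sharpe` keys:

```python
def select_row(rows, side):
    """Pick the quoted row: highest Sharpe for long, most negative for short.

    `rows` is a list of dicts with 'horizon' and 'sharpe' keys (assumed shape).
    Hit rate is deliberately not consulted.
    """
    if side == "long":
        return max(rows, key=lambda row: row["sharpe"])
    if side == "short":
        return min(rows, key=lambda row: row["sharpe"])
    raise ValueError("side must be 'long' or 'short'")
```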

We deliberately avoid selecting by hit rate. At longer horizons (60d and above), consecutive episodes overlap in calendar time, so their forward returns are correlated rather than independent. A high hit rate across overlapping episodes often reflects sample overlap rather than predictive edge; Sharpe — because it penalises return volatility — is less vulnerable to this effect but not immune to it. Neither statistic is a clean proxy for out-of-sample performance.

Episode independence

The slicer returns the raw trigger count at each horizon. Episodes are not adjusted for calendar overlap. A five-year window contains approximately sixty non-overlapping 30d windows, so for a 30d holding period with 52 episodes the reported count can be close to the independent sample size. The same five-year window contains only about ten non-overlapping 180d windows, so for a 180d holding period with the same 52 episodes the effective sample size is far smaller than the reported count.
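The window arithmetic can be checked with a short sketch. The function names are hypothetical, and the result is only an upper bound: triggers that cluster in time reduce the independent sample further.

```python
def non_overlapping_windows(window_days, horizon_days):
    """Number of back-to-back holding periods that fit in the sample window."""
    return window_days // horizon_days

def independent_sample_bound(episodes, window_days, horizon_days):
    """Upper bound on the number of non-overlapping episodes."""
    return min(episodes, non_overlapping_windows(window_days, horizon_days))

FIVE_YEARS = 5 * 365  # 1825 days

bound_30d = independent_sample_bound(52, FIVE_YEARS, 30)    # 60 windows fit, 52 episodes
bound_180d = independent_sample_bound(52, FIVE_YEARS, 180)  # only 10 windows fit
```

Here `bound_30d` is 52, because all 52 episodes could in principle be non-overlapping, while `bound_180d` is 10.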

This is why CRYPTYX scorecards tend to quote horizons of 30d or shorter. It is not a statement about where the best edge lives; it is a statement about where the statistic is most defensible.

What appears in public scorecards

The agent's runtime library includes every strategy we consider tradeable. The public scorecards do not. A strategy appears in a public scorecard only if the composite slicer returns enough trigger episodes at a short-to-medium horizon to quote a single statistic meaningfully.

Strategies are excluded from the public scorecard — though still run by the agent in production — when any of the following apply:

The composite has fired fewer than the threshold number of times historically, so the Sharpe computed over the sample is dominated by variance rather than signal.
The best risk-adjusted horizon is 60d or longer, where overlap makes a single quoted statistic misleading without a paragraph of caveats that does not belong in an integration README.
Risk-adjusted return at the best horizon is negative — the strategy is preserved in the library as an anti-signal or fade-candidate input to the conviction score, but publishing it with a negative Sharpe invites comparison to strategies that were never meant to be headline edges.

This is an editorial rule about publication, not about the agent's behaviour. The agent's own decision to trade a strategy depends on composite-level conviction scoring — not on whether the strategy is listed in the public scorecard.
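The three exclusion rules can be expressed as a simple check. This is an illustrative sketch, not the publication pipeline: the function name is hypothetical, the minimum episode count is not stated in this document and is left as a required argument, and `best_sharpe` is assumed to be signed from the perspective of the traded side.

```python
def exclusion_reasons(episodes, best_horizon_days, best_sharpe, min_episodes):
    """Return the reasons a strategy would be left out of a public scorecard.

    An empty list means none of the three rules applies.
    `min_episodes` has no default because the threshold is not documented here.
    `best_sharpe` is assumed to be signed from the traded side's perspective.
    """
    reasons = []
    if episodes < min_episodes:
        reasons.append("too few trigger episodes")
    if best_horizon_days >= 60:
        reasons.append("best horizon is 60d or longer")
    if best_sharpe < 0:
        reasons.append("negative risk-adjusted return at best horizon")
    return reasons
```

The check covers publication only. It does not model whether the agent trades the strategy.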

Asset scope

Some factor classes are defined only for specific assets:

CORR-class metrics compare an asset's return stream against a reference (typically BTC or ETH) and are undefined when the target asset is the reference itself.
OPT-class metrics are derived from options markets and are available for BTC and ETH only.

Where an integration scorecard reports an ETH row instead of a BTC row, or vice versa, it reflects this scope — not asset preference.
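The scope rules for these two classes amount to a small lookup, sketched below. The function name and the string labels are hypothetical, and treating every other factor class as unrestricted is an assumption made for the example.

```python
OPT_ASSETS = {"BTC", "ETH"}  # options-derived metrics exist for these assets only

def class_defined(factor_class, asset, reference=None):
    """Return True if a factor class is defined for the given asset.

    CORR is undefined when the asset is its own reference.
    OPT is limited to BTC and ETH.
    Other classes are assumed unrestricted (not stated in the text).
    """
    if factor_class == "CORR":
        return asset != reference
    if factor_class == "OPT":
        return asset in OPT_ASSETS
    return True
```

For example, a composite that includes a CORR condition referenced to BTC cannot produce a BTC row, which is one way a scorecard ends up showing ETH instead.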

Currency

The backtest statistics on CRYPTYX integration pages are regenerated from live historical data through the same API the agent uses at runtime. We do not maintain a separate publication dataset or a fixed marketing snapshot. New historical data lands daily, so numbers may shift marginally between visits. These shifts reflect newly added data in the underlying research surface, not a revision of earlier results.