The Metric
Time-to-Parity (TTP) is the number of days between the first frontier model to cross a SWE-bench Verified threshold and the first open-weight (downloadable, locally runnable, no guardrails) model to match. We track seven thresholds from 49% to 94%. The trendline shows TTP compressing from 440 days to 106 days — a 4.15× compression in roughly two years.
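The TTP arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual pipeline: the function names `time_to_parity` and `compression_ratio` are hypothetical, and the two dates are placeholders chosen only to exercise the code. The 440-day and 106-day figures are the ones quoted above.

```python
from datetime import date


def time_to_parity(frontier_date: date, open_weight_date: date) -> int:
    """Days between the frontier crossing and the open-weight crossing of one threshold."""
    return (open_weight_date - frontier_date).days


def compression_ratio(earlier_ttp_days: float, later_ttp_days: float) -> float:
    """How many times shorter the later TTP is than the earlier one."""
    return earlier_ttp_days / later_ttp_days


# Placeholder dates (NOT real model release dates), used only to show the call.
example_ttp = time_to_parity(date(2000, 1, 1), date(2000, 1, 31))

# Figures quoted in the text: 440 days compressing to 106 days.
ratio = compression_ratio(440, 106)
print(example_ttp)      # 30
print(round(ratio, 2))  # 4.15
```

Dividing the endpoints of the trendline reproduces the 4.15× figure; the ratio says nothing about the shape of the curve between those endpoints.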
Forward Projection Methods
We project when non-restricted models will cross the Mythos parity thresholds using two methods. Toggle between them to compare.
Leading-Edge (Primary). This method tracks the best non-restricted model score at each point in time (the “envelope”). It directly answers the threat question: how fast is the best available model advancing toward Mythos-level capability? It yields a higher R² but rests on a smaller sample (N=5–6). This is the method used in the headline projection.
Projection charts: GPQA Diamond to 94.5%, SWE-bench Pro to 77.8%, CyberGym to 83.1%.
We compute everything from five independent benchmark sources across three dimensions (reasoning, software engineering, cybersecurity). The full model roster is available in the Model Explorer.
Access Model Categories
- Restricted. Not available to end users. Currently: Claude Mythos Preview (93.9% SWE-V, 100% Cybench, 83.1% CyberGym). Withheld from general release on explicit cybersecurity grounds. A threat actor cannot obtain this capability today.
- Frontier. Available via API with logging, rate limits, and terms of use. Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro. A threat actor can use these but leaves an audit trail and is subject to guardrails.
- Open-weight. Downloadable model weights. Runnable locally with no logging, no guardrails, no audit trail. GLM-5.1, MiniMax M2.5, Kimi K2.5, DeepSeek V3.2. This is the threat-relevant category. Once downloaded, it is decentralized intelligence — the lab that created it has no ability to restrict or monitor use.
Why three categories matter: On CyberGym, an open-weight model (GLM-5.1, 68.7%) has already surpassed the best frontier models (Opus 4.6, 66.6%; GPT-5.4, 66.3%). The only model ahead is restricted (Mythos, 83.1%). Collapsing frontier and open-weight into one “publicly available” category hides the fact that the most security-relevant capability tier — unmonitored, decentralized — is already ahead of the monitored tier.
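The effect of collapsing categories can be checked directly with the CyberGym scores quoted above. This is a small sketch: the table layout and the helper name `best_by_category` are illustrative assumptions, while the model names, categories, and scores are taken from the text.

```python
# (model, access category, CyberGym score in %), as quoted in the text above.
CYBERGYM = [
    ("Claude Mythos Preview", "restricted", 83.1),
    ("Claude Opus 4.6", "frontier", 66.6),
    ("GPT-5.4", "frontier", 66.3),
    ("GLM-5.1", "open-weight", 68.7),
]


def best_by_category(rows):
    """Return {category: (model, score)} holding the top-scoring model in each category."""
    best = {}
    for model, category, score in rows:
        if category not in best or score > best[category][1]:
            best[category] = (model, score)
    return best


best = best_by_category(CYBERGYM)

# Three categories: the open-weight lead over the monitored tier is visible.
open_weight_lead = round(best["open-weight"][1] - best["frontier"][1], 1)

# Two categories: merging frontier and open-weight reports one number and
# drops the information about which tier it came from.
public_best = max(
    score for _, category, score in CYBERGYM if category != "restricted"
)

print(best["open-weight"])  # ('GLM-5.1', 68.7)
print(open_weight_lead)     # 2.1
print(public_best)          # 68.7
```

With the merged view, the only reported figure is 68.7%, and a reader cannot tell that it belongs to a model that runs without logging or guardrails.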