Compare Cassi
Understanding the Scoring
The percentage here is derived from Metaculus's Peer Score (based on the log-loss scoring rule) and ForecastBench's Difficulty-Adjusted Brier Score. The scoring rules themselves are somewhat complex, but they can be simplified to show how much better or worse a forecast was than the median forecaster on Metaculus or ForecastBench.
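To make the log-loss side concrete, here is a minimal Python sketch of a peer-style comparison, modeled on our reading of Metaculus's Peer Score for a binary question at a single moment in time. It ignores the time-averaging Metaculus applies over a question's lifetime, and the function names `log_score` and `peer_score` are illustrative, not part of any Metaculus API.

```python
import math


def log_score(p_yes: float, outcome: bool) -> float:
    """Natural log of the probability given to the outcome that actually happened.

    Assumes 0 < p_yes < 1; a probability of exactly 0 on the realised outcome
    would make the log undefined.
    """
    p = p_yes if outcome else 1.0 - p_yes
    return math.log(p)


def peer_score(p_yes: float, others: list[float], outcome: bool) -> float:
    """Peer-style score: 100 x the average log-score gap between this forecast
    and each other forecaster's. Positive means better than the others."""
    mine = log_score(p_yes, outcome)
    gaps = [mine - log_score(q, outcome) for q in others]
    return 100.0 * sum(gaps) / len(gaps)


# Hypothetical example: we said 80% YES, three others said 50%, 60% and 70%,
# and the question resolved YES.
print(round(peer_score(0.8, [0.5, 0.6, 0.7], outcome=True), 1))  # prints 29.7
```

A positive result means the forecast earned a better log score than the other forecasters on average; a negative one means it did worse.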
The reason for these adjustments is that some questions are much easier than others. For example, predicting whether the Earth will still exist tomorrow is far easier than predicting where the FTSE 100 will be in 10 years' time. This makes naive comparisons of scores and accuracy misleading: for comparison purposes, what matters is how one forecaster performed relative to another on a similar set of questions. Comparing raw accuracy across different sets of questions would be like comparing apples to orangutans. So, we calculate Cassi's score over the sets of questions and tournaments it forecasts on and compare it with the reference groups mentioned above. The percentage we give is how much better or worse that score is, normalised so that every question across all platforms and tournaments carries equal weight.
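The exact formula behind the percentage is not given here, so the following is only a minimal sketch of the idea under stated assumptions: each question is scored with the Brier score, the median of the reference group's per-question Brier scores stands in for "the median forecaster", and questions are averaged with equal weight regardless of platform or tournament. The `Question` type, the `relative_improvement` function and the sample data are hypothetical, and ForecastBench's own difficulty adjustment is not reproduced.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Question:
    platform: str           # illustrative label, e.g. "Metaculus" or "ForecastBench"
    outcome: bool           # True if the question resolved YES
    cassi: float            # Cassi's probability of YES (hypothetical)
    reference: list[float]  # reference group's probabilities of YES (hypothetical)


def brier(p_yes: float, outcome: bool) -> float:
    """Squared error between the forecast and the 0/1 outcome; lower is better."""
    return (p_yes - float(outcome)) ** 2


def relative_improvement(questions: list[Question]) -> float:
    """Percent by which Cassi's mean Brier score beats the median reference
    forecaster's, with every question weighted equally regardless of platform."""
    cassi = mean(brier(q.cassi, q.outcome) for q in questions)
    # The median of the reference group's Brier scores on each question stands
    # in for "the median forecaster" on that question.
    ref = mean(median([brier(p, q.outcome) for p in q.reference]) for q in questions)
    return 100.0 * (ref - cassi) / ref


qs = [
    Question("Metaculus", True, 0.9, [0.6, 0.7, 0.8]),
    Question("ForecastBench", False, 0.2, [0.3, 0.5, 0.1]),
]
print(round(relative_improvement(qs), 1))  # prints 72.2
```

Because both sides are scored on exactly the same questions, an easy question lowers Cassi's Brier score and the reference median alike, so easy questions cannot inflate the comparison on their own.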
| Rank | Team | Organization | Model | Overall score (higher is better) | Brier score (lower is better) |
|---|---|---|---|---|---|
| 1 | ForecastBench | ForecastBench | Superforecaster median forecast | 70.8 | 0.085 |
| 2 | Cassi-AI | Cassi-AI | Cassi ensemble_2_crowdadj | 67.8 | 0.104 |
| 3 | ForecastBench | Google | Gemini-3-Pro-Preview (zero shot with crowd forecast) | 67.7 | 0.104 |
| 3 | xAI | xAI | Grok 4.20 (Preview) | 67.7 | 0.104 |
| 5 | Lightning Rod Labs | Lightning Rod Labs | Foresight-32B | 67.2 | 0.108 |
| 6 | ForecastBench | OpenAI | GPT-5-2025-08-07 (zero shot with crowd forecast) | 67.1 | 0.108 |
| 7 | ForecastBench | Anthropic | Claude-3-7-Sonnet-20250219 (scratchpad with crowd forecast) | 66.9 | 0.109 |
| 7 | ForecastBench | xAI | Grok-4-0709 (zero shot with crowd forecast) | 66.9 | 0.110 |
| 7 | ForecastBench | xAI | Grok-4-Fast-Reasoning (zero shot with crowd forecast) | 66.9 | 0.109 |
| 7 | ForecastBench | OpenAI | GPT-4.5-Preview-2025-02-27 (scratchpad with crowd forecast) | 66.9 | 0.110 |
| 11 | ForecastBench | Anthropic | Claude-Sonnet-4-5-20250929 (zero shot with crowd forecast) | 66.8 | 0.110 |
| 11 | ForecastBench | OpenAI | GPT-4.5-Preview-2025-02-27 (zero shot with crowd forecast) | 66.8 | 0.110 |
| 11 | ForecastBench | Anthropic | Claude-Opus-4-5-20251101 (zero shot with crowd forecast) | 66.8 | 0.110 |
| 14 | ForecastBench | xAI | Grok-4-1-Fast-Reasoning (zero shot with crowd forecast) | 66.7 | 0.111 |
| 15 | ForecastBench | Anthropic | Claude-Opus-4-1-20250805 (zero shot with crowd forecast) | 66.6 | 0.112 |
| 15 | ForecastBench | Google | Gemini-2.5-Pro (zero shot with crowd forecast) | 66.6 | 0.112 |
| 15 | Cassi-AI | Cassi-AI | Cassi ensemble_2 | 66.6 | 0.111 |
| 18 | ForecastBench | OpenAI | O3-2025-04-16 (zero shot with crowd forecast) | 66.5 | 0.112 |
| 19 | ForecastBench | Google | Gemini-2.5-Pro-Preview-03-25 (zero shot with crowd forecast) | 66.4 | 0.113 |
| 20 | ForecastBench | OpenAI | GPT-5-Mini-2025-08-07 (zero shot with crowd forecast) | 66.3 | 0.114 |
| 20 | ForecastBench | OpenAI | O4-Mini-2025-04-16 (zero shot with crowd forecast) | 66.3 | 0.114 |
About ForecastBench's #1 Spot
ForecastBench's Superforecaster median forecast is the median of forecasts from a group of elite human Superforecasters. Aggregated this way, their collective forecast outperforms even the best individual human forecaster.
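A small illustration of how a median can beat its own members: in the hypothetical Python sketch below, each of three invented forecasters makes one bad miss on a different question, and taking the per-question median filters out every miss. The numbers are constructed for illustration only and are not ForecastBench or Superforecaster data.

```python
from statistics import mean, median


def brier(p_yes: float, outcome: bool) -> float:
    """Squared error between the forecast and the 0/1 outcome; lower is better."""
    return (p_yes - float(outcome)) ** 2


def mean_brier(probs: list[float], outcomes: list[bool]) -> float:
    """Average Brier score over a set of questions, each weighted equally."""
    return mean(brier(p, o) for p, o in zip(probs, outcomes))


# Hypothetical data: three questions, three forecasters, each with one big miss.
outcomes = [True, False, True]
forecasters = {
    "A": [0.2, 0.1, 0.9],
    "B": [0.8, 0.8, 0.8],
    "C": [0.9, 0.2, 0.3],
}

# Median of the group's forecasts on each question.
median_forecast = [median(ps) for ps in zip(*forecasters.values())]

for name, probs in forecasters.items():
    print(name, round(mean_brier(probs, outcomes), 3))  # A 0.22, B 0.24, C 0.18
print("median", round(mean_brier(median_forecast, outcomes), 3))  # median 0.04
```

Real results depend on how correlated the forecasters' errors are; the median helps most when individual mistakes fall on different questions.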
Forecasts
Tournaments
- 26th Apr, 2026
- 11th Apr, 2026
- 29th Mar, 2026
- 15th Mar, 2026
- 28th Feb, 2026
- 4th Feb, 2026
- 16th Mar, 2026
- 20th Dec, 2025
- 22nd Nov, 2025
- 8th Nov, 2025
- 25th Oct, 2025
- 31st Dec, 2025
- 31st Dec, 2025
- Ongoing
- Ongoing
- Ongoing
- Ongoing
- Ongoing
- Ongoing