
Kalshi's first research report is out: How collective intelligence outperforms Wall Street think tanks in predicting CPI

Dec 24, 2025 14:04:33


This article is from: Kalshi Research

Compiled by | Odaily Planet Daily, Azuma

Editor’s note: The leading prediction market platform Kalshi yesterday announced the launch of a new research column, Kalshi Research, which aims to make Kalshi's internal data available to scholars and researchers interested in prediction markets. The column's first report, "Beyond Consensus: Prediction Markets and the Forecasting of Inflation Shocks," has now been released and argues that Kalshi outperforms Wall Street in predicting inflation.

The following is the original content of the report, translated by Odaily Planet Daily.

Overview

In the week leading up to the release of important economic statistics, analysts and senior economists at large financial institutions typically publish estimates of the expected figure. Aggregated, these forecasts form the "consensus expectation," which is widely regarded as an important reference for reading market moves and adjusting positions.

In this research report, we compare the performance of consensus expectations against the implied pricing from Kalshi's prediction markets (hereinafter sometimes referred to as "market predictions") in forecasting the same core macroeconomic signal: the year-over-year headline inflation rate (YoY CPI).

Key Highlights

  • Overall accuracy advantage: Across all market environments (normal and shock alike), Kalshi's predictions carry a mean absolute error (MAE) 40.1% lower than consensus expectations.
  • "Shock Alpha": During significant shocks (greater than 0.2 percentage points), Kalshi's predictions have a MAE that is 50% lower than consensus expectations within a one-week forecast window, and this advantage expands to 60% the day before data release; during moderate shocks (between 0.1 - 0.2 percentage points), Kalshi's predictions also have a MAE that is 50% lower than consensus expectations within a one-week forecast window, expanding to 56.2% the day before data release.
  • Predictive Signal: When the deviation between market predictions and consensus expectations exceeds 0.1 percentage points, the probability of a shock occurring is approximately 81.2%, rising to about 82.4% the day before data release. In cases where market predictions and consensus expectations are inconsistent, market predictions are more accurate in 75% of cases.

Background

Macroeconomic forecasters face an inherent challenge: the most critical moments for predictions—when markets are disordered, policies shift, and structural breaks occur—are precisely when historical models are most likely to fail. Financial market participants typically release consensus forecasts a few days before key economic data is published, aggregating expert opinions into market expectations. However, despite their value, these consensus views often share similar methodological paths and information sources.

For institutional investors, risk managers, and policymakers, the stakes of prediction accuracy are asymmetrical. In non-controversial times, slightly better predictions provide limited value; but during market turmoil—when volatility spikes, correlations break down, or historical relationships fail—superior accuracy can yield significant Alpha returns and limit drawdowns.

Thus, understanding how forecasts behave during periods of market volatility is crucial. We focus on a key macroeconomic indicator, the year-over-year headline inflation rate (YoY CPI), which is a core input to future interest rate decisions and an important signal of economic health.

We compared and evaluated the predictive accuracy across multiple time windows before the official data release. Our core finding is that the so-called "Shock Alpha" indeed exists—i.e., during tail events, market-based predictions can achieve additional predictive accuracy compared to consensus benchmarks. This outperformance is not merely of academic interest; it can significantly enhance signal quality at critical moments when predictive errors carry the highest economic costs. In this context, the truly important question is not whether prediction markets are "always right," but whether they provide a differentiated signal that is worth incorporating into traditional decision-making frameworks.

Methodology

Data

We analyzed the daily implied predictions from prediction market traders on the Kalshi platform, covering three time points: one week before data release (aligned with the consensus expectation release), the day before release, and the morning of release. Each market used was (or had been) a real, tradable market, reflecting actual capital positions under different liquidity levels. For consensus expectations, we collected institutional-level YoY CPI consensus forecasts, which are typically published about a week before the official data release by the U.S. Bureau of Labor Statistics.
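To make the data layout concrete, here is a minimal sketch of how one monthly release cycle might be organized, pairing the consensus forecast with the market-implied forecast at each of the three horizons. The CPIReleaseCycle class and all field names are illustrative assumptions; the report does not describe Kalshi's internal schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CPIReleaseCycle:
    """One monthly YoY CPI release, with forecasts at the three horizons
    described in the report. Class and field names are illustrative."""
    release_month: str              # e.g. "2024-06"
    consensus_forecast: float       # institutional consensus YoY CPI, in %
    market_week_before: float       # Kalshi-implied YoY CPI, ~1 week before release
    market_day_before: float        # Kalshi-implied YoY CPI, the day before release
    market_morning_of: float        # Kalshi-implied YoY CPI, the morning of release
    actual: Optional[float] = None  # official BLS print, filled in after release
```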

The sample period spans from February 2023 to mid-2025, covering over 25 monthly CPI release cycles across various macroeconomic environments.

Shock Classification

We categorized each release by the magnitude of its surprise relative to historical levels, where a "shock" is measured as the absolute difference between the consensus expectation and the actual published figure (a minimal sketch of this classification follows the list below):

  • Normal events: YoY CPI prediction errors of less than 0.1 percentage points;
  • Moderate shocks: YoY CPI prediction errors between 0.1 and 0.2 percentage points;
  • Major shocks: YoY CPI prediction errors exceeding 0.2 percentage points.
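As a minimal sketch of this classification, with the 0.1 pp and 0.2 pp thresholds taken from the list above (the function name and the rounding step are illustrative choices):

```python
def classify_shock(consensus_forecast: float, actual: float) -> str:
    """Bucket a CPI release by its consensus error, using the thresholds above.
    Inputs are YoY CPI values in percent; the error is in percentage points."""
    # Round to two decimals to avoid floating-point noise right at the thresholds.
    error = round(abs(actual - consensus_forecast), 2)
    if error < 0.1:
        return "normal"
    elif error <= 0.2:
        return "moderate shock"
    else:
        return "major shock"

# Example: consensus at 3.4% YoY, actual print at 3.6% -> a 0.2 pp miss, a moderate shock
print(classify_shock(3.4, 3.6))  # "moderate shock"
```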

This classification method allows us to examine whether predictive advantages exhibit systematic differences as the difficulty of prediction changes.

Performance Metrics

To assess predictive performance, we employed the following metrics (a minimal sketch of the first two follows this list):

  • Mean Absolute Error (MAE): The primary accuracy metric, calculated as the average of the absolute differences between predicted and actual values.
  • Win Rate: When the difference between consensus expectations and market predictions reaches or exceeds 0.1 percentage points (rounded to one decimal place), we record which prediction is closer to the final actual result.
  • Predictive Time Span Analysis: We track how the accuracy of market-implied forecasts evolves from one week before release to release day, revealing the value of continuously incorporating new information.
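The sketch below illustrates the first two metrics, assuming each release's forecasts are held in parallel lists. The headline results (for example, the roughly 40% lower MAE) are relative reductions of this kind; the function names and the toy numbers are illustrative, not the report's data.

```python
from math import nan

def mean_absolute_error(forecasts, actuals):
    """MAE: average absolute gap between forecast and realized YoY CPI, in pp."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def mae_reduction(market, consensus, actuals):
    """Relative MAE reduction of market vs. consensus (0.40 means 40% lower)."""
    return 1 - mean_absolute_error(market, actuals) / mean_absolute_error(consensus, actuals)

def win_rate(market, consensus, actuals, min_gap=0.1):
    """Among releases where market and consensus differ by at least `min_gap` pp
    (after rounding both to one decimal), the share where the market is closer."""
    wins = contests = 0
    for m, c, a in zip(market, consensus, actuals):
        gap = round(abs(round(m, 1) - round(c, 1)), 1)
        if gap < min_gap:
            continue  # forecasts effectively agree; not a contested release
        contests += 1
        if abs(m - a) < abs(c - a):
            wins += 1
    return wins / contests if contests else nan

# Toy example with made-up numbers (not the report's data):
market = [3.2, 3.5, 2.9]
consensus = [3.3, 3.3, 2.9]
actuals = [3.2, 3.5, 3.0]
print(mae_reduction(market, consensus, actuals))  # ~0.75 with these toy numbers
print(win_rate(market, consensus, actuals))       # 1.0 (two contested releases, market wins both)
```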

Results: CPI Prediction Performance

Overall Accuracy Advantage

Across all market environments, market-based CPI predictions carry a mean absolute error (MAE) 40.1% lower than that of consensus predictions. Depending on the forecast horizon, this MAE advantage ranges from 40.1% (one week ahead) to 42.3% (the day before release).

Moreover, when consensus expectations and market-implied values diverge, Kalshi's market-based predictions win at statistically significant rates, ranging from 75.0% one week ahead to 81.2% on the day of release. If we also count cases where the two predictions are tied (to one decimal place), market-based predictions match or beat consensus in roughly 85% of releases one week ahead.

Such a high directional accuracy rate indicates that when market predictions diverge from consensus expectations, this divergence itself carries significant informational value regarding "the likelihood of a shock event occurring."

"Shock Alpha" Indeed Exists

The difference in predictive accuracy is particularly pronounced during shock events. In moderate shocks, at the horizon aligned with the consensus release, market predictions carry a MAE 50% lower than consensus expectations, an advantage that widens to 56.2% or more the day before the data release; in major shocks, the MAE is likewise 50% lower at the aligned horizon and reaches 60% or more the day before release; in normal environments without shocks, market predictions and consensus expectations perform roughly on par.

Although the sample size for shock events is small (which is reasonable in a world where "shocks are inherently highly unpredictable"), the overall pattern is very clear: when the predictive environment is most challenging, the information aggregation advantage of the market is most valuable.

However, the more important point is not just that Kalshi's predictions perform better during shock periods, but that the divergence between market predictions and consensus expectations may itself signal an impending shock. In cases of divergence, the win rate of market predictions relative to consensus expectations reaches 75% (within comparable time windows). Additionally, threshold analysis further indicates that when the deviation between market and consensus exceeds 0.1 percentage points, the probability of a shock occurring is approximately 81.2%, rising to about 84.2% the day before data release.

This difference is both statistically significant and practically meaningful: it suggests that prediction markets can serve not only as a competitive forecasting tool alongside consensus expectations, but also as a "meta-signal" of forecast uncertainty, turning the divergence between market and consensus into a quantifiable early warning of potential surprises.
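As a rough illustration of that meta-signal, the divergence check reduces to a one-line rule. The 0.1 percentage point threshold comes from the report; the function name and the rounding choice are hypothetical.

```python
def shock_warning(market_forecast: float, consensus_forecast: float,
                  threshold: float = 0.1) -> bool:
    """Flag a potential CPI surprise when the market-implied YoY CPI diverges
    from the consensus forecast by at least `threshold` percentage points.
    Per the report, such divergences preceded a shock roughly 80% of the time."""
    divergence = round(abs(market_forecast - consensus_forecast), 2)
    return divergence >= threshold

# Example: consensus at 3.1% YoY while the market prices in 3.25% -> raise the flag
print(shock_warning(3.25, 3.1))  # True
```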

Derivative Discussion

An obvious question arises: Why do market predictions outperform consensus predictions during shocks? We propose three complementary mechanisms to explain this phenomenon.

Heterogeneity of Market Participants and "Wisdom of Crowds"

While traditional consensus expectations integrate views from multiple institutions, they often share similar methodological assumptions and information sources. Econometric models, Wall Street research reports, and government data releases constitute a highly overlapping common knowledge base.

In contrast, prediction markets aggregate positions held by participants with different information bases: including proprietary models, industry insights, alternative data sources, and experience-based intuitive judgments. This diversity of participants is grounded in the theory of "wisdom of crowds," which suggests that when participants possess relevant information and their prediction errors are not fully correlated, aggregating independent predictions from diverse sources often yields superior estimates.

When macro environments undergo "state switches," the value of this information diversity becomes particularly pronounced—individuals with fragmented, localized information interact in the market, allowing their information fragments to combine into a collective signal.

Differences in Incentive Structures for Participants

Consensus forecasters at the institutional level often operate within complex organizational and reputational systems whose incentives deviate systematically from the pure pursuit of predictive accuracy. The career risks faced by professional forecasters create an asymmetric payoff structure: significant predictive errors can incur substantial reputational costs, while even highly accurate predictions, especially those achieved by deviating sharply from peer consensus, may not yield proportionate career rewards.

This asymmetry induces "herding behavior," where forecasters tend to cluster their predictions around consensus values, even when their private information or model outputs suggest different outcomes. The reason is that, within the professional system, the cost of "being wrong in isolation" often outweighs the benefits of "being right in isolation."

In stark contrast, the incentive mechanisms faced by prediction market participants align predictive accuracy directly with economic outcomes—accurate predictions mean profits, while inaccurate predictions mean losses. In this system, reputational factors are virtually nonexistent; the only cost of deviating from market consensus is economic loss, entirely dependent on whether the prediction is correct. This structure exerts stronger selection pressure on predictive accuracy—participants who can systematically identify errors in consensus predictions will continuously accumulate capital and enhance their influence in the market through larger position sizes; whereas those who mechanically follow consensus will suffer ongoing losses when the consensus proves incorrect.

During periods of significantly rising uncertainty, when the career costs for institutional forecasters deviating from expert consensus reach their peak, this differentiation in incentive structures is often most pronounced and economically significant.

Efficiency of Information Aggregation

A noteworthy empirical fact is that even one week before data release—this time point aligns with the typical time window for consensus expectation releases—market predictions still exhibit a significant accuracy advantage. This indicates that the market advantage does not merely stem from the "speed of information acquisition" often attributed to prediction market participants.

Instead, market predictions may more efficiently aggregate information fragments that are too dispersed, too industry-specific, or too vague to be formally incorporated into traditional econometric forecasting frameworks. The relative advantage of prediction markets may lie not in accessing public information earlier, but in synthesizing heterogeneous information more effectively within the same time frame, something survey-based consensus mechanisms struggle to do even given the same window.

Limitations and Considerations

Our findings come with an important caveat. Because the sample covers only about 30 months, significant shock events are by definition rare, which limits the statistical power of inferences about larger tail events. A longer time series would strengthen future inference, although the current results already point strongly to the superior accuracy of market predictions and the distinctiveness of their signal.

Conclusion

We have documented the significant and economically meaningful outperformance of prediction markets relative to expert consensus expectations, particularly during shock events when predictive accuracy is most critical. Market-based CPI predictions have an overall error that is approximately 40% lower, and during major structural changes, this error reduction can reach about 60%.

Based on these findings, several future research directions become particularly important: first, to investigate whether "Shock Alpha" events themselves can be predicted through volatility and prediction divergence indicators using larger sample sizes across various macroeconomic indicators; second, to determine the liquidity thresholds above which prediction markets can consistently outperform traditional forecasting methods; and third, to explore the relationship between prediction market values and those implied by high-frequency trading financial instruments.

In an environment where consensus predictions heavily rely on strongly correlated model assumptions and shared information sets, prediction markets provide an alternative information aggregation mechanism that can capture state switches earlier and process heterogeneous information more efficiently. For entities needing to make decisions in an economic environment characterized by rising structural uncertainty and increasing frequencies of tail events, "Shock Alpha" may not only represent a gradual improvement in predictive capability but should also become a fundamental component of their robust risk management infrastructure.

Original link
