Why Do Transformers Fail to Forecast Time Series In-Context?
Conference 2025, Duke University, August, 2025
We provide the first rigorous theoretical analysis of why Transformers underperform on in-context time series forecasting. Under AR(p) data, we prove that linear self-attention cannot achieve lower expected MSE than classical linear predictors, show that it asymptotically recovers the optimal linear predictor as the context length grows, and demonstrate exponential collapse of predictions under Chain-of-Thought (CoT) inference.
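To make the setting concrete, the following is a minimal sketch of the classical baseline the abstract refers to, not the paper's proof or experimental code. It assumes a stable AR(2) process with illustrative coefficients, uses ordinary least squares as a stand-in for the "classical linear predictor", and feeds forecasts back as inputs as a loose analogue of CoT-style multi-step inference. The names `simulate_ar`, `fit_ols`, and `rollout`, and all numeric values, are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_ar(coeffs, n, sigma=1.0, burn_in=200, seed=0):
    """Draw n samples from x_t = sum_i coeffs[i] * x_{t-1-i} + noise_t."""
    rng = np.random.default_rng(seed)
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    x = np.zeros(n + burn_in + p)
    noise = rng.normal(0.0, sigma, size=x.shape)
    for t in range(p, len(x)):
        # x[t - p:t][::-1] is [x_{t-1}, ..., x_{t-p}]
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + noise[t]
    return x[-n:]  # drop the burn-in so the sample is close to stationary

def fit_ols(series, p):
    """Least-squares estimate of the AR(p) coefficients from one context."""
    n = len(series)
    # Column i holds the lag-(i+1) values aligned with targets series[p:].
    X = np.column_stack([series[p - 1 - i:n - 1 - i] for i in range(p)])
    y = series[p:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def rollout(w, history, steps):
    """Iterated forecast: each prediction is fed back as the next input."""
    p = len(w)
    buf = list(history[-p:])  # chronological order, oldest first
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(w, buf[::-1]))  # most recent value first
        preds.append(nxt)
        buf = buf[1:] + [nxt]
    return np.array(preds)

# Assumed illustrative coefficients; the characteristic roots have
# modulus sqrt(0.3) < 1, so the process is stable.
true_w = np.array([0.5, -0.3])
series = simulate_ar(true_w, n=5000)

w_hat = fit_ols(series, p=2)           # estimate from a long context
preds = rollout(true_w, series, 50)    # noise-free multi-step forecast

print("estimated coefficients:", w_hat)
print("first and last rollout magnitudes:", abs(preds[0]), abs(preds[-1]))
```

In this sketch the least-squares estimate approaches the true coefficients as the context gets longer, and the iterated forecast of a stable, zero-mean AR process shrinks geometrically toward zero. Both are standard properties of linear AR models and are shown here only to illustrate the kind of behavior the abstract describes.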
