Databricks is excited to partner with OpenAI on GPT-5.5, their latest frontier model. GPT-5.5 is OpenAI’s strongest model for agentic enterprise work, complex document reasoning, and long-horizon coding agents, and it now powers Codex, OpenAI’s coding agent.
GPT-5.5 Features and Benefits
GPT-5.5 is the smartest frontier model yet and the next step toward a new way of getting work done. It understands what you’re trying to do more quickly and can take on more of the work itself. Codex is now powered by GPT-5.5, bringing stronger reasoning and execution capabilities to developer workflows.
The same strengths that make GPT-5.5 great at coding also make it powerful for everyday work on a computer. Because the model is better at understanding intent, it can move more naturally through the full loop of knowledge work: finding information, understanding what matters, using tools, checking the output, and turning raw material into something useful.
It can write and debug code, research online, analyze data, create documents and spreadsheets, operate software, and move across tools until a task is finished. Instead of carefully managing every step, you can give GPT-5.5 a messy, multi-part task and trust it to plan, use tools, check its work, recover from ambiguity, and keep going.
GPT-5.5 sets a new state of the art
To understand how these improvements translate into real enterprise workloads, we evaluated GPT-5.5 on OfficeQA, Databricks’ benchmark for document-heavy, multi-step analytical tasks customers perform every day. OfficeQA, built from 89,000 pages of U.S. Treasury Bulletins, measures a model’s ability to retrieve information across documents, interpret complex tables, and perform precise calculations grounded in real enterprise data.
When given the right documents (OfficeQA Pro LLM with Oracle PDF + Web Search), GPT-5.5 scored 64.66%, a substantial jump from GPT-5.4’s 57.14%, a ~13% relative improvement and a new state of the art on this benchmark. This configuration tests the ceiling of what the model can do when retrieval is already handled.
In a full-agent workflow eval (OfficeQA Pro Agent Harness), where the model must find the right documents, parse them, and compute answers on its own using the Codex agent harness, GPT-5.5 scored 52.63%, up from GPT-5.4’s 36.10%. That’s a ~46% relative improvement in accuracy, showing that GPT-5.5’s gains aren’t just theoretical; they hold up in realistic, end-to-end enterprise workflows.
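For readers checking the arithmetic, the percentage gains quoted above are relative improvements over the baseline score. A minimal sketch (the helper name is ours, not part of OfficeQA):

```python
# Relative-improvement arithmetic behind the benchmark numbers quoted above.
# Scores are accuracy percentages from the two OfficeQA settings.

def relative_improvement(old: float, new: float) -> float:
    """Relative gain of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old

# Oracle-retrieval setting: GPT-5.4 scored 57.14%, GPT-5.5 scored 64.66%
oracle = relative_improvement(57.14, 64.66)   # ~0.13, i.e. ~13%

# Full agent-harness setting: GPT-5.4 scored 36.10%, GPT-5.5 scored 52.63%
agent = relative_improvement(36.10, 52.63)    # ~0.46, i.e. ~46%

print(f"Oracle setting: {oracle:.1%} relative improvement")
print(f"Agent setting:  {agent:.1%} relative improvement")
```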
GPT-5.5 is coming soon to Databricks. Bring frontier reasoning to your enterprise data, securely and at scale.
