The future of the data lakehouse for the agentic era

Traditional lakehouses were engineered for the era of reporting, not for the high-velocity, multimodal demands of AI agents. To bridge this gap, the architecture must evolve into an AI-native foundation, one that replaces batch processing with continuous feedback loops and live data streams. This shift gives agents the reliable context they need to turn raw data into action, and it unlocks all enterprise data, structured and unstructured, across cloud boundaries.

Today, we announced our next-generation cross-cloud Lakehouse that delivers four core breakthroughs:

Fully managed Iceberg storage with enterprise-grade features, giving you the benefits of open-source flexibility plus performance, scale, governance, and multimodal processing.
New cross-cloud interoperability, bringing Google’s high-performance, scalable foundation and AI capabilities to your data, supporting an expansive data ecosystem.
A high-performance Apache Spark experience, accelerating your data science workloads with exceptional performance and your choice of developer environments.
AI-powered, always-on context, enabling AI agents to reason in real time across operational and analytical data.

This agentic-first lakehouse approach can deliver an estimated 117% ROI with payback in under six months. Spotify is already unlocking innovation with Google Cloud’s Lakehouse.

“Spotify is leveraging Google Cloud’s Apache Iceberg products as part of our efforts to build a truly modern data lakehouse that removes the silos between our data lakes and warehouses. This architecture provides us with an interoperable and abstracted storage interface, allowing our teams to process the same data across BigQuery, Dataflow, and other open-source engines without duplication. It will simplify our governance and unlock the ability to innovate at a scale that was previously impossible.”
— Ed Byne, Product Manager, Spotify

Accenture, a key partner in this journey, sees this as a fundamental shift in how enterprises operate:

“To reinvent enterprise operations, organizations must collapse the data boundaries that fragment their intelligence. By utilizing the Google Cloud Lakehouse and ‘zero-copy’ innovation, we can help customers activate agentic AI with surgical precision. Whether leveraging high-performance Apache Spark for complex data science or delivering scale for industries like retail and life sciences, this AI-native foundation transforms trapped data into real-time action.”
— Scott Alfieri, Global Lead, Accenture Google Business Group

Openness without compromise

With Google Cloud’s unique, vertically integrated infrastructure, you get the open-source flexibility of Apache Iceberg backed by a fully integrated, managed data-to-AI experience. You can manage all your multimodal data with unified governance, and get your data estate ready for agents with always-on context. By connecting your Iceberg tables directly to engines like BigQuery and Managed Service for Apache Spark, you can accelerate your AI workloads in real time.

Today, we announced four new innovations to make your Iceberg experience on Google Cloud even stronger:

Fully managed Iceberg storage with read/write interoperability: Experience unified Apache Iceberg tables, managed via the Lakehouse runtime catalog (formerly BigLake metastore). This provides read and write interoperability across BigQuery, Managed Service for Apache Spark, Iceberg-compatible OSS engines such as Spark, Trino, and Flink, and third-party engines such as Databricks and Snowflake (Preview).
The power of BigQuery with Iceberg: Access advanced runtimes, automatic table management, partitioning, multi-table transactions, and history-based optimization for REST catalog tables (Preview) and Iceberg tables managed by BigQuery Catalog (GA).
A unified multimodal foundation: Use BigQuery ObjectRefs (GA) to merge unstructured data in Cloud Storage with structured data in Iceberg. This simplifies multimodal analysis and manages conversational insights through BigQuery AI.
Unified management and governance: Enhance enterprise trust with open lakehouse governance (Preview) via Knowledge Catalog (formerly Dataplex). Securely ground your agents with business context using end-to-end data lineage, search, quality profiling, and table-level access controls for your Iceberg estate.

Cross-cloud power without the friction

Cross-cloud is today’s enterprise reality, and your agents need a scalable way to work across data, no matter where it lives. Production cross-cloud data access often fails to scale because of high egress overheads and performance bottlenecks. To address this, we are introducing a high-performance, scalable cross-cloud experience that brings Google’s AI capabilities to your AWS and Azure data, with price-performance characteristics similar to those of cloud-native solutions:

A new AI-native cross-cloud lakehouse: Lakehouse cross-cloud interconnect and cross-cloud caching (Preview) provide BigQuery and Managed Service for Apache Spark (formerly Dataproc) with high-performance access to AWS Iceberg data at scale. This new capability, powered by high-throughput, low-latency, cross-cloud connectivity and advanced cross-cloud query processing innovations, delivers price-performance characteristics similar to AWS-native data platform solutions. You can run Gemini-powered use cases, such as building agents in Gemini Enterprise and BigQuery AI functions, over your Amazon S3 Iceberg data.
An interoperable ecosystem powered by open standards: To help you easily discover and analyze all your enterprise data across any engine or cloud, we are launching Lakehouse catalog federation (Preview) for AWS Glue, Databricks, SAP, and Snowflake, with Confluent Tableflow coming later this year. This foundation enables simple access to data across clouds through BigQuery and Managed Spark, supported by an expanding partner ecosystem that includes bi-directional access for Databricks, Oracle Autonomous Database, and Snowflake; pipeline support for dbt; serving with ClickHouse; and catalog integrations with Atlan and DataHub (Preview). Advanced Lakehouse Governance (Preview) ensures that security protocols and access permissions are immediately enforced throughout this unified environment.

High-performance Spark, built for enterprise scale

Managed Service for Apache Spark offers a unified, high-performance experience that accelerates everything from data engineering to agentic AI development. It empowers data teams to extract maximum value from their enterprise data without friction by delivering key advantages:

Frictionless, agentic data science: Customers gain a highly flexible data science environment that integrates Colab Enterprise, Gemini Enterprise, and local IDEs with BigQuery and managed Spark. Developers can run Python, Spark, and SQL on a single, unified copy of data, which eliminates data movement and lets them pick the best engine for each job, while the Lakehouse runtime catalog's Iceberg REST catalog endpoint (Preview) automates table management.
Better Spark processing: Lightning Engine for Apache Spark delivers up to 2x the price-performance of the leading high-speed Spark alternative. This engine uses vectorized execution, intelligent caching, and optimized I/O to provide industry-leading performance on Iceberg, Parquet, and Delta formats without requiring any code changes.
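
As a rough illustration of how a Spark job might be pointed at an Iceberg REST catalog endpoint, the sketch below builds the standard open-source Apache Iceberg catalog properties for Spark. The catalog name, endpoint URI, and warehouse path are placeholders invented for this example, not actual Lakehouse values, and authentication settings are deliberately left out; consult the product documentation for the real endpoint and credentials setup.

```python
def iceberg_rest_catalog_conf(catalog_name: str, rest_uri: str, warehouse: str) -> dict:
    """Build Spark properties that register an Iceberg REST catalog.

    The property keys are the generic Apache Iceberg settings for Spark.
    All argument values are supplied by the caller; nothing here is
    specific to any one vendor's endpoint.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (for example, stored procedures).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Selects the REST catalog implementation.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical placeholder values, for illustration only.
conf = iceberg_rest_catalog_conf(
    catalog_name="lakehouse",
    rest_uri="https://example.invalid/iceberg/v1",
    warehouse="gs://example-bucket/warehouse",
)

# Print the properties in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

With a catalog registered this way, tables would be addressed in Spark SQL as `lakehouse.<namespace>.<table>`, and the same tables remain readable by any other engine that speaks the Iceberg REST protocol.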

Built for the scale and speed required by agentic AI

Google Cloud’s Lakehouse is a high-performance, real-time foundation businesses can use to scale in the agentic era. We use AI to discover hidden relationships within your enterprise data to provide 24/7 curated context for your agents, making it easier to activate them instantly using databases with new integrations. This includes:

Always-on context for agents: Knowledge Catalog (formerly Dataplex) builds a unified foundation by aggregating business context from your entire data landscape, including Iceberg. It delivers continuous enrichment by learning how your enterprise actually uses data, using Smart Storage to automatically map complex relationships within unstructured files. To guarantee trust and relevance, it uses access-control-aware, high-precision search powered by Google Search innovations. This instantly identifies trusted context and feeds it to AI agents, ensuring they deliver reliable, grounded results.
Ready-to-use BigQuery and Looker agents: BigQuery provides built-in Conversational Analytics, a Data Engineering Agent, and a Data Science Agent (Preview) that work with your cross-cloud, multimodal data. Looker Conversational Analytics provides insights in natural language to your business users. You can also build your own agents on top of your lakehouse with tools like Agent Development Kit (ADK) and Model Context Protocol (MCP).
Real-time analysis and agentic activation of operational data: Integrate operational data into your lakehouse. Spanner, AlloyDB, and Cloud SQL support real-time change replication into BigQuery (GA) and Iceberg (Preview). Analytical data in Iceberg can also be served with low latency using AlloyDB and Spanner (Preview).
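
To make the idea of change replication concrete, here is a minimal conceptual sketch of how a stream of change events from an operational database can keep an analytical copy in sync. It is not how Spanner, AlloyDB, or Cloud SQL implement replication; the `ChangeEvent` type, the `apply_changes` function, and the sample rows are all hypothetical names and data made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change captured from a source database (hypothetical shape)."""
    op: str                      # "insert", "update", or "delete"
    key: str                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents; None for deletes


def apply_changes(table: dict, events: list) -> dict:
    """Return a copy of `table` with the events applied in order."""
    replica = dict(table)
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: the latest version of the row wins.
            replica[event.key] = event.row
        elif event.op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


# Illustrative data only.
orders = {"o1": {"status": "placed"}}
events = [
    ChangeEvent("update", "o1", {"status": "shipped"}),
    ChangeEvent("insert", "o2", {"status": "placed"}),
    ChangeEvent("delete", "o1"),
]
replica = apply_changes(orders, events)
print(replica)
```

Because events are applied in order and upserts are keyed by primary key, replaying the same stream against the same starting table always yields the same replica, which is the property that lets an analytical copy track an operational source.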

Get started today

Built for the AI era, our cross-cloud Lakehouse delivers uncompromising open Iceberg storage, enables multi-cloud interoperability, and equips your AI agents with always-on context for closed-loop activation. Learn more and start building today.
