Snowflake, the Data Cloud company, recently announced new advancements to its single, unified platform that make it easier for organizations to get value from all of their data while continuing to deliver improved performance for customers’ most critical workloads. With new innovations like Document AI (private preview), Snowflake launched a new large language model (LLM) built from Applica’s pioneering generative AI technology to help customers understand documents and put their unstructured data to work. Snowflake also unveiled updates to Iceberg Tables (private preview soon) to further eliminate data silos and allow organizations to use open table formats with fast performance and enterprise-grade governance, whether the data is managed by Snowflake’s catalogue or by another catalogue. In addition, the new Snowflake Performance Index (SPI) provides increased transparency and metrics around Snowflake’s ongoing performance improvements, revealing that query duration for stable customer workloads has improved by 15 percent since Snowflake began tracking the metric eight months ago — reinforcing how Snowflake continues to advance price for performance for customers.
Over the next five years, more than 90 percent of the world’s data will be unstructured, in the form of documents, images, video, audio, and more, according to IDC. Organizations routinely store this massive volume of unstructured data, but gaining valuable insight from it has historically required manual, error-prone processes and scarce specialist skills. Building on Snowflake’s support for unstructured data, Snowflake’s built-in Document AI will make it effortless for organizations to understand and extract value from documents using natural language processing.
Document AI stems from Snowflake’s acquisition of Applica (Sept. 2022) and leverages its purpose-built, multimodal LLM. By integrating this model within Snowflake’s platform, organizations will be able to easily extract content like invoice amounts or contractual terms from documents and fine-tune results using a visual interface and natural language. Customers are using Document AI to help their teams be smarter about their businesses and enhance user productivity in secure and scalable ways. Snowflake is starting with Document AI and plans to expand these capabilities to more types of unstructured data.
As Apache Iceberg continues to grow in popularity as the industry standard for open table formats, Snowflake is making it easier for enterprises to extend the value of the Data Cloud to Iceberg data. With Iceberg Tables, organizations can work with data in their own storage in the Apache Iceberg format, whether that data is managed by Snowflake or managed externally, while still benefiting from Snowflake’s ease of use, performance, and unified governance. This simplifies data management by eliminating the need for organizations to move or copy data between systems, all the while increasing flexibility and reducing costs. In addition, Apache Iceberg’s growing ecosystem of diverse adopters, contributors, and commercial offerings future-proofs storage, preventing vendor lock-in and frequent migrations. Customers like Booking.com are leveraging Iceberg Tables today to bring the power of the Data Cloud to all of their data.
Snowflake’s number one value is to ‘put customers first.’ Snowflake remains focused on delivering continuous innovations with regular releases that improve performance and efficiency, often requiring no action from customers. With this in mind, Snowflake is introducing the new SPI to quantify improvements over time by analyzing actual customer workloads. Based on the SPI, query duration for stable customer workloads in Snowflake has improved by 15 percent since Snowflake began tracking the metric eight months ago. The SPI highlights Snowflake’s commitment to continuously optimizing cost and performance for customers, and gives them more transparency into the quantitative impact that platform capabilities and hardware improvements have on their production workloads over time.
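The release does not describe how the SPI is actually computed. As a purely illustrative sketch — the aggregation method and every number below are assumptions for demonstration, not Snowflake’s methodology — an index of this kind could compare aggregate query durations for the same stable workload across two periods:

```python
# Illustrative sketch only: the aggregation and the numbers below are
# assumptions for demonstration, not Snowflake's actual SPI methodology.

def performance_index(baseline_durations, current_durations):
    """Relative change in total query duration between two periods for the
    same (stable) workload; a negative value means queries got faster."""
    baseline_total = sum(baseline_durations)
    current_total = sum(current_durations)
    return (current_total - baseline_total) / baseline_total

# Hypothetical per-query durations (seconds) for an unchanged workload.
eight_months_ago = [12.0, 8.0, 30.0, 4.0, 6.0]   # total: 60.0 s
today            = [10.2, 6.8, 25.5, 3.4, 5.1]   # total: 51.0 s

change = performance_index(eight_months_ago, today)
print(f"Query duration change: {change:.0%}")  # prints "Query duration change: -15%"
```

Measuring only workloads that stay stable between periods, as the release describes, isolates platform-side gains from changes customers make to their own queries or data volumes.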
In addition, Snowflake is advancing its single platform to support a broader set of advanced analytics capabilities including pre-built machine learning functions for SQL users (public preview), and expanding its unified governance and privacy with new data quality metrics and classification features (both in private preview).
“Snowflake’s single platform continues to be core to our innovation strategy, and we’re constantly enhancing it so customers can access, understand, and protect their data seamlessly while benefiting from Snowflake’s leading performance, scale, and governance,” said Christian Kleinerman, SVP of Product, Snowflake. “We’re unlocking a new data era for customers, leveraging AI and eliminating silos previously bound by format, location, and more to revolutionize how organizations put their data to work and drive insights with the Data Cloud.”
Snowflake also announced new innovations that extend data programmability for data science, data engineering, and application development. The company is expanding the scope of Snowpark so developers can unlock broader infrastructure options, such as accelerated computing with NVIDIA GPUs and AI software, to run more workloads within Snowflake’s secure and governed platform without complexity — including a wider range of AI and machine learning (ML) models, APIs, internally developed applications, and more. Using Snowpark Container Services, Snowflake customers also get access to an expansive catalogue of third-party software and apps, including LLMs, notebooks, MLOps tools, and more, within their account.
Snowflake is simplifying and scaling how users develop, operationalize, and consume ML models, unveiling new innovations so more organizations can bring their data and ML models to life. Snowpark remains Snowflake’s framework for the secure deployment and processing of non-SQL code, with various runtimes and libraries — expanding who can build and what gets built in the Data Cloud. It lets builders work with data more effectively in their programming languages and tools of choice while providing organizations with the automation, governance, and security guarantees missing in legacy data lakes and big data environments.