Shiv Gupta

Snowflake Data Cloud World Tour: Recap


There have been some exciting developments at Snowflake. I had the opportunity to attend Snowflake’s Data Cloud World Tour in Toronto. Snowflake has continued to evangelize the notion of bringing application workloads to the data rather than the other way around, and unsurprisingly, the agenda largely centered around this overarching theme. Below are my thoughts on a couple of topics I found interesting.

Snowpark Container Services

Traditionally, developers have hosted applications on-premises or with public cloud providers and brought data into infrastructure hosted close to the apps – say, for example, an API hosted on EC2 calling a Snowpark API to run aggregations on a Snowflake table before returning an HTTP response.

With Snowpark Container Services, developers can host any containerized application workload – an API, an LLM, a React app – inside Snowflake. And because resources created within Snowpark Container Services are first-class citizens within Snowflake’s data governance and security primitives, teams benefit from added control and security over their data and minimize the risks associated with moving sensitive data from one platform to another.
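Concretely, a containerized workload runs in a compute pool and is defined by a service specification. The sketch below is illustrative rather than authoritative – the pool name, image path, and instance family are hypothetical, and the exact DDL may vary by release:

```sql
-- Hypothetical names throughout; the container image is assumed to have
-- already been pushed to a Snowflake image repository.
create compute pool my_gpu_pool
    min_nodes = 1
    max_nodes = 1
    instance_family = GPU_NV_S;

-- A long-lived service running the container in the pool above.
create service llama_service
    in compute pool my_gpu_pool
    from specification $$
        spec:
          containers:
          - name: llama
            image: /my_db/my_schema/my_repo/llama2:latest
    $$;
```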

So what?

This enables some exciting use-cases. Organizations looking to use Generative AI with sensitive data can host entire LLMs inside their Snowflake infrastructure, obviating the need to move sensitive data to an external vendor or platform outside of the organization’s data governance and security control plane. In practice, this means a developer can write a SQL query that prompts a Llama-2 Chat model hosted entirely in Snowflake¹. Here’s an example, to make the point abundantly clear:

select
    customer_id,
    survey_response_text,
    -- llama_prompt is a UDF that calls a Llama model
    -- hosted inside Snowflake
    llama_prompt(
        'Classify the following survey response as good,
        bad, or neutral: ' ||
        survey_response_text
    ) as response_classification
from survey_responses

Teams are not limited to inference, either: for domain-specific generative AI use-cases, teams can run a containerized workload in Snowflake to fine-tune Llama using data already in Snowflake, and then host the fine-tuned version for inference.
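A fine-tuning run fits naturally as a one-off job rather than a long-lived service. As a rough sketch – the pool, job name, and image are hypothetical, and the job syntax may differ across Snowflake releases – it might look like:

```sql
-- Hypothetical sketch: run fine-tuning as a job in a compute pool.
-- The container is assumed to read training data from Snowflake tables
-- and write the fine-tuned weights back to a stage for later serving.
execute job service
    in compute pool my_gpu_pool
    name = llama_finetune_job
    from specification $$
        spec:
          containers:
          - name: finetune
            image: /my_db/my_schema/my_repo/llama2-finetune:latest
    $$;
```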

There are other second-order effects of customers hosting containerized applications in Snowflake. Application logs, for example, are likely going to start adding to customers’ storage bills. Providing customers with tools to manage the additional complexity of running containerized workloads and to keep cloud spend in check is going to be key.

Snowflake Native Apps

Snowflake Native Apps enable a whole host of use-cases for data providers to commercialize their data to Snowflake customers. Additionally, they allow vendors to host their applications within end-customers’ Snowflake accounts. The offering dovetails nicely with Snowpark Container Services, allowing builders to expose functionality to customers without compromising on security for either party.

Hex, for example, plans to offer Snowflake as a deployment option in addition to its multi-tenant cloud and single-tenant VPCs on AWS. I anticipate data observability and data integration vendors will follow suit to provide more deployment flexibility for customers.

Closing Thoughts

The narrative of Snowflake as a Data Cloud is in full swing with the developments around Snowpark Container Services and Native Apps. Almost a third of the Forbes Global 2000 companies are Snowflake customers. As a customer and a data practitioner, I’m going to be observing the extent to which these organizations add to Snowflake’s bottom line with significant upfront commitments utilizing Snowpark Container Services and other offerings.


  1. This is possible today with External Functions, with some limits. Running inference on a large language model without your data leaving Snowflake is made far easier with Snowpark Container Services.