Snowflake Computing delivers a modern, cloud-based data warehousing platform that is giving traditional database vendors a run for their money. Built from the ground up to exploit the capabilities of dynamic cloud infrastructure, Snowflake is available as a fully managed data warehouse on AWS and Azure.

Snowflake and Databricks Integration (Source: Snowflake)

Databricks is a Big Data company that offers a commercial version of Apache Spark on mainstream public cloud platforms including AWS and Azure. Its Unified Analytics product brings together best-of-breed tools and technologies to deliver an end-to-end platform for data engineers and data scientists.

Almost every data platform is being augmented with machine learning to support predictive analytics. Customers of Snowflake and Databricks asked both companies to deliver an integrated solution that avoids redundant processes when implementing analytics. The latest announcement addresses the challenges of building integrated data pipelines that cut across both platforms.

With Snowflake’s data warehouse as the repository and Databricks’ Unified Analytics delivering Spark-based analytics, data scientists can train models while analysts run dashboards, all on the same data, while new data continues to flow into the data warehouse without disruption.

The integration is available as a connector that brings together ETL, data warehousing, and machine learning without needing to set up, configure and manage complex pipelines.

Data engineers can use the API from Python or Scala to access data from Snowflake. They can exploit the power of Apache Spark by reading Snowflake data as a native DataFrame. They can train machine learning models on the data retrieved from the warehouse and store the output back in Snowflake. Users can rely on familiar SQL syntax to query the data stored in the warehouse.
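To make the read and write path concrete, here is a minimal Python sketch, not the official sample from either vendor. It assumes the Snowflake Spark connector is installed on the cluster and registered under the source name net.snowflake.spark.snowflake, and that an active SparkSession is passed in. The helper names (snowflake_options, read_table, read_query, write_table) and the account URL, table, and credential values in the usage note are hypothetical placeholders.

```python
from typing import Dict

# Source name of the Snowflake Spark connector (assumed to be installed
# on the cluster; this sketch does not install or configure it).
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def snowflake_options(url: str, user: str, password: str,
                      database: str, schema: str, warehouse: str) -> Dict[str, str]:
    """Collect the connection options the connector expects."""
    return {
        "sfUrl": url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options: Dict[str, str], table: str):
    """Load a whole Snowflake table as a Spark DataFrame."""
    return (spark.read.format(SNOWFLAKE_SOURCE)
            .options(**options)
            .option("dbtable", table)
            .load())


def read_query(spark, options: Dict[str, str], sql: str):
    """Load the result of a SQL query as a Spark DataFrame."""
    return (spark.read.format(SNOWFLAKE_SOURCE)
            .options(**options)
            .option("query", sql)
            .load())


def write_table(df, options: Dict[str, str], table: str, mode: str = "overwrite") -> None:
    """Store a DataFrame (for example, model predictions) back in Snowflake."""
    (df.write.format(SNOWFLAKE_SOURCE)
       .options(**options)
       .option("dbtable", table)
       .mode(mode)
       .save())
```

In a notebook, one might call opts = snowflake_options("myaccount.snowflakecomputing.com", user, password, "ANALYTICS", "PUBLIC", "COMPUTE_WH"), then read_table(spark, opts, "ORDERS") to obtain a DataFrame for training, and finally write_table(predictions, opts, "ORDER_PREDICTIONS") to store the output. Credentials would normally come from a secrets manager rather than being written into the notebook.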

Without this integration, customers had to deal with complex data transformation pipelines to move datasets back and forth from Snowflake, which resulted in duplication and redundancy of data. With the Databricks API, Snowflake becomes a first-class data source for building real-time pipelines based on Apache Spark and other ML frameworks such as TensorFlow.

Every database vendor is moving towards integrating machine learning with the database engine. Recently at Google Cloud NEXT, Google’s BigQuery gained support for machine learning. Microsoft is embedding an ML engine in SQL Server, its flagship database that runs both in the cloud and in the enterprise data center.

The partnership between Snowflake and Databricks is a welcome sign. It brings the best of both worlds by combining an enterprise data warehouse with a predictive analytics platform.
