What is the future of data warehousing with relation to the cloud? originally appeared on Quora: the place to gain and share knowledge, empowering people to learn from others and better understand the world.
The days are numbered for on-premises data warehousing solutions like Teradata, Oracle, SQL Server and DB2. Cloud data warehouses like Google BigQuery, Snowflake, Amazon Redshift and Azure SQL Data Warehouse are built on modern architectures that do away with the traditional management headaches associated with data warehouse technologies. Increasingly, we will see these new cloud data warehouses take market share from legacy, on-premises alternatives. Analytics workloads will power the first wave of migration to cloud data warehouses. Once enterprises see the value of leveraging the cloud and have properly dealt with the differences in managing costs there, we’ll see OLTP workloads migrate to the cloud as well.
We will also see more and more use of the cloud file system as a data lake. Amazon S3, Azure Data Lake Storage (ADLS), and Google Cloud Storage are all becoming suitable as a cloud data lake storage layer. Each cloud vendor is adding query capabilities on top of this raw storage, either via Hadoop or via external tables in its data warehouse offering, which makes these storage services a good alternative to loading data into a warehouse. Using the cloud data lake as the primary storage layer will lead to more “just in time” or “schema on read” data architectures and less of the traditional ETL and data movement that have been the hallmark of data warehousing to date.
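To make “schema on read” concrete, here is a minimal sketch in Python. It assumes raw JSON-lines records sitting untouched in storage (simulated here with an in-memory string) and a schema that the reader picks only at query time. The record contents and the names RAW_EVENTS, SCHEMA, read_with_schema and total_by_user are all hypothetical, chosen for illustration rather than taken from any vendor's API.

```python
import io
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront ETL).
# Records are allowed to carry different fields.
RAW_EVENTS = """\
{"user": "a", "amount": "12.50", "ts": "2017-01-03"}
{"user": "b", "amount": "7", "country": "DE"}
{"user": "a", "amount": "3.25", "ts": "2017-01-04", "extra": 1}
"""

# The schema is chosen by the reader at query time: field name -> cast function.
SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Yield one dict per raw record, projected and cast to `schema`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Fields missing from a record become None; extra fields are ignored.
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


def total_by_user(rows):
    """Sum the `amount` column per `user`."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


# io.StringIO stands in for a file read from a cloud data lake.
totals = total_by_user(read_with_schema(io.StringIO(RAW_EVENTS), SCHEMA))
print(totals)
```

The raw data is never reshaped or reloaded. A different question would simply pass a different schema over the same stored records, which is the contrast with a traditional ETL pipeline that fixes the structure before loading.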
Increasingly, organizations will adopt SaaS applications instead of traditional, on-premises enterprise software. This trend has major implications for data architecture and analytics. As more and more data moves into SaaS vendor clouds, enterprises will find themselves unable to tap into and blend that data across vendor silos. Data virtualization is a key strategy for reuniting this data in a logical layer for analytics, and enterprises should plan to adopt a data architecture that insulates them from this disruption.
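As a rough illustration of what a logical layer over vendor silos might look like, the sketch below registers two data sources behind one interface and blends them at query time without copying either into a warehouse. Everything in it is an assumption made for the example: the VirtualLayer class, the source names, and the two functions standing in for SaaS vendor APIs are hypothetical, and a real virtualization product would add query pushdown, caching, and security.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


# Hypothetical stand-ins for two SaaS silos; a real connector would
# call each vendor's API instead of returning fixed rows.
def crm_accounts() -> List[Row]:
    return [
        {"account_id": 1, "name": "Acme"},
        {"account_id": 2, "name": "Globex"},
    ]


def billing_invoices() -> List[Row]:
    return [
        {"account_id": 1, "amount": 100.0},
        {"account_id": 1, "amount": 50.0},
        {"account_id": 2, "amount": 75.0},
    ]


class VirtualLayer:
    """A logical catalog: data stays in its source and is fetched on demand."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], List[Row]]) -> None:
        self._sources[name] = fetch

    def scan(self, name: str) -> List[Row]:
        # Fetch at query time; nothing is materialized ahead of the query.
        return self._sources[name]()


def revenue_by_account(layer: VirtualLayer) -> Dict[str, float]:
    """Blend the two silos: join invoices to account names, then sum."""
    names = {row["account_id"]: row["name"] for row in layer.scan("crm.accounts")}
    totals: Dict[str, float] = {}
    for invoice in layer.scan("billing.invoices"):
        name = names.get(invoice["account_id"], "unknown")
        totals[name] = totals.get(name, 0.0) + invoice["amount"]
    return totals


layer = VirtualLayer()
layer.register("crm.accounts", crm_accounts)
layer.register("billing.invoices", billing_invoices)
revenue = revenue_by_account(layer)
print(revenue)
```

Because analytics code only talks to the layer, swapping one SaaS vendor for another means registering a new fetch function under the same name, which is the kind of insulation the answer argues for.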