The Path to Data Centricity 1: Why a Data Warehouse
Imagine you are the manager of a large store and you see your net income go down. Is it because your suppliers increased their prices? Because your customers are churning? Because your new products are failing?
Data warehouses were developed for business reporting, to support companies' decisions with facts and data. As the name indicates, a data warehouse is a consolidated, structured repository that stores data assets for reporting. Data warehouses are still built according to the same data modelling principles as before: ETL (extract, transform, load) jobs reshape the data into predefined schemas that are designed specifically for reporting.
Data warehouses have helped companies make decisions by:
- Collecting all meaningful structured data in a central repository
- Formatting the data to be easily analysed
- Allowing the definition of data subsets for specialized reporting
- Allowing the application of governance on data access
- Allowing the application of data quality assessments.
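To make these points concrete, here is a minimal sketch of a tiny warehouse built with Python's standard `sqlite3` module. Everything in it is an assumption made for illustration: the sample rows, the table names (`dim_product`, `fact_sales`), the view `toys_daily_revenue` and the function `build_warehouse` are hypothetical and do not come from any particular product. It shows a small ETL step loading raw records into a predefined reporting schema, a basic data quality check that rejects incomplete rows, and a view that defines a data subset for specialized reporting. Access governance is left out because SQLite has no user permissions.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from an operational system:
# (sale_date, product_name, category, quantity, unit_price_in_cents)
raw_sales = [
    ("2024-01-05", "Widget", "Toys", "3", "999"),
    ("2024-01-05", "Gadget", "Electronics", "1", "4950"),
    ("2024-01-06", "Widget", "Toys", "2", "999"),
    ("2024-01-06", "Gadget", "Electronics", None, "4950"),  # missing quantity
]


def build_warehouse(rows):
    """Load raw rows into a small star schema and return (connection, rejected rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Predefined reporting schema: one dimension table, one fact table.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_date     TEXT    NOT NULL,
            product_id    INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity      INTEGER NOT NULL,
            revenue_cents INTEGER NOT NULL
        );
        -- A data subset for specialized reporting: daily revenue for one category.
        CREATE VIEW toys_daily_revenue AS
            SELECT f.sale_date, SUM(f.revenue_cents) AS revenue_cents
            FROM fact_sales AS f
            JOIN dim_product AS p ON p.product_id = f.product_id
            WHERE p.category = 'Toys'
            GROUP BY f.sale_date;
        """
    )

    rejected = []
    for sale_date, name, category, quantity, unit_price in rows:
        # Data quality assessment: refuse rows with missing measures.
        if quantity is None or unit_price is None:
            rejected.append((sale_date, name))
            continue

        # Transform: register the product once, then look up its key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (name, category),
        )
        (product_id,) = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)
        ).fetchone()

        # Load: store typed measures in the fact table.
        qty = int(quantity)
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (sale_date, product_id, qty, qty * int(unit_price)),
        )

    conn.commit()
    return conn, rejected


conn, rejected = build_warehouse(raw_sales)
report = conn.execute(
    "SELECT sale_date, revenue_cents FROM toys_daily_revenue ORDER BY sale_date"
).fetchall()
print(report)    # daily revenue for the 'Toys' subset
print(rejected)  # rows that failed the quality check
```

Amounts are stored as integer cents to avoid floating-point rounding in the report. A real warehouse would typically run on a dedicated engine and load data on a schedule, but the shape of the work stays the same: extract, check, conform to the schema, then report from views.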
However, with the increasing variety of data and the advent of artificial intelligence, data warehouses have shown their limits:
- Some algorithms require direct access to the data
- Some databases are not based on SQL
- Data still needs to be transported to feed the warehouse, usually with ETLs (we will see in a later article that both the data hub and lakehouse patterns replace the ETLs with a streaming platform).
We can conclude that setting up a data warehouse is a must-have first step for data-driven businesses, because it gives companies a good understanding of the current status of their business (or at least its status as of the previous day).
Euranova has expertise in designing and setting up cloud infrastructures to support a data warehouse. We have modelling expertise in most data warehousing methodologies (Kimball, Inmon, Data Vault). If you want to know more about our services, you can contact us.
Coming up: The Path to Data Centricity 2: Why a Data Lake
This article is the first in a series of blog posts in which we highlight the differences between storage solutions that support your data-driven strategy. In the next part, we will discuss (governed) data lakes.