You are currently viewing Data Warehouse

Data Warehouse

(Big) Data management can be a great asset for a company if you carefully plan and execute a data strategy based on a clear Data Roadmap. With all the available data extraction, storage, modeling, and analysis opportunities, data can give you an edge over your competition. Doing nothing with data will do the opposite: lagging behind the competition, missing opportunities, and losing grip on an organization that demands agility in this era of information technology and a 24/7 global economy.

Nowadays, starting with query dumps and ad-hoc requests by management for a few KPIs/Dashboards is not the right approach if you want to create an agile decision-making culture based on a continuous flow of available data.

Without a plan and clear vision of corporate management, data and data management can hurt a company more than it does any good. An unstable data architecture with inconsistent (defragmented/incorrect) data does not support a company’s decision-making process. On the contrary: it leads to confusion, inefficiencies, and disinformation.

So, where do you start? When I started with data and Business Intelligence, this question popped up. The first question you must ask yourself is, “what does my company require?” Suppose your company is not actively executing a strategy of data management. In that case, the answer I would give is to start with a small and solid (but scalable) data architecture foundation, able to manage the expansion of data flows and data requirements/applications: start small.

When you start small, you don’t immediately require complex Data Lakes, Data Lake-Houses, Cloud data platforms, and Data-Meshes. An Enterprise Data Warehouse is more than sufficient, and you might not even have to shift to another data management approach.

In this post, I’ll explain the basics of a Data Warehouse and when this solution is the right fit for your company. I’ll also show the disadvantages and limitations of a Data Warehouse and my approach to setting up a Data Warehouse.

Origin of Data Warehouses

One of the first data storage concepts started in the 1980s and had its roots at IBM. Researchers Barry Devlin and Paul Murphy were responsible for developing the first business Data Warehouse. Their goal was to create an architectural model for data flow from operational systems to environmental decision support.

The concept of data storage evolved in the development of actual Data Warehouses during the 1990s and 2000s because of the immense cultural and technological changes that took place: computerization, globalization, and networking. Because of this, more data became available. More data sounds good, but only if this data is structured and organized. Corporations did not store data in a structured way yet, which meant that they had to cope with lousy system integration of data in which inconsistent and fragmented data was stored, making it impossible to generate helpful business information required for decision-making. To structure the data, corporations developed a solution to support them in consolidating the data they took from all databases. This structured data could help them in their decision-making: the Data Warehouse.

What is a Data Warehouse?

A Data Warehouse is a database that brings together data from different systems. You have to store data securely, and one of the prerequisites of a Data Warehouse is that it is immediately available when you or the application using it needs it. Databases exist for this purpose. Different applications use their databases, such as ERP, CRM, DMS, or HRM systems. For unambiguous occurrences in other systems and to link data from different systems, a central database is needed: the Data Warehouse. A Wata Warehouse contains structured data from different databases.

The previously mentioned operational systems’ databases (ERP, CRM, DMS, HRM, etc.) contain structured data. Their characteristic is that you can store this data in table-like structures: relational databases. The result of aggregation, the Data Warehouse, is another structured data database. With the support of business intelligence and data analytics, you can gain valuable insights from this data and make decisions based on it. A good Data Warehouse thus contributes significantly to the success of a data-driven organization.

The essence of business intelligence (BI) is to use the data present in the organization to make management decisions. The Data Warehouse contains the data on which BI dashboards base their visualizations, metrics, and conclusions. Data Warehouses and business intelligence are therefore linked. By storing your data in a structured and unambiguous way in a Data Warehouse, you can get the greatest return from a business intelligence solution.

When should you use a Data Warehouse

Integration

When you store data in different places and different systems, a Data Warehouse is a good fit. When you want a complete overview of all your business processes, customers, and service providers, you must gather and combine all data unambiguously in a central database. This way, you create a unified view of your entire organization and its activities. Your aim should always be to make it easier for your employees to interpret data and databases by linking and interlinking them.

Availability

If you have a Data Warehouse up and running, your employees can understand the meaning of data earlier and better. With separate databases, achieving this goal is often more challenging, especially if these databases are from different providers, have no standard structures, or (e.g., with cloud applications) can be challenging to access. A Data Warehouse ensures good and fast availability of all data, allowing your employees to reach valuable insights sooner.

Self Service

A Data Warehouse combined with an accessible business intelligence solution ensures that your employees can create analyses and reports autonomously. This way, you develop a culture of self-support, which is essential because most organizations have limited access to internal and external BI and ICT experts. You must also invest in a good infrastructure between the Data Warehouse and the BI tools. By properly training and coaching employees, you ensure that even non-technical employees can perform analyses at a high level. They do not have to fall back on the knowledge of the BI/ICT department, which gives the BI/ICT department extra capacity to invest in optimizing business processes with the development of ICT solutions instead of solving basic requests.

Quality of Data

You can not always guarantee the quality of data in the databases of operational systems. No data quality checks occur by default, and users can independently add or change data to the system. As a result, data often turns out to be incorrectly or incompletely stored. With a Data Warehouse, you can improve the data quality by automatically recognizing and correcting errors.

Quick Response

Developers have designed the operational systems in which your company generates data (ERP, CRM, HRM, etc.) to perform numerous small data exchanges at a high rate. The databases of these systems are less suitable for analyzing your business data, which involves simultaneously collecting, merging, processing, and displaying millions of data elements. The structure of a Data Warehouse lends itself better to this, and it is optimized to quickly provide large amounts of information in one go.

Reports

The structured architecture of a Data Warehouse allows your staff to create analyses and reports quickly and easily, which are also easily adaptable. For example, it is easy to switch between months and quarters when analyzing finances. If you ‘play’ with processing and visualizing data in this way, you can gain valuable and unique insights.

Historical Data

A loose database linked to operational systems often does not record the data’s history. There is a good chance that the old value is immediately replaced by the new, without storing the old value anywhere. As a result, much valuable information is lost, which can provide a complete picture of data and performance over time and help you set up a predictive model. By incorporating historical data into analysis, a much more reliable picture emerges than if you only use current information. A Data Warehouse also preserves historical data by storing all changes in the original databases.

Reducing Data Management driven Business Interruptions

By collecting all data in a Data Warehouse, you can perform your analysis from the Data Warehouse, relieving the pressure on the underlying (separate) databases (ERP, CRM, HRM, etc.). The Data Warehouse database allows you to analyze large amounts of data without the source systems experiencing operational problems due to overload. Your business processes can continue unimpeded while you can perform complex and data-intensive analyses in your Data Warehouse.

Disadvantages and Limitations of Data Warehouses

Designing a data warehouse can be a time-consuming and costly task that requires specialized knowledge. It is a lot of work to pull all the data together, process it, and make it available. Starting small is essential, beginning with the business data representing the highest potential value. Make business departments aware they should share all available data. Therefore, explain the usefulness and importance of a Data Warehouse to all departments. By starting to implement a Data Warehouse on a small scale, you can prove its added value step by step. Employee/stakeholder support will gradually increase by beginning with a simple form and expanding it each time.

How to implement a Data Warehouse?

Charting Information needs

Determine what management decisions you want to make in the near future, what insights you want to have, what data-driven questions you (and the business) want to have answers to, and what kind of practices and data your organization is dealing with. Identifying the information needs will help your transition to a Data Warehouse in the most efficient way possible.

Identify all Data Sources

See which databases provide the information needed within your organization. Investigate if your organization meets each data request, how the data is stored, and the quality level of the data. Determine how to deal with privacy-sensitive information because of additional restrictions due to legal obligations. For example, specific personal data, such as ethnic origin or religious beliefs, may only be processed in a Data Warehouse under strict conditions, so always involve your Data Security Officer in this process.

Create an Implementation Schedule

By looking at information needs, you can determine your priorities. Instead of processing all the data first, I would recommend starting with comparatively small and easy projects that do not require a lot of data. That way, you save time, and the data warehouse’s first (positive) results are quickly visible. You can opt for a temporary solution for immediate issues in which the required information is not available yet from your Data Warehouse. This solution is loading and using only current information and excluding historical data. Excluding historical data generates less reliable data but is still useful for a short-term report that you can update later with historical data: don’t wait too long with this, though, because it limits the functionality of your report(s).

Determine the technique for setting up your Data Warehouse

There are different techniques for setting up your Data Warehouse. Design your Data Warehouse carefully to fit the needs of your organization:

  • Determine how the data should be obtained, integrated, and delivered.
  • Determine whether you want a Data Warehouse locally (on-premises) or in the cloud.
  • Look at how the current data stream is processed and how you can visualize and interpret data

Implementation of the Data Warehouse

Implementing a data warehouse is not just about making the BI tools available to users, but it is also about changing how your organization works. Your employees must become familiar with the Data Warehouse and its capabilities step by step. Ensure that your employees’ level of knowledge increases as the Data Warehouse gives you more abilities. Do not postpone introducing the new data-driven approach until you finish your Data Warehouse. By training and involving your employees at the start of the project, you can secure the maximum value of data extraction from your Data Warehouse.

Final Thoughts

When you start with a data management strategy, a Data Warehouse is more than sufficient: start small and don’t make it too complex at first. However, ensure that you do not take the Data Warehouse approach of 30 years ago. You should really consider setting up a Data Warehouse in a Cloud environment instead of running a Data Warehouse locally (on premise). If you set up a Data Warehouse in the cloud, you can easily scale up and extract data quickly without any dependancies (additional hardware, maintenance, etc.). In this setup, your approach of ETL (Extract, Transfer, and Load) changes into a process of ELT: Extract, Load, and Transfer. This is a more modern and effective approach of setting up a Data Warehouse and very common to do this in the Cloud. I would not use this approach when you decide for an on premise solution because of the limitiations I already described (mainly dependancies of system performance and system expansion).

Feel free to contact me if you have questions or in case you have any additional advice/tips about this subject. If you want to keep me in the loop if I upload a new post, make sure to subscribe, so you receive a notification by e-mail.

Gijs Groenland

I live in San Diego, USA together with my wife, son, and daughter. I work as Chief Financial and Information Officer (CFIO) at a mid-sized company.

Leave a Reply