An Introduction to Data Warehouse
This article was published as a part of the Data Science Blogathon.
The following is an in-depth article explaining what data warehousing is as well as its types, characteristics, benefits, and disadvantages. A few of the topics which we will cover in the article are:
1. What is a data warehouse?
2. Why is data warehousing crucial?
3. Applications & characteristics of a data warehouse
4. Advantages and disadvantages of a data warehouse
The article is suitable for anyone ranging from a database professional, database administrator, and freshers who want to understand the basic concepts of the data warehouse.
What is Data Warehouse?
A Data Warehouse (DW) is a repository of large amounts of organized data. The data can be consolidated from multiple sources. DWs are relational databases designed for analytical reporting and on-time decision-making in organizations. The data used for this purpose is isolated and optimized from the source transaction data, so it won’t affect the main business. When an organization introduces any business change, then DW is used to analyse the effects of that change, thus DW can also be used to monitor non-decision making processes.
Data Warehouses are mainly read-only systems since operational data is kept separate from the data warehouses. This provides a good query writing environment for retrieving the highest volume of data. Therefore, DW will act as the backend engine for Business Intelligence tools that display reports and dashboards to business users. It is widely used in banking, financial, retail sectors, etc.
Figure 1: The data marts are subsets of the data warehouse’s data.
Why is Data Warehousing Crucial?
The main reasons why data warehousing is crucial is as follows:
- Data warehouses combine all operational data from several heterogeneous sources of “different formats” and through the process of extract, transform, and load (ETL), they load the data into DW in a “standardized dimensional format” across an organization.
- A data warehouse maintains both current and historical data for analytical reporting and fact-based decision making.
- Improve your business decisions. Successful business leaders develop data-driven strategies and rarely make decisions without considering the facts. Data warehousing makes it easier for corporate decision-makers to access different data sets faster and more efficiently, and to derive insights that will guide their business and marketing strategies.
- Data warehouse platforms enable business leaders to access their organization’s historical activities and evaluate initiatives that have been successful or unsuccessful in the past. It enables executives to see where they can reduce costs, maximize efficiency, and increase sales to boost profit.
Characteristics of a Data Warehouse
The data warehouse’s design is based on the following criteria: subject-oriented, integrated, non-volatile, and time-varying.
1) Subject Oriented: We can call a data warehouse subject-oriented if we can analyse data in a particular area rather than by applying wise data. As a result, decision making becomes simpler. Within an education system, the subject areas could be students, subjects, marks, teachers, etc.
2) Integrated: The data in the data warehouse is integrated from different sources, such as other relational databases, flat files, etc. An immense amount of data is fetched for effective data analysis. However, different sources of data may be in different formats, which may cause data conflicts. All this data is brought together in a consistent format across the whole system in the data warehouse.
3) Non-volatile: Once the data is loaded into the data warehouse, it cannot be changed. In a logical sense, this is acceptable, as a frequent change of data prevents analysis. It is possible to load the operational database on a scheduled basis into a data warehouse. During this process, new data is added, but earlier data is not discarded and remains as historical data.
4) Time-Variant: All the historical data along with the recent data in the data warehouse are crucial to retrieve data of any duration. In order to compare it with the previous years and analyse the trends, all the old data that is 6 months old, 1-year-old, or even older will be required for reports, graphs, etc.
Figure 2:Characteristics of data warehouse
Data Warehouse Applications
In data warehousing, Business Intelligence (BI) is used for decision-making. The BI plays a major role once the data in the DW has been loaded by analysing it and presenting it to the business users. The term “data warehouse applications” effectively implies how data can be manipulated and utilized.
Data warehouse applications fall under three categories: information processing, analytical processing, and data mining.
Information Processing: A data warehouse makes it possible to process the information it stores. Data can be processed through querying, basic statistical analysis, and reporting.
Analytical Processing: The information stored in a data warehouse can be processed analytically. With the help of basic OLAP(Online Analytic Processing Server) operations, such as slice-and-dice operations, drill down and drill up, and pivoting, the data can be analysed.
Data Mining: Through data mining, knowledge can be discovered by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. Results from data mining can be presented visually.
1. Information Processing: This is a type of application where the data warehouse enables direct, one-on-one contact with the data it stores. Using direct queries on the data with basic statistical analysis of the data.
The tools which DW supports for information processing are:
1.1) Query Tools: By using query tools, the user can explore the data and generates reports or graphics in accordance with the business requirements.
1.2) Reporting Tools: Reporting tools are used when the business wants to see the results in a certain format on a regular basis, such as daily, weekly, or monthly. This type of report can be saved and retrieved at any time.
1.3) Statistics Tools: To generate these results, statistics tools will be used if the business wants to examine data from a broader perspective. By understanding these strategic results, businesses can make predictions and conclusions.
2. Analytical Processing: This is an application that allows the analysis of data stored in a data warehouse. Slice-and-Dice, Drill Down, Roll Up, and Pivoting are some of the operations that can be used to evaluate the data.
If we focus on a single area as part of the business requirement, slicing evaluates the dimensions of that specific region according to the criteria and returns the findings. Dicing is a program that performs analytic processes. Dicing provides a variety of viewpoints by zooming in on a select set of properties across all dimensions. One or more successive slices are used to calculate the dimensions.
2.2) Drill Down: Drill down is an operation for traversing down a summary number to minor detailed levels if the business wishes to get to a more detailed level of any summary number. This gives a good indication of what’s going on and where the company should concentrate its efforts.
3. Data Mining: This is a type of application in which the data warehouse facilitates data knowledge discovery and the findings are visualized using visualization tools. It’s difficult to query and drill down the data warehouse to acquire all potential insights into data as the amount of data grows in various industries. Then data mining enters the scene to help with knowledge discovery.
Figure 3: Applications of data warehouses
Benefits and Disadvantages of Data Warehouses
When a data warehouse system is operational, a business gains the following advantages:
- Business Intelligence Enhancement
- System and query performance improvement.
- Multiple Sources of Business Intelligence
- Data Access in Real-Time
- Intelligence from the past
- Exceptional Return on Investment
Despite the fact that it is a very successful system, it is useful to be aware of some of its flaws:
- Creating a Data Warehouse is an extremely time-consuming and difficult task.
- The cost of maintenance is high because the system requires constant improvements.
- Developers, testers, and users should have adequate training in order to comprehend the DW system.
- It is possible that sensitive data won’t be able to be fed into DW for decision-making.
- Any business process source system restructuring has a significant impact on DW.
The way we store, organize, and analyse data has shifted dramatically in recent years. Cloud-based data warehouses offer improved methods for extracting and analysing information from data. They make data warehousing accessible and effective for small and medium organizations as well as large and well-funded corporations. Instead of investing months and millions of dollars on a monolithic Enterprise Data Warehouse, a cloud-based data warehouse can now be set up in days, with managed services and cloud-based storage starting at a few hundred dollars per month.
I hope I was able to provide a brief background of data warehousing, in the article we covered the definition, types, characteristics, benefits, and drawbacks of Data Warehousing.
Figure 1: http://mbenhaddou.com/2020/01/16/advantages-and-disadvantages-of-a-data-mart/
Figure 3: https://whatisdbms.com/12-applications-of-data-warehouse/
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.