Data Warehouse architecture
A data warehouse integrates data from heterogeneous sources under a unified schema. There are two approaches to constructing a data warehouse, the top-down approach and the bottom-up approach, both explained below.
1. Top-down approach:
The essential components are discussed below:
- External Sources –
An external source is any source from which data is collected, irrespective of the type of data. The data can be structured, semi-structured, or unstructured.
- Staging Area –
Since the data extracted from the external sources does not follow a particular format, it must be validated before it is loaded into the data warehouse. For this purpose, it is recommended to use an ETL tool.
- E (Extract): Data is extracted from the external data source.
- T (Transform): Data is transformed into the standard format.
- L (Load): Data is loaded into the data warehouse after it has been transformed into the standard format.
- Data Warehouse –
After cleansing, the data is stored in the data warehouse, which acts as the central repository. It holds the integrated data along with its metadata, and the data marts are derived from it. Note that in the top-down approach the data warehouse stores the data in its purest form.
- Data Marts –
A data mart is also part of the storage component. It stores the information for a particular function of the organization that is handled by a single authority. An organization can have as many data marts as it has functions. A data mart can also be described as a subset of the data stored in the data warehouse.
- Data Mining –
Data mining is the practice of analyzing the large volumes of data held in the data warehouse. It applies data mining algorithms to find hidden patterns in a database or data warehouse.
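The top-down flow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real ETL tool: the source rows, the field names (`customer`, `amount`, `dept`), and the helper functions are all hypothetical, and the warehouse is modeled as a plain in-memory list.

```python
# Hypothetical raw rows from two external sources with inconsistent formats.
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "dept": "Sales"},
    {"customer": "Carol", "amount": "n/a", "dept": "Sales"},  # invalid amount
]
web_rows = [{"customer": "BOB", "amount": "80", "dept": "marketing"}]


def extract(*sources):
    """E: collect rows from every external source into one staging list."""
    staged = []
    for source in sources:
        staged.extend(source)
    return staged


def transform(staged):
    """T: validate each row and convert it to one standard format."""
    cleaned = []
    for row in staged:
        try:
            cleaned.append({
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
                "dept": row["dept"].strip().lower(),
            })
        except (KeyError, ValueError):
            continue  # drop rows that fail validation
    return cleaned


def load(warehouse, rows):
    """L: append the standardized rows to the central repository."""
    warehouse.extend(rows)


def build_data_mart(warehouse, dept):
    """Derive a data mart: the subset of warehouse rows for one function."""
    return [row for row in warehouse if row["dept"] == dept]


warehouse = []  # assumption: an in-memory list stands in for the warehouse
load(warehouse, transform(extract(crm_rows, web_rows)))
sales_mart = build_data_mart(warehouse, "sales")
```

Here the warehouse is filled first and `sales_mart` is derived from it afterwards, which is the ordering that distinguishes the top-down approach.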
2. Bottom-up approach:
- First, the data is extracted from external sources (as in the top-down approach).
- Then, the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. Each data mart addresses a single business area.
- These data marts are then integrated into the data warehouse.
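The bottom-up ordering can be sketched the same way. Again, this is a rough illustration under stated assumptions: the staged rows, the `dept` field used to pick a business area, and the helper names are hypothetical, and plain Python lists and dicts stand in for real storage.

```python
# Hypothetical rows that have already been standardized in the staging area.
staged_rows = [
    {"customer": "Alice", "amount": 120.5, "dept": "sales"},
    {"customer": "Bob", "amount": 80.0, "dept": "marketing"},
    {"customer": "Carol", "amount": 45.0, "dept": "sales"},
]


def load_into_marts(rows):
    """Create one data mart per business area directly from staged rows."""
    marts = {}
    for row in rows:
        marts.setdefault(row["dept"], []).append(row)
    return marts


def report_total(mart):
    """A mart can serve reports before any warehouse exists."""
    return sum(row["amount"] for row in mart)


def integrate(marts):
    """Combine the individual data marts into a single data warehouse."""
    warehouse = []
    for dept in sorted(marts):
        warehouse.extend(marts[dept])
    return warehouse


marts = load_into_marts(staged_rows)      # data marts are built first
sales_total = report_total(marts["sales"])
warehouse = integrate(marts)              # the warehouse is assembled last
```

The only structural difference from the top-down sketch is the order of the last two steps: reporting works on the marts immediately, and the warehouse is produced by integrating them.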