![]() In this section, the errors found can be fixed, which is based on the Metadata of a pre-defined set of rules. It is crucial to set up the frequent meetings with owners of the source to detect the changes which might impact the data warehouse and the associated ETL process. The future of the source applications depends upon the current data issues of origin, the corresponding data models/ metadata repositories, and receiving a walkthrough of source model and business rules by source owners. In the source Analysis, the focus should not only on the sources but also on the surroundings, to obtain the source documentation. The source to target mapping highly depends on the quality of the source analysis. Data analysis will become the communication medium between the source and the data warehouse team for tackling the outstanding issues. It is essential to capture the results of this assessment correctly. For analyzing the data, it is easier to identify data quality problems such as missing data, inconsistent data, invalid data, constraint problems, parent-child issues such as orphans, duplicated. To analyze the results of the profiled data, Data Analysis is used. It gives a huge amount and variety of data. It is essential to use the correct tool, which is used to automate this process. Data profiling will use analytical techniques to discover the actual content, structure, and quality of the data by analyzing and validating the data pattern and formats and by identifying and validating redundant data across the data source. The objective of data profiling is to know about the sources. ![]() It is used to generate statistics about the sources. Finally, the data lineage should be foreseen throughout the entire ETL process, included the error records produced. Quality Assurance is a process between the different stages that could be defined depending on the need, and these processes can check the completeness of the value do we still have the same number of records or total of specific measures between different ETL stages? This information should be captured as metadata. This metadata information can be plugged into all dimension and fact tables and can be called an audit dimension. Secondly, we should have to be focused on ETL performance. This technique can capture all errors consistently which is based on a pre-defined set of metadata business rules and enables the reporting on them through a simple star schema, which enables a view on the data quality evolution over the time. Here an inside-out approach which is used in Ralph Kimbal screening technique could be used. A proper balance should be there between screening the incoming data as much as possible and not slowing down the entire ETL process when too much checking is done. Monitoring of the data enables the verification of the data, which is moved throughout the entire ETL process and has two main objectives. ETL will reduce the throughput time of the different source to target development activities which form the bulk of the traditional ETL effort. In a medium to large scale data warehouse environment, it is important to standardize the data as much as possible instead of going for customization. Here, the emphasis should be on the functionality offered by the ETL-tool and using it most effectively. Transform and loading the data is all about to integrate the data and finally moving the combined data to the presentation area, which can be accessed by the front end tools by the end-user community. It also states that the most applicable extraction method should be chosen for source date/time stamps, database log tables, hybrid depending on the situation. The main goal of extraction is to collect the data from the source system as fast as possible and less convenient for these source systems. ![]() In other words, ETL focus on Data Quality and MetaData. In today's data warehousing world, this term is extended to E-MPAC-TL or Extract, Monitor, Profile, Analyze, Cleanse, Transform, and Load. ETL stands for Extract, Transform, and Load.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |