Web Data Analysis: Data Quality, the Prerequisite for Analysis (Part 1)

Data quality is the foundation of data analysis and the most important prerequisite for the validity and accuracy of its conclusions. Data quality assurance is an important part of data warehouse architecture and an important component of ETL.

We usually filter out dirty data through data cleansing to guarantee the validity and accuracy of the underlying data. Cleansing generally takes place before the data enters the data warehouse. Once data is in the warehouse, it must be usable: the upper-layer statistics and aggregations take this batch of data as their base data set and perform no further checks or filtering. This ensures that every multi-dimensional aggregate and summary is strictly and consistently derived from the underlying base data.

Nowadays, however, when we build a data warehouse we generally do not place every cleansing step before loading. Part of the cleansing work is usually performed after loading, mainly because the data warehouse has its own strengths in data processing, so some cleansing is simpler and more efficient inside the warehouse. As long as cleansing happens before the data is counted and aggregated, we can still ensure that what the warehouse retains and uses is "clean" base data.
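The ordering described above (cleanse first, then aggregate without further checks) can be sketched in a few lines of Python. This is only a minimal illustration: the record fields (`page`, `visitor_id`, `duration_sec`) and the validity rules are assumptions made for the example, not part of any particular warehouse or tool.

```python
from collections import defaultdict

# Hypothetical raw page-view records; the field names are illustrative only.
raw_rows = [
    {"page": "/home", "visitor_id": "a1", "duration_sec": 30},
    {"page": "/home", "visitor_id": None, "duration_sec": 12},  # missing visitor
    {"page": "/cart", "visitor_id": "b2", "duration_sec": -5},  # impossible value
    {"page": "/cart", "visitor_id": "c3", "duration_sec": 45},
]


def is_clean(row):
    """Assumed validity rules: visitor present and duration non-negative."""
    return row["visitor_id"] is not None and row["duration_sec"] >= 0


def clean(rows):
    """Cleansing step: keep only the rows that pass the validity rules."""
    return [row for row in rows if is_clean(row)]


def aggregate(rows):
    """Upper-layer step: sum durations per page, trusting its input as-is."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["page"]] += row["duration_sec"]
    return dict(totals)


# Cleansing must run before aggregation, whether it happens before or
# after the data is loaded into the warehouse.
clean_rows = clean(raw_rows)
summary = aggregate(clean_rows)
print(summary)
```

Because `aggregate` performs no validation of its own, any dirty row that slipped past `clean` would silently distort every total built on top of it, which is why the cleansing step has to come first.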

I was discussing data quality assurance with colleagues a while ago, and since I have worked on data warehouses before, I decided to organize the topic systematically. The data warehouse I worked on earlier was built on Oracle, so the tool we chose was Oracle Warehouse Builder (OWB), the data warehouse construction tool provided by Oracle. It offers a relatively complete data quality assurance process, which consists of three main parts:

Data Profiling

Data Auditing

Data Correcting

Data Profiling

I have not yet found a fully satisfactory translation for "Data Profiling". Oracle renders it as "data summary analysis", but "summary analysis" fails to capture what the word "Profiling" conveys. Anyone who has watched the TV series Criminal Minds will know the FBI's Behavioral Analysis Unit (BAU): in every episode the team builds a criminal profile of the offender, analyzing their background, behavior patterns, and mental state. Profiling, then, is more a process of dissecting the subject in depth than of summarizing it.




