Saturday, 1 October 2011

Data Mining - A Short Introduction

Data mining is an integral part of data analysis which contains a series of activities that goes from the 'meaning' of the ideas, to the 'analysis' of the data and up to the 'interpretation' and 'evaluation' of the outcome. The different stages of the technique are as follows:
Objectives for Analysis: It is sometimes very difficult to statistically define the phenomenon we wish to analyze. In fact, the business objectives are often clear, but the same can be difficult to formalize. A clear understanding of the crisis and the goals is very important setup the analysis correctly. This is undoubtedly, one of the most complex parts of the process, since it establishes the techniques to be engaged and as such, the objectives must be crystal clear and there should not be any doubt or ambiguity.
Collection, grouping and pre-processing of the data: Once the objectives of the analysis are set and defined, we need to gather or choose the data needed for the study. At first, it is essential to recognize the data sources. Usually data are collected from the internal sources as the same are economical and more dependable and moreover these data also has the benefit of being the outcome of the experiences and procedures of the business itself.

Investigative analysis of the data and their conversion: This stage includes a preliminary examination of the information available. It involves a preliminary assessment of the significance of the gathered data. An exploratory and / or investigative analysis can highlight the irregular data. An exploratory analysis is important because it lets the analyst choose the most suitable statistical method for the subsequent stage of the analysis.
Choosing statistical methods: There are multiple statistical methods that can be put into use for the purpose of analysis, so it is very essential to categorize the existing methods. The choice statistical method is case specific and depends on the problem and also upon the type of information available.

Data analysis on the basis of chosen methods: Once the statistical method is chosen, the same must be translated into proper algorithms for working out the results. Ranges of specialized and non-specialized software are widely available for data mining and as such it is not always required to develop ad hoc computation algorithms for the most 'standard' purpose. However, it is essential that the people managing the data mining method well aware and have a good knowledge and understanding of the various methods of data analysis and also the different software solutions available for the same, so that they may adapt the same in times of need of the company and can flawlessly interpret the results.
Assessment and contrast of the techniques used and selection of the final model for analysis: It is of utmost necessity to choose the best 'model' from the variety of statistical methods accessible. The selection of the model should be based in contrast with the results obtained. When assessing the performance of a specific statistical method and / or type, all other dependent and / or relevant criterions should also be considered. The other criterions may be the constraints on the company both in terms of time and resources or it may be in terms of quality and the accessibility of data.
Elucidation of the selected statistical model and its employment in the decision making process: The scope of data mining is not limited to data analysis rather it is also includes the integration of the results so as to facilitate the decision making process of the company. Business awareness, the pulling out of rules and their use in the decision process allows us to proceed from the diagnostic phase to the phase of decision making. Once the model is finalized and tested with an information set, the categorization rule can be generalized. But the inclusion of the data mining process in the business should not be done in haste; rather the same should always be done slowly, setting out sensible and logical aims. The final aim of data mining is to be an integral supporting part of the company's decision making process.


No comments:

Post a Comment