Data Mining Inside story

#data #mining #anomaly #association #cluster

Akash Deep Nov 30 2021 · 4 min read

What is Data Mining?

Data Mining is the process of extracting valid, useful, previously unknown and comprehensible information from data and using it to make proactive, knowledge-driven business decisions. It uses statistical procedures to find unexpected patterns in data and to identify associations between variables.

Data Mining Tasks

Let's now move on to the common classes of Data Mining tasks: Anomaly Detection, Association Learning, Cluster Detection, Classification and Regression.

Anomaly Detection

Anomaly Detection refers to identifying items, events or observations that do not adhere to the expected pattern or the other items in the dataset.

Anomaly Detection Example

A good example is how the tax department models typical tax returns and then identifies returns that differ from this model using anomaly detection. This is used for audits and reviews.
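
As a rough sketch of the idea, the snippet below flags values that sit far from the typical value, using a modified z-score built on the median. The deduction amounts, the function name `flag_anomalies` and the 3.5 cutoff are illustrative assumptions, not how any tax department actually works.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical, nothing stands out
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical deduction amounts claimed on eight tax returns.
deductions = [4200, 3900, 4500, 4100, 4300, 25000, 4000, 3800]
suspicious = flag_anomalies(deductions)
print(suspicious)  # index of the return worth a closer look
```

The median is used instead of the mean so that a single extreme return does not distort the "typical" model it is being compared against.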

Association Learning

Association learning discovers relationships between seemingly unrelated items or events that tend to occur together in the data.

Association Learning Example

Association learning is the type of data mining that drives the recommendation engines in major sites like Amazon and Netflix. This would let you know that customers who bought a particular item also bought another item.
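
A minimal sketch of the "customers who bought X also bought Y" idea: count how often items appear together and compute the confidence of a rule. The baskets and the helper name `confidence` are made up for illustration; real recommendation engines are far more involved.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per customer order.
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"sleeping bag", "stove"},
    {"tent", "sleeping bag", "lantern"},
]

# How many baskets contain each item, and each (sorted) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

tent_to_bag = confidence("tent", "sleeping bag")
print(f"tent -> sleeping bag: {tent_to_bag:.2f}")
```

Here three of the four baskets with a tent also contain a sleeping bag, so the rule has a confidence of 0.75.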

Cluster Detection

Cluster Detection is a type of pattern recognition particularly useful in recognizing distinct clusters or sub-categories within the data.

Cluster Detection Example

The purchasing habits of hobbyists like gardeners, artists and model builders would look quite different. By analyzing the purchasing behavior using clustering algorithms, one can detect the various subgroups within the dataset.
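
To make this concrete, here is a small k-means sketch in plain Python. The spending figures (garden supplies, art supplies), the function name `k_means` and the choice of starting centroids are assumptions for illustration; k-means is one clustering algorithm among many.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer: (garden supplies, art supplies).
gardeners = [(90, 5), (80, 10), (85, 8)]
artists = [(5, 95), (10, 85), (8, 90)]
purchases = gardeners + artists

# Start from two arbitrary customers as initial centroids.
centroids, clusters = k_means(purchases, [purchases[0], purchases[-1]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No group labels are given to the algorithm; the two subgroups emerge purely from how the spending patterns sit apart from each other.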

Classification

Classification assigns items to one of a set of predefined groups based on their attributes.

Classification Example

Classification algorithms can be trained to detect systematic differences between the items in each group by learning from a large set of pre-classified examples. The trained algorithm can then apply these rules to new, unclassified items. For instance, a classifier can predict which borrowers are likely to default on loan payments.
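
One simple way to learn from pre-classified examples is a k-nearest-neighbours vote, sketched below. The borrower features (debt-to-income ratio, missed payments), the labels and the function name `classify` are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical pre-classified borrowers:
# ((debt-to-income ratio, missed payments), outcome).
training = [
    ((0.10, 0), "repaid"),
    ((0.15, 1), "repaid"),
    ((0.20, 0), "repaid"),
    ((0.55, 4), "defaulted"),
    ((0.60, 3), "defaulted"),
    ((0.45, 5), "defaulted"),
]

def classify(features, examples, k=3):
    """Label a new item by majority vote among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: dist(features, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Features are left unscaled here for brevity; a real model would normalise them.
prediction = classify((0.50, 4), training)
print(prediction)
```

The new borrower is closest to the three examples that defaulted, so the vote labels it accordingly.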

Regression/Prediction

Regression/Prediction uses the historical relationship between a dependent and one or more independent variables to predict values of the dependent variable.

Regression Example

It is common practice for businesses to use regression to predict stock prices, currency exchange rates, sales, productivity gains and so on. For example, a company might use regression to understand how past advertising expenses have affected sales. Here, the dependent variable is sales and the independent variables are advertising expenditure, the number of sales reps and the commission paid.
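
The sketch below fits a least-squares line with a single independent variable (advertising spend) to keep things short; the figures and the helper name `fit_line` are invented for illustration. Handling several independent variables at once would need multiple regression, which is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: advertising spend and resulting sales (both in $1000s).
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

slope, intercept = fit_line(advertising, sales)
# Predict sales for a planned advertising spend of 60.
forecast = slope * 60 + intercept
print(slope, intercept, forecast)
```

The slope estimates how much sales change per extra unit of advertising, which is the kind of insight the example above describes.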

Below we will learn about the data mining process, from raw data to the point of knowledge discovery.

Knowledge Discovery Process

Now that you have an idea of how data is processed to create knowledge, let's learn about the stages of the Knowledge Discovery Process: Problem Definition > Data Preparation > Data Mining > Data Analysis > Knowledge Assimilation

Problem Definition Stage

The problem definition stage is the initial phase of a data mining project. It focuses on understanding the project objectives and requirements and on defining the data mining problem. Based on this, you can identify the data requirements and the models to use.

Data Preparation Stage

This stage involves three key activities and can require more than 70% of the total data mining effort.

  • 'Data Selection': We identify the sources of information and select the subset of data required for analysis.
  • 'Data Pre-processing': We join data from various tables and resolve issues such as data conflicts, outliers, and missing data.
  • 'Data Transformation': We use conversions and combinations to generate new data fields like ratios and discretized continuous values.

Data Mining Stage

In this stage, we identify the data mining technique, algorithm and tools to be used. Then we apply the algorithm to a sample data set (also known as training data) and tune its control parameters until we get a satisfactory result. Finally, we validate the model by running it against separate data that was not used for tuning (also known as test data).
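
The train, tune and validate loop can be sketched as follows. The data set, the single threshold parameter and the helper name `accuracy` are deliberately simplistic assumptions; the point is only that tuning uses the training data while validation uses data held back from tuning.

```python
# Hypothetical labelled records: (score, label), where the label is True
# for scores of 50 or more.
records = [(x, x >= 50) for x in range(0, 100, 5)]

# Hold back every fourth record as test data; train on the rest.
test_data = records[::4]
train_data = [r for i, r in enumerate(records) if i % 4 != 0]

def accuracy(threshold, data):
    """Fraction of records that the rule `score >= threshold` labels correctly."""
    correct = sum((x >= threshold) == label for x, label in data)
    return correct / len(data)

# Tune the control parameter on the training data only.
candidates = range(0, 101, 10)
best_threshold = max(candidates, key=lambda t: accuracy(t, train_data))

# Validate the tuned model on data it has not seen.
test_accuracy = accuracy(best_threshold, test_data)
print(best_threshold, test_accuracy)
```

Keeping the test data out of the tuning step is what makes the final accuracy a fair estimate of how the model behaves on new data.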

Data Analysis Stage

In this stage, we evaluate the mined patterns with respect to the defined goals. We interpret the Data Mining output, in the form of rules or patterns, to find new and potentially useful knowledge.

This is the Holy Grail of Knowledge Discovery!

Knowledge Assimilation Stage

In this stage, we implement the business insights derived from the Data Mining process in the organization’s systems for further action. The knowledge becomes active, which means that we can make changes to the system and measure the impact of those changes. The success of this step determines how effective the Knowledge Discovery process is.

The final deployment would involve building computerized systems to capture relevant data and to make real-time recommendations to the business.

Data Mining models also need to be continuously monitored and refined, because economic factors, business changes and competitor initiatives can all affect model performance.

Data Mining Team

Let's understand the typical team composition required for Data Mining projects. These projects require people who have not just great minds but also a great eye for data. A Data Mining team typically involves:

  • Domain Expert
  • Database Administrator
  • Statistician
  • Mining Specialist (Data Miner)

Domain Experts

Domain Experts are usually people in higher business management functions who know the business environment, processes, customers, and competition.

Database Administrators

Database Administrators come with a good understanding of company data, where it is stored, how it is stored, how to access it and how to relate it to other data sources.

Statisticians

Statisticians validate and analyze datasets. Their key tasks include analysis, interpretation, and presentation of statistical outputs.

Data Miners

Data Miners apply data mining techniques and technically interpret the results. They usually have a background in data analysis and statistics.

Roles in Data Mining Summary

In Data Mining projects, Data Miners play a central role. They work with Domain Experts for business guidance on their results, with Database Administrators (DBAs) for access to the data required for their activities, and with Statisticians for validating analysis and interpreting statistical outputs.
