
What is the CRISP-DM Methodology?

Adam Brian

2 years ago

Table of Contents
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. The CRISP-DM methodology is practical, flexible and useful when solving business issues with analytics.
CRISP-DM is a methodology, or process model, that gives you a blueprint for conducting a data mining project. It was initiated in 1996 by companies such as Daimler-Benz, ISL, NCR and OHRA. They drew on the experience of around 200 data mining users and tool vendors before arriving at this model. It is a non-proprietary, documented and freely available process, designed so that everybody can use it.
How does it help?
CRISP-DM provides a roadmap, best practices and a structure for getting better and faster results from data mining. That gives the business a framework to follow while planning and carrying out a data mining project.
Business Understanding
Business Understanding is the first phase. Here we understand the project from a business perspective and convert the business objective into a data mining objective, broken down into data mining sub-tasks to which modeling techniques can be applied.
Four major tasks to focus on in Business Understanding:
1.     Determine business objectives – Here we work out the true goal of the project and the important factors we need to know about the business.
2.     Assess the situation – Here we list the assumptions we need to make and carry out the cost-benefit analysis.
3.     Determine data mining goals – Here we set objectives for the team or the business.
4.     Produce a project plan – Here we provide a project plan with a specific outline, propose a timeline, and list the tools and techniques that we are going to use.
Data Understanding
Data Understanding is the second phase. It starts with the initial collection of data, where we become familiar with the data, assess its quality, and form initial hypotheses about hidden information from any interesting subsets we find.
Four major tasks to focus on in Data Understanding:
1.     Collecting the data – Here we collect and acquire the data, and make a note of any problems encountered along the way.
2.     Describing the data – Here we examine the surface properties of the data: its format, its quantity (for example, the number of records and fields in each table), and any problems noticed while acquiring it.
3.     Exploring the data – Here we record our first findings and initial hypotheses and present them in a data exploration report.
4.     Verifying data quality – This is a significant task. Here we look for missing attributes, blank fields, misspelled values and conflicts in the data, and make a note of each.
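To make the describing and data quality tasks above a little more concrete, here is a minimal sketch in plain Python. The records, the field names and the helper functions describe and quality_report are all invented for illustration; a real project would more likely use a data analysis library.

```python
from collections import Counter

# Hypothetical customer records, invented for illustration only.
records = [
    {"id": 1, "city": "Berlin", "age": 34},
    {"id": 2, "city": "berlin", "age": None},
    {"id": 3, "city": "", "age": 51},
    {"id": 4, "city": "Munich", "age": 29},
]

def describe(rows):
    """Describe the data: number of records and the field names seen."""
    fields = sorted({key for row in rows for key in row})
    return {"records": len(rows), "fields": fields}

def quality_report(rows):
    """Count missing or blank values per field and list distinct city spellings."""
    missing = Counter()
    for row in rows:
        for key, value in row.items():
            is_blank = isinstance(value, str) and not value.strip()
            if value is None or is_blank:
                missing[key] += 1
    # Distinct non-blank spellings help spot conflicts such as "Berlin" vs "berlin".
    spellings = sorted({row["city"] for row in rows if row["city"]})
    return {"missing": dict(missing), "city_spellings": spellings}

summary = describe(records)
report = quality_report(records)
print(summary)
print(report)
```

The report only notes the problems; deciding what to do about them is left to the Data Preparation phase.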
Data Preparation
Data Preparation is the third phase. By now we have acquired the data and assessed its quality, so here we build the final data set that will be fed into the modeling tools in the next phase.
Some straightforward actions that we have to do are:
1.     Select – Decide what data we are going to use.
2.     Clean – Here we act on the data quality findings, fixing or removing missing attributes and spelling mistakes so that we end up with correct, verified data.
3.     Construct – Here we derive new attributes or generate new records that the model needs.
4.     Integrate – Here we combine multiple records and tables, merging and aggregating the data.
5.     Format – Here we remove illegal characters and trim values as the model requires.
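The five actions above can be sketched as one small function in plain Python. The two tables, the field names, the prepare function and the cleaning policy (dropping records with a missing age) are all assumptions made up for this example, not something CRISP-DM prescribes.

```python
# Hypothetical raw tables, invented for illustration only.
customers = [
    {"id": 1, "name": " Ann# ", "age": 34, "notes": "vip"},
    {"id": 2, "name": "Bob", "age": None, "notes": ""},
    {"id": 3, "name": "Cy", "age": 51, "notes": "new"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 3, "amount": 12.5},
]

def prepare(customers, orders):
    # Select: keep only the fields the model will use (copies, so inputs stay intact).
    rows = [{key: c[key] for key in ("id", "name", "age")} for c in customers]
    # Clean: drop records with a missing age (one possible policy among several).
    rows = [r for r in rows if r["age"] is not None]
    # Construct: derive a new attribute from an existing one.
    for r in rows:
        r["is_senior"] = r["age"] >= 50
    # Integrate: aggregate order amounts per customer and attach the total.
    totals = {}
    for o in orders:
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
    for r in rows:
        r["total_spent"] = totals.get(r["id"], 0.0)
    # Format: keep only letters in names, which also removes surrounding spaces.
    for r in rows:
        r["name"] = "".join(ch for ch in r["name"] if ch.isalpha())
    return rows

final_dataset = prepare(customers, orders)
for row in final_dataset:
    print(row)
```

Keeping each action as its own commented step makes it easy to change one policy, such as filling in missing ages instead of dropping the record, without touching the others.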
Modeling
Modeling is the fourth phase, where we consider various modeling techniques, select the ones that suit the problem and apply them.
Four major tasks to focus on in Modeling:
1.     Select the modeling technique
2.     Design the test
3.     Build the model
4.     Assess the model
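As a rough illustration of these four tasks (choosing a technique, setting aside test data, building the model and assessing it), here is a toy sketch in plain Python. The data set, the single-threshold technique and every function name are assumptions chosen to keep the example short; they are not part of CRISP-DM itself.

```python
# Hypothetical labelled data: (monthly_spend, churned), invented for illustration.
data = [(5, 1), (8, 1), (12, 1), (11, 1), (20, 0), (22, 0), (7, 1), (18, 0)]

def split(rows, test_every=4):
    """Test design: hold out every test_every-th record for testing."""
    train = [r for i, r in enumerate(rows) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(rows) if (i + 1) % test_every == 0]
    return train, test

def predict(threshold, spend):
    """Chosen technique: predict churn (1) when spend is below the threshold."""
    return 1 if spend < threshold else 0

def accuracy(threshold, rows):
    """Assess: share of records whose prediction matches the label."""
    hits = sum(1 for spend, label in rows if predict(threshold, spend) == label)
    return hits / len(rows)

def build(train):
    """Build: try midpoints between observed values, keep the best on training data."""
    xs = sorted({spend for spend, _ in train})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(t, train))

train, test = split(data)
threshold = build(train)
test_accuracy = accuracy(threshold, test)
print(threshold, test_accuracy)
```

The important design choice is that the model is assessed on records it never saw while being built; on a toy data set this small the score itself means very little.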
Evaluation
In Evaluation we assess the results against our business objectives and record the findings, review the process we followed, and determine the next steps. In short, we summarize the whole result in terms of the business success criteria.
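One way to picture this step is as a simple check of the model result against a business success criterion. The sketch below is only an assumption about how such a check might look; the evaluate function, the accuracy figures and the wording of the next steps are all hypothetical.

```python
def evaluate(test_accuracy, required_accuracy, open_issues):
    """Summarize the result against a business criterion and suggest a next step."""
    meets_goal = test_accuracy >= required_accuracy
    if meets_goal and not open_issues:
        next_step = "deploy"
    elif meets_goal:
        next_step = "review open issues"
    else:
        next_step = "iterate on earlier phases"
    return {
        "meets_goal": meets_goal,
        "open_issues": list(open_issues),
        "next_step": next_step,
    }

# Hypothetical numbers: the business asked for at least 80% accuracy.
decision = evaluate(0.82, 0.80, [])
print(decision)
```

In practice the business criterion is rarely a single number, but writing it down this explicitly makes the decision about the next step easier to review.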
Deployment
Deployment is the sixth and final phase. Here we present the final report and decide whether to carry the project to the next level or hand the results over to the business.
Some major tasks are:
1.     Plan deployment
2.     Plan monitoring
3.     Produce final report
4.     Review project 
So, in this article we saw the CRISP-DM process and how it works. We will discuss CRISP-DM further in upcoming articles.
