Crunch Time for Data Mining with CRISP

Copenhagen, September 18, 1997

A consortium of leading suppliers and major users has announced a collaborative project to develop a standard, publicly available process model for data mining.

CRISP-DM ("CRoss-Industry Standard Process for Data Mining") is partially funded by the European Commission and brings together four partners: NCR, the world’s leading supplier of data warehouse solutions; Integral Solutions Limited (ISL), developers of the market-leading Clementine Data Mining System; the European industrial powerhouse Daimler-Benz; and OHRA, one of the larger Dutch health insurance companies.

Although technology now exists to support huge databases, the rapid spread of computerisation in all industries leaves users with the problem of interpreting vast amounts of data. Algorithms and tools abound, yet data mining today is more of an art than a well-understood, reliable process, and no practical, generally available data mining process exists. This gap hinders data mining projects in particular and is a major barrier to large corporate users adopting data mining as part of their core infrastructure.

The CRISP-DM project will provide an industry-neutral and tool-neutral process model. Starting from the embryonic knowledge discovery processes used in industry today and responding directly to user requirements, this project will define and validate a data mining process that is generally applicable in diverse industry sectors. This will make large data mining projects faster, more efficient, more reliable, more manageable, and less costly. CRISP will be kept sufficiently lightweight, however, to benefit even small-scale data mining investigations.

To date, research and development in data mining has focused on developing more and better algorithms and technologies. Little attention has been paid to helping the user apply these tools, or providing a supporting framework for data mining projects. CRISP-DM marks a move away from this technology fixation, addressing the needs of all levels of users in deploying data mining technology to solve business problems.

CRISP-DM is not tied to any one data mining tool or technology, although NCR and ISL will add support for the process to their data mining tools. The partners aim to promote and publish the CRISP process model with a view to establishing a standard. Key to this is the CRISP-DM SIG (Special Interest Group), which will bring together large-scale data mining users from a wide range of industries, as well as data mining tool vendors and service providers. SIG members will share their experiences, contribute to the design of the process model, and gain early access to the project's results. Initial SIG members are now being recruited, and the SIG will meet for the first time in the fourth quarter of 1997.

CRISP-DM started in July 1997 and will be completed within 18 months. During the project, the CRISP process model will be validated against demanding large-scale applications at OHRA and Daimler-Benz. This validation will help ensure that the CRISP model gives users maximum value in terms of cost savings, shorter project timescales, and greater confidence in adopting data mining as a core part of their business.