Crunch Time for Data Mining with CRISP
Copenhagen, September 18, 1997
A consortium of leading suppliers and major
users has announced a collaborative project to develop a standard
and publicly available process model for data mining.
CRISP-DM ("CRoss-Industry
Standard Process for Data Mining")
is partially funded by the European Commission and brings
together: NCR, the world's leading supplier of data warehouse
solutions; Integral Solutions Limited (ISL), developers of the market-leading
Clementine Data Mining System; European industrial power center
Daimler-Benz; and OHRA, one of the larger Dutch health insurance
companies.
Despite the development of technology to support
huge databases, the rapid spread of computerisation in all industries
presents users with the problem of interpreting vast amounts of
data. Although algorithms and tools abound, data mining at present
is more of an art than a well-understood, reliable process. There
exists no generally available, practical data mining process. This
particularly hinders data mining projects and is a major barrier
to infrastructural adoption of data mining by large corporate users.
The CRISP-DM project will provide an industry-neutral
and tool-neutral process model. Starting from the embryonic knowledge
discovery processes used in industry today and responding directly
to user requirements, this project will define and validate a data
mining process that is generally applicable in diverse industry
sectors. This will make large data mining projects faster, more
efficient, more reliable, more manageable, and less costly. CRISP
will be kept sufficiently lightweight, however, to benefit even
small-scale data mining investigations.
To date, research and development in data mining
has focused on developing more and better algorithms and technologies.
Little attention has been paid to helping the user apply these tools,
or providing a supporting framework for data mining projects. CRISP-DM
marks a move away from this technology fixation, addressing the
needs of all levels of users in deploying data mining technology
to solve business problems.
CRISP-DM is not tied to any one data mining tool
or technology, although NCR and ISL will add support for the process
to their data mining tools. The partners aim to promote and publish
the CRISP process model with a view to establishing a standard.
Key to this is the CRISP-DM SIG (Special Interest Group). This will
involve large-scale data mining users from a wide span of industries,
as well as data mining tool vendors and service providers. SIG members
will share their experiences, contribute to the design of the process
model and gain early access to the results of the project. Initial
SIG members are now being recruited, and the SIG will meet for the
first time in 4Q97.
CRISP-DM started in July 1997 and will be completed
within 18 months. During the project, the CRISP process model will
be validated against demanding large-scale applications within OHRA
and Daimler-Benz. This will help to guarantee that the CRISP model
gives maximum value to users in terms of cost savings, shortened
project timescales, and increased confidence in adopting data mining
as a core part of their business.