Excerpts from book:

Knowledge leads to wisdom and better understanding. Data mining builds knowledge from information, adding value to the tremendous stores of data that abound today -- stores that are ever increasing in size and availability. Emerging from the database community in the late 1980s, the discipline of data mining grew quickly to encompass researchers from Machine Learning, High Performance Computing, Visualisation, and Statistics, all recognising the growing opportunity to add value to data.

Today, this multi-disciplinary effort continues to deliver new techniques and tools for the analysis of very large collections of data. Searching through databases measured in gigabytes and terabytes, data mining delivers discoveries that can change the way an organisation does business. It can enable companies to remain competitive in this modern data-rich, knowledge-hungry, wisdom-scarce world. Data mining delivers knowledge to drive wisdom.

This book presents a unique and easily accessible one-stop resource for the data miner. It provides a practical guide to actually doing data mining. It is accessible to the information technology worker, the software engineer, and the data analyst. It also serves well as a textbook for an applications and techniques course on data mining. While much data analysis and modelling relies on a foundation of statistics, the challenge is not to lose the reader in the statistical details. At times the presentation here will leave the statistically sophisticated wanting a more solid treatment.

Part III introduces the basic concepts and processes of data mining, with a focus on preparing for modelling. It provides a solid grounding in the many issues and ideas that surround data mining, and serves as the foundation for what follows.

Part IV reviews the algorithms employed in data mining. This encyclopedic overview covers many of the tools and techniques deployed within data mining, ranging from decision tree induction and association rules to multivariate adaptive regression splines and patient rule induction methods. This book also covers standards for sharing data and models.

R has been chosen as the scripting and programming language for presenting the examples and algorithms in this book. R is relatively easy to learn and is widely used by statisticians the world over. Indeed, R is perhaps the most powerful statistical and graphical package available. R is open source and available for a number of platforms, including GNU/Linux and MS/Windows.

The book is accessible to many readers, not only those with strong backgrounds in computer science or statistics. At times this book does introduce some statistical, mathematical, and computer science notation, but it intentionally keeps this simple. Sometimes this means over-simplifying a concept, but only where doing so does not lose the intent of the concept and retains its fundamental accuracy.