Data mining is the process of discovering patterns, trends, and relationships in data. It involves analysing the data you have from different perspectives and turning it into actionable information. Data mining emphasises prediction rather than description, and it is well suited to uncovering new patterns in large data sets.
Before this can be done, the data have to be transformed into an easily accessible form that is suitable for analysis. Often, data sets from various sources have to be combined. This requires technical as well as analytical skills.
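To illustrate what combining data sets from different sources can look like, here is a minimal Python sketch using only the standard library. The two record lists, the customer_id key, and the left_join helper are hypothetical names invented for this example. In practice, a dedicated tool would normally do this work.

```python
# Hypothetical records from two sources that share a customer_id key.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "East"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 45.5},
]


def left_join(left, right, key):
    """Attach every matching right-hand record to each left-hand record."""
    # Index the right-hand source by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        # A left-hand record with no match is kept, without the extra fields.
        for match in index.get(row[key], [None]):
            merged = dict(row)
            if match is not None:
                merged.update(match)
            joined.append(merged)
    return joined


combined = left_join(customers, orders, "customer_id")
for record in combined:
    print(record)
```

Customers without orders are kept in the combined table rather than dropped, so the missing values stay visible for the analysis that follows.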
We have particular skills in:
- managing big data
- using R data mining tools, including Rattle
- tree models
- random forest models
- cluster analysis and segmentation.
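As a small illustration of the last item in the list above, the sketch below segments a single numeric variable with plain k-means, written in Python with no external libraries. The spend figures, the starting centres, and the kmeans_1d function are assumptions made up for this example. It is not a description of any particular R or Rattle routine.

```python
def kmeans_1d(values, centres, iterations=10):
    """Plain k-means on one numeric variable, starting from the given centres."""
    centres = list(centres)
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest centre.
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 90, 13, 88]
centres, segments = kmeans_1d(spend, centres=[0, 100])
print(centres)
print(segments)
```

The starting centres are fixed here so that the result is reproducible. Real segmentation work usually involves several variables, scaling, and a check of how sensitive the segments are to the starting points.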
Random Forest Case Study
Our article gives a good example of a random forest model. It also presents a new and improved method of determining variable importance, based on fractional factorial experiments.
Some recommended resources
- Hastie, T., Tibshirani, R., and Friedman, J. 2008. The Elements of Statistical Learning. 2nd Edition. Springer-Verlag.
- Williams, G. 2011. Data Mining with R and Rattle: The Art of Excavating Data for Knowledge Discovery. Use R! series. Springer.