What you'll learn
Compare the performance of different mining methods on a wide range of datasets
Demonstrate how to set up learning tasks as a knowledge flow
Solve data mining problems on huge datasets
Apply equal-width and equal-frequency binning for discretizing numeric attributes
Identify the advantages of supervised vs unsupervised discretization
Evaluate different trade-offs between error rates in 2-class classification
Classify documents using various techniques
Debate the correspondence between decision trees and decision rules
Explain how association rules can be generated and used
Discuss techniques for representing, generating, and evaluating clusters
Perform attribute selection by wrapping a classifier inside a cross-validation loop
Describe different techniques for searching through subsets of attributes
Develop effective sets of attributes for text classification problems
Explain cost-sensitive evaluation, cost-sensitive classification, and cost-sensitive learning
Design and evaluate multi-layer neural networks
Assess the volume of training data needed for mining tasks
Calculate optimal parameter values for a given learning system
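The course itself involves no programming, and the binning outcome above is practised in Weka. For readers who want to see the difference between the two schemes, the following is a minimal Python sketch rather than Weka's implementation. The function names and the rank-based handling of ties are assumptions made for illustration.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Simplification (assumption): bins are assigned by rank, so tied
    values may be split across neighbouring bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(equal_width_bins(data, 3))
    print(equal_frequency_bins(data, 3))
```

With one outlier in the data, equal-width binning places almost every value in the first bin, while equal-frequency binning spreads the values evenly across the three bins.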
Course overview
This course introduces advanced data mining skills, following on from Data Mining with Weka. You’ll process a dataset with 10 million instances, mine a 250,000-word text dataset, and analyze a supermarket dataset representing 5000 shopping baskets. You’ll learn about filters for preprocessing data, selecting attributes, classification, clustering, association rules, and cost-sensitive evaluation. You’ll also work with learning curves and automatically optimize learning parameters. Weka originated at the University of Waikato in New Zealand, and Ian Witten has authored a leading book on data mining.
Syllabus
Running large-scale data mining experiments
Constructing and executing knowledge flows
Processing very large datasets
Analyzing collections of textual documents
Mining association rules
Preprocessing data using a range of filters
Automatic methods of attribute selection
Clustering data
Taking account of different decision costs
Producing learning curves
Optimizing learning parameters in data mining
Who this course is for
This course is aimed at anyone who deals in data. It follows on from Data Mining with Weka, and you should have completed that first (or have otherwise acquired a rudimentary knowledge of Weka). As with the previous course, it involves no computer programming, although you need some experience with using computers for everyday tasks. High-school maths is more than enough; some elementary statistics concepts (means and variances) are assumed.