A. Azzalini - B. Scarpa
Oxford University Press 2012
ISBN 978-0-19-976710-6
Table of Contents
- Preface
- Preface to the English Edition
- Introduction
- New problems and new opportunities
- Data, more data, and data mines
- Problems in mining
- SQL, OLTP, OLAP, DWH and KDD
- Complications
- All models are wrong
- What is a model?
- From data to model
- A matter of style
- Press the button?
- Tools for computation and graphics
- A-B-C
- Old friends: Linear models
- Basic concepts
- Variable transformations
- Multivariate responses
- Computational aspects
- Least squares estimation by successive orthogonalization
- When n is large
- Recursive estimation
- Likelihood
- General concepts
- Linear models with Gaussian error terms
- Binary variables with binomial distribution
- Logistic regression and GLM
- Exercises
- Optimism, Conflicts, and Trade-offs
- Matching the conceptual frame and real life
- A simple prototype problem
- If we knew f(x)...
- But as we do not know f(x)...
- Methods for model selection
- Training sets and test sets
- Cross-validation
- Criteria based on information
- Reduction of dimensions and selection of most appropriate model
- Automatic selection of variables
- Principal component analysis
- Methods of regularization
- Exercises
- Prediction of Quantitative Variables
- Nonparametric estimation: Why?
- Local regression
- Basic formulation
- Choice of smoothing parameters
- Variability bands
- Variable bandwidths and loess
- Extension to several dimensions
- The curse of dimensionality
- Splines
- Spline functions
- Regression splines
- Smoothing splines
- Multidimensional splines
- MARS
- Additive models and GAM
- Projection pursuit
- Inferential aspects
- Effective degrees of freedom
- Analysis of variance
- Regression trees
- Approximations via step functions
- Regression trees: growth
- Regression trees: pruning
- Discussion
- Neural networks
- Case studies
- Traffic prediction in telecommunications
- Insurance pricing
- Exercises
- Methods of Classification
- Prediction of categorical variables
- An introduction based on a marketing problem
- Prediction via logistic regression
- Misclassification tables and adequacy measures
- ROC curve
- Lift curve
- Extension to several categories
- Multivariate logit and multinomial regression
- Ordinal categorical variables and cumulative logit models
- Classification via linear regression
- Case with two categories
- Case with several categories
- Discussion
- Discriminant analysis
- General remarks
- Linear discriminant analysis
- Quadratic discriminant analysis
- Discussion
- Some nonparametric methods
- Classification trees
- Some other topics
- Neural networks
- Support vector machines
- Combination of classifiers
- Bagging
- Boosting
- Random forests
- Case studies
- The traffic of a telephone company
- Churn analysis
- Customer satisfaction
- Web usage mining
- Exercises
- Methods of Internal Analysis
- Cluster analysis
- General remarks
- Distances and dissimilarities
- Non-hierarchical methods
- Hierarchical methods
- Associations among variables
- Elementary notions of graphical models
- Association rules
- Case study: Web usage mining
- Profiling website visitors
- Sequence rules and usage behaviour
- Appendix A Complements of Mathematics and Statistics
- Concepts on linear algebra
- Concepts of probability theory
- Concepts of linear models
- Appendix B Data Sets
- Simulated data
- Car data
- Brazilian bank data
- Data for telephone company customers
- Insurance data
- Choice of fruit juice data
- Customer satisfaction
- Web usage data
- Appendix C Symbols and Acronyms
- References
- Author Index
- Subject Index