CORT: Classification OR Regression Trees
Tuesday, Nov. 12, 2002
2:00-3:30 p.m.
Hughes Room
Tree models are powerful tools for signal processing and pattern analysis. The first tree-based methodology to gain wide recognition was CART (Classification and Regression Trees). CART aims to balance the fit of a tree model to the data against the complexity of the tree, measured by its number of leaf nodes. In regression problems, this is a way of managing the familiar bias-variance trade-off. Wavelet denoising methods and complexity-penalized decision trees can be interpreted as instances or variations of CART.

This talk challenges three basic elements of CART. The primary focus is the penalization strategy employed to prune back an initial, overfitted tree. It is shown that, contrary to CART, the pruning rule for classification should differ from the one used for regression; hence the title, Classification or Regression Trees. Second, it is argued that growing a tree-structured partition specifically fitted to the data is unnecessary. Instead, an approach based on non-adapted (fixed) dyadic tree structures and partitions, much like the trees underlying wavelet analysis, is advocated. Dyadic trees are shown to provide sufficient flexibility, to be easy to construct, and to produce near-optimal results when properly pruned. Third, a negative log-likelihood measure of empirical risk is recommended for regression problems in place of the usual sum-of-squared-errors criterion. The likelihood-based criterion leads to regression trees that extend wavelet denoising methods to many non-Gaussian regression problems. Applications of tree models in networking and biomedicine will also be discussed.
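To make the penalization strategy concrete, below is a minimal sketch of the additive cost-complexity pruning used by CART-style methods, i.e. the baseline the talk challenges: a fully grown tree is pruned bottom-up to minimize empirical risk plus a penalty proportional to the number of leaves. The Node class, the prune function, and the penalty weight lam are illustrative assumptions, not details from the talk.

    # Illustrative sketch of CART-style cost-complexity pruning
    # (the baseline challenged in the talk, not its proposed method).
    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class Node:
        risk: float                 # empirical risk if this node is collapsed to a leaf
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def prune(node: Node, lam: float) -> Tuple[float, Node]:
        """Return (penalized cost, pruned subtree) minimizing
        empirical risk + lam * (number of leaves)."""
        leaf_cost = node.risk + lam              # cost of collapsing this node to a leaf
        if node.left is None or node.right is None:
            return leaf_cost, Node(node.risk)
        # Because the penalty is additive in the leaf count, each child
        # subtree can be pruned optimally and independently.
        left_cost, left_sub = prune(node.left, lam)
        right_cost, right_sub = prune(node.right, lam)
        split_cost = left_cost + right_cost
        if leaf_cost <= split_cost:              # the split does not pay for itself
            return leaf_cost, Node(node.risk)
        return split_cost, Node(node.risk, left_sub, right_sub)

    # Toy example: the root fits poorly as a single leaf; its children fit well.
    tree = Node(risk=1.0,
                left=Node(risk=0.1, left=Node(0.09), right=Node(0.08)),
                right=Node(risk=0.2))
    cost, pruned = prune(tree, lam=0.05)         # keeps the root split, prunes the left subtree

Note that this fast bottom-up recursion relies on the penalty being linear in the leaf count; a penalty of a different form, as the talk argues is needed for classification, generally breaks this independence and calls for a different pruning computation.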