Statistics Research
Data Mining Algorithms
Algorithms for alerting on data streams
We have developed two novel Bayesian algorithms, hbmix and kfgps, that use shrinkage estimation to solve the above problem. We applied this technology first to data in our own industry:
To learn more about our work in this area, contact Chris Volinsky.

Classification using PRIM
The patient rule induction method (PRIM), introduced by Jerry Friedman in 1999, is a powerful information mining algorithm. The objective of PRIM is to find response "hotspots" (or bumps) in a high-dimensional space of predictor variables. PRIM seeks box-shaped subregions in the predictor space where the average value of the response is significantly larger than its average over the entire space. It achieves this via a sequence of peeling and pasting steps, in which small chunks of the dataset are peeled away (or pasted back on) such that the average response in the resulting box is maximized.

TurboPRIM is a local modification of PRIM designed specifically for massive datasets. Our approach is to create an out-of-memory, disk-based implementation of PRIM where the dataset is never stored in the memory of a computer, and all calculations are performed by making a minimal series of passes over the data on disk.

To learn more about our work in this area, contact David Poole.
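
The peeling step described above can be illustrated with a minimal in-memory sketch. This is not TurboPRIM or any implementation from this group: the names `prim_peel`, `make_demo_data`, `alpha`, and `min_support` are hypothetical, the pasting step is omitted, and the stopping rule (stop when no peel raises the mean) is a simplifying assumption. It also holds all data in memory, which is exactly what a disk-based variant would avoid.

```python
import random


def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """Greedy peeling phase of a PRIM-style box search (illustrative sketch).

    X: list of rows (lists of floats); y: list of numeric responses.
    alpha: fraction of in-box points to peel per step, assumed in (0, 1).
    min_support: smallest allowed box, as a fraction of all points.
    Returns (box, mean_in_box, n_points_in_box); box[j] = [low, high].
    """
    n, p = len(X), len(X[0])
    inside = list(range(n))  # indices of points currently in the box
    box = [[min(r[j] for r in X), max(r[j] for r in X)] for j in range(p)]
    mean_in = sum(y) / n
    min_count = max(1, int(min_support * n))

    while len(inside) >= 2:
        k = max(1, int(alpha * len(inside)))  # target number of points to peel
        best = None
        for j in range(p):
            vals = sorted(X[i][j] for i in inside)
            # Candidate cuts: raise the low face (side 0) or lower the high face (side 1).
            for side, cut in ((0, vals[k]), (1, vals[-k - 1])):
                if side == 0:
                    keep = [i for i in inside if X[i][j] >= cut]
                else:
                    keep = [i for i in inside if X[i][j] <= cut]
                # Skip peels that remove nothing (ties) or shrink the box too far.
                if len(keep) == len(inside) or len(keep) < min_count:
                    continue
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, side, cut, keep)
        # Simplified stopping rule (an assumption): stop when no peel improves the mean.
        if best is None or best[0] <= mean_in:
            break
        mean_in, j, side, cut, inside = best
        box[j][side] = cut
    return box, mean_in, len(inside)


def make_demo_data(n=500, seed=0):
    """Synthetic data with a response hotspot in the corner x0 > 0.6, x1 > 0.6."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1.0 if a > 0.6 and b > 0.6 else 0.0 for a, b in X]
    return X, y


X, y = make_demo_data()
box, mean_in, support = prim_peel(X, y)
print("box:", box)
print("mean response in box:", mean_in, "points in box:", support)
```

Each peel in this sketch needs only the per-face cut value and the mean response of the points that remain, which are quantities that could in principle be accumulated while streaming over the data rather than holding it in memory.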