Talk by a faculty candidate on Friday.
Big data computation model
- \(n\) = number of vectors in \(\mathbb{R}^d\) seen so far
- \(d\) = number of sensors (dimensionality)
- only have \(M\) memory available on the system
- \(P\) = number of cores
\(k\)-median queries
- input: a set \(S\) of \(n\) points in \(\mathbb{R}^d\)
- key idea: replace many points by a single weighted average point
- \(\mathrm{cost}(C, Q) \in (1 \pm \varepsilon)\,\mathrm{cost}(S, Q)\) for every query \(Q\), where \(C\) is the approximated core set (see the sketch below).
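As a concrete reading of that guarantee, here is a minimal sketch (Python with NumPy; the `cost` helper, the toy core set, and all numbers are illustrative, not from the talk) of how a \(k\)-median query is scored on the full point set versus on a weighted core set:

```python
import numpy as np

def cost(points, centers, weights=None):
    """k-median cost: (weighted) sum of distances from each point to its nearest center."""
    points, centers = np.asarray(points, float), np.asarray(centers, float)
    if weights is None:
        weights = np.ones(len(points))
    # distance from every point to every candidate center, keep only the nearest
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float(np.sum(weights * dists.min(axis=1)))

# S = n points in R^d; C = a (toy) weighted core set; Q = one k-median query (k centers)
S = np.random.rand(10_000, 3)
C, w = S[::100], np.full(100, 100.0)   # every 100th point, each standing in for 100 originals
Q = np.random.rand(5, 3)
print(cost(S, Q), cost(C, Q, w))       # a good core set keeps these within a (1 +/- eps) factor
```

The weights are what let a handful of representatives stand in for many original points inside the sum.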
Online algorithm to build the core set: read in \(M\) points, compute \(k\)-means and best-fit lines, then choose some of the furthest outliers and compress that data. Read the next \(M\) points, repeat, then compute the core set of those two sets to get a new core set. Move on to the next points until all \(n\) have been read and a final core set remains. Then run an off-line heuristic on this small final core set.
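A minimal sketch of that read-compress-merge loop, assuming blocks of \(M\) points arrive one at a time and using scikit-learn's weighted k-means centers as a stand-in for the talk's k-means + best-fit-lines + outlier compression; the function names and parameters below are illustrative, not the speaker's actual construction:

```python
import numpy as np
from sklearn.cluster import KMeans

def compress(points, weights, k=20):
    """Reduce a weighted point set to k weighted representatives:
    each k-means center inherits the total weight of its cluster.
    (Stand-in for the talk's k-means / best-fit-lines / outlier step.)"""
    km = KMeans(n_clusters=k, n_init=10).fit(points, sample_weight=weights)
    center_weights = np.array([weights[km.labels_ == i].sum() for i in range(k)])
    return km.cluster_centers_, center_weights

def stream_coreset(blocks, k=20):
    """Merge-and-reduce over a stream: compress each block of M points,
    merge it with the running core set, and re-compress, so only O(k)
    weighted points are ever held in memory instead of all n."""
    core_pts, core_w = None, None
    for block in blocks:                                  # block: an (M, d) array
        pts, w = compress(block, np.ones(len(block)), k)
        if core_pts is None:
            core_pts, core_w = pts, w
        else:
            core_pts, core_w = compress(np.vstack([core_pts, pts]),
                                        np.concatenate([core_w, w]), k)
    return core_pts, core_w                               # feed this to the off-line heuristic

# Example: ten blocks of M = 1,000 points in R^3 -> a 20-point weighted core set
blocks = (np.random.rand(1_000, 3) for _ in range(10))
C, w = stream_coreset(blocks)
```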
Gives a \((1+\varepsilon)\) approximation.
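One way to see where the \((1+\varepsilon)\)-style guarantee comes from (a standard core set argument, not spelled out in the talk): if \(\mathrm{cost}(C, Q) \in (1 \pm \varepsilon)\,\mathrm{cost}(S, Q)\) for every query \(Q\), and the off-line step returns the best query \(Q_C\) on the core set \(C\), then
\[
\mathrm{cost}(S, Q_C) \le \frac{\mathrm{cost}(C, Q_C)}{1-\varepsilon} \le \frac{\mathrm{cost}(C, Q^{*})}{1-\varepsilon} \le \frac{1+\varepsilon}{1-\varepsilon}\,\mathrm{cost}(S, Q^{*}),
\]
where \(Q^{*}\) is the optimal query on the full data, so \(Q_C\) is a \((1+O(\varepsilon))\)-approximation; an \(\alpha\)-approximate off-line heuristic gives \(\alpha(1+O(\varepsilon))\) instead.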
References to consider:
- Feldman, Langberg, STOC'11
- Feldman, Sohler, Monemizadeh, SoCG'07
- Coreset size: \(\)
- Har-Peled, Mazumdar, STOC'04
- Feldman, Wu, Julian, Sung and Rus. GPS Compression