
Big Data talk

Talk by a faculty candidate on Friday.

Big data computation model

  • \(n\) = number of vectors in \(\mathbb{R}^d\) seen so far
  • \(d\) = number of sensors (dimensionality)
  • only \(M\) memory available on the system
  • \(p\) = number of cores

\(k\)-median queries

  • input: a set \(P\) of \(n\) points in \(\mathbb{R}^d\)
  • key idea: replace many points by a single weighted average point
  • \(\mathrm{cost}(C, Q) \in (1 \pm \varepsilon)\,\mathrm{cost}(P, Q)\) for every query \(Q\), where \(C\) is the approximating core set.
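A minimal sketch of the "weighted average point" idea (my own illustration, not code from the talk): a tight cluster of points is replaced by one representative at the cluster's mean, carrying the cluster's total weight. For a 1-median cost and a query center far to one side of the cluster, the weighted representative preserves the cost almost exactly.

```python
import random

def cost(weighted_points, center):
    """Weighted 1-median cost in 1-D: sum of w * |p - center|."""
    return sum(w * abs(p - center) for p, w in weighted_points)

random.seed(0)
# A tight cluster of 1000 points near 100, each with weight 1.
cluster = [(100 + random.uniform(-1, 1), 1.0) for _ in range(1000)]

# Compress: one point at the weighted mean, carrying total weight 1000.
mean = sum(p for p, _ in cluster) / len(cluster)
compressed = [(mean, float(len(cluster)))]

center = 0.0  # a query center far from the cluster
exact = cost(cluster, center)
approx = cost(compressed, center)
print(abs(exact - approx) / exact)  # prints a tiny relative error
```

For query centers inside the cluster the error of a single representative is larger, which is why real constructions also keep the furthest outliers as separate points.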

Online algorithm to build a core set: read in \(M\) points, compute \(k\)-means and best-fit lines, keep some of the furthest outliers, and compress the rest of the data. Read the next \(M\) points and repeat, then compute the core set of those two core sets to get a new core set. Continue until all \(n\) points have been read and a final core set remains. Then run an off-line heuristic on this small final core set.
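The read-compress-merge loop above can be sketched as follows. This is my own toy illustration, not the speaker's code: `reduce_block` is a stand-in that simply subsamples a block down to \(M\) points, where a real coreset construction would cluster the block and keep weighted representatives plus outliers. The memory budget `M` is an assumed parameter.

```python
M = 4  # memory budget: max points held per block (assumed parameter)

def reduce_block(points):
    """Placeholder compression: shrink a block to at most M points.
    A real construction would run k-means / best-fit lines and keep
    weighted representatives plus the furthest outliers."""
    step = max(1, len(points) // M)
    return points[::step][:M]

def stream_coreset(stream):
    """Read the stream M points at a time; merge each new block's
    compressed form with the running core set, then re-compress."""
    coreset, block = [], []
    for p in stream:
        block.append(p)
        if len(block) == M:
            coreset = reduce_block(coreset + reduce_block(block))
            block = []
    if block:  # compress any leftover partial block
        coreset = reduce_block(coreset + reduce_block(block))
    return coreset

points = list(range(100))   # stand-in for n streamed vectors
final = stream_coreset(points)
print(len(final))           # at most M points survive
```

A full implementation organizes the merges in a binary tree of levels so the approximation error grows only logarithmically in \(n\); this flat loop is just to show the read, compress, and merge steps.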

Gives a \((1+\varepsilon)\) approximation.

References to consider:

  1. Feldman, Langberg, STOC '11
  2. Feldman, Sohler, Monemizadeh, SoCG '07
    • Coreset size: \(\)
  3. Har-Peled, Mazumdar, STOC '04
  4. Feldman, Wu, Julian, Sung and Rus. GPS Compression


Filed under Big Data, Core Sets, Talks