Facebook is generating over 500 TB per day; Google, 20 PB.
She wants to tackle the issues of traffic dynamics in large-scale networks by injecting flexibility into the design.
- design flexible spectrum auction systems to match dynamic traffic, since the US auctions off large chunks of spectrum covering the entire country.
- add flexible wireless links to address traffic hotspots in data centers (configurable directional 60 GHz links between racks; since 60 GHz is line-of-sight, any racks in the way must repeat the signal).
- they use reflections off the ceiling to avoid these blocking problems (3D beamforming)
Dynamic spectrum auction
- most allocated spectrum sits idle.
- idea: reallocate idle spectrum dynamically using an auction
- truthful auctions are the dominant method to remove any gain from cheating. Ex: Vickrey auction (second-price auction)
- she integrates allocation with pricing: greedy allocation based on bids; winner X pays the bid of its critical neighbor.
How is this different than how radio stations split their frequencies based on location?
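A minimal sketch of this style of greedy auction with critical-value pricing, under an assumed conflict-graph model. The function names and the exact critical-neighbor rule are my guesses, not the speaker's implementation: here X's price is the highest bid among X's conflicting neighbors that would have won had X dropped out.

```python
def greedy_allocate(bids, conflicts, exclude=None):
    """Allocate greedily by descending bid (ties broken by name).

    bids: {bidder: bid}; conflicts: {bidder: set of conflicting bidders}.
    A bidder wins iff no already-allocated bidder conflicts with it.
    """
    winners = []
    for b in sorted(bids, key=lambda x: (bids[x], x), reverse=True):
        if b == exclude:
            continue
        if all(w not in conflicts[b] for w in winners):
            winners.append(b)
    return winners

def auction(bids, conflicts):
    """Greedy allocation plus critical-neighbor pricing (assumed rule):
    winner w pays the highest bid among its conflicting neighbors that
    would win if w did not participate (0 if there is none)."""
    winners = greedy_allocate(bids, conflicts)
    prices = {}
    for w in winners:
        alt = greedy_allocate(bids, conflicts, exclude=w)
        prices[w] = max((bids[v] for v in alt if v in conflicts[w]), default=0)
    return winners, prices
```

For example, with bids A=10, B=8, C=6 and conflicts A–B and B–C, the greedy pass allocates A and C; A pays 8 (B would win without A) and C pays 0 (nothing blocks C even at a tiny bid).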
Note the use of a Kautz graph overlay of the nodes in the data centers, which allows them to handle arbitrary network sizes and incremental growth. They show that 92% of paths can run concurrently.
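For reference, a small sketch of the Kautz graph structure itself; the parameters and string representation here are the generic textbook ones, not necessarily what the system uses:

```python
from itertools import product

def kautz_vertices(d, n):
    """Vertices of the Kautz graph K(d, n): strings of length n+1 over
    an alphabet of d+1 symbols in which adjacent symbols differ.
    There are (d+1) * d**n such vertices."""
    return [s for s in product(range(d + 1), repeat=n + 1)
            if all(s[i] != s[i + 1] for i in range(n))]

def successors(v, d):
    """Out-neighbors of vertex v: shift left and append any symbol
    that differs from the new last symbol; out-degree is always d."""
    return [v[1:] + (t,) for t in range(d + 1) if t != v[-1]]
```

The constant out-degree and logarithmic diameter are what make the overlay scale to arbitrary network sizes.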
See the PDF, page 4, for the problem description. Find \(k\) cluster centers to minimize the sum of distances from each point to its cluster center.
MSSC is similar to the problem we are trying to solve: matching \(n\) points into \(k\) groups of size \(m\). Given \(n\) points, we would like to find the \(k\) cluster centers that minimize the distance from any point \(p\) to its nearest center \(c_i\) (the centroid of that cluster). For fixed \(k\) and dimensionality \(d\), MSSC has been shown to be solvable in \(O(n^{dk+1})\) time for \(n\) input points. For general \(d\) and input parameter \(k\) (as in our case), the problem is shown to be NP-hard.
MSSC, however, differs from our problem statement, since we require every cluster to have size \(m\), such that the cluster center \(c_i\) provides a good representation of its \(m\) points. The points in each cluster may then be divided up to form \(m\) groups, where each member comes from a unique cluster, thereby removing bias between the groups. Also, our initial problem statement proposes a greedy method, such that the smallest clusters are found first, whereas MSSC minimizes over all clusters simultaneously.
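To make the MSSC objective concrete, here is a minimal sketch of the cost function together with Lloyd's standard heuristic. This is the textbook method (no cluster-size constraint), not the greedy method proposed in our problem statement:

```python
import random

def mssc_cost(points, centers):
    """MSSC objective: sum over all points of the squared distance
    to the nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(pt, c)) for c in centers)
               for pt in points)

def lloyd(points, k, iters=20, seed=0):
    """Lloyd's heuristic: alternate assigning each point to its
    nearest center and moving each center to its cluster centroid."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(pt, centers[i])))
            clusters[j].append(pt)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl))
                   if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers
```

Lloyd's method only finds a local optimum in general, consistent with the hardness result above.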
Talk by a faculty candidate on Friday.
Big data computation model
- \(n\) = number of vectors in \(\mathbb{R}^d\) seen so far
- \(d\) = number of sensors (dimensionality)
- only have \(M\) memory available on the system
- \(C\) = number of cores
\(k\)-median queries
- input: a set \(P\) of \(n\) points in \(\mathbb{R}^d\)
- key idea: replace many points by a single weighted average point
- \((1-\varepsilon)\,\mathrm{cost}(P, Q) \le \mathrm{cost}(S, Q) \le (1+\varepsilon)\,\mathrm{cost}(P, Q)\) for every query \(Q\), where \(S\) is the approximated core set.
Online algorithm to build the core set: read in a block of points that fits in memory, compute \(k\)-means and best-fit lines, then keep some of the furthest outliers and compress the rest. Read the next block, repeat, then compute the core set of those two sets to get a new core set. Continue until all \(n\) points have been read and a final core set remains. Then run the off-line heuristic on this final core set, which is much smaller than \(n\).
Gives a \((1+\varepsilon)\)-approximation.
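The merge-and-reduce loop above can be sketched as follows. The compression step here (weighted k-means centroids plus a few exact outliers, in 1-D) is a toy stand-in for the actual core-set construction, and all names and parameters are illustrative:

```python
import random

def reduce_block(wpoints, k, outliers=2, seed=0):
    """Compress weighted 1-D points (value, weight): run a tiny k-means,
    keep the few furthest points exactly as outliers, and replace each
    remaining cluster by its weighted centroid (total weight preserved)."""
    rng = random.Random(seed)
    centers = [p for p, _ in rng.sample(wpoints, min(k, len(wpoints)))]
    assign = []
    for _ in range(10):  # Lloyd iterations
        assign = [min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
                  for p, _ in wpoints]
        for i in range(len(centers)):
            cl = [(p, w) for (p, w), j in zip(wpoints, assign) if j == i]
            if cl:
                centers[i] = sum(p * w for p, w in cl) / sum(w for _, w in cl)
    dist = [(p - centers[j]) ** 2 for (p, _), j in zip(wpoints, assign)]
    far = set(sorted(range(len(wpoints)),
                     key=dist.__getitem__, reverse=True)[:outliers])
    summary = [wpoints[i] for i in far]  # outliers kept exactly
    for i in range(len(centers)):
        cl = [(p, w) for idx, ((p, w), j) in enumerate(zip(wpoints, assign))
              if j == i and idx not in far]
        if cl:
            total = sum(w for _, w in cl)
            summary.append((sum(p * w for p, w in cl) / total, total))
    return summary

def stream_coreset(stream, block_size, k):
    """Merge-and-reduce: compress each block as it arrives, merge it with
    the running core set, and compress again, so only O(block_size)
    weighted points are ever held in memory."""
    core, block = [], []
    for x in stream:
        block.append((float(x), 1.0))
        if len(block) == block_size:
            core = reduce_block(core + reduce_block(block, k), k)
            block = []
    if block:
        core = reduce_block(core + reduce_block(block, k), k)
    return core
```

The off-line heuristic would then be run on the small weighted set returned by `stream_coreset`.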
References to consider:
- Feldman, Langberg, STOC'11
- Feldman, Sohler, Monemizadeh, SoCG'07
- Har-Peled, Mazumdar, 04
- Feldman, Wu, Julian, Sung and Rus. GPS Compression
Computational Geometry: An Introduction Through Randomized Algorithms; Mulmuley; 1994; pp. 135-
Proposition 4.2.2: The total cost of maintaining a Voronoi diagram in a given update sequence is proportional to the total structural change during that sequence. This ignores a logarithmic factor in the cost of a deletion. It also ignores the cost of locating a conflicting vertex during addition. The expected (total) structural change during a random \((N, \delta)\)-sequence is \(O(N)\).