Selecting diverse subsets from a much larger dataset promises to be much more practical using a new algorithm designed by researchers from the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory and its Laboratory for Information and Decision Systems.
The algorithm begins with a small subset of the data, chosen at random, then picks one point inside the subset and one point outside it and randomly selects one of three simple operations--swapping the points, adding the point outside the subset to the subset, or deleting the point inside the subset. The decision to perform the operation or not is probabilistic, and depends on the improvement in diversity the operation affords. As the subset grows, it becomes harder to add new points unless they dramatically improve diversity. The process repeats until the diversity of the subset reflects that of the full set.
The running time of the algorithm depends on the number of data points in the subset. For example, if the goal is to winnow a data set with 1 million points down to 1,000, the algorithm is 1 billion times faster than its predecessors.
From MIT News
View Full Article
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA