Microsoft Research has won a clear-cut “data sorting” victory in the MinuteSort benchmark, posting a result that it says is nearly triple that of the previous title-holder, a 2009 Yahoo team. A new approach to managing data over a network enabled a Microsoft Research team to set a speed record for sifting through, or “sorting,” a huge amount of data in one minute.
The team, led by Jeremy Elson in the Distributed Systems group at Microsoft Research Redmond, set the new sort record with a radically different approach built on a storage system called Flat Datacenter Storage (FDS). The team’s system sorted almost three times as much data (1,401 gigabytes vs. 500 gigabytes) with less than one-fifth of the hardware (1,033 disks across 250 machines vs. 5,624 disks across 1,406 machines) used by the previous record holder, a team from Yahoo that set the mark in 2009.
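For readers who want to check the arithmetic behind those comparisons, here is a minimal sketch that uses only the figures quoted above. The variable names and the "gigabytes per disk" measure are illustrative assumptions, not metrics published by either team.

```python
# Figures as quoted in the article; the dictionary layout is just for illustration.
fds = {"gigabytes": 1401, "disks": 1033, "machines": 250}
yahoo = {"gigabytes": 500, "disks": 5624, "machines": 1406}

# How much more data was sorted, and how much less hardware was used.
data_ratio = fds["gigabytes"] / yahoo["gigabytes"]
disk_ratio = fds["disks"] / yahoo["disks"]
machine_ratio = fds["machines"] / yahoo["machines"]

# A rough, assumed efficiency measure: gigabytes sorted per disk in one minute.
gb_per_disk_fds = fds["gigabytes"] / fds["disks"]
gb_per_disk_yahoo = yahoo["gigabytes"] / yahoo["disks"]
per_disk_gain = gb_per_disk_fds / gb_per_disk_yahoo

print(f"data sorted:    {data_ratio:.2f}x")
print(f"disks used:     {disk_ratio:.2f}x")
print(f"machines used:  {machine_ratio:.2f}x")
print(f"per-disk gain:  {per_disk_gain:.1f}x")
```

On these numbers the FDS run sorted about 2.8 times the data on roughly 18 percent of the disks and machines, which works out to around 15 times more data sorted per disk.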
The word “flat” in Flat Datacenter Storage is critical. Microsoft described how FDS works as follows:
[Microsoft Research’s Jeremy] Elson compares FDS to an organizational chart. In a hierarchical company, employees report to a superior, then to another superior, and so on. In a “flat” organization, they basically report to everyone, and vice versa.