FPGASort: A High Performance Sorting Architecture Exploiting Run-time Reconfiguration on FPGAs for Large Problem Sorting
Abstract
This paper analyses different hardware sorting architectures in order to implement a highly scalable sorter for solving huge problems at high performance up to the GB range in linear time complexity. It will be proven that a combination of a FIFO-based merge sorter and a tree-based merge sorter results in the best performance at low cost. Moreover, we will demonstrate how partial run-time reconfiguration can be used for saving almost half the FPGA resources or alternatively for improving the speed. Experiments show a sustainable sorting throughput of 2 GB/s for problems fitting into the on-chip FPGA memory and 1 GB/s when using external memory. These values surpass the best published results on large problem sorting implementations on FPGAs, GPUs, and the Cell processor.
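The two-stage idea named in the abstract can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the paper's hardware design: the function names (`fifo_merge`, `fifo_merge_sort`, `tree_merge`, `two_phase_sort`) and the `block_size` parameter are hypothetical, and Python's `heapq.merge` merely stands in for a merge tree. The first stage sorts small blocks by pairwise merging of runs held in FIFOs; the second stage combines all sorted runs in a single multi-way merge.

```python
from collections import deque
import heapq


def fifo_merge(a, b):
    """Merge two sorted runs held in FIFOs by always emitting the smaller head."""
    a, b = deque(a), deque(b)
    out = []
    while a and b:
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    # One FIFO is empty now; drain whatever is left in the other.
    out.extend(a)
    out.extend(b)
    return out


def fifo_merge_sort(block):
    """Sort one block by merging runs pairwise, starting from runs of length 1."""
    runs = [[x] for x in block]
    while len(runs) > 1:
        merged = [fifo_merge(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an odd run out is carried to the next round
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []


def tree_merge(runs):
    """Merge many sorted runs in one pass (heapq.merge stands in for a merge tree)."""
    return list(heapq.merge(*runs))


def two_phase_sort(data, block_size=4):
    """Assumed pipeline: sort fixed-size blocks first, then merge all runs at once."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    runs = [fifo_merge_sort(b) for b in blocks]
    return tree_merge(runs)
```

For example, `two_phase_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0])` sorts three blocks of at most four keys each and then merges the three resulting runs. How the actual architecture sizes its stages and maps them to on-chip and external memory is described in the paper, not here.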
BibTeX
@INPROCEEDINGS{fpga09koch,
AUTHOR = {{Koch}, {Dirk} and {Torresen}, {Jim}},
ADDRESS = {{Monterey, California, USA}},
BOOKTITLE = {{Proceedings of the 19th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2011)}},
PUBLISHER = {ACM},
MONTH = feb,
PAGES = {45--54},
TITLE = {{FPGASort: A High Performance Sorting Architecture Exploiting Run-time Reconfiguration on FPGAs for Large Problem Sorting}},
YEAR = {2011}
}