Data scientists require tools to derive insight and knowledge from increasing volumes of complex data. This points to the importance of advanced analytics, particularly in the physical and biological sciences. Analytics must be able to use the full range of available infrastructure; however, the coupling between tools, analytic engines and infrastructure is often rigid. Existing solutions are often difficult to employ in contemporary environments for which they were not originally designed, and many tools were developed at a time when parallelism was not essential. In addition, interoperability at multiple levels remains elusive, and scalable yet general-purpose, broadly applicable solutions in the form of analytic libraries and abstractions are conspicuous by their absence.
Our work on DIBBs has led to the development of a broad class of highly scalable libraries for problems in multiple areas, including network science, computer vision, bioinformatics, and climate science. NSSAC team members have contributed by developing scalable algorithms for network generation and subgraph detection, which have been applied to problems in public health.
Some of the key findings include:
- Highly scalable algorithms for generating instances from different kinds of network models, including preferential attachment and the Chung-Lu model
- Parallel algorithms for subgraph detection and scan statistics, which scale to networks with over a hundred million edges
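The Chung-Lu model mentioned above can be illustrated with a small sequential sketch. This is not the scalable parallel generator described here. It is a naive O(n²) illustration that assumes the standard formulation: given a weight (expected degree) sequence w, each pair (i, j) becomes an edge independently with probability min(1, w_i·w_j / S), where S is the sum of all weights. The function name `chung_lu` and the decision to omit self-loops are assumptions made for this example.

```python
import random


def chung_lu(weights, seed=None):
    """Sample an undirected Chung-Lu graph (naive O(n^2) illustration).

    Each unordered pair (i, j) with i < j is included independently with
    probability min(1, w_i * w_j / S), where S = sum(weights).
    Self-loops are omitted in this sketch (an assumption, not a requirement).
    """
    rng = random.Random(seed)
    total = sum(weights)
    if total == 0:
        # No weight anywhere means no edges can be produced.
        return []
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Cap at 1 so that very high-weight pairs stay valid probabilities.
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges
```

For example, `chung_lu([10, 10])` always returns `[(0, 1)]`, because the capped edge probability is 1. Enumerating every pair is what limits this sketch to small graphs. Generators that scale to hundreds of millions of edges must avoid visiting each of the O(n²) candidate pairs.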