DIBBS CIF21: Middleware and High-Performance Analytics Libraries for Scalable Data Science


National Science Foundation

Data scientists require tools to derive insight and knowledge from increasing volumes of complex data. This points to the importance of advanced analytics, particularly in the physical and biological sciences. Analytics must be able to utilize the full range of available infrastructure; however, the coupling between tools, analytic engines, and infrastructure is often rigid. It is often difficult to employ existing solutions in contemporary environments for which they were not originally designed, and many tools were developed at a time when parallelism was not essential. In addition, interoperability at multiple levels remains elusive, and scalable yet general-purpose and broadly applicable solutions, in the form of analytic libraries and abstractions, are conspicuously absent.

Project Overview

Our work on DIBBs has led to the development of a broad class of highly scalable libraries for problems in multiple areas, including network science, computer vision, bioinformatics, and climate science. Team members from Network Systems Science and Advanced Computing (NSSAC) have contributed scalable algorithms for network generation and subgraph detection, which have been applied to problems in public health.


Some of the key findings include:

  • Highly scalable algorithms for generating instances from different kinds of network models, including preferential attachment and the Chung-Lu model
  • Parallel algorithms for subgraph detection and scan statistics, which scale to networks with more than a hundred million edges
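
For readers unfamiliar with the two network models named in the findings above, the following is a minimal sequential sketch in Python. It is not the project's parallel implementation, and it makes no attempt at scalability: the Chung-Lu routine below is the naive quadratic formulation. The function names `preferential_attachment` and `chung_lu`, along with their parameters, are hypothetical and chosen only for illustration.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Sequential preferential-attachment sketch (Barabasi-Albert style).

    Starts from m isolated nodes. Each new node attaches to m distinct
    existing nodes, chosen with probability proportional to current degree.
    Returns a list of (new_node, existing_node) edges.
    """
    if m < 1 or n <= m:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    endpoints = []            # each node appears once per incident edge
    targets = list(range(m))  # the first new node links to all seed nodes
    for v in range(m, n):
        for u in targets:
            edges.append((v, u))
        endpoints.extend(targets)
        endpoints.extend([v] * m)
        # Sampling uniformly from the endpoint list is sampling
        # proportionally to degree; the set keeps targets distinct.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


def chung_lu(weights, seed=None):
    """Naive O(n^2) Chung-Lu sketch.

    Nodes i and j are joined independently with probability
    min(1, w_i * w_j / sum(w)), so node i has expected degree close to w_i
    (self-loops are omitted in this sketch).
    """
    rng = random.Random(seed)
    total = sum(weights)
    if total <= 0:
        return []
    edges = []
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    pa_edges = preferential_attachment(1000, 3, seed=42)
    cl_edges = chung_lu([5.0] * 200, seed=42)
    print(len(pa_edges), "preferential-attachment edges")
    print(len(cl_edges), "Chung-Lu edges")
```

Both routines process nodes or node pairs one at a time, which is the kind of sequential dependency that parallel generators for networks of this scale have to work around.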


Professor of Computer Science, School of Engineering and Applied Science

Executive Director

Distinguished Professor in Biocomplexity, Biocomplexity Institute

Professor of Computer Science, School of Engineering and Applied Science

Network Systems Science and Advanced Computing
Adiga A; Kuhlman C; Marathe M; Ravi S; Vullikanti A. Proceedings of the 36th International Conference on Machine Learning. PMLR. 2019; 97:82-91.

Additional Publications

Maksudul Alam, Maleq Khan, Kalyan S. Perumalla, and Madhav Marathe. Generating massive scale-free networks: Novel parallel algorithms using the preferential attachment model. ACM Transactions on Parallel Computing (TOPC) 7, no. 2 (2020): 1-35.

Saliya Ekanayake, Jose Cadena, Udayanga Wickramasinghe, and Anil Kumar Vullikanti. MIDAS: Multilinear Detection at Scale. Journal of Parallel and Distributed Computing (JPDC), Volume 132, pages 363-382, October 2019.

Wang, Lijing, Jiangzhuo Chen, and Madhav Marathe. "TDEFSI: Theory-guided Deep Learning-based Epidemic Forecasting with Synthetic Information." ACM Transactions on Spatial Algorithms and Systems (TSAS) 6, no. 3 (2020): 1-39.

A. Adiga, C. Barrett, S. Eubank, C. J. Kuhlman, M. V. Marathe, H. Mortveit, S. S. Ravi, D. J. Rosenkrantz, R. E. Stearns, S. Swarup and A. K. Vullikanti, “Validating Agent-Based Models of Large Networked Systems”, Proc. Winter Simulation Conference (WSC 2019), Washington, DC, Dec. 2019.

Adiga, Abhijin, Chris J. Kuhlman, Madhav Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard Edwin Stearns, and Anil Vullikanti. "Bounds and Complexity Results for Learning Coalition-Based Interaction Functions in Networked Social Systems." In AAAI, pp. 3138-3145. 2020.

Eubank, Stephen, Madhav Marathe, Henning Mortveit, and Anil Vullikanti. "Modeling Urban Mobility Networks Using Constrained Labeled Sequences." In International Conference on Complex Networks and Their Applications, pp. 955-966. Springer, Cham, 2019.

Prathyush Sambaturu, Aparna Gupta, Ian Davidson, S. S. Ravi, Anil Vullikanti, and Andrew Warren. Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability. AAAI 2020: 1636-1643.