This is the Scalable Computation Intelligence (SCI) group at the Texas Advanced Computing Center (TACC).
We support deep learning applications across TACC platforms, including Frontera, Lonestar6, and Longhorn.
Our research interests span deep learning and high performance computing.
Our research foci include:
- Scalable Neural Network Optimization
- Scientific Deep Learning Applications
- Cyberinfrastructure for Deep Learning on Supercomputers
Over the past few years, we have successfully facilitated a diverse set of scientific deep learning applications.
Exemplar applications include:
We also maintain a few deep learning applications with the distributed K-FAC optimizer, which the numerical optimization community can use to empirically evaluate convergence:
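For readers unfamiliar with K-FAC, the core idea is to approximate a layer's Fisher information matrix as a Kronecker product of two small factors and use their inverses to precondition the gradient. Below is a minimal NumPy sketch of that preconditioning step for a single fully connected layer; the function name and damping value are illustrative, and the distributed implementations described in the publications below additionally distribute factor computation and inversion across workers.

```python
import numpy as np

def kfac_precondition(grad_w, acts, grad_out, damping=1e-3):
    """Precondition a linear layer's gradient with a Kronecker-factored
    Fisher approximation (K-FAC), F ~= A (x) G.

    grad_w   : (out_dim, in_dim) gradient of the loss w.r.t. the weights
    acts     : (batch, in_dim)   layer input activations
    grad_out : (batch, out_dim)  gradients w.r.t. the layer pre-activations
    """
    batch = acts.shape[0]
    # Kronecker factors: A = E[a a^T] over inputs, G = E[g g^T] over
    # pre-activation gradients.
    A = acts.T @ acts / batch            # (in_dim, in_dim)
    G = grad_out.T @ grad_out / batch    # (out_dim, out_dim)
    # Tikhonov damping keeps the factor inverses well conditioned.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # (A (x) G)^{-1} vec(grad) corresponds to G^{-1} @ grad @ A^{-1}.
    return G_inv @ grad_w @ A_inv
```

The preconditioned matrix replaces the raw gradient in the usual SGD update; because A and G are only in_dim x in_dim and out_dim x out_dim, this is far cheaper than inverting the full Fisher matrix.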
Active Projects
Recent Publications
- [TPDS’22] J. G. Pauloski, L. Huang, W. Xu, I. T. Foster, Z. Zhang. “Deep Neural Network Training with Distributed K-FAC” in IEEE Transactions on Parallel and Distributed Systems, doi: 10.1109/TPDS.2022.3161187.
- [SC’21] J. G. Pauloski, Q. Huang, L. Huang, S. Venkataraman, K. Chard, I. T. Foster, Z. Zhang. “KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021.
- [Nature Methods’21] L. Fang, F. Monroe, S. W. Novak, L. Kirk, C. R. Schiavon, S. B. Yu, T. Zhang et al. “Deep learning-based point-scanning super-resolution imaging.” Nature Methods 18, no. 4 (2021): 406-416.
- [SC’20] J. G. Pauloski, Z. Zhang, L. Huang, W. Xu, I. T. Foster. “Convolutional Neural Network Training with Distributed K-FAC” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. 2020.
- [IPDPS’20] Z. Zhang, L. Huang, J. G. Pauloski, I. T. Foster. “Efficient I/O for Neural Network Training with Compressed Data” In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 409-418. IEEE, 2020.
- [CLUSTER’19] Z. Zhang, L. Huang, R. Huang, W. Xu, D. S. Katz. “Quantifying the Impact of Memory Errors in Deep Learning.” In 2019 IEEE International Conference on Cluster Computing (CLUSTER), p.1. IEEE, 2019.
- [TPDS’19] Y. You, Z. Zhang, J. Demmel, K. Keutzer, C. Hsieh. “Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.” IEEE Transactions on Parallel and Distributed Systems 30, no. 11 (2019): 2449-2462.
- [ICPP’18] Y. You, Z. Zhang, J. Demmel, K. Keutzer, C. Hsieh. “ImageNet Training in Minutes.” In Proceedings of the 47th International Conference on Parallel Processing, p. 1. ACM, 2018. Best Paper Award.
3.226 Advanced Computing Building
Texas Advanced Computing Center
10100 Burnet Road
Austin, TX 78758
zzhang (at) tacc.utexas.edu