IBM has developed a modern, flexible AI software stack optimized for the foundation model era, which includes a high-performing, cloud-native AI training stack running on the Red Hat OpenShift Container Platform.
- The training stack uses a multi-NIC CNI operator to manage network resources and a multi-cluster app dispatcher to prioritize jobs.
- PyTorch and Ray are used for data preprocessing, distributed training, and model validation.
- The foundation model tuning and serving stack includes software libraries that improve inference performance, an abstraction layer called Caikit, and state-of-the-art techniques for compute-efficient model adaptation.
- IBM is working with Red Hat to contribute capabilities to open-source communities and Open Data Hub to advance the state of AI workflows on Kubernetes.
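
To make the job-prioritization role of the multi-cluster app dispatcher more concrete, here is a minimal, purely illustrative sketch of priority-based dispatch against a fixed GPU budget. It is not the dispatcher's actual API or scheduling policy. The names `ToyDispatcher`, `QueuedJob`, `submit`, and `dispatch`, the GPU-count resource model, and the rule that dispatch stops when the highest-priority job does not fit are all assumptions made for this example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedJob:
    # Jobs are ordered by sort_key only: (-priority, submission order).
    sort_key: tuple
    name: str = field(compare=False)
    gpus: int = field(compare=False)


class ToyDispatcher:
    """Hypothetical in-memory job queue; an assumption, not a real dispatcher API."""

    def __init__(self, free_gpus: int):
        self.free_gpus = free_gpus
        self._heap = []
        # The counter breaks ties so equal-priority jobs keep submission order.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int, gpus: int) -> None:
        # Negate priority because heapq pops the smallest key first.
        key = (-priority, next(self._counter))
        heapq.heappush(self._heap, QueuedJob(key, name, gpus))

    def dispatch(self) -> list:
        """Start queued jobs in priority order while the head job fits."""
        started = []
        while self._heap and self._heap[0].gpus <= self.free_gpus:
            job = heapq.heappop(self._heap)
            self.free_gpus -= job.gpus
            started.append(job.name)
        return started


dispatcher = ToyDispatcher(free_gpus=8)
dispatcher.submit("eval", priority=1, gpus=2)
dispatcher.submit("pretrain", priority=10, gpus=8)
dispatcher.submit("tune", priority=5, gpus=4)

first_wave = dispatcher.dispatch()   # only the highest-priority job fits
dispatcher.free_gpus += 8            # simulate that job finishing
second_wave = dispatcher.dispatch()  # remaining jobs, highest priority first
```

In this sketch, lower-priority jobs wait behind a higher-priority job that does not yet fit rather than jumping ahead of it. That is one possible design choice; a real dispatcher may apply different policies such as backfilling or preemption.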
















