Kubernetes ML Optimizer, Kubeflow, Improves Data Preprocessing with v1.6


Most often, when organizations deploy applications in hybrid and multicloud environments, they use the open-source container orchestration system Kubernetes.

Kubernetes itself helps schedule and manage distributed virtual compute resources, but it is not optimized by default for any particular type of workload. That is where projects like Kubeflow come in.

For organizations looking to run machine learning (ML) in the cloud, a group of companies including Google, Red Hat, and Cisco helped found the open-source project Kubeflow in 2017. It took three years for the effort to reach Kubeflow version 1.0 in March 2020, as the project gathered more supporters and users. Over the past two years, the project has continued to evolve, adding more features to meet the growing demands of ML.

This week, the latest iteration of the open-source technology became generally available with the release of Kubeflow 1.6. The new release includes security updates and improved functionality for managing cluster serving runtimes for ML, as well as new ways to more easily specify different artificial intelligence (AI) models to deploy and run.

“Kubeflow is an open-source machine learning platform for data scientists who want to build and experiment with machine learning pipelines, or machine learning engineers who deploy systems in multiple development environments,” Andreea Munteanu, product manager for AI/ML at Canonical, told VentureBeat.

The challenges of using Kubernetes for ML

There is no shortage of potential challenges that organizations may face when trying to deploy ML workloads in the cloud with Kubernetes.

For Steven Huels, senior director, AI product management and strategy at Red Hat, the biggest issue isn’t necessarily the technology, but the process.

“The biggest challenge we see from users related to data science and machine learning is repeatability – namely, being able to manage the model life cycle from experimentation to production in a reproducible way,” Huels said.

Huels noted that integrating the model experimentation environment all the way through to the serving and monitoring environment helps make this repeatability more achievable. It allows users to see the value of their data science experiments, while pipelines make these workflows repeatable over time.

In June this year, the Kubeflow Community Release Team released a user survey review report that identified a number of key challenges for machine learning. It should be noted that only 16% of respondents indicated that all ML models they worked on in 2021 were successfully deployed in production and were able to provide business value. The survey also revealed that it takes more than five iterations of a model before it ever goes into production. On a positive note, 31% of respondents said the average life of a model in production was six months or more.

The user survey also identified data preprocessing as one of the most time-consuming aspects of ML.

What’s new in Kubeflow 1.6

Canonical’s Munteanu commented that the Kubeflow 1.6 update takes specific steps to help address some of the issues identified by the user survey.

For example, she noted that Kubeflow 1.6 makes data processing more transparent and provides better tracking capabilities, along with metadata improvements. Munteanu added that the latest release brings better test log tracking, enabling efficient debugging in case of data source failure.

In an effort to help more models become truly production-ready, Munteanu said that Kubeflow 1.6 supports population-based training (PBT), speeding up model iteration and improving the likelihood that models will reach production readiness.
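For readers unfamiliar with the technique, the general idea behind population-based training can be sketched in a few lines of Python. This is a simplified illustration and not Kubeflow's implementation or API: the `evaluate` objective, the `pbt` function, the single learning-rate hyperparameter, and the 0.8/1.2 perturbation factors are all assumptions chosen for the example, and no real model is trained. In each generation, the weakest members of the population copy a stronger member (exploit) and then perturb the copied value (explore).

```python
import random


def evaluate(lr):
    # Hypothetical objective standing in for a validation score:
    # it peaks at 0 when the learning rate equals 0.1.
    return -(lr - 0.1) ** 2


def pbt(population_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    # Each population member is one hyperparameter setting (a learning rate).
    population = [rng.uniform(0.001, 1.0) for _ in range(population_size)]
    initial_best = max(evaluate(lr) for lr in population)

    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        cutoff = max(1, population_size // 4)
        top = ranked[:cutoff]
        middle = ranked[cutoff:-cutoff]
        # Exploit: each bottom member is replaced by a copy of a top member.
        # Explore: the copied value is perturbed by a random factor.
        replaced = [rng.choice(top) * rng.choice([0.8, 1.2]) for _ in range(cutoff)]
        population = top + middle + replaced

    final_best = max(evaluate(lr) for lr in population)
    return initial_best, final_best


if __name__ == "__main__":
    before, after = pbt()
    print(f"best score before: {before:.6f}, after: {after:.6f}")
```

Because the top members are carried over unchanged in every generation, the best score in this sketch can never get worse from one generation to the next. A full PBT setup would also copy model weights between members and continue training between steps.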

Improvements have also been made to the MPI (Message Passing Interface) operator component, which can help make training on large volumes of data more efficient. Munteanu also noted that improvements to PyTorch elastic training make model training more efficient and help ML engineers get started quickly.

What’s next for Kubeflow

Several providers and services integrate Kubeflow. For example, Canonical has what it calls Charmed Kubeflow, which provides a packaged and automated approach to running Kubeflow using Ubuntu’s Juju framework. Red Hat integrates Kubeflow components into its OpenShift Data Science product.

The Kubeflow project’s direction is not driven by any particular contributor or vendor.

“Kubeflow is an open source project that is developed with the help of the community, so its direction will ultimately come out of discussions within the community and the Kubeflow project,” Munteanu said.

Munteanu commented that Canonical, with Charmed Kubeflow in mind, focuses on security and also on streamlining user onboarding. Regarding Charmed Kubeflow, she said Canonical is looking to integrate the product with other AI/ML-specific applications that allow AI/ML projects to go to production and scale.

“We see the future of Kubeflow as a critical part of a larger ecosystem-based solution that addresses AI/ML projects and solves a challenge that many companies currently don’t have the resources to address,” Munteanu said.

Norma A. Roth