In this article, we’ll look at end-to-end ML platforms. While most of these platforms offer robust tools for managing ML pipelines from model training to deployment and beyond, it’s important to note that data collection and labeling are still left to the tools we covered in part one.
That’s certainly not to take anything away from the incredible platforms we’ll look at here, but it is something important to remember when thinking about implementing ML solutions in your organization.
So let’s jump right in and look at some awesome companies and orgs that are building end-to-end systems to manage machine learning pipelines.
Allegro is a deep learning platform built for computer vision. It includes a suite of tech tools that cover all aspects of deep learning, from development to production and deployment.
The platform features tools built for the following phases of the lifecycle:
- Build and refine datasets: bias elimination, synthetic data creation, version control, and more
- Video/Image annotation: automated labeling, the ability to distribute tagging tasks across teams, easy dataset exploration
- Experiment and train: automated training, result comparison in real-time, and the ability to work with multiple frameworks in parallel
- Optimize and control: model performance tracking, debug detectors, and easy implementation of hyperparameter optimization
- Model deployment: spawn continuously evolving “model offspring” and seamlessly update models on the fly.
Allegro also allows teams to create post-deployment model versions optimized for edge devices, all while ensuring that your organization’s data remains proprietary and private.
They also have a regularly-updated company blog featuring platform news, posts on computer vision technology, and more.
Cnvrg.io is a full-stack data science platform that offers all the tools needed to build, manage, and automate ML workflows, from research to production.
Managing datasets, experiments, and model versions happens in a robust interface that features reusable components and a drag-and-drop UX that’s intuitive and easy to manage across departments and teams.
From this interface, users can also instantly deploy models to production, and to make things easier in the future, data science teams can also create pipelines for future projects that include easy-to-understand visuals, optimized data, and ML “recipes”, so each project and iteration becomes easier for entire organizations to implement.
Cnvrg.io is truly built for collaboration and for large organizations that have a lot of moving parts.
To help users work with their product, Cnvrg.io also maintains a robust company blog, a data sheet for prospective customers, and a webinar to get started.
Determined AI’s promise is that when you use their platform, you’ll be able to rapidly accelerate your deep learning development lifecycle. Here’s how they do it:
- AutoML: Distributed training and state-of-the-art hyperparameter search.
- Infrastructure: The ability to manage and share GPU resources on-premises, in the cloud, or both.
- Compatibility: Users can run unmodified TensorFlow, Keras, and PyTorch code on Kubernetes or baremetal.
- Reproducibility and collaboration: Automatically track, share, and reproduce experiments across teams.
- Deployment: Edge, cloud, and mobile; automates architecture search for constrained deployments.
- One-Click Jupyter Notebooks: GPU-powered notebooks allow for robust exploration and visualization.
To help users get started, Determined AI also has plenty of resources: a regularly updated blog, screencast webinars, recordings, detailed documentation, and more.
FloydHub is a fully-managed cloud platform for data science teams. They emphasize ease of use and speed at each step of the process.
To start building:
- Jupyter Notebooks and scripts
- Built-in metrics
- GitHub integration
For model training:
- Parallel training
- Full reproducibility
- Workflow integration
- Needs-based scaling
- App integration
- Auto-generated web page sharing
This set of tools is meant to take the mundane management tasks out of the ML workflow, so that data science teams can spend their valuable time building and experimenting, with powerful and intuitive collaboration mechanisms in place.
FloydHub also curates an excellent contributor-driven blog, with deep-dive tutorials on all things machine learning, from a beginner’s guide to RNNs with PyTorch, to applied use cases of object detection and localization.
BigML is a comprehensive ML platform that promises to remove the complexities of the ML workflow, leaving data science teams to enhance and automate business decision making.
As the name suggests, this platform is geared towards enterprise clients, offering a single, standardized framework for an entire company, making access, collaboration, and data/model security easier to manage.
In addition, the platform allows users to easily interpret models with interactive visualizations; export models to various environments; and track and reuse model components, all while meeting regulatory and audit compliance requirements that are often instituted at the enterprise level.
With an intuitive dashboard, a REST API, and their own domain-specific language for machine learning (WhizzML), BigML truly offers an end-to-end solution for enterprise clients.
Their resource center also includes education programs, a brand ambassador program, and much more.
Dataiku’s ML platform focuses squarely on enterprise clients. As such, there’s a distinct focus on cross-organization collaboration. This focus on “self-service analytics” comes through in many of the platform’s features:
- Collaboration tools: integrated document sharing, highly-visible and communicative versioning tools, team monitoring interface.
- Flexible data science environments: allows the use of notebooks or a customizable drag-and-drop visual interface at any step of the workflow.
- Data visualization: profile data visually at any step of the process; 20+ charts and 80+ built-in functions.
- Robust model building tools: build + optimize models in Python and R and integrate any external ML library; model performance feedback and metrics.
- Deployment: bundle workflows as a single deployable package through a REST API; monitor in-production data with dashboards and validation policies.
Dataiku also has a wide range of educational resources that will help you get started with their end-to-end platform, including webinars, tutorials, white papers, and a regularly updated blog.
Valohai’s main pitch is that when data science teams use their platform, they’ll be able to make the most of their talents: namely, building models for production. Their platform promises to largely automate DevOps work, allowing ML teams to “train a model in minutes that otherwise take a week.”
Here’s how they do it:
- Experiment storage, visualization, and tracking: version control, model performance visualization, and more
- Seamless integration from end-to-end: works with any runtime and with any ML code your team writes
- Standardized workflow and best practices: uses the same tools and best practices as industry leaders (think Uber, Netflix, Airbnb, etc.)
- API-first construction to automate complex pipelines: allows integration into existing software pipelines.
- Run models in parallel on 100 GPUs or dozens of TPUs, depending on your use case
- Zero-setup infrastructure: train models in the cloud or on on-premise servers with one click, an API call, or a CLI one-liner.
Valohai’s blog includes success stories, content on ML infrastructure, ML tools, tutorials, product updates, and more.
Dataspine aims to be a flexible platform that manages the entire ML workflow on any infrastructure. Included in this goal is the elimination of in-house engineering and infrastructure overhead.
It’s an enterprise-grade solution that runs in cloud, hybrid, or on-premise environments, which keeps control of the tech stack in the hands of ML teams.
- Flexible dev environments: Jupyter and Zeppelin notebooks; integrate open-source frameworks and manage/visualize all in one easy-to-use interface.
- Elastic infrastructure: Dataspine promises to take care of underlying infrastructure concerns for you and your team. Allows users to choose how workloads are run and on what processors.
- One-click deployment: Productionize directly from notebooks or a CLI; monitor performance in real-time; A/B test and safely optimize while in production.
- Single point of control for management: Simplifies workflows by reducing dependencies and maintenance overhead; allows easy integration of popular open-source tools and frameworks.
Dataspine is still in Early Access (as of publication of this list), so now’s the perfect time to try out this platform and see what it can offer your ML and data science teams.
PipelineAI’s headline promises that data science teams can “experiment faster with confidence”.
Customizable deployment, continuous experimentation with ML pipelines, percentage-based model rollouts—these are a few of the headline features. Additionally, PipelineAI works with all major frameworks, hardware, and cloud.
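To make “percentage-based model rollouts” a bit more concrete, here’s a minimal sketch of the general idea. It is not PipelineAI’s implementation or API: the `pick_model` function, the version names, and the 90/10 split are all hypothetical, chosen only for illustration. Each request ID is hashed into a stable bucket, so the same request always lands on the same model version.

```python
import hashlib


def pick_model(request_id: str, rollout: dict[str, int]) -> str:
    """Route a request to a model version according to percentage weights."""
    if sum(rollout.values()) != 100:
        raise ValueError("rollout percentages must sum to 100")
    # Hash the request ID into a stable bucket in [0, 100).
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    upper = 0
    for version, percent in rollout.items():
        upper += percent
        if bucket < upper:
            return version
    raise AssertionError("unreachable: percentages sum to 100")


# Hypothetical rollout: send roughly 10% of traffic to the new version.
rollout = {"model_v1": 90, "model_v2": 10}
counts = {version: 0 for version in rollout}
for i in range(10_000):
    counts[pick_model(f"request-{i}", rollout)] += 1
print(counts)
```

Hashing instead of picking at random keeps routing deterministic, which makes it easier to compare how each version behaves for the same users as the percentage is increased.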
PipelineAI also includes a unified dashboard to help users manage the entire lifecycle, from local dev to live production. This allows for a fully-customizable experience, with instantaneous feedback and feature releases on your schedule.
The platform comes in 3 iterations: Community, Professional, and Enterprise. Many of the tools in this list focus primarily on enterprise clients, so it’s good to see an end-to-end platform with a free tier, as well as a staggered pricing/plan structure.
Another area where PipelineAI shines is with their community. From GitHub to YouTube and Slack, they have a really impressive array of community tools and content that you’ll definitely want to check out.
Deep Cognition aims to be a “one stop hub for deep learning developers”. Their primary platform, Deep Learning Studio, employs both AutoML and a drag-and-drop system that allows access to high-performing models quickly.
They’ve also set up a number of pre-configured, optimized environments designed to remove the hassles of setting up internal DevOps for ML. When it’s time to deploy, a single click publishes the model as a REST API or web app.
Deep Learning Studio has a number of specific features:
- Automated data encoding from any popular format and repository.
- Drag-and-drop model interface helps teams design models with ease. Access to pre-trained models and the ability to import model code and edit within the interface.
- Hyperparameter tuning employs a multi-GPU system to cut down on training time.
- Robust experimenting and model version control.
- Model deployment allows downloading of models as binary models or Python libraries.
Polyaxon is a robust end-to-end platform intended to accelerate the ML workflow at the enterprise level. From tracking model metrics over time to tools for scaling, the platform is built for teams that want an agile and reproducible system.
- Tracking: Visualizations of model metrics, hyperparameters, model versions, and more.
- Orchestration: Manage jobs and experiments with a number of tools: CLI, dashboard, SDKs, or REST API.
- Optimization: Run parallel experiments to find the best model for a given ML task
- Insights: Use visualizations for model search, experiment results, hyperparameters, and more.
- Model Management: Manage model versions, tweak models, validate upon delivery.
- Collaboration: Knowledge distribution tools to manage model versions and performance across teams.
- Compliance: Reproducibility and tools to meet regulatory compliance without extra work.
- Scalability: Run jobs on any platform as your business scales (AWS, Azure, GCP, or on-premise hardware)
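The “run parallel experiments to find the best model” idea from the list above can be sketched in a few lines of standard-library Python. This is not Polyaxon’s API: `run_experiment`, `sample_params`, the toy objective, and the parameter ranges are all assumptions made for illustration. On a real platform, each trial would be a training job scheduled on a cluster rather than a thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_experiment(params):
    """Stand-in for a training job: returns a validation loss (lower is better)."""
    lr, depth = params["learning_rate"], params["depth"]
    # Toy objective with its minimum at learning_rate=0.1, depth=6.
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def sample_params(rng):
    """Draw one random hyperparameter configuration."""
    return {"learning_rate": rng.uniform(0.001, 0.5), "depth": rng.randint(2, 12)}


rng = random.Random(0)  # fixed seed so the search is reproducible
trials = [sample_params(rng) for _ in range(32)]

# Run the trials concurrently; pool.map preserves the order of `trials`.
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(run_experiment, trials))

best_loss, best_params = min(zip(losses, trials), key=lambda pair: pair[0])
print(best_params, best_loss)
```

Random search is used here only because it’s the simplest strategy to show; the same structure works if the sampling step is swapped for a grid or a smarter search.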
Clarifai offers a wide array of products and services that reflect their mission to transform enterprise with computer vision. To do this, they offer an interesting pre-trained model gallery with (currently) 14 models that include models for industries like retail, fashion, cuisine, and then a number of generalized models that look at things like textures/patterns, NSFW filters, and more.
Where Clarifai really differentiates, though, is in their custom offering. The platform allows organizations to create their own custom CV models trained on their own data. Within this offering, Clarifai offers to train models for you or give your team the tools to train on its own.
Tools like a data exploration user interface and an intuitive API make model training simple. There’s also a visual data search tool that makes it easier to explore custom datasets, with options for brand security and retail recommenders.
In terms of model deployment, Clarifai offers solutions for API Cloud, On-Premise, and to the edge, with Android and iOS SDKs.
Comet.ML’s platform lives up to its name: fast, sleek, and highly visible. Much like the other tools in this list, there’s a focus on collaboration and a unified project workflow. Here’s a look at some specific features:
- Compatible with most ML libraries and frameworks (Keras, TF, PyTorch, Scikit-Learn, etc.).
- One-line integration into your training code.
- Easy-to-use interface that allows you to track and compare experiments.
- Collaboration built into the entire platform, with simple project sharing tools that allow all members of an organization to keep track of progress.
- Built-in documentation tool to help teams keep track of changes, updates, etc.
- Easy integration with GitHub and other git providers. Auto-generate PRs with preferred model versions.
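If “track and compare experiments” sounds abstract, here’s a minimal sketch of what that bookkeeping looks like. To be clear, this is not Comet.ML’s SDK: the `Experiment` class, `log_metric`, `compare`, and the accuracy numbers below are all hypothetical, made up purely to show the concept of recording parameters and metrics per run and then ranking the runs.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:
    """A single training run: its hyperparameters and the metrics it logged."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)

    def log_metric(self, key, value):
        self.metrics[key] = value


def compare(experiments, metric):
    """Sort experiments by a metric, highest value first."""
    return sorted(experiments, key=lambda e: e.metrics[metric], reverse=True)


# Illustrative runs with made-up accuracy values.
baseline = Experiment("baseline", {"learning_rate": 0.1})
baseline.log_metric("accuracy", 0.91)
candidate = Experiment("candidate", {"learning_rate": 0.01})
candidate.log_metric("accuracy", 0.94)

ranked = compare([baseline, candidate], "accuracy")
print(json.dumps([asdict(e) for e in ranked], indent=2))
```

A hosted tracker adds the parts that matter for teams, such as shared storage, dashboards, and links back to the code version, but the core record per run is about this simple.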
Comet.ML also has a robust hyperparameter optimization service that allows ML teams to automatically optimize hyperparams, model architecture, or feature choice—and all of this happens on your local machines.
Comet.ML also has a regularly updated blog, featuring product tutorials and other general ML educational content.
DeepSense.AI communicates the features and value of their product via customer success stories. And there are plenty, in industries ranging from banking and cybersecurity to the public sector; and in solution areas like predictive modeling, NLP, and data analytics.
Because of this strategy, it’s a bit harder to pull specific product features from DeepSense.ai, but across this range of solutions, DeepSense.ai can power many aspects of your business, including (but not limited to):
- Building and implementing data architecture
- Customer lifetime value predictions
- Marketing automation
- Recommendation engines
- Demand forecasting
- Quality control
- Document segmentation and analytics
- Image classification
- Comprehensive training programs to get your team up to speed in solving real-life data science challenges.
DeepSense is also fully immersed in research and development, with an entire hub dedicated to joint research with organizations like Google Brain, Intel, and more.
H2O’s primary offering (aptly named H2O) is one of the leading open source machine learning platforms. Their primary aim with this tool is to “democratize intelligence for everyone”, and they look to achieve this with data science platforms that, they explain, are “AI to do AI”.
For the primary open source offering, H2O offers algorithms developed from the ground up, the ability to use any familiar programming language, an AutoML workflow, distributed and in-memory processing, and easy deployment.
H2O also offers an enterprise-grade platform. This one isn’t open-source, but it’s packed with a whole bunch of features for bigger organizations looking to scale their ML solutions:
- Auto feature engineering
- ML model interpretability through advanced visualizations and metrics
- Custom NLP model pipeline
- Automatic Python scoring pipelines
- Advanced time series capabilities
- Flexible options for data and deployment
- GPU acceleration powered by Nvidia
H2O is also active in the AI community, with webinars, events, plenty of product resources, and meetups all around the world.
DataRobot wants to empower enterprise clients to become AI-driven in solving complex problems, driving ROI, and more.
To do this, they offer two primary products: an AutoML platform and an automated time series platform.
In terms of their autoML platform, here’s what you can expect:
- Auto feature engineering
- Implementation of state-of-the-art open source ML libraries, such as H2O (see above), TensorFlow, Spark ML, and more.
- Easy and intuitive tools for collaboration, implementing best practices for data science teams and ML projects.
- Simple model management and deployment pipelines.
And for their time series offering:
- Time series automation: automatically detects key measures like stationarity, seasonality, and more in developing predictive models.
- Employs proven methods and tools like Facebook Prophet, ARIMA, etc.
- Robust visualization tools, including API support to integrate modeling into overall business practices.
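To give a feel for what “automatically detecting seasonality” can mean, here’s a small standard-library sketch that finds the dominant cycle length in a series using autocorrelation. This is an illustration of one common technique, not DataRobot’s method: the `autocorrelation` and `dominant_period` helpers and the synthetic weekly data are assumptions made for the example, and stationarity checks aren’t covered here.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Return the lag in [2, max_lag] with the highest autocorrelation."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Synthetic daily data with a weekly cycle (period of 7 days).
series = [10 + 3 * math.sin(2 * math.pi * t / 7) for t in range(140)]
print(dominant_period(series, max_lag=10))
```

A series that repeats every 7 steps correlates most strongly with itself when shifted by 7, which is why the highest-scoring lag is a reasonable first guess at the seasonal period. Real data is noisier, so automated tools typically combine several such tests.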
Regardless of what tool you choose, you’ll have the option to deploy models on a managed cloud or on-premise infrastructure.
DataRobot also has an impressive collection of resources, including white papers, webinars, industry reports, data sheets, and much more.
Pi Exchange is a new end-to-end ML pipeline management platform. Its “Smart Data Preparation” feature is particularly compelling, with AI-powered data recommendations, actionable insights, flexible feature engineering, and more.
Additionally, the platform has a full suite of model lifecycle management tools, from in-depth model training analytics to advanced performance metrics for models deployed into production.
In terms of resources to help you work with the platform, the Pi.Exchange team has just gotten started (at the time of writing this blurb), so we should expect to see more helpful blog posts, documentation, and more in the coming months.