Ersatz is a web-based, general-purpose platform for machine learning with support for GPU-based deep learning. It's geared towards aspiring and working data scientists with stuff to do. Ersatz has a number of components designed to make modern machine learning workflows much more efficient. Primarily, these include tools for data wrangling, model training, and machine learning infrastructure.
Let's walk through it step by step.
Problem formulation and data wrangling
Any machine learning project starts with coming to a clear understanding of your goals. This includes answering questions like what kind of data you have access to, whether that data is labeled or unlabeled, what level of accuracy a solution needs to be effective, and how fast it needs to run. The importance of this step cannot be overstated: get it wrong and you end up with "garbage in, garbage out".
Data processing and warehousing
Ersatz can be used to manage and share datasets across an organization. Datasets can be uploaded directly through the web interface or through our API. On upload, data is parsed by Ersatz and converted to a distributed set of HDF5 files for storage.
Ersatz provides web-based graphical tools to identify duplicate columns or rows, fill in missing values, and otherwise wrangle your data into shape.
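To make those wrangling steps concrete, here is a minimal plain-Python sketch of what they mean on a small table. It is an illustration only, not Ersatz's implementation: the function names, the sample rows, and the choice of mean imputation are all assumptions made for this example.

```python
from statistics import mean


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def duplicate_columns(rows):
    """Return indices of columns whose values repeat an earlier column."""
    seen = set()
    duplicates = []
    for index, column in enumerate(zip(*rows)):
        if column in seen:
            duplicates.append(index)
        else:
            seen.add(column)
    return duplicates


def fill_missing_with_mean(rows, column):
    """Replace None in one numeric column with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        [fill if (i == column and value is None) else value
         for i, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical sample data: two numeric columns and one label column.
rows = [
    [1.0, 1.0, "a"],
    [2.0, 2.0, "b"],
    [1.0, 1.0, "a"],    # duplicate of the first row
    [None, None, "c"],  # missing values
]

clean = drop_duplicate_rows(rows)          # three rows remain
dupes = duplicate_columns(clean)           # column 1 repeats column 0
filled = fill_missing_with_mean(clean, 0)  # None in column 0 becomes 1.5
```

Mean imputation is only one of several reasonable ways to fill a gap; the right choice depends on the column and the problem.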
When you are satisfied with your data, you can move on to actually building predictive models.
A screenshot from the Ersatz Data Preview
A screenshot from the Ersatz Column Selection Wizard
Ensembles and Models
Now we're getting to the meat and potatoes of Ersatz: the ensemble manager, parameter chooser, and model backends. We'll explain each.
A list of models in an ensemble
An ensemble is simply a group of machine learning models with a common objective. An easy way to think of an ensemble is as a "project" or a "folder" containing models. The real benefit of an ensemble is how it gets the various models to work together. The ensemble manager is a key differentiator for Ersatz compared to other web-based machine learning platforms. It allows you to, for example, create a deep neural network that combines its outputs with a support vector machine.
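One common way for models in an ensemble to work together is to average their predicted class probabilities. The sketch below shows that idea in plain Python. How Ersatz actually combines model outputs is not described here, so treat the averaging rule, the function names, and the example probabilities as assumptions.

```python
def average_probabilities(model_outputs, weights=None):
    """Combine per-class probabilities from several models by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(model_outputs)
    total = sum(weights)
    n_classes = len(model_outputs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_outputs)) / total
        for c in range(n_classes)
    ]


def predict(model_outputs, weights=None):
    """Return the index of the class with the highest combined probability."""
    combined = average_probabilities(model_outputs, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical outputs for a single example over three classes.
neural_net_probs = [0.5, 0.4, 0.1]  # on its own, the network picks class 0
svm_probs = [0.2, 0.7, 0.1]         # on its own, the SVM picks class 1

combined = average_probabilities([neural_net_probs, svm_probs])
label = predict([neural_net_probs, svm_probs])
```

In this example the two models disagree, and the averaged probabilities favor class 1 because the SVM is more confident than the network.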
Most machine learning models require the user to set some mix of parameters. These often require specific expertise to set correctly. Ersatz automates this process by using machine learning to search the parameter space for the set that trains the best model.
Essentially, it takes the guesswork out so you can simply leave several experiments running and see which ones net out the best. Our parameter search is more effective and faster than both random search and grid search.
An important design goal of Ersatz is to be relatively future-proof. We have achieved this by building a robust backend/runner system designed to make integrating with various machine learning tools, such as pylearn2 or sklearn, trivial.
This means Ersatz is compatible with a wide variety of machine learning models and techniques. You can think of Ersatz as a GUI manager for running these types of models and combining them into machine learning pipelines.
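For readers unfamiliar with the two baselines named above, here is a minimal sketch of grid search and random search over two parameters. It does not show Ersatz's own search. The `validation_error` function is a toy stand-in for training a model and scoring it, and the parameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random


def validation_error(learning_rate, hidden_units):
    """Toy stand-in for training a model and measuring its validation error."""
    return (learning_rate - 0.1) ** 2 + ((hidden_units - 64) / 64) ** 2


def grid_search(learning_rates, hidden_unit_options):
    """Try every combination of the listed values and keep the best one."""
    candidates = itertools.product(learning_rates, hidden_unit_options)
    return min(candidates, key=lambda params: validation_error(*params))


def random_search(n_trials, seed=0):
    """Sample parameter settings at random and keep the best one."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(0.001, 1.0), rng.randint(8, 256))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda params: validation_error(*params))


best_grid = grid_search([0.001, 0.01, 0.1, 1.0], [16, 64, 256])
best_random = random_search(n_trials=20)
```

Grid search can only ever return values you listed up front, while random search explores the whole range but gives no guarantee of landing near the optimum. A model-based search aims to improve on both by using earlier results to pick the next setting to try.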
Monitor training and evaluation
Models are trained and run on specialized worker machines with high-performance GPUs, for computation up to 40x faster than CPU-based architectures. The data warehouse is responsible for sampling and allocating data to specific workers. The worker processes send training statistics back to the web interface, where they are rendered as charts and tables for analysis.
Having all of your data, models, and statistics in one place makes for a much more efficient machine learning experience.
A confusion matrix in Ersatz
Training statistics get displayed on a dashboard
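If you have not worked with a confusion matrix before, the sketch below shows how one is built from true and predicted labels, and how accuracy falls out of it. This is a generic plain-Python illustration with made-up labels, not the code Ersatz uses to produce its dashboard.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for six examples over three classes.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]

matrix = confusion_matrix(true_labels, predicted_labels, n_classes=3)
score = accuracy(matrix)
```

Off-diagonal cells show which classes get mistaken for which, something a single accuracy number hides.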
A production-ready pipeline
In machine learning, it's not uncommon to have to use several very different tools on your journey from prototype to production. With Ersatz, you can do everything in one environment. All of the models you train can be accessed through the API, so as soon as you have a prototype that works, it's ready to scale up.
If you deal with data and machine learning, Ersatz makes your job easier. It does this by reducing the time you spend preparing data, giving you access to lots of algorithms in one environment, and providing dashboarding and ensembling capabilities.