API


Welcome!

Everything that can be done through our web UI can also be done through our API. What follows is a summary of the general workflow you would use to access Ersatz functionality programmatically:

  • Sign up for Ersatz if you haven't already, and get your API key from the security credentials page.
  • Prepare a file or a string you would like to use as Data.
  • Upload this file or string to api.ersatzlabs.com to be processed and warehoused.
  • Create a Dataset out of your data. This Dataset will be responsible for applying any filters or pre-processing steps you specify.
  • Create an Ensemble to store your models. The ensemble will become your focal point for your experiments and can be thought of as a "project" in many ways.
  • Create a Model inside your ensemble. This is your predictive model.
  • In order to get predictions out of your model, you have to Train it first. So Train it.
  • Now your model is ready to be used for predictions. Make predictions.
  • Got some new data? Update the ensemble and train it some more. More Data almost always = More Better!

You're done! Go home!
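The workflow above can be sketched as a small URL helper. This is only an illustration: the host and paths are copied from the curl examples in this document, while `WORKFLOW`, `build_url`, and the placeholder key are hypothetical names chosen for the sketch. Training and statistics are left out because their paths depend on a model ID.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://api.ersatzlabs.com"  # host used in the curl examples

# Endpoint paths as they appear in this document, in workflow order.
WORKFLOW = [
    ("upload data", "/api/data/"),
    ("create dataset", "/api/dataset/"),
    ("create ensemble", "/api/ensemble/"),
    ("create model", "/api/model/"),
    ("make predictions", "/api/predict/"),
]


def build_url(path, api_key):
    """Return the full URL for an endpoint, passing the API key as ?key=..."""
    return urljoin(BASE_URL, path) + "?" + urlencode({"key": api_key})


if __name__ == "__main__":
    # Print one URL per workflow step; nothing is sent over the network.
    for step, path in WORKFLOW:
        print(f"{step:>16}: {build_url(path, 'YOUR_API_KEY')}")
```

Every request in the rest of this page follows the same pattern: an endpoint path plus the `key` query parameter.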

Authentication

Once you've signed up for Ersatz, you can find your API key on the security credentials page.

Data

Upload File

url: /api/data/

  • file: zip file with dataset
  • key: api key

Currently, only files smaller than 10 MB can be uploaded directly via the API. You can, however, upload larger files (up to 10 GB) from the web interface.
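A client can check the size limit before attempting an upload. The sketch below assumes that "10 MB" means 10 * 1024 * 1024 bytes, which this page does not specify, and the function names are hypothetical.

```python
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes; the exact cutoff
# used by the server is not documented here.
MAX_API_UPLOAD_BYTES = 10 * 1024 * 1024


def fits_api_limit(size_bytes):
    """Return True if a payload of this size is strictly below the assumed limit."""
    return size_bytes < MAX_API_UPLOAD_BYTES


def can_upload_via_api(path):
    """Return True if the file at `path` is small enough for a direct API upload."""
    return fits_api_limit(os.path.getsize(path))
```

If `can_upload_via_api("my_dataset.zip")` returns False, the file would need to go through the web interface instead.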

Example with curl:

     curl --form "file=@my_dataset.zip" \
     "http://api.ersatzlabs.com/api/data/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Upload String

url: /api/data/

  • data: string with data
  • name: name of the data file
  • file_format: type of data (TIMESERIES, IMAGES, or GENERAL)
  • key: api key
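As a rough Python counterpart to a form-encoded upload, the sketch below builds (but does not send) a POST request with the fields listed above. `build_upload_string_request` is a hypothetical helper. Note that `urlencode` percent-encodes characters such as `|` and `;`; it is an assumption that the server decodes them like any standard form body.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "http://api.ersatzlabs.com"
ALLOWED_FORMATS = {"TIMESERIES", "IMAGES", "GENERAL"}  # values listed above


def build_upload_string_request(api_key, data, name, file_format):
    """Build a form-encoded POST to /api/data/ without sending it."""
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    url = f"{API_ROOT}/api/data/?" + urlencode({"key": api_key})
    body = urlencode(
        {"data": data, "name": name, "file_format": file_format}
    ).encode("ascii")
    return Request(url, data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload.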

Example with curl:

     curl --data "data=18.08,336|1,0;18.57,308|1,0;\n19.06,213|1,0;19.06,154|1,0;" \
     --data "name=foo.ts" \
     --data "file_format=TIMESERIES" \
     "http://api.ersatzlabs.com/api/data/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

File State

url: /api/data/{pk}/

  • pk: file id
  • key: api key

Example with curl:

      curl "http://api.ersatzlabs.com/api/data/1057/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Success Response:

      {"id": 1057, "created": "2014-05-28T13:25:18.629Z", "shared": false, "name": "foo2.ts", "parse_logs": [{"id": 3697, "timestamp": "2014-05-28T13:25:18.660Z", "message": "Waiting in queue..."}, {"id": 3698, "timestamp": "2014-05-28T13:25:20.900Z", "message": "First timestep has 2 inputs and 2 outputs. Applying this requirement to the entire file."}], "datasets": [], "meta": {"binary_output": true, "data_rows": 1, "output_size": 2, "data_type": "TIMESERIES", "binary_input": false, "size": 13, "min_timesteps": 1, "version": 3, "max_timesteps": 1, "input_size": 2, "classes": {"0": 1}, "empty_rows": 0}, "state": "Ready", "file_format": "TIMESERIES"}

When the file state is 'Ready', you can create a dataset.
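Once the file-state response has been decoded from JSON, checking it takes only a few lines. The helper below is a hypothetical sketch: it reports whether the state is 'Ready' and returns the most recent parse log message. The example dict is trimmed from the success response above.

```python
def file_status(info):
    """Return (is_ready, latest_log_message) for a decoded file-state response."""
    # ISO 8601 timestamps in a uniform format sort correctly as plain strings.
    logs = sorted(info.get("parse_logs", []), key=lambda entry: entry["timestamp"])
    latest = logs[-1]["message"] if logs else None
    return info.get("state") == "Ready", latest


# Trimmed copy of the success response shown above.
example = {
    "id": 1057,
    "state": "Ready",
    "parse_logs": [
        {"id": 3697, "timestamp": "2014-05-28T13:25:18.660Z",
         "message": "Waiting in queue..."},
        {"id": 3698, "timestamp": "2014-05-28T13:25:20.900Z",
         "message": "First timestep has 2 inputs and 2 outputs. "
                    "Applying this requirement to the entire file."},
    ],
}

if __name__ == "__main__":
    print(file_status(example))
```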

Create Dataset

url: /api/dataset/

  • data: id of the uploaded data
  • name: name of the dataset to be created
  • key: api key

Example with curl:

      curl --data "data=1057&name=foo_dataset" \
      "http://api.ersatzlabs.com/api/dataset/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Training

Create Ensemble

url: /api/ensemble/

  • train_dataset: the id of the dataset that the model will learn from
  • test_dataset: the id of the dataset that the model will validate against
  • key: api key

Example with curl:

      curl --data "train_dataset=278&test_dataset=279" \
      "http://api.ersatzlabs.com/api/ensemble/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Create Model

url: /api/model/

  • ensemble: the id of the existing parent ensemble of the model to be created
  • model_name: the type of model to create (MLP_RECTIFIED in the example below)
  • model_params: a dictionary of parameters for the model
  • key: api key
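For clients that prefer Python over curl, a JSON request with these fields might be assembled as follows. This is a sketch only: `build_create_model_request` is a hypothetical helper, and the request is built but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_create_model_request(api_key, ensemble_id, model_name, model_params=None):
    """Build a JSON POST to /api/model/ without sending it."""
    payload = {
        "ensemble": ensemble_id,
        "model_name": model_name,
        "model_params": model_params or {},
    }
    url = "http://api.ersatzlabs.com/api/model/?" + urlencode({"key": api_key})
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_create_model_request(key, 1198, "MLP_RECTIFIED", {"maxnum_iter": 100})` carries the same JSON body as the curl example in this section.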

Example with curl:

      curl -H "Content-Type: application/json" \
      -d '{"ensemble": 1198, "model_name":"MLP_RECTIFIED", "model_params": {"maxnum_iter": 100}}' \
      "http://api.ersatzlabs.com/api/model/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

An example model response:

    {"id": 6789, 
     "ensemble": 6789, 
     "model_name": "MLP_RECTIFIED", 
     "model_params": {"maxnum_iter": 100, 
                        "save_freq": 25, 
                        "datasets": [598, 599], 
                        "percent_batches_per_iter": 100, 
                        "dropout": false, 
                        "batch_size": 10, 
                        "layers": [
                            {"dim": 200, 
                             "sparse_init": 10, 
                             "type": "rectified_linear", 
                             "layer_name": "h0"}, 
                            {"dim": 200, 
                             "sparse_init": 10, 
                             "type": "rectified_linear", 
                             "layer_name": "h1"}], 
                        "learning_rate": {
                            "decay_factor": 1.0471285480508996, 
                            "init": 0.1, 
                            "constant": false, 
                            "final": 0.001}, 
                        "momentum": {
                            "constant": false, 
                            "init": 0.161, 
                            "stop": 20, 
                            "final": 0.827, 
                            "start": 5}}, 
    "created": "2014-12-01T01:16:58.261Z", 
    "updated": "2014-12-02T01:04:29.469Z", 
    "state": "TRAIN", 
    "training_time": 3032.10905694957, 
    "traceback": null, 
    "name": null, 
    "is_images": false, 
    "embeddings_json": ""}

Train Model

url: /api/train/

General Keys

  • key: api key
  • models: list of the models to train
  • model_name: name of a model (MRNN, CONV, AUTOENCODER)
  • num_models: number of models to train in the ensemble
  • file_id: id of an uploaded file that contains the training data
  • start: start training (default: false)

MRNN Specific

  • test_dataset: id of an uploaded file that contains the test data
  • valid_dataset: id of an uploaded file that contains the validation data
  • data_split: percentages used to divide the uploaded file (by file_id) into train, test and valid datasets
  • out_nonlin: output nonlinearity, one of SOFTMAX, SIGMOID, SQ_SIGMOID (squared), or LINEAR

MRNN example:

    curl -X POST \
    "http://api.ersatzlabs.com/api/model/2822/restart/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Success Response:

      {"status": "success", "ensemble_id": 45, "ensemble_url": "/train-ensemble/45/"}

CNN Specific

CNN Example:

     curl -X POST \
     "http://api.ersatzlabs.com/api/model/2826/restart/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Success Response:

     {"id": 2826, "ensemble": 1188, "model_name": "CONV", "model_params": {"maxnum_iter": 100, "save_freq": 20, "img_size": 32, "random_sparse": false, "test_freq": 10, "dropout": 0.012, "learning_rate": {"init": 0.0091}, "momentum": {"init": 0.7}}, "created": "2014-05-28T22:52:50.274Z", "updated": "2014-05-28T23:07:55.390Z", "state": "QUEUE", "training_time": 14.3961999416351, "traceback": null, "name": null}

Retrieve Statistics

To retrieve statistics for a given model, do as follows:

     curl "http://api.ersatzlabs.com/api/model/000_MODEL_ID_HERE/stats/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

The result might look something like this:

    [{"train_accuracy": 0.3523809523809524, 
        "f1score_train": 0.18363514419852445, 
        "best_loss_s3": null, 
        "outputs_header": ["iteration", 
                            "train_accuracy", 
                            "test_accuracy", 
                            "train_loss", 
                            "test_loss", 
                            "learning_rate", 
                            "momentum", 
                            "last_layer_row_norms_mean", 
                            "last_layer_col_norms_mean", 
                            "iteration_time"], 
        "iteration": 10, 
        "f1score_test": 0.12950191570881225, 
        "train_outputs": [
            [0, 0.37142857142857144, 0.3111111111111111, 1.0986125469207764, 1.0986125469207764, 0.10000000149011612, 0.5, 0.009747949428856373, 0.041322771459817886, 0.01], 
            [1, 0.3523809523809524, 0.28888888888888886, 1.096335530281067, 1.100602626800537, 0.10000000149011612, 0.5, 0.009749012067914009, 0.041327498853206635, 0.01], 
            [2, 0.3523809523809524, 0.28888888888888886, 1.0952028036117554, 1.1021031141281128, 0.06309573352336884, 0.5529412031173706, 0.00974958948791027, 0.04133059084415436, 0.01], 
            [3, 0.3523809523809524, 0.28888888888888886, 1.0951682329177856, 1.1032905578613281, 0.03981071710586548, 0.6058823466300964, 0.009749865159392357, 0.041331611573696136, 0.010212999768555164], 
            [4, 0.3523809523809524, 0.28888888888888886, 1.0956854820251465, 1.103860855102539, 0.025118865072727203, 0.6588236093521118, 0.009749334305524826, 0.041329264640808105, 0.010193999856710434], 
            [5, 0.3523809523809524, 0.28888888888888886, 1.0955480337142944, 1.104212999343872, 0.015848932787775993, 0.7117647528648376, 0.009749775752425194, 0.04133087769150734, 0.01], 
            [6, 0.3523809523809524, 0.28888888888888886, 1.09531831741333, 1.1046916246414185, 0.009999999776482582, 0.7647058963775635, 0.009750294499099255, 0.04133281856775284, 0.01152300089597702], 
            [7, 0.3523809523809524, 0.28888888888888886, 1.0950274467468262, 1.1051627397537231, 0.006309574004262686, 0.8176470398902893, 0.009751074947416782, 0.041336096823215485, 0.011613001115620136], 
            [8, 0.3523809523809524, 0.28888888888888886, 1.0948566198349, 1.1057136058807373, 0.003981071524322033, 0.8705883026123047, 0.009751907549798489, 0.04133984446525574, 0.01164300087839365], 
            [9, 0.3523809523809524, 0.28888888888888886, 1.094704508781433, 1.1059341430664062, 0.002511886414140463, 0.9235294461250305, 0.009752343408763409, 0.04134177789092064, 0.01], 
            [10, 0.3523809523809524, 0.28888888888888886, 1.094557523727417, 1.106178641319275, 0.001584893325343728, 0.9500000476837158, 0.009752794168889523, 0.04134376719594002, 0.01]], 
        "best_loss": 1.100602626800537, 
        "time": 0.6761419773101807, 
        "confusion_matrix_train": {
                                    "1": {"0": 31}, 
                                    "0": {"0": 37}, 
                                    "2": {"0": 37}}, 
        "best_iter": 1, 
        "confusion_matrix": {
                                "1": {"0": 19}, 
                                "0": {"0": 13}, 
                                "2": {"0": 13}}, 
        "id": 7103, 
        "test_accuracy": 0.28888888888888886}]

The specific contents of the response will vary depending on the type of model you are requesting statistics for. This is the data the Ersatz front-end uses to render its charts. You can use it to create plots of your own or to get access to finer-grained statistics.

See the id field near the bottom of the response (7103 in this example)? That is the iteration ID you will need to make predictions later.
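To plot the statistics yourself, it helps to pair each row of train_outputs with the column names in outputs_header. The sketch below does that and collects the id of each entry. The function names are hypothetical, and the sample is a heavily trimmed version of the response above (two columns, two rows).

```python
def rows_as_dicts(stats_entry):
    """Pair every train_outputs row with the names in outputs_header."""
    header = stats_entry["outputs_header"]
    return [dict(zip(header, row)) for row in stats_entry["train_outputs"]]


def iteration_ids(stats):
    """Collect the id of every entry in a decoded stats response (a list)."""
    return [entry["id"] for entry in stats]


# Trimmed sample: only two columns and two rows of the response above.
sample_stats = [{
    "id": 7103,
    "outputs_header": ["iteration", "train_accuracy"],
    "train_outputs": [[0, 0.37142857142857144], [1, 0.3523809523809524]],
}]

if __name__ == "__main__":
    print(iteration_ids(sample_stats))
    for row in rows_as_dicts(sample_stats[0]):
        print(row)
```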

Make Predictions

The general process to make predictions via our API is as follows:

  • Send a prediction request with your data encoded as a string. This will return a prediction ID.
  • Poll the prediction object via the prediction ID until your results are returned
  • Run time will vary depending on the model
  • It is possible to improve speeds by caching a model on a reserved GPU

In order to make a prediction you need:

  • Your API key from the security credentials page.
  • The model ID for the model you would like to use for your prediction.
  • Some data encoded as a string to use as input to the model.
  • An "iteration id". To get one, request statistics for your model (see Retrieve Statistics).

For the sake of demonstration, I will use a model ID of 1234 and an iteration ID of 7103 (the id returned in the statistics example) with a model that expects 4 dimensions of input (aka 4 features, 4 columns). The response is essentially a pointer to my result, which I will then use to retrieve my finished prediction.

    curl -H "Content-Type: application/json" \
    -d '{"iterations": [7103], "input_data": "5.1,3.5,1.4,0.2"}' \
    "http://api.ersatzlabs.com/api/predict/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Your response will look something like this:

    {
        "id": 987654, 
        "state": "QUEUE", 
        "error": null, 
        "results": null, 
        "iterations": [7103], 
        "predicting_time": 0.0, 
        "traceback": null, 
        "input_data": "5.1,3.5,1.4,0.2", 
        "dataset": null
    }

This tells us our data is in queue, about to be processed. We can use the provided prediction ID to retrieve the prediction when ready.

    curl "http://api.ersatzlabs.com/api/predict/987654/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Your response will look something like this:

    {
        "id": 987654, 
        "state": "FINISHED", 
        "error": null, 
        "results": {
            "ensemble_prediction": ["Iris-virginica"], 
            "predictions": [
                    {
                    "output": [["Iris-virginica"]], 
                    "iteration": 7103, 
                    "probs": [[0.33532399, 0.32607865, 0.33859736]]
                    }
                           ]
        }, 
        "iterations": [7103], 
        "predicting_time": 1.382817029953, 
        "traceback": null, 
        "input_data": "5.1,3.5,1.4,0.2", 
        "dataset": null}

Notice results is now filled in where it wasn't before.
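The request-then-poll process can be sketched as a loop. The helper below is illustrative only: `fetch_prediction` is a hypothetical callable you would supply (for example, one that GETs /api/predict/{id}/ and decodes the JSON), and the interval and attempt limit are arbitrary defaults, not documented values. Only the QUEUE and FINISHED states shown on this page are assumed.

```python
import time


def wait_for_prediction(fetch_prediction, prediction_id,
                        interval=1.0, max_attempts=60, sleep=time.sleep):
    """Poll until the prediction state is FINISHED, then return its results."""
    for attempt in range(max_attempts):
        prediction = fetch_prediction(prediction_id)
        if prediction.get("error"):
            raise RuntimeError(
                f"prediction {prediction_id} failed: {prediction['error']}"
            )
        if prediction.get("state") == "FINISHED":
            return prediction["results"]
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError(
        f"prediction {prediction_id} not finished after {max_attempts} attempts"
    )
```

Injecting `fetch_prediction` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.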

A valid concern is whether predictions are "fast enough". Let's investigate this with some benchmarking.

MRNN

  • key
  • ensemble: ensemble id
  • models: list of models
  • id: model id
  • iteration: iteration of model used for prediction

If you want to run an ensemble on a file:

url: /api/ensemble/run/

  • file_id: a file with data to be classified

If you want to predict using an already uploaded dataset, specify it by its id:

   curl -H "Content-Type: application/json" \
   -d '{"iterations": [16525], "dataset": 285}' \
   "http://api.ersatzlabs.com/api/predict/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Response:

   {"id": 3901, "state": "QUEUE", "error": null, "results": null, "iterations": [16525], "predicting_time": 0.0, "traceback": null, "input_data": null, "dataset": 285}

If you want to predict using raw data:

url: /api/predict/

  • input_data: CSV-like string with data to classify
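Both prediction variants, by dataset id or by raw data, send a JSON body to /api/predict/. A small builder can keep the two apart. This is a sketch: `build_predict_payload` is a hypothetical name, and treating `dataset` and `input_data` as mutually exclusive is an assumption based on the examples in this section, not a documented rule.

```python
import json


def build_predict_payload(iterations, input_data=None, dataset=None):
    """Return the JSON body for /api/predict/ as a string."""
    # Assumption: exactly one of input_data / dataset is supplied per request.
    if (input_data is None) == (dataset is None):
        raise ValueError("provide exactly one of input_data or dataset")
    payload = {"iterations": list(iterations)}
    if dataset is not None:
        payload["dataset"] = dataset
    else:
        payload["input_data"] = input_data
    return json.dumps(payload)
```

`build_predict_payload([16525], input_data="5.1,3.5,1.4,0.2")` produces the same fields as the raw-data curl example.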

Example POST:

   curl -H "Content-Type: application/json" \
   -d '{"iterations": [16525], "input_data": "5.1,3.5,1.4,0.2"}' \
   "http://api.ersatzlabs.com/api/predict/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Response:

   {"id": 3902, "state": "QUEUE", "error": null, "results": null, "iterations": [16525], "predicting_time": 0.0, "traceback": null, "input_data": "5.1,3.5,1.4,0.2", "dataset": null}

GET Results Example:

   curl "http://api.ersatzlabs.com/api/predict/3902/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Finished Response:

   {"id": 3902, 
   "state": "FINISHED", 
   "error": null, 
   "results": 
        {"ensemble_prediction": [3], 
        "predictions": [{"output": [3], "iteration": 16525}]
        }, 
   "iterations": [16525], 
   "predicting_time": 1.31751298904419, 
   "traceback": null, 
   "input_data": "5.1,3.5,1.4,0.2", 
   "dataset": null}

CNN

url: /api/predict/image

CNN prediction differs from other API calls because of the nature of the uploaded files.
You should use Content-Type multipart/form-data.

  • key
  • model: id of the finished CNN model
  • file-[0..n]: binary image file

POST example:

  curl -H "Content-Type: multipart/form-data" \
  --form file-0=@catpic.jpg --form iterations="[16523]" \
  "http://api.ersatzlabs.com/api/predict/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Response:

  {"id": 3906, "state": "QUEUE", "error": null, "results": null, "iterations": [16523], "predicting_time": 0.0, "traceback": null, "input_data": null, "dataset": null}

GET Results Example:

  curl "http://api.ersatzlabs.com/api/predict/3906/?key=d3ece19addabd7b9c6a7d8d8b89e31ae41eda8e4"

Finished Response:

  {"id": 3906, "state": "FINISHED", "error": null, "results": {"predictions": [{"output": [{"labels": [[0.8101209402084351, "cat"], [0.18987905979156494, "dog"]], "filename": "file-0--catpic.jpg"}], "iteration": 16523}]}, "iterations": [16523], "predicting_time": 0.356567859649658, "traceback": null, "input_data": null, "dataset": null}
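In the finished response, each image carries a list of [probability, label] pairs. The sketch below picks the most probable label per image. `top_labels` is a hypothetical helper, and the sample is trimmed from the response above.

```python
def top_labels(response):
    """Return (filename, label, probability) for the best label of each image."""
    best = []
    for prediction in response["results"]["predictions"]:
        for image in prediction["output"]:
            # Each entry in "labels" is a [probability, label] pair.
            prob, label = max(image["labels"], key=lambda pair: pair[0])
            best.append((image["filename"], label, prob))
    return best


# Trimmed copy of the finished response shown above.
finished = {
    "id": 3906,
    "state": "FINISHED",
    "results": {"predictions": [{
        "output": [{
            "labels": [[0.8101209402084351, "cat"], [0.18987905979156494, "dog"]],
            "filename": "file-0--catpic.jpg",
        }],
        "iteration": 16523,
    }]},
}

if __name__ == "__main__":
    print(top_labels(finished))
```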