How To Train An Image Model With Basilica

Basilica is a REST API that embeds high-dimensional data into spaces that are easy to work with.

You send us your images, text, or whatever else you have, and we send you back a vector of floats that you can feed into traditional ML models you're already using, like logistic regressions.

We use transfer learning to produce these embeddings, meaning you can get value out of small datasets that would normally be very difficult to work with.

For this tutorial, we're going to use the Basilica API to solve two common ML problems: classification and similarity search. We're going to train a classifier that distinguishes cat images from dog images, and then train a nearest neighbors model that takes a dog picture and finds similar ones in our training set.

Without Basilica, you would need to train sophisticated models on hundreds of thousands of data points to get good performance on photographs. With Basilica, we're going to get really good results with a few thousand data points, using off-the-shelf models that have existed for decades.

Setup

Basilica has a REST API, but the easiest way to interact with it is through the Python client.

You can install the Python client like so:

pip install basilica

You'll also need the dataset. You can download the dataset, along with the completed Python files we'll be writing in this demo, here:

wget https://storage.googleapis.com/basilica-public/cats_dogs_demo.tgz
tar -xf cats_dogs_demo.tgz

Embedding the images

Let's embed our first image, just to see how the API works. We'll embed dog.1.jpg, this one:

[Image: dog.1.jpg]

import basilica


API_KEY = 'SLOW_DEMO_KEY'

with basilica.Connection(API_KEY) as c:
    embedding = c.embed_image_file('images/dog.1.jpg')
    print(embedding)
[0.381211, 3.11767, 0.0206775, ..., 0.403363, 0.0, 0.754328]

As you can see, we now have a vector of floats instead of an image.
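
As a quick sanity check, you can also print the vector's length. The rest of this tutorial assumes these image embeddings are 2048-dimensional, which is the size we'll allocate our numpy arrays with below:

print(len(embedding))
2048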

We're going to be playing around with these images a lot, so let's embed all of them with the batch API, and save the embeddings:

import basilica
from six.moves import zip
import json
import os
EMB_DIR = '/tmp/basilica-embeddings/'
if not os.path.exists(EMB_DIR):
    os.mkdir(EMB_DIR)

IMG_DIR = 'images/'
API_KEY = 'SLOW_DEMO_KEY'

with basilica.Connection(API_KEY) as c:
    filenames = os.listdir(IMG_DIR)
    embeddings = c.embed_image_files(IMG_DIR + f for f in filenames)
    for filename, embedding in zip(filenames, embeddings):
        with open(EMB_DIR + filename + '.emb', 'w') as f:
            f.write(json.dumps(embedding))
        print(filename)

This should take about a minute.
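
If you'd like to verify that everything was written out, every image should now have a matching .emb file (the two counts below should be equal):

import os
print(len(os.listdir('images/')))
print(len(os.listdir('/tmp/basilica-embeddings/')))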

Now that we have these embeddings, let's see how we can use them to train a traditional ML model.

Training a classifier

Let's train a classifier that distinguishes cat pictures from dog pictures, using sklearn.linear_model.LogisticRegression.

First, we need to import a bunch of things:

import json
import numpy as np
import os
import random
import re
import sklearn.linear_model
import sklearn.preprocessing
import time

Next, let's split the embeddings into train and test sets, then load them into numpy arrays.

EMB_DIR = '/tmp/basilica-embeddings/'
files = [f for f in os.listdir(EMB_DIR)]
random.shuffle(files)
train_size = int(len(files)*0.8)
x_train = np.zeros((train_size, 2048))
x_test = np.zeros((len(files)-train_size, 2048))
y_train = np.zeros(train_size, dtype=int)
y_test = np.zeros(len(files)-train_size, dtype=int)

for i in range(train_size):
    filename = files[i]
    with open(EMB_DIR + filename, 'r') as f:
        x_train[i] = json.load(f)
    y_train[i] = (0 if re.match('.*cat.*', filename) else 1)

for i in range(len(files) - train_size):
    filename = files[train_size+i]
    with open(EMB_DIR + filename, 'r') as f:
        x_test[i] = json.load(f)
    y_test[i] = (0 if re.match('.*cat.*', filename) else 1)

Finally, let's train the classifier.

x_train = sklearn.preprocessing.normalize(x_train)
x_test = sklearn.preprocessing.normalize(x_test)
model = sklearn.linear_model.LogisticRegression()
model.fit(x_train, y_train)

Easy, right? Let's see how well it did:

print('Train accuracy: %.3f' % model.score(x_train, y_train))
print('Test accuracy: %.3f' % model.score(x_test, y_test))
Train accuracy: 0.990
Test accuracy: 0.987

After the train/test split, we had about 2,400 datapoints in our training set. Training a classifier to 98% accuracy on 2,400 photographs isn't trivial, and we just did it in fewer lines of code than it took to load the data, using an algorithm older than I am.

This is the power of transfer learning. Basilica produced this embedding using a deep neural net trained on millions of generic images, and the features it learned generalized well to our problem.
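
If you'd like to try the classifier on an image of your own, here's a minimal sketch (the filename is hypothetical, and it assumes the model we just trained and the imports above are still in scope):

import basilica

API_KEY = 'SLOW_DEMO_KEY'

with basilica.Connection(API_KEY) as c:
    emb = c.embed_image_file('my_image.jpg')  # hypothetical path

# Remember to normalize, since we normalized the training data.
x = sklearn.preprocessing.normalize(np.array([emb]))
print('dog' if model.predict(x)[0] == 1 else 'cat')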


What sorts of mistakes are we making?

Just as a sanity check, let's take a quick look at the images where our model assigned the lowest probability to the true answer:

test_proba = model.predict_proba(x_test)
collected = zip(files[train_size:], y_test, test_proba)
probabilities = [(pred[y], f) for f, y, pred in collected]
probabilities.sort()
for prob, filename in probabilities[:3]:
    print('%s: %.2f' % (filename, prob))
dog.1004.jpg.emb: 0.10
cat.991.jpg.emb: 0.28
dog.412.jpg.emb: 0.40
[Image: dog.1004.jpg]
[Image: cat.991.jpg]
[Image: dog.412.jpg]

Not bad. It seems like we're making reasonable mistakes.

Nearest neighbors

One of the great things about embeddings is that they're general. You can use the same embedding for lots of different machine learning tasks.

Let's train a model that takes an image of a dog and finds similar dog images in our dataset.

First, we import a bunch of things again:

import basilica
import json
import numpy as np
import os
import random
import re
import sklearn.decomposition
import sklearn.neighbors
import sklearn.preprocessing
import time

Next, we load up the embeddings for all of the dogs:

EMB_DIR = '/tmp/basilica-embeddings/'
files = [f for f in os.listdir(EMB_DIR) if re.match('.*dog.*', f)]
random.shuffle(files)

signatures = np.zeros((len(files), 2048))
for i, filename in enumerate(files):
    with open(EMB_DIR + filename, 'r') as f:
        signatures[i] = json.load(f)

Next, let's fit a nearest neighbors model. The number of dimensions you want for nearest neighbors search is usually much smaller than the number you want for classification, so we're going to PCA down to 200 dimensions first:

scaler = sklearn.preprocessing.StandardScaler(with_std=False)
pca = sklearn.decomposition.PCA(n_components=200, whiten=True)

signatures = sklearn.preprocessing.normalize(signatures)
signatures = scaler.fit_transform(signatures)
signatures = pca.fit_transform(signatures)
signatures = sklearn.preprocessing.normalize(signatures)

nbrs = sklearn.neighbors.NearestNeighbors(n_neighbors=4).fit(signatures)
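
As an optional sanity check, sklearn can tell us how much of the original variance those 200 components keep:

print('%.1f%% of variance retained' % (100 * pca.explained_variance_ratio_.sum()))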

Now, let's search for similar dog pictures. Let's look for pictures similar to each of the first 3 in our dataset:

[Image: dog.1.jpg]
[Image: dog.2.jpg]
[Image: dog.3.jpg]
IMG_DIR = 'images/'
target_files = ['dog.1.jpg', 'dog.2.jpg', 'dog.3.jpg']

API_KEY = 'SLOW_DEMO_KEY'

with basilica.Connection(API_KEY) as c:
    targets = np.array(list(c.embed_image_files(IMG_DIR + f for f in target_files)))
targets = sklearn.preprocessing.normalize(targets)
targets = scaler.transform(targets)
targets = pca.transform(targets)
targets = sklearn.preprocessing.normalize(targets)

_, all_indices = nbrs.kneighbors(targets)
for indices in all_indices:
    print(' '.join(files[i] for i in indices))
dog.1.jpg.emb dog.283.jpg.emb dog.603.jpg.emb dog.597.jpg.emb
dog.2.jpg.emb dog.849.jpg.emb dog.334.jpg.emb dog.586.jpg.emb
dog.3.jpg.emb dog.1130.jpg.emb dog.1157.jpg.emb dog.607.jpg.emb

Here are the similar images grouped together:


[Images: dog.1, dog.283, dog.603, dog.597]

[Images: dog.2, dog.849, dog.334, dog.586]

[Images: dog.3, dog.1130, dog.1157, dog.607]
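
To search with a picture of your own instead, embed it and push it through the same scaler and PCA before querying. A minimal sketch (the filename is hypothetical; scaler, pca, nbrs, and files are the objects defined above):

with basilica.Connection(API_KEY) as c:
    target = np.array(list(c.embed_image_files(['my_dog.jpg'])))  # hypothetical path

# Apply the same transformations we fit on the dataset.
target = sklearn.preprocessing.normalize(target)
target = pca.transform(scaler.transform(target))
target = sklearn.preprocessing.normalize(target)

_, indices = nbrs.kneighbors(target)
print(' '.join(files[i] for i in indices[0]))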


You'll notice that we're searching for similar dog pictures, not pictures of similar dogs. Our code thinks that environmental features like cages and fences are just as important as the features of the dogs themselves.

If we instead wanted to group pictures of similar dogs together, we'd need a larger dataset with better labels, so that we could learn features that distinguish dogs from each other, but that don't separate pictures of the same dog taken against different backgrounds.

An important point

You probably noticed in the previous examples that we were treating each float in the embedding (or in the PCA-ed embedding) as a single feature. You might be wondering: what if I have my own features?

Embeddings play nice with hand-designed features, and with each other. If you were training a tweet classifier, for example, you could combine hand-crafted features like the time of day, Basilica's English language embedding of the tweet, and Basilica's image embedding of any pictures in the tweet, into a single set of features that you feed into a regression.
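
For instance, here's a minimal sketch of what that combination might look like (the feature values and embedding sizes are made up for illustration):

import numpy as np

hour_of_day = np.array([[14.0]])   # hand-crafted feature (made-up value)
text_emb = np.random.rand(1, 768)  # stand-in for a text embedding
img_emb = np.random.rand(1, 2048)  # stand-in for an image embedding

# Concatenate everything into one feature row for a regression.
features = np.hstack([hour_of_day, text_emb, img_emb])
print(features.shape)  # (1, 2817)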

If you already have an existing ML pipeline that handles numeric features, and you'd like to improve it with high-dimensional data, embeddings let you add that high-dimensional data right alongside your hand-designed features, with no need to retool.

Ready to get started?

See Our Quickstart
Create An Account