Image Segmentation Tutorial

This was originally material for a presentation and blog post. You can get the slides online.

Let us imagine you are trying to compare two image segmentation algorithms based on human-segmented images. This is a completely real-world example as it was one of the projects where I first used jug [1].

The code below depends on mahotas for image processing.

We are going to build this up piece by piece.

First a few imports:

import mahotas as mh
from jug import TaskGenerator
from glob import glob

Here, we test two thresholding-based segmentation methods, called method1 and method2. They both (i) read the image, (ii) blur it with a Gaussian, and (iii) threshold it [2]:

@TaskGenerator
def method1(image):
    # Read the image
    image = mh.imread(image)[:, :, 0]
    image = mh.gaussian_filter(image, 2)
    binimage = (image > image.mean())
    labeled, _ = mh.label(binimage)
    return labeled

@TaskGenerator
def method2(image):
    image = mh.imread(image)[:, :, 0]
    image = mh.gaussian_filter(image, 4)
    image = mh.stretch(image)
    binimage = (image > mh.otsu(image))
    labeled, _ = mh.label(binimage)
    return labeled
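
The two methods differ mainly in how they pick the threshold: method1 uses the mean intensity, method2 uses Otsu's method. As a rough illustration of the difference, here is a pure-Python sketch that works on a flat list of integer pixel values. It is not the mahotas implementation; the function names mean_threshold and otsu_threshold are made up for this example, and it assumes pixel values in the range 0 to 255.

```python
def mean_threshold(pixels):
    """Threshold at the mean intensity, as method1 does."""
    return sum(pixels) / len(pixels)


def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises the between-class variance
    of the two groups (pixels <= t) and (pixels > t).

    Illustrative sketch only; assumes integer values in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(pixels)
    best_t, best_var = 0, -1.0
    n0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(levels - 1):
        n0 += hist[t]
        sum0 += t * hist[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so there is no split to score
        mu0 = sum0 / n0
        mu1 = (total_sum - sum0) / n1
        # Between-class variance, up to a constant factor of 1 / total**2
        var = n0 * n1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


# A tiny "image": three dark pixels and three bright ones
pixels = [10, 10, 12, 200, 205, 210]
t_mean = mean_threshold(pixels)
t_otsu = otsu_threshold(pixels)
mask_mean = [p > t_mean for p in pixels]
mask_otsu = [p > t_otsu for p in pixels]
```

On this clearly bimodal toy input both thresholds produce the same mask; on real images with uneven backgrounds the two rules can disagree, which is the kind of difference the comparison in this tutorial is meant to measure.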

We need a way to compare these. We will use the Adjusted Rand Index [3]:

@TaskGenerator
def compare(labeled, ref):
    from milk.measures.cluster_agreement import rand_arand_jaccard
    ref = mh.imread(ref)
    return rand_arand_jaccard(labeled.ravel(), ref.ravel())[1]
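
For readers who have not met the Adjusted Rand Index before, the following is a small pure-Python sketch of the standard formula, computed from the contingency table of the two labelings. It is not the milk implementation, and the name adjusted_rand is chosen only for this example; returning 1.0 in the degenerate cases is a convention assumed here.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: assumed convention

    # Contingency table entries and its row/column sums
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs grouped together in both labelings
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    # Value expected by chance, and the largest attainable value
    expected = sum_a * sum_b / total_pairs
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # both labelings are trivial: assumed convention
    return (index - expected) / (maximum - expected)


# Same partition with different label names: perfect agreement
same = adjusted_rand([0, 0, 1, 1], [5, 5, 9, 9])
# Partitions that cut across each other: worse than chance
crossed = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
```

Here same is 1.0 and crossed is -0.5. The score depends only on which items are grouped together, not on the label values, which is why it suits comparing a computed segmentation against a human-made reference.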

Running over all the images looks exactly like Python:

results = []
for im in glob('images/*.jpg'):
    m1 = method1(im)
    m2 = method2(im)
    ref = im.replace('images', 'references').replace('jpg', 'png')
    v1 = compare(m1, ref)
    v2 = compare(m2, ref)
    results.append((v1, v2))

But how do we get the results out?

A simple solution is to write a function which writes to an output file:

@TaskGenerator
def print_results(results):
    import numpy as np
    r1, r2 = np.mean(results, 0)
    with open('output.txt', 'w') as out:
        out.write('Result method1: {}\nResult method2: {}\n'.format(r1,
                                                                    r2))
print_results(results)

§

Except for the TaskGenerator decorators, this would be a pure Python file!

With TaskGenerator, we get jugginess!

We can call:

jug execute &
jug execute &
jug execute &
jug execute &

to get 4 processes going at once.

§

Note also the line:

print_results(results)

results is a list of pairs of Task objects. This is how you define a dependency. Jug works out that, to call print_results, it needs every value in results, and it schedules the tasks accordingly.

Easy as Py.
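
To make the dependency idea concrete, here is a toy, in-memory sketch of how a decorator can turn function calls into deferred task objects and find the tasks hidden inside list and tuple arguments. This is not how jug is implemented: LazyTask, resolve and lazy are hypothetical names invented for this example, and jug additionally stores results so that separate processes can share them, which this sketch does not attempt.

```python
class LazyTask:
    """A deferred function call (illustration only, not jug's Task class)."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        self._done = False
        self._result = None

    def dependencies(self):
        """Yield the LazyTask objects found directly in the arguments,
        looking inside nested lists and tuples."""
        stack = list(self.args)
        while stack:
            item = stack.pop()
            if isinstance(item, LazyTask):
                yield item
            elif isinstance(item, (list, tuple)):
                stack.extend(item)

    def run(self):
        """Run the dependencies first, then this task; cache the result."""
        if not self._done:
            self._result = self.func(*[resolve(a) for a in self.args])
            self._done = True
        return self._result


def resolve(value):
    """Replace every LazyTask inside value by its computed result."""
    if isinstance(value, LazyTask):
        return value.run()
    if isinstance(value, list):
        return [resolve(v) for v in value]
    if isinstance(value, tuple):
        return tuple(resolve(v) for v in value)
    return value


def lazy(func):
    """Decorator: calling the function builds a LazyTask instead of running it."""
    def make_task(*args):
        return LazyTask(func, args)
    return make_task


@lazy
def double(x):
    return 2 * x


@lazy
def total(pairs):
    return sum(a + b for a, b in pairs)


# Mirrors the tutorial: a list of pairs of tasks passed to a final task
pairs = [(double(i), double(i + 1)) for i in range(3)]
summary = total(pairs)                          # nothing has run yet
n_deps = len(list(summary.dependencies()))      # the six double() tasks
answer = summary.run()                          # runs dependencies, then total
```

The point of the sketch is that the final task never needs to be told about its inputs separately: passing task objects as arguments is enough for the dependencies to be discovered.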

§

The full script above, including data, is available from GitHub.

[1] The code in that repository still uses a pretty old version of jug; this was 2009, after all. TaskGenerator had not been invented yet.
[2] This is for demonstration purposes; the paper had better methods, of course.
[3] Again, you can do better than Adjusted Rand, as we show in the paper, but this is a demo. This way, we can just call a function in milk.