===========================
Image Segmentation Tutorial
===========================
This was originally material for a presentation and `blog post
`__.
You can get the `slides online `__.
Let us imagine you are trying to compare two image segmentation algorithms
based on human-segmented images. This is a completely real-world example as it
`was one of the projects where I first used jug
`__ [#]_.
It depends on `mahotas `__ for image
processing.
We are going to build this up piece by piece.
First a few imports::

    import mahotas as mh
    from jug import TaskGenerator
    from glob import glob

Here, we test two thresholding-based segmentation methods, called ``method1`` and
``method2``. They both (i) read the image, (ii) blur it with a Gaussian, and
(iii) threshold it [#]_::
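
    @TaskGenerator
    def method1(image):
        # Read the image and keep only the first channel
        image = mh.imread(image)[:, :, 0]
        # Blur with a Gaussian (sigma=2), then threshold at the mean
        image = mh.gaussian_filter(image, 2)
        binimage = (image > image.mean())
        # Label connected components; the object count is discarded
        labeled, _ = mh.label(binimage)
        return labeled

    @TaskGenerator
    def method2(image):
        image = mh.imread(image)[:, :, 0]
        # Stronger blur (sigma=4), rescale to 8 bits, then Otsu threshold
        image = mh.gaussian_filter(image, 4)
        image = mh.stretch(image)
        binimage = (image > mh.otsu(image))
        labeled, _ = mh.label(binimage)
        return labeled
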
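The two methods differ mainly in how the threshold is chosen: ``method1`` uses the image mean, while ``method2`` uses Otsu's method. As a rough illustration of the latter, the pure-Python sketch below picks the threshold that maximises the between-class variance of an 8-bit histogram. It is not how ``mh.otsu`` is implemented; the helper name ``otsu_threshold`` is hypothetical, and the tie-breaking rule (first maximum wins) is an assumption.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximising the between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    `pixels` is a flat iterable of integers in 0..255 (hypothetical helper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(v * n for v, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0 = 0  # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty: not a valid split
        mean0 = sum0 / count0
        mean1 = (total_sum - sum0) / count1
        # Between-class variance, up to a constant factor of 1 / total**2
        var = count0 * count1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


# A toy bimodal "image": dark background near 10, bright objects near 200
pixels = [10] * 40 + [12] * 10 + [200] * 30 + [205] * 20
t = otsu_threshold(pixels)
foreground = [p > t for p in pixels]
```

On this toy data the chosen threshold falls between the two groups of values, so exactly the 50 bright pixels end up in ``foreground``.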
We need a way to compare these. We will use the `Adjusted Rand Index
`__ [#]_::

    @TaskGenerator
    def compare(labeled, ref):
        from milk.measures.cluster_agreement import rand_arand_jaccard
        ref = mh.imread(ref)
        # Index 1 of the returned values is the adjusted Rand index
        return rand_arand_jaccard(labeled.ravel(), ref.ravel())[1]

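For readers who want to see what the index measures, the Adjusted Rand Index can be sketched in pure Python from pair counts over the two labelings. This is a minimal illustration and not the ``milk`` implementation; the function name ``adjusted_rand`` is hypothetical, and it expects flat sequences of labels (the equivalent of ``labeled.ravel()``).

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand Index of two equal-length flat label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)

    # Number of element pairs placed together by both labelings, by A, and by B
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = pairs_a * pairs_b / total_pairs  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


same = adjusted_rand([0, 0, 1, 1], [5, 5, 9, 9])     # same partition, renamed labels
crossed = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that disagree
```

Identical partitions score 1 regardless of how the labels are named (``same`` is 1.0), while labelings that agree less than chance would predict score below 0 (``crossed`` is -0.5).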
Running over all the images **looks exactly like Python**::

    results = []
    for im in glob('images/*.jpg'):
        m1 = method1(im)
        m2 = method2(im)
        ref = im.replace('images', 'references').replace('jpg', 'png')
        v1 = compare(m1, ref)
        v2 = compare(m2, ref)
        results.append((v1, v2))

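The reference filename is derived by plain string substitution. The check below uses hypothetical filenames to show the mapping, and also a caveat: ``str.replace`` rewrites every occurrence, so a file whose name itself contains ``jpg`` or ``images`` would be mapped in a way you may not expect.

```python
def reference_for(im):
    # Same substitution as in the loop above
    return im.replace('images', 'references').replace('jpg', 'png')


# Hypothetical filenames, for illustration only
print(reference_for('images/dna-0.jpg'))     # references/dna-0.png
print(reference_for('images/jpg-test.jpg'))  # references/png-test.png
```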
But how do we get the results out?
A simple solution is to write a function which writes to an output file::

    @TaskGenerator
    def print_results(results):
        import numpy as np
        r1, r2 = np.mean(results, 0)
        with open('output.txt', 'w') as out:
            out.write('Result method1: {}\nResult method2: {}\n'.format(r1, r2))

    print_results(results)

**Except for the ``TaskGenerator`` this would be a pure Python file!**
With ``TaskGenerator``, we get jugginess!
We can call::

    jug execute &
    jug execute &
    jug execute &
    jug execute &

to get 4 processes going at once.
Note also the line::

    print_results(results)

``results`` is a list of tuples of ``Task`` objects. This is *how you define a
dependency*. Jug works out that ``print_results`` needs every value in
``results``, so it runs that task only after all the comparisons have finished.
Easy as Py.
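To make the idea of recording dependencies concrete, here is a toy model of a deferred task. It is not jug's implementation (jug also stores results on disk and coordinates several processes); the names ``ToyTask``, ``toy_task_generator`` and ``resolve`` are hypothetical. It only shows how calling a decorated function can return a placeholder, and how placeholders nested inside lists and tuples can be found and run first.

```python
class ToyTask:
    """A deferred call: remembers the function and its arguments."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        self.done = False
        self.result = None

    def run(self):
        if not self.done:
            # Resolve the arguments first, which runs any tasks they contain
            values = [resolve(a) for a in self.args]
            self.result = self.func(*values)
            self.done = True
        return self.result


def resolve(value):
    """Replace every ToyTask inside (nested) lists and tuples by its result."""
    if isinstance(value, ToyTask):
        return value.run()
    if isinstance(value, (list, tuple)):
        return type(value)(resolve(v) for v in value)
    return value


def toy_task_generator(func):
    """Decorator: calling the function builds a ToyTask instead of running it."""
    def build(*args):
        return ToyTask(func, args)
    return build


calls = []  # records the execution order, for illustration


@toy_task_generator
def double(x):
    calls.append(('double', x))
    return 2 * x


@toy_task_generator
def total(pairs):
    calls.append(('total',))
    return sum(a + b for a, b in pairs)


results = [(double(1), double(2)), (double(3), double(4))]
summary = total(results)  # nothing has been executed yet
assert calls == []
value = summary.run()     # runs the four double() tasks, then total()
```

In this toy, ``summary.run()`` walks into the list of tuples, runs each ``double`` task it finds, and only then calls ``total`` on the plain values.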
The full script above, including data, is available `from GitHub
`__.

.. [#] The code in that repository still uses a pretty old version of jug; this
   was 2009, after all. ``TaskGenerator`` had not been invented yet.

.. [#] This is for demonstration purposes; the paper had better methods, of
   course.

.. [#] Again, you can do better than Adjusted Rand, as we show in the paper;
   but **this is a demo**. This way, we can just call a function in `milk
   `__.