================= Magic Jug Methods ================= This is an advanced use of jug and you can shoot yourself in the foot doing this. If you cannot figure out why this functionality could be useful, then you probably should not be using it. Custom hash functions --------------------- Sometimes, you may want to give your objects a special hash function, either to add functionality or for efficiency. There are two ways to do it: (1) use a ``CustomHash`` object for simple cases or (2) add a ``__jug_hash__`` method for more complex ones. Using the CustomHash wrapper ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For example, here is how ``timed_path`` is implemented (minus the comments in the real code):: def hash_with_mtime_size(path): from .hash import hash_one st = os.stat_result(os.stat(path)) mtime = st.st_mtime size = st.st_size return hash_one((path, mtime, size)) def timed_path(path): return CustomHash(path, hash_with_mtime_size) The return value from ``timed_path`` is an object which behaves exactly like ``path`` (i.e., as a file path), but when jug needs to hash it, it calls the function ``hash_with_mtime_size``. Implementing a __jug_hash__ method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When jug wants to hash an object, first it checks whether the object has a ``__jug_hash__`` method. If so, that method should return ``bytes`` (or a bytes-like object that can be used by a `hashlib `__ object. If there is no ``__jug_hash__`` method, jug checks whether the object is one of its known types (dict, list, tuple, numpy array, ...). If so, it will use optimized code. Otherwise; it resorts to calling pickle on the object and then hashing the pickled representation. This fallback can be very inefficient. For example, let's say you have an object which is basically just a numpy array loaded from disk, which remembers its initial location. The standard pickling method would be very inefficient compared to the optimized numpy code. The way to solve this is to define a ``__jug_hash__`` method. Inside it, we can rely on the jug hashing machinery to access the optimized version! Here is how we'd do it:: import numpy as np class NamedNumpy: def __init__(self, ifile): self.data = np.load(ifile) self.name = ifile def transform(self, x): self.data *= x def __jug_hash__(self): from jug.hash import hash_one return hash_one({ 'type': 'NumpyPair', 'data': self.data, 'name': self.name, }) The function ``hash_one`` takes one object and hashes it using the jug machinery. Because we are passing it a dictionary, it recursively build a hash for it. Thus, our ``NamedNumpy`` object now has a very fast hash function. In fact, the ``CustomHash`` object we saw above, just defined its ``__jug_hash__`` function to call whatever you pass it in. Overriding the ``value`` function --------------------------------- Similarly to overriding the hashing, we can override the ``value`` call which jug used internally to load objects. Again, ``value(x)`` works in the following way: 1. Does the ``x.__jug_value__`` member exist? If so, call it. 2. Is ``x`` one of the composite types it knows about (dict, list,...). If so, use special code to recursively get all the sub objects. For a list ``value([x,y]) == [value(x), value(y)]``. 3. Is it a ``Task`` or a ``Tasklet``? If so, load it from the store. 4. Otherwise, ``value(x) == x``. What if you have your own sequence object? Then you can set a ``__jug_value__`` method, which will be called whenever ``value(self)`` is needed. This is a pretty advanced use case: if you cannot figure out why this may be useful, then you probably don't need to use it.