========= Utilities ========= The module ``jug.utils`` has a few functions which are meant to be used in writing jugfiles. Identity -------- This is simply implemented as:: @TaskGenerator def identity(x): return x This might seem like the most pointless function, but it can be helpful in speeding things up. Consider the following case:: from glob import glob def load(fname): return open(fname).readlines() @TaskGenerator def process(inputs, parameter): ... inputs = [] for f in glob('*.data'): inputs.extend(load(f)) # inputs is a large list results = {} for p in range(1000): results[p] = process(inputs, p) How is this processed? Every time ``process`` is called, a new ``jug.Task`` is generated. This task has two arguments: ``inputs`` and an integer. When the hash of the task is computed, both its arguments are analysed. ``inputs`` is a large list of strings. Therefore, it is going to take a very long time to process all of the hashes. Consider the variation:: from jug.utils import identity # ... # same as above inputs = identity(inputs) results = {} for p in range(1000): results[p] = process(inputs, p) Now, the long list is only hashed once! It is transformed into a ``Task`` (we reuse the name ``inputs`` to keep things clear) and each ``process`` call can now compute its hash very fast. Using ``identity`` to induce dependencies ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``identity`` can also be used to introduce dependencies. One can define a helper function:: def value_after(val, token): from jug.utils import identity return identity( [val, token] )[0] Now, this function, will always return its first argument, but will only run once its second argument is available. Here is a typical use case: 1. Function ``process`` takes an output file name 2. Function ``postprocess`` takes as input the output filename of ``process`` Now, you want to run ``process`` and **then** ``postprocess``, but since communication is done with files, Jug does not see that these functions depend on each other. ``value_after`` is the solution:: token = process(input, ofile='output.txt') postprocess(value_after('output.txt', token)) This works independently of whatever ``process`` returns (even if it is ``None``). jug_execute ----------- This is a simple wrapper around ``subprocess.call()``. It adds two important pieces of functionality: 1. it checks the exit code and raises an exception if not zero (this can be disabled by passing ``check_exit=False``). 2. It takes an argument called ``run_after`` which is ignored but can be used to declare dependencies between tasks. Thus, it can be used to ensure that a specific process only runs after something else has run:: from jug.utils import jug_execute from jug import TaskGenerator @TaskGenerator def my_computation(input, ouput_filename): ... token = my_computation(input, 'output.txt') # We want to run gzip, but **only after** `my_computation` has run: jug_execute(['gzip', 'output.txt'], run_after=token) cached_glob ----------- ``cached_glob`` is a simple utility to perform the following common operation:: from glob import glob from jug import CachedFunction files = CachedFunction(glob, pattern) files.sort() Where ``pattern`` is a glob pattern can be simply written as:: from jug.utils import cached_glob files = cached_glob(pattern) .. automodule:: jug.utils :members: