Tasks & LAW¶
FLAF expresses the whole analysis as a set of tasks wired together by their dependencies, and runs them with LAW (on top of Luigi). You do not need to know Luigi to use FLAF, but a working mental model of "tasks" pays off immediately.
What is a task?¶
A task is one stage of work with three things defined:
- outputs — the file(s) it produces (
output()), - requirements — the other tasks it depends on (
requires()), - a run step — what it actually does to turn inputs into outputs.
The crucial property: a task that already has its output is considered done. LAW checks for the output file; if it exists, the task is skipped. This makes the pipeline resumable — re-run a late stage and only the missing upstream pieces are computed.
You request the end; LAW fills in the middle¶
You almost never run the intermediate stages by hand. You ask for the task whose result you want, and LAW walks the dependency graph and runs whatever is missing, in order:
Even though plotting is the last stage, this single command will (if needed) resolve input files, produce and merge ntuples, compute observables and fill histograms first. The dependency graph for the analysis is, in order:
flowchart LR
IFT[InputFileTask] --> ATF[AnaTupleFileTask] --> ATM[AnaTupleMergeTask]
ATM --> HTP[HistTupleProducerTask] --> HFN[HistFromNtupleProducerTask]
HFN --> HM[HistMergerTask] --> HP[HistPlotTask]
Every box is documented in the Task reference, and the same chain is walked through with commands in the full-workflow walkthrough.
Workflows and branches¶
Most FLAF tasks are workflows: they split into many independent branches that can run in parallel. What a branch is depends on the task:
AnaTupleFileTaskhas one branch per input NanoAOD file.HistPlotTaskhas one branch per variable to plot.- merge tasks have one branch per dataset/process.
Two arguments control workflows:
--workflow localruns branches on the current machine;--workflow htcondorsubmits them to the batch system.--branches 0,2,5-7runs only selected branches (great for testing one file or one variable).
--branches 0 does not mean 'a tiny run of everything'
--branches only restricts the task you launched. Its upstream dependencies still run for
everything they need. For example HistPlotTask --branches 0 plots one variable, but the
ntuples and histograms it needs are still produced for all datasets. To make a run genuinely
small, combine it with --test (few events) and phys_model: TestModel (few processes).
Inspecting and cleaning up¶
LAW gives you commands to see and manage the state of the graph without running it:
| Command | What it does |
|---|---|
--print-status N,K |
Show the status of the dependency tree to task depth N and file-collection depth K. --print-status 3,1 is a good default. The output also reveals each output's path. |
--print-deps N |
Print the dependency tree to depth N without checking outputs. |
--remove-output N,a,y |
Remove this task's outputs to depth N. a = all branches, y = no confirmation. Use to force a redo. |
--parallel-jobs M |
Cap the number of branches running at once (e.g. --parallel-jobs 100). Strongly recommended for large local or batch runs. |
--remove-output deletes files
It removes real outputs (including on grid storage). Double-check the depth and the version before confirming, especially in a shared production area.
Where do the common options come from?¶
--version, --period, --workflow, --branches, --test, --customisations, --process,
--user-custom and the per-task --*-version overrides are defined on FLAF's base task classes
(in FLAF/run_tools/law_customizations.py), so they are available on every FLAF task. They
are all catalogued in Command arguments.
When to re-index¶
LAW maintains an index of available tasks. Re-run law index --verbose after you add, rename
or move a task class, or you will get "task not found" errors. Simply editing the body of an
existing task does not require re-indexing.