Skip to content

Running on HTCondor

Producing ntuples and histograms for a full era means processing thousands of files — far too much for one machine. FLAF tasks are workflows (Tasks & LAW), so their branches can be submitted to CERN's HTCondor batch system. The recommended pattern is to develop and test with --workflow local, then switch to --workflow htcondor for production — the command is otherwise the same.

Submit a task to the batch system

law run FLAF.AnaProd.tasks.AnaTupleFileTask \
  --period Run3_2022 --version prod \
  --workflow htcondor \
  --transfer-logs \
  --parallel-jobs 100
Option Why you want it
--workflow htcondor Submit branches as batch jobs instead of running locally.
--transfer-logs Bring each job's stdout/stderr back to your data/ area. Highly recommended — without it, debugging a failed job is painful.
--parallel-jobs 100 Cap how many jobs are in flight at once. Be a good citizen on the shared pool; very large uncapped submissions are discouraged.
--branches 0-99 Submit only a subset (e.g. to retry a range).

Other HTCondor parameters available on every workflow task: --max-runtime, --n-cpus, --priority, --htcondor-spool. See Command arguments.

Monitor and resume

LAW tracks which branches have finished (by checking their outputs), so a re-run only resubmits the missing ones — batch jobs fail and time out, and resuming is normal. Check progress with:

law run FLAF.AnaProd.tasks.AnaTupleFileTask \
  --period Run3_2022 --version prod --print-status 1,1

Standard condor_q / condor_status work for the underlying jobs.

Bundles: shipping the code to workers

A batch worker needs your code and environment. FLAF supports two modes:

  • Non-bundle jobs rely on the shared AFS area being mounted on the worker: the job receives FLAF_PATH/CORRECTIONS_PATH and runs the code straight from AFS (including any edits you made via the dev overlay).
  • Bundle jobs ship a tarball of the code/environment to the worker (the --bundle flag and the BundleTask machinery). The worker runs from the tarball and never reaches back to AFS, so it is deliberately not given FLAF_PATH/CORRECTIONS_PATH. Bundles also set FLAF_NO_INSTALL=1 so the worker never tries to build the environment.

For most work the defaults are correct; you only think about bundles when a stage explicitly needs one (e.g. it declares a CMSSW bundle flavour) or when AFS is not available on the target pool.

Your edits to FLAF do reach the workers

Thanks to the dev overlay, non-bundle jobs run your edited FLAF/Corrections, and bundle jobs include them in the tarball — so testing framework changes on HTCondor works without committing first. See Contributing.

Caveats

Keep your proxy valid for the whole run

Jobs that outlive your VOMS proxy lose grid access mid-flight. Create a long-lived proxy (-valid 192:00) before a big submission, and refresh it for long campaigns.

Killing a background law leaves its jobs/children

Pressing Ctrl-C or kill-ing a backgrounded law process does not necessarily stop the branches it spawned. To stop everything for a run, match the processes by pattern, e.g. pkill -f "version=prod", and condor_rm the submitted jobs if needed.

Test small, then scale

Validate a task with --workflow local --branches 0 --test 1000 before submitting the full workflow to HTCondor. A bug found on one local branch is far cheaper than one found across a thousand batch jobs.