Skip to content

Full workflow walkthrough

This is the end-to-end tour of the pipeline: every stage, in order, with the command that runs it. Read it once to understand the chain; in day-to-day work you usually run only the last stage you need and let LAW produce the rest (see the shortcut).

The commands use FLAF.AnaProd.tasks.* for production stages and FLAF.Analysis.tasks.* for analysis stages — the fully-qualified task paths the framework registers.

Setup recap

cd HH_bbtautau          # your analysis repository
source env.sh           # once per shell
voms-proxy-info         # confirm a valid proxy

# Pick a data-taking era and a label for this production:
ERA=Run3_2022
VER=dev

Throughout, --period $ERA selects the era and --version $VER namespaces the outputs. Add --workflow local to run on this machine; switch to --workflow htcondor to scale up (HTCondor guide).


Stage 0 — Resolve the input files

InputFileTask turns "the datasets for this era" into a concrete list of NanoAOD files (from DAS). Everything else depends on it, so it runs first — automatically when you launch a later stage, or explicitly:

law run FLAF.AnaProd.tasks.InputFileTask --period $ERA --version $VER --workflow local

It is fast and cheap. If a from-scratch run unexpectedly stays in InputFileTask or fails here, suspect a wrong --period/--version or an expired proxy.

Stage 1 — Produce and merge analysis ntuples (anaTuples)

AnaTupleFileTask runs the analysis producer (AnaProd/anaTupleProducer.py, inside CMSSW) over each NanoAOD file — one branch per file — applying the object selections and corrections and writing a slimmed anaTuple. AnaTupleMergeTask then merges the per-file pieces into one anaTuple per dataset.

# Produce per-file anaTuples (heavy; normally on HTCondor):
law run FLAF.AnaProd.tasks.AnaTupleFileTask --period $ERA --version $VER --workflow local

# Merge them per dataset:
law run FLAF.AnaProd.tasks.AnaTupleMergeTask --period $ERA --version $VER --workflow local

Test on a few files first

--branches 0,1,2 runs only the first three input files, and --test 1000 processes only 1000 events per file. Combine both to smoke-test ntuple production quickly.

Stage 2 — Compute analysis observables (histTuples)

HistTupleProducerTask reads the merged anaTuples and computes the heavier analysis observables (the "payload producers" configured in global.yaml), writing histTuples:

law run FLAF.Analysis.tasks.HistTupleProducerTask --period $ERA --version $VER --workflow local

HH→bb̄WW: a caching step runs here first

In HH→bb̄WW, AnalysisCacheTask (and AnalysisCacheAggregationTask) pre-compute and aggregate per-event payloads — notably the b-tag shape weights — before histogramming. They are pulled in automatically and can be time-consuming (budget roughly an hour per branch on a cold cache). See the HH_bbWW docs and the Task reference.

Stage 3 — Fill and merge histograms

HistFromNtupleProducerTask fills histograms of the requested variables from the histTuples — one branch per variable — including systematic variations. HistMergerTask merges the pieces into per-process histograms ready for plotting and fitting.

# Fill histograms (restrict variables with --variables, batch with --n-var-batches):
law run FLAF.Analysis.tasks.HistFromNtupleProducerTask --period $ERA --version $VER --workflow local

# Merge them:
law run FLAF.Analysis.tasks.HistMergerTask --period $ERA --version $VER --workflow local

Which variables are produced is controlled by the analysis config and can be narrowed with the --variables parameter or the variables: list in user_custom.yaml.

Stage 4 — Make the plots

HistPlotTask produces the final plots — one branch per variable:

law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local
# one variable only:
law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local --branches 0

This is the task you most often launch directly: asking for the plots makes LAW produce every upstream product that is missing.

Stage 5 — Statistical inference

The two HH analyses turn the merged histograms into datacards and then run limits and diagnostics with Combine, via the StatInference and inference submodules. H→μμ does not include this stage.

Because these commands run inside CMSSW/Combine, prefix them with cmsEnv (or open a cmsEnv subshell once):

# 1) Create datacards from the produced shapes:
cmsEnv python3 StatInference/dc_make/create_datacards.py \
  --input  <PATH_TO_SHAPES> \
  --output <PATH_TO_CARDS> \
  --config <PATH_TO_CONFIG>      # e.g. StatInference/config/x_hh_bbww_run3.yaml

# 2) Run resonant limits:
law run PlotResonantLimits --version $VER --datacards '<PATH_TO_CARDS>/*.txt' --xsec fb --y-log

# 3) Pulls & impacts (per mass point — point at a single card):
PlotPullsAndImpacts --version $VER --datacards "<PATH_TO_CARDS>/<one_card>.txt" ...

The exact configs and options are analysis-specific — see each analysis's Statistical inference page (linked from Analyses) and the cms-hh inference docs.


The shortcut: just ask for the end

You rarely run the stages one by one. Because every task knows its dependencies, launching a late stage runs all missing upstream stages automatically:

law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --workflow local

Run the individual stages explicitly only when you want to stop at an intermediate product (e.g. produce anaTuples for someone else to use), or to inspect/debug one stage.

See progress and redo selectively

# Status of the whole tree (task depth 3, file depth 1) — also prints output paths:
law run FLAF.Analysis.tasks.HistPlotTask --period $ERA --version $VER --print-status 3,1

# Force one stage to be recomputed:
law run FLAF.Analysis.tasks.HistMergerTask --period $ERA --version $VER --remove-output 0,a,y

See Command arguments for the full option list, and Running on HTCondor to take any of these commands to the batch system by swapping --workflow local for --workflow htcondor.