Datasets¶
A dataset is one CMS sample — a simulated signal/background, or a chunk of real data —
identified by its DAS name. Datasets are declared per era in datasets.yaml files. Thanks to the
configuration merge, the lists from the
framework and the analysis are concatenated, so all datasets for an era are available together.
The split rule (where a dataset belongs)¶
| Kind of sample | Goes in |
|---|---|
| SM background, real data | FLAF/config/<era>/datasets.yaml (framework — common to all analyses) |
| Signal, custom/CI sample | <analysis>/config/<era>/datasets.yaml (analysis-specific) |
This keeps the common SM samples in one shared place while each analysis owns its signals.
Dataset entry format¶
DatasetName:
generator: powheg # madgraph, powheg, pythia, ...
mass: 125 # optional (signals)
spin: 0 # optional (resonant signals)
crossSection: 1pb # a value, or a key into crossSections13p6TeV.yaml
nanoAOD:
v12: # NanoAOD campaign tag (matches the era)
- /DAS/path/to/dataset/NANOAODSIM
v15:
- /DAS/path/to/dataset/NANOAODSIM
For custom/local samples (e.g. CI test inputs) that are not official DAS datasets, point at your own storage instead:
custom_CI:
generator: powheg
mass: 125
spin: 0
crossSection: 1pb
fs_nanoAOD: T3_CH_CERNBOX:/store/user/<user>/
dirName: "directory_name"
Cross-sections¶
MC datasets reference a cross-section, either inline (crossSection: 1pb) or by a key into
FLAF/config/crossSections13p6TeV.yaml (13.6 TeV; crossSections13TeV.yaml for Run 2). For
signals whose normalisation is set elsewhere, a placeholder such as 1pb is conventional.
Adding a dataset¶
- Choose the file by the split rule above.
- Add the entry in the right era's
datasets.yaml, following the format. - Make sure the
crossSectionresolves — add it tocrossSections13p6TeV.yamlif needed. - For Run3_2024 signals, check the actual DAS name: the naming changed (e.g. VBF drops the
_fixedTauDecayssuffix and uses a_Par-form). The conventions are summarised in the analysis docs and the project notes. - Validate (below).
Validate the dataset config¶
The same check the CI runs (ds-consistency-check) verifies that MC entries have a generator and a
resolvable cross-section, that names are well-formed, etc.:
python3 test/checkDatasetConfigConsistency.py \
--exception config/dataset_exceptions.yaml \
Run3_2022 Run3_2022EE Run3_2023 Run3_2023BPix Run3_2024 Run3_2025
Run it after editing any datasets.yaml. Known, intentional exceptions live in
config/dataset_exceptions.yaml. See CI / GitHub Actions.
Adding a new era¶
- Create
FLAF/config/<new_era>/with at leastdatasets.yamlandglobal.yaml. - Create
<analysis>/config/<new_era>/with the analysis-specific overrides and signals. - Add the era to
test-setup-loading.yamlin each affected analysis (so CI loadsSetup.pyfor it and catches config errors early). - Add the era to the
*_erasvariable in the relevant.github/integration_cfg.yamlif it should be part of CI runs. See Integration pipeline.
See also Eras & periods.