Overview
RomsMarblInputData is a dataclass that implements ROMS-MARBL-specific input data generation. It handles the creation of all input files required for a ROMS simulation, including the grid, initial conditions, and all types of forcing data.
Class Definition
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class RomsMarblInputData(InputData):
    """ROMS-MARBL specific input data generation."""
    model_spec: cson_models.ModelSpec
    grid: rt.Grid
    boundaries: cson_models.OpenBoundaries
    source_data: source_data.SourceData
    blueprint_dir: Path
    partitioning: cstar_models.PartitioningParameterSet
    use_dask: bool = True
    # Auto-initialized in __post_init__; init=False keeps these out of __init__,
    # which also lets them follow the defaulted use_dask field without a TypeError
    blueprint_elements: RomsMarblBlueprintInputData = field(init=False)
    _settings_compile_time: dict = field(init=False)
    _settings_run_time: dict = field(init=False)
Initialization
Input List Derivation
During __post_init__(), the class builds input_list from model_spec.inputs:
Grid: Extracts the grid specification → ("grid", kwargs)
Initial Conditions: Extracts the initial conditions specification → ("initial_conditions", kwargs)
Forcing: Iterates over the forcing categories (surface, boundary, tidal, river) and the items within each → one ("forcing.{category}", kwargs) entry per item
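The derivation can be sketched with plain dictionaries. This is a minimal sketch that assumes model_spec.inputs can be read as a mapping with optional "grid" and "initial_conditions" entries, plus a "forcing" mapping from each category name to a list of item dicts. The real object is a Pydantic model. build_input_list and FORCING_CATEGORIES are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# Assumed category order, matching the list above
FORCING_CATEGORIES = ("surface", "boundary", "tidal", "river")


def build_input_list(inputs: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: flatten a dict-shaped model_spec.inputs into (key, kwargs) pairs."""
    input_list: List[Tuple[str, Dict[str, Any]]] = []
    if inputs.get("grid") is not None:
        input_list.append(("grid", dict(inputs["grid"])))
    if inputs.get("initial_conditions") is not None:
        input_list.append(("initial_conditions", dict(inputs["initial_conditions"])))
    forcing = inputs.get("forcing") or {}
    for category in FORCING_CATEGORIES:
        # Every item in a category becomes its own entry (e.g. physics and bgc surface forcing)
        for item in forcing.get(category) or []:
            input_list.append((f"forcing.{category}", dict(item)))
    return input_list


spec_inputs = {
    "grid": {"topography_source": "ETOPO5"},
    "initial_conditions": {"source": {"name": "GLORYS"}},
    "forcing": {
        "surface": [
            {"source": {"name": "ERA5"}, "type": "physics"},
            {"source": {"name": "UNIFIED"}, "type": "bgc"},
        ],
        "tidal": [{"source": {"name": "TPXO"}, "ntides": 15}],
    },
}
print([key for key, _ in build_input_list(spec_inputs)])
# ['grid', 'initial_conditions', 'forcing.surface', 'forcing.surface', 'forcing.tidal']
```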
Example Input List:
[
    ("grid", {"topography_source": "ETOPO5"}),
    ("initial_conditions", {"source": {"name": "GLORYS"}, "bgc_source": {...}}),
    ("forcing.surface", {"source": {"name": "ERA5"}, "type": "physics", ...}),
    ("forcing.surface", {"source": {"name": "UNIFIED"}, "type": "bgc", ...}),
    ("forcing.boundary", {"source": {"name": "GLORYS"}, "type": "physics", ...}),
    ("forcing.tidal", {"source": {"name": "TPXO"}, "ntides": 15}),
    ("forcing.river", {"source": {"name": "DAI"}, "include_bgc": True}),
]
Registry Validation
The class validates that all keys in input_list have registered handlers in INPUT_REGISTRY. Missing handlers raise a ValueError.
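Below is a minimal sketch of the registry check. It assumes input_list holds (key, kwargs) pairs as in the example above. validate_input_keys and its error message are hypothetical, and the registry values are placeholders for InputStep objects.

```python
from typing import Any, Dict, List, Tuple


def validate_input_keys(
    input_list: List[Tuple[str, Dict[str, Any]]], registry: Dict[str, Any]
) -> None:
    """Hypothetical check: every key in input_list must have a registered handler."""
    missing = sorted({key for key, _ in input_list if key not in registry})
    if missing:
        raise ValueError(f"No registered input handler for: {', '.join(missing)}")


registry = {"grid": object(), "forcing.surface": object()}  # placeholder InputStep values
validate_input_keys([("grid", {}), ("forcing.surface", {})], registry)  # passes silently
try:
    validate_input_keys([("grid", {}), ("forcing.wave", {})], registry)
except ValueError as err:
    print(err)  # No registered input handler for: forcing.wave
```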
Blueprint Elements Initialization
Creates a RomsMarblBlueprintInputData instance with empty datasets:
grid: Empty dataset if "grid" is in input_list
initial_conditions: Empty dataset if "initial_conditions" is in input_list
forcing: ForcingConfiguration with datasets for each category (boundary, surface, tidal, river)
cdr_forcing: Empty dataset if "cdr_forcing" is in input_list
Validation:
Requires “boundary” forcing if any forcing is specified
Requires “surface” forcing if any forcing is specified
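The two forcing rules can be expressed as a single check over the input keys. This sketch assumes that only keys of the form "forcing.{category}" count as forcing, so cdr_forcing does not trigger the rule. check_required_forcing is a hypothetical name, and the sketch does not reflect where the real class performs this check.

```python
from typing import Any, Dict, List, Tuple


def check_required_forcing(input_list: List[Tuple[str, Dict[str, Any]]]) -> None:
    """Hypothetical check: any forcing entry requires both boundary and surface forcing."""
    categories = {key.split(".", 1)[1] for key, _ in input_list if key.startswith("forcing.")}
    if not categories:
        return  # no forcing requested, nothing to enforce
    for required in ("boundary", "surface"):
        if required not in categories:
            raise ValueError(f'"{required}" forcing is required when any forcing is specified')


check_required_forcing([("grid", {})])  # OK: no forcing at all
check_required_forcing([("forcing.surface", {}), ("forcing.boundary", {}), ("forcing.tidal", {})])
try:
    check_required_forcing([("forcing.surface", {}), ("forcing.tidal", {})])
except ValueError as err:
    print(err)  # "boundary" forcing is required when any forcing is specified
```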
Settings Initialization
_settings_compile_time: Empty dictionary {}
_settings_run_time: Dictionary {"roms.in": {}}
Registry Framework
Input Registry
The INPUT_REGISTRY dictionary maps input keys to InputStep instances:
INPUT_REGISTRY: Dict[str, InputStep] = {
    "grid": InputStep(name="grid", order=10, label="Writing ROMS grid", handler=_generate_grid),
    "initial_conditions": InputStep(name="initial_conditions", order=20, label="Generating initial conditions", handler=_generate_initial_conditions),
    "forcing.surface": InputStep(name="forcing.surface", order=30, label="Generating surface forcing", handler=_generate_surface_forcing),
    "forcing.boundary": InputStep(name="forcing.boundary", order=40, label="Generating boundary forcing", handler=_generate_boundary_forcing),
    "forcing.tidal": InputStep(name="forcing.tidal", order=50, label="Generating tidal forcing", handler=_generate_tidal_forcing),
    "forcing.river": InputStep(name="forcing.river", order=60, label="Generating river forcing", handler=_generate_river_forcing),
    "cdr_forcing": InputStep(name="cdr_forcing", order=80, label="Generating CDR forcing", handler=_generate_cdr_forcing),
    "forcing.corrections": InputStep(name="forcing.corrections", order=90, label="Generating corrections forcing", handler=_generate_corrections),
}
Registration Decorator
@register_input(name: str, order: int, label: str | None = None)
Parameters:
name: Input key (e.g., "grid", "forcing.surface")
order: Execution order (lower numbers run first)
label: Human-readable label for progress messages
Example:
@register_input(name="forcing.surface", order=30, label="Generating surface forcing")
def _generate_surface_forcing(self, key: str = "forcing.surface", **kwargs):
    """Generate surface forcing input files."""
    # Implementation...
Input Generation Process
generate_all() Method
Main entry point for generating all input files:
def generate_all(
    self,
    clobber: bool = False,
    partition_files: bool = False,
    test: bool = False,
) -> Tuple[RomsMarblBlueprintInputData, dict, dict]:
    """
    Generate all ROMS input files.
    Returns
    -------
    blueprint_elements: RomsMarblBlueprintInputData
        Blueprint subset with generated input file paths
    compile_time_settings: dict
        Compile-time settings dictionary
    run_time_settings: dict
        Run-time settings dictionary
    """
Process:
Clobber Check: Ensures the output directory is empty, or removes existing files if clobber=True
Build Step List: Creates a list of (step, kwargs) tuples from input_list, sorted by order
Execute Handlers: Calls each handler with key and kwargs
Partitioning: Optionally partitions files across tiles if partition_files=True
Return: Returns blueprint_elements and the settings dictionaries
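The "Build Step List" and "Execute Handlers" stages can be sketched as a stable sort followed by a loop. Step, run_steps, and the recording handler are hypothetical, and clobber handling and partitioning are omitted. Python's sort is stable, so two items with the same key (for example physics and bgc surface forcing) keep their input_list order. Whether the real implementation relies on this is an assumption.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple


class Step(NamedTuple):
    """Hypothetical minimal registry record."""
    order: int
    label: str
    handler: Callable[..., None]


def run_steps(obj: Any, input_list: List[Tuple[str, Dict[str, Any]]], registry: Dict[str, Step]) -> None:
    """Sketch of the 'Build Step List' and 'Execute Handlers' stages of generate_all()."""
    steps = [(key, registry[key], kwargs) for key, kwargs in input_list]
    # list.sort() is stable: entries with equal order keep their input_list order
    steps.sort(key=lambda entry: entry[1].order)
    for key, step, kwargs in steps:
        print(step.label)  # progress message
        step.handler(obj, key=key, **kwargs)


calls: List[Tuple[str, Any]] = []


def record(obj: Any, key: str, **kwargs: Any) -> None:
    """Stand-in handler that just records what it was called with."""
    calls.append((key, kwargs.get("type")))


registry = {
    "grid": Step(10, "Writing ROMS grid", record),
    "forcing.surface": Step(30, "Generating surface forcing", record),
    "forcing.tidal": Step(50, "Generating tidal forcing", record),
}
run_steps(
    None,
    [("forcing.tidal", {}), ("forcing.surface", {"type": "physics"}),
     ("grid", {}), ("forcing.surface", {"type": "bgc"})],
    registry,
)
```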
Handler Function Signature
All registered handlers follow this pattern:
@register_input(name="input_key", order=ORDER, label="Label")
def _generate_input(self, key: str = "input_key", **kwargs):
    """
    Generate input file(s) for this input type.
    Parameters
    ----------
    key : str
        Input key (matches registered name)
    **kwargs
        Input-specific arguments from input_list
    Side Effects
    ------------
    - Creates NetCDF file(s) in input_data_dir
    - Creates YAML metadata file in blueprint_dir
    - Appends Resource(s) to blueprint_elements
    - Updates _settings_compile_time and/or _settings_run_time
    """
Registered Input Handlers
Grid (grid, order=10)
Handler: _generate_grid()
Generates:
Grid NetCDF file: {model_name}_{grid_name}_grid.nc
Grid YAML metadata: _{grid_name}.yml (in blueprint_dir)
Updates Blueprint:
Appends a Resource to blueprint_elements.grid.data
Populates Settings:
Compile-time (cppdefs): Open boundary flags
self._settings_compile_time["cppdefs"]["obc_west"] = self.boundaries.west
self._settings_compile_time["cppdefs"]["obc_east"] = self.boundaries.east
self._settings_compile_time["cppdefs"]["obc_north"] = self.boundaries.north
self._settings_compile_time["cppdefs"]["obc_south"] = self.boundaries.south
Compile-time (param): Grid dimensions and partitioning
self._settings_compile_time["param"]["LLm"] = self.grid.nx
self._settings_compile_time["param"]["MMm"] = self.grid.ny
self._settings_compile_time["param"]["N"] = self.grid.N
self._settings_compile_time["param"]["NP_XI"] = self.partitioning.n_procs_x
self._settings_compile_time["param"]["NP_ETA"] = self.partitioning.n_procs_y
Run-time (roms.in.grid): Grid file path
self._settings_run_time["roms.in"]["grid"] = {"grid_file": out_path}
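Because _settings_compile_time starts as an empty dictionary, the nested "cppdefs" and "param" sections must exist before anything can be assigned into them. The sketch below creates them with dict.setdefault(), which is one way to do this. populate_grid_settings is a hypothetical helper. The SimpleNamespace objects stand in for rt.Grid, cson_models.OpenBoundaries, and the partitioning parameters, and all numbers are made up.

```python
from types import SimpleNamespace
from typing import Any, Dict


def populate_grid_settings(
    compile_time: Dict[str, Any],
    run_time: Dict[str, Any],
    grid: Any,
    boundaries: Any,
    partitioning: Any,
    grid_file: str,
) -> None:
    """Hypothetical helper mirroring the grid handler's documented assignments."""
    cppdefs = compile_time.setdefault("cppdefs", {})  # created on first use
    for side in ("west", "east", "north", "south"):
        cppdefs[f"obc_{side}"] = getattr(boundaries, side)
    param = compile_time.setdefault("param", {})
    param.update(
        LLm=grid.nx,
        MMm=grid.ny,
        N=grid.N,
        NP_XI=partitioning.n_procs_x,
        NP_ETA=partitioning.n_procs_y,
    )
    run_time.setdefault("roms.in", {})["grid"] = {"grid_file": grid_file}


compile_time: Dict[str, Any] = {}
run_time: Dict[str, Any] = {"roms.in": {}}
populate_grid_settings(
    compile_time,
    run_time,
    grid=SimpleNamespace(nx=100, ny=80, N=50),
    boundaries=SimpleNamespace(west=True, east=False, north=True, south=True),
    partitioning=SimpleNamespace(n_procs_x=4, n_procs_y=2),
    grid_file="example_grid.nc",
)
```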
Initial Conditions (initial_conditions, order=20)
Handler: _generate_initial_conditions()
Generates:
Initial conditions NetCDF file(s): {model_name}_{grid_name}_initial_conditions.nc
Initial conditions YAML metadata: _initial_conditions.yml
Source Resolution:
Uses source and optional bgc_source from kwargs
Resolves paths via _resolve_source_block() → SourceData.path_for_source()
Updates Blueprint:
Appends Resource(s) to blueprint_elements.initial_conditions.data
Populates Settings:
Run-time (roms.in.initial): Initial conditions file path
self._settings_run_time["roms.in"]["initial"] = {
    "nrrec": 1,
    "initial_file": paths[0],  # First file in list
}
Surface Forcing (forcing.surface, order=30)
Handler: _generate_surface_forcing()
Generates:
Surface forcing NetCDF file(s): {model_name}_{grid_name}_surface-{type}_YYYYMM.nc
Surface forcing YAML metadata: _forcing.surface-{type}.yml
Key Features:
Supports multiple surface forcing sources (physics and bgc)
Each item in the forcing.surface list generates a separate file
Requires a type parameter: "physics" or "bgc"
Source Resolution:
Uses source from kwargs
Resolves path via _resolve_source_block()
Updates Blueprint:
Appends Resource(s) to blueprint_elements.forcing.surface.data
Populates Settings:
Run-time (roms.in.forcing): Surface forcing file paths
if type == "bgc":
    self._settings_run_time["roms.in"]["forcing"]["surface_forcing_bgc_path"] = paths[0]
else:  # physics
    self._settings_run_time["roms.in"]["forcing"]["surface_forcing_path"] = paths[0]
Note: Compile-time settings for surface forcing are not yet populated (TODO in code).
Boundary Forcing (forcing.boundary, order=40)
Handler: _generate_boundary_forcing()
Generates:
Boundary forcing NetCDF file(s): {model_name}_{grid_name}_boundary-{type}_YYYYMM.nc
Boundary forcing YAML metadata: _forcing.boundary-{type}.yml
Key Features:
Supports multiple boundary forcing sources (physics and bgc)
Each item in the forcing.boundary list generates a separate file
Requires a type parameter: "physics" or "bgc"
Uses the boundaries configuration for the open boundary specification
Source Resolution:
Uses source from kwargs
Resolves path via _resolve_source_block()
Updates Blueprint:
Appends Resource(s) to blueprint_elements.forcing.boundary.data
Populates Settings:
Run-time (roms.in.forcing): Boundary forcing file paths
if type == "bgc":
    self._settings_run_time["roms.in"]["forcing"]["boundary_forcing_bgc_path"] = paths[0]
else:  # physics
    self._settings_run_time["roms.in"]["forcing"]["boundary_forcing_path"] = paths[0]
Note: Compile-time settings for boundary forcing are not yet populated (TODO in code).
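The surface and boundary handlers differ only in which roms.in.forcing key receives the first file path. Below is a table-driven sketch of that dispatch. record_forcing_path is hypothetical. Unlike the documented if/else, which treats any non-bgc type as physics, it rejects unknown combinations; that is a design choice of this sketch only. The file names are illustrative.

```python
from typing import Any, Dict, List

# Run-time keys documented for the surface and boundary handlers
RUN_TIME_PATH_KEYS = {
    ("surface", "physics"): "surface_forcing_path",
    ("surface", "bgc"): "surface_forcing_bgc_path",
    ("boundary", "physics"): "boundary_forcing_path",
    ("boundary", "bgc"): "boundary_forcing_bgc_path",
}


def record_forcing_path(run_time: Dict[str, Any], category: str, forcing_type: str, paths: List[str]) -> None:
    """Hypothetical helper: store the first generated file under the key for (category, type)."""
    try:
        key = RUN_TIME_PATH_KEYS[(category, forcing_type)]
    except KeyError:
        raise ValueError(f"unsupported forcing {category!r} with type {forcing_type!r}") from None
    forcing = run_time.setdefault("roms.in", {}).setdefault("forcing", {})
    forcing[key] = paths[0]  # as in the handlers, only the first path is recorded


run_time: Dict[str, Any] = {"roms.in": {}}
record_forcing_path(run_time, "surface", "physics",
                    ["example_surface-physics_201201.nc", "example_surface-physics_201202.nc"])
record_forcing_path(run_time, "boundary", "bgc", ["example_boundary-bgc_201201.nc"])
```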
Tidal Forcing (forcing.tidal, order=50)
Handler: _generate_tidal_forcing()
Generates:
Tidal forcing NetCDF file(s): {model_name}_{grid_name}_tidal.nc
Tidal forcing YAML metadata: _forcing.tidal.yml
Key Features:
Uses the ntides parameter from kwargs (default from model_spec.inputs)
Uses model_reference_date (set to start_date)
Source Resolution:
Uses source from kwargs (typically TPXO)
Resolves path via _resolve_source_block()
Updates Blueprint:
Appends Resource(s) to blueprint_elements.forcing.tidal.data
Populates Settings:
Compile-time (tides): Tidal forcing configuration
self._settings_compile_time["tides"] = {
    "ntides": 10,  # Default, may be overridden by kwargs
    "bry_tides": True,
    "pot_tides": True,
    "ana_tides": False,
}
Note: Run-time settings for tidal forcing are not yet populated (TODO in code).
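Below is a minimal sketch of the tides block. It assumes that an ntides value in kwargs (such as the 15 in the example input list) replaces the default of 10, and that the three flags are fixed. tides_settings and TIDES_DEFAULTS are hypothetical names.

```python
from typing import Any, Dict

# Defaults as documented above
TIDES_DEFAULTS = {"ntides": 10, "bry_tides": True, "pot_tides": True, "ana_tides": False}


def tides_settings(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy the defaults, letting an ntides entry in kwargs override them."""
    settings = dict(TIDES_DEFAULTS)  # copy so the module-level defaults stay untouched
    if kwargs.get("ntides") is not None:
        settings["ntides"] = int(kwargs["ntides"])
    return settings


compile_time: Dict[str, Any] = {}
compile_time["tides"] = tides_settings({"source": {"name": "TPXO"}, "ntides": 15})
print(compile_time["tides"]["ntides"])  # 15
```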
River Forcing (forcing.river, order=60)
Handler: _generate_river_forcing()
Generates:
River forcing NetCDF file(s): {model_name}_{grid_name}_river.nc
River forcing YAML metadata: _forcing.river.yml
Key Features:
Uses the include_bgc parameter from kwargs
Extracts the number of rivers from the generated dataset
Source Resolution:
Uses source from kwargs (typically DAI)
Resolves path via _resolve_source_block()
Updates Blueprint:
Appends Resource(s) to blueprint_elements.forcing.river.data
Populates Settings:
Compile-time (river_frc): River forcing configuration
self._settings_compile_time["river_frc"] = {
    "river_source": True,
    "analytical": False,
    "nriv": river.ds.sizes["nriver"],  # From generated dataset
    "rvol_vname": "river_volume",
    "rvol_tname": "river_time",
    "rtrc_vname": "river_tracer",
    "rtrc_tname": "river_time",
}
Run-time (roms.in.forcing): River forcing file path
self._settings_run_time["roms.in"]["forcing"]["river_path"] = paths[0]
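The nriv value comes from the length of the nriver dimension in the generated dataset. In xarray, Dataset.sizes is a mapping from dimension name to length, so a plain dict can stand in for it here. river_frc_settings is a hypothetical helper, and the sizes are made up.

```python
from typing import Any, Dict, Mapping


def river_frc_settings(sizes: Mapping[str, int]) -> Dict[str, Any]:
    """Build the river_frc block; `sizes` stands in for river.ds.sizes (dimension -> length)."""
    return {
        "river_source": True,
        "analytical": False,
        "nriv": int(sizes["nriver"]),  # number of rivers in the generated dataset
        "rvol_vname": "river_volume",
        "rvol_tname": "river_time",
        "rtrc_vname": "river_tracer",
        "rtrc_tname": "river_time",
    }


compile_time: Dict[str, Any] = {}
compile_time["river_frc"] = river_frc_settings({"nriver": 12, "river_time": 365})
print(compile_time["river_frc"]["nriv"])  # 12
```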
CDR Forcing (cdr_forcing, order=80)
Handler: _generate_cdr_forcing()
Generates:
CDR forcing NetCDF file(s): {model_name}_{grid_name}_cdr_forcing.nc
CDR forcing YAML metadata: _cdr_forcing.yml
Key Features:
Optional input (only generated if cdr_list is provided)
Uses the releases parameter for CDR release specifications
Updates Blueprint:
Appends Resource(s) to blueprint_elements.cdr_forcing.data
Populates Settings:
Settings for CDR forcing are not yet implemented (TODO in code).
Corrections Forcing (forcing.corrections, order=90)
Handler: _generate_corrections()
Status: Not yet implemented (raises NotImplementedError)
Source Resolution
_resolve_source_block() Method
Normalizes source blocks and injects file paths:
def _resolve_source_block(self, block: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """
    Normalize a "source"/"bgc_source" block and inject a 'path' based on SourceData.
    Parameters
    ----------
    block : str or dict
        Source specification (e.g., "GLORYS" or {"name": "GLORYS", "climatology": True})
    Returns
    -------
    dict
        Source block with 'name' and optional 'path' fields
    """
Process:
Normalize to dict: If a string, convert to {"name": str}
Extract name: Get the name field from the dict
Map to dataset key: Use SourceData.dataset_key_for_source(name)
Check streamability: If streamable (ERA5, DAI), do not add a path unless one is explicitly provided
Get path: Use SourceData.path_for_source(name) for non-streamable sources
Return: A dict with name and an optional path
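The steps above can be sketched against a stand-in for SourceData. StubSourceData, its is_streamable() method, and its name-to-key mapping are hypothetical; only dataset_key_for_source() and path_for_source() are named on this page. The sketch also assumes that an explicitly provided path is always kept as-is.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union


class StubSourceData:
    """Hypothetical stand-in for SourceData; names and mappings are illustrative only."""

    STREAMABLE = {"ERA5", "DAI"}
    KEYS = {"GLORYS": "GLORYS_REGIONAL", "UNIFIED": "UNIFIED_BGC"}

    def __init__(self, paths: Dict[str, Path]):
        self._paths = paths  # dataset key -> local file

    def dataset_key_for_source(self, name: str) -> str:
        return self.KEYS.get(name.upper(), name.upper())

    def is_streamable(self, key: str) -> bool:  # hypothetical method
        return key in self.STREAMABLE

    def path_for_source(self, name: str) -> Optional[Path]:
        return self._paths.get(self.dataset_key_for_source(name))


def resolve_source_block(block: Union[str, Dict[str, Any]], source_data: StubSourceData) -> Dict[str, Any]:
    """Sketch of the normalize / map / streamability / path steps."""
    resolved = {"name": block} if isinstance(block, str) else dict(block)  # normalize to dict
    key = source_data.dataset_key_for_source(resolved["name"])
    if source_data.is_streamable(key) or "path" in resolved:
        return resolved  # streamable, or path given explicitly: inject nothing
    path = source_data.path_for_source(resolved["name"])
    if path is not None:
        resolved["path"] = path
    return resolved


sd = StubSourceData({
    "GLORYS_REGIONAL": Path("/data/GLORYS_REGIONAL_file.nc"),
    "UNIFIED_BGC": Path("/data/UNIFIED_BGC_file.nc"),
})
print(resolve_source_block("ERA5", sd))  # {'name': 'ERA5'}
```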
Examples:
# String input
"GLORYS" → {"name": "GLORYS", "path": Path("/path/to/GLORYS_REGIONAL_file.nc")}
# Dict input
{"name": "UNIFIED", "climatology": True} → {"name": "UNIFIED", "climatology": True, "path": Path("/path/to/UNIFIED_BGC_file.nc")}
# Streamable source
"ERA5" → {"name": "ERA5"}  # No path (streamable)
_build_input_args() Method
Merges default arguments with runtime overrides:
def _build_input_args(
    self,
    key: str,
    extra: Optional[Dict[str, Any]] = None,
    base_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """
    Merge per-input defaults with runtime arguments.
    Uses base_kwargs if provided (from input_list), otherwise looks up in model_spec.inputs.
    Resolves "source" and "bgc_source" through SourceData.
    Merges with extra, where extra overrides defaults.
    """
Process:
Get base config: Use base_kwargs if provided, otherwise look up the key in model_spec.inputs
Resolve source blocks: Convert the source and bgc_source Pydantic models to dicts with paths
Merge extra: extra overrides defaults (for runtime additions such as start_time and use_dask)
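Below is a minimal sketch of the merge order. It assumes the base kwargs have already been chosen (from base_kwargs or model_spec.inputs) and arrive as plain dicts rather than Pydantic models. build_input_args and fake_resolve are hypothetical. In the class, the resolve argument would correspond to the bound _resolve_source_block.

```python
from typing import Any, Callable, Dict, Optional


def build_input_args(
    defaults: Dict[str, Any],
    resolve: Callable[[Any], Dict[str, Any]],
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Sketch: resolve source blocks in the base kwargs, then let `extra` win on conflicts."""
    args = dict(defaults)  # shallow copy so the caller's dict is not mutated
    for field_name in ("source", "bgc_source"):
        if args.get(field_name) is not None:
            args[field_name] = resolve(args[field_name])
    args.update(extra or {})  # runtime values override defaults
    return args


def fake_resolve(block: Any) -> Dict[str, Any]:
    """Stand-in resolver: normalize only, no path lookup."""
    return {"name": block} if isinstance(block, str) else dict(block)


args = build_input_args(
    {"source": "GLORYS", "use_dask": False},
    fake_resolve,
    extra={"use_dask": True, "start_time": "2012-01-01"},
)
print(args)  # {'source': {'name': 'GLORYS'}, 'use_dask': True, 'start_time': '2012-01-01'}
```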
Settings Population for Forcing
Compile-Time Settings
Tidal Forcing:
tides.ntides: Number of tidal constituents
tides.bry_tides: Boundary tides flag
tides.pot_tides: Potential tides flag
tides.ana_tides: Analytical tides flag
River Forcing:
river_frc.river_source: Enable river source flag
river_frc.analytical: Analytical river flag
river_frc.nriv: Number of rivers (from the generated dataset)
river_frc.rvol_vname, river_frc.rvol_tname: River volume variable/time names
river_frc.rtrc_vname, river_frc.rtrc_tname: River tracer variable/time names
Surface/Boundary Forcing:
Compile-time settings for surface and boundary forcing are not yet populated (TODO in code).
Run-Time Settings
Surface Forcing:
roms.in.forcing.surface_forcing_path: Path to the physics surface forcing file
roms.in.forcing.surface_forcing_bgc_path: Path to the bgc surface forcing file
Boundary Forcing:
roms.in.forcing.boundary_forcing_path: Path to the physics boundary forcing file
roms.in.forcing.boundary_forcing_bgc_path: Path to the bgc boundary forcing file
River Forcing:
roms.in.forcing.river_path: Path to river forcing file
Tidal Forcing:
Run-time settings for tidal forcing are not yet populated (TODO in code).
Blueprint Element Updates
Each handler appends Resource objects to the appropriate blueprint element:
Resource Creation:
resource = cstar_models.Resource(
    location=out_path,  # Path to generated NetCDF file
    partitioned=False,  # Set to True after partitioning
)
Blueprint Updates:
Grid:
blueprint_elements.grid.data.append(resource)
Initial Conditions:
blueprint_elements.initial_conditions.data.append(resource)
Forcing Categories:
blueprint_elements.forcing.{category}.data.append(resource)
forcing.surface → forcing.surface.data
forcing.boundary → forcing.boundary.data
forcing.tidal → forcing.tidal.data
forcing.river → forcing.river.data
CDR Forcing:
blueprint_elements.cdr_forcing.data.append(resource)
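Since the input keys read as dotted paths into blueprint_elements, the target list can be found by walking attributes. The sketch below uses SimpleNamespace stand-ins and strings in place of Resource objects. target_data_list is a hypothetical helper and is not part of the documented class.

```python
from types import SimpleNamespace
from typing import Any, List


def target_data_list(blueprint_elements: Any, key: str) -> List[Any]:
    """Hypothetical helper: map an input key such as "forcing.surface" to its .data list."""
    node = blueprint_elements
    for attr in key.split("."):  # "forcing.surface" -> .forcing, then .surface
        node = getattr(node, attr)
    return node.data


blueprint = SimpleNamespace(
    grid=SimpleNamespace(data=[]),
    forcing=SimpleNamespace(surface=SimpleNamespace(data=[]), boundary=SimpleNamespace(data=[])),
    cdr_forcing=SimpleNamespace(data=[]),
)
target_data_list(blueprint, "forcing.surface").append("surface-physics resource")
target_data_list(blueprint, "cdr_forcing").append("cdr resource")
```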
File Partitioning
_partition_files() Method
Partitions whole-field input files across tiles:
def _partition_files(self, **kwargs):
    """
    Partition whole input files across tiles using roms_tools.partition_netcdf.
    Uses the paths stored in blueprint_elements to build the list of whole-field files,
    and records the partitioned paths in the Resource objects.
    """
Process:
Iterate over input_list: For each input key, get the corresponding dataset from blueprint_elements
Partition each Resource: Call rt.partition_netcdf() for each Resource.location
Create partitioned Resources: Replace the original resources with partitioned ones
Update partitioned flag: Set partitioned=True on the new resources
Partitioning Arguments:
input_args = {
    "np_eta": self.partitioning.n_procs_y,
    "np_xi": self.partitioning.n_procs_x,
    "output_dir": self.input_data_dir,
    "include_coarse_dims": False,
}
Result:
Original whole-field files remain unchanged
Partitioned files are created in input_data_dir
blueprint_elements is updated with partitioned Resource objects
The partitioned flag is set to True
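Below is a sketch of the resource replacement step. The call to rt.partition_netcdf() is replaced by a fake function so the example runs without roms_tools. It assumes each whole-field file yields one Resource per tile file. The Resource dataclass, fake_partition, and its file naming are hypothetical and may not match what roms_tools actually writes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Resource:
    """Stand-in for cstar_models.Resource with only the two fields shown above."""
    location: Path
    partitioned: bool = False


def partition_resources(resources: List[Resource], partition: Callable[[Path], List[Path]]) -> List[Resource]:
    """Build new Resources, one per partitioned file; the originals are left untouched."""
    new_resources: List[Resource] = []
    for res in resources:
        for tile_path in partition(res.location):
            new_resources.append(Resource(location=tile_path, partitioned=True))
    return new_resources


def fake_partition(path: Path, np_xi: int = 2, np_eta: int = 2) -> List[Path]:
    """Illustrative naming only: one output file per tile (np_xi * np_eta tiles)."""
    return [path.with_name(f"{path.stem}.{i}{path.suffix}") for i in range(np_xi * np_eta)]


whole = [Resource(Path("/tmp/inputs/example_grid.nc"))]
tiles = partition_resources(whole, fake_partition)
print(len(tiles), tiles[0].location.name, tiles[0].partitioned)  # 4 example_grid.0.nc True
```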
Return Values
generate_all() returns a tuple:
(
    blueprint_elements: RomsMarblBlueprintInputData,
    compile_time_settings: dict,
    run_time_settings: dict
)
Usage:
blueprint_elements: Merged into the main blueprint during the POSTCONFIG stage
compile_time_settings: Merged with template defaults and used to render the *.opt files
run_time_settings: Merged with template defaults and used to render roms.in
These are used by CstarSpecBuilder.generate_inputs() to update the blueprint and settings before persisting.
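This page does not say how the generated settings are merged with the template defaults. One plausible strategy is a recursive merge in which generated values win. deep_merge and the template keys below are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `overrides` into a copy of `defaults`; override values win."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # descend into nested sections
        else:
            merged[key] = deepcopy(value)
    return merged


template_defaults = {"roms.in": {"title": "example run", "grid": {"grid_file": None}}}
generated = {"roms.in": {"grid": {"grid_file": "example_grid.nc"},
                         "forcing": {"river_path": "example_river.nc"}}}
merged = deep_merge(template_defaults, generated)
```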