ModelSpec Example

This notebook illustrates how the ModelSpec class works with models.yml to store model specifications and access their attributes.

Overview

The ModelSpec class defines the complete specification for an ocean model configuration, including:

  • Templates: Jinja2 template locations for compile-time and run-time configuration files

  • Settings: Default settings and configuration files

  • Code: Repository specifications for ROMS, MARBL, and associated code

  • Inputs: Default specifications for grid, initial conditions, and forcing data

  • Datasets: List of required source datasets

Model specifications are stored in models.yml and loaded using load_models_yaml().

Setup

Import the necessary modules and enable autoreload for development.

%load_ext autoreload
%autoreload 2

from pathlib import Path
from cson_forge import models, config

Load ModelSpec

Load a model specification from models.yml with load_models_yaml(). The function takes the path to the YAML file and the model name, and returns a ModelSpec instance.

# Load a model specification
model_name = "cson_roms-marbl_v0.1"
model_spec = models.load_models_yaml(config.paths.models_yaml, model_name)

print(f"Loaded ModelSpec: {model_spec.name}")
print(f"Type: {type(model_spec)}")
Loaded ModelSpec: cson_roms-marbl_v0.1
Type: <class 'cson_forge.models.ModelSpec'>
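Given a path and a model name, load_models_yaml() has to pick one model's entry out of the file before it can build a ModelSpec. The sketch below illustrates only that lookup step and is not the cson_forge implementation. It assumes models.yml has already been parsed into a Python dict whose top-level keys are model names. The helper select_model_entry is hypothetical.

```python
from typing import Any, Mapping


def select_model_entry(models: Mapping[str, Any], model_name: str) -> Any:
    """Return the raw entry for one model from an already-parsed models.yml.

    Hypothetical helper: assumes the top-level keys of models.yml are model names.
    """
    if model_name not in models:
        available = ", ".join(sorted(models)) or "<none>"
        raise KeyError(f"model {model_name!r} not found; available: {available}")
    return models[model_name]


# Stand-in for the parsed YAML (assumed layout; values copied from the output above).
parsed = {
    "cson_roms-marbl_v0.1": {
        "code": {
            "roms": {
                "location": "https://github.com/CWorthy-ocean/ucla-roms.git",
                "branch": "main",
            }
        },
    },
}

entry = select_model_entry(parsed, "cson_roms-marbl_v0.1")
print(entry["code"]["roms"]["location"])
```

Failing loudly with the list of available names makes a misspelled model_name easy to diagnose.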

Inspect ModelSpec Structure

The ModelSpec is a Pydantic model with several main components. Let’s explore each one:

# View all ModelSpec attributes
print("ModelSpec attributes:")
# Use the class to access model_fields (not the instance) to avoid deprecation warning
for attr in model_spec.__class__.model_fields.keys():
    value = getattr(model_spec, attr)
    if isinstance(value, (list, dict)) and len(str(value)) > 100:
        print(f"  - {attr}: {type(value).__name__} (length: {len(value)})")
    else:
        print(f"  - {attr}: {value}")
ModelSpec attributes:
  - name: cson_roms-marbl_v0.1
  - templates: compile_time=CodeRepository(documentation='', locked=False, location='/Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time', commit='', branch='main', filter=PathFilter(directory='', files=['bgc.opt.j2', 'blk_frc.opt.j2', 'cdr_frc.opt.j2', 'cppdefs.opt.j2', 'diagnostics.opt.j2', 'ocean_vars.opt.j2', 'param.opt.j2', 'river_frc.opt.j2', 'surf_flux.opt.j2', 'tides.opt.j2', 'tracers.opt.j2', 'Makefile'])) run_time=CodeRepository(documentation='', locked=False, location='/Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time', commit='', branch='main', filter=PathFilter(directory='', files=['roms.in.j2', 'marbl_in', 'marbl_tracer_output_list', 'marbl_diagnostic_output_list']))
  - settings: properties=PropertiesSpec(n_tracers=34) compile_time=SettingsStage(settings_dict={'bgc': {'wrt_his': True, 'output_period_his': 86400, 'nrpf_his': 7, 'wrt_avg': False, 'output_period_avg': 86400, 'nrpf_avg': 7, 'wrt_his_dia': True, 'output_period_his_dia': 86400, 'nrpf_his_dia': 7, 'wrt_avg_dia': False, 'output_period_avg_dia': 60, 'nrpf_avg_dia': 1, 'nbgc_flx': 2, 'interp_frc': 1}, 'blk_frc': {'interp_frc': 1}, 'cppdefs': {'obc_west': True, 'obc_east': True, 'obc_north': True, 'obc_south': True, 'marbl': True}, 'param': {'LLm': 512, 'MMm': 512, 'N': 60, 'NP_XI': 16, 'NP_ETA': 16, 'NSUB_X': 1, 'NSUB_E': 1, 'nt_passive': 0, 'ntrc_bio': 32}, 'ocean_vars': {'wrt_file_rst': True, 'output_period_rst': 43200, 'monthly_restarts': False, 'nrpf_rst': 2, 'wrt_file_his': True, 'output_period_his': 86400, 'nrpf_his': 7, 'wrt_Z': True, 'wrt_Ub': True, 'wrt_Vb': True, 'wrt_U': True, 'wrt_V': True, 'wrt_R': False, 'wrt_O': False, 'wrt_W': True, 'wrt_Akv': False, 'wrt_Akt': False, 'wrt_Aks': False, 'wrt_Hbls': False, 'wrt_Hbbl': False, 'wrt_file_avg': False, 'output_period_avg': 604800, 'nrpf_avg': 1, 'wrt_avg_Z': True, 'wrt_avg_Ub': True, 'wrt_avg_Vb': True, 'wrt_avg_U': True, 'wrt_avg_V': True, 'wrt_avg_R': True, 'wrt_avg_O': True, 'wrt_avg_W': True, 'wrt_avg_Akv': True, 'wrt_avg_Akt': True, 'wrt_avg_Aks': True, 'wrt_avg_Hbls': True, 'wrt_avg_Hbbl': True, 'code_check': False}, 'surf_flux': {'wrt_smflx': False, 'wrt_stflx': False, 'sflx_avg': False, 'output_period': 31536000, 'nrpf': 10, 'sst_vname': 'sst', 'sst_tname': 'sst_time', 'sss_vname': 'sss', 'sss_tname': 'sss_time', 'interp_frc': 1}, 'tides': {'ntides': 10, 'bry_tides': True, 'pot_tides': True, 'ana_tides': False}, 'river_frc': {'river_source': False, 'analytical': False, 'nriv': 0, 'rvol_vname': 'river_volume', 'rvol_tname': 'river_time', 'rtrc_vname': 'river_tracer', 'rtrc_tname': 'river_time'}, 'diagnostics': {'diag_avg': False, 'output_period': 86400, 'nrpf': 7, 'diag_uv': False, 'diag_trc': False, 'diag_pflx': True, 'timescale': 86400, 'diag_prec': 'nf90_double'}, 'tracers': {'interp_t': 1}, 'cdr_frc': {'cdr_source': False, 'cdr_volume': True, 'cdr_analytical': False, 'ncdr': 1, 'cdr_file': 'cdr_forcing.nc', 'cdrvol_vname': 'cdr_volume', 'cdrvol_tname': 'cdr_time', 'cdrtrc_vname': 'cdr_tracer', 'cdrtrc_tname': 'cdr_time', 'cdrflx_vname': 'cdr_trcflx', 'cdrflx_tname': 'cdr_time', 'cdr_loc_lon': 'cdr_lon', 'cdr_loc_lat': 'cdr_lat', 'cdr_loc_dep': 'cdr_dep', 'cdr_scl_hor': 'cdr_hsc', 'cdr_scl_vrt': 'cdr_vsc'}}) run_time=SettingsStage(settings_dict={'roms.in': {'title': {'casename': None}, 'time_stepping': {'ntimes': 1200, 'dt': 2160, 'ndtfast': 60, 'ninfo': 1}, 's_coord': {'theta_s': 5.0, 'theta_b': 2.0, 'tcline': 300.0}, 'grid': {'grid_file': None}, 'forcing': {'surface_forcing_path': None, 'surface_forcing_bgc_path': None, 'boundary_forcing_path': None, 'boundary_forcing_bgc_path': None, 'river_path': None}, 'lateral_visc': {'visc2': 0.0, 'visc4': 0.0, 'rho0': 1000.0}, 'output_root_name': {'output_root_name': None}, 'vertical_mixing': {'akv': 0.0, 'akt_default': 0.0}, 'tracer_diff2': {'tnu2_default': 0.0}, 'bottom_drag': {'rdrg': 0.0, 'rdrg2': 0.001, 'zob': 0.01, 'cdb_min': 0.0001, 'cdb_max': 0.01}, 'v_sponge': {'v_sponge': 0.0}, 'gamma2': 1.0, 'ubind': 0.1, 'initial': {'nrrec': 1, 'initial_file': None}}})
  - code: roms=CodeRepository(documentation='', locked=False, location='https://github.com/CWorthy-ocean/ucla-roms.git', commit='', branch='main', filter=None) run_time=CodeRepository(documentation='', locked=False, location='placeholder://run_time', commit='', branch='main', filter=None) compile_time=CodeRepository(documentation='', locked=False, location='placeholder://compile_time', commit='', branch='main', filter=None) marbl=CodeRepository(documentation='', locked=False, location='https://github.com/marbl-ecosys/MARBL.git', commit='marbl0.45.0', branch='', filter=None)
  - inputs: grid=GridInput(topography_source='ETOPO5') initial_conditions=InitialConditionsInput(source=SourceSpec(name='GLORYS', climatology=False), bgc_source=SourceSpec(name='UNIFIED', climatology=True)) forcing=ForcingInput(surface=[SurfaceForcingItem(source=SourceSpec(name='ERA5', climatology=False), type='physics', correct_radiation=True), SurfaceForcingItem(source=SourceSpec(name='UNIFIED', climatology=True), type='bgc', correct_radiation=False)], boundary=[BoundaryForcingItem(source=SourceSpec(name='GLORYS', climatology=False), type='physics'), BoundaryForcingItem(source=SourceSpec(name='UNIFIED', climatology=True), type='bgc')], tidal=[TidalForcingItem(source=SourceSpec(name='TPXO', climatology=False), ntides=15)], river=[RiverForcingItem(source=SourceSpec(name='DAI', climatology=False), include_bgc=True)])
  - datasets: ['ERA5', 'GLORYS_REGIONAL', 'TPXO', 'UNIFIED_BGC']
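In the cell above, the length check only applies to list and dict values. As a result, Pydantic sub-models such as templates and settings are printed in full. The minimal sketch below shortens any long value, whatever its type. It uses stdlib dataclasses as hypothetical stand-ins for the Pydantic models. DemoSpec, summarize_fields, and the 80-character cutoff are illustrative choices. With the real object you would loop over type(model_spec).model_fields instead of dataclasses.fields().

```python
from dataclasses import dataclass, field, fields


def summarize_fields(obj, max_len: int = 80) -> list[str]:
    """One line per field; any value whose str() exceeds max_len is replaced
    by its type name and length, regardless of its type."""
    lines = []
    # For a Pydantic ModelSpec, iterate over type(obj).model_fields instead
    # (its keys are the field names); dataclasses.fields() is the stdlib analogue.
    for name in (f.name for f in fields(obj)):
        value = getattr(obj, name)
        text = str(value)
        if len(text) > max_len:
            text = f"{type(value).__name__} ({len(text)} chars)"
        lines.append(f"  - {name}: {text}")
    return lines


# Hypothetical stand-ins for the Pydantic models shown above.
@dataclass
class CodeRepository:
    location: str
    branch: str = "main"
    commit: str = ""


@dataclass
class DemoSpec:
    name: str
    roms: CodeRepository
    datasets: list[str] = field(default_factory=list)


demo = DemoSpec(
    name="cson_roms-marbl_v0.1",
    roms=CodeRepository(location="https://github.com/CWorthy-ocean/ucla-roms.git"),
    datasets=["ERA5", "TPXO"],
)
print("\n".join(summarize_fields(demo)))
```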

Templates Specification

The templates field defines where Jinja2 templates are located for compile-time and run-time configuration files.

if model_spec.templates:
    print("Templates Specification:")
    print(f"  Compile-time location: {model_spec.templates.compile_time.location}")
    print(f"  Compile-time files: {model_spec.templates.compile_time.filter.files}")
    print(f"\n  Run-time location: {model_spec.templates.run_time.location}")
    print(f"  Run-time files: {model_spec.templates.run_time.filter.files}")
else:
    print("No templates specification")
Templates Specification:
  Compile-time location: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time
  Compile-time files: ['bgc.opt.j2', 'blk_frc.opt.j2', 'cdr_frc.opt.j2', 'cppdefs.opt.j2', 'diagnostics.opt.j2', 'ocean_vars.opt.j2', 'param.opt.j2', 'river_frc.opt.j2', 'surf_flux.opt.j2', 'tides.opt.j2', 'tracers.opt.j2', 'Makefile']

  Run-time location: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time
  Run-time files: ['roms.in.j2', 'marbl_in', 'marbl_tracer_output_list', 'marbl_diagnostic_output_list']
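The two file lists mix Jinja2 templates (.j2) with files that have no template suffix, such as marbl_in and Makefile. One plausible reading is that .j2 files are rendered and written without the suffix, while the rest are copied as-is. Under that reading, roms.in.j2 would become roms.in, which matches the 'roms.in' key in the run-time settings. This convention is an assumption, not something confirmed here. The hypothetical plan_template_outputs sketch encodes it:

```python
from pathlib import PurePosixPath


def plan_template_outputs(files):
    """Map each template file name to an (action, output_name) pair.

    Assumed convention (hypothetical): "*.j2" files are rendered with Jinja2
    and lose the suffix; all other files are copied unchanged.
    """
    plan = {}
    for name in files:
        path = PurePosixPath(name)
        if path.suffix == ".j2":
            plan[name] = ("render", path.stem)  # "roms.in.j2" -> "roms.in"
        else:
            plan[name] = ("copy", name)
    return plan


run_time_files = [
    "roms.in.j2",
    "marbl_in",
    "marbl_tracer_output_list",
    "marbl_diagnostic_output_list",
]
for src, (action, dst) in plan_template_outputs(run_time_files).items():
    print(f"{action:6s} {src} -> {dst}")
```

With the real object, pass model_spec.templates.run_time.filter.files or model_spec.templates.compile_time.filter.files.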

Settings Specification

The settings field defines default settings and configuration file locations for compile-time and run-time stages.

if model_spec.settings:
    print("Settings Specification:")
    print(f"  Properties: {model_spec.settings.properties}")
    print(f"  Compile-time defaults: {model_spec.settings.compile_time._default_config_yaml}")
    print(f"  Run-time defaults: {model_spec.settings.run_time._default_config_yaml}")
else:
    print("No settings specification")
Settings Specification:
  Properties: n_tracers=34
  Compile-time defaults: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/compile-time-defaults.yml
  Run-time defaults: /Users/mclong/codes/cson-forge/cson_forge/model-configs/cson_roms-marbl_v0.1/templates/run-time-defaults.yml
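The settings_dict values printed earlier are nested dicts of defaults, for example param.LLm or roms.in.time_stepping.dt. This notebook does not show how cson_forge applies user changes on top of them. One common pattern, sketched here as an assumption, is a recursive merge: nested dicts are combined key by key, and any other value is replaced. deep_merge is a hypothetical helper, and the defaults below are a small excerpt of the compile-time settings shown above.

```python
import copy
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge key by key, other values are replaced.

    Neither input is modified (values are deep-copied).
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Excerpt of the compile-time defaults shown above.
defaults = {
    "param": {"LLm": 512, "MMm": 512, "N": 60, "NP_XI": 16, "NP_ETA": 16},
    "cppdefs": {"marbl": True},
}
# Hypothetical user override: a smaller horizontal grid.
overrides = {"param": {"LLm": 256, "MMm": 256}}

print(deep_merge(defaults, overrides))
```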

Code Repository Specification

The code field defines the code repositories (ROMS, MARBL) and their locations, branches, and commits.

print("Code Repository Specification:")
print(f"  ROMS location: {model_spec.code.roms.location}")
print(f"  ROMS branch: {model_spec.code.roms.branch}")

if model_spec.code.marbl:
    print(f"  MARBL location: {model_spec.code.marbl.location}")
    print(f"  MARBL commit: {model_spec.code.marbl.commit}")
else:
    print("  MARBL: Not specified")
Code Repository Specification:
  ROMS location: https://github.com/CWorthy-ocean/ucla-roms.git
  ROMS branch: main
  MARBL location: https://github.com/marbl-ecosys/MARBL.git
  MARBL commit: marbl0.45.0
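The ROMS entry pins a branch (main), while the MARBL entry pins a tag-like commit value (marbl0.45.0) and leaves branch empty. The sketch below turns one such entry into git commands. Two things in it are assumptions rather than cson_forge behavior: the rule that a non-empty commit takes precedence over branch, and the helper name git_clone_commands. It only builds argument lists and does not run git.

```python
def git_clone_commands(location: str, commit: str = "", branch: str = "") -> list[list[str]]:
    """Build git argument lists for one repository entry (hypothetical helper).

    A non-empty `commit` (tag or SHA) wins over `branch`. `git clone --branch`
    accepts branches and tags but not raw SHAs, so commits use a separate checkout.
    """
    # Clone directory named after the repository, e.g. ".../MARBL.git" -> "MARBL".
    dest = location.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    if commit:
        return [
            ["git", "clone", location, dest],
            ["git", "-C", dest, "checkout", commit],
        ]
    if branch:
        return [["git", "clone", "--branch", branch, location, dest]]
    return [["git", "clone", location, dest]]


for args in git_clone_commands("https://github.com/marbl-ecosys/MARBL.git", commit="marbl0.45.0"):
    print(" ".join(args))
```

Each argument list could be passed to subprocess.run(args, check=True) once you are satisfied with the plan.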

Inputs Specification

The inputs field defines default specifications for grid, initial conditions, and forcing data. These serve as defaults when generating inputs.

print("Inputs Specification:")
print("\nGrid:")
print(f"  Topography source: {model_spec.inputs.grid.topography_source}")

print("\nInitial Conditions:")
print(f"  Source: {model_spec.inputs.initial_conditions.source}")
if model_spec.inputs.initial_conditions.bgc_source:
    print(f"  BGC source: {model_spec.inputs.initial_conditions.bgc_source}")

print("\nForcing:")
if model_spec.inputs.forcing:
    if model_spec.inputs.forcing.surface:
        print(f"  Surface forcing ({len(model_spec.inputs.forcing.surface)} sources):")
        for i, surf in enumerate(model_spec.inputs.forcing.surface, 1):
            print(f"    {i}. {surf.source.name} ({surf.type})")
    
    if model_spec.inputs.forcing.boundary:
        print(f"  Boundary forcing ({len(model_spec.inputs.forcing.boundary)} sources):")
        for i, bnd in enumerate(model_spec.inputs.forcing.boundary, 1):
            print(f"    {i}. {bnd.source.name} ({bnd.type})")
    
    if model_spec.inputs.forcing.tidal:
        print(f"  Tidal forcing ({len(model_spec.inputs.forcing.tidal)} sources):")
        for i, tide in enumerate(model_spec.inputs.forcing.tidal, 1):
            print(f"    {i}. {tide.source.name} (ntides: {tide.ntides})")
    
    if model_spec.inputs.forcing.river:
        print(f"  River forcing ({len(model_spec.inputs.forcing.river)} sources):")
        for i, riv in enumerate(model_spec.inputs.forcing.river, 1):
            print(f"    {i}. {riv.source.name} (include_bgc: {riv.include_bgc})")
Inputs Specification:

Grid:
  Topography source: ETOPO5

Initial Conditions:
  Source: name='GLORYS' climatology=False
  BGC source: name='UNIFIED' climatology=True

Forcing:
  Surface forcing (2 sources):
    1. ERA5 (physics)
    2. UNIFIED (bgc)
  Boundary forcing (2 sources):
    1. GLORYS (physics)
    2. UNIFIED (bgc)
  Tidal forcing (1 sources):
    1. TPXO (ntides: 15)
  River forcing (1 sources):
    1. DAI (include_bgc: True)
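All four forcing categories share the same source.name / source.climatology shape, so they can be flattened into one table for a quick overview. In this minimal sketch, types.SimpleNamespace objects serve as hypothetical stand-ins that mirror the output above. forcing_table is illustrative; with the real object you would call forcing_table(model_spec.inputs.forcing).

```python
from types import SimpleNamespace


def forcing_table(forcing, categories=("surface", "boundary", "tidal", "river")):
    """Flatten forcing entries into (category, source, type, climatology) rows.

    Entries without a `type` attribute (tidal and river items above) get "-".
    """
    rows = []
    for category in categories:
        # Missing or empty categories are skipped.
        for entry in getattr(forcing, category, None) or []:
            rows.append(
                (category, entry.source.name, getattr(entry, "type", "-"), entry.source.climatology)
            )
    return rows


def _item(name, climatology=False, **extra):
    """Build a stand-in forcing item with a nested `source`."""
    return SimpleNamespace(source=SimpleNamespace(name=name, climatology=climatology), **extra)


forcing = SimpleNamespace(
    surface=[_item("ERA5", type="physics"), _item("UNIFIED", True, type="bgc")],
    boundary=[_item("GLORYS", type="physics"), _item("UNIFIED", True, type="bgc")],
    tidal=[_item("TPXO", ntides=15)],
    river=[_item("DAI", include_bgc=True)],
)

for row in forcing_table(forcing):
    print(row)
```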

Required Datasets

The datasets field lists all source datasets required by this model configuration. These are derived from the inputs specification.

print("Required Datasets:")
print(f"  Total: {len(model_spec.datasets)}")
for i, dataset in enumerate(model_spec.datasets, 1):
    print(f"    {i}. {dataset}")
Required Datasets:
  Total: 4
    1. ERA5
    2. GLORYS_REGIONAL
    3. TPXO
    4. UNIFIED_BGC
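Before generating inputs, it can help to check which required datasets are already available locally. The sketch below assumes a hypothetical layout with one directory per dataset under a data root, named exactly as in model_spec.datasets. cson_forge's actual data layout is not described in this notebook, and missing_datasets is an illustrative helper.

```python
import tempfile
from pathlib import Path


def missing_datasets(required, data_root):
    """Return required dataset names with no matching subdirectory in data_root.

    Hypothetical layout: <data_root>/<dataset name>/ for each dataset.
    """
    root = Path(data_root)
    return [name for name in required if not (root / name).is_dir()]


required = ["ERA5", "GLORYS_REGIONAL", "TPXO", "UNIFIED_BGC"]

# Demo against a throwaway directory that holds only two of the four datasets.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ERA5").mkdir()
    (Path(tmp) / "TPXO").mkdir()
    missing = missing_datasets(required, tmp)

print(missing)  # → ['GLORYS_REGIONAL', 'UNIFIED_BGC']
```

With the real object, call missing_datasets(model_spec.datasets, your_data_root).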

Accessing Nested Fields

You can access nested fields using dot notation. Here are some examples:

# Examples of accessing nested fields
print("Example field access:")
print(f"  ROMS repository URL: {model_spec.code.roms.location}")
print(f"  Number of tracers: {model_spec.settings.properties.n_tracers}")
print(f"  First surface forcing source: {model_spec.inputs.forcing.surface[0].source.name}")
print(f"  Grid topography source: {model_spec.inputs.grid.topography_source}")
Example field access:
  ROMS repository URL: https://github.com/CWorthy-ocean/ucla-roms.git
  Number of tracers: 34
  First surface forcing source: ERA5
  Grid topography source: ETOPO5
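Dot notation requires the path to be known when the code is written. When a field path arrives as a string (say, from a command-line option), a small resolver can do the same walk. get_path is a hypothetical helper, not part of cson_forge. Integer segments index into lists, which operator.attrgetter cannot do. The same function also works on the nested dicts returned by model_dump().

```python
from types import SimpleNamespace
from typing import Any


def get_path(obj: Any, path: str) -> Any:
    """Resolve a dotted path such as "inputs.forcing.surface.0.source.name".

    A segment applied to a list or tuple is parsed as an integer index, one
    applied to a dict is used as a key, and anything else is an attribute name.
    """
    for part in path.split("."):
        if isinstance(obj, (list, tuple)):
            obj = obj[int(part)]
        elif isinstance(obj, dict):
            obj = obj[part]
        else:
            obj = getattr(obj, part)
    return obj


# Hypothetical stand-in mixing attributes, a list, and a dict.
spec = SimpleNamespace(
    inputs=SimpleNamespace(
        forcing=SimpleNamespace(surface=[SimpleNamespace(source=SimpleNamespace(name="ERA5"))])
    ),
    settings={"properties": {"n_tracers": 34}},
)
print(get_path(spec, "inputs.forcing.surface.0.source.name"))  # → ERA5
print(get_path(spec, "settings.properties.n_tracers"))  # → 34
```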

ModelSpec as Dictionary

You can convert the ModelSpec to a dictionary for inspection or serialization:

# Convert to dictionary (Pydantic model_dump)
model_dict = model_spec.model_dump()

print("ModelSpec as dictionary (top-level keys):")
for key in model_dict.keys():
    print(f"  - {key}")

# You can also use model_dump_json() for JSON serialization; it returns a str
# directly, so no json import is needed:
# json_str = model_spec.model_dump_json(indent=2)
ModelSpec as dictionary (top-level keys):
  - name
  - templates
  - settings
  - code
  - inputs
  - datasets
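Because model_dump() produces plain nested dicts and lists, two specifications can be compared structurally. The recursive diff below is a hypothetical sketch; diff_dicts is not a cson_forge or Pydantic function. It reports the dotted paths where values differ, including keys present on only one side, and reports lists of different lengths as a whole. With real objects you would call diff_dicts(model_spec.model_dump(), other_spec.model_dump()), where other_spec is a second ModelSpec.

```python
from typing import Any


def diff_dicts(a: Any, b: Any, prefix: str = "") -> list[str]:
    """List dotted paths where two nested dict/list structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths = []
        for key in sorted(set(a) | set(b), key=str):
            sub = f"{prefix}.{key}" if prefix else str(key)
            if key not in a or key not in b:
                paths.append(sub)  # key exists on one side only
            else:
                paths.extend(diff_dicts(a[key], b[key], sub))
        return paths
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        paths = []
        for i, (x, y) in enumerate(zip(a, b)):
            sub = f"{prefix}.{i}" if prefix else str(i)
            paths.extend(diff_dicts(x, y, sub))
        return paths
    # Scalars, mismatched types, or lists of different length.
    return [] if a == b else [prefix or "<root>"]


# Hypothetical excerpts of two model_dump() results.
base = {
    "name": "cson_roms-marbl_v0.1",
    "settings": {"param": {"LLm": 512, "MMm": 512, "N": 60}},
    "datasets": ["ERA5", "TPXO"],
}
variant = {
    "name": "cson_roms-marbl_v0.1",
    "settings": {"param": {"LLm": 256, "MMm": 512, "N": 60}},
    "datasets": ["ERA5", "GLORYS_REGIONAL"],
}
print(diff_dicts(base, variant))  # → ['datasets.1', 'settings.param.LLm']
```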

Summary

The ModelSpec provides a structured, validated representation of model configurations:

  • Type-safe: Pydantic models provide validation and type checking

  • Accessible: Use dot notation to access nested fields

  • Serializable: Convert to dict/JSON for storage or inspection

  • Complete: Contains all information needed to configure and build a model

This specification is used by CstarSpecBuilder to configure model builds and input generation.