Blueprint Catalog API

The catalog module provides programmatic access to the blueprint information stored in the blueprints directory, so you can discover, query, and load blueprint data for instantiating OcnModel objects.

Overview

The BlueprintCatalog class provides methods to:

  • Discover blueprint files in the blueprints directory (pattern: B_*.yml)

  • Load individual blueprint YAML files

  • Extract grid parameters from grid YAML files (_grid.yml)

  • Load all blueprints into a pandas DataFrame with extracted model/grid names, dates, partitioning, and paths

  • Filter blueprints by stage (preconfig, postconfig, build, run)

Basic Usage

The module provides a convenience instance, blueprint, that you can use directly (accessed below as catalog.blueprint):

%load_ext autoreload
%autoreload 2
from cson_forge import catalog

Finding Blueprint Files

You can find all blueprint files in the blueprints directory:

# Find all blueprint files (defaults to all stages)
blueprint_files = catalog.blueprint.find_blueprint_files()
print(f"Found {len(blueprint_files)} blueprint files:")
for bp_file in blueprint_files[:10]:  # Show first 10
    print(f"  - {bp_file.name}")

# You can also filter by stage
postconfig_files = catalog.blueprint.find_blueprint_files(stage="postconfig")
print(f"\nFound {len(postconfig_files)} postconfig blueprint files")
Found 13 blueprint files:
  - B_cson_roms-marbl_v0.1_ccs-12km_build.yml
  - B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
  - B_cson_roms-marbl_v0.1_ccs-12km_preconfig.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_build.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_postconfig.yml
  - B_cson_roms-marbl_v0.1_gulf-guinea-toy_preconfig.yml
  - B_cson_roms-marbl_v0.1_hvalfjörður-0_preconfig.yml
  - B_cson_roms-marbl_v0.1_test-tiny_build.yml
  - B_cson_roms-marbl_v0.1_test-tiny_postconfig.yml
  - B_cson_roms-marbl_v0.1_test-tiny_preconfig.yml

Found 3 postconfig blueprint files
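
The file names above follow the shape B_<blueprint name>_<stage>.yml. As a minimal sketch, the same discovery and stage filtering can be reproduced with pathlib alone, which may be handy for checking a directory without importing the catalog. The helper name find_blueprints and the recursive search are assumptions made for illustration; this is not how cson_forge necessarily implements find_blueprint_files().

```python
from pathlib import Path
import tempfile

# Stage names as listed in the overview.
STAGES = ("preconfig", "postconfig", "build", "run")


def find_blueprints(root, stage=None):
    """Return sorted B_*.yml paths under root, optionally for one stage.

    Hypothetical helper: it mirrors the naming pattern shown above, not the
    actual BlueprintCatalog.find_blueprint_files() implementation.
    """
    if stage is not None and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    pattern = "B_*.yml" if stage is None else f"B_*_{stage}.yml"
    return sorted(Path(root).rglob(pattern))


# Demo on a throwaway directory holding three empty blueprint files.
with tempfile.TemporaryDirectory() as tmp:
    bp_dir = Path(tmp) / "cson_roms-marbl_v0.1_test-tiny"
    bp_dir.mkdir()
    for stage_name in ("preconfig", "postconfig", "build"):
        (bp_dir / f"B_cson_roms-marbl_v0.1_test-tiny_{stage_name}.yml").touch()
    (bp_dir / "_grid.yml").touch()  # not a blueprint, so it must be ignored

    all_names = [p.name for p in find_blueprints(tmp)]
    postconfig_names = [p.name for p in find_blueprints(tmp, stage="postconfig")]

print(all_names)
print(postconfig_names)
```

Encoding the stage in the glob pattern keeps the filter to a single directory walk; an unknown stage raises early instead of silently returning an empty list.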

Loading a Single Blueprint

You can load and inspect a single blueprint file:

# Load a single blueprint
if blueprint_files:
    bp_data = catalog.blueprint.load_blueprint(blueprint_files[0])
    blueprint_name = bp_data.get('name', '')
    model_name, grid_name = catalog.blueprint._extract_model_and_grid_name(blueprint_name)
    partitioning = bp_data.get('partitioning', {})
    
    print(f"Blueprint name: {blueprint_name}")
    print(f"Model name: {model_name}")
    print(f"Grid name: {grid_name}")
    print(f"Description: {bp_data.get('description')}")
    print(f"Start time: {bp_data.get('valid_start_date')}")
    print(f"End time: {bp_data.get('valid_end_date')}")
    if isinstance(partitioning, dict):
        print(f"Processors: {partitioning.get('n_procs_x')} x {partitioning.get('n_procs_y')}")
Blueprint name: cson_roms-marbl_v0.1_ccs-12km
Model name: cson_roms-marbl_v0.1
Grid name: ccs-12km
Description: California Current System
Start time: 2024-01-01T00:00:00
End time: 2024-01-02T00:00:00
Processors: 16 x 20
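
The output above suggests that a blueprint name is the model name and the grid name joined by an underscore. The sketch below splits at the last underscore, which assumes grid names never contain underscores (true for every name listed on this page). The private _extract_model_and_grid_name method may use a different rule; the function names here are hypothetical.

```python
from pathlib import Path


def split_blueprint_name(blueprint_name):
    """Split '<model>_<grid>' at the last underscore.

    Assumption: grid names contain no underscores.
    """
    model_name, sep, grid_name = blueprint_name.rpartition("_")
    if not sep:
        raise ValueError(f"no underscore in blueprint name: {blueprint_name!r}")
    return model_name, grid_name


def parse_blueprint_filename(path):
    """Split 'B_<model>_<grid>_<stage>.yml' into (model, grid, stage)."""
    stem = Path(path).stem  # drops only the trailing .yml suffix
    if not stem.startswith("B_"):
        raise ValueError(f"not a blueprint file name: {path!r}")
    blueprint_name, _, stage = stem[2:].rpartition("_")
    model_name, grid_name = split_blueprint_name(blueprint_name)
    return model_name, grid_name, stage


print(split_blueprint_name("cson_roms-marbl_v0.1_ccs-12km"))
print(parse_blueprint_filename("B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml"))
```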

Loading Grid Parameters

You can extract grid keyword arguments from a grid YAML file:

# Load grid kwargs from a blueprint
# Grid YAML files are typically in the same directory as the blueprint
if blueprint_files:
    bp_file = blueprint_files[0]
    bp_data = catalog.blueprint.load_blueprint(bp_file)

    # Look for _grid.yml in the same directory as the blueprint
    grid_yaml_path = bp_file.parent / "_grid.yml"
    
    if grid_yaml_path.exists():
        try:
            grid_kwargs = catalog.blueprint.load_grid_kwargs(grid_yaml_path)
            print("Grid parameters:")                
            for key, value in grid_kwargs.items():
                print(f"  {key}: {value}")
        except Exception as e:
            print(f"Could not load grid kwargs: {e}")
    else:
        print(f"Grid YAML file not found at {grid_yaml_path}")
        print("Grid parameters may be available in the DataFrame after calling load()")
Grid parameters:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0
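
A quick consistency check on these values can be written in a few lines. The sketch assumes that size_x and size_y are domain extents in kilometers (the page does not state units, but the grid name ccs-12km points that way) and that n_procs_x partitions nx while n_procs_y partitions ny; both are assumptions, not documented behavior.

```python
# Values copied from the grid parameter output above.
grid_kwargs = {"nx": 224, "ny": 440, "size_x": 2688, "size_y": 5280}
n_procs_x, n_procs_y = 16, 20  # from the "Processors" line printed earlier

# Nominal grid spacing, assuming size_x and size_y are in kilometers.
dx_km = grid_kwargs["size_x"] / grid_kwargs["nx"]
dy_km = grid_kwargs["size_y"] / grid_kwargs["ny"]

# Points per tile, assuming n_procs_x splits nx and n_procs_y splits ny.
tile_nx, rem_x = divmod(grid_kwargs["nx"], n_procs_x)
tile_ny, rem_y = divmod(grid_kwargs["ny"], n_procs_y)

print(f"spacing: {dx_km:g} km x {dy_km:g} km")
print(f"tiles: {n_procs_x * n_procs_y} ranks, {tile_nx} x {tile_ny} points each")
print(f"even split: {rem_x == 0 and rem_y == 0}")
```

Under those assumptions the spacing comes out at 12 km in both directions, and the 16 x 20 layout divides the grid evenly into 320 tiles of 14 x 22 points.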

Loading All Blueprints into a DataFrame

The main feature is the load() method, which returns a pandas DataFrame with all data necessary to instantiate OcnModel objects:

# Load all blueprints into a DataFrame
# Defaults to 'postconfig' stage which has the most complete data
df = catalog.blueprint.load(stage="postconfig")

print(f"Loaded {len(df)} blueprints")
print(f"\nDataFrame columns: {list(df.columns)}")
print(f"\nDataFrame shape: {df.shape}")

# Display the DataFrame (excluding dict columns for readability)
display_cols = [col for col in df.columns if col not in ['grid_kwargs']]
df[display_cols]
Loaded 3 blueprints

DataFrame columns: ['model_name', 'grid_name', 'blueprint_name', 'description', 'start_time', 'end_time', 'np_eta', 'np_xi', 'grid_kwargs', 'blueprint_path', 'grid_yaml_path', 'input_data_dir', 'stage']

DataFrame shape: (3, 13)

Inspecting the DataFrame

Let’s look at the structure of the DataFrame:

# Display basic information about the DataFrame
if not df.empty:
    print("DataFrame info:")
    df.info()  # prints the summary itself and returns None
    
    print("\nFirst few rows:")
    # Display non-dict columns for readability
    display_cols = [col for col in df.columns if col not in ['grid_kwargs']]
    display(df[display_cols].head())
DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   model_name      3 non-null      object
 1   grid_name       3 non-null      object
 2   blueprint_name  3 non-null      object
 3   description     3 non-null      object
 4   start_time      3 non-null      object
 5   end_time        3 non-null      object
 6   np_eta          3 non-null      int64 
 7   np_xi           3 non-null      int64 
 8   grid_kwargs     3 non-null      object
 9   blueprint_path  3 non-null      object
 10  grid_yaml_path  3 non-null      object
 11  input_data_dir  3 non-null      object
 12  stage           3 non-null      object
dtypes: int64(2), object(11)
memory usage: 444.0+ bytes

First few rows:

Viewing Grid Parameters

The grid_kwargs column contains dictionaries with grid parameters:

# Display grid kwargs for the first blueprint
if not df.empty:
    first_row = df.iloc[0]
    print(f"Grid kwargs for {first_row['grid_name']}:")
    grid_kwargs = first_row['grid_kwargs']
    if isinstance(grid_kwargs, dict):
        for key, value in grid_kwargs.items():
            print(f"  {key}: {value}")
Grid kwargs for ccs-12km:
  nx: 224
  ny: 440
  size_x: 2688
  size_y: 5280
  center_lon: -134.5
  center_lat: 39.6
  rot: 33.3
  N: 100
  theta_s: 6.0
  theta_b: 6.0
  hc: 250
  topography_source: {'name': 'ETOPO5'}
  mask_shapefile: None
  hmin: 5.0

Querying the DataFrame

You can query the DataFrame to find specific blueprints:

# Query by model name
if not df.empty:
    model_name = df['model_name'].iloc[0] if 'model_name' in df.columns else None
    if model_name:
        model_blueprints = df[df['model_name'] == model_name]
        print(f"Found {len(model_blueprints)} blueprints for model '{model_name}':")
        print(model_blueprints[['grid_name', 'start_time', 'end_time']].to_string())

# Query by grid name
if not df.empty and 'grid_name' in df.columns:
    grid_name = df['grid_name'].iloc[0]
    grid_blueprints = df[df['grid_name'] == grid_name]
    print(f"\nFound {len(grid_blueprints)} blueprints for grid '{grid_name}':")
    print(grid_blueprints[['model_name', 'start_time', 'end_time']].to_string())
Found 3 blueprints for model 'cson_roms-marbl_v0.1':
         grid_name           start_time             end_time
0         ccs-12km  2024-01-01T00:00:00  2024-01-02T00:00:00
1  gulf-guinea-toy  2012-01-01T00:00:00  2012-01-02T00:00:00
2        test-tiny  2012-01-01T00:00:00  2012-01-02T00:00:00

Found 1 blueprints for grid 'ccs-12km':
             model_name           start_time             end_time
0  cson_roms-marbl_v0.1  2024-01-01T00:00:00  2024-01-02T00:00:00
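
The start_time and end_time columns hold ISO 8601 strings (object dtype in the df.info() output), so date-range queries need the strings parsed first. The sketch below does this with the standard library on plain records, such as those returned by df.to_dict("records"); the record values are copied from the output above, and the helper name covers is hypothetical.

```python
from datetime import datetime

# Rows as plain dicts, e.g. what df.to_dict("records") would yield;
# the values are copied from the query output above.
records = [
    {"grid_name": "ccs-12km",
     "start_time": "2024-01-01T00:00:00", "end_time": "2024-01-02T00:00:00"},
    {"grid_name": "gulf-guinea-toy",
     "start_time": "2012-01-01T00:00:00", "end_time": "2012-01-02T00:00:00"},
    {"grid_name": "test-tiny",
     "start_time": "2012-01-01T00:00:00", "end_time": "2012-01-02T00:00:00"},
]


def covers(record, when):
    """True if `when` falls inside the record's [start_time, end_time] range."""
    start = datetime.fromisoformat(record["start_time"])
    end = datetime.fromisoformat(record["end_time"])
    return start <= when <= end


# Which blueprints cover noon on 1 January 2012?
target = datetime(2012, 1, 1, 12)
matching = [r["grid_name"] for r in records if covers(r, target)]

# Length of each blueprint's time range as a timedelta.
durations = {
    r["grid_name"]: datetime.fromisoformat(r["end_time"])
    - datetime.fromisoformat(r["start_time"])
    for r in records
}

print(matching)
print(durations)
```

When working on the DataFrame itself, pd.to_datetime(df["start_time"]) performs the same conversion column-wise.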

Instantiating OcnModel from DataFrame

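The DataFrame contains the data needed to instantiate OcnModel objects. Loading a builder directly from an existing blueprint file is not yet implemented (see the TODO in the code), so the example below only shows which fields you would use:

# Example: inspect the blueprint data you would pass to a CstarSpecBuilder
from cson_forge import CstarSpecBuilder

# Pull the fields a builder would need from the first row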
if not df.empty:
    row = df.iloc[0]
    
    # Note: To recreate a builder from a blueprint, you would need to:
    # 1. Load the blueprint file using CstarSpecBuilder.from_blueprint() (if such a method exists)
    # 2. Or manually extract data from the DataFrame and create a new builder
    # 3. The blueprint_path column contains the path to the blueprint file
    
    print(f"Blueprint path: {row['blueprint_path']}")
    print(f"Model: {row['model_name']}, Grid: {row['grid_name']}")
    print(f"Time range: {row['start_time']} to {row['end_time']}")
    
    # TODO: Add method to CstarSpecBuilder to load from existing blueprint file
    
Blueprint path: /Users/mclong/codes/cson-forge/cson_forge/blueprints/cson_roms-marbl_v0.1_ccs-12km/B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
Model: cson_roms-marbl_v0.1, Grid: ccs-12km
Time range: 2024-01-01T00:00:00 to 2024-01-02T00:00:00

Summary

The catalog.blueprint instance provides:

  1. Discovery: Find all blueprint files in the blueprints directory (pattern: B_*.yml)

  2. Loading: Load individual blueprints or all blueprints at once (with optional stage filtering)

  3. Data Extraction: Extract model/grid names, dates, partitioning, and grid parameters

  4. DataFrame Interface: Get all blueprint data in a pandas DataFrame for easy querying

  5. Grid Parameters: Load grid keyword arguments from _grid.yml files when available

This makes it easy to:

  • Query existing blueprints by model, grid, or stage

  • Compare configurations across different domains

  • Access blueprint metadata (names, dates, partitioning, paths)

  • Build analysis workflows that work with multiple domains

  • Programmatically work with stored blueprint configurations
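
As one illustration of comparing configurations across domains, the sketch below diffs two grid_kwargs dictionaries. The first uses values shown earlier for ccs-12km; the second is a hypothetical coarser variant invented for the example, not a blueprint that exists in the catalog. The helper diff_kwargs is likewise an assumption, not part of cson_forge.

```python
MISSING = "<missing>"  # marker for keys present in only one dictionary


def diff_kwargs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k, MISSING), b.get(k, MISSING))
        for k in keys
        if a.get(k, MISSING) != b.get(k, MISSING)
    }


# Subset of the ccs-12km grid parameters shown earlier on this page.
ccs = {"nx": 224, "ny": 440, "size_x": 2688, "size_y": 5280, "N": 100, "hmin": 5.0}

# Hypothetical variant with half the horizontal resolution (illustration only).
variant = dict(ccs, nx=112, ny=220)

diff = diff_kwargs(ccs, variant)
print(diff)
```

In practice the two dictionaries would come from the grid_kwargs column, for example df.iloc[0]["grid_kwargs"] and df.iloc[1]["grid_kwargs"].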