The catalog module provides programmatic access to the blueprint files stored in the blueprints directory. Use it to discover, query, and load the blueprint data needed to instantiate OcnModel objects.
Overview
The BlueprintCatalog class provides methods to:
Discover blueprint files in the blueprints directory (pattern: B_*.yml)
Load individual blueprint YAML files
Extract grid parameters from grid YAML files (_grid.yml)
Load all blueprints into a pandas DataFrame with extracted model/grid names, dates, partitioning, and paths
Filter blueprints by stage (preconfig, postconfig, build, run)
Basic Usage
The module provides a convenience instance blueprint that you can use directly:
%load_ext autoreload
%autoreload 2
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
from cson_forge import catalog
Finding Blueprint Files
You can find all blueprint files in the blueprints directory:
# Find all blueprint files (defaults to all stages)
blueprint_files = catalog.blueprint.find_blueprint_files()
print(f"Found {len(blueprint_files)} blueprint files:")
for bp_file in blueprint_files[:10]:  # Show first 10
    print(f"  - {bp_file.name}")
# You can also filter by stage
postconfig_files = catalog.blueprint.find_blueprint_files(stage="postconfig")
print(f"\nFound {len(postconfig_files)} postconfig blueprint files")
Found 13 blueprint files:
- B_cson_roms-marbl_v0.1_ccs-12km_build.yml
- B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
- B_cson_roms-marbl_v0.1_ccs-12km_preconfig.yml
- B_cson_roms-marbl_v0.1_gulf-guinea-toy_build.yml
- B_cson_roms-marbl_v0.1_gulf-guinea-toy_postconfig.yml
- B_cson_roms-marbl_v0.1_gulf-guinea-toy_preconfig.yml
- B_cson_roms-marbl_v0.1_hvalfjörður-0_preconfig.yml
- B_cson_roms-marbl_v0.1_test-tiny_build.yml
- B_cson_roms-marbl_v0.1_test-tiny_postconfig.yml
- B_cson_roms-marbl_v0.1_test-tiny_preconfig.yml
Found 3 postconfig blueprint files
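The discovery step can be approximated with the standard library alone. The sketch below is not the BlueprintCatalog implementation. It assumes, based on the file names listed above, that blueprints are named B_<blueprint_name>_<stage>.yml and may sit in subdirectories of the blueprints directory. The function name find_blueprints_by_pattern is hypothetical.

```python
from pathlib import Path

# Stage names taken from the overview; the helper itself is hypothetical.
STAGES = ("preconfig", "postconfig", "build", "run")


def find_blueprints_by_pattern(root, stage=None):
    """Return sorted B_*.yml paths under root, optionally limited to one stage."""
    if stage is not None and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Assumed naming convention: B_<blueprint_name>_<stage>.yml
    pattern = "B_*.yml" if stage is None else f"B_*_{stage}.yml"
    # rglob searches root and all of its subdirectories
    return sorted(Path(root).rglob(pattern))
```

Because the pattern starts with B_, companion files such as _grid.yml in the same directory are not picked up.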
Loading a Single Blueprint
You can load and inspect a single blueprint file:
# Load a single blueprint
if blueprint_files:
    bp_data = catalog.blueprint.load_blueprint(blueprint_files[0])
    blueprint_name = bp_data.get('name', '')
    model_name, grid_name = catalog.blueprint._extract_model_and_grid_name(blueprint_name)
    partitioning = bp_data.get('partitioning', {})
    print(f"Blueprint name: {blueprint_name}")
    print(f"Model name: {model_name}")
    print(f"Grid name: {grid_name}")
    print(f"Description: {bp_data.get('description')}")
    print(f"Start time: {bp_data.get('valid_start_date')}")
    print(f"End time: {bp_data.get('valid_end_date')}")
    if isinstance(partitioning, dict):
        print(f"Processors: {partitioning.get('n_procs_x')} x {partitioning.get('n_procs_y')}")
Blueprint name: cson_roms-marbl_v0.1_ccs-12km
Model name: cson_roms-marbl_v0.1
Grid name: ccs-12km
Description: California Current System
Start time: 2024-01-01T00:00:00
End time: 2024-01-02T00:00:00
Processors: 16 x 20
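The output above suggests how a blueprint name breaks down into a model name and a grid name. As a minimal sketch, assuming the grid name is everything after the last underscore (so grid names themselves contain no underscores), the split can be reproduced as follows. The helper split_blueprint_name is hypothetical, and the private method _extract_model_and_grid_name may apply different rules.

```python
def split_blueprint_name(blueprint_name):
    """Split '<model_name>_<grid_name>' at the last underscore."""
    model_name, sep, grid_name = blueprint_name.rpartition("_")
    if not sep or not model_name or not grid_name:
        raise ValueError(f"cannot split blueprint name: {blueprint_name!r}")
    return model_name, grid_name


model_name, grid_name = split_blueprint_name("cson_roms-marbl_v0.1_ccs-12km")
print(model_name)  # cson_roms-marbl_v0.1
print(grid_name)   # ccs-12km
```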
Loading Grid Parameters
You can extract grid keyword arguments from a grid YAML file:
# Load grid kwargs from a blueprint
# Grid YAML files are typically in the same directory as the blueprint
if blueprint_files:
    bp_file = blueprint_files[0]
    bp_data = catalog.blueprint.load_blueprint(bp_file)
    # Look for _grid.yml in the same directory as the blueprint
    grid_yaml_path = bp_file.parent / "_grid.yml"
    if grid_yaml_path.exists():
        try:
            grid_kwargs = catalog.blueprint.load_grid_kwargs(grid_yaml_path)
            print("Grid parameters:")
            for key, value in grid_kwargs.items():
                print(f"  {key}: {value}")
        except Exception as e:
            print(f"Could not load grid kwargs: {e}")
    else:
        print(f"Grid YAML file not found at {grid_yaml_path}")
        print("Grid parameters may be available in the DataFrame after calling load()")
Grid parameters:
nx: 224
ny: 440
size_x: 2688
size_y: 5280
center_lon: -134.5
center_lat: 39.6
rot: 33.3
N: 100
theta_s: 6.0
theta_b: 6.0
hc: 250
topography_source: {'name': 'ETOPO5'}
mask_shapefile: None
hmin: 5.0
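A quick sanity check on these parameters is to derive the nominal grid spacing from the domain size and the number of grid points. The sketch below uses the values printed above and assumes that size_x and size_y are domain extents in kilometers; the units are not stated in the output, so treat this as an assumption.

```python
# Values copied from the grid parameters printed above.
grid_kwargs = {"nx": 224, "ny": 440, "size_x": 2688, "size_y": 5280}

# Assumption: size_x and size_y are domain extents in km.
dx_km = grid_kwargs["size_x"] / grid_kwargs["nx"]
dy_km = grid_kwargs["size_y"] / grid_kwargs["ny"]
print(f"Nominal spacing: {dx_km:g} km x {dy_km:g} km")
# Nominal spacing: 12 km x 12 km
```

Under that assumption, the result matches the 12km suffix in the grid name ccs-12km.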
Loading All Blueprints into a DataFrame
The main feature is the load() method, which returns a pandas DataFrame with all data necessary to instantiate OcnModel objects:
# Load all blueprints into a DataFrame
# Defaults to 'postconfig' stage which has the most complete data
df = catalog.blueprint.load(stage="postconfig")
print(f"Loaded {len(df)} blueprints")
print(f"\nDataFrame columns: {list(df.columns)}")
print(f"\nDataFrame shape: {df.shape}")
# Display the DataFrame (excluding dict columns for readability)
display_cols = [col for col in df.columns if col not in ['grid_kwargs']]
df[display_cols]
Loaded 3 blueprints
DataFrame columns: ['model_name', 'grid_name', 'blueprint_name', 'description', 'start_time', 'end_time', 'np_eta', 'np_xi', 'grid_kwargs', 'blueprint_path', 'grid_yaml_path', 'input_data_dir', 'stage']
DataFrame shape: (3, 13)
Inspecting the DataFrame
Let’s look at the structure of the DataFrame:
# Display basic information about the DataFrame
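if not df.empty:
    print("DataFrame info:")
    # df.info() prints its summary itself and returns None,
    # so wrapping it in print() adds a trailing "None" to the output
    print(df.info())
    print("\nFirst few rows:")
    # Display non-dict columns for readability
    display_cols = [col for col in df.columns if col not in ['grid_kwargs']]
    display(df[display_cols].head())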
DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   model_name      3 non-null      object
 1   grid_name       3 non-null      object
 2   blueprint_name  3 non-null      object
 3   description     3 non-null      object
 4   start_time      3 non-null      object
 5   end_time        3 non-null      object
 6   np_eta          3 non-null      int64
 7   np_xi           3 non-null      int64
 8   grid_kwargs     3 non-null      object
 9   blueprint_path  3 non-null      object
 10  grid_yaml_path  3 non-null      object
 11  input_data_dir  3 non-null      object
 12  stage           3 non-null      object
dtypes: int64(2), object(11)
memory usage: 444.0+ bytes
None
First few rows:
Viewing Grid Parameters
The grid_kwargs column contains dictionaries with grid parameters:
# Display grid kwargs for the first blueprint
if not df.empty:
    first_row = df.iloc[0]
    print(f"Grid kwargs for {first_row['grid_name']}:")
    grid_kwargs = first_row['grid_kwargs']
    if isinstance(grid_kwargs, dict):
        for key, value in grid_kwargs.items():
            print(f"  {key}: {value}")
Grid kwargs for ccs-12km:
nx: 224
ny: 440
size_x: 2688
size_y: 5280
center_lon: -134.5
center_lat: 39.6
rot: 33.3
N: 100
theta_s: 6.0
theta_b: 6.0
hc: 250
topography_source: {'name': 'ETOPO5'}
mask_shapefile: None
hmin: 5.0
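Grid dimensions and partitioning can be combined to estimate the work per processor. The sketch below uses nx and ny from the grid kwargs above and the 16 x 20 processor layout printed for the same blueprint earlier. It assumes that n_procs_x partitions the nx direction and n_procs_y the ny direction; that mapping is not confirmed by the output, and the helper tile_shape is hypothetical.

```python
def tile_shape(nx, ny, n_procs_x, n_procs_y):
    """Return (total ranks, points per tile in x, points per tile in y)."""
    tile_x, rem_x = divmod(nx, n_procs_x)
    tile_y, rem_y = divmod(ny, n_procs_y)
    if rem_x or rem_y:
        raise ValueError("grid does not divide evenly among processors")
    return n_procs_x * n_procs_y, tile_x, tile_y


n_ranks, tile_x, tile_y = tile_shape(nx=224, ny=440, n_procs_x=16, n_procs_y=20)
print(f"{n_ranks} ranks, {tile_x} x {tile_y} points per tile")
# 320 ranks, 14 x 22 points per tile
```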
Querying the DataFrame
You can query the DataFrame to find specific blueprints:
# Query by model name
if not df.empty:
    model_name = df['model_name'].iloc[0] if 'model_name' in df.columns else None
    if model_name:
        model_blueprints = df[df['model_name'] == model_name]
        print(f"Found {len(model_blueprints)} blueprints for model '{model_name}':")
        print(model_blueprints[['grid_name', 'start_time', 'end_time']].to_string())
# Query by grid name
if not df.empty and 'grid_name' in df.columns:
    grid_name = df['grid_name'].iloc[0]
    grid_blueprints = df[df['grid_name'] == grid_name]
    print(f"\nFound {len(grid_blueprints)} blueprints for grid '{grid_name}':")
    print(grid_blueprints[['model_name', 'start_time', 'end_time']].to_string())
Found 3 blueprints for model 'cson_roms-marbl_v0.1':
         grid_name           start_time             end_time
0         ccs-12km  2024-01-01T00:00:00  2024-01-02T00:00:00
1  gulf-guinea-toy  2012-01-01T00:00:00  2012-01-02T00:00:00
2        test-tiny  2012-01-01T00:00:00  2012-01-02T00:00:00
Found 1 blueprints for grid 'ccs-12km':
             model_name           start_time             end_time
0  cson_roms-marbl_v0.1  2024-01-01T00:00:00  2024-01-02T00:00:00
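The start_time and end_time columns have dtype object, so date arithmetic requires a conversion first. As a minimal sketch, assuming the values are ISO 8601 strings as they appear in the output above, the simulated duration of each blueprint can be computed with the standard library. With a DataFrame, the equivalent conversion would typically go through pd.to_datetime.

```python
from datetime import datetime

# Grid names and times copied from the query output above.
rows = [
    ("ccs-12km", "2024-01-01T00:00:00", "2024-01-02T00:00:00"),
    ("gulf-guinea-toy", "2012-01-01T00:00:00", "2012-01-02T00:00:00"),
    ("test-tiny", "2012-01-01T00:00:00", "2012-01-02T00:00:00"),
]

# Parse the ISO strings and subtract to get a timedelta per grid.
durations = {
    grid: datetime.fromisoformat(end) - datetime.fromisoformat(start)
    for grid, start, end in rows
}
for grid, duration in durations.items():
    print(f"{grid}: {duration.total_seconds() / 3600:g} hours")
```

All three blueprints shown here cover a single day.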
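Instantiating OcnModel from DataFrame
The DataFrame contains all the data needed to instantiate OcnModel objects. The example below reads the relevant fields from one row; loading a builder directly from an existing blueprint file is still marked as a TODO in the code: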
# Example: inspect the blueprint data needed to create a CstarSpecBuilder
from cson_forge import CstarSpecBuilder

if not df.empty:
    row = df.iloc[0]
    # Note: To recreate a builder from a blueprint, you would need to either:
    # 1. Load the blueprint file using CstarSpecBuilder.from_blueprint() (if such a method exists), or
    # 2. Manually extract data from the DataFrame and create a new builder.
    # The blueprint_path column contains the path to the blueprint file.
    print(f"Blueprint path: {row['blueprint_path']}")
    print(f"Model: {row['model_name']}, Grid: {row['grid_name']}")
    print(f"Time range: {row['start_time']} to {row['end_time']}")
# TODO: Add method to CstarSpecBuilder to load from existing blueprint file
Blueprint path: /Users/mclong/codes/cson-forge/cson_forge/blueprints/cson_roms-marbl_v0.1_ccs-12km/B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml
Model: cson_roms-marbl_v0.1, Grid: ccs-12km
Time range: 2024-01-01T00:00:00 to 2024-01-02T00:00:00
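Until such a loader exists, several fields can be recovered from blueprint_path alone. The sketch below is a hypothetical helper, not part of cson_forge. It assumes the file naming B_<blueprint_name>_<stage>.yml seen in the listings above and that _grid.yml sits in the same directory as the blueprint. The relative path in the usage lines is illustrative.

```python
from pathlib import PurePosixPath

# Stage names taken from the overview.
STAGES = ("preconfig", "postconfig", "build", "run")


def describe_blueprint_path(blueprint_path):
    """Derive blueprint name, stage, and sibling _grid.yml from a path."""
    path = PurePosixPath(blueprint_path)
    stem = path.stem  # file name without the trailing ".yml"
    if not stem.startswith("B_"):
        raise ValueError(f"not a blueprint file: {path.name}")
    # Drop the "B_" prefix, then split off the stage at the last underscore.
    name, _, stage = stem[2:].rpartition("_")
    if stage not in STAGES:
        raise ValueError(f"unknown stage in file name: {path.name}")
    return {
        "blueprint_name": name,
        "stage": stage,
        "grid_yaml_path": path.parent / "_grid.yml",
    }


info = describe_blueprint_path(
    "blueprints/cson_roms-marbl_v0.1_ccs-12km/"
    "B_cson_roms-marbl_v0.1_ccs-12km_postconfig.yml"
)
print(info["blueprint_name"])  # cson_roms-marbl_v0.1_ccs-12km
print(info["stage"])           # postconfig
```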
Summary
The catalog.blueprint instance provides:
Discovery: Find all blueprint files in the blueprints directory (pattern: B_*.yml)
Loading: Load individual blueprints or all blueprints at once (with optional stage filtering)
Data Extraction: Extract model/grid names, dates, partitioning, and grid parameters
DataFrame Interface: Get all blueprint data in a pandas DataFrame for easy querying
Grid Parameters: Load grid keyword arguments from _grid.yml files when available
This makes it easy to:
Query existing blueprints by model, grid, or stage
Compare configurations across different domains
Access blueprint metadata (names, dates, partitioning, paths)
Build analysis workflows that work with multiple domains
Programmatically work with stored blueprint configurations