catalog

Classes:

  • DataCatalog

    DataCatalog class for finding and loading NNJA datasets.

DataCatalog

DataCatalog(
    mirror=DEFAULT_MIRROR, base_path=None, catalog_json=None
)

DataCatalog class for finding and loading NNJA datasets.

The DataCatalog represents a collection of NNJADataset objects, and provides some basic search/list functionality.

Attributes:

  • base_path (str) –

    Base path for resolving relative URIs.

  • catalog_uri (str) –

    Full URI to the catalog JSON file.

  • catalog_metadata (dict) –

    Metadata of the catalog, loaded from the JSON file.

  • datasets (dict) –

    Dictionary of NNJADataset instances (or instances of its subtypes).

Parameters:

  • mirror (Optional[str], default: DEFAULT_MIRROR ) –

    Name of predefined mirror to use (e.g., 'gcp_nodd', 'aws_opendata').

  • base_path (Optional[str], default: None ) –

    Custom base path for resolving relative URIs. Cannot be used with mirror.

  • catalog_json (Optional[str], default: None ) –

    Custom catalog JSON path. Cannot be used with mirror.

Note

Dataset manifests are loaded lazily, on first access, for better performance.
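
The lazy-loading behavior described in this note can be illustrated with a generic pattern. The sketch below is not the library's implementation; `LazyDataset`, `fake_loader`, and the file names are hypothetical stand-ins, and the real manifest format is not shown on this page.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in for a dataset whose manifest loads on demand."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that fetches the manifest
        self.load_count = 0    # counts how often the loader actually runs

    @cached_property
    def manifest(self):
        # Runs only on first access; the result is then cached on the instance.
        self.load_count += 1
        return self._loader(self.name)


def fake_loader(name):
    # Placeholder for reading a manifest from remote or local storage.
    return [f"{name}/part-0.parquet", f"{name}/part-1.parquet"]


ds = LazyDataset("example-dataset", fake_loader)
count_before = ds.load_count   # nothing has been loaded at construction time
first = ds.manifest            # first access triggers the load
second = ds.manifest           # later accesses reuse the cached value
count_after = ds.load_count
```

Under this pattern, constructing many dataset objects stays cheap, and the loading cost is paid only for the datasets that are actually used.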

Raises:

  • ValueError

    If mirror is specified together with a custom parameter (base_path or catalog_json).
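
The mutual-exclusion rule above could be checked along the following lines. This is a sketch under stated assumptions, not the library's code: `resolve_source` is a hypothetical helper, the `DEFAULT_MIRROR` value is a placeholder, `/data/nnja` is an invented path, and the sketch assumes that only a non-default mirror counts as "specified" (this page does not say how the library makes that distinction).

```python
# Placeholder value; the library defines its own DEFAULT_MIRROR.
DEFAULT_MIRROR = "default-mirror"


def resolve_source(mirror=DEFAULT_MIRROR, base_path=None, catalog_json=None):
    """Hypothetical helper: choose between a mirror and custom locations."""
    has_custom = base_path is not None or catalog_json is not None
    # Assumption: only a non-default mirror counts as explicitly specified.
    if has_custom and mirror != DEFAULT_MIRROR:
        raise ValueError(
            "mirror cannot be combined with base_path or catalog_json"
        )
    if has_custom:
        return {"kind": "custom", "base_path": base_path, "catalog_json": catalog_json}
    return {"kind": "mirror", "mirror": mirror}


print(resolve_source())                          # falls back to the default mirror
print(resolve_source(base_path="/data/nnja"))    # custom location only
try:
    resolve_source(mirror="gcp_nodd", base_path="/data/nnja")
except ValueError as err:
    print(f"rejected: {err}")
```

Callers who may pass conflicting arguments (for example, from a config file) can wrap construction in a `try`/`except ValueError` in the same way.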

Methods:

  • __getitem__

    Fetch a specific dataset by name.

  • info

    Provide information about the catalog.

  • list_datasets

    List all dataset groups.

  • search

    Search datasets by name, tags, description, or variables.

__getitem__

__getitem__(dataset_name)

Fetch a specific dataset by name.

Parameters:

  • dataset_name (str) –

    The name of the dataset to fetch.

Returns:

  • NNJADataset –

    The dataset with the given name.
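
Because the method is `__getitem__`, a dataset is fetched with subscript syntax rather than a named method call. The sketch below shows the mechanism on a hypothetical `MiniCatalog` stand-in; the dataset name is a placeholder, and the `KeyError` on a missing name is a property of this sketch, not a documented behavior of DataCatalog.

```python
class MiniCatalog:
    """Hypothetical stand-in exposing only dictionary-style lookup."""

    def __init__(self, datasets):
        # Maps dataset name -> dataset object (plain dicts here for brevity).
        self.datasets = dict(datasets)

    def __getitem__(self, dataset_name):
        # Delegates to the underlying dict, so a missing name raises KeyError.
        return self.datasets[dataset_name]


catalog = MiniCatalog({"example-dataset": {"description": "placeholder entry"}})

# Subscripting is equivalent to catalog.__getitem__("example-dataset").
entry = catalog["example-dataset"]
```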

info

info()

Provide information about the catalog.

list_datasets

list_datasets()

List all dataset groups.

search

search(query_term)

Search datasets by name, tags, description, or variables.

Parameters:

  • query_term (str) –

    The term to search for.

Returns:

  • list –

    A list of NNJADataset objects matching the search term.
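
To make the search semantics concrete, here is a minimal sketch of matching one term against the four documented fields. It assumes case-insensitive substring matching, which this page does not confirm; `MiniDataset`, the standalone `search` function, and the sample entries are hypothetical stand-ins rather than the library's types or data.

```python
from dataclasses import dataclass, field


@dataclass
class MiniDataset:
    """Hypothetical stand-in for NNJADataset with only the searched fields."""
    name: str
    description: str = ""
    tags: list = field(default_factory=list)
    variables: list = field(default_factory=list)


def search(datasets, query_term):
    """Return datasets whose name, tags, description, or variables contain the term."""
    term = query_term.lower()
    matches = []
    for ds in datasets.values():
        # Collect every searchable string for this dataset.
        haystack = [ds.name, ds.description, *ds.tags, *ds.variables]
        # Assumption: case-insensitive substring match on any field.
        if any(term in text.lower() for text in haystack):
            matches.append(ds)
    return matches


datasets = {
    "sat-a": MiniDataset("sat-a", "Satellite radiances", ["satellite"], ["brightness_temp"]),
    "sonde-b": MiniDataset("sonde-b", "Balloon soundings", ["conventional"], ["temperature"]),
}

hits = search(datasets, "temp")       # matches a variable name in both entries
names = [ds.name for ds in hits]
```

A term that matches nothing yields an empty list in this sketch, so callers can test the result for truthiness before indexing into it.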