catalog
Classes:
- DataCatalog – DataCatalog class for finding and loading NNJA datasets.
DataCatalog
DataCatalog(mirror=DEFAULT_MIRROR, base_path=None, catalog_json=None)
DataCatalog class for finding and loading NNJA datasets.
The DataCatalog represents a collection of NNJADataset objects and provides basic search and listing functionality.
Attributes:
- base_path (str) – Base path for resolving relative URIs.
- catalog_uri (str) – Full URI to the catalog JSON file.
- catalog_metadata (dict) – Metadata of the catalog, loaded from the JSON file.
- datasets (dict) – Dictionary of dataset instances or subtypes.
Parameters:
- mirror (Optional[str], default: DEFAULT_MIRROR) – Name of a predefined mirror to use (e.g., 'gcp_nodd', 'aws_opendata').
- base_path (Optional[str], default: None) – Custom base path for resolving relative URIs. Cannot be used with mirror.
- catalog_json (Optional[str], default: None) – Custom catalog JSON path. Cannot be used with mirror.
Note
Dataset manifests are now loaded lazily on first access for better performance.
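The lazy-loading behavior described in the note can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `LazyDataset`, `fake_loader`, and the file names are hypothetical stand-ins, and the use of `functools.cached_property` is an assumption chosen only to show the "load once, on first access" pattern.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in for a dataset whose manifest is loaded on demand."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that performs the expensive read

    @cached_property
    def manifest(self):
        # Runs the loader once, on first access; later reads reuse the cached value.
        return self._loader()


calls = []


def fake_loader():
    # Hypothetical loader: records each call so the laziness is observable.
    calls.append("load")
    return ["part-0.parquet", "part-1.parquet"]


ds = LazyDataset("example", fake_loader)
calls_before_access = len(calls)  # nothing has been loaded yet
first = ds.manifest               # triggers the single load
second = ds.manifest              # served from the cache
calls_after_access = len(calls)
```

Under this pattern, constructing many dataset objects stays cheap, and the cost of reading a manifest is paid only for the datasets that are actually used.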
Raises:
- ValueError – If both mirror and custom parameters (base_path, catalog_json) are specified.
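The mutual-exclusion rule between mirror and the custom parameters could be checked along the following lines. This is a sketch under stated assumptions, not the library's code: `resolve_location`, the `MIRRORS` table, the mirror name, and the placeholder paths are all hypothetical. It also assumes that a caller opts out of the default mirror by passing `mirror=None`, which the documentation above does not confirm.

```python
DEFAULT_MIRROR = "example_mirror"  # hypothetical default, not a real mirror name

# Hypothetical mirror table: mirror name -> (base path, catalog JSON path).
MIRRORS = {"example_mirror": ("memory://example-base", "catalog.json")}


def resolve_location(mirror=DEFAULT_MIRROR, base_path=None, catalog_json=None):
    """Return (base_path, catalog_json), rejecting mixed configurations."""
    custom = base_path is not None or catalog_json is not None
    if mirror is not None and custom:
        # Assumption: mirror and custom paths are mutually exclusive.
        raise ValueError("Specify either mirror or base_path/catalog_json, not both.")
    if mirror is not None:
        if mirror not in MIRRORS:
            raise ValueError(f"Unknown mirror: {mirror!r}")
        return MIRRORS[mirror]
    return base_path, catalog_json


default_location = resolve_location()
custom_location = resolve_location(
    mirror=None, base_path="/data/nnja", catalog_json="my_catalog.json"
)
```

Calling `resolve_location(mirror="example_mirror", base_path="/data/nnja")` raises `ValueError` in this sketch, mirroring the documented behavior.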
Methods:
- __getitem__ – Fetch a specific dataset by name.
- info – Provide information about the catalog.
- list_datasets – List all dataset groups.
- search – Search datasets by name, tags, description, or variables.
__getitem__
__getitem__(dataset_name)
Fetch a specific dataset by name.
Parameters:
- dataset_name (str) – The name of the dataset to fetch.
Returns:
- NNJADataset (NNJADataset) – The dataset object.
info
info()
Provide information about the catalog.
list_datasets
list_datasets()
List all dataset groups.
search
search(query_term)
Search datasets by name, tags, description, or variables.
Parameters:
- query_term (str) – The term to search for.
Returns:
- list (list) – A list of NNJADataset objects matching the search term.
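To make the lookup and search semantics concrete, here is a toy catalog that imitates `__getitem__` and `search` on in-memory data. It is a sketch only: `ToyDataset`, `ToyCatalog`, and the dataset names are hypothetical, and the case-insensitive substring match is an assumption, since the documentation does not state how a term is matched.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for NNJADataset, holding only searchable fields."""
    name: str
    description: str = ""
    tags: list = field(default_factory=list)
    variables: list = field(default_factory=list)


class ToyCatalog:
    """Hypothetical stand-in for DataCatalog, backed by a plain dict."""

    def __init__(self, datasets):
        self.datasets = {d.name: d for d in datasets}

    def __getitem__(self, dataset_name):
        # Fetch a dataset by name; an unknown name raises KeyError in this sketch.
        return self.datasets[dataset_name]

    def search(self, query_term):
        # Assumption: case-insensitive substring match over all four fields.
        term = query_term.lower()
        hits = []
        for ds in self.datasets.values():
            haystack = [ds.name, ds.description, *ds.tags, *ds.variables]
            if any(term in text.lower() for text in haystack):
                hits.append(ds)
        return hits


catalog = ToyCatalog([
    ToyDataset("radiance_demo", "Microwave sounder radiances",
               ["satellite"], ["brightness_temperature"]),
    ToyDataset("surface_demo", "Surface station reports",
               ["conventional"], ["pressure"]),
])

matches = [d.name for d in catalog.search("TEMPERATURE")]
fetched = catalog["surface_demo"]
```

Here `matches` contains only `"radiance_demo"`, because the term appears in one of its variable names, while `fetched` is the dataset stored under `"surface_demo"`.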