unesco_reader
A package for reading data from the UNESCO Institute for Statistics API, with additional functionality and convenience, including error handling, filtering, and basic pandas support.
Usage:
Import the package:
>>> import unesco_reader as uis
Get data for an indicator and geo unit:
>>> df = uis.get_data("CR.1", "ZWE")
Get data with additional fields like indicator and geo unit names, and footnotes:
>>> df = uis.get_data("CR.1", "ZWE", footnotes=True, labels=True)
Get metadata for an indicator:
>>> metadata = uis.get_metadata("CR.1")
Get metadata with disaggregations and glossary terms:
>>> metadata = uis.get_metadata("CR.1", disaggregations=True, glossaryTerms=True)
Get available indicators:
>>> indicators = uis.available_indicators()
Get available indicators for a specific theme and with data starting at least in 2010:
>>> indicators = uis.available_indicators(theme="education", minStart=2010)
Get available geo units:
>>> geo_units = uis.available_geo_units()
Get available regional geo units:
>>> geo_units = uis.available_geo_units(geoUnitType="REGIONAL")
Get available themes:
>>> themes = uis.available_themes()
Get available data versions:
>>> versions = uis.available_versions()
Get the default data version:
>>> default_version = uis.default_version()
A basic thin wrapper around all the API endpoints is available in the api module. This module does not provide any additional functionality and mirrors the API endpoints directly.
Additional information:
- API definition endpoints (indicators, geo units, versions) are cached in memory for the session. Use clear_cache() to invalidate the cache.
- Transient network errors are retried once by default. Use set_max_retries() to configure this.
- Field names are not modified; they are returned exactly as the API provides them.
- The API currently has no rate limits other than a 100,000-row response limit. This package does not implement multithreading or async functionality to work around this limit, because the intended usage of the API is to make smaller requests for specific data points.
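Because of the 100,000-row response limit, a long list of geo units is usually better requested in several smaller calls. The following is a minimal sketch of that idea, not part of unesco_reader: `chunked` and `fetch_in_batches` are hypothetical helper names, and `fetch` stands for any callable that takes an indicator and a list of geo unit codes and returns a list of records.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(
    fetch: Callable[[str, List[str]], List[dict]],
    indicator: str,
    geo_units: Sequence[str],
    batch_size: int = 50,  # hypothetical default, tune to the data volume
) -> List[dict]:
    """Call `fetch` once per batch of geo units and concatenate the records."""
    records: List[dict] = []
    for batch in chunked(geo_units, batch_size):
        records.extend(fetch(indicator, batch))
    return records
```

With the package installed, `fetch` could be something like `lambda ind, geo: uis.get_data(ind, geo, raw=True)`, so that each call returns a list of dictionaries that can be concatenated.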
Functions
- clear_cache(): Clear all caches used by the package.
- set_max_retries(): Set the number of retries for transient network errors.
- get_data(): Get UIS data
- get_metadata(): Get the metadata for indicators
- available_indicators(): Get available indicators
- available_geo_units(): Get available geo units
- available_versions(): Get available data versions and basic information including publication date and description
- default_version(): Get the default data version
- available_themes(): Get the available themes and basic information including latest update and description
Package Contents
- unesco_reader.clear_cache() -> None
Clear all caches used by the package.
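In-memory, per-session caching with an explicit invalidation hook is commonly built on `functools.lru_cache`. The sketch below shows that general pattern only. Whether unesco_reader uses `lru_cache` internally is an assumption, and `load_definitions` and `clear_demo_cache` are hypothetical names that do not exist in the package.

```python
from functools import lru_cache

call_count = 0  # counts how often the pretend "network" is actually hit


@lru_cache(maxsize=None)
def load_definitions(endpoint: str) -> tuple:
    """Pretend to fetch a definition list; results are cached per endpoint."""
    global call_count
    call_count += 1
    return (endpoint, "payload")


def clear_demo_cache() -> None:
    """Drop cached results so the next call fetches again."""
    load_definitions.cache_clear()


load_definitions("versions")
load_definitions("versions")  # served from the cache, no second fetch
clear_demo_cache()
load_definitions("versions")  # fetched again after invalidation
```

After these three calls the underlying fetch has run twice: once before and once after the cache was cleared.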
- unesco_reader.set_max_retries(retries: int) -> None
Set the number of retries for transient network errors.
Controls how many times a request is retried on timeouts, connection errors, and 502/503/504 responses before raising an exception. Set to 0 to disable retries.
- Parameters:
retries – Number of retries. Must be a non-negative integer. Default is 1.
- Raises:
ValueError – If retries is negative.
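To make the retry count concrete: with `retries=1`, a failing request is attempted at most twice (the initial call plus one retry), and with `retries=0` it is attempted once. The sketch below illustrates that counting with a generic wrapper. It is not the package's implementation, and `call_with_retries` and `TransientError` are hypothetical names used only for illustration.

```python
class TransientError(Exception):
    """Stand-in for a timeout, connection error, or 502/503/504 response."""


def call_with_retries(func, retries: int = 1):
    """Call `func`, retrying up to `retries` times on TransientError."""
    if retries < 0:
        raise ValueError("retries must be a non-negative integer")
    attempts = retries + 1  # the initial attempt plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == attempts:
                raise  # retries exhausted: surface the last error
```

A production version would typically also wait between attempts; the delay is left out here to keep the counting logic visible.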
- unesco_reader.get_data(indicator: str | list[str] | None = None, geoUnit: str | list[str] | None = None, start: int | None = None, end: int | None = None, labels: bool = False, geoUnitType: unesco_reader.config.GeoUnitType | None = None, footnotes: bool = False, *, raw: bool = False, version: str | None = None) -> pandas.DataFrame | list[dict]
Get UIS data
Query the UIS API for data based on the given parameters. At least one indicator or one geo unit must be provided. If only indicators are provided, data for all geographies is returned, and vice versa. To see available indicators or geographies, use the available_indicators or available_geo_units functions respectively. If both geoUnit and geoUnitType are provided, geoUnitType is ignored.
- Parameters:
indicator – The indicator code or name to request data for. Default is None, which returns data for all indicators. To see all available indicators, use the available_indicators function.
geoUnit – The geo unit code or name to request data for. Default is None, which returns data for all geo units. To see all available geo units, use the available_geo_units function.
start – The start year to request data for. Includes the year itself. Default is None, which returns the earliest available year.
end – The end year to request data for. Includes the year itself. Default is None, which returns the latest available year.
labels – If True, adds indicator and geo unit labels to the data. Default is False.
geoUnitType – The type of geography to request data for. Allowed values are NATIONAL and REGIONAL. If geoUnit is provided, this parameter is ignored. By default, both national and regional data are returned.
footnotes – If True, includes footnotes in the response. Default is False.
raw – If True, returns the data as a list of dictionaries in the original format from the API. Default is False.
version – The data version to use. If None, the current default version is used.
- Returns:
A pandas DataFrame with the data or a list of dictionaries if raw=True.
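The argument rules described for get_data can be summarized as a small pre-flight check. This is a sketch of the documented rules only, not the package's code: `build_query` is a hypothetical helper, the returned dictionary keys simply mirror the parameter names above, and the check on an inverted year range is an added assumption.

```python
from typing import Optional, Sequence, Union

StrOrList = Union[str, Sequence[str], None]


def _as_list(value: StrOrList) -> Optional[list]:
    """Normalize a single string or a sequence of strings to a list."""
    if value is None:
        return None
    return [value] if isinstance(value, str) else list(value)


def build_query(
    indicator: StrOrList = None,
    geoUnit: StrOrList = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    geoUnitType: Optional[str] = None,
) -> dict:
    """Apply the documented get_data argument rules and return the query."""
    indicators, geo_units = _as_list(indicator), _as_list(geoUnit)
    if not indicators and not geo_units:
        raise ValueError("Provide at least one indicator or one geoUnit")
    if start is not None and end is not None and start > end:
        # Assumption: an inverted year range is treated as a caller error.
        raise ValueError("start must not be later than end")
    if geo_units:
        geoUnitType = None  # documented: ignored when geoUnit is given
    query = {"indicator": indicators, "geoUnit": geo_units, "start": start,
             "end": end, "geoUnitType": geoUnitType}
    # Drop unset parameters so only explicit filters remain.
    return {key: value for key, value in query.items() if value is not None}
```

For example, `build_query("CR.1", "ZWE", geoUnitType="NATIONAL")` keeps the indicator and geo unit but drops the geo unit type, mirroring the rule that geoUnitType is ignored when geoUnit is given.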
- unesco_reader.get_metadata(indicator: str | list[str] | None = None, disaggregations: bool = False, glossaryTerms: bool = False, *, version: str | None = None) -> list[dict]
Get the metadata for indicators
Get the metadata for the given indicators. If no indicator is provided, metadata for all indicators is returned. Optionally include disaggregations and glossary terms in the response.
- Parameters:
indicator – The indicator code or name to get metadata for. Default is None, which returns metadata for all indicators. To see all available indicators, use the available_indicators function.
disaggregations – Include disaggregations in the response. Default is False.
glossaryTerms – Include glossary terms in the response. Default is False.
version – The data version to use. If None, the current default version is used.
- Returns:
A list of dictionaries with the metadata for the indicators.
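Since the result is a plain list of dictionaries, looking up one indicator among many is easier after indexing the list by a key. The sketch below shows one way to do that. `index_by` is a hypothetical helper, the records are placeholders rather than real API output, and the key name "indicatorCode" is an assumption that should be checked against an actual response.

```python
def index_by(records: list, key: str) -> dict:
    """Build a {record[key]: record} lookup, skipping records without key."""
    return {record[key]: record for record in records if key in record}


# Placeholder records, not real API output; "indicatorCode" is an assumed key.
sample = [
    {"indicatorCode": "CR.1", "name": "placeholder name A"},
    {"indicatorCode": "CR.2", "name": "placeholder name B"},
]
by_code = index_by(sample, "indicatorCode")
print(by_code["CR.1"]["name"])  # prints "placeholder name A"
```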
- unesco_reader.available_indicators(theme: str | list[str] | None = None, minStart: int | None = None, geoUnitType: unesco_reader.config.GeoUnitType | Literal['ALL'] | None = None, *, raw: bool = False, version: str | None = None) -> pandas.DataFrame | list[dict]
Get available indicators
This function returns the available indicators from the UIS API with some basic information, including theme, time range, last data update, and total records. The data is filtered based on the given parameters.
- Parameters:
theme – Filter indicators for specific themes. Can be a single theme or a list of themes. Default returns all themes. Use the available_themes function to see all available themes.
minStart – The earliest start year for the indicator data. Includes the start year itself. Default is None, which returns all available data.
geoUnitType – The type of geography for which data is available. Allowed values are "NATIONAL" (country-level data), "REGIONAL" (regional-level data), and "ALL" (both national and regional data). Default is None, which applies no filter and returns indicators for any available type.
raw – If True, returns the data as a list of dictionaries in the original format from the API. Default is False.
version – The data version to use. If None, the current default version is used.
- Returns:
A pandas DataFrame with the available indicators or a list of dictionaries if raw=True.
- unesco_reader.available_geo_units(geoUnitType: unesco_reader.config.GeoUnitType | None = None, *, raw: bool = False, version: str | None = None) -> pandas.DataFrame | list[dict]
Get available geo units
Get all available geo units for a given API data version (or the current default version if no explicit version is provided), along with some basic information like the region group and type of geography.
- Parameters:
geoUnitType – The type of geography to request data for. Allowed values are NATIONAL and REGIONAL. Default is None, which returns all available types.
raw – If True, returns the data as a list of dictionaries in the original format from the API. Default is False.
version – The data version to use. If None, the current default version is used.
- Returns:
A pandas DataFrame with the available geo units or a list of dictionaries if raw=True.
- unesco_reader.available_versions(*, raw: bool = False) -> pandas.DataFrame | list[dict]
Get available data versions and basic information including publication date and description
- Parameters:
raw – If True, returns the data as a list of dictionaries in the original format from the API. Default is False.
- Returns:
A pandas DataFrame with the available versions or a list of dictionaries if raw=True.
- unesco_reader.default_version() -> str
Get the default data version
- Returns:
The default data version string
- unesco_reader.available_themes(*, raw: bool = False) -> pandas.DataFrame | dict
Get the available themes and basic information including latest update and description
- Parameters:
raw – If True, returns the data as a dictionary in the original format from the API. Default is False.