Feature Values - Latest

FeatureValuesLatest(perfdb)

Class used for handling Latest Feature Values. Can be accessed via perfdb.features.values.latest.

Parameters:

  • perfdb

    (PerfDB) –

    Top level object carrying all functionality and the connection handler.

Source code in echo_postgres/perfdb_root.py
def __init__(self, perfdb: e_pg.PerfDB) -> None:
    """Base class that all subclasses should inherit from.

    Parameters
    ----------
    perfdb : PerfDB
        Top level object carrying all functionality and the connection handler.

    """
    self._perfdb: e_pg.PerfDB = perfdb

get(features, output_type='DataFrame')

Get the latest available feature values for the given features.

For better performance, this does not read from the feature_values table directly. It uses the feature_values_latest table, which holds only the latest value per object and feature and is kept up to date by a trigger on the feature_values table.
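The trigger-maintained "latest" table can be illustrated with a small, self-contained sketch. This is an analogy in SQLite, not the actual PostgreSQL schema or trigger, neither of which is shown on this page: the column set is reduced, and the trigger name, the object name `WTG-01` and the feature names are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema (assumption): the real tables live in
# PostgreSQL and carry more columns than shown here.
SCHEMA = """
CREATE TABLE feature_values (
    object_name  TEXT NOT NULL,
    feature_name TEXT NOT NULL,
    timestamp    TEXT NOT NULL,   -- ISO 8601 strings sort chronologically
    value        REAL
);

CREATE TABLE feature_values_latest (
    object_name  TEXT NOT NULL,
    feature_name TEXT NOT NULL,
    timestamp    TEXT NOT NULL,
    value        REAL,
    PRIMARY KEY (object_name, feature_name)
);

-- Refresh the "latest" row unless a newer one is already stored.
CREATE TRIGGER feature_values_track_latest
AFTER INSERT ON feature_values
WHEN NOT EXISTS (
    SELECT 1 FROM feature_values_latest AS l
    WHERE l.object_name = NEW.object_name
      AND l.feature_name = NEW.feature_name
      AND l.timestamp > NEW.timestamp
)
BEGIN
    INSERT OR REPLACE INTO feature_values_latest
        (object_name, feature_name, timestamp, value)
    VALUES (NEW.object_name, NEW.feature_name, NEW.timestamp, NEW.value);
END;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and insert a few made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ("WTG-01", "ActivePower", "2024-01-01T00:00:00", 1500.0),
        ("WTG-01", "ActivePower", "2024-01-01T00:10:00", 1620.0),
        ("WTG-01", "ActivePower", "2024-01-01T00:05:00", 1580.0),  # late arrival
        ("WTG-01", "WindSpeed", "2024-01-01T00:10:00", 9.4),
    ]
    conn.executemany("INSERT INTO feature_values VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def latest_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Read the small "latest" table instead of scanning the full history."""
    return conn.execute(
        "SELECT object_name, feature_name, timestamp, value "
        "FROM feature_values_latest ORDER BY object_name, feature_name"
    ).fetchall()
```

The read path only touches one row per object and feature, however long the history in feature_values grows. That is the performance benefit described above.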

The columns returned are:

  • object_id: The ID of the object.
  • object_name: The name of the object (part of the index in case of DataFrame output).
  • object_model_id: The ID of the object model.
  • object_model_name: The name of the object model.
  • object_type_id: The ID of the object type.
  • object_type_name: The name of the object type.
  • feature_id: The ID of the feature.
  • feature_name: The name of the feature (part of the index in case of DataFrame output).
  • timestamp: The timestamp of the latest feature value.
  • value: The latest feature value.
  • unit: The unit of the feature value.

Parameters:

  • features

    (dict[str, list[str]]) –

    Dictionary in the format {object_name: [feature1, feature2, ...], ...}.

  • output_type

    (Literal['DataFrame', 'pl.DataFrame'], default: 'DataFrame' ) –

    The type of the output DataFrame. Can be either "DataFrame" (pandas DataFrame) or "pl.DataFrame" (polars DataFrame).
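When the wanted object/feature combinations are available as a flat list of pairs, they need to be grouped into the dictionary shape expected by the features parameter. A minimal sketch follows. The helper name `build_features_arg` is hypothetical and not part of echo_postgres, and the object and feature names in the usage note are made up.

```python
from collections import defaultdict
from collections.abc import Iterable


def build_features_arg(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (object_name, feature_name) pairs into {object_name: [feature, ...]}."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for object_name, feature_name in pairs:
        # Skip duplicates while keeping the first-seen order of features.
        if feature_name not in grouped[object_name]:
            grouped[object_name].append(feature_name)
    return dict(grouped)
```

For example, `build_features_arg([("WTG-01", "ActivePower"), ("WTG-01", "WindSpeed")])` produces a dictionary that could then be passed as `features=` to get().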

Returns:

  • pd.DataFrame

    In case output_type=="DataFrame", a pandas DataFrame with the latest feature values. The index is a MultiIndex with object_name and feature_name as levels.

  • pl.DataFrame

    In case output_type=="pl.DataFrame", a polars DataFrame with the latest feature values.

Source code in echo_postgres/feature_values_latest.py
@validate_call
def get(
    self,
    features: dict[str, list[str]],
    output_type: Literal["DataFrame", "pl.DataFrame"] = "DataFrame",
) -> pd.DataFrame | pl.DataFrame:
    """Get the latest available feature values for the given features.

    For better performance, this does not read from the feature_values table directly. It uses the feature_values_latest table, which holds only the latest value per object and feature and is kept up to date by a trigger on the feature_values table.

    The columns returned are:
    - object_id: The ID of the object.
    - object_name: The name of the object (part of the index in case of DataFrame output).
    - object_model_id: The ID of the object model.
    - object_model_name: The name of the object model.
    - object_type_id: The ID of the object type.
    - object_type_name: The name of the object type.
    - feature_id: The ID of the feature.
    - feature_name: The name of the feature (part of the index in case of DataFrame output).
    - timestamp: The timestamp of the latest feature value.
    - value: The latest feature value.
    - unit: The unit of the feature value.

    Parameters
    ----------
    features : dict[str, list[str]]
        Dictionary in the format {object_name: [feature1, feature2, ...], ...}.
    output_type : Literal["DataFrame", "pl.DataFrame"], optional
        The type of the output DataFrame. Can be either "DataFrame" (pandas DataFrame) or "pl.DataFrame" (polars DataFrame). By default "DataFrame".

    Returns
    -------
    pd.DataFrame
        In case output_type=="DataFrame", a pandas DataFrame with the latest feature values. The index is a MultiIndex with object_name and feature_name as levels.
    pl.DataFrame
        In case output_type=="pl.DataFrame", a Polars DataFrame with the latest feature values.
    """
    obj_wheres = []
    for object_name, feature_names in features.items():
        obj_wheres.append(
            sql.SQL("(fv.object_name = {object_name} AND fv.feature_name = ANY({feature_names}))").format(
                object_name=sql.Literal(object_name),
                feature_names=sql.Literal(feature_names),
            ),
        )

    where_clauses = sql.SQL(" OR ").join(obj_wheres)

    query = sql.SQL(
        """SELECT
            fv.object_id,
            fv.object_name,
            fv.object_model_id,
            fv.object_model_name,
            fv.object_type_id,
            fv.object_type_name,
            fv.feature_id,
            fv.feature_name,
            fv."timestamp"::TIMESTAMP,
            fv.value,
            fv.unit
        FROM performance.v_feature_values_latest fv
        WHERE ({where_clauses})
        ORDER BY fv.object_name, fv.feature_name
    """,
    ).format(
        where_clauses=where_clauses,
    )

    # executing the query
    with self._perfdb.conn.reconnect() as conn:
        # we are using polars for faster processing
        df = conn.read_to_polars(
            query=query,
            schema_overrides={
                "object_id": pl.Int64,
                "object_name": pl.Utf8,
                "object_model_id": pl.Int64,
                "object_model_name": pl.Utf8,
                "object_type_id": pl.Int64,
                "object_type_name": pl.Utf8,
                "feature_id": pl.Int64,
                "feature_name": pl.Utf8,
                "timestamp": pl.Datetime("ms"),
                "value": pl.Float64,
                "unit": pl.Utf8,
            },
        )

    if output_type == "pl.DataFrame":
        return df

    df = df.to_pandas()

    # setting MultiIndex
    df = df.set_index(["object_name", "feature_name"])

    return df
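The shape of the filter built in the loop above (one parenthesized condition per object, joined with OR) can be shown with a plain-Python sketch. It is illustrative only: get() composes the clause with sql.SQL and sql.Literal, whereas this hypothetical `build_where` helper returns a string with %s placeholders plus a parameter list. The guard against an empty dictionary is an assumption added here, because joining zero conditions would leave an empty `WHERE ()`, which PostgreSQL would presumably reject.

```python
def build_where(features: dict[str, list[str]]) -> tuple[str, list[object]]:
    """Return a WHERE fragment with %s placeholders and its parameter list."""
    if not features:
        # Assumption: fail early rather than emit an empty "WHERE ()".
        raise ValueError("features must contain at least one object")
    clauses: list[str] = []
    params: list[object] = []
    for object_name, feature_names in features.items():
        # One condition per object: exact object match, any of its features.
        clauses.append("(fv.object_name = %s AND fv.feature_name = ANY(%s))")
        params.extend([object_name, list(feature_names)])
    return " OR ".join(clauses), params
```

Grouping by object keeps the clause short: each object contributes a single condition no matter how many features are requested for it.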