Feature Values - Latest
FeatureValuesLatest(perfdb)
Class used for handling Latest Feature Values. Can be accessed via perfdb.features.values.latest.
Parameters:
- perfdb (PerfDB) – Top level object carrying all functionality and the connection handler.
Source code in echo_postgres/perfdb_root.py
def __init__(self, perfdb: e_pg.PerfDB) -> None:
    """Base class that all subclasses should inherit from.

    Parameters
    ----------
    perfdb : PerfDB
        Top level object carrying all functionality and the connection handler.

    """
    self._perfdb: e_pg.PerfDB = perfdb
get(features, output_type='DataFrame')
Get the latest available feature values for the given features.
For better performance, this does not get the data from table feature_values directly. It uses the table feature_values_latest that only contains the latest feature values per object and feature, which are updated via a trigger on the feature_values table.
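The idea of a trigger-maintained "latest" table can be pictured with a small in-memory model. This is only a sketch: the real tables live in PostgreSQL, the trigger's exact rule is not shown in this page, and `apply_insert`, the object name `WTG-01` and the feature name `active_power` are hypothetical. It assumes the trigger keeps one row per (object, feature) pair and only overwrites it with a sample that is at least as recent.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for feature_values_latest.
LatestKey = tuple[str, str]  # (object_name, feature_name)
LatestRow = tuple[datetime, float]  # (timestamp, value)


def apply_insert(
    latest: dict[LatestKey, LatestRow],
    object_name: str,
    feature_name: str,
    timestamp: datetime,
    value: float,
) -> None:
    """Mimic what a trigger on feature_values might do for each inserted row."""
    key = (object_name, feature_name)
    current = latest.get(key)
    # Assumed rule: keep the new row only if it is at least as recent as the stored one.
    if current is None or timestamp >= current[0]:
        latest[key] = (timestamp, value)


latest: dict[LatestKey, LatestRow] = {}
apply_insert(latest, "WTG-01", "active_power", datetime(2024, 1, 1, 0, 10), 1500.0)
apply_insert(latest, "WTG-01", "active_power", datetime(2024, 1, 1, 0, 20), 1620.0)
# A late-arriving older sample does not overwrite the newer one.
apply_insert(latest, "WTG-01", "active_power", datetime(2024, 1, 1, 0, 0), 900.0)
```

Reading from such a structure costs one lookup per (object, feature) pair instead of a scan over the full history, which is the performance benefit the paragraph above describes.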
The columns returned are:
- object_id: The ID of the object.
- object_name: The name of the object (part of the index in case of DataFrame output).
- object_model_id: The ID of the object model.
- object_model_name: The name of the object model.
- object_type_id: The ID of the object type.
- object_type_name: The name of the object type.
- feature_id: The ID of the feature.
- feature_name: The name of the feature (part of the index in case of DataFrame output).
- timestamp: The timestamp of the latest feature value.
- value: The latest feature value.
- unit: The unit of the feature value.
Parameters:
- features (dict[str, list[str]]) – Dictionary in the format {object_name: [feature1, feature2, ...], ...}.
- output_type (Literal['DataFrame', 'pl.DataFrame'], default: 'DataFrame') – The type of the output DataFrame. Can be either "DataFrame" (pandas DataFrame) or "pl.DataFrame" (polars DataFrame).
Returns:
- DataFrame – If output_type == "DataFrame", a pandas DataFrame with the latest feature values. The index is a MultiIndex with object_name and feature_name as levels.
- pl.DataFrame – If output_type == "pl.DataFrame", a Polars DataFrame with the latest feature values.
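As an illustration of the expected `features` argument, the sketch below builds the mapping for several objects that share the same feature list. `build_features` is a hypothetical helper, not part of echo_postgres, and the object and feature names are made up. The commented calls at the end assume an already connected `perfdb` instance, which is not created here.

```python
def build_features(object_names: list[str], feature_names: list[str]) -> dict[str, list[str]]:
    """Request the same features for every object (hypothetical helper)."""
    # Copy the list per object so later edits to one entry do not affect the others.
    return {name: list(feature_names) for name in object_names}


features = build_features(["WTG-01", "WTG-02"], ["active_power", "wind_speed"])

# With a connected PerfDB instance (assumed, not created here), the calls would look like:
# df = perfdb.features.values.latest.get(features)
# df.loc[("WTG-01", "active_power"), "value"]  # lookup through the (object_name, feature_name) index
# pl_df = perfdb.features.values.latest.get(features, output_type="pl.DataFrame")
```

In the pandas output, `object_name` and `feature_name` move into the index, so they are addressed through `.loc` rather than as columns; in the Polars output they remain ordinary columns.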
Source code in echo_postgres/feature_values_latest.py
@validate_call
def get(
    self,
    features: dict[str, list[str]],
    output_type: Literal["DataFrame", "pl.DataFrame"] = "DataFrame",
) -> pd.DataFrame | pl.DataFrame:
    """Get the latest available feature values for the given features.

    For better performance, this does not get the data from table feature_values directly. It uses the table feature_values_latest that only contains the latest feature values per object and feature, which are updated via a trigger on the feature_values table.

    The columns returned are:
    - object_id: The ID of the object.
    - object_name: The name of the object (part of the index in case of DataFrame output).
    - object_model_id: The ID of the object model.
    - object_model_name: The name of the object model.
    - object_type_id: The ID of the object type.
    - object_type_name: The name of the object type.
    - feature_id: The ID of the feature.
    - feature_name: The name of the feature (part of the index in case of DataFrame output).
    - timestamp: The timestamp of the latest feature value.
    - value: The latest feature value.
    - unit: The unit of the feature value.

    Parameters
    ----------
    features : dict[str, list[str]]
        Dictionary in the format {object_name: [feature1, feature2, ...], ...}.
    output_type : Literal["DataFrame", "pl.DataFrame"], optional
        The type of the output DataFrame. Can be either "DataFrame" (pandas DataFrame) or "pl.DataFrame" (polars DataFrame).

    Returns
    -------
    pd.DataFrame
        In case output_type=="DataFrame", a Pandas DataFrame with the latest feature values. The index is a multiindex with the object_name and feature_name as levels.
    pl.DataFrame
        In case output_type=="pl.DataFrame", a Polars DataFrame with the latest feature values.

    """
    # one predicate per object: match the object name and any of its requested features
    obj_wheres = []
    for object_name, feature_names in features.items():
        obj_wheres.append(
            sql.SQL("(fv.object_name = {object_name} AND fv.feature_name = ANY({feature_names}))").format(
                object_name=sql.Literal(object_name),
                feature_names=sql.Literal(feature_names),
            ),
        )
    # a row is returned if it matches the predicate of any requested object
    where_clauses = sql.SQL(" OR ").join(obj_wheres)

    query = sql.SQL(
        """SELECT
            fv.object_id,
            fv.object_name,
            fv.object_model_id,
            fv.object_model_name,
            fv.object_type_id,
            fv.object_type_name,
            fv.feature_id,
            fv.feature_name,
            fv."timestamp"::TIMESTAMP,
            fv.value,
            fv.unit
        FROM performance.v_feature_values_latest fv
        WHERE ({where_clauses})
        ORDER BY fv.object_name, fv.feature_name
        """,
    ).format(
        where_clauses=where_clauses,
    )

    # executing the query
    with self._perfdb.conn.reconnect() as conn:
        # we are using polars for faster processing
        df = conn.read_to_polars(
            query=query,
            schema_overrides={
                "object_id": pl.Int64,
                "object_name": pl.Utf8,
                "object_model_id": pl.Int64,
                "object_model_name": pl.Utf8,
                "object_type_id": pl.Int64,
                "object_type_name": pl.Utf8,
                "feature_id": pl.Int64,
                "feature_name": pl.Utf8,
                "timestamp": pl.Datetime("ms"),
                "value": pl.Float64,
                "unit": pl.Utf8,
            },
        )

    if output_type == "pl.DataFrame":
        return df

    df = df.to_pandas()
    # setting MultiIndex
    df = df.set_index(["object_name", "feature_name"])
    return df
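The shape of the WHERE clause built in the listing above can be reproduced with the standard library alone. The sketch below is not the library's code: it swaps psycopg's `sql.Literal` composition for `%s` placeholders plus a parameter list, and `build_where` is a hypothetical name. It also guards against an empty mapping, on the assumption that joining zero predicates would leave an empty `WHERE ()` in the query text.

```python
def build_where(features: dict[str, list[str]]) -> tuple[str, list[object]]:
    """Return a WHERE predicate with %s placeholders and its parameters."""
    if not features:
        # An empty mapping would yield "WHERE ()", which is not valid SQL.
        raise ValueError("features must contain at least one object")
    clauses: list[str] = []
    params: list[object] = []
    for object_name, feature_names in features.items():
        # One predicate per object: its name plus any of its requested features.
        clauses.append("(fv.object_name = %s AND fv.feature_name = ANY(%s))")
        params.extend([object_name, list(feature_names)])
    # A row qualifies if it matches the predicate of any requested object.
    return " OR ".join(clauses), params


where, params = build_where({"WTG-01": ["active_power"], "WTG-02": ["wind_speed", "active_power"]})
```

Each object contributes one parenthesised predicate, so different objects can request different feature lists within a single query.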