def test_expand_collections():
    assert expand_collections([1, 2], 3, 4) == [1, 2, 3, 4]
    assert expand_collections([1, 2], [3, 4]) == [1, 2, 3, 4]
    assert expand_collections([1, 2], "3", 4) == [1, 2, "3", 4]
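The test above pins down the expected behaviour: list arguments are flattened by one level, while scalars and strings are kept whole. A minimal sketch that satisfies those assertions could look like the following. It is not necessarily the library's implementation, and treating tuples like lists is an assumption made here for illustration.

```python
def expand_collections(*args):
    """Flatten one level of list/tuple arguments into a single list.

    Sketch only: strings and other scalars are appended unchanged.
    Handling tuples the same way as lists is an assumption.
    """
    result = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            # Collections contribute their items one by one
            result.extend(arg)
        else:
            # Scalars (including strings) are kept as single items
            result.append(arg)
    return result
```

Checking `isinstance` against `list` and `tuple` rather than any iterable is what keeps the string `"3"` from being split into characters.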
Utils for Polars
IO
convert_to_pd_dataframe
convert_to_pd_dataframe (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame)
Convert a Polars DataFrame or LazyFrame into a pandas-like DataFrame.
Parameter | Type | Details |
---|---|---|
df | polars.dataframe.frame.DataFrame \| polars.lazyframe.frame.LazyFrame | original DataFrame or LazyFrame |
Functions
sort
sort (df:polars.dataframe.frame.DataFrame, col='time')
pl_norm
pl_norm (*columns:str|polars.expr.expr.Expr)
Computes the square root of the sum of squares for the given columns.
Args: columns (str | pl.Expr): Names of the columns, or expressions.
Returns: pl.Expr: Expression representing the square root of the sum of squares.
decompose_vector
decompose_vector (df:polars.dataframe.frame.DataFrame, vector_col, name=None, suffixes:list=['_x', '_y', '_z'])
Decompose a vector column in a DataFrame into separate columns for each component with custom suffixes.
Parameters:
- df (pl.DataFrame): The input DataFrame.
- vector_col (str): The name of the vector column to decompose.
- name (str, optional): Base name for the decomposed columns. If None, uses vector_col as the base name.
- suffixes (list, optional): A list of suffixes to use for the decomposed columns. If None or not enough suffixes are provided, defaults to '_0', '_1', etc.
Returns:
- pl.DataFrame: A DataFrame with the original vector column decomposed into separate columns.
format_time
format_time (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame, time_unit='ns')
Fast filter for a list of predicates
Use a list of filters within polars - Stack Overflow
filter_df_by_ranges
filter_df_by_ranges (data:polars.dataframe.frame.DataFrame, starts:list, stops:list, col='time')
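Filter a DataFrame to the rows whose col value falls within any of the given [start, stop] ranges. Both ends are inclusive, as the tests below show.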
filter_series_by_ranges_i
filter_series_by_ranges_i (data:polars.series.series.Series, starts:list, stops:list)
import polars as pl

def sample_data(n=10):
    # Single integer column "time" with values 0..n-1
    return pl.DataFrame(
        {
            "time": pl.arange(n, eager=True),
        }
    )
def test_filter_df_by_intervals(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 5], [3, 7])
    assert len(filtered_data) == 6
    assert filtered_data["time"].min() == 1
    assert filtered_data["time"].max() == 7

def test_filter_df_by_intervals_no_match(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [100, 200], [300, 400])
    assert len(filtered_data) == 0

def test_filter_df_by_intervals_edge_case(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 1], [1, 1])
    assert len(filtered_data) == 1
_sample_data = sample_data()
test_filter_df_by_intervals(_sample_data)
test_filter_df_by_intervals_no_match(_sample_data)
test_filter_df_by_intervals_edge_case(_sample_data)
n = 1000000
data = sample_data(n)
starts = list(range(0, n - 200, 100))
stops = list(range(100, n - 100, 100))
CPU times: user 38.1 ms, sys: 8.89 ms, total: 47 ms
Wall time: 45.6 ms
time |
---|
i64 |
0 |
1 |
2 |
3 |
4 |
… |
999796 |
999797 |
999798 |
999799 |
999800 |