def test_expand_collections():
    assert expand_collections([1, 2], 3, 4) == [1, 2, 3, 4]
    assert expand_collections([1, 2], [3, 4]) == [1, 2, 3, 4]
    assert expand_collections([1, 2], "3", 4) == [1, 2, "3", 4]

Utils for Polars
IO
convert_to_pd_dataframe
convert_to_pd_dataframe (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame)
Convert a Polars DataFrame or LazyFrame into a pandas-like DataFrame.
| | Type | Details |
|---|---|---|
| df | polars.dataframe.frame.DataFrame \| polars.lazyframe.frame.LazyFrame | original DataFrame or LazyFrame |
Functions
sort
sort (df:polars.dataframe.frame.DataFrame, col='time')
pl_norm
pl_norm (*columns:str|polars.expr.expr.Expr)
*Computes the square root of the sum of squares for the given columns.
Args: columns (str | pl.Expr): Names of the columns, or expressions.
Returns: pl.Expr: Expression representing the square root of the sum of squares.*
decompose_vector
decompose_vector (df:polars.dataframe.frame.DataFrame, vector_col, name=None, suffixes:list=['_x', '_y', '_z'])
*Decompose a vector column in a DataFrame into separate columns for each component with custom suffixes.
Parameters:
- df (pl.DataFrame): The input DataFrame.
- vector_col (str): The name of the vector column to decompose.
- name (str, optional): Base name for the decomposed columns. If None, uses vector_col as the base name.
- suffixes (list, optional): A list of suffixes to use for the decomposed columns. If None or not enough suffixes are provided, defaults to '_0', '_1', etc.
Returns:
- pl.DataFrame: A DataFrame with the original vector column decomposed into separate columns.*
format_time
format_time (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame, time_unit='ns')
Fast filter for a list of predicates
Use a list of filters within polars - Stack Overflow
filter_df_by_ranges
filter_df_by_ranges (data:polars.dataframe.frame.DataFrame, starts:list, stops:list, col='time')
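Keep the rows of a DataFrame whose col value falls within any of the given [start, stop] ranges (both bounds inclusive, as the tests below check).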
filter_series_by_ranges_i
filter_series_by_ranges_i (data:polars.series.series.Series, starts:list, stops:list)
def sample_data(n=10):
    return pl.DataFrame(
        {
            "time": pl.arange(n, eager=True),
        }
    )


def test_filter_df_by_intervals(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 5], [3, 7])
    assert len(filtered_data) == 6
    assert filtered_data["time"].min() == 1
    assert filtered_data["time"].max() == 7


def test_filter_df_by_intervals_no_match(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [100, 200], [300, 400])
    assert len(filtered_data) == 0


def test_filter_df_by_intervals_edge_case(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 1], [1, 1])
    assert len(filtered_data) == 1

_sample_data = sample_data()
test_filter_df_by_intervals(_sample_data)
test_filter_df_by_intervals_no_match(_sample_data)
test_filter_df_by_intervals_edge_case(_sample_data)

n = 1000000
data = sample_data(n)
starts = list(range(0, n - 200, 100))
stops = list(range(100, n - 100, 100))

CPU times: user 38.1 ms, sys: 8.89 ms, total: 47 ms
Wall time: 45.6 ms
| time |
|---|
| i64 |
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| … |
| 999796 |
| 999797 |
| 999798 |
| 999799 |
| 999800 |