Utils for Polars

IO

convert_to_pd_dataframe

 convert_to_pd_dataframe (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame)

Convert a Polars DataFrame or LazyFrame into a pandas-like DataFrame.

	Type	Details
df	polars.dataframe.frame.DataFrame | polars.lazyframe.frame.LazyFrame	original DataFrame or LazyFrame

Functions

sort

 sort (df:polars.dataframe.frame.DataFrame, col='time')

def test_expand_collections():
    assert expand_collections([1, 2], 3, 4) == [1, 2, 3, 4]
    assert expand_collections([1, 2], [3, 4]) == [1, 2, 3, 4]
    assert expand_collections([1, 2], "3", 4) == [1, 2, "3", 4]
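The test above specifies expand_collections even though its source does not appear here. List arguments are spliced into the result, while scalars and strings are kept as single items. One implementation that satisfies those assertions is sketched below. It assumes only lists and tuples count as collections, and expand_collections_sketch is a hypothetical name.

```python
def expand_collections_sketch(*args):
    """Flatten one level of list/tuple arguments; other values pass through."""
    out = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            out.extend(arg)  # splice the collection's items in, in order
        else:
            out.append(arg)  # scalars and strings are kept whole
    return out
```

Strings are deliberately excluded from the collection check. Otherwise "3" would be iterated character by character, which the third assertion rules out.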

pl_norm

 pl_norm (*columns:str|polars.expr.expr.Expr)

Computes the square root of the sum of squares of the given columns, row by row.

Args: columns (str | pl.Expr): Column names or expressions.

Returns: pl.Expr: Expression representing the square root of the sum of squares.

decompose_vector

 decompose_vector (df:polars.dataframe.frame.DataFrame, vector_col,
                   name=None, suffixes:list=['_x', '_y', '_z'])

Decompose a vector column in a DataFrame into separate columns, one per component, using custom suffixes.

Parameters:
- df (pl.DataFrame): The input DataFrame.
- vector_col (str): The name of the vector column to decompose.
- name (str, optional): Base name for the decomposed columns. If None, vector_col is used as the base name.
- suffixes (list, optional): Suffixes for the decomposed columns. If None, or if fewer suffixes than components are provided, they default to '_0', '_1', etc.

Returns:
- pl.DataFrame: A DataFrame with the original vector column decomposed into separate columns.

format_time

 format_time (df:polars.dataframe.frame.DataFrame|polars.lazyframe.frame.LazyFrame, time_unit='ns')

Fast filter for a list of predicates

Use a list of filters within polars - Stack Overflow

filter_df_by_ranges

 filter_df_by_ranges (data:polars.dataframe.frame.DataFrame, starts:list,
                      stops:list, col='time')

Filter a DataFrame, keeping the rows whose col value lies in at least one of the inclusive ranges [starts[i], stops[i]].

filter_series_by_ranges_i

 filter_series_by_ranges_i (data:polars.series.series.Series, starts:list,
                            stops:list)

def sample_data(n=10):
    return pl.DataFrame(
        {
            "time": pl.arange(n, eager=True),
        }
    )


def test_filter_df_by_intervals(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 5], [3, 7])
    assert len(filtered_data) == 6
    assert filtered_data["time"].min() == 1
    assert filtered_data["time"].max() == 7


def test_filter_df_by_intervals_no_match(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [100, 200], [300, 400])
    assert len(filtered_data) == 0


def test_filter_df_by_intervals_edge_case(sample_data):
    filtered_data = filter_df_by_ranges(sample_data, [1, 1], [1, 1])
    assert len(filtered_data) == 1

_sample_data = sample_data()
test_filter_df_by_intervals(_sample_data)
test_filter_df_by_intervals_no_match(_sample_data)
test_filter_df_by_intervals_edge_case(_sample_data)

n = 1000000
data = sample_data(n)

starts = list(range(0, n - 200, 100))
stops = list(range(100, n - 100, 100))

CPU times: user 38.1 ms, sys: 8.89 ms, total: 47 ms
Wall time: 45.6 ms

shape: (999_801, 1)
┌────────┐
│ time   │
│ ---    │
│ i64    │
╞════════╡
│ 0      │
│ 1      │
│ 2      │
│ 3      │
│ 4      │
│ …      │
│ 999796 │
│ 999797 │
│ 999798 │
│ 999799 │
│ 999800 │
└────────┘