Variance method

Large variance in the magnetic field compared with neighboring intervals

References: (liuMagneticDiscontinuitiesSolar2022?)

Introduction

For each sampling instant \(t\), we define three intervals: the pre-interval \([-1,-1/2]\cdot T+t\), the middle interval \([-1/2,1/2]\cdot T+t\), and the post-interval \([1/2,1]\cdot T+t\), where \(T\) is the time lag. Let the time series of the magnetic field data in these three intervals be labeled \({\mathbf B}_-\), \({\mathbf B}_0\), and \({\mathbf B}_+\), respectively. Compute the following indices:

\[ I_1 = \frac{\sigma(B_0)}{Max(\sigma(B_-),\sigma(B_+))} \]

\[ I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)} \]

\[ I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|} \]

By selecting large but reasonable thresholds for the first two indices (\(I_1>2, I_2>1\)), we can guarantee that the field changes of the identified IDs are large enough to be distinguished from stochastic fluctuations of the magnetic field. The third index (relative field jump) is a supplementary condition to reduce the uncertainty of recognition.
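The following is a minimal NumPy sketch of the first two indices, not the package implementation. It assumes that \(\sigma\) of a vector series is the norm of the per-component standard deviations and that \(B_- + B_+\) means the pooled samples of the two neighboring intervals. The names `vector_std` and `variance_indices` and the synthetic data are hypothetical.

```python
import numpy as np


def vector_std(b: np.ndarray) -> float:
    """Norm of the per-component standard deviations of an (n, 3) array (assumed definition)."""
    return float(np.linalg.norm(b.std(axis=0)))


def variance_indices(time: np.ndarray, b: np.ndarray, t: float, T: float) -> tuple[float, float]:
    """Return (I_1, I_2) at instant t for time lag T."""
    pre = b[(time >= t - T) & (time < t - T / 2)]
    mid = b[(time >= t - T / 2) & (time < t + T / 2)]
    post = b[(time >= t + T / 2) & (time <= t + T)]
    i1 = vector_std(mid) / max(vector_std(pre), vector_std(post))
    # Assumption: B_- + B_+ is the pooled set of samples from both neighbors
    i2 = vector_std(np.vstack([pre, post])) / (vector_std(pre) + vector_std(post))
    return i1, i2


# Synthetic field (arbitrary units): weak noise plus a step in BX at t = 120
rng = np.random.default_rng(0)
time = np.arange(0.0, 240.0, 1.0)
b = rng.normal(0.0, 0.1, size=(time.size, 3))
b[time >= 120.0, 0] += 5.0

i1_jump, i2_jump = variance_indices(time, b, t=120.0, T=40.0)
i1_quiet, i2_quiet = variance_indices(time, b, t=50.0, T=40.0)
print(i1_jump > 2 and i2_jump > 1)    # thresholds checked at the step
print(i1_quiet > 2 and i2_quiet > 1)  # thresholds checked in the quiet region
```

In this toy series both thresholds are exceeded only when the middle interval straddles the step.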

Index of the standard deviation

\[ I_1 = \frac{\sigma(B_0)}{Max(\sigma(B_-),\sigma(B_+))} \]


compute_std

 compute_std (df:polars.lazyframe.frame.LazyFrame,
              period:datetime.timedelta, index_column='time',
              cols:list[str]=['BX', 'BY', 'BZ'],
              every:datetime.timedelta=None, result_column='std')
| Parameter | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| period | timedelta | | period to group by |
| index_column | str | time | |
| cols | list | ['BX', 'BY', 'BZ'] | |
| every | timedelta | None | every to group by (default: period / 2) |
| result_column | str | std | |

add_neighbor_std

 add_neighbor_std (df:polars.lazyframe.frame.LazyFrame,
                   tau:datetime.timedelta, join_strategy='inner',
                   std_column='std', time_column='time')

Get the neighbor standard deviations


compute_index_std

 compute_index_std (df:polars.lazyframe.frame.LazyFrame, std_column='std')

Compute the standard deviation index based on the given DataFrame

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| std_column | str | std | |
| Returns | pl.LazyFrame | | DataFrame with calculated 'index_std' column. |

Index of fluctuation

\[ I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)} \]
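
As a rough guide to the threshold \(I_2 > 1\), consider a single field component under simplifying assumptions that are not from the source: the pre- and post-intervals contain equal numbers of samples, each has the same standard deviation \(\sigma\), their means differ by \(d\), and \(\sigma(B_- + B_+)\) is the standard deviation of the pooled samples of both intervals. Then

\[ \sigma(B_- + B_+) = \sqrt{\sigma^2 + \frac{d^2}{4}}, \qquad I_2 = \frac{\sqrt{\sigma^2 + d^2/4}}{2\sigma} \]

\[ I_2 > 1 \iff d > 2\sqrt{3}\,\sigma \approx 3.46\,\sigma \]

With no jump (\(d = 0\)) the index equals \(1/2\), so under these assumptions the threshold selects jumps larger than about 3.5 times the background fluctuation level.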


compute_index_fluctuation

 compute_index_fluctuation (df:polars.lazyframe.frame.LazyFrame,
                            std_column='std', clean=True)

compute_combinded_std

 compute_combinded_std (df:polars.lazyframe.frame.LazyFrame,
                        cols:list[str], every:datetime.timedelta,
                        period:datetime.timedelta=None,
                        index_column='time', result_column='std_combined')
| Parameter | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| cols | list | | |
| every | timedelta | | every to group by (default: period / 2) |
| period | timedelta | None | period to group by |
| index_column | str | time | |
| result_column | str | std_combined | |

Index of the relative field jump

\[ I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|} \]
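
The text above does not spell out \(\Delta \vec{B}\) and \(B_{bg}\), so the sketch below makes two labeled assumptions: \(\Delta \vec{B}\) is the difference between the last and first field vectors of the middle interval, and \(B_{bg}\) is the mean field vector over that interval. The function name `relative_jump` is hypothetical.

```python
import numpy as np


def relative_jump(b_mid: np.ndarray) -> float:
    """I_3 for the samples of the middle interval, an (n, 3) array ordered in time."""
    delta_b = b_mid[-1] - b_mid[0]  # assumed: change across the middle interval
    b_bg = b_mid.mean(axis=0)       # assumed: mean field as the background
    return float(np.linalg.norm(delta_b) / np.linalg.norm(b_bg))


# Field changing from (5, 0, 0) to (5, 3, 0) halfway through the interval
b_mid = np.array([[5.0, 0.0, 0.0]] * 5 + [[5.0, 3.0, 0.0]] * 5)
i3 = relative_jump(b_mid)
print(i3 > 0.1)  # compare with the default index_diff_threshold of filter_indices
```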


pl_dvec

 pl_dvec (columns, *more_columns)

compute_index_diff

 compute_index_diff (df:polars.lazyframe.frame.LazyFrame,
                     every:datetime.timedelta, cols:list[str],
                     period:datetime.timedelta=None, clean=True)

compute_indices

 compute_indices (df:polars.lazyframe.frame.LazyFrame,
                  tau:datetime.timedelta, cols:list[str], clean=True,
                  join_strategy='inner', on='time')

Compute all indices based on the given DataFrame and tau value.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | Input DataFrame. |
| tau | timedelta | | Time interval value. |
| cols | list | | List of column names. |
| clean | bool | True | |
| join_strategy | str | inner | |
| on | str | time | |
| Returns | LazyFrame | | Tuple containing DataFrame results for fluctuation index, standard deviation index, and 'index_num'. |

Filtering


filter_indices

 filter_indices (df:polars.lazyframe.frame.LazyFrame,
                 index_std_threshold:float=2,
                 index_fluc_threshold:float=1,
                 index_diff_threshold:float=0.1, sparse_num:int=15)

detect_variance

 detect_variance (data:polars.lazyframe.frame.LazyFrame,
                  tau:datetime.timedelta, bcols,
                  ts:datetime.timedelta=None, sparse_num=None)

Obsolete

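import polars as pl

# NOTE: pl_norm is a helper from this package (vector norm of the given columns);
# it is not defined in this snippet.


def _compute_combinded_std(df: pl.LazyFrame, tau, cols: list[str]):
    """Standard deviation of the pooled previous and next windows, sigma(B_- + B_+)."""
    combined_std_cols = [col_name + "_combined_std" for col_name in cols]
    # Two window grids, shifted by tau / 2 relative to each other
    offsets = [0 * tau, tau / 2]
    combined_std_dfs = []

    for offset in offsets:
        # Label every sample with the start time of its tau-long window.
        # Subtracting the offset, truncating, and adding it back replaces the
        # deprecated `offset` argument of `dt.truncate`.
        truncated_df = df.select(
            ((pl.col("time") - offset).dt.truncate(tau) + offset).alias("time"),
            pl.col(cols),
        )

        # Shift labels by +tau: these samples act as the previous window
        prev_df = truncated_df.select(
            (pl.col("time") + tau),
            pl.col(cols),
        )

        # Shift labels by -tau: these samples act as the next window
        next_df = truncated_df.select(
            (pl.col("time") - tau),
            pl.col(cols),
        )

        # Pool both neighbors per window and take the population std per component
        temp_combined_std_df = (
            pl.concat([prev_df, next_df])
            .group_by("time")
            .agg(pl.col(cols).std(ddof=0).name.suffix("_combined_std"))
            .with_columns(B_std_combined=pl_norm(combined_std_cols))
            .drop(combined_std_cols)
            .sort("time")
        )

        combined_std_dfs.append(temp_combined_std_df)

    combined_std_df = pl.concat(combined_std_dfs)
    return combined_std_df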