Variance method

Large variance in the magnetic field compared with neighboring intervals

References: (Liu et al. 2022)

Introduction

For each sampling instant \(t\), we define three intervals: the pre-interval \([-1,-1/2]\cdot T+t\), the middle interval \([-1/2,1/2]\cdot T+t\), and the post-interval \([1/2,1]\cdot T+t\), where \(T\) is the time lag. Let the magnetic field time series in these three intervals be labeled \({\mathbf B}_-\), \({\mathbf B}_0\), and \({\mathbf B}_+\), respectively. We then compute the following indices:

\[ I_1 = \frac{\sigma(B_0)}{\max(\sigma(B_-),\sigma(B_+))} \]

\[ I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)} \]

\[ I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|} \]

By selecting large but reasonable thresholds for the first two indices (\(I_1>2\), \(I_2>1\)), we can guarantee that the field changes of the identified IDs are large enough to be distinguished from stochastic fluctuations of the magnetic field. The third index (the relative field jump) is a supplementary condition that reduces the uncertainty of the identification.
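As a concrete illustration, the three indices can be evaluated on a synthetic current-sheet-like signal in which a quiet field reverses direction inside the middle interval. This sketch is not part of the package: it assumes \(\sigma\) is the Euclidean norm of the component-wise standard deviations, \(\Delta \vec{B}\) is the difference of the post/pre mean vectors, and \(B_{bg}\) is the mean field over the whole window.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 64  # samples per interval (hypothetical)

# Quiet field before and after, with a reversal of BX; small noise on top
B_pre = rng.normal([5.0, 0.0, 2.0], 0.1, size=(n, 3))
B_post = rng.normal([-5.0, 0.0, 2.0], 0.1, size=(n, 3))
B_mid = np.vstack([B_pre[: n // 2], B_post[: n // 2]])  # jump inside the middle interval

def sigma(B):
    # Norm of the component-wise standard deviations
    return np.linalg.norm(B.std(axis=0))

I1 = sigma(B_mid) / max(sigma(B_pre), sigma(B_post))
I2 = sigma(np.vstack([B_pre, B_post])) / (sigma(B_pre) + sigma(B_post))
dB = B_post.mean(axis=0) - B_pre.mean(axis=0)
B_bg = np.vstack([B_pre, B_mid, B_post]).mean(axis=0)
I3 = np.linalg.norm(dB) / np.linalg.norm(B_bg)

print(I1 > 2, I2 > 1, I3 > 0.1)  # all three conditions hold for this jump
```

Note that \(\sigma(B_- + B_+)\) in \(I_2\) denotes the standard deviation over the *concatenated* pre and post intervals, not an elementwise sum.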

Index of the standard deviation

\[ I_1 = \frac{\sigma(B_0)}{\max(\sigma(B_-),\sigma(B_+))} \]


source

compute_std

 compute_std (df:polars.lazyframe.frame.LazyFrame,
              period:datetime.timedelta, index_column='time',
              cols:list[str]=['BX', 'BY', 'BZ'],
              every:datetime.timedelta=None, result_column='std')
|  | Type | Default | Details |
|----|----|----|----|
| df | LazyFrame |  |  |
| period | timedelta |  | period to group by |
| index_column | str | time |  |
| cols | list | ['BX', 'BY', 'BZ'] |  |
| every | timedelta | None | every to group by (default: period / 2) |
| result_column | str | std |  |

source

add_neighbor_std

 add_neighbor_std (df:polars.lazyframe.frame.LazyFrame,
                   tau:datetime.timedelta, join_strategy='inner',
                   std_column='std', time_column='time')

Get the neighbor standard deviations


source

compute_index_std

 compute_index_std (df:polars.lazyframe.frame.LazyFrame, std_column='std')

Compute the standard deviation index based on the given DataFrame

|  | Type | Default | Details |
|----|----|----|----|
| df | LazyFrame |  |  |
| std_column | str | std |  |
| Returns | pl.LazyFrame |  | DataFrame with calculated 'index_std' column. |

Index of fluctuation

\[ I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)} \]


source

compute_index_fluctuation

 compute_index_fluctuation (df:polars.lazyframe.frame.LazyFrame,
                            std_column='std', clean=True)

source

compute_combinded_std

 compute_combinded_std (df:polars.lazyframe.frame.LazyFrame,
                        cols:list[str], every:datetime.timedelta,
                        period:datetime.timedelta=None,
                        index_column='time', result_column='std_combined')
|  | Type | Default | Details |
|----|----|----|----|
| df | LazyFrame |  |  |
| cols | list |  |  |
| every | timedelta |  | every to group by (default: period / 2) |
| period | timedelta | None | period to group by |
| index_column | str | time |  |
| result_column | str | std_combined |  |

Index of the relative field jump

\[ I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|} \]


source

pl_dvec

 pl_dvec (columns, *more_columns)

source

compute_index_diff

 compute_index_diff (df:polars.lazyframe.frame.LazyFrame,
                     every:datetime.timedelta, cols:list[str],
                     period:datetime.timedelta=None, clean=True)

source

compute_indices

 compute_indices (df:polars.lazyframe.frame.LazyFrame,
                  tau:datetime.timedelta, cols:list[str]=['BX', 'BY',
                  'BZ'], clean=True, join_strategy='inner')

Compute all indices based on the given DataFrame and tau value.

|  | Type | Default | Details |
|----|----|----|----|
| df | LazyFrame |  | Input DataFrame. |
| tau | timedelta |  | Time interval value. |
| cols | list | ['BX', 'BY', 'BZ'] |  |
| clean | bool | True |  |
| join_strategy | str | inner |  |
| Returns | LazyFrame |  | Tuple containing DataFrame results for fluctuation index, standard deviation index, and 'index_num'. |

Filtering


source

filter_indices

 filter_indices (df:polars.lazyframe.frame.LazyFrame,
                 index_std_threshold:float=2,
                 index_fluc_threshold:float=1,
                 index_diff_threshold:float=0.1, sparse_num:int=15)

Obsolete

Code

```python
import polars as pl

# Note: `pl_norm` (Euclidean norm across columns) is defined elsewhere in this package.

def _compute_combinded_std(df: pl.LazyFrame, tau, cols: list[str]):
    combined_std_cols = [col_name + "_combined_std" for col_name in cols]
    offsets = [0 * tau, tau / 2]
    combined_std_dfs = []

    for offset in offsets:
        # Bin samples into windows of length `tau`, shifted by `offset`
        truncated_df = df.select(
            (pl.col("time") - offset).dt.truncate(tau, offset=offset).alias("time"),
            pl.col(cols),
        )

        # Relabel each window forward by tau: it becomes the pre-interval of t + tau
        prev_df = truncated_df.select(
            (pl.col("time") + tau),
            pl.col(cols),
        )

        # Relabel each window backward by tau: it becomes the post-interval of t - tau
        next_df = truncated_df.select(
            (pl.col("time") - tau),
            pl.col(cols),
        )

        # Std over the concatenated neighboring windows: sigma(B_- + B_+)
        temp_combined_std_df = (
            pl.concat([prev_df, next_df])
            .group_by("time")
            .agg(pl.col(cols).std(ddof=0).name.suffix("_combined_std"))
            .with_columns(B_std_combined=pl_norm(combined_std_cols))
            .drop(combined_std_cols)
            .sort("time")
        )

        combined_std_dfs.append(temp_combined_std_df)

    combined_std_df = pl.concat(combined_std_dfs)
    return combined_std_df
```

References

Liu, Y. Y., H. S. Fu, J. B. Cao, Z. Wang, R. J. He, Z. Z. Guo, Y. Xu, and Y. Yu. 2022. “Magnetic Discontinuities in the Solar Wind and Magnetosheath: Magnetospheric Multiscale Mission (MMS) Observations.” Astrophysical Journal 930 (1): 63. https://doi.org/10.3847/1538-4357/ac62d2.