References: Liu et al. (2022), Magnetic Discontinuities in the Solar Wind
Introduction
For each sampling instant \(t\), we define three intervals: the pre-interval \([-1,-1/2]\cdot T+t\), the middle interval \([-1/2,1/2]\cdot T+t\), and the post-interval \([1/2,1]\cdot T+t\), where \(T\) is the time lag. Let the magnetic field time series in these three intervals be denoted \({\mathbf B}_-\), \({\mathbf B}_0\), and \({\mathbf B}_+\), respectively. We then compute the following indices:
\[
I_1 = \frac{\sigma(B_0)}{\max(\sigma(B_-),\sigma(B_+))}
\]
\[
I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)}
\]
\[
I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|}
\]
By selecting large, reasonable thresholds for the first two indices (\(I_1>2\), \(I_2>1\)), we ensure that the field changes of the identified IDs are large enough to be distinguished from stochastic fluctuations in the magnetic field. The third index (the relative field jump, with default threshold \(I_3>0.1\)) is a supplementary condition that reduces the uncertainty of the identification.
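As a concrete illustration, here is a minimal NumPy sketch (not the package implementation) that evaluates the three indices at a single sampling instant \(t\) for a synthetic one-component field with a step at \(t\). The lag \(T\), the step amplitude, and the background-field estimate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T, t = 60, 500  # time lag and sampling instant, in samples (assumed values)

# Synthetic field: Gaussian noise plus a step at t (a crude "discontinuity")
B = rng.normal(0.0, 0.1, size=1000)
B[t:] += 1.0

B_pre = B[t - T : t - T // 2]       # [-1, -1/2] * T + t
B_mid = B[t - T // 2 : t + T // 2]  # [-1/2, 1/2] * T + t
B_post = B[t + T // 2 : t + T]      # [1/2, 1] * T + t

I1 = B_mid.std() / max(B_pre.std(), B_post.std())
# sigma(B_- + B_+) is taken here as the std of the two intervals combined
I2 = np.concatenate([B_pre, B_post]).std() / (B_pre.std() + B_post.std())
# Background field approximated by the mean over the whole window (assumption)
I3 = abs(B_post.mean() - B_pre.mean()) / abs(B.mean())
print(f"I1={I1:.2f}, I2={I2:.2f}, I3={I3:.2f}")  # all three exceed the thresholds
```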
Index of the standard deviation
\[
I_1 = \frac{\sigma(B_0)}{\max(\sigma(B_-),\sigma(B_+))}
\]
compute_std

    compute_std (df:polars.lazyframe.frame.LazyFrame,
                 period:datetime.timedelta, index_column='time',
                 cols:list[str]=['BX', 'BY', 'BZ'],
                 every:datetime.timedelta=None, result_column='std')

| | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| period | timedelta | | period to group by |
| index_column | str | time | |
| cols | list | ['BX', 'BY', 'BZ'] | |
| every | timedelta | None | every to group by (default: period / 2) |
| result_column | str | std | |
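A usage sketch for compute_std on synthetic data follows. The import path is a placeholder (the module name is not given on this page), and the one-hour, 1 s cadence series is invented for illustration.

```python
from datetime import datetime, timedelta

import numpy as np
import polars as pl

# from your_package import compute_std  # hypothetical import path

n = 3600
rng = np.random.default_rng(0)
ldf = pl.LazyFrame({
    "time": pl.datetime_range(
        datetime(2022, 1, 1), datetime(2022, 1, 1, 1), "1s",
        closed="left", eager=True,
    ),
    "BX": rng.normal(size=n),
    "BY": rng.normal(size=n),
    "BZ": rng.normal(size=n),
})

# Component-wise standard deviations over 60 s windows,
# sampled every 30 s (the default every = period / 2)
std_df = compute_std(ldf, period=timedelta(seconds=60))
print(std_df.collect().head())  # assuming a LazyFrame is returned
```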
add_neighbor_std

    add_neighbor_std (df:polars.lazyframe.frame.LazyFrame,
                      tau:datetime.timedelta, join_strategy='inner',
                      std_column='std', time_column='time')

Get the neighbor standard deviations, i.e. \(\sigma(B_-)\) and \(\sigma(B_+)\).
compute_index_std

    compute_index_std (df:polars.lazyframe.frame.LazyFrame, std_column='std')

Compute the standard deviation index based on the given DataFrame.

| | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| std_column | str | std | |
| Returns | LazyFrame | | DataFrame with calculated 'index_std' column. |
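Putting the three functions together, the \(I_1\) pipeline presumably looks like the following sketch (import path and tau are assumptions; ldf is the LazyFrame from the previous example).

```python
from datetime import timedelta

# from your_package import compute_std, add_neighbor_std, compute_index_std  # hypothetical path

tau = timedelta(seconds=60)

index_std = (
    ldf                               # LazyFrame with 'time', 'BX', 'BY', 'BZ'
    .pipe(compute_std, period=tau)    # sigma(B_0) per window
    .pipe(add_neighbor_std, tau=tau)  # attach sigma(B_-) and sigma(B_+)
    .pipe(compute_index_std)          # I1 = sigma(B_0) / max(sigma(B_-), sigma(B_+))
)
```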
Index of fluctuation
\[
I_2 = \frac{\sigma(B_- + B_+)} {\sigma(B_-) + \sigma(B_+)}
\]
compute_index_fluctuation

    compute_index_fluctuation (df:polars.lazyframe.frame.LazyFrame,
                               std_column='std', clean=True)
compute_combinded_std

    compute_combinded_std (df:polars.lazyframe.frame.LazyFrame,
                           cols:list[str], every:datetime.timedelta,
                           period:datetime.timedelta=None,
                           index_column='time', result_column='std_combined')

| | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | |
| cols | list | | |
| every | timedelta | | every to group by (default: period / 2) |
| period | timedelta | None | period to group by |
| index_column | str | time | |
| result_column | str | std_combined | |
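A sketch of the \(I_2\) building blocks, under the assumption that compute_index_fluctuation consumes the neighbor standard deviations joined with the combined standard deviation (the exact columns it expects are not spelled out on this page):

```python
combined = compute_combinded_std(
    ldf, cols=["BX", "BY", "BZ"], every=tau / 2, period=tau
)
stds = add_neighbor_std(compute_std(ldf, period=tau), tau=tau)

# I2 = sigma(B_- + B_+) / (sigma(B_-) + sigma(B_+))
index_fluc = compute_index_fluctuation(stds.join(combined, on="time"))
```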
Index of the relative field jump
\[
I_3 = \frac{| \Delta \vec{B} |}{|B_{bg}|}
\]
pl_dvec

    pl_dvec (columns, *more_columns)

compute_index_diff

    compute_index_diff (df:polars.lazyframe.frame.LazyFrame,
                        every:datetime.timedelta, cols:list[str],
                        period:datetime.timedelta=None, clean=True)
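A usage sketch for the \(I_3\) computation (parameter values are illustrative; pl_dvec is presumably the helper that builds the per-component difference expressions used internally):

```python
# I3 = |dB| / |B_bg| over tau-wide windows sampled every tau / 2
index_diff = compute_index_diff(
    ldf, every=tau / 2, cols=["BX", "BY", "BZ"], period=tau
)
```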
compute_indices

    compute_indices (df:polars.lazyframe.frame.LazyFrame,
                     tau:datetime.timedelta, cols:list[str], clean=True,
                     join_strategy='inner', on='time')

Compute all indices based on the given DataFrame and tau value.

| | Type | Default | Details |
|---|---|---|---|
| df | LazyFrame | | Input DataFrame. |
| tau | timedelta | | Time interval value. |
| cols | list | | List of column names. |
| clean | bool | True | |
| join_strategy | str | inner | |
| on | str | time | |
| Returns | LazyFrame | | LazyFrame containing the results for the fluctuation index, the standard deviation index, and 'index_num'. |
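In practice one would presumably call compute_indices directly rather than the individual steps; a sketch:

```python
indices = compute_indices(ldf, tau=timedelta(seconds=60), cols=["BX", "BY", "BZ"])
print(indices.collect().head())
```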
Filtering
filter_indices

    filter_indices (df:polars.lazyframe.frame.LazyFrame,
                    index_std_threshold:float=2,
                    index_fluc_threshold:float=1,
                    index_diff_threshold:float=0.1, sparse_num:int=15)
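Chaining with compute_indices, the defaults match the thresholds quoted in the introduction (\(I_1>2\), \(I_2>1\), \(I_3>0.1\)); a sketch:

```python
events = filter_indices(
    indices,
    index_std_threshold=2,
    index_fluc_threshold=1,
    index_diff_threshold=0.1,
    sparse_num=15,  # presumably a minimum sample count per window (assumption)
)
```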
detect_variance

    detect_variance (data:polars.lazyframe.frame.LazyFrame,
                     tau:datetime.timedelta, bcols,
                     ts:datetime.timedelta=None, sparse_num=None)
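detect_variance appears to wrap the whole pipeline (compute the indices, then filter); usage inferred from its signature:

```python
candidates = detect_variance(
    ldf, tau=timedelta(seconds=60), bcols=["BX", "BY", "BZ"]
)
```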
Obsolete
    def _compute_combinded_std(df: pl.LazyFrame, tau, cols: list[str]):
        combined_std_cols = [col_name + "_combined_std" for col_name in cols]
        # Two offsets (0 and tau/2) yield windows sampled every tau/2
        offsets = [0 * tau, tau / 2]
        combined_std_dfs = []
        for offset in offsets:
            # Assign each sample to a tau-wide window shifted by `offset`
            truncated_df = df.select(
                (pl.col("time") - offset).dt.truncate(tau, offset=offset).alias("time"),
                pl.col(cols),
            )
            # Shift the window labels so that, at a given time t, `prev_df` holds
            # the preceding interval's samples and `next_df` the following interval's
            prev_df = truncated_df.select(
                (pl.col("time") + tau),
                pl.col(cols),
            )
            next_df = truncated_df.select(
                (pl.col("time") - tau),
                pl.col(cols),
            )
            # Std over the union of the two neighboring intervals: sigma(B_- + B_+)
            temp_combined_std_df = (
                pl.concat([prev_df, next_df])
                .group_by("time")
                .agg(pl.col(cols).std(ddof=0).name.suffix("_combined_std"))
                .with_columns(B_std_combined=pl_norm(combined_std_cols))
                .drop(combined_std_cols)
                .sort("time")
            )
            combined_std_dfs.append(temp_combined_std_df)

        combined_std_df = pl.concat(combined_std_dfs)
        return combined_std_df