Code
# from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
# from tsflex.features.integrations import catch22_wrapper
# from pycatch22 import catch22_all
It can be divided into two parts:
Note that the candidates require only a small portion of the data, so we can compress the data to speed up processing.
compress_data_by_events (data:polars.dataframe.frame.DataFrame, events:polars.dataframe.frame.DataFrame)
Compress the data for parallel processing
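The idea of keeping only the samples that fall inside event windows can be sketched in plain Python. This is a minimal illustration, not the project's `compress_data_by_events` (which operates on polars DataFrames); the function name `compress_by_events` and the `(t_start, t_stop)` event layout are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def compress_by_events(times, values, events):
    """Keep only samples whose timestamp lies inside at least one event window.

    times  : sorted list of datetimes
    values : list of samples aligned with `times`
    events : list of (t_start, t_stop) tuples (hypothetical layout)
    """
    keep = set()
    for t_start, t_stop in events:
        lo = bisect_left(times, t_start)   # first index with time >= t_start
        hi = bisect_right(times, t_stop)   # first index with time > t_stop
        keep.update(range(lo, hi))         # overlapping windows are merged by the set
    idx = sorted(keep)
    return [times[i] for i in idx], [values[i] for i in idx]


t0 = datetime(2020, 1, 1)
times = [t0 + timedelta(seconds=i) for i in range(100)]
values = list(range(100))
events = [
    (t0 + timedelta(seconds=10), t0 + timedelta(seconds=14)),
    (t0 + timedelta(seconds=12), t0 + timedelta(seconds=16)),  # overlaps the first event
    (t0 + timedelta(seconds=50), t0 + timedelta(seconds=52)),
]
c_times, c_values = compress_by_events(times, values, events)
print(len(values), "->", len(c_values))
```

Collecting indices in a set means a sample covered by several overlapping events is kept only once, which is what makes the compressed data smaller than a naive per-event copy.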
ids_finder (detection_df:polars.lazyframe.frame.LazyFrame, tau:datetime.timedelta, ts:datetime.timedelta, bcols=None, extract_df:polars.lazyframe.frame.LazyFrame=None, **kwargs)
| | Type | Default | Details |
|---|---|---|---|
| detection_df | LazyFrame | | data used for anomaly detection (typically low cadence data) |
| tau | timedelta | | |
| ts | timedelta | | |
| bcols | NoneType | None | |
| extract_df | LazyFrame | None | data used for feature extraction (typically high cadence data) |
| kwargs | | | |
wrapper function for partitioned input used in Kedro
extract_features (partitioned_input:dict[str,typing.Callable[...,polars.lazyframe.frame.LazyFrame]], tau:float, ts:float, **kwargs)
wrapper function for partitioned input
| | Type | Details |
|---|---|---|
| partitioned_input | dict | |
| tau | float | in seconds, yaml input |
| ts | float | in seconds, yaml input |
| kwargs | | |
| Returns | DataFrame | |
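As a rough sketch of what such a wrapper has to do: the YAML parameters arrive as plain floats in seconds and need converting to `timedelta`, and each entry of the partitioned input is a callable that loads its partition only when called. The names `extract_features_sketch`, `finder`, and `dummy_finder` below are hypothetical, and plain lists stand in for LazyFrames; this is not the project's implementation.

```python
from datetime import timedelta
from typing import Callable


def extract_features_sketch(
    partitioned_input: dict[str, Callable[[], list]],
    tau: float,
    ts: float,
    finder: Callable,
) -> list:
    # YAML gives plain floats, so convert seconds to timedelta once here
    tau_td = timedelta(seconds=tau)
    ts_td = timedelta(seconds=ts)

    results = []
    # iterate partitions in a stable (sorted-by-key) order
    for name, load in sorted(partitioned_input.items()):
        partition = load()  # lazy loading: data is read only when called
        results.extend(finder(partition, tau=tau_td, ts=ts_td))
    return results


def dummy_finder(partition, tau, ts):
    # stand-in for the real detection step: tag each record with the window length
    return [(x, tau.total_seconds()) for x in partition]


parts = {"2020-02": lambda: [3, 4], "2020-01": lambda: [1, 2]}
out = extract_features_sketch(parts, tau=60.0, ts=1.0, finder=dummy_finder)
print(out)
```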
As we are dealing with multiple spacecraft, we need to be careful about naming conventions. Here are the conventions we use in this project.
sat_id
: name of the spacecraft. We also use abbreviations, for example sta for STEREO-A and thb for ARTEMIS-B
sat_state
: state data of the spacecraft
b_vl
: maximum variance vector of the magnetic field (major eigenvector)

Data Level
l0: unprocessed
l1: cleaned data, fill null value, add useful columns
l2: time-averaged data
radial_distance
: radial distance of the spacecraft, in units of \(AU\)
plasma_speed
: solar wind plasma speed, in units of \(km/s\)
sw_elevation
: solar wind elevation angle, in units of \(\degree\)
sw_azimuth
: solar wind azimuth angle, in units of \(\degree\)
v_{x,y,z}
or sw_vel_{X,Y,Z}
: solar wind plasma speed in ANY coordinate system, in units of \(km/s\)
sw_vel_{r,t,n}
: solar wind plasma speed in the RTN coordinate system, in units of \(km/s\)
sw_vel_gse_{x,y,z}
: solar wind plasma speed in the GSE coordinate system, in units of \(km/s\)
sw_vel_lmn_{x,y,z}
: solar wind plasma speed in the LMN coordinate system, in units of \(km/s\)
v_l
or sw_vel_l
: abbreviation for sw_vel_lmn_1
v_mn
or sw_vel_mn
(deprecated)
plasma_density
: plasma density, in units of \(1/cm^{3}\)
plasma_temperature
: plasma temperature, in units of \(K\)
B_{x,y,z}
: magnetic field in ANY coordinate system
b_rtn_{x,y,z}
or b_{r,t,n}
: magnetic field in the RTN coordinate system
b_gse_{x,y,z}
: magnetic field in the GSE coordinate system
B_mag
: magnetic field magnitude
Vl_{x,y,z}
or b_vecL_{X,Y,Z}
: maximum variance vector of the magnetic field in ANY coordinate system
b_vecL_{r,t,n}
: maximum variance vector of the magnetic field in the RTN coordinate system
model_b_{r,t,n}
: modelled magnetic field in the RTN coordinate system
state
: 1 for solar wind, 0 for non-solar wind
L_mn{_norm}
: thickness of the current sheet in MN direction, in units of \(km\)
j0{_norm}
: current density, in units of \(nA/m^2\)
Note: we recommend using a unique name for each variable, for example plasma_speed instead of speed, because unique names are easier to search for and replace in the code whenever necessary.
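The brace notation used in the conventions above (for example `b_{r,t,n}`) can be expanded into concrete column names with a small helper. The function `expand_names` is a hypothetical utility written for this illustration, not part of the project, and it only handles a single comma-separated brace group.

```python
import re


def expand_names(pattern: str) -> list[str]:
    """Expand one brace group, e.g. 'sw_vel_{r,t,n}' -> ['sw_vel_r', 'sw_vel_t', 'sw_vel_n']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    head, tail = pattern[: match.start()], pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


b_rtn = expand_names("b_{r,t,n}")
sw_vel_gse = expand_names("sw_vel_gse_{x,y,z}")
print(b_rtn)
print(sw_vel_gse)

# Unique names make renaming mechanical: a generic column is mapped once
# to its project-wide name (the mapping below is only an example).
rename_map = {"speed": "plasma_speed", "density": "plasma_density"}
columns = ["time", "speed", "density"]
renamed = [rename_map.get(c, c) for c in columns]
print(renamed)
```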
For the unit, by default we use
# tau_pd = pd.Timedelta(tau)
# catch22_feats = MultipleFeatureDescriptors(
# functions=catch22_wrapper(catch22_all),
# series_names=bcols, # list of signal names
# windows = tau_pd, strides=tau_pd/2,
# )
# fc = FeatureCollection(catch22_feats)
# features = fc.calculate(data, return_df=True) # calculate the features on your data
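The commented-out snippet above uses a window of `tau` and a stride of `tau / 2`, that is, half-overlapping windows. As a sketch of what that windowing scheme looks like, independent of tsflex, the window boundaries can be enumerated as follows; the function name `half_overlapping_windows` and the choice to drop a trailing partial window are assumptions of this example.

```python
from datetime import datetime, timedelta


def half_overlapping_windows(start, stop, tau):
    """Yield (window_start, window_end) pairs of length tau, stepped by tau / 2."""
    stride = tau / 2
    t = start
    while t + tau <= stop:  # a trailing partial window is dropped
        yield t, t + tau
        t += stride


start = datetime(2020, 1, 1)
tau = timedelta(seconds=60)
windows = list(half_overlapping_windows(start, start + timedelta(seconds=120), tau))
for w_start, w_end in windows:
    print(w_start.time(), "->", w_end.time())
```

With a stride of half the window length, every sample away from the edges is covered by two windows, so a feature that straddles one window boundary sits well inside the neighbouring window.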