OMNI data

Near-Earth solar wind magnetic field and plasma

Reference:

Notes:

The OMNI flow "phi" angle is opposite to the GSE "phi" angle; therefore, the Cartesian components of the flow vector in GSE coordinates may be derived from the given speed and angles as

Vx = - V * cos(theta) * cos(phi)
Vy = + V * cos(theta) * sin(phi)
Vz = + V * sin(theta)
and vice versa: the two angles (in degrees) may be derived from the given speed and the Vx, Vy, Vz components as (IDL notation, where `!PI` is the constant pi)
          a_theta = Vz/V
          theta = (180.*asin(a_theta))/!PI
          a_phi = Vy/(-Vx)
          phi = (180.*atan(a_phi))/!PI
   (*)   Quasi-GSE for the flow longitude angle means the angle increases from zero
         to positive values as the flow changes from being aligned along the -X(GSE)
         axis towards the +Y(GSE) axis.  The flow longitude angle is positive for 
         flow from west of the sun, towards +Y(GSE).
         The flow latitude angle is positive for flow from south of the sun, 
         towards +Z(GSE)
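
The two conversions in the notes can be checked with a small round trip. The following is a minimal sketch using only the standard library; the function names `angles_to_gse` and `gse_to_angles` are illustrative and not part of `discontinuitypy`. It assumes the angles are given in degrees, and it uses `atan2(Vy, -Vx)` in place of `atan(Vy/(-Vx))`, which gives the same result for the usual antisunward flow (`-Vx > 0`) while avoiding a division by zero.

```python
import math


def angles_to_gse(speed, theta_deg, phi_deg):
    """Speed and quasi-GSE flow angles (degrees) -> (Vx, Vy, Vz) in GSE."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    vx = -speed * math.cos(theta) * math.cos(phi)
    vy = +speed * math.cos(theta) * math.sin(phi)
    vz = +speed * math.sin(theta)
    return vx, vy, vz


def gse_to_angles(vx, vy, vz):
    """(Vx, Vy, Vz) in GSE -> speed and quasi-GSE flow angles (degrees).

    Assumes a non-zero speed; a zero vector would divide by zero below.
    """
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    theta_deg = math.degrees(math.asin(vz / speed))
    # atan2 replaces atan(Vy / (-Vx)) from the notes (an assumption of this sketch)
    phi_deg = math.degrees(math.atan2(vy, -vx))
    return speed, theta_deg, phi_deg


# Round trip for an illustrative 400 km/s flow
vx, vy, vz = angles_to_gse(400.0, 2.0, -3.0)
speed, theta_deg, phi_deg = gse_to_angles(vx, vy, vz)
```

For mostly antisunward flow, `vx` comes out negative, as expected for solar wind moving along -X(GSE).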

::: {#cell-2 .cell execution_count=1}

Code
import polars as pl

from discontinuitypy.utils.basic import cdf2pl, pmap

OMNI_VARS: list

:::

Setup

Run the following command in a shell first; `kedro pipeline create` is a project-specific command, so it must be run from inside the Kedro project.

kedro pipeline create omni

Downloading data

::: {#cell-7 .cell}

Code
def download_data(
    start,
    end,
    datatype,
):
    import pyspedas

    trange = [start, end]
    files = pyspedas.omni.data(trange=trange, datatype=datatype, downloadonly=True)
    return files


def load_data(
    start,
    end,
    datatype="hourly",
    vars: dict = OMNI_VARS,
) -> pl.LazyFrame:
    files = download_data(start, end, datatype=datatype)
    df: pl.LazyFrame = pl.concat(files | pmap(cdf2pl, var_names=list(vars)))
    return df

:::

Preprocessing data

::: {#cell-9 .cell}

Code
def preprocess_data(
    raw_data: pl.LazyFrame,
    vars: dict = OMNI_VARS,
) -> pl.LazyFrame:
    """
    Preprocess the raw dataset (only minor transformations)

    - Apply the naming conventions to columns
    - Extract variables from `CDF` files and convert them to a DataFrame
    """

    columns_name_mapping = {key: value["COLNAME"] for key, value in vars.items()}

    return raw_data.rename(columns_name_mapping)

:::
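
The definition of `OMNI_VARS` is not shown here, but `load_data` and `preprocess_data` only rely on it being a mapping from CDF variable names to metadata dictionaries with a `COLNAME` entry. The sketch below uses a hypothetical `EXAMPLE_VARS` to show that shape; the CDF variable names used as keys are placeholders and may not match the real OMNI files, and only the column names on the right are taken from the code in this document.

```python
# Hypothetical stand-in for OMNI_VARS; the keys are placeholder CDF variable names.
EXAMPLE_VARS = {
    "V": {"COLNAME": "plasma_speed"},
    "THETA-V": {"COLNAME": "sw_vel_theta"},
    "PHI-V": {"COLNAME": "sw_vel_phi"},
}

# What load_data passes as `var_names`: the keys of the mapping
var_names = list(EXAMPLE_VARS)

# What preprocess_data builds before calling `rename`
columns_name_mapping = {key: value["COLNAME"] for key, value in EXAMPLE_VARS.items()}
```

With this shape, `raw_data.rename(columns_name_mapping)` yields the column names that `flow2gse` expects.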

Processing data

::: {#cell-11 .cell}

Code
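def flow2gse(df: pl.LazyFrame) -> pl.LazyFrame:
    """
    - Transform the solar wind flow from speed and `Quasi-GSE` angles
      to Cartesian velocity components in the GSE coordinate system
    """
    plasma_speed = pl.col("plasma_speed")
    # The flow angles are assumed to be in degrees (the inverse formulas in the
    # notes yield degrees), so convert to radians before applying cos/sin.
    sw_theta = pl.col("sw_vel_theta").radians()
    sw_phi = pl.col("sw_vel_phi").radians()

    return df.with_columns(
        sw_vel_gse_x=-plasma_speed * sw_theta.cos() * sw_phi.cos(),
        sw_vel_gse_y=+plasma_speed * sw_theta.cos() * sw_phi.sin(),
        sw_vel_gse_z=+plasma_speed * sw_theta.sin(),
    ).drop(["sw_vel_theta", "sw_vel_phi"])  # drop the angle columns that were read above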

def process_data(
    raw_data: pl.LazyFrame,
    ts=None,  # time resolution
) -> pl.LazyFrame:
    """
    - Transforming data to GSE coordinate system
    """

    return raw_data.pipe(flow2gse).rename(
        {
            "sw_vel_gse_x": "v_x",
            "sw_vel_gse_y": "v_y",
            "sw_vel_gse_z": "v_z",
        }
    )

:::

Pipelines

Code
# # | export
# def create_pipeline(sat_id="OMNI", source="LowRes"):

#     return create_pipeline_template(
#         sat_id=sat_id,
#         source=source,
#         load_data_fn=load_data,
#         preprocess_data_fn=preprocess_data,
#         process_data_fn=process_data,
#     )