Remove spikes from signal

The example data is a sine wave with random spikes.

using Random

"""
    create_sample_data(; length=1000)

Create a sine wave of `length` samples and add random positive and negative spikes.
Returns a tuple `(y_spikey, n_spikes)`: the spiky signal and the number of spikes added.
"""
function create_sample_data(; length=1000)
    # Create x values and compute sine wave y values
    x = range(0, stop=2π, length=length)
    y = 2 .* sin.(x)

    rands = rand(Xoshiro(1), length)

    # random values above this trigger a spike:
    RAND_HIGH = 0.98
    # random values below this trigger a negative spike:
    RAND_LOW = 0.02

    # amplitude of the spikes:
    spike_amplitudes =  0.1 .+ 10rand(Xoshiro(2), length)

    # Create random spikes based on threshold conditions
    spike_high = ifelse.(rands .> RAND_HIGH, 1, 0) .* spike_amplitudes
    spike_low  = ifelse.(rands .< RAND_LOW, -1, 0) .* spike_amplitudes
    n_spikes = sum(spike_high .!= 0) + sum(spike_low .!= 0)

    y .+ spike_high .+ spike_low, n_spikes
end
y_spikey, n_spikes = create_sample_data()
([0.0, 0.012578866632135501, 0.025157235677482116, 0.03773460956893418, 0.050310490778751694, 0.0628843818382412, 0.07545578535743436, 0.08802420404476333, 0.10058914072673236, 0.1131500983675847  …  -0.11315009836758545, -0.1005891407267323, -0.08802420404476421, -0.07545578535743443, -0.06288438183824223, -0.0503104907787519, -0.03773460956893535, -0.025157235677482466, -0.01257886663213681, -4.898587196589413e-16], 36)

By default, replace_outliers uses a threshold-detection approach based on the median absolute deviation (MAD) to detect spikes. It is also possible to use a filter-based approach (i.e. low-pass filtering).

using SPEDAS
using CairoMakie
using Test

y_remove_outliers = replace_outliers(y_spikey, detector=find_spikes)
n_removed = sum(isnan.(y_remove_outliers))
@test n_removed == n_spikes

begin
    f = Figure()
    lines(f[1, 1], y_spikey)
    lines(f[2, 1], y_remove_outliers)
    f
end
[Figure: two stacked line plots, the spiky signal (top) and the signal after outlier replacement (bottom)]
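
The filter-based alternative mentioned earlier can be sketched without SPEDAS. The block below is only an illustration under stated assumptions: `filter_spike_indices`, `halfwidth`, and `tol` are hypothetical names, and a centred running median stands in for the low-pass filter because it is not pulled toward the spikes it is meant to expose. It is not the detector SPEDAS ships.

```julia
using Statistics

# Hypothetical helper, not SPEDAS code: smooth with a centred running median,
# then flag samples whose residual from the smoothed curve exceeds `tol`.
# Assumes a 1-based vector.
function filter_spike_indices(data::AbstractVector{<:Real}; halfwidth=2, tol=1.0)
    n = length(data)
    # Windows are truncated at the edges of the signal.
    smoothed = [median(@view data[max(1, i - halfwidth):min(n, i + halfwidth)]) for i in 1:n]
    return findall(i -> abs(data[i] - smoothed[i]) > tol, 1:n)
end

# Small sine wave with one positive and one negative spike
x = range(0, 2π; length=50)
y = 2 .* sin.(x)
y[10] += 5.0
y[30] -= 5.0

filter_spike_indices(y)  # → [10, 30]
```

The tolerance here is an absolute value chosen for this toy signal; a real detector would more likely scale it from the residuals themselves.
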
SPEDAS.find_spikes - Function
find_spikes(data; threshold=3.0, window=0)

Identifies the indices in data that are considered spikes.

For multidimensional arrays, the function can be applied along a specific dimension using the dims parameter.

Arguments

  • threshold: Threshold multiplier for MAD to identify spikes (default: 3.0)
  • window: Size of the moving window for local statistics (default: 16)
  • dims: Dimension along which to find spikes (for multidimensional arrays)

Returns

  • For 1D arrays: Vector of indices where spikes were detected
  • For multidimensional arrays: Dictionary mapping dimension indices to spike indices

See also: find_spikes_1d_mad
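
To make the role of `threshold` concrete, here is a minimal sketch of MAD-based detection using global statistics. It is an assumption about the general technique, not the SPEDAS implementation: `mad_spike_indices` is a hypothetical name, and the windowed (local) statistics that the `window` argument refers to are not reproduced.

```julia
using Statistics

# Hypothetical helper, not part of SPEDAS: flag points whose distance from the
# median exceeds `threshold` times the median absolute deviation (MAD).
function mad_spike_indices(data::AbstractVector{<:Real}; threshold=3.0)
    med = median(data)
    mad = median(abs.(data .- med))
    # With a zero MAD (e.g. constant data) any deviation at all would be flagged,
    # so this sketch returns no indices in that degenerate case.
    mad == 0 && return Int[]
    return findall(v -> abs(v - med) > threshold * mad, data)
end

signal = [1.0, 1.1, 0.9, 1.0, 9.0, 1.05, 0.95, -7.0, 1.0]
spikes = mad_spike_indices(signal)  # → [5, 8]
```

Global statistics like these work for a roughly flat signal; for a signal with a large slow variation, such as the sine wave above, local statistics over a moving window are the more natural choice.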

SPEDAS.replace_outliers - Function
replace_outliers(data; detector=find_spikes, replacement_fn=nothing, kwargs...)

Replaces outliers in data using replacement_fn.

A detector function (by default, find_spikes) is used to identify outlier indices.

A replacement_fn function can be supplied to define how to correct each spike:

  • It should take (data, index) and return a replacement value;
  • If not provided, the default is to replace with NaN.

For multidimensional arrays, the dims parameter specifies the dimension along which to detect and replace outliers.

See also: find_spikes
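
As an illustration of the `(data, index)` contract, the sketch below defines one possible replacement function and applies it by hand to a list of flagged indices. `neighbor_mean` is a hypothetical name, and the loop only mimics what a replacement step might do; it does not call SPEDAS.

```julia
# Hypothetical replacement function with the (data, index) signature:
# return the mean of the in-bounds neighbours of the flagged sample.
function neighbor_mean(data::AbstractVector{<:Real}, index::Integer)
    left  = max(firstindex(data), index - 1)
    right = min(lastindex(data), index + 1)
    neighbors = [data[i] for i in (left, right) if i != index]
    # A one-element vector has no neighbours; fall back to the original value.
    isempty(neighbors) && return data[index]
    return sum(neighbors) / length(neighbors)
end

signal  = [1.0, 2.0, 50.0, 4.0, 5.0]
flagged = [3]  # indices a detector might return

cleaned = copy(signal)
for i in flagged
    cleaned[i] = neighbor_mean(signal, i)
end
cleaned  # → [1.0, 2.0, 3.0, 4.0, 5.0]
```

Going by the signature documented above, such a function would presumably be passed as `replace_outliers(signal; replacement_fn=neighbor_mean)`. Note that a neighbour mean is itself distorted when two spikes are adjacent.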
