Remove spikes from a signal

The example data is a sine wave (amplitude 2, one period sampled at 1000 points) with random positive and negative spikes added.

using Random

"""
    create_sample_data(; length=1000)

Create a sine wave of `length` samples and add random positive and negative spikes.
Returns a tuple `(y_spikey, n_spikes)`: the spiky signal and the number of spikes added.
"""
function create_sample_data(; length=1000)
    # Create x values and compute sine wave y values
    x = range(0, stop=2π, length=length)
    y = 2 .* sin.(x)

    # uniform random numbers in [0, 1) that decide where spikes occur (seeded for reproducibility)
    rands = rand(Xoshiro(1), length)

    # random values above this trigger a spike:
    RAND_HIGH = 0.98
    # random values below this trigger a negative spike:
    RAND_LOW = 0.02

    # amplitude of the spikes, uniformly distributed in [0.1, 10.1):
    spike_amplitudes = 0.1 .+ 10 .* rand(Xoshiro(2), length)

    # Create random spikes based on threshold conditions
    spike_high = ifelse.(rands .> RAND_HIGH, 1, 0) .* spike_amplitudes
    spike_low  = ifelse.(rands .< RAND_LOW, -1, 0) .* spike_amplitudes
    n_spikes = sum(spike_high .!= 0) + sum(spike_low .!= 0)

    y .+ spike_high .+ spike_low, n_spikes
end
y_spikey, n_spikes = create_sample_data()

([0.0, 0.012578866632135501, 0.025157235677482116, 0.03773460956893418, 0.050310490778751694, 0.0628843818382412, 0.07545578535743436, 0.08802420404476333, 0.10058914072673236, 0.1131500983675847  …  -0.11315009836758545, -0.1005891407267323, -0.08802420404476421, -0.07545578535743443, -0.06288438183824223, -0.0503104907787519, -0.03773460956893535, -0.025157235677482466, -0.01257886663213681, -4.898587196589413e-16], 36)

By default, replace_outliers detects spikes by thresholding on the median absolute deviation (MAD) and replaces them with NaN. A filter-based approach (i.e., low-pass filtering) can be used instead.
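The filter-based idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPEDAS implementation: a centered moving average stands in for the low-pass filter, and moving_average, lowpass_spikes, the default window of 11 samples and the absolute threshold of 1.0 are all hypothetical choices. A sample is flagged when it deviates from the smoothed signal by more than the threshold.

```julia
# Hypothetical sketch of filter-based spike detection; not SPEDAS's implementation.

"""
    moving_average(x, window)

Centered moving average, used here as a simple low-pass filter (`window` is
assumed odd). Near the edges the averaging window is truncated.
"""
function moving_average(x::AbstractVector{<:Real}, window::Integer)
    n = length(x)
    half = window ÷ 2
    smooth = Vector{Float64}(undef, n)
    for i in 1:n
        lo, hi = max(1, i - half), min(n, i + half)
        smooth[i] = sum(@view x[lo:hi]) / (hi - lo + 1)
    end
    return smooth
end

"""
    lowpass_spikes(x; window = 11, threshold = 1.0)

Flag samples that deviate from the low-pass-filtered signal by more than
`threshold` (an absolute value in the units of `x`; both defaults are assumptions).
"""
lowpass_spikes(x::AbstractVector{<:Real}; window::Integer = 11, threshold::Real = 1.0) =
    abs.(x .- moving_average(x, window)) .> threshold

# Usage: a smooth sine wave with one injected spike at index 50.
signal = 2 .* sin.(range(0, 2π; length = 200))
signal[50] += 8
mask = lowpass_spikes(signal)   # only index 50 is flagged
```

A moving average is itself distorted by large spikes (each spike adds amplitude / window to the smoothed value of its neighbors), which is one reason a median-based detector is often preferred as the default.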

using SPEDAS
using CairoMakie
using Test

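# detect spikes and replace them with NaN (the default fill value)
y_remove_outliers = replace_outliers(y_spikey)
# the number of NaNs should equal the number of injected spikes
n_removed = sum(isnan.(y_remove_outliers))
@test n_removed == n_spikes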

begin
    f = Figure()
    lines(f[1, 1], y_spikey)           # top: signal with spikes
    lines(f[2, 1], y_remove_outliers)  # bottom: spikes replaced by NaN (gaps in the line)
    f
end

SPEDAS.find_outliers — Function

find_outliers(A, [method, window]; dim = 1, kw...)

Find outliers in data A along dimension dim.

Returns a Boolean array whose elements are true when an outlier is detected in the corresponding element of A.

The default method is :median, which flags outliers using the median absolute deviation (MAD); the alternative is :mean. When A has more than 256 elements, a moving window of size 16 is used.
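A MAD-based detector with an optional moving window can be sketched as below. The 1.4826 scale factor (which makes the MAD a consistent estimate of the standard deviation for Gaussian data) and the threshold of three scaled MADs are assumptions taken from common practice, not documented SPEDAS behavior, and mad_outliers is a hypothetical name. The default window mirrors the rule stated above: a 16-sample moving window when the input has more than 256 elements, otherwise statistics over the whole vector.

```julia
using Statistics: median

# Hypothetical sketch of MAD-based outlier detection; not SPEDAS's implementation.
const MAD_SCALE = 1.4826  # scales the MAD so it estimates σ for normally distributed data

# Median and scaled MAD of a segment of data.
function median_and_smad(seg)
    med = median(seg)
    return med, MAD_SCALE * median(abs.(seg .- med))
end

"""
    mad_outliers(x; window, k = 3)

Return a Bool vector that is `true` where `x` lies more than `k` scaled MADs from
the median. With `window === nothing` the statistics are computed over all of `x`;
otherwise over a moving window of `window` samples around each point (truncated
at the edges). By default a 16-sample window is used when `x` has more than 256
elements.
"""
function mad_outliers(x::AbstractVector{<:Real};
                      window = length(x) > 256 ? 16 : nothing, k::Real = 3)
    if window === nothing
        med, smad = median_and_smad(x)
        return abs.(x .- med) .> k * smad
    end
    n, half = length(x), window ÷ 2
    mask = falses(n)
    for i in 1:n
        lo = max(1, i - half)
        hi = min(n, i - half + window - 1)
        med, smad = median_and_smad(view(x, lo:hi))
        mask[i] = abs(x[i] - med) > k * smad
    end
    return mask
end

# Usage: 1000 samples (> 256), so a 16-sample moving window is used.
signal = 2 .* sin.(range(0, 2π; length = 1000))
signal[500] += 5
mask = mad_outliers(signal)   # only index 500 is flagged
```

Because the median and MAD ignore a few extreme values, a spike barely shifts the statistics of the windows it falls into, so its neighbors are not flagged along with it.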


SPEDAS.replace_outliers! — Function

replace_outliers!(A, s, [find_method, window]; kwargs...)

Finds outliers in A and replaces them with s (by default: NaN).
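Replacing flagged samples with a fixed value amounts to a masked assignment. The helper below is a hypothetical sketch (replace_flagged! is not a SPEDAS function) that takes a precomputed Boolean mask instead of running detection itself. Note that NaN can only be stored in a floating-point array; assigning it into an integer array throws an InexactError.

```julia
# Hypothetical helper, not SPEDAS's implementation: overwrite the entries of `A`
# flagged in the Boolean `mask` with the fill value `s` (NaN by default).
function replace_flagged!(A::AbstractArray, mask::AbstractArray{Bool}, s = NaN)
    A[mask] .= s   # NaN requires a floating-point eltype; an Int array throws InexactError
    return A
end

A = [1.0, 2.0, 40.0, 3.0]
replace_flagged!(A, A .> 10)   # A is now [1.0, 2.0, NaN, 3.0]
```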


replace_outliers!(A, method, [find_method, window]; kwargs...)
replace_outliers!(A, method, outliers; kwargs...)

Replaces outliers in A with values determined by the specified method.

Outliers are either detected with find_outliers, using the optional find_method and window arguments, or passed in directly as a Boolean array outliers.

method can be one of the following:

:linear: Linear interpolation between the neighboring non-outlier values
:previous: Previous non-outlier value
:next: Next non-outlier value
:nearest: Nearest non-outlier value
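The four fill methods can be sketched for a vector with a precomputed Boolean mask. This is a hypothetical helper (fill_flagged is not a SPEDAS function). How SPEDAS treats outliers at the very start or end of the data is not stated here, so the sketch simply falls back to the only available neighbor in that case and breaks :nearest ties toward the previous value; both choices are assumptions.

```julia
# Hypothetical sketch, not SPEDAS's implementation: fill the entries of vector `A`
# flagged in `mask` using :linear, :previous, :next or :nearest.
function fill_flagged(A::AbstractVector{<:Real}, mask::AbstractVector{Bool}, method::Symbol)
    method in (:linear, :previous, :next, :nearest) ||
        throw(ArgumentError("unknown method: $method"))
    good = findall(!, mask)                  # indices of non-outlier samples (sorted)
    isempty(good) && throw(ArgumentError("no non-outlier values to fill from"))
    out = float.(A)                          # copy; converting to float also allows Int input
    for i in findall(mask)
        k = searchsortedfirst(good, i)       # position in `good` of the first index after i
        before = k > 1 ? good[k - 1] : nothing
        after = k <= length(good) ? good[k] : nothing
        if before === nothing || after === nothing
            out[i] = A[something(before, after)]   # edge (assumption): use the only neighbor
        elseif method === :previous
            out[i] = A[before]
        elseif method === :next
            out[i] = A[after]
        elseif method === :nearest
            out[i] = A[(i - before <= after - i) ? before : after]   # ties go to the previous value
        else  # :linear
            t = (i - before) / (after - before)
            out[i] = (1 - t) * A[before] + t * A[after]
        end
    end
    return out
end

A = [1.0, 2.0, 100.0, 4.0, 5.0]
mask = A .> 50
fill_flagged(A, mask, :linear)    # [1.0, 2.0, 3.0, 4.0, 5.0]
fill_flagged(A, mask, :previous)  # [1.0, 2.0, 2.0, 4.0, 5.0]
```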