Stream Data Types

Posted on February 13, 2015

I’m currently looking for a very fast streaming data type for realtime audio processing in Haskell. The data types have two objectives:

  1. Continuously consume and produce values
  2. Maintain state

I created a project that contains different implementations and a benchmark that tests their speed. In this post I will show and discuss the different implementations and show their benchmarking results. The benchmark computes the integral of a sinus curve for 1 second at a sample rate of 48 kHz. Each of the different implementation defines a nthIntegral function, that is the integral of a function from 0 to n with a step size of 1/48000.

\[ \int_0^1 \sin(t) dt \]


Coroutine / Lazy

This data type looks like the state monad, except that it closures its state inside the function.

newtype Coroutine a b = Coroutine (a -> (b,Coroutine a b))

scan :: (s -> a -> s) -> s -> Coroutine a s
scan f = go
  where
    go s = Coroutine $ \a -> let s' = f s a in (s',go s')

unfold :: (s -> (b,s)) -> s -> Coroutine a b
unfold f = go
  where
    go s = Coroutine $ const $ let (b,s') = f s in (b,go s')

(!!) :: Coroutine () a -> Int -> a
Coroutine f !! 0 = fst $ f ()
Coroutine f !! n = snd (f ()) !! (n-1)
infixl 9 !!

(>>) :: Coroutine a b -> Coroutine b c -> Coroutine a c
Coroutine f >> Coroutine g = Coroutine $ \a ->
  let (b,f') = f a
      (c,g') = g b
  in (c,f' >> g')
infixl 1 >>

integral :: Int -> Coroutine Double Double
integral rate = scan (\x i -> i + x * dt) 0
  where
    dt = 1 / fromIntegral rate

nthIntegral :: Int -> (Double -> Double) -> Int -> Double
nthIntegral rate fun n =
    (unfold (\t -> (fun (t / fromIntegral rate),t+1)) 0
        >> integral rate)
        !! n

Numbers

benchmarking coroutine/lazy/integral of sinus for 1 second at 48 kHz
time                 117.0 ms   (116.5 ms .. 118.0 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 117.9 ms   (117.6 ms .. 118.3 ms)
std dev              471.2 μs   (340.4 μs .. 601.5 μs)
variance introduced by outliers: 11% (moderately inflated)

The coroutine type has similarities to the one I presented in my previous posts, except that this is more concrete. The scan and unfold functions have the same meaning as their list counterparts. The problem with these functions is, that especially the scan function produces a large space leak because the state is handled strict enough.


Coroutine / Strict

If we think this further, we just have to add strictness annotations in the scan and unfold function that evaluates the state to Weak Head Normal Form (WHNF) in each iteration.

newtype Coroutine a b = Coroutine (a -> (b, Coroutine a b))

scan :: (s -> a -> s) -> s -> Coroutine a s
scan f = go
  where
    go s = Coroutine $ \a -> let s' = f s a in s' `seq` (s',go s')

unfold :: (s -> (b,s)) -> s -> Coroutine a b
unfold f = go
  where
    go s = Coroutine $ const $ let (b,s') = f s in s' `seq` (b,go s')

Numbers

benchmarking coroutine/strict/integral of sinus for 1 second at 48 kHz
time                 86.22 ms   (83.67 ms .. 88.77 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 84.65 ms   (83.88 ms .. 85.85 ms)
std dev              1.432 ms   (648.3 μs .. 2.493 ms)

That gave a nice speed up, but we can do better. One contention is the closuring of state.


Stream / Lazy

One data type that can produce values and maintain state is of course an infinite list. In this benchmark I use the Stream library from Wouter Swierstra and Bas van Dijk.

import           Data.Stream (Stream)
import qualified Data.Stream as S

-- data Stream a = Cons a (Stream a)

integral :: Int -> Stream Double -> Stream Double
integral rate = S.scan' (\i x -> i + x * dt) 0
  where
    dt = 1 / fromIntegral rate

nthIntegral :: Int -> (Double -> Double) -> Int -> Double
nthIntegral rate fun n =
    integral rate (S.unfold (\t -> (fun (t / fromIntegral rate),t+1)) 0)
        S.!! n

Numbers

benchmarking stream/lazy/integral of sinus for 1 second at 48 kHz
time                 10.71 ms   (10.63 ms .. 10.76 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 10.61 ms   (10.52 ms .. 10.67 ms)
std dev              186.2 μs   (127.0 μs .. 260.3 μs)

This looks pretty decent, but we can do better if we use a stream data type that is strict in the head of the stream (not the tail, this would lead to non-termination).


Stream / Strict

data Stream a = Cons !a (Stream a)

scan :: (a -> b -> a) -> a -> Stream b -> Stream a
scan f a (Cons b bs) =
  let a' = f a b
  in Cons a' (scan f a' bs)

unfold :: (s -> (a,s)) -> s -> Stream a
unfold f s0 =
  let (a,s) = f s0
  in Cons a (unfold f s)

(!!) :: Stream a -> Int -> a
Cons a _ !! 0 = a
Cons _ s !! n = s !! (n-1)
infixl 9 !!

Numbers

benchmarking stream/strict/integral of sinus for 1 second at 48 kHz
time                 6.481 ms   (6.398 ms .. 6.542 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 6.311 ms   (6.272 ms .. 6.353 ms)
std dev              118.5 μs   (104.2 μs .. 132.0 μs)

This is definitely one of the fastest data types for this kind of scenario, but I can show you a data type that performs at nearly C level.


Signal / Strict

Now I want to present the pearl of this benchmark, the Signal data type. I have to admit, I’m not really sure why this data type beats the time of the previous data type, but numbers don’t lie.

data Signal s a b = Signal !((a,s) -> (b,s)) !s

scan :: (s -> a -> s) -> s -> Signal s a s
scan f = Signal (\(a,s) -> let s' = f s a in (s',s'))

unfold :: (s -> (b,s)) -> s -> Signal s a b
unfold f = Signal (\(_,s) -> f s)

(!!) :: Signal s () b -> Int -> b
Signal f s !! 0 = fst $ f ((),s)
Signal f s !! n = Signal f (snd (f ((),s))) !! (n-1)
infixl 9 !!

(>>) :: Signal s a b -> Signal t b c -> Signal (s,t) a c
Signal f s0 >> Signal g t0 = Signal go (s0,t0)
  where
    go (a,(s,t)) =
      let (b,s') = f (a,s)
          (c,t') = g (b,t)
      in (c,(s',t'))
infixl 1 >>

integral :: Double -> Signal Double Double Double
integral rate = scan (\x i -> i + x * dt) 0
  where
    dt = 1 / fromIntegral rate

nthIntegral :: Double -> (Double -> Double) -> Int -> Double
nthIntegral rate fun n =
    (unfold (\t -> (fun (t / fromIntegral rate),t+1)) 0
        >> integral rate)
        !! n

Numbers

benchmarking signal/strict/integral of sinus for 1 second at 48 kHz
time                 2.767 ms   (2.766 ms .. 2.768 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.766 ms   (2.766 ms .. 2.767 ms)
std dev              1.310 μs   (1.125 μs .. 1.596 μs)

This is as fast as you can get with Haskell, to my knowledge. To compare this numbers with the performance of C, I wrote the same benchmark in C, only without higher order functions and without composability.


Baseline / C

#include <math.h>

double nthIntegral(double dt, int n)
{
	double integral = 0;
	int i;
	for(i = 0; i < n; i++) {
		double t = (double)i / dt;
		double s = sin(t);
		integral = integral + s * dt;
	}
	return integral;
}

Numbers

benchmarking c/baseline/integral of sinus for 1 second at 48 kHz
time                 1.856 ms   (1.854 ms .. 1.857 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.857 ms   (1.857 ms .. 1.857 ms)
std dev              279.0 ns   (233.2 ns .. 346.5 ns)

Conclusion

Here is again an overview of all the benchmarking results.

benchmarking c/baseline/integral of sinus for 1 second at 48 kHz
time                 1.856 ms   (1.854 ms .. 1.857 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.857 ms   (1.857 ms .. 1.857 ms)
std dev              279.0 ns   (233.2 ns .. 346.5 ns)

benchmarking automaton/integral of sinus for 1 second at 48 kHz
time                 1.430 s    (1.401 s .. 1.492 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.432 s    (1.424 s .. 1.438 s)
std dev              9.857 ms   (0.0 s .. 10.97 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking coroutine/lazy/integral of sinus for 1 second at 48 kHz
time                 117.0 ms   (116.5 ms .. 118.0 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 117.9 ms   (117.6 ms .. 118.3 ms)
std dev              471.2 μs   (340.4 μs .. 601.5 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarking coroutine/strict/integral of sinus for 1 second at 48 kHz
time                 86.22 ms   (83.67 ms .. 88.77 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 84.65 ms   (83.88 ms .. 85.85 ms)
std dev              1.432 ms   (648.3 μs .. 2.493 ms)

benchmarking signal/lazy/integral of sinus for 1 second at 48 kHz
time                 54.15 ms   (53.39 ms .. 55.22 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 54.20 ms   (53.68 ms .. 54.65 ms)
std dev              839.1 μs   (577.4 μs .. 1.166 ms)

benchmarking signal/strict/integral of sinus for 1 second at 48 kHz
time                 2.767 ms   (2.766 ms .. 2.768 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.766 ms   (2.766 ms .. 2.767 ms)
std dev              1.310 μs   (1.125 μs .. 1.596 μs)

benchmarking stream/lazy/integral of sinus for 1 second at 48 kHz
time                 10.71 ms   (10.63 ms .. 10.76 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 10.61 ms   (10.52 ms .. 10.67 ms)
std dev              186.2 μs   (127.0 μs .. 260.3 μs)

benchmarking stream/strict/integral of sinus for 1 second at 48 kHz
time                 6.481 ms   (6.398 ms .. 6.542 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 6.311 ms   (6.272 ms .. 6.353 ms)
std dev              118.5 μs   (104.2 μs .. 132.0 μs)

My recommendation is, if you want to work on audio processing and digital synthesis in Haskell, use the strict Stream or strict Signal data type I presented. The later one even can reach the performance of C very closely.

If you want to execute the benchmarks yourself or play with the source code, go to the repository on github.