Before applying the Hidden Markov Model or SVM to our movements, we need to extract features from our 12 screaming signals. If we don’t, we get a mess (dododoudi lolalololadou doulu) because, from the SVM’s point of view, the signal looks totally different from one moment to the next, even though we can see that it is just shifting in time.

In order to take this “change in phase” property into account, we will try to use the information in the frequency domain.

Movement features (Kinemes)

plot

What makes a movement? How can we extract something meaningful out of the signals on the right?

Frame

We should first divide our temporal data into frames, each containing a group of samples. The frames overlap to avoid losing information at the boundaries.
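As a minimal sketch of this framing step (the function name, frame size and hop are illustrative assumptions, not values from the project), overlapping frames can be cut like this:

```python
def split_into_frames(samples, frame_size, hop):
    """Return frames of `frame_size` samples, one starting every `hop` samples.

    With hop < frame_size, consecutive frames overlap.
    """
    if frame_size <= 0 or hop <= 0:
        raise ValueError("frame_size and hop must be positive")
    frames = []
    # Stop at the last start position where a full frame still fits.
    for start in range(0, len(samples) - frame_size + 1, hop):
        frames.append(samples[start:start + frame_size])
    return frames


signal = list(range(10))
frames = split_into_frames(signal, frame_size=4, hop=2)  # 50% overlap
print(frames)
```

A hop of half the frame size means every sample (except at the very edges) appears in two frames, so an event that falls on a frame boundary is still seen whole in the neighbouring frame.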

In each frame, relevant features could be:

  • Signed id of the signal with the peak value, in {-12..12} without 0 (24 values). -12 would be “right foot going down”, 12 “right foot going up”, -3 “left knee going backward”, etc. This seems like a good idea.
  • Signed id of the signal with the second peak value.
  • Peak value reduced to an integer in {0..4}. That’s five values.
  • Second peak value, reduced the same way to {0..4}.

These four elements (24*24*5*5) give 14’400 possible different values for each frame. The Tokenize object does this and outputs these values for a simple movement (one sensor moving):

..., 93, 74,         938, 957, 93, 1513
..., 93, 74, 145, 938, 957, 93, 1513

Looking at the values, we can see that they repeat and should not be too hard to learn.
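To make the counting concrete, here is a hedged sketch of how the four features could be packed into one integer between 0 and 14399. The packing order, the `max_value` scale and all function names are assumptions made for illustration. This is not the actual encoding used by the Tokenize object, so the numbers it produces will not match the ones listed above.

```python
NUM_SIGNALS = 12
NUM_IDS = 2 * NUM_SIGNALS  # signed ids -12..-1 and 1..12
LEVELS = 5                 # reduced peak values 0..4


def signed_id_index(signed_id):
    """Map a signed id in {-12..-1, 1..12} to an index in 0..23."""
    if signed_id < 0:
        return signed_id + NUM_SIGNALS
    return signed_id + NUM_SIGNALS - 1


def reduce_peak(value, max_value):
    """Quantize |value| in [0, max_value] to an integer level in 0..4."""
    level = int(abs(value) / max_value * LEVELS)
    return min(level, LEVELS - 1)


def tokenize_frame(frame, max_value=1.0):
    """frame: list of 12 sample lists, one per signal (hypothetical layout)."""
    peaks = []
    for i, samples in enumerate(frame):
        peak = max(samples, key=abs)  # sample with the largest magnitude
        signed_id = (i + 1) if peak >= 0 else -(i + 1)
        peaks.append((abs(peak), signed_id, peak))
    # Strongest signal first, second strongest next.
    peaks.sort(key=lambda p: p[0], reverse=True)
    (_, id1, peak1), (_, id2, peak2) = peaks[0], peaks[1]
    token = signed_id_index(id1)
    token = token * NUM_IDS + signed_id_index(id2)
    token = token * LEVELS + reduce_peak(peak1, max_value)
    token = token * LEVELS + reduce_peak(peak2, max_value)
    return token


# Signal 12 goes up strongly, signal 3 goes down moderately, the rest are still.
frame = [[0.0, 0.0, 0.0] for _ in range(NUM_SIGNALS)]
frame[11] = [0.2, 0.9, 0.4]
frame[2] = [-0.1, -0.5, -0.3]
token = tokenize_frame(frame)
print(token)
```

Because every frame collapses to a single small integer, a repeated movement shows up as a repeated sequence of tokens, which is what makes the output look learnable.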

The literature on gesture recognition gives great attention to the signal preprocessing part. The authors use Fourier transforms and vector quantization. The first element (Fourier transforms) looks hard for me to implement. The second sounded easy, since it is just a matter of grouping close elements together into clusters and giving each cluster a common name. It turned out not to be easy, but with the help of the FFmpeg project, I could build the VQ object.
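The “give close elements a common name” idea can be sketched as a nearest-codeword lookup. This is only an illustration with a hand-written codebook and hypothetical function names. It is not the FFmpeg-based VQ object, and it skips the hard part, which is learning the codebook from data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index (the 'common name') of the closest codebook entry."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


# Hand-written codebook with three cluster centres (assumed for the example).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vectors = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.1)]
labels = [quantize(v, codebook) for v in vectors]
print(labels)
```

The second and fourth vectors are different but close, so they receive the same label, which is exactly the grouping effect described above.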

Short-time fast Fourier transforms

STFT image

picture © Alessio Damato

The Internet is a great place to dig for this kind of information. Here is an article explaining how this works. Wikipedia has an article too: short-time Fourier transform. A nice introduction to Fourier theory as applied to audio processing can be found here.

In the picture you can see an example of a short-time Fourier transform applied to a sinusoidal signal whose frequency changes every 5 seconds, from 10 Hz to 25 Hz, 50 Hz and 100 Hz. The frequency changes are easily identified. The window used by the STFT was 1 second.
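The experiment in the picture can be reproduced in rough form with the sketch below. The sample rate of 256 Hz, the non-overlapping windows and the naive (slow) DFT are assumptions chosen to keep the example free of dependencies. A real implementation would use an FFT and probably overlapping windows.

```python
import cmath
import math

SAMPLE_RATE = 256          # samples per second (assumed)
WINDOW = 256               # 1-second window, so bin k corresponds to k Hz
FREQS = [10, 25, 50, 100]  # Hz, each held for 5 seconds


def make_signal():
    """Sinusoid whose frequency steps through FREQS every 5 seconds."""
    signal = []
    for segment, freq in enumerate(FREQS):
        for n in range(5 * SAMPLE_RATE):
            t = (segment * 5 * SAMPLE_RATE + n) / SAMPLE_RATE
            signal.append(math.sin(2 * math.pi * freq * t))
    return signal


def dft_magnitudes(frame):
    """Naive DFT magnitudes for bins 0..N/2-1 (O(N^2), for illustration only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        mags.append(abs(acc))
    return mags


def stft_peak_bins(signal, window=WINDOW, hop=WINDOW):
    """For each window, return the frequency bin with the largest magnitude."""
    peaks = []
    for start in range(0, len(signal) - window + 1, hop):
        mags = dft_magnitudes(signal[start:start + window])
        peaks.append(max(range(len(mags)), key=mags.__getitem__))
    return peaks


peaks = stft_peak_bins(make_signal())
print(peaks)
```

Each entry of `peaks` is the dominant frequency (in Hz, given the 1-second window) of one second of signal, so the list steps through the four frequencies just as the bright bands do in the picture.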

For those interested, here is the formula for the Fourier transform:

fourier
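Written out as text, and assuming the usual convention with the frequency f in hertz (the picture may use different symbols), the Fourier transform of a real signal x(t) and its real part are:

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt
\qquad
\operatorname{Re} X(f) = \int_{-\infty}^{\infty} x(t)\, \cos(2\pi f t)\, dt
```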

What this formula basically says is: “for a given frequency f, multiply the signal at each time t by a cosine of frequency f (angular frequency 2πf) and sum everything up”. If the signal does not look like this cosine, the positive and negative products cancel and sum up to zero. If the signal is constant, it will also sum up to zero (for any frequency other than 0).

If we get a big value, the frequency has an important role in the signal. Note: we remove the imaginary part to show this period relationship, not because it is useless…
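A small numerical check of this reading of the formula, using a discrete sum over one second of samples (the sample rate and the test frequencies are arbitrary choices for the example):

```python
import math


def cosine_correlation(signal, freq, sample_rate):
    """Sum of signal[n] * cos(2*pi*freq*t_n): the real part of the Fourier sum."""
    return sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
               for n, x in enumerate(signal))


SAMPLE_RATE = 256
cosine_5hz = [math.cos(2 * math.pi * 5 * n / SAMPLE_RATE)
              for n in range(SAMPLE_RATE)]
sine_5hz = [math.sin(2 * math.pi * 5 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
constant = [1.0] * SAMPLE_RATE

match = cosine_correlation(cosine_5hz, 5, SAMPLE_RATE)     # same frequency: big
mismatch = cosine_correlation(cosine_5hz, 8, SAMPLE_RATE)  # other frequency: ~0
flat = cosine_correlation(constant, 5, SAMPLE_RATE)        # constant signal: ~0
shifted = cosine_correlation(sine_5hz, 5, SAMPLE_RATE)     # same frequency, shifted: ~0
print(match, mismatch, flat, shifted)
```

The last case is why the imaginary part is not useless: a 5 Hz signal shifted by a quarter period gives almost nothing against the cosine alone, and it is the sine (imaginary) part that picks it up.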

FFT object

FFT

We talked about it and I just made it! It uses the code from Laurent de Soras (thanks!) and was not too hard to implement. It’s also fast enough for our 12 signals with a window of 256 samples.

The picture on the left shows the raw output from the FFT, without converting the result to polar coordinates. Showing amplitude and phase separately would make more sense for us.

FFTOk

I fixed this, and here is the new version. To avoid the computation of atan, the phase calculation was dropped.
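As a sketch of what “amplitude only” means here (the `(real, imaginary)` pair layout is an assumption, not the FFT object’s actual output format): the amplitude needs only a square root, while the phase would need an arctangent per bin.

```python
import math


def amplitudes(bins):
    """bins: list of (real, imaginary) pairs, one per frequency bin."""
    # The phase would be math.atan2(im, re); it is skipped on purpose.
    return [math.hypot(re, im) for re, im in bins]


raw = [(3.0, 4.0), (0.0, -2.0), (1.0, 0.0)]
amps = amplitudes(raw)
print(amps)
```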

As you can see, we have nice edges at the frequency at which I was shaking my right leg. We will investigate further to see if we can feed the SVM with this data instead of the raw vector (maybe using the VQ object in between).

Other stuff

Wavelets

After some research and a basic understanding of FFT (see the relisoft article), I finally understood that FFT (and particularly short-time FFT) is a particular case of wavelet transforms. I found a good tutorial here (in French here). Hmmm, this is too much work for me right now. STOP.