FM Synthesis Part 1

Chris Tralie

Frequency Modulation (FM) Synthesis is a digital audio waveform generation technique invented by John Chowning, a professor at Stanford, in the late '60s and early '70s. It is the technique behind the Yamaha DX7 synthesizer, which was incredibly popular in '80s music.

One of the things that made this technique take off is that it needs only two sinusoids to generate a wide variety of sounds, so it ran efficiently on the slow, expensive hardware of the time. But even today, the unique timbres that arise from this technique are still in use and treasured by many artists.

What's really interesting is that the discovery of FM synthesis was basically a complete accident. As we will show, if we start with a completely reasonable model for vibrato but then push it way past its limits, we start to get whole families of really interesting sounds. Chowning's 1973 paper lays out the mathematical details that he eventually filled in to explain what was going on. But let's go step by step and look at ordinary vibrato again first.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd

Vibrato

Vibrato is a periodic deviation in frequency often used for artistic purposes. Let's come up with an expression of a waveform that exhibits vibrato. First, recall that for the waveform

$y(t) = \cos(2 \pi f(t) )$

we hear the instantaneous frequency $f'(t)$, the time derivative of $f(t)$.

Ex) We can recover ordinary sinusoids with a constant frequency in this framework

$f'(t) = f_0$, $f(t) = f_0t + c$

$y(t) = \cos(2 \pi f_0 t + 2 \pi c)$

But we can also do something fancier for vibrato. Let's define some parameters that are consistent with Chowning's paper

  • $f_c$: The center frequency or "carrier frequency" of vibrato

  • $f_m$: The modulation frequency, or how quickly we're going back and forth around the center

  • $d$: The modulation deviation, or the maximum amount by which the frequency moves away from $f_c$ in either the positive or negative direction, at a rate set by $f_m$

In other words, the instantaneous frequency trajectory looks like this

$f'(t) = f_c + d \cos(2 \pi f_m t) $

If we integrate it, we get this (ignoring the constant of integration, which only adds a constant phase offset)

$f(t) = f_c t + \frac{d}{2 \pi f_m} \sin(2 \pi f_m t) $

So then, the final expression for the waveform is as follows (the $2 \pi$ outside $f(t)$ cancels the $2 \pi$ in the denominator)

$y(t) = \cos( 2 \pi f_c t + \frac{d}{f_m} \sin(2 \pi f_m t)) $
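As a quick sanity check on the integration step, the sketch below numerically integrates the instantaneous frequency and compares the result to the closed-form $f(t)$. The parameter values (a 440 Hz carrier, 7 Hz modulation, 10 Hz deviation) and the variable names `f_inst`, `f_numeric`, and `f_closed` are assumptions chosen for illustration, and the trapezoid rule is just one convenient way to approximate the integral.

```python
import numpy as np

sr = 44100                      # Sample rate in Hz
t = np.arange(sr*2)/sr          # Two seconds of time samples
fc, fm, d = 440, 7, 10          # Illustrative carrier, modulation, deviation (Hz)

# Instantaneous frequency f'(t) = fc + d*cos(2*pi*fm*t)
f_inst = fc + d*np.cos(2*np.pi*fm*t)

# Cumulative trapezoid-rule integral of f'(t), starting from 0 at t = 0
steps = (f_inst[1:] + f_inst[:-1])/(2*sr)
f_numeric = np.concatenate(([0.0], np.cumsum(steps)))

# Closed-form integral (constant of integration taken to be 0)
f_closed = fc*t + d/(2*np.pi*fm)*np.sin(2*np.pi*fm*t)

max_err = np.max(np.abs(f_numeric - f_closed))
print("Frequency range: {:.1f} to {:.1f} Hz".format(f_inst.min(), f_inst.max()))
print("Max difference between numeric and closed form:", max_err)
```

The two curves should agree up to a tiny numerical error, which is a good sign that the $\frac{d}{2 \pi f_m}$ factor is right.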

Let's look at an example that tries to replicate what I did on my violin in the video, where I applied vibrato to a concert A, moving back and forth about 7 times per second between 430 Hz and 450 Hz (a deviation of 10 Hz).

In [2]:
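sr = 44100                  # Sample rate in Hz
t = np.arange(sr*2)/sr      # Two seconds of time samples
fc = 440                    # Carrier frequency (concert A) in Hz
fm = 7                      # Modulation frequency in Hz
d = 10                      # Frequency deviation in Hz
# Carrier phase plus a sinusoidal wobble of amplitude d/fm
y = np.cos(2*np.pi*fc*t + (d/fm)*np.sin(2*np.pi*fm*t))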
ipd.Audio(y, rate=sr)
Out[2]:

So that's already pretty cool, but here's where some magic comes in. We can start to push this to its limits by making the modulation frequency and the deviation way bigger, each on the order of the carrier frequency. When we do this, we hear something quite different from a pure tone, which sounds a little more like the examples we did when we added a bunch of harmonics together.

In [5]:
sr = 44100
t = np.arange(sr*2)/sr
fc = 440
fm = 440
d = 440
y = np.cos(2*np.pi*fc*t + (d/fm)*np.sin(2*np.pi*fm*t))
ipd.Audio(y, rate=sr)
Out[5]:

Since we're working with such large deviations, it's convenient to define something that Chowning calls a "modulation index" $I$, which is the ratio of the deviation to the modulation frequency $f_m$

  • Modulation index $I = d/f_m$

We also want to think about the ratio $f_m / f_c$, though we'll talk about this more in the next video.

Anyway, with this definition, the expression for the waveform also simplifies quite a bit

$y(t) = \cos( 2 \pi f_c t + I \sin(2 \pi f_m t)) $

We notice as we start to change the modulation index $I$ that we get very different sounds. So already we're making some progress in adding some degrees of freedom in what audio we generate with this technique!

In [4]:
sr = 44100
t = np.arange(sr*2)/sr
fc = 440
fm = 440
I = 4
y = np.cos(2*np.pi*fc*t + I*np.sin(2*np.pi*fm*t))
ipd.Audio(y, rate=sr)
Out[4]:
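One way to see why changing $I$ changes the sound is to measure how strong each harmonic is. The sketch below is only an illustration: `fm_tone` and `harmonic_amplitudes` are hypothetical helper names introduced here, not anything from Chowning's paper. It assumes $f_c = f_m = 440$, in which case the waveform repeats 440 times per second, so its energy sits at multiples of 440 Hz, and with 2 seconds of audio those frequencies land exactly on FFT bins.

```python
import numpy as np

def fm_tone(fc, fm, I, sr=44100, dur=2.0):
    """Return y(t) = cos(2*pi*fc*t + I*sin(2*pi*fm*t)) sampled at sr for dur seconds."""
    t = np.arange(int(sr*dur))/sr
    return np.cos(2*np.pi*fc*t + I*np.sin(2*np.pi*fm*t))

def harmonic_amplitudes(y, f0, sr=44100, n_harmonics=10):
    """Estimate the amplitudes of the first n_harmonics multiples of f0.

    Assumes each multiple of f0 falls exactly on an FFT bin,
    which holds here for f0 = 440 Hz and 2 seconds at 44100 Hz.
    """
    N = len(y)
    mag = 2*np.abs(np.fft.rfft(y))/N   # Scale so a unit-amplitude cosine reads 1
    bins = np.round(np.arange(1, n_harmonics+1)*f0*N/sr).astype(int)
    return mag[bins]

# Compare a small and a large modulation index
for I in [1, 4]:
    amps = harmonic_amplitudes(fm_tone(440, 440, I), 440)
    print("I =", I, np.round(amps, 3))
```

With these settings, the fundamental should come out as the strongest harmonic at $I = 1$, while at $I = 4$ the energy is spread across more of the upper harmonics, which matches the brighter sound we hear.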