The song "Godmother" uses stems from Jlin and Holly's track "Expand" processed by
a voice transformer network trained to turn Mat's voice into
Holly's voice. The name "Godmother" came about because I was referred to
(rather androgynously) as Spawn's Göttin or god-parent in
a Swiss article about Spawn. The song has a music video
made from composite portraits of Jlin and Holly; photos of the ensemble were used to produce Holly's hybrid album cover.
Since it was 2018 and pix2pix was a well-known algorithm with a solid track record, I tracked
down an open-source pix2pix vocoder called become-yukarin,
which was trained on hours of Vocaloid recordings
faithfully imitated by the developer, Hiroshiba, who worked to transform their own voice into Yuzuki Yukari's.
Now that we had some software to try, Holly and Mat came up with a
speech training set with broad phonetic coverage for English, as well as a singing
dataset – the more the merrier.
The neural vocoder works similarly to the official Vocaloid software from Yamaha.
It uses the open-source WORLD vocoder to divide a sound into harmonic and inharmonic
(aperiodic) spectra, plus a fundamental frequency, f0. It then trains a pix2pix style transfer network
on these components and resynthesizes the result with WORLD. I experimented with sample rates, test data,
super-resolution, and frequency transfer, the last of these being the
most useful, as you could pitch a recording into another register.
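To make the register shift concrete, here is a minimal sketch of the arithmetic involved. It is not the code used for the track. It assumes an f0 contour given as one value in Hz per analysis frame, with 0 marking unvoiced frames (the convention WORLD's analysis uses). The function name shift_f0 and the sample contour are hypothetical.

```python
def shift_f0(f0_contour, semitones):
    """Transpose an f0 contour (Hz per frame) by a number of semitones.

    Frames with f0 == 0 are treated as unvoiced and left at 0, so only
    the pitched parts of the sound move to the new register.
    """
    ratio = 2.0 ** (semitones / 12.0)  # equal-tempered frequency ratio
    return [f * ratio if f > 0 else 0.0 for f in f0_contour]


# Hypothetical contour: a held A3 (220 Hz) between two unvoiced frames.
contour = [0.0, 220.0, 220.0, 0.0]
octave_up = shift_f0(contour, 12)
```

Because the spectral components are left alone and only f0 changes, resynthesizing with the shifted contour moves the pitch without simply speeding up or slowing down the recording.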
I believe most of the texture of Godmother is a result of the vocoder analyzing
features from these unusual drum sounds and resynthesizing them,
although the pix2pix style transfer does make the voices sound more like Holly.
Jlin's drum stems were recorded in stereo, so I processed the left and right channels individually, resulting in a
lovely stereo field as these insect-like beatboxing voices fly in circles around your head.
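Per-channel processing of this kind can be sketched roughly as follows, assuming interleaved stereo samples (left, right, left, right, ...) and some mono transform standing in for the voice conversion step. Both process_stereo and the transform argument are hypothetical names for illustration, not part of become-yukarin.

```python
def process_stereo(interleaved, transform):
    """Run a mono transform on each channel of interleaved stereo samples.

    The channels are converted independently, so any differences in how
    the transform responds to each side are preserved in the output.
    """
    left = interleaved[0::2]   # even indices: left channel
    right = interleaved[1::2]  # odd indices: right channel
    out = []
    # zip stops at the shorter channel if the transform changes lengths
    for l, r in zip(transform(left), transform(right)):
        out.extend((l, r))
    return out
```

For example, process_stereo(samples, convert) would call a hypothetical convert function once for the left channel and once for the right, then interleave the two results again.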