The song "Godmother" uses stems from Jlin and Holly's track "Expand", put
through a voice transformer network trained to turn Mat's voice into
Holly's voice. It's called "Godmother" because the word Göttin came up in
a Swiss article about Spawn. The song has a music video produced from
pictures of Jlin and Holly taken for the album cover, which is a
composite of the entire ensemble.
Since it was 2018 and pix2pix was the only thing that worked, I tracked
down a pix2pix-based voice conversion project called become-yukarin,
which was trained on hours of Vocaloid data. Holly and Mat put together a
speech training set with broad phonetic coverage, as well as a singing
dataset (the more the merrier).
The pipeline would use a vocoder to divide a sound into harmonic and
inharmonic spectra plus a fundamental frequency, f0, and then train a
pix2pix network on these components. I experimented with sample rates,
test data, super-resolution (sr), and f0 transfer. The last was the most
useful, since it let me pitch a recording up into another register.
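The f0 transfer idea can be sketched roughly as follows. This is a minimal sketch, not become-yukarin's actual code: it assumes f0 is a per-frame contour in Hz with 0 marking unvoiced frames, and the names `transfer_f0` and `log_f0_stats` are hypothetical. It combines a log-domain mean/variance mapping between speakers, a common voice-conversion heuristic, with a plain semitone shift for moving a recording into another register.

```python
import numpy as np


def log_f0_stats(f0):
    """Mean and standard deviation of log f0 over voiced (nonzero) frames."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return float(log_f0.mean()), float(log_f0.std())


def transfer_f0(f0, src_stats=None, tgt_stats=None, semitones=0.0):
    """Convert an f0 contour (Hz per frame, 0 = unvoiced).

    If (mean, std) log-f0 statistics are given for a source and a target
    voice, voiced frames are standardized against the source and rescaled
    to the target (an assumed heuristic, not necessarily become-yukarin's).
    The result is then shifted by `semitones`.
    """
    f0 = np.asarray(f0, dtype=float)
    out = np.zeros_like(f0)
    voiced = f0 > 0  # unvoiced frames stay at 0 Hz
    log_f0 = np.log(f0[voiced])
    if src_stats is not None and tgt_stats is not None:
        src_mean, src_std = src_stats
        tgt_mean, tgt_std = tgt_stats
        log_f0 = (log_f0 - src_mean) / src_std * tgt_std + tgt_mean
    # Shifting by n semitones multiplies frequency by 2 ** (n / 12).
    log_f0 = log_f0 + semitones / 12.0 * np.log(2.0)
    out[voiced] = np.exp(log_f0)
    return out


contour = np.array([0.0, 110.0, 220.0, 0.0])
up_an_octave = transfer_f0(contour, semitones=12)
```

Working in the log domain keeps musical intervals intact: a 12-semitone shift doubles every voiced frame, so the contour above becomes 0, 220, 440, 0 Hz. The statistics mapping assumes the source standard deviation is nonzero.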
I believe most of the texture of Godmother is a result of the vocoder,
although the pix2pix component does make the voices sound more like Holly.
The stems were in stereo, so I processed each channel individually,
resulting in a lovely stereo field as these insect-like voices fly around
your head.
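Processing the channels separately can be sketched like this. It is a minimal sketch under stated assumptions: audio is a NumPy array of shape (num_samples, 2), and `convert` is a hypothetical stand-in for the voice conversion step that maps a 1-D array to a 1-D array of the same length.

```python
import numpy as np


def process_stereo(audio, convert):
    """Run a mono conversion function on each channel of a stereo signal.

    `convert` is a hypothetical placeholder for the conversion step. Each
    channel is converted on its own, so channel-to-channel differences in
    the input, and in how the conversion responds to them, are kept in the
    output instead of being collapsed to mono.
    """
    audio = np.asarray(audio, dtype=float)
    if audio.ndim != 2 or audio.shape[1] != 2:
        raise ValueError("expected an array of shape (num_samples, 2)")
    channels = [convert(audio[:, ch]) for ch in range(2)]
    return np.stack(channels, axis=1)


stereo = np.column_stack([np.ones(4), np.zeros(4)])
halved = process_stereo(stereo, lambda x: x * 0.5)
```

Running a nonlinear model per channel means the two sides can drift apart slightly, which is one plausible way a stereo stem ends up with voices spread across the field.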
Audio example: speaking + singing network, 44k (a network trained on a
combination of the speaking and singing datasets).