In 2018, I taught machines to break glass. For the task I used SampleRNN, a neural network architecture that trains on sound and produces audio generators. While I trained on a diverse set of datasets, this page focuses on glass as a way to compare how SampleRNN unravels a single sound, both when it works well and when it works poorly.
The first success I had with SampleRNN was rather counter-intuitive. Holly, Mat and I had done some voice recordings, since their goal was making a neural singer. One thing I wanted to try was tapping a glass, which would make a nice bell-like tone. I tried a mix of very simple inputs layering the voices, and on a whim made a layered recording of the glass where I pitched it up and down and occasionally reversed it.
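As a rough illustration of that layering idea, here is a minimal Python sketch. It is not the process actually used for the recording: the pitch intervals, the choice of which layers to reverse, the sample rate, and the synthetic "tap" are all assumptions, and the function names are hypothetical. Pitch shifting is done by naive resampling, so a higher pitch also means a shorter sound.

```python
import math

def resample(signal, ratio):
    """Naive pitch shift by linear-interpolation resampling.
    ratio > 1 raises the pitch and shortens the sound; ratio < 1 does the opposite."""
    out_len = max(1, int(len(signal) / ratio))
    out = []
    for i in range(out_len):
        pos = i * ratio
        lo = min(int(pos), len(signal) - 1)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def mix_layers(layers):
    """Sum the layers sample by sample, scaling down only if the peak exceeds 1.0."""
    mixed = [0.0] * max(len(layer) for layer in layers)
    for layer in layers:
        for i, s in enumerate(layer):
            mixed[i] += s
    peak = max(abs(s) for s in mixed)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed

def layered_input(signal, semitone_shifts=(-12, -5, 0, 7, 12), reverse_every=2):
    """Build one training signal from pitched copies, reversing some of them.
    The intervals and the reversal pattern are illustrative assumptions."""
    layers = []
    for n, semis in enumerate(semitone_shifts):
        layer = resample(signal, 2 ** (semis / 12))
        if n % reverse_every == 1:  # reverse every second layer
            layer = layer[::-1]
        layers.append(layer)
    return mix_layers(layers)

# Stand-in for a tapped glass: one second of a decaying 880 Hz sine (assumed values).
SAMPLE_RATE = 8000
tap = [math.exp(-6 * t / SAMPLE_RATE) * math.sin(2 * math.pi * 880 * t / SAMPLE_RATE)
       for t in range(SAMPLE_RATE)]
dataset = layered_input(tap)
```

The layer pitched down an octave is twice as long as the source, so the mix ends with the lowest layer ringing alone. In practice the input would be a real recording, several minutes long.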
Results from simple inputs were very boring, but SampleRNN reproduced the complex glass recording astoundingly well. I spent the next few months training networks on a variety of complex sounds - recordings containing a mixture of tones, transients, and rhythms. The finest results can be heard from the Auctioneer and Siren.
Glass Bells (source) Glass Bells (neural)
Meanwhile, Hito told me she'd found a company that was training AI to recognize the sound of breaking windows, and she wanted to try to make these sounds audible.
When I saw a video she'd recorded, the premise seemed even more ridiculous: they were recording the glass breaking in a disused airplane hangar. The breaking glass echoed in the cavernous space. Acoustically it was a mess. Worse, Hito wanted the glass breaking to be musical!
I moved on to glass tests with SampleRNN, and had good results surprisingly quickly. I tuned the input very slightly, using recordings of breaking glass from a sound effects library (I forget which). I had just 2-3 minutes strung together. I would do a training pass and listen to the results, then edit the input to remove parts that seemed related to the sounds I didn't like, and retrain.
These samples are the same network, with different amounts of training. Be warned, the under-trained results can be noisy.
glass, best fit glass, good fit glass, undertrained
I did more tests on network hyperparameters that brought out other qualities of the glass, though they were noisier. My research notes contain some fanciful descriptions of what these sounded like:
whistling glass fuzzy glass punch tumbling glass
Finally, these are the first training results, which eventually led to the better results above. Still, you can hear the glass in there - either the rhythm of it breaking, or its tonal quality.
first test with foley foley, single channel from glass I recorded
The first test uses two channels from a stereo recording (in series), but the second uses just one channel. At the time I noted that "noise decreases the further it trains.. but there is more silence and the breaking is further apart." I trimmed noisier parts to get the good results above. The final example used my own recording and led to the musical glass below.
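To make the difference between those two inputs concrete, here is a small Python sketch. It assumes that "in series" means the left channel followed by the right channel as one long mono stream, which is my reading rather than something confirmed here. The function names and the tiny example data are hypothetical.

```python
def channels_in_series(stereo_frames):
    """Flatten stereo frames [(left, right), ...] into one mono stream:
    the whole left channel, followed by the whole right channel."""
    left = [l for l, _ in stereo_frames]
    right = [r for _, r in stereo_frames]
    return left + right

def single_channel(stereo_frames, channel=0):
    """Keep only one channel (0 = left, 1 = right)."""
    return [frame[channel] for frame in stereo_frames]

# Three made-up stereo frames, just to show the shapes involved.
frames = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7)]
both = channels_in_series(frames)
mono = single_channel(frames)
```

Under this reading, the two-channel input is twice as long but contains every event twice from slightly different perspectives, while the single-channel input is shorter and has no such repetition.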
Finally I moved on to musical glass. I used the same technique as the drinking glass - taking a recording, pitching it up and down, and layering it:
Glass Filing Cabinet (source) Glass Filing Cabinet (neural)
For this test, I made a recording from Hito's office at UdK. I had thought that we'd need to actually break glass for this project, so she had bought a half dozen wine glasses from a Getränkemarkt. When the time came, I was alone with the glasses with a hammer in my hand, staring at them and wondering if I really needed to destroy them, thinking of that security firm and the sledgehammer.
I broke one glass and got a poor recording of it. The other glasses I clattered around the inside of a metal filing cabinet - which explains the more clumsy sounds on this recording.
glass cabinet, very underfit glass cabinet, underfit glass cabinet, overfit glass cabinet, very overfit glass cabinet, very overfit
Finally, once I had a network that performed well, I generated a rhythmic sound by restarting the network every quarter second or so, producing a string of samples that gave Hito her "musical neural glass".
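The restart trick can be sketched in a few lines of Python. This is only an outline under stated assumptions: `stand_in_generator` is a hypothetical placeholder that returns a decaying noise burst instead of calling a trained SampleRNN, and the sample rate, the seeding scheme, and the exact quarter-second interval are illustrative choices.

```python
import math
import random

def stand_in_generator(n_samples, seed):
    """Hypothetical placeholder for a trained network: a decaying noise burst.
    A real generator would sample audio from the model from a fresh state."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) * math.exp(-8 * i / n_samples) for i in range(n_samples)]

def restart_and_concatenate(generate, total_seconds, restart_seconds=0.25,
                            sample_rate=16000, seed=0):
    """Call the generator afresh for each short segment and join the segments."""
    rng = random.Random(seed)
    segment_len = int(restart_seconds * sample_rate)
    total_len = int(total_seconds * sample_rate)
    out = []
    while len(out) < total_len:
        # A new seed per segment stands in for restarting the network.
        out.extend(generate(segment_len, rng.randrange(2 ** 31)))
    return out[:total_len]

rhythm = restart_and_concatenate(stand_in_generator, total_seconds=2.0)
```

Because every segment starts from scratch, each one begins with a fresh onset, and the fixed segment length turns those onsets into a steady pulse, here four per second.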
To hedge against failure, I made a "traditional" implementation of glass breaking using granular synthesis. The first demo uses glass hits generated by the SampleRNN, and glass detritus / grains I recorded, following a similar project to do infinitely-sustained glass breaking. The second demo performs spectral resynthesis.
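For readers unfamiliar with granular synthesis, here is a minimal Python sketch of the general technique, not the implementation used for the demos. It picks short windowed grains at random from a source sound and scatters them across an output of any length, which is what allows a short recording to be sustained indefinitely. The grain length, the density, the sample rate, and the noise-burst source are all assumed values, and the function names are hypothetical.

```python
import math
import random

def hann(n):
    """Hann window, used to fade each grain in and out so it does not click."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granular_sustain(source, out_len, grain_len=800, density=200,
                     sample_rate=16000, seed=0):
    """Scatter randomly chosen, windowed grains of `source` over `out_len` samples.
    `density` is the number of grains per second of output (assumed value)."""
    rng = random.Random(seed)
    window = hann(grain_len)
    out = [0.0] * out_len
    n_grains = int(density * out_len / sample_rate)
    for _ in range(n_grains):
        src = rng.randrange(len(source) - grain_len + 1)  # where the grain is read
        dst = rng.randrange(out_len - grain_len + 1)      # where the grain is written
        for i in range(grain_len):
            out[dst + i] += source[src + i] * window[i]
    peak = max(abs(s) for s in out) or 1.0  # avoid dividing by zero on silence
    return [s / peak for s in out]

# Stand-in for a recorded glass hit: a quarter second of decaying noise.
noise = random.Random(1)
source = [noise.uniform(-1, 1) * math.exp(-5 * i / 4000) for i in range(4000)]
texture = granular_sustain(source, out_len=16000)
```

The output here is four times longer than the source. Raising `out_len` extends the texture for as long as needed without looping the source audibly.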
For more varied sound sources, see the page on SampleRNN.