teaching machines to break glass

In 2018, I taught machines to break glass. For the task I used SampleRNN, a neural network architecture that trains on sound, and produces audio generators. While I trained on a diverse set of datasets, this page focuses on glass as a way to compare how SampleRNN unravels a single sound - when it works well, when it works poorly.

the drinking glass

The first success I had with SampleRNN was rather counter-intuitive. Holly, Mat and I had done some voice recordings, since their goal was making a neural singer. One thing I wanted to try was tapping a glass, which would make a nice bell-like tone. I tried a mix of very simple inputs layering the voices, and on a whim made a layered recording of the glass where I pitched it up and down and occasionally reversed it.
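A minimal sketch of this kind of layering, for readers who want to try it. The text does not say which tool was used to pitch, reverse, and layer the glass recording, so everything here is an assumption: the function names `make_tone`, `repitch`, and `layer` are hypothetical, the decaying sine only stands in for a real tapped-glass recording, and the pitch range and reversal probability are arbitrary.

```python
import math
import random

def make_tone(freq_hz, seconds, sample_rate=16000):
    """Decaying sine wave, a crude stand-in for a tapped glass (assumption)."""
    n = round(seconds * sample_rate)
    return [math.exp(-4.0 * i / n) * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def repitch(samples, ratio):
    """Resample with linear interpolation.

    ratio > 1 raises the pitch and shortens the sound, ratio < 1 lowers it.
    """
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[j + 1] if j + 1 < len(samples) else 0.0
        out.append(a + (b - a) * frac)
    return out

def layer(source, total_len, n_copies, seed=0):
    """Mix randomly pitched, occasionally reversed copies of `source`."""
    rng = random.Random(seed)
    mix = [0.0] * total_len
    for _ in range(n_copies):
        semitones = rng.uniform(-12.0, 12.0)      # arbitrary range: one octave each way
        copy = repitch(source, 2 ** (semitones / 12))
        if rng.random() < 0.25:                   # arbitrary: reverse about a quarter of copies
            copy.reverse()
        start = rng.randrange(total_len)
        for i, s in enumerate(copy):
            if start + i >= total_len:
                break
            mix[start + i] += s
    peak = max(abs(s) for s in mix) or 1.0        # normalize to avoid clipping
    return [s / peak for s in mix]

tone = make_tone(880.0, 0.1)
training_input = layer(tone, total_len=8000, n_copies=10)
```

The point of the sketch is only that a single simple sound becomes a dense mixture of tones and transients once shifted copies overlap.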

Results from simple inputs were very boring, but SampleRNN reproduced the complex glass recording astoundingly well. I spent the next few months training networks on a variety of complex sounds - recordings containing a mixture of tones, transients, and rhythms. The finest results can be heard from the Auctioneer and Siren.

Glass Bells (source) Glass Bells (neural)

Meanwhile, Hito told me she'd found a company that was training AI to recognize the sound of breaking windows, and she wanted to try to make these sounds audible.

When I saw a video she'd recorded, the premise seemed even more ridiculous: they were recording the glass breaking in a disused airplane hangar. The breaking glass echoed in the cavernous space. Acoustically it was a mess. Worse, Hito wanted the glass breaking to be musical!

neural glass

I moved on to glass tests with SampleRNN, and had good results surprisingly quickly. I tuned the input very slightly, using recordings of breaking glass from a sound effects library (I forget which). I had just 2-3 minutes strung together. I would do a training pass and listen to the results, then edit the input to remove the parts that seemed to produce things I didn't like, and retrain.

These samples are the same network, with different amounts of training. Be warned, the under-trained results can be noisy.

glass, best fit glass, good fit glass, undertrained

I did more tests on network hyperparameters that brought out other qualities of the glass, though they were noisier. My research notes contain some fanciful descriptions of what these sounded like:

whistling glass fuzzy glass punch tumbling glass

Finally, these are the first training results, which eventually led to the better results above. Still, you can hear the glass in there - either the rhythm of it breaking, or its tonal quality.

first test with foley foley, single channel from glass I recorded

The first test uses two channels from a stereo recording (in series), but the second uses just one channel. At the time I noted that "noise decreases the further it trains.. but there is more silence and the breaking is further apart." I trimmed noisier parts to get the good results above. The final example used my own recording and led to the musical glass below.

musical glass

Finally I moved on to musical glass. I used the same technique as the drinking glass - taking a recording, pitching it up and down, and layering it:

Glass Filing Cabinet (source) Glass Filing Cabinet (neural)

For this test, I made a recording from Hito's office at UdK. I had thought that we'd need to actually break glass for this project, so she had bought a half dozen wine glasses from a Getränkemarkt. When the time came, I was alone with the glasses with a hammer in my hand, staring at them and wondering if I really needed to destroy them, thinking of that security firm and the sledgehammer.

I broke one glass and got a poor recording of it. The other glasses I clattered around the inside of a metal filing cabinet - which explains the more clumsy sounds on this recording.

glass cabinet, very underfit glass cabinet, underfit glass cabinet, overfit glass cabinet, very overfit glass cabinet, very overfit

Finally, once I had a network that performed well, I generated a rhythmic sound by restarting the network every quarter second or so, producing a string of samples that gave Hito her "musical neural glass":

glass cabinet (chopped)
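The chopping idea can be sketched in a few lines. This is not the code used for the piece: `ToyGenerator` is a hypothetical stand-in for a trained network (it just emits a decaying tone after each reset), and the sample rate, the `reset`/`next_sample` interface, and what "restarting" means for a real SampleRNN (clearing hidden state, reseeding, or both) are all assumptions.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed; the text does not state the rate used

class ToyGenerator:
    """Hypothetical stand-in for a trained sample-by-sample generator."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart from a fresh state, picking a new tone for variety."""
        self.t = 0
        self.freq = self.rng.choice([440.0, 660.0, 880.0])

    def next_sample(self):
        """Return one sample of a tone that decays from the last reset."""
        env = math.exp(-12.0 * self.t / SAMPLE_RATE)
        s = env * math.sin(2 * math.pi * self.freq * self.t / SAMPLE_RATE)
        self.t += 1
        return s

def chopped(generator, n_chunks, chunk_seconds=0.25):
    """Restart the generator before every chunk and concatenate the output."""
    chunk_len = round(chunk_seconds * SAMPLE_RATE)
    out = []
    for _ in range(n_chunks):
        generator.reset()
        out.extend(generator.next_sample() for _ in range(chunk_len))
    return out

audio = chopped(ToyGenerator(), n_chunks=8)  # two seconds of quarter-second hits
```

Because every chunk begins from the same kind of fresh state, the onsets line up on a regular grid, which is what gives the result its rhythm.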

the old-fashioned way

To hedge against failure, I made a "traditional" implementation of glass breaking using granular synthesis. The first demo uses glass hits generated by the SampleRNN, and glass detritus / grains I recorded, following a similar project to do infinitely-sustained glass breaking. The second demo performs spectral resynthesis.
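For readers unfamiliar with granular synthesis, here is a minimal sketch of the general technique, not of the demos themselves. It copies short windowed grains from random positions in a source buffer to random positions in a longer output, which is one simple way to sustain a texture far beyond the length of the source. The names `hann` and `granulate`, the grain length, the grain count, and the noise-burst source are all illustrative assumptions.

```python
import math
import random

def hann(n):
    """Hann window of length n, used to fade each grain in and out."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granulate(source, out_len, grain_len=800, n_grains=200, seed=0):
    """Scatter windowed grains of `source` across an output buffer."""
    rng = random.Random(seed)
    window = hann(grain_len)
    out = [0.0] * out_len
    for _ in range(n_grains):
        src = rng.randrange(len(source) - grain_len + 1)   # where the grain is read
        dst = rng.randrange(out_len - grain_len + 1)       # where the grain is written
        for i in range(grain_len):
            out[dst + i] += source[src + i] * window[i]
    peak = max(abs(s) for s in out) or 1.0                 # normalize to avoid clipping
    return [s / peak for s in out]

# A decaying noise burst stands in for a recorded glass hit (assumption).
noise = random.Random(1)
hit = [noise.uniform(-1.0, 1.0) * math.exp(-3.0 * i / 4000) for i in range(4000)]
texture = granulate(hit, out_len=16000)
```

Raising `n_grains` thickens the texture, and extending `out_len` stretches it for as long as needed.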

more neural recordings

For more varied sound sources, see the page on SampleRNN.
