As part of our collaboration with Die Graphische, student Valentin Haring writes about how music and technology have interacted over the past decades – interviewing DADABOTS, who use machine learning to make music, on what this means for the art today.
Computers have only existed in our world for a few decades, but the impact they have had on culture, art and life in general is undeniable. With computers profoundly influencing every aspect of our lives, music is, of course, not exempt from this change. Thanks to programs such as Ableton Live, FL Studio, and Reason, making music is more approachable and easier than ever. In theory, anyone with a computer could produce the next chart-topping hit on it, provided that they put in the work to learn the software and manage to capture the zeitgeist.
With all these possibilities in front of us, let’s look into how we got here and what has happened in computer music over the last 70 or so years.
January 1st, 1951 – the beginning
The earliest known recording of music played by a computer was made on the Ferranti Mark 1, a massive machine that played only three songs. While that doesn’t seem too impressive now, back then it showed what was possible and laid the foundations for what was to come.
August 1982 – MIDI
MIDI, short for Musical Instrument Digital Interface, was introduced into the lives of digital musicians. The technology allowed different musical instruments to communicate with each other and with a computer. Now almost 40 years old, MIDI is still the industry standard today. There will be more on MIDI and what you can do with it later in the article.
April 1989 – Cubase
The music company Steinberg released its music-making software Cubase, which quickly revolutionized the way digital audio workstations (DAWs) functioned.
The early ’90s – Audio recording
Until the ’90s, computers were mainly used to sequence external hardware instruments via MIDI. After the release of Steinberg’s Cubase Audio, this rapidly changed, and computers were now able to record sound. With that advancement, the foundations were laid, and the hardware and software for creating music have continued to improve steadily ever since.
Randomly generated melodies
That brings us to the main topic of this article. We are in the 2010s now, and recent years have seen a big rise in the use of artificial intelligence across all fields. It’s basically the beginning of computers creating things on their own. But before I talk about computers making music completely on their own, let me touch on randomly generated melodies, using the MIDI technology I mentioned above.
At a basic level, MIDI communicates which note is played. You can then sequence these MIDI notes in a digital audio workstation to write melodies and songs. Most newer DAWs offer some way to randomize MIDI input. This feature can be used to let the computer come up with melodies for you, but you still have to give it some rules. Without any kind of direction the result would sound terrible, so you can tell it to stick to certain scales, rhythms and so on. With a bit of work, you can make these melodies sound quite pretty. But that’s still not quite what we are looking for: the music is still not produced exclusively by the computer.
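To make the idea of rule-bound randomness concrete, here is a minimal Python sketch. It is not how any particular DAW implements its randomizer; the function names `random_melody` and `note_on` and the `max_step` rule are assumptions made up for this illustration. The only rules given to the computer are a scale to stay in and a limit on how far the melody may jump from one note to the next.

```python
import random

# Semitone offsets of a major scale relative to its root note.
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


def random_melody(length=8, root=60, scale=MAJOR_SCALE, max_step=2, seed=None):
    """Return `length` MIDI note numbers drawn from one octave of a scale.

    root=60 is middle C in MIDI. `max_step` (a hypothetical rule for this
    sketch) limits the jump between neighbouring notes, in scale degrees.
    """
    rng = random.Random(seed)
    degree = rng.randrange(len(scale))
    notes = []
    for _ in range(length):
        notes.append(root + scale[degree])
        step = rng.randint(-max_step, max_step)
        # Clamp so the melody stays inside the chosen octave.
        degree = min(max(degree + step, 0), len(scale) - 1)
    return notes


def note_on(note, velocity=100, channel=0):
    """Encode a MIDI note-on message: status byte, note number, velocity."""
    return bytes([0x90 | channel, note, velocity])


melody = random_melody(length=8, seed=1)
messages = [note_on(n) for n in melody]
```

Passing a different `scale` tuple or a smaller `max_step` changes the character of the result, which is the kind of direction the computer still needs from a human.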
What music is made exclusively by a computer, then? Music made by artificial intelligence. In recent years, media attention towards this technology has grown; something about computers capable of learning by themselves fascinates everyone who hears about it. And with computers becoming more powerful every year, the act of training them is becoming democratized. As a result, an increasing number of independent programmers and artists use so-called neural networks to experiment with a new way of creating art. And what is a neural network? Simply put, it’s a model that uses machine learning algorithms to learn patterns from the data you put into it.
But what does all of this have to do with the topic of computer-generated music? Well, you’ve probably already guessed it by now: some people have trained neural networks to make music. How does it work? Basically, you feed the neural network enough material to listen to, then you wait and let the computer learn the material’s characteristics and how to recreate them. Two people who do just this to create music are CJ Carr and Zack Zukowski. Together, their musical persona is called DADABOTS. They train neural networks to make Black Metal and Math Rock, and they’ve even trained a network on the music of the Beatles.
But does it really sound like the music it is based on? Well, yes and no. We humans are pretty bound to specific rhythms and melody types, and these get lost when AI creates music. It does, however, create soundscapes similar to the music they’re based on, which makes them an extremely interesting listen. But don’t take my word for it: I’ve interviewed DADABOTS themselves to find out more about how it’s done and what is possible, and they’ve provided some extremely interesting details about both the technology itself and their DADABOTS project.
What is DADABOTS?
Not sure what DADABOTS is. We’re a cross between a band, a hackathon team, and an ephemeral research lab. We’re musicians seduced by math. We do science; we engineer the software, we make the music. All in one project. We don’t need anybody else. Except we do, because we’re standing on the shoulders of giants, and because the whole point is to collaborate with more artists.
And in the future, if musicians lose their jobs, we will be the scapegoat. We jest: please don’t burn us to death. We’ll fight on the right side of musical history, we swear.
How did you get started working on DADABOTS?
We started at Music Hack Day at MIT in 2012. We were intrigued by the pointlessness of machines generating crappy art. We announced that we set out to “destroy SoundCloud” by creating an army of remix bots, spidering SoundCloud for music to remix, posting hundreds of songs an hour. They kept banning us. We kept working around it. That was fun.
How does creating your music with neural networks work?
We started with the original SampleRNN research code, written using Theano. It’s a hierarchical Long Short-Term Memory network. LSTMs can be trained to generate sequences. Sequences of whatever: it could be text, it could be the weather. We trained it on the raw acoustic waveforms of metal albums. As it listened, it tried to guess the next fraction of a millisecond. It played this game millions of times over a few days. After training, we asked it to come up with its own music, similar to how a weather forecast machine can be asked to invent centuries of seemingly plausible weather patterns.
It hallucinated 10 hours of music this way. That was way too much. So we built another tool to explore and curate it. We found the bits we liked and arranged them in an album for human consumption.
It’s a challenge to train nets. There are all these hyperparameters to try. How big is it? What’s the learning rate? How many tiers of the hierarchy? Which gradient descent optimizer? How does it sample from the distribution? If you get it wrong, it sounds like white noise, silence, or barely anything. It’s like brewing beer. How much yeast? How much sugar? You set the parameters early on, and you don’t know if it’s going to taste good until way later. We trained hundreds of nets until we found good hyperparameters and then published it for the world to use.
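As an editorial aside, the guessing game described in this answer can be illustrated with a toy Python sketch. This is not SampleRNN and not DADABOTS’ code: a simple lookup table of counts stands in for the LSTM, a sine wave stands in for the metal albums, and the names `LEVELS`, `CONTEXT`, `train` and `generate` are assumptions made up for the example. What it keeps is the principle: quantize the waveform, learn which sample tends to follow the previous ones, then generate new audio one sample at a time by sampling from that learned distribution.

```python
import math
import random
from collections import Counter, defaultdict

LEVELS = 16   # number of quantization levels (toy value)
CONTEXT = 2   # how many previous samples the model sees


def quantize(x):
    """Map a sample in [-1, 1] to an integer level in [0, LEVELS - 1]."""
    return min(LEVELS - 1, int((x + 1.0) / 2.0 * LEVELS))


def train(samples):
    """Count which level follows each run of CONTEXT previous levels."""
    counts = defaultdict(Counter)
    for i in range(CONTEXT, len(samples)):
        counts[tuple(samples[i - CONTEXT:i])][samples[i]] += 1
    return counts


def sample_next(counter, temperature, rng):
    """Sample a level from the counts, reshaped by a temperature.

    Temperature 1.0 follows the observed frequencies; lower values favour
    the most common continuation, higher values flatten the distribution.
    """
    levels = list(counter)
    weights = [counter[lv] ** (1.0 / temperature) for lv in levels]
    return rng.choices(levels, weights=weights, k=1)[0]


def generate(counts, seed_context, length, temperature=1.0, seed=None):
    """Generate `length` new levels, one at a time, from the learned counts."""
    rng = random.Random(seed)
    out = list(seed_context)
    for _ in range(length):
        counter = counts.get(tuple(out[-CONTEXT:]))
        if not counter:
            # Unseen context: fall back to a uniform guess.
            out.append(rng.randrange(LEVELS))
        else:
            out.append(sample_next(counter, temperature, rng))
    return out[CONTEXT:]


# Training data: a sine wave standing in for "raw audio".
wave = [quantize(math.sin(2 * math.pi * t / 40)) for t in range(2000)]
model = train(wave)
generated = generate(model, wave[:CONTEXT], length=200, temperature=0.5, seed=0)
```

Even in this toy, `LEVELS`, `CONTEXT` and `temperature` behave like the hyperparameters mentioned above: they have to be chosen before training or generation, and a poor choice only shows up in the output.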
What’s the difference in your approach to generating music and other methods to make music, such as randomizing MIDI inputs?
We trained it completely unsupervised. There’s no knowledge of music theory. There’s no MIDI. There’s nothing. It’s just raw audio. It’s surprising that it works in the first place. What I love about unsupervised learning is that it gives hints into how brains self-organize raw data from the senses.
Do you think music generated by neural networks will have the potential to reach mainstream success? Is there any specific reason why you are focusing on math rock and black metal to generate, rather than other, more mainstream genres?
For some reason, other AI music people are trying to do mainstream. Mainstream music is dead. Solid. Not alive. Rigor Mortis. Any new music idea it gets has been harvested from the underground. The underground has always been home for the real explorers, cartographers, and scientists of music. The mainstream finds these ideas and beats them like a dead horse until they’re distasteful. Why should a musician set out to do mainstream music? Because they want to be famous while they’re alive?
Becoming mainstream has been important for subcultures that were underrepresented and needed a voice. Teenagers. African-Americans. etc. Whereas tech culture already dominates the world. It’s swallowing the music industry whole. What does it have to gain by making mainstream music?
Math Rock and Black Metal are the music we love. They have a special place with us. Whereas many new black metal bands sound like an imitation of early ’90s black metal, albums like Krallice’s Ygg Huur push it to new places I’ve never felt before. The research is fresh. Rehashing old sounds is like publishing scientific papers on the same old experiments. That’s no way to keep music alive.
Listening to your music, the voices created by the neural network sometimes sound eerily similar to real ones. Do you think there will be a point where the artificial intelligence can incorporate real words and coherent sentences into the generated song?
As of 2016, this was possible. Did anyone try it? Realistic end-to-end text-to-speech is achievable with Tacotron 2 as well as others. Applying the same idea to singing is possible. Aligned lyrics-music datasets exist. Has anyone tried to train this net? It’s expensive to do this. You need hundreds of thousands of dollars’ worth of GPU hours. Give us the resources, and we’ll do it.
How do you think artificial intelligence will influence music in the years to come?
Think cartography – mapping the deep space between all the songs, all the artists, all the genres. Think super-expressive instruments – think beatboxers, creating full symphonies with their voice. Think autistic children, etc., in the context of music therapy, making expressive music, gaining a cultural voice.