Episode 6: Pure Cybernetic Chaos with Axel Chemla—Romeu-Santos, Post-Doctorate @ IRCAM

Axel Chemla—Romeu-Santos, Post-Doctorate @ IRCAM, joins Nick and Jamal to discuss using AI for music generation, the tension between control and randomness, and the limitations of current AI technology.
Listen on Spotify
Listen on Apple
Listen on YouTube

Transcript

(music: intro)

Nick Charney Kaye
Welcome to the XJ music podcast, where we explore new possibilities in background music for video games, live streaming and environments. I’m founder Nick Charney Kaye, here with co-founder Jamal Whitaker.

Jamal Whitaker
Hey hey!

Nick Charney Kaye
Everyone’s talking about AI these days. On one side, AI evangelists rejoice in their newfound ability to create art without having any actual artistic ability or committing the years of discipline required to master a craft. On the other side, true artists lament the widespread plagiarization and commodification of their life’s passion. We’re joined today by Axel Chemla—Romeu-Santos, who is both an artist and a technologist, a published computer scientist, a graduate of the prestigious IRCAM institute in Paris, and a researcher exploring sound jointly through music, sound design, and science. He’s the real deal, pushing the boundary of music-generating neural networks and music itself. Welcome to the show.

Axel Chemla—Romeu-Santos
Thanks. So yeah, I’m Axel Chemla—Romeu-Santos, with a French accent. I graduated both in music and in engineering science, and this is how I entered IRCAM. Actually, initially I just wanted to create weird sounds. It was really at the very beginning of neural networks, before it was cool, as a hipster would say. So we started experimenting with all this very, very basic stuff at the time; I started in 2014. And I started, of course, with musical ambitions. I was making music alongside my PhD, and we did one of the first real-time performances at the time with spectral variational autoencoders, with a friend of mine who was working on a reinforcement agent made to explore the parameters of synthesizers. Let’s say that these kinds of machine learning generators have parameters. So we made these first performances with his agent, which he controlled with his hands, generating music just with that. And then I graduated just one week before COVID.

Jamal Whitaker
Same here.

Axel Chemla—Romeu-Santos
Good luck, right? And so it was rather quiet here, I mean, in terms of research, quite quiet of course, because research, it’s not said enough, is really always a collective adventure. I made a lot of compositions involving all my work, and also developed a framework to somehow reach extrapolation with machine learning, meaning, for example, how we can use these models to generate novel sounds instead of just reproducing existing ones. There is something very important also to have in mind: what we now call Artificial Intelligence is a subfield of artificial intelligence that we call machine learning. I mean, artificial intelligence has existed for a very, very long time. The inventor of the word is Norbert Wiener; that was in the 50s, the 60s. He’s a brilliant guy, and it’s very interesting to read his writings, because he was a technician but also a philosopher. And all the problems we deal with now, he forecast them in ways that are interesting. His book is called God and Golem, you know? The Golem, like the touch of God that turns mud into soul. And so it’s really fascinating to see that all the things we’re facing now were really conceived since the beginning by its very creator.

Nick Charney Kaye
Wow, yeah.

Jamal Whitaker
Wow.

Axel Chemla—Romeu-Santos
But at the time, what I really liked was asking myself the question: what are the new problems we are facing, more than just automation? That has been a philosophical and social problem since the industrial revolution, so it’s really not new. What is really new now is that this subfield of artificial intelligence called machine learning is actually inverting the way of doing technique, and also science, because first it was the human mind shaping architectures of knowledge, and now we have totally inverted the process: we have dumb machines with a lot of data, and we expect these machines to find the solutions on their own, involving, of course, existing data. For generation it’s a very, very big topic, because if you take generative systems from before machine learning, they were constrained random systems, like in games like Rogue, you know, roguelikes, of course. It has been implemented in video games since their beginning, conceiving generators that use randomness to create architectures and this kind of stuff. Now it’s not really the same, because you have to have existing data, and the machine will learn how to reproduce it, but in a way that won’t really escape what it has been trained on. It uses randomness, but in another way. For example, one of the main generative systems is called the L-system; it’s how we make trees in video games. You have a line, at a random point on the line it can split in two, and again, and again, and again, and with this very, very simple formula you can actually make things that are not trees at all. With machine learning it’s not the same process, because you teach the machine to reproduce. For example, bots from Counter-Strike 1 are not like the bots that play StarCraft by learning from a million games, you know? It’s pretty much the same distinction.
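For the curious, here is a minimal sketch of the kind of constrained-random generator being described: a toy L-system that rewrites a string and walks it as a branching figure. It is purely illustrative; the axiom, rules, and angles are arbitrary choices for the example, not anything from the episode.

```python
import math
import random

# Toy L-system: a rewriting rule plus a jittered branching angle is enough
# to grow tree-like structures, or, with other rules, shapes that are not
# trees at all.
AXIOM = "X"
RULES = {"X": "F[+X][-X]FX", "F": "FF"}  # a classic branching rule

def expand(axiom: str, rules: dict, iterations: int) -> str:
    """Apply the rewriting rules to every symbol, repeatedly."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def interpret(commands: str, step: float = 1.0):
    """Walk the symbol string as turtle graphics and collect line segments.

    F = draw forward, + / - = turn (with a random jitter),
    [ ] = push / pop the current branch state.
    """
    x, y, angle = 0.0, 0.0, 90.0
    stack, segments = [], []
    for ch in commands:
        if ch == "F":
            nx = x + step * math.cos(math.radians(angle))
            ny = y + step * math.sin(math.radians(angle))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif ch == "+":
            angle += 25 + random.uniform(-5, 5)
        elif ch == "-":
            angle -= 25 + random.uniform(-5, 5)
        elif ch == "[":
            stack.append((x, y, angle))
        elif ch == "]":
            x, y, angle = stack.pop()
    return segments

if __name__ == "__main__":
    tree = interpret(expand(AXIOM, RULES, iterations=4))
    print(f"{len(tree)} branch segments generated")
```

Swap the rules or the angles and the same machinery produces shapes that have nothing to do with trees, which is the kind of escape from the source material that, as Axel notes, learned models struggle to make.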

Jamal Whitaker
Right.

Axel Chemla—Romeu-Santos
And this is very different, grounding the algorithm on existing data. Actually, it’s kind of schizophrenic, because you want to make something new, but you evaluate it and you train it so that it doesn’t. So it’s really paradoxical in some way.

Nick Charney Kaye
Interesting. We’ve been looking over RAVE, the real-time audio variational autoencoder, for training a neural network on audio and then generating audio from that. And we were curious about, for example, your latest ACIDS release, Genesis, which you said is “a first attempt of using audio generative neural networks without the resort to any human data, amplifying own materiality of the algorithms towards cybernetic chaos rather than nature imitation.” Tell us about that.

Axel Chemla—Romeu-Santos
Yeah, the idea of no-data training. First, you have to say hello to Terence Broad, who is one of the first guys that actually did that. He used two generators just comparing themselves against each other: rather than learning by being fed some data, they were trained simply to make things different from each other. He made a series of works with that. And the idea was to extend this: rather than taking neural networks and training them on similarity to existing data, I take single agents, and instead of training them on how to reproduce data, I just say to them, give me the most different things you possibly can. It still has an input, because of course every neural network has an input and an output, and you train it that way; you cannot really overcome that. But what you can do is make a loss function, meaning the function we use to train the algorithm, that does not rely on any data. So I just take random inputs and, with some criteria, I say to the network: make me whatever you want, I just want you to make the most different possible things. Everything that you try to hide when you design your networks for reproduction is exactly what you use for no-data training, because you just over-emphasize all the artifacts and bugs of the thing instead of trying to hide them. And this is why, of course, the output is not really pop music, that’s for sure.
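As a rough, hypothetical illustration of that setup, not the actual Genesis code, the sketch below trains a tiny generator on random inputs with a loss that only rewards diversity between outputs; the architecture and the pairwise-distance criterion are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Sketch of "no-data" training: no dataset anywhere, just a loss that asks
# the generator for maximally different outputs from random inputs.

class TinyGenerator(nn.Module):
    def __init__(self, latent_dim: int = 16, out_samples: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_samples), nn.Tanh(),  # crude stand-in for an audio frame
        )

    def forward(self, z):
        return self.net(z)

def diversity_loss(batch: torch.Tensor) -> torch.Tensor:
    """Negative mean pairwise distance: minimizing it pushes outputs apart."""
    dists = torch.cdist(batch, batch)                 # (B, B) pairwise L2 distances
    mask = ~torch.eye(len(batch), dtype=torch.bool)   # drop self-distances
    return -dists[mask].mean()

gen = TinyGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(200):
    z = torch.randn(32, 16)          # random inputs, never any human data
    loss = diversity_loss(gen(z))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

In practice the criterion would be something perceptually meaningful on audio, but the shape of the training loop is the point: no dataset ever enters it.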

Nick Charney Kaye
Sure, sure.

Axel Chemla—Romeu-Santos
But you’re hearing raw bits, which is very interesting, because you really have direct contact with the materiality of the thing, you know, if you really like experimental music.

Nick Charney Kaye
What you’re doing is kind of the opposite of what it looks like some of your earlier work was, in regularization of those latent spaces. You’re kind of doing the opposite of that.

Axel Chemla—Romeu-Santos
I use things like modulated densities, or additive synthesis, but I really go into the underlying technical and experimental capabilities of what it can make. It was also, of course, inspired by industrial music, because it’s really about forgetting what we expect as humans. A little like systems theory in art, you know, in the 60s: just let the thing do its own work. I also developed a library to be able to hijack the inner parameters of the machine in real time to generate stuff. And then, as a performer, I am really lost in the huge variety of control. It’s not really control, actually, it’s really co-improvisation. When I entered this domain I still had a very simple mind; I was very used to synthesizers and that kind of stuff. Okay, a synthesizer has many buttons. But a neural network has the capacity to shape everything possible; it has like a billion buttons. And it was kind of naive, let’s say, very genuine, but that’s what I have been doing for two years. So I had, you know, a gap of almost ten years, some kind of imbalance. You know, you always do what you want the most, at the very end of the day, when you’re tired. It’s really a full toolbox to take any possible existing AI thing that is open source and perform network bending on it: being able to do what you do with circuit bending, when you touch the wires and make experiments to change the sound. And I made a library to be able to do this on MusicGen from Meta, for example, or on diffusers, and to perform something new. That is much more interesting, actually.
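To give a concrete flavor of network bending, here is a minimal sketch (not Axel’s library) that registers a forward hook on an inner layer of a placeholder PyTorch model and warps its activations on the fly, the software cousin of touching wires in circuit bending; the model, layer choice, and distortion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Network bending, sketched: intercept an internal layer of a model and
# distort its activations before they reach the next layer.

def make_bend(gain: float = 5.0, fold: float = 0.8):
    def bend(module, inputs, output):
        # Overdrive the activations, then wavefold anything past the threshold.
        driven = torch.tanh(output * gain)
        folded = torch.sign(driven) * (2 * fold - driven.abs())
        return torch.where(driven.abs() > fold, folded, driven)
    return bend

# Placeholder network standing in for any open-source generator.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64),
)

# Hook an inner layer; keeping the handle lets you "unbend" mid-performance.
handle = model[2].register_forward_hook(make_bend())

with torch.no_grad():
    bent_output = model(torch.randn(1, 64))

handle.remove()  # the network is back to its unmodified behavior
print(bent_output.shape)
```

The same pattern applies to real generators: point the hook at whichever internal block you want to bend, and treat gain and fold as live performance controls.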

Nick Charney Kaye
Very cool. That’s a new term for me, network bending.

Axel Chemla—Romeu-Santos
That comes from Terence Broad, actually. Yeah, he was interesting; he was a real pioneer at the time, around 2018. Somehow I always say things that seem very old, but it’s just six years ago.

Jamal Whitaker
Moving fast!

Nick Charney Kaye
I know, I’ve been at this all for a long time.

Axel Chemla—Romeu-Santos
When I was doing my PhD, GPT-3 was really the thing that made me understand the new wave of artificial intelligence, because, you know, GPT-2 was still a research project, and then GPT-3 was really emblematic of that, because the paper was basically: we just multiplied the number of parameters, the capacity of the model, by an enormous factor. I mean, it was something big.

Nick Charney Kaye
Huge model.

Axel Chemla—Romeu-Santos
Yeah, a huge model. They multiplied the data too. But at the time, you just didn’t see the data they were training their models on, and failures were not published. They hid the data, they just multiplied everything by a huge factor, and they said, oh, it works better. Yeah. But it’s not always the case.

Nick Charney Kaye
It worked more.

Axel Chemla—Romeu-Santos
Yeah, exactly.

Nick Charney Kaye
So that’s fascinating to me. When we heard “no input,” we thought, well, you know, no human input to the performance. But it actually sounds like it’s the other interpretation: you mean that there’s no input training data, but then you, as the person in the space, are actually deeply involved in the performance?

Axel Chemla—Romeu-Santos
Yeah. I worked with my supervisor, who is of course one of my best friends, Philippe (hello, Philippe), and also with my friend Hugo, and also with musicians, because I am a musician myself, and because IRCAM is a place where composers actually compose. You know, computer scientists are always targeting ways of automating things, of course, but you have to deal with musicians, and honestly, musicians really don’t care if you have a machine that will reproduce a violin, because they play violin. Sorry, but it doesn’t matter so much for them. For video games, for example, everything that has to do with automation is made to deal with something that we call, in human-computer interaction, empowerment: it’s made for people who don’t have the skills to make such things. And for a musician, honestly, that’s not interesting at all. So I had to think: what is the real interest of these machines from a research perspective? And it’s, of course, the deconstruction of reality. At the very end, I mean, Genesis to me is a spectrum, you know: from full dependency on the data to no data at all, and all the possible things in between. These pieces are experiments in being very radical on those aspects.

Nick Charney Kaye
Interesting. The work that we’ve been getting into has to do with video game composition. We’re working with composers who already have, you know, strong musical backgrounds and traditional ideas about the way they want to build the mood of a video game. And within that, we’re trying to offer more randomization, more variety. But people composing for that space want a lot of control, I think.

Axel Chemla—Romeu-Santos
Of course. I mean, technology is made for that. There is always this kind of schizophrenia between the need for control and the need for randomness.

Nick Charney Kaye
Yeah,

Axel Chemla—Romeu-Santos
To release a generative model that can be commercialized and this kind of stuff, you have to prove, for example, that your model does not learn existing data by heart, and that it actually manages to mix things to the point where, under some constraints, it is not reproducing any one thing. For example, if it makes music, it really makes some kind of mash-up, a mixdown of every possible pop music, of what pop music is in 2023, but it will not be a specific song, right? For example, there might be a rule that fewer than something like 0.8 percent of the model’s outputs can be that close to an existing song. But this is not new, and this is what also interests me: it is not a new problem, because it’s the same old tension between control and randomness. Randomness being, for example, the behavior of the player, but also the behavior of the other agents in the video game. It’s always a trade-off between both.
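Purely as an illustration of that kind of constraint (not any actual compliance procedure), a memorization check might compare embeddings of generated clips against the training set and report the fraction that land too close; the embeddings, distance, and threshold below are placeholder choices.

```python
import numpy as np

# Illustrative memorization check: how close does each generated clip's
# embedding come to anything in the training set, and what fraction of
# outputs fall under a "too similar" threshold?

def nearest_distance(gen_emb: np.ndarray, train_emb: np.ndarray) -> np.ndarray:
    """For each generated embedding, distance to its closest training item."""
    # (G, T) pairwise distances via broadcasting, then min over the training axis.
    d = np.linalg.norm(gen_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    return d.min(axis=1)

def too_similar_fraction(gen_emb, train_emb, threshold: float) -> float:
    return float((nearest_distance(gen_emb, train_emb) < threshold).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(1000, 64))   # stand-in training embeddings
    gen = rng.normal(size=(200, 64))      # stand-in generated embeddings
    frac = too_similar_fraction(gen, train, threshold=2.0)
    print(f"{frac:.1%} of outputs fall under the similarity threshold")
```

A real check would use a perceptually meaningful audio embedding, but the shape of the measurement is the same.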

Jamal Whitaker
Talking about control, if we delve into the realm of control and music: we were listening to your group, and the fact that you had used RAVE to take your whole discography and do essentially what you were talking about, a mixdown of Daim’s entire discography into one album, and then generate it again. Tell us a little bit about the group and that method, because it really produced some incredible results.

Axel Chemla—Romeu-Santos
Yeah, it was really weird. One of the members of Daim is the friend I made the AI performance with, and the third guy has been one of my best friends for a very, very long time. It’s a really strange story as a band, because, yes, we are kind of artificial intelligences ourselves: at the very beginning we really tried to copy popular music, and we discovered very quickly that we couldn’t. So yeah, it was a struggle to find ourselves. What really unleashed things is when we found ourselves good at attacking our unconsciousness, which was always based on internet memories and, of course, artificial intelligence, because we were working on that. And I was working on RAVE with Antoine at the time, the creator of RAVE. There is a phenomenon called cycle consistency in the technical domain: for example, if you enter an image into an AI, one way to evaluate the model is to feed the output back into the input some number of times, because if there are errors during the reconstruction, it’s a good way to see whether the model will amplify them. We were making some experiments with that, and we found it really fascinating, because of the process of normalization: you take something, you normalize it, and then you apply the process over and over again.
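A minimal sketch of that feedback loop, with a toy autoencoder standing in for a real model like RAVE, might look like this; the architecture and the drift measurement are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Iterated reconstruction ("cycle" feedback): push a signal through an
# autoencoder, feed the reconstruction back in, and repeat. Small errors
# either wash out or get amplified into something new.

class ToyAutoencoder(nn.Module):
    def __init__(self, dim: int = 512, latent: int = 16):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, latent), nn.Tanh())
        self.decode = nn.Linear(latent, dim)

    def forward(self, x):
        return self.decode(self.encode(x))

def iterate(model: nn.Module, x: torch.Tensor, passes: int):
    """Return the signal after each successive pass through the model."""
    outputs = []
    with torch.no_grad():
        for _ in range(passes):
            x = model(x)
            outputs.append(x)
    return outputs

model = ToyAutoencoder()
signal = torch.randn(1, 512)   # stand-in for one frame of audio features
for i, out in enumerate(iterate(model, signal, passes=8), start=1):
    drift = (out - signal).norm().item()
    print(f"pass {i}: drift from the original = {drift:.2f}")
```

Run the same loop on a trained encoder/decoder and it shows whether reconstruction errors settle into a fixed point or pile up into something new.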

Jamal Whitaker
Yeah.

Axel Chemla—Romeu-Santos
And this is how we discovered it, and I was somehow doing that myself. Using it this way is very paradoxical, because the technology carries this aura of progressivism, it’s really presented as technical progress, but when you think about it, it may also be understood as the most reactionary technology ever, because it will always reproduce what it has been trained on. And for us it was also an artistic take to show this phenomenon, but in music, and not with any big discourse, just to make it felt as a musical experience. Twenty-four songs for the twenty-four hours of a working man, and it’s pretty easy: these days it’s just advertisement music, with very precise things. I mean, the final track is so fucking depressing; it’s always making the same stuff, something mushy, almost vaporwave, but for nowhere.

Jamal Whitaker
Right, right.

Axel Chemla—Romeu-Santos
It’s very fascinating.

Jamal Whitaker
Yeah, it’s definitely fascinating. I do my own production myself; I go by the name of Voodoo Lion. I mostly do hip hop beats, you know. And I found myself thinking: what if I were to do essentially the same thing and take my entire discography? How would I do that? Would it make a difference if I parsed it down, broke all my beats down into stems? It was a really fun thing to imagine while I was listening to this project.

Axel Chemla—Romeu-Santos
Yeah, what is kind of funny is that it really depends on the music. For example, in our music we were changing aesthetics like every ten seconds, and the model was totally unable to learn what we were feeding it. We didn’t even have the choice, actually, because we’re really, really bad. I even trained a thing on our multitracks; it was even worse.

Jamal Whitaker
Oh, okay. Interesting. Interesting, because it just had less to go on.

Axel Chemla—Romeu-Santos
Yeah, to be honest, it also depends a lot on the technology you are using, because RAVE is kind of an in-between, between the old machine learning of the 2010s and the novel machine learning you see now with Midjourney and blah, blah, blah.

Jamal Whitaker
Sure, yeah.

Axel Chemla—Romeu-Santos
There are a lot of ways, for example, to make very precise outputs. And honestly, it was both: we did not want to do that so much, and we also couldn’t get there, because at IRCAM we didn’t have the computational capacity. In terms of scale, it’s not even comparable to what is needed for things like Midjourney and blah blah blah.

Nick Charney Kaye
These huge machines, yeah

Axel Chemla—Romeu-Santos
And then there are things that you can use on your own little laptop. As for your discography, I don’t know, just tell me which one and we’ll train one on yours if you want.

Jamal Whitaker
Awesome.

Axel Chemla—Romeu-Santos
That’s actually no problem.

Axel Chemla—Romeu-Santos
Yeah, so I started to organize a series of events with composers and developers who were interested in getting involved in really musical creation, which is definitely a different agenda from the one we were used to having in scientific pursuits. So now I am more in the artistic field, trying to find ways to use all of that and also to find the conceptual frames. And now I am actually more critical about what’s going on with the advent of generative AI, which is everywhere in a way that it was not in my time.

Jamal Whitaker
Way back then.

Axel Chemla—Romeu-Santos
Exactly. It was also to push my scientific colleagues who were interested in music but constrained themselves to doing just science: no, use your stuff. Otherwise you’re going to work with a composer, and you cannot be proud of what you do if you don’t use it yourself. And use it with an audience, with a full process. These concerts are free, too. The thing is, I am a musician, and I cannot let myself make bad music. You know, when you’re a musician, you hold yourself to the process of feeling confident with what you do. And I also work with a circus, friends of mine who are really cool and welcoming. And it’s a circus, so it’s amazing.

Nick Charney Kaye
That’s awesome. It does look really cool. I’d love to check that out.

Axel Chemla—Romeu-Santos
One of the few things that really frightens me about artificial intelligence is the communication that is made around it, you know: when you are on Twitter, when you read the newspapers, when you see what’s online. I mean, that’s really the worst information you will ever have. And it really worries me, because if that’s the case for AI, I really wonder who I can trust for everything else that is important.

Nick Charney Kaye
It’s when you actually know the territory, that’s when you realize how bad the information is.

Axel Chemla—Romeu-Santos
Yeah, that’s pretty frightening. We had, for example, some experiences with journalists, because the question of “will the machine replace the human” was really not our thing, right? We did a documentary with journalists, and we told them like a hundred times that the point was not to show that. And when we saw the documentary, at the end it was just a blind test: “is it machine or human?” Wow. It’s a betrayal. First, it’s a betrayal. But I think this is unfortunately common in journalism.

Jamal Whitaker
Right.

Axel Chemla—Romeu-Santos
Just saying “the AI, the AI,” sorry, just saying “The AI,” as if it were some kind of, you know, meta-object that surrounds us, that just exists, or that is unique. And it’s not: again, it’s laboratories making products that are different from each other, that don’t have the same behavior, blah, blah, blah. Making that into a single common discourse around artificial intelligence is nonsense.

Jamal Whitaker
Yeah, it’s just at such an infantile level.

Axel Chemla—Romeu-Santos
Exactly. And, you know, making the AI into something from above has, since the beginning, removed the power of people to understand it, to fight, to have an opinion, and blah, blah, blah, instead of leaving just a reaction of fear. And of course, sorry, but there really are different brands of AI being sold to you; you know, if you want to mount a business now without AI, you’re screwed. So it’s really a circle of total alienation in some way: you make people afraid, people need you, you sell it to them, and, you know, it’s like dependency.

Jamal Whitaker
Yes. Yeah,

Nick Charney Kaye
People just cannot resist anthropomorphizing these models, you know, and the idea of whether it understands you. I try to explain to people I know who don’t have any background in this: you type something into ChatGPT, and it is just reflecting back to you what you typed in. There’s no understanding; it does not actually have its own mind or concepts or any of those programs in place. And it’s interesting what you say, to kind of make the religious analogy: I think we almost default to our most primal instincts and look at these things as being godlike, or somehow totally above us.

Axel Chemla—Romeu-Santos
But actually, it also works that way because scientists cannot explain it either.

Jamal Whitaker
Yeah.

Axel Chemla—Romeu-Santos
That is really important to understand: in the process of producing a machine learning system, you are designing the architecture, you choose the data, and you have already shaped the data with the criteria you give the machine to be evaluated on. Of course, there are ways to circumvent that, you know, the split between the data you train on and the validation data, blah, blah, blah, but actually it’s always biased. And of course, the idea of machine learning not being biased is total nonsense, because the training is biasing your machine.

Nick Charney Kaye
Yeah, it’s made of bias, yes.

Axel Chemla—Romeu-Santos
Yeah. For example, my favorite nonsense ever is this kind of thing: if you want to write an article saying your model is solid, you would use a perceptual test, of course, with real humans, with the caveat that you don’t have any money to do that, it takes a lot of time, and it’s an expertise in itself to make the procedure work. It’s a lot of friction and a lot of knowledge. So instead, people use a trained model to forecast the predictions from the data, and you’re like, oh wow. You make the problem the solution and the discourse, right?

Nick Charney Kaye
It’s just turtles all the way down.

Axel Chemla—Romeu-Santos
Exactly. And the problem is that, unfortunately, you can rarely do science this way. I mean, this is technique. And nowadays in machine learning, honestly, it’s mostly technique, not science anymore; that is my opinion. Science is more about really searching. Just the fact that you cannot publish a paper on a system that did not work proves that you’re not in science anymore. Okay, it’s a proof of concept, I mean, okay, that’s research, because you prove that this object can generate this thing. That is science. But the artistic repercussions of this? That is for scientists; it is not for musicians.

Nick Charney Kaye
And not a lot of middle ground.

(music: outro)

Nick Charney Kaye
If you’re interested in realizing new possibilities in background music for video games, live streaming, or environments, get at us here at XJ. We believe music is human.

Jamal Whitaker
Something that artists couldn’t even dream of. Now it’s at our fingertips.