Episode 3: Dan Spencer, Music Evangelist & VGM analyst

Dan Spencer, music evangelist & VGM analyst, joins Charney and Jamal to discuss the future of dynamic background music in video games, streams, and places, and the power of music to connect us.

Transcript

(music: intro)

Charney Kaye
Welcome to the XJ music podcast. I’m Charney Kaye, founder, and I’m here with Jamal Whitaker, co-founder.

Jamal Whitaker
Hey!

Charney Kaye
We’re here today with Dan Spencer, who is a musician (you make your own music), but you describe yourself as a music evangelist. Welcome.

Dan Spencer
Thank you. Super stoked to be here.

Jamal Whitaker
I was looking at your YouTube channel actually a little bit earlier. And you had gone on what you described as a rant. The video was titled “music teacher rants,” and in the description of what a song was to you, your definition of a song was that it tells a story. And I thought that was so dope because it encapsulates what we do as people in telling stories and what music means to us as storytellers.

Dan Spencer
If you listen to pieces like “Peter and the Wolf,” where each instrument represents a character, and you can hear the interactions of the characters, yes, they are stories. The same goes for video game music or film music, where particular sounds or melodies or instruments represent people or situations on the screen: a person, a place or a thing. Now, the fancy music word for this is leitmotif. But we can just say it’s representing a person, place or thing.

Charney Kaye
This whole project that we’ve embarked upon has centered around ambient music. That can mean a lot of different things, but the thread that seems to run through everything we’ve done is that it’s kind of hands-off: an hours-long stream, or the music in a space that stands still while people move through it.

Dan Spencer
I think, using ambient music as an example, the music becomes the soundtrack for whatever you’re doing at the time. Sometimes if you’re in a hotel lobby, there aren’t pop songs playing; there’s sound that’s creating an experience. There have been studies done specifically around ambient music helping people. The data does not say specifically that it was the music. In fact, researchers think that maybe the music gets us into a state where we become relaxed and our nervous system calms down, and by doing that we actually experience less pain.

Charney Kaye
What’s the difference for you between playing music live for people and playing music in a recording setting?

Dan Spencer
When I record, the focus for me is always on precision, quality and soul. Precision and quality, so that it is rhythmically and technically a viable performance that’s going to stand the test of time, because it’s being recorded. And when I play live, I honestly don’t care about quality. I care about having fun, I care about performing and I care about sharing with people. For example, I do these live streams where I break down video game music sight unseen, and oftentimes I will start playing over the video game music live, in real time, with no prior preparation or knowledge. There are hundreds of hours of me sounding pretty good, but also making mistakes, out there on YouTube that you can go watch. I also do music theory analysis: I listen to the music and then I tell everyone what’s happening in it. And you can check that out on the YouTube channel @musictheoryforgamers.

Charney Kaye
Very cool.

Jamal Whitaker
Oh, a different channel. Okay, because I didn’t see that on your main channel. And that was one of the questions that we had: are you a gamer?

Dan Spencer
Oh, yeah.

Charney Kaye
Dan’s done a series of videos where he’s actually streaming, with an audience talking to him live, playing through a game and really diving into a lot of the details of the music. Video game music is the most exciting place that we could be using XJ music. We’re specifically interested in putting this more advanced tool in the hands of video game music creators, and collaborating with people working on video game engines. It’s a really cool surface with a lot of new possibilities for dynamic music. So we wanted to talk to Dan and just get some ideas about how one would approach designing music for a video game.

Jamal Whitaker
One of the things we were looking at is predicting use cases for XJ music in the video game sphere. And one of the ways that we envision that happening is in an open world game such as Skyrim, with the various different states that you can be in, and what you would want to go for musically in an exploration sense versus a combat sense versus a dialogue state.

Dan Spencer
So I actually listened to one of the lead developers over at Bethesda. He was on the Lex Fridman podcast and I was listening to that episode, which is amazing. I don’t know if you guys have heard that; if not, if anyone’s into game music or video games in general, go check out that episode. One of the things that makes Skyrim specifically so unique is that all possible courses and quests are available to you at all times. So the only thing that would stop you from going and taking on a quest would literally be that your character might die before you get there, because the level of the challenge is too high for you to get through. So with something like that, you already have an extremely unique proposition for a game, where all possible things are available to you at all times. Within that, if we think about music and music states, everyone’s going to have a unique journey, past just picking what your avatar looks like and what order you do things in. And where that goes, who knows. I’ve never made video game music myself. So I’ve broken it down as an outsider, and I’ve put in several hundred hours, or a thousand hours at this point, I’m not even really sure. So I can give you my understanding from that perspective. I also know how to write music, so I can make some informed inferences there. I think that there’s a lot of technology that goes into video game music. There are a lot of exciting new options now in terms of dynamic tracks. So you can be walking around in an area and the music changes depending on actions you take, or depending on an item you are in possession of, or the way you use an item. The entire music can change. From what I’ve seen, the video game music people respond to best is always going to be in a game where there is an element of excitement and adventure.
The ambient tracks from Elden Ring, for example, people were like, “oh, yeah, that’s kind of cool.” And to do it, don’t get me wrong, takes a huge amount of artistry, a massive amount of talent, and a huge amount of skill. But in terms of what people get excited about, people get excited about the stuff that is moving, that’s groovin’, that is high energy, or the things that are high in emotional content, that are describing or painting the scenes where significant emotions are being felt, or depicted on characters or in cutscenes.

Charney Kaye
Video games that I’ve been playing my whole life did some version of this tracker, where you’ve got multiple layers of audio and they get activated or deactivated depending on what you do. And video game music is essentially ambient music. But you know, it’s as if you had ambient music following you around in your life, just like scoring your life,

Dan Spencer
Dan presses the button on the blender! Dun, dun-dun-dun! The horns come in! As I’m getting ready to do my workout on my bike, the drums come in! Chc, chc-i-chc, chc-i-chc. Well, I think the music actually has it easy, because the music can follow the lead of the character development. So the person who has the most work to do is the character developer, and then you could just assign different parts of music to the states of the character, if that makes sense. You get to have the creative legwork done for you. And then at that point, you step in and say, okay, how am I going to get X to translate? So it’s just like people. Some people get angry in a certain way. Some people get sad in a certain way. Some people have a propensity towards certain moods or states over others. And that would be what would make a believable character. And then from there, you would extrapolate and you’d say, “how do we get this character’s theme to translate?” as chilling by a log fire, versus walking through the woods, versus walking through the woods and now it’s getting a little scary because we’re hearing some noises. Where’s the music being pinned to? Because a leitmotif is for a person, place or thing, right? So it’s like, what is the music being pinned to? Is the music being pinned to the avatar that you’re talking to? Is the music being pinned to you? Is the music being pinned to an action? Is the music being pinned to what you’re doing? Is the music being pinned to where you are? Because once you get really complex with it, it’s like, why not have a combination of two, right? So it’s like you’re in the Dwarven ruins, but because you’re there with Sally, it’s the Dwarven ruins with this extra instrument playing that represents the fact that you’re there with Sally.
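
The idea of pinning music to a place, a state, and a companion, and then combining them, can be sketched in code. This is only an illustrative sketch: the layer names, the `GameState` class, and the `active_layers` function are all hypothetical and do not come from XJ music or any game engine.

```python
from dataclasses import dataclass, field

# Hypothetical layer names, made up for illustration only.
PLACE_THEMES = {
    "dwarven_ruins": ["ruins_pads", "ruins_drums"],
    "woods": ["woods_strings"],
}
MOOD_LAYERS = {"calm": [], "scary": ["low_tremolo"], "combat": ["war_drums"]}
COMPANION_LAYERS = {"sally": "sally_flute"}


@dataclass
class GameState:
    place: str
    mood: str = "calm"
    companions: list[str] = field(default_factory=list)


def active_layers(state: GameState) -> list[str]:
    """Combine the layers pinned to the place, the mood, and each companion."""
    layers = list(PLACE_THEMES.get(state.place, []))
    layers += MOOD_LAYERS.get(state.mood, [])
    layers += [
        COMPANION_LAYERS[name]
        for name in state.companions
        if name in COMPANION_LAYERS
    ]
    return layers


# In the Dwarven ruins with Sally: the place theme plus her extra instrument.
print(active_layers(GameState("dwarven_ruins", companions=["sally"])))
```

Keeping each pin in its own lookup table means a new character or place only adds an entry, rather than a new hand-written combination.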

Charney Kaye
Makes sense. We could start as video game designers with the list of people and places and things and start thinking about motifs for those.

Dan Spencer
Always, yes. Go to John Williams’ Star Wars, just Episode IV, A New Hope. Every character, every situation, everything: it’s literally just like a fastball down the middle. Here’s the new thing with them, you can close your eyes, and the movie unfolds in music. Like with many fantastic video game scores as well. I’m just using that as an example to throw back and go back a ways. But yeah, then I think the question really becomes: where are we pinning the motif? And what do combinations do? The Dwarven army shows up, or the resurrected Dwarven army ghosts show up, right? And it’s like, how does that change the theme of the Dwarven ruins?

Jamal Whitaker
Yeah, the process of where to pin music, and the modifiers of music, has been something that we’ve been talking about for a long time. For example, one of the games we referenced is an old one you might have played: the SSX series, which was the snowboarding game.

Dan Spencer
Gosh, I love that. I played that thing for days. Was that the one where you would get the Uber? I love that.

Jamal Whitaker
What does it sound like when you do a massive Uber trick off the side of a mountain and you’re spinning, you know, 360 degrees, tailspin, all this stuff? And the answer to that is kind of like, well, the music, if it has lyrics, drops the lyrics and becomes just an instrumental, and there’s a high pass that’s added. You’re given all these kinds of effects that take place to give you the sense that you’re doing something really cool, and then you hit the ground again, and then it’s back to normal.
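
The trick behavior described here (drop the vocals, add a high-pass filter, then return to normal on landing) amounts to switching between two mix settings. A minimal sketch follows, assuming a hypothetical `MixSettings` type and an arbitrary illustrative cutoff frequency; it is not how SSX or XJ music actually implements this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixSettings:
    vocals_on: bool
    highpass_hz: float  # 0.0 means the filter is bypassed


# Both presets are assumptions; the 400 Hz cutoff is only an illustrative value.
NORMAL = MixSettings(vocals_on=True, highpass_hz=0.0)
TRICK = MixSettings(vocals_on=False, highpass_hz=400.0)


def mix_for(in_air_trick: bool) -> MixSettings:
    """Return the trick mix while the player is mid-trick, else the normal mix."""
    return TRICK if in_air_trick else NORMAL


# Ride, launch into a trick, then land: the mix returns to normal on landing.
for in_air in (False, True, True, False):
    print(in_air, mix_for(in_air))
```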

Dan Spencer
You can see certain ideas used over and over again to represent large ideas. And then sometimes you hear those same ideas come back, and they don’t represent those large ideas, and they kind of throw you for a loop. Then they are a red herring. You get duped by the theme. So, for example, take a piece of music where the one chord, the tonic, alternates between the parallel major and minor. All other chords remain the same, but sometimes, at the end of the chord progression, the cadence arrives on a major one versus a minor one. And when we hear that in a game like Umineko, that implies mystery and uncertainty. When we hear that in a game like Fire Emblem: Three Houses, it’s representing the villainous duplicity of the main character. And when you hear that in Xenoblade Chronicles 3, it’s just a fun throw-away to create some texture in a night theme for a town. I heard that happen in this night theme, and I was like, “guys, is something crazy about to happen in this town?” And they were like, “No, it’s just chill, you just walk around and do stuff.” I was like, “Are you sure there’s not some hidden thing?” You have these musical levers that create leverage really by the factor of exposure, right? We hear certain sounds associated with certain things over and over again, and then we take those sounds to represent the things. So for example, the sounds that are used in video games to represent a deep ocean versus space are sometimes very, very similar, and indeed interchangeable. You could use one for the other. They’re totally different, but they’re also similar in the sense of, like, not a lot of life, not a lot of light, not a lot of oxygen. Pressure, right? Vacuum, pressure, right? There are all these similar characteristics that go into things, where we hear them and instantaneously you go, “Oh, yeah, we’re under the water. Oh, yeah, we’re in space.” There are answers.
And there are no answers, because it’s art. And one of the beauties of art is that you can create a schema in which to operate and only operate within that schema, and then everyone gets it. And then you create context from that schema. And then you can also throw everything out the window. And so when we’re talking about how to create music that sounds happy, well, probably we’re not going to use diminished chords. Probably not, unless it’s in a transitory way. Think about the Mario theme, right? That has a lot of crazy chords in it, but the overall feeling, Dub-up Bap-boo-dee-gonk, gunk!, has many, many, many things going on harmony-wise very quickly. It’s sort of this Tin Pan Alley type Broadway stuff, right, where it’s chord change, chord change, chord change, chord change. Just changing chords relentlessly is part of what gives it the feeling. As you said, you create your own world, and then you operate within it. And that’s art. And I think that in a lot of ways that is totally counter to the way that, let’s say, corporations building AI products want to be able to think about this, where it is all actually universally categorizable and you can just name a tone and pull it out. Yes. And if you’re going to create a world, you have to do that. So the feeling of a world in music is a product of doing that. So I think really what we’re talking about is not whether you’re doing that or not; we’re talking about at what scale you’re doing it. Because if it’s at a large enough scale, then it doesn’t feel as restrictive. But if it’s at a small scale, it feels restrictive. That is really what we’re talking about, if that makes sense.

Charney Kaye
You know, if you could just go to Splice and say “here, make video game music,” then it might all come out sounding the same for all these different games. It’s sort of the complexity of, like you say, coming up with the motif for each character. And that’s directly tied to what writers go through, you know. Whether an AI would be able to write a compelling world, character by character, is directly tied to whether AI could write a compelling OST, you know, motif by motif.

Dan Spencer
Sure, maybe. But I think that, again, that’s a question of scale. It’s like, how large is the language model? If the language model is large enough, I don’t see why it couldn’t. I’m not saying I don’t want humans to create art. I do want humans creating art. But I think that humans making art is going to become such a commodity, and then it’s going to become like a Ferrari. You know, it’s gonna be like, “we got a real composer for our OST.” And we’ll be like, “oooh.”

Charney Kaye
Yeah. I do think a lot of the character of something or the quality of it, and the unusualness of a piece of artwork comes a lot of the time from all the rules that you make for yourself about what you’re not going to do. And I suspect that comes from being human ourselves, and being limited ourselves.

Dan Spencer
I don’t know, I don’t see why you couldn’t teach an AI to set restrictions on itself.

Charney Kaye
Right, you have to be specific.

Dan Spencer
You can tell the model to pretend that it has restrictions or that it doesn’t have restrictions, so I think we’re there. Like, you can tell the model it can only use three-letter words and has to make a sentence, and it can do that.

Charney Kaye
Right. That choice, though, becomes one of the most signifying qualities.

Dan Spencer
Yes, all right. So I take your point, that a human being right now has to make the prompt.

Charney Kaye
And it becomes less about what we can do and just more about why are we bothering with any of it, right?

Dan Spencer
Yes.

Charney Kaye
I know Jamal has been playing the hell out of Starfield. That was just a couple of days ago. You know, you can still make a huge game in 2023.

Jamal Whitaker
Right. Yeah. Oh, yeah, definitely. And, I mean, obviously, that soundtrack has been stuck in my head. I think it’ll start to play as if my mind is some kind of start menu.

Charney Kaye
Imagine a world in which all those are assumptions. Like, we have the language model, and we can just go to it and design it. And you know, if we live in a utopia in which we’re all just designing games for each other, right, and not worrying about making a living, that’s a hugely powerful tool, but it still leaves you with a lot of the most human aspects of it left out. Like, what am I making a game about? And what sorts of layers make it compelling for you? Yeah, I appreciate that you’re highlighting these really well-crafted experiences that people are putting together even though technology is involved. Is there a middle ground between human-originated music and this full-blown give-a-prompt-to-a-machine and have it compose the music for you?

Dan Spencer
I like to look at the auto industry for this, because we can see how, even with complete automation, carmakers like Tesla are able to come in and just crank these things out. And there are also people like Ferrari, Bugatti, Lamborghini, where it’s like, you get hand-stitched leather seats. And I think music is going to head in that type of direction, where we’re going to have a lot of people figure out how to automate music creation. And, again, whoever is thinking in the most creative ways around this is going to create some of the cool stuff that’s really going to be groundbreaking. Unfortunately, through a series of mishaps that happened in the record industry several years ago, we have devalued music, which has resulted in everyone devaluing music to a large degree. That means that now (and this is not everyone, but this is generally true) people take music for granted and do not ascribe the $20 or $30 of value to an album that they used to. Now it’s about .007 cents per listen, per song.

Jamal Whitaker
Yes, I know that all too well. Those fractions of a fraction of a cent.

Dan Spencer
Think about when you are able to have an AI read facial expressions in a room and create music that’s been scientifically proven to increase buying. Or, I was talking with my girlfriend today, we were watching some boxing movie or whatever, and there’s the moment where the guy does the last right hook and knocks the other guy out, and the music swells. I was like, “I wish I was the guy who did that for the UFC,” so in real time, you could watch the top fighters in the world and have this amazing music happening in real time. The AI is reading what’s happening in the fight and predicting what’s going to happen and then scoring it out.

Charney Kaye
Yeah, it’s such a wild time, where we still are waiting for the other shoe to drop, which for me is when we really see that kind of DALL-E level, Midjourney level of synthesis showing up. Things that we’ve seen so far are still pretty handmade. You ever listen to that account? What’s this guy’s name... “There, I Ruined It.” You know who I’m talking about?

Dan Spencer
No, I’m not sure I do.

Charney Kaye
Love this dude. It’s just the wildest stuff. There was this one, just a couple of weeks ago, that was Johnny Cash singing Barbie Girl. But it’s, like, perfect.

Dan Spencer
Excellent.

Charney Kaye
My sense about that, and I love watching that dude’s work, is that it’s still so handcrafted. There’s something so incredibly human about the choice to combine these things specifically. We are pioneering advancements in ambient music with this digital audio workstation that we’ve built. Fundamentally, you know, this is all still human made: you’re recording everything yourself, loading in stems that you’ve already created. And then there’s a whole lot of metadata that you build around that in order to design what is an endless musical experience, except that you can also then influence it along the way, like the video game use case, for example, where you say, “okay, now add, you know, this tag, if you will,”

Jamal Whitaker
Yeah, combat, exploration, you know,

Dan Spencer
The live implementation of that is going to be so cool, because you could have someone in real time deciding things, like the next level of DJing. That’s so cool. So I’m already turned on by the idea. Cool. Okay. So tell me a little bit about how the interface of the DAW works as it extrapolates things out forward. I understand we could load stems in. So let’s say each stem is five minutes long, right? So we’d end up with, let’s say, a stack of five stems, right? And I’m assuming a stem is going to mean the sum of all parts, so all the drum parts together would be the drum stem. So we have, let’s say, drums, bass, pads, right? Because ambience. What happens next?

Jamal Whitaker
When we sat down, we were saying, “Okay, we want XJ to be able to make Lo Fi hip hop.” And so what I did, being a hip hop producer myself for over a decade, was basically just dive into Lo Fi. Lo Fi isn’t necessarily what I occupied my own production with, but in learning it, I was like, okay, let me break down what the most common tempos are. We found those, and then we said, okay, it goes between 75 and 85. We average it out. So we’ll just pick those two and have it build on those. From there, it was a lot of adapting XJ to be able to create a more organic and more unquantized sound: being able to shift some drums a little bit later and some drums a little bit earlier. You know, you want your kick a little early, your snare a little bit late, your hi-hats very late, basically getting XJ to put out a more organic sound with each specific instrument. In working with one of our other musicians, Mark Stewart, I was working more on the rhythm side, and Mark was doing all these performances and putting them in. We wanted to catalog these performances and give them what we call “memes,” which is just a way of tagging them so that XJ can read the tags and then decide, you know, this goes with this, and come up with an accurate pairing, a pairing that sounds good. And so we started by calling our harmonic performances by city names. So for example, we might have a Nagoya, a Boulder, so on and so forth. Under the hood of XJ, that’s how the harmonic performances are organized for Lo Fi.
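
The per-instrument timing shifts described here (kick a little early, snare a little late, hi-hats very late) can be sketched as fixed offsets applied to quantized beat positions. This is a minimal sketch under stated assumptions: the millisecond values in `OFFSETS_MS`, the `humanize` function, and the optional jitter are all made up for illustration and are not XJ music’s actual values or implementation.

```python
import random

# Assumed offsets in milliseconds; negative means earlier than the grid.
OFFSETS_MS = {"kick": -10.0, "snare": 15.0, "hihat": 30.0}


def beat_to_seconds(beat: float, bpm: float) -> float:
    """Convert a beat position (0 = start of the bar) to seconds at a tempo."""
    return beat * 60.0 / bpm


def humanize(hits, bpm, jitter_ms=0.0, rng=None):
    """Shift each (instrument, beat) hit off the grid by its instrument offset.

    Returns a list of (instrument, time_in_seconds). A hit at beat 0 with an
    early offset lands slightly before time zero, so a caller may want to pad.
    """
    rng = rng or random.Random()
    shifted = []
    for instrument, beat in hits:
        offset_ms = OFFSETS_MS.get(instrument, 0.0)
        offset_ms += rng.uniform(-jitter_ms, jitter_ms)
        shifted.append((instrument, beat_to_seconds(beat, bpm) + offset_ms / 1000.0))
    return shifted


pattern = [("kick", 0.0), ("hihat", 0.5), ("snare", 1.0), ("hihat", 1.5)]
for instrument, seconds in humanize(pattern, bpm=75):
    print(f"{instrument:5s} {seconds:+.3f}s")
```

Passing a seeded `random.Random` and a small `jitter_ms` would add repeatable variation on top of the fixed offsets.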

Charney Kaye
So you know that everything in Nagoya goes with Nagoya. And to go back to where we started with that: when you have these stems, you have to know that this set has the specific tempo that belongs with this other set of stems, and you might end up with a very large number of different stems, like all these different choices for bass, all these different choices for drums, all these different choices for your pads. But this totally abstract idea, Nagoya, which was just agreed upon by a group of musicians, is the underlying thing that ties all the stems together. And it turns out you’ve got different layers of this. Jamal used different meats to describe other flavors, you know, harmonically. So what’s happening by the time that XJ is actually cranking its gears and going through your stems and putting them together is that it’s matching that stuff up. It just goes, “Okay, now it’s Nagoya pork time.”
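
The matching step described here, where stems that share the same tags get grouped together, can be sketched as a simple tag lookup. This is a hypothetical illustration only: the `Stem` type, the `choose_stems` function, and the example library are invented for this sketch and are not XJ music’s data model or selection algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    name: str
    layer: str          # e.g. "drums", "bass", "pads"
    memes: frozenset    # lowercase tags such as "nagoya" or "pork"


def choose_stems(library, target_memes, layers=("drums", "bass", "pads")):
    """For each layer, pick the first stem carrying every target meme."""
    target = frozenset(m.lower() for m in target_memes)
    chosen = {}
    for layer in layers:
        for stem in library:
            if stem.layer == layer and target <= stem.memes:
                chosen[layer] = stem.name
                break
    return chosen


# A made-up library; real libraries would hold many choices per layer.
LIBRARY = [
    Stem("drums_a", "drums", frozenset({"nagoya", "pork"})),
    Stem("drums_b", "drums", frozenset({"boulder", "chicken"})),
    Stem("bass_a", "bass", frozenset({"nagoya", "pork"})),
    Stem("bass_b", "bass", frozenset({"nagoya", "beef"})),
    Stem("pads_a", "pads", frozenset({"boulder"})),
    Stem("pads_b", "pads", frozenset({"nagoya", "pork", "beef"})),
]

# "Nagoya pork time": one matching stem per layer.
print(choose_stems(LIBRARY, ["Nagoya", "Pork"]))
```

A layer with no matching stem is simply left out of the result, which is one possible design choice; a real system might instead fall back to a looser match.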

Jamal Whitaker
I would say most commonly within Lo Fi, it ranges from, like, a grouping of a certain drum pattern with a certain quantization or unquantization aspect.

Dan Spencer
Sure, ‘cuz Lo Fi has its own feeling.

Jamal Whitaker
Right.

Dan Spencer
Lo Fi, by its definition, has this sort of unquantized, slightly herky-jerky feeling of the drums that gives it its character.

Jamal Whitaker
Right. Right. So along those lines, chicken might be very herky-jerky, and beef might be more straightforward, with, you know, less swing. So yeah, there are all different types of ways you can go between different genres and the memes within those genres.

Dan Spencer
Sure! Okay, so what’s the end game?

Charney Kaye
Streams, games and places.

Dan Spencer
So in the streams, are you seeing that someone, before the stream starts, says “I’m doing X type of thing”?

Jamal Whitaker
Yeah.

Dan Spencer
And obviously, the user interface would not be “Nagoya Pork”; it would be, you know, emotion. The user interface would be like the emotion of what I want. And then after a certain duration of time, as the stream evolves, or if they want to talk about something else, they could have one button that they press, and then the music would change with them.

Jamal Whitaker
Yeah. And it would essentially just keep on going. I mean, that is at XJ’s core. It’s like, from “DJ,” but just with an X.

Dan Spencer
Right.

Charney Kaye
What we’re doing is very bespoke. There are companies that are really pushing for this kind of generalization of music, where you just type in what you want, and you get it back. And we would actually prefer to kind of stay on the other side of that, where it does just say Nagoya and Pork on your control board, because that is what means the most to you if you were the person who created that music.

Jamal Whitaker
Sure, yeah.

Charney Kaye
And as we get more into serving more general needs, like for streamers, we would end up building out libraries that are designed to serve these different needs.

Dan Spencer
So then how do you monitor for licensing content? Because there are so many possible permutations. How then do you track usage across a platform like YouTube, for example? Like, how do you Content ID that?

Charney Kaye
You can look at both sides of it. At some level, we’re creating this incredibly unique music where like every individual one is like its own fingerprint.

Dan Spencer
Yeah.

Charney Kaye
But the reality is that because you’re not feeding every single second of that into their audio fingerprinting system, you’re not really, you know, going to... And we’re all learning that. We get the random Shazams for our music that we know are not our music. So, you know, those systems are not perfect.

Dan Spencer
Yeah. So I understand the streaming, that’s really cool. It’s super useful. I get it. Video games.

Charney Kaye
So video games, I think are, for me the most exciting use case, because so much craft goes into a video game, and people get so much value out of that craft. So we would like to work with the game development team starting pretty early on. And we would want to be abreast of the music direction process from the very beginning. We’d really encourage the people designing the music and producing the music to do exactly what they’ve already been doing. And just think of this as a tool to add to their toolkit, which would allow them to get that finished product to be even more dynamic, you know, when it’s actually playing in the game. What I imagine a video game designer goes through is a bit like what we’ve gone through with these channels, where you have these major areas of let’s say, the map or the game world, and they’re going to have these specific characters, you know, or the music is going to have a particular character when you’re in there. But then there’s the aspect of what you happen to be doing at the time, whether you’re in combat or whether it’s a romantic moment. So you consider all of those things and sort of have this matrix of all the possible game music that you need, which again, I’m imagining is just already how this has been done.

Dan Spencer
For sure. I think the next step is actually where it really gets exciting, and the next step is people using chat models within video games to create spontaneous character interactions. If you could hook your product up to that model, that’s what really works, right? Because you have all the information about the theme of the person that you’re talking to. But then depending on what the model is spitting out, you could match that to your product. And I think, for me, looking to the next 5 to 10 years of where gaming is gonna go, that’s the thing.

Jamal Whitaker
Yeah.

Charney Kaye
It’s really been a pleasure meeting you and speaking with you about all of this. And really cool to know you in the space that we’re in.

Dan Spencer
For sure. Likewise, this is really cool.

Jamal Whitaker
Yeah, man. It’s awesome to hang out.

Charney Kaye
Let anybody listening know where to find you and what they should check out.

Dan Spencer
My newest book, called “The 14 Unshakable Laws of Learning Music: How to Master Any Instrument and Singing in Five Minutes a Day,” is out now. If you’ve ever wanted to understand how to actually get better at music, and how to learn music, I break down the very confusing matrix of apps, courses, teachers and books: what they all do, how to use them, how to actually set goals in music, how to actually figure out what you want to do in music, and how to actually get there. On release, the book hit number one bestseller in three out of three categories and number one hot new release in three out of three categories, and it was a number one bestseller for over a week. And so it’s rocking and rolling, doing super well.

Jamal Whitaker
Awesome! Congratulations.

Charney Kaye
Very cool. I’m gonna make sure to pick up a copy.

Dan Spencer
For sure. Thank you so much, guys.

Jamal Whitaker
Awesome. Thank you.

Charney Kaye
Come on in. The water’s fine.

Jamal Whitaker
And this isn’t a swimming pool or a lake. This is an ocean of possibilities we’re talking about, something artists couldn’t even dream of, that’s now at our fingertips.

(music: outro)