AI in Sound Design: The Balancing Act between Innovation and Human Creativity

“What the hell?!”

That was my first incredulous reaction to my introduction to AI via Midjourney. It’s a phrase I’ve repeated with every new version of ChatGPT and Midjourney, and now with audio tools. I had no idea I was making such an existential exclamation.

The other day I sent an email to a great friend of mine, Walter, an outstanding mixing, recording, and mastering engineer/guru based in central Canada, and got his usual thoughtful, considerate response.

I’d emailed Walter asking for a touch of critique on a mix for an upcoming project, as well as recommendations for any tantalizing new plugins that might enhance its guitar sound. He sent me a few killer suggestions, but the one that stood out and piqued my curiosity was named, suspiciously, The God Particle.

A wrathful Zeus, very upset with the title of this audio plug-in. Naturally, this was done in Midjourney. Scared yet?

Originally I thought the name was created out of hubris, because of course, what the heck kind of name is that for any kind of product?

So I ended up grabbing a demo of the plug-in and threw it on my guitars. And then my bass. And my vocals. And my drums.

I was hooked.

What the heck was going on underneath the hood of this piece of audio magic?

I asked myself if my mixing just sucked, if I just wasn’t processing things well – if I knew what I was doing at all.

There’s some talk in audio circles that there might be some AI going on inside The God Particle, and even just the idea makes me immediately apprehensive. There’s no definitive information out there that AI is being used, however. I became suspicious, but I was also marvelling at what it was doing: it really was removing all the “suck” and enhancing all the “superb” in each mix bus.

It was like adding a water purification system to my lines for the first time.

The insecurity of audio engineers is not usually talked about beyond the conversations we have with each other, but it’s definitely there.

This devious little plug-in encroached on my mental health in only four minutes.

There’s a nervous pulse I feel throughout my body every time I load up a plug-in that says it can do it all, and though none of them do, they get frightfully closer each time I try a new one out.

Obviously, there is a line with all audio plugins, and I can already see the raised eyebrows of my wonderful teachers back at VFS telling me to “Cool it with the processing!” Still, I really couldn’t believe such a plugin could do so much with literally three clicks and a drag.

Several iZotope products, Supertone Clear, Landr, and many others are on the market right now, and it hit me and my sound community like whiplash: we were suddenly thrown into the age of AI, and of existential risk to our job security.


Yeah, but what about the real world, man?

What do new AI tools mean for audio professionals globally? They mean saving time. Which means saving money, which is great, right?!

As someone who works in post-audio, I’ve done a huge variety of projects, from commercials to documentaries, television, feature films, and games. One of my primary passions is dialogue editing: making things sound smooth, correct, and pleasing is the name of the game.

Dialogue editorial takes time, a fine hand, and extremely refined ears. The best editors are right up there hanging out at the mixing studio to make sure their work is pinpoint accurate and perfect for each scene of a movie.

On the other side of things, in games, dialogue editing and mastering are extremely important for clarity, performance, emotion, and even design (think about an alien talking and what it might take to process a human voice to sound alien!).
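To make that idea a bit more concrete, here’s a minimal, hypothetical sketch of one classic “alien voice” trick, ring modulation, in Python with NumPy and SciPy. The file names, the 30 Hz carrier, and the wet/dry blend are all illustrative assumptions on my part, not how any particular studio or plugin does it.

```python
# Toy "alien voice" via ring modulation: multiply the voice by a sine carrier.
# Assumes a mono, 16-bit PCM WAV; file names and settings are purely illustrative.
import numpy as np
from scipy.io import wavfile

IN_PATH = "voice.wav"          # hypothetical input recording
OUT_PATH = "alien_voice.wav"   # hypothetical output file
CARRIER_HZ = 30.0              # a low carrier gives the classic metallic warble

sr, voice = wavfile.read(IN_PATH)
voice = voice.astype(np.float64) / 32768.0      # 16-bit PCM -> floats in [-1, 1]

# Ring modulation: the voice rides on a sine wave.
t = np.arange(len(voice)) / sr
alien = voice * np.sin(2 * np.pi * CARRIER_HZ * t)

# Blend some dry signal back in so the dialogue stays intelligible.
mix = 0.7 * alien + 0.3 * voice
wavfile.write(OUT_PATH, sr, (mix * 32767).astype(np.int16))
```

Even in a toy like this, the interesting questions are human ones: which carrier reads as “alien” rather than “broken speaker,” and how much dry voice the scene can afford to lose.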

With the advent of new plugins and tech, dialogue editorial has become that much simpler to do. Even DaVinci Resolve’s Fairlight suite effectively has voice de-noise functionality, and it’s pretty darn good! The most exciting and anxiety-provoking new(ish) product, however, is Adobe Podcast.
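For the curious, here’s roughly the idea behind the simplest version of those “de-noise” buttons: a toy spectral gate, sketched in Python with NumPy and SciPy. The file name, the half-second noise-only intro, and the 2x threshold are assumptions for illustration only; shipping tools are far more sophisticated than this.

```python
# Toy spectral gate: estimate a noise spectrum, then mute bins near the noise floor.
# Assumes a mono 16-bit WAV whose first 0.5 s contains only room tone; all names
# and numbers here are illustrative, not how any commercial de-noiser works.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

sr, x = wavfile.read("noisy_dialogue.wav")      # hypothetical file
x = x.astype(np.float64) / 32768.0

# Build a noise profile from a stretch with no dialogue in it.
noise = x[: int(0.5 * sr)]
_, _, N = stft(noise, fs=sr, nperseg=1024)
noise_profile = np.mean(np.abs(N), axis=1, keepdims=True)

# Keep only time-frequency bins that sit comfortably above that noise floor.
_, _, X = stft(x, fs=sr, nperseg=1024)
mask = np.abs(X) > 2.0 * noise_profile
_, clean = istft(X * mask, fs=sr, nperseg=1024)

wavfile.write("denoised_dialogue.wav", sr, (np.clip(clean, -1, 1) * 32767).astype(np.int16))
```

A blunt gate like this is also how dialogue ends up sounding over-processed: push the threshold too hard and you trade hiss for watery artifacts, which is exactly where a human editor’s ear still earns its keep.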

I’ve heard more than one story from audio friends whose director told them they had “run the dialogue through Adobe, so don’t worry about editing it.” Great! Not really.

This shrinks the time spent on a project, which works against sound people in several ways: the (largely expensive) tools we buy don’t get used, our expertise gets completely overlooked, the dialogue often ends up heinously over-processed, and we have to put our names on projects we may not even want to claim because they were done “Quick N’ Dirty.” That, and we get paid less. Great. Can you sense the sarcasm?

These stories are being told to me more and more often.

The money being saved isn’t going into our pockets. In fact, directors and producers are asking sound professionals to cover more, and to do the job for even less, at the very moment the tools are getting better. And despite the loveliness of saving money, shrinking sound budgets are not a good thing at all for what I’d argue is the other half of visual media.

What’s the future outlook? I think we’re going to see ever-shrinking budgets for post-sound, and the same, if not higher, expectations. In a nutshell: grim.

What is there to do?

One solution: Education? (Maybe)

“Have they considered sound?” I ask myself this question every time I start a new project.

I’m sure every director or project lead does – consciously or not.

There’s a tonal palette at work: music tells the audience what emotions we need to feel, and sound effects like backgrounds tell us whether a scene is lively or lonely. These are all expectations held by the director, the producers, and the studio; we have clear, human expectations when it comes to visual media and sound. Hearing a lame car honk when the thing on screen is a spaceship could be useful if it’s a comedy, but if it’s a horror, it just won’t work.

Knowing what works takes time spent listening to sounds in reality as well as in visual media. And I don’t mean listening comfortably at home on your sofa; I mean at the theatre, or loudly on headphones, or literally sitting at a cafe and paying attention to the millions of sounds going on with each passing moment.

That is all data that can be fed into an AI, yes, and it can very likely generate a fantastic and passable facsimile of reality, just as it has been doing (and disrupting) in the art world. But at the end of the day, is it not just a fabricated copy?

This leads me to another probing, and potentially incendiary, question: Is that copy good enough for a movie? A commercial? A TV show? A video game?

Directors, producers, project leaders, and executives tend to make those “good enough” decisions. So what happens to the people who make things sound extraordinary?


The Quality of Sound

“Oooh! What a nice door handle!” – Every sound design nerd ever

Without sound, we get bored.

And here’s my immediate, undeniable proof.

If you play games, or watch a video of any sort, try it without sound for ten minutes. Subtitles are fine. Then play it with headphones on, one notch louder than you would usually find comfortable (do this safely!). How much does your attention drift without sound versus with it?

For me, as well as most of my friends (who aren’t all sound people), the answer is that we get super-duper bored. And quickly.

The quality of sound effects has reached an absolutely beautiful level: we have sound libraries out there that cover the strangest, most anomalous, and most specific sounds you could possibly imagine, and there are always new sound effects coming out. The internet is an absolute candy store for us.

Those of us who call ourselves sound designers pick up on interesting things in our daily lives and hear things that make us go “Oooh! What a nice door handle!” or “Wow, do you hear the zuzz of that old lightbulb?!” or, odder still, “Wow, that was a really nice-sounding clack to those heels!” These are the kinds of nitpicky, extremely curious, vibrant people you want cutting and designing the aural experience of any project.

For now, AI doesn’t give feedback in the same way. It isn’t able to tell you that a metal door hinge might have a better squeak to it than the wooden one you’re using for that one horror scene that needs to be more intimidating or creepy. At least not yet.

Maybe by the end of this blog post?

When AI is able to give feedback like a human being, with all of our experience and flaws, we’ll then surely be robbed of our jobs, right? 

Well, I still don’t think so. 

However good AI gets, your audience as a creator of visual art is still people.

We can gauge, study, and develop tricks for what works in a scene, run movie screenings and tests, and even predict audience feedback, but what usually works for a particular genre of film can still end up falling flat.

Whenever you see a movie with a huge A-list star bomb, you see that in action. Talent and tools don’t matter if the whole story isn’t cohesive. When tropes get overused, we get culturally exhausted. It all becomes expected.

When stories become predictable, attention wanes, and your audience suffers. Thereby, culture suffers.

There’s an argument to be made that the status quo is nice and comfortable; the fact that there are still a million (scientifically inaccurate number) Christmas movies coming out every year is proof enough of that. But for people out there trying to create art, “comfortable” is a place you truly do not want to be.

How does this come all the way back around to sound?

Sound is Half of Picture

Sometimes you need a brand new, bespoke sound for something that isn’t translating well, and it’s vastly richer to have someone working alongside you in that capacity. From feedback, to challenging your ideas, to collaborating on new solutions, to iteration and refinement, sound people will be needed for a long time.

That’s what I fundamentally believe.

The usefulness of these new tools shouldn’t be underestimated, but neither should they be over-excitedly adopted. I think, and hope, that a cautious way forward is best, as the tools develop at such lightning speed (and for such hefty price tags, but that’s for another post!).

I think AI will be very useful for developing drafts and early cuts of projects and ideas. And I think it can be used as a tireless assistant to help us figure out the palette of our sonic paintings. But I think the discerning ears that have been making movies for generations must have the final say.

What are you willing to accept in terms of audio quality? Does knowing AI assisted in creating a piece of media bother you? Would it bother you to know if I used AI to help write this post? Comment below or e-mail me your thoughts to keep the conversation going.
