How to make a movie with AI tools: A brief note at the end of 2023

Dong Zhang
Jan 2, 2024

Making a movie is a big project that requires extensive teamwork. Since late 2022, advancements in AI technology have accelerated, allowing us to leverage new AI tools to significantly reduce the amount of human effort needed in filmmaking.

To date, the AI community has seen a number of AI-produced films, most of them trailers or short fairy tales. I believe that, even though AI film production is in its early stages, we are already able to use AI tools to produce proper movies.

So I spent a couple of days conceptualizing and making an AI movie. Here is the movie I created:

Chicago Mafia 1930 (Episode I)

Let me elaborate on more details.

Writing the Script

Many people have tried to use ChatGPT to automatically generate stories/scripts, and I have also made such attempts. For example, I made a fairy tale book a few months ago using ChatGPT and Midjourney:

The Adventure of Peter: ChatGPT and Midjourney fully generated comic book

I asked ChatGPT to write a short interesting children’s story, then used Midjourney to create the comic book.

However, so far the stories written by ChatGPT, while interesting, have not yet reached the level of a real movie script.

So I wrote a mafia story myself, set in 1930s Chicago. It is a gangster story about conspiracy, loyalty, betrayal, friendship, and more. I tentatively titled the series Chicago Mafia 1930.

I made the first episode of this series, which is about three and a half minutes long. It includes three acts:

  • The first act takes place in a Catholic church in Chicago, showing an intense standoff between Sean and Tony, with Franco intervening. This act can be titled “Confrontation in the Church”.
  • The second act moves to Don’s office. Don is the boss of a mafia gang who has just been released from prison and is trying to regain control of his gang.
  • The third act, which is the final act in Episode I, plays out on the streets beneath Don’s office.

I found that although ChatGPT may not come up with a script like this one on its own, it polishes a script well, especially when mimicking the speaking style of 1930s gangsters.

Prompts for Image Generation

After completing the script, I used Midjourney to generate the images for the story.

Prompts are always crucial for AI image generators. There are already some tools available for generating image prompts, such as GravityWriter, which is a paid service. Among the free tools for generating image prompts, ChatGPT remains a good choice.

For example, in crafting the character of Sean:

Sean, in his 40s, wearing 1930s suits, appears a bit angry but still calm.

ChatGPT helped me expand the prompt with much more detail:

Close-up of SEAN, a man in his 40s, attired in a 1930s suit, his expression a mix of anger and controlled calm. His face conveys a complex emotion, eyes intense with a hint of betrayal yet maintaining a certain composed demeanor. The focus is tight on his face, capturing the slight tension in his jaw and the steely gaze, with the details of his vintage suit providing a sharp contrast to the simmering emotion within.

Adding image generation conditions such as “hyper realistic” and “real photo”, the image generated by Midjourney is as follows:

Sean in “Chicago Mafia 1930”

For the character creation of Franco, the initial setup is:

Franco, in his 30s, of Italian descent, is standing calmly in a 1930s suit. Medium shot. The background is the dim yellow light of the church.

The prompt generated by ChatGPT is:

Visualize Franco, a calm and collected man in his 30s of Italian descent, standing in the middle of the frame. He’s dressed sharply in a 1930s style suit, embodying the fashion of the era. The medium shot captures his composed posture against the backdrop of a church, which is bathed in a dim, yellow light, providing a contrast between his sharp figure and the soft, ambient glow of the surroundings. The church’s architectural details faintly visible in the background add to the historical and solemn mood of the image.

Lit by the yellow candlelight of the church, Franco looks like this:

Franco in “Chicago Mafia 1930”
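Both examples follow the same pattern: a detailed description with a few style keywords appended. Here is a minimal sketch of that last step in Python. The helper name `build_image_prompt` and the comma-separated format are my own assumptions for illustration, not something Midjourney requires.

```python
# Style keywords mentioned in the text; the helper name is hypothetical.
STYLE_KEYWORDS = ["hyper realistic", "real photo"]


def build_image_prompt(description, styles=STYLE_KEYWORDS):
    """Append style keywords to a description, separated by commas."""
    base = description.strip().rstrip(".,")
    # Skip keywords the description already contains (case-insensitive).
    extra = [s for s in styles if s.lower() not in base.lower()]
    return ", ".join([base] + extra)


sean = "Close-up of SEAN, a man in his 40s, attired in a 1930s suit."
print(build_image_prompt(sean))
```

Keeping the style keywords in one place makes it easier to apply the same look to every character and scene.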

It is also worth mentioning that Midjourney now has a face-swapping feature (it did not when I was creating The Adventure of Peter comic book). For example, the original image of Sean in Don’s office is like this:

Sean in Don’s office

Based on the above close-up image of Sean, Midjourney revised Sean’s image as follows:

Sean in Don’s office revised

You can choose the person who most resembles Sean.

Voice Generation and Sound Effects

Once you have all the images, the next step is to generate the voices and sound effects. One of the best voice generation tools is ElevenLabs, whose Speech Synthesis feature turns text into speech in the voice of a speaker you select, producing the dialogue needed for the movie.

We can use ElevenLabs to generate all of the movie’s dubbed speech, then use another website, Pixabay, to find music and sound effects for the movie. I used three pieces of excellent music from Pixabay as background music, as well as various sound effects such as gunshots, burning fires, and street noises. In particular, I would like to thank the following music creators/uploaders:

  • Muzaproduction: Crime Trap
  • Simoke: Street Mafia
  • QubeSounds: Epic Dramatic Action

Video Generation

Currently, the primary tools for generating videos come from Runway ML and Pika Labs. Runway ML uses its own platform to generate videos, while Pika Labs generates videos in Discord.

I used the image-to-video tool to prepare various segments of the movie. In Runway ML, each image can generate a 4-second video, which can be extended four more seconds each time, to create a video of 4+4+4+… seconds in length. Pika Labs, on the other hand, can generate a 3-second video from each image.
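To plan a shot of a given length, the arithmetic above can be sketched in a few lines of Python. The clip lengths come from the text. The function names are hypothetical, and treating a longer Pika Labs shot as several separate 3-second clips is my own assumption.

```python
import math

RUNWAY_FIRST_SECONDS = 4   # initial image-to-video clip
RUNWAY_EXTEND_SECONDS = 4  # each extension adds this much
PIKA_CLIP_SECONDS = 3      # one clip per image


def runway_extensions_needed(target_seconds):
    """Number of extensions after the first 4-second generation."""
    if target_seconds <= RUNWAY_FIRST_SECONDS:
        return 0
    remaining = target_seconds - RUNWAY_FIRST_SECONDS
    return math.ceil(remaining / RUNWAY_EXTEND_SECONDS)


def pika_clips_needed(target_seconds):
    """Number of separate 3-second clips needed to cover a duration."""
    return max(1, math.ceil(target_seconds / PIKA_CLIP_SECONDS))


for shot in (3, 4, 10):
    print(shot, runway_extensions_needed(shot), pika_clips_needed(shot))
```

For instance, a 10-second shot would take the first generation plus two extensions in Runway ML (4+4+4 seconds, trimmed in editing), or four separate clips under the Pika Labs assumption.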

Since I was still on the waitlist for Pika Labs while making the movie, I used Runway ML for most of the videos in the movie. Note that after switching to a paid Runway ML plan, videos can be generated without a watermark.

For example, the first scene of the first act, “Chicago night”, generated by Runway ML:

By Pika Labs:

Fire in the fireplace, by Runway ML:

By Pika Labs:

Running car, by Runway ML:

By Pika Labs:

I am not going to judge which tool gives better video; readers can compare and test them :)

Video Voice Syncing

With the video and audio materials all ready, I used Lalamu Studio to synchronize voices with videos.

Lalamu Studio is an excellent free tool for giving voice to characters in videos. The method of creation is very straightforward: users simply upload a video along with the corresponding audio, and Lalamu will make the characters in the video speak the words from the audio, syncing the lip movements accurately.

One drawback of Lalamu is that the video resolution drops significantly after processing. To improve it, we can take a frame-by-frame approach: first break the video down into a series of images (using image online convert), then make each image clearer (using Think Diffusion), and finally reassemble the enhanced images into a video using Runway ML.
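The split and reassemble steps can also be done locally. The sketch below swaps in ffmpeg for the online tools named above, which is not what I used for the movie. It only builds and prints the two commands. The file and folder names are placeholders, and the 24 fps value is an assumption that should be replaced with the real frame rate of your clip.

```python
import shlex


def split_command(video, frames_dir):
    """ffmpeg command that writes every frame of `video` as a numbered PNG."""
    # frames_dir must already exist; ffmpeg does not create it.
    return ["ffmpeg", "-i", video, f"{frames_dir}/%05d.png"]


def assemble_command(frames_dir, fps, output):
    """ffmpeg command that rebuilds a video from numbered PNG frames."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # should match the source clip's frame rate
        "-i", f"{frames_dir}/%05d.png",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        output,
    ]


# Placeholder names: the enhanced frames are assumed to keep their numbering.
print(shlex.join(split_command("lalamu_clip.mp4", "frames")))
print(shlex.join(assemble_command("enhanced", 24, "clip_hd.mp4")))
```

A video rebuilt from images has no audio track, so the speech has to be added back in the editing step.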

For example, video made by Lalamu:

After Think Diffusion processing:

Video Editing Software

The final step is to combine all the videos, speech, sound effects, and music into the final movie. I used CapCut to combine all the materials.

Since CapCut is widely used and many people already create videos with it, I will skip the details of using CapCut in this post.

Again, let me post the final movie of Chicago Mafia 1930 (Episode I):
