Vertical video clip creation with FFMPEG and Batch

How I made a simple vertical video clip creation process for a streamer friend using FFMPEG and Batch scripts.

Written by Okom on Jan 08, 2025 in Video. Last edited: Jan 08, 2025.

At HCS Worlds 2023 I got to know JSLund, who streams Halo Infinite on Twitch. I've been helping him with various things, mostly because I want to see him grow his brand, as he is genuinely a good person and a positive figure in the Halo community.

Being knowledgeable in video and audio manipulation with FFMPEG and somewhat automating it with Batch scripting, I created a process for him to easily create vertical video clips from his streams. I've still written the scripts to work for any content creator as they have multiple adjustable parts.

The output

The final output video would look like this for example, with a comparison screenshot of the source video:

An example output video after running all the scripts; file size reduced for sharing.
Source video screenshot; notice the webcam on the right.

The output would include:

Download

Find the vertical video creation scripts here. Download and extract the .zip file.

Prerequisites

The scripts rely on the following software: Windows PowerShell, FFMPEG and YT-DLP. For Windows users I recommend installing FFMPEG and YT-DLP through the Chocolatey package manager using choco install ffmpeg yt-dlp.

Even if you already have this software installed, make sure each program is on its latest version so that all the commands are supported. To do this with Chocolatey, use choco upgrade ffmpeg yt-dlp.

Usage

How to convert a section of a video you want to clip into a vertical video using my scripts.

Showcase of all the script files.
  1. Download Video: Open 01 Download Video.bat, paste the video URL to download and enter the output file name. Optionally enter the start and end time of the video to only download that section.
    • Download section: Choose a section of the video to download. Useful for downloading a short section of a long video.
    Enter video URL to download: https://www.twitch.tv/videos/2344438774
    Enter output file name: redgear clip
    Download only specific section of video? (Y/N): y
    Enter clip start time (HH:mm:ss.SSS): 02:56:10.5
    Enter clip end time (HH:mm:ss.SSS): 02:56:32
    Example input for 01 Download Video.bat.
    Still frame from a downloaded video.
  2. Crop Vertical Video: Drag your desired video clip on 02 Crop Vertical Video.bat and choose whether you want to relocate the webcam to the middle and trim the video length.
    • Preset webcam values: Static values written in the code, which can be edited by opening the .bat file with a text editor and editing the values starting from line 18. Useful if you've figured out your working values and don't want to always type them.
    • Manual webcam values: Enter the video resolution and position and size of the webcam frame for the script to know where to pull it from. Enter the desired webcam offset and scale; the default values place it on the top middle.
    • Trim length: Trim the video down to a specific length, if your source video was too long for example.
      You can use "0" as the start time or "9999" as the end time to begin from the start or end at the end. Useful if you want to start at e.g. 10 seconds, but end whenever the source video ends.
    Relocate webcam to vertical video? (Y/N): y
    Use preset webcam position values? (Y/N): n
    Enter video width in pixels (e.g. 1920): 1920
    Enter video height in pixels (e.g. 1080): 1080
    Enter webcam width in pixels: 520
    Enter webcam height in pixels: 293
    Enter webcam distance from the left of video in pixels: 1385
    Enter webcam distance from the top of video in pixels: 231
    Enter webcam output size multiplier (default = 1): 0.7
    Enter webcam horizontal offset (default = 1): -370
    Enter webcam vertical offset (default = 10): 620
    Trim video to a specific section? (Y/N): y
    Enter clip start time (HH:mm:ss.SSS): 0.5
    Enter clip end time (HH:mm:ss.SSS): 9999
    Example input for 02 Crop Vertical Video.bat.
    Webcam default position after a successful crop.
    Webcam scaled to 0.7 with a horizontal offset of -370 and a vertical offset of 620.
    ShareX is a great screenshotting tool that shows the position and size of the cropped area, which is useful for figuring out the webcam frame values. Just make sure to have the video be at the correct resolution and at the top left corner of your screen.
  3. Merge Videos: Merge two or more videos together by dragging them all on the 03 Merge Videos.bat file. For the ideal result, only merge videos with the same resolution, frame rate and encoder settings.
    Remove the -c copy expression from the script if you're running into issues regarding the specifications.
  4. Add Image Overlays: Drag your desired video clip on 04 Add Image Overlays.bat and choose if you want to add an end card image in addition to the regular overlay image.
    • Image Overlay: Static image that is placed on top of the video like a watermark. Uses the image-overlay.png file; replace it with your own image using the same name and resolution.
    • End Card: Image that shows up 3 seconds before the end of the video. Uses the end-card.png file; replace it with your own image using the same name and resolution.
    • These image files must be stored in the same directory as the input video and should be the same aspect ratio as the input video to avoid deformation.
    Add end card? (Y/N): y
    Example input for 04 Add Image Overlays.bat.
    Template image overlay that is on the video for the whole duration.
    Template end card image shows up at the end of the video.
  5. Add Music: Drag your desired video clip and audio file on Add Music.bat and enter the start time and volume for the music. This will add the audio as background music mixed to the rest of the audio.
    Enter music start time (mm:ss): 1:20
    Enter music volume (default = 0.1): 0.3
    Example input for Add Music.bat.
    You can download music with the 01 Download Video.bat script, as the Add Music.bat script will only use the audio stream and ignore the video.

    For downloading just the audio with yt-dlp, you can run the command yt-dlp -F <URL> and see if there is a format with just audio. If there is, you can download that with yt-dlp -f <format> <URL> such as yt-dlp -f 251 https://youtu.be/bwikj_lQLT0
  6. Downscale Resolution: Drag your desired media file on Downscale Resolution.bat and enter the downscale multiplier. This will lower the resolution of the file and reduce its size.
    • The width of the output file must be divisible by 2.
    Enter desired scalar (e.g. 1.5): 1.6
    Example input for Downscale Resolution.bat.

Script breakdown

These scripts ultimately rely on FFMPEG commands to produce the output, so I'll explain those commands instead of everything the scripts do to get to them. Still, some parts of the Batch scripts are explained so the FFMPEG commands make sense.

Download Video

Uses a yt-dlp command with an optional expression to download a specific section of a video.

yt-dlp --no-check-certificate --download-sections "*%starttime%-%endtime%" "%link%" -o "%filename%.mp4"
YT-DLP command to download a specific section of a video.
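
To make the optional part of that command concrete, here is a small Python sketch of how the argument list could be assembled. It is only an illustration and not the Batch script itself; the function name build_download_args and its parameters are hypothetical, and only the flags shown in the command above are used.

```python
def build_download_args(link, filename, start=None, end=None):
    """Assemble a yt-dlp argument list like the one used in the download step.

    Hypothetical helper for illustration; the real script builds the command in Batch.
    """
    args = ["yt-dlp", "--no-check-certificate"]
    # The section expression is only added when both timestamps were entered.
    if start and end:
        args += ["--download-sections", f"*{start}-{end}"]
    args += [link, "-o", f"{filename}.mp4"]
    return args


# Example using the values from the usage section.
print(build_download_args(
    "https://www.twitch.tv/videos/2344438774",
    "redgear clip",
    "02:56:10.5",
    "02:56:32",
))
```

Answering "N" to the section prompt corresponds to leaving start and end empty, in which case the whole video is requested.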

Crop Vertical Video

This script is responsible for the core of the method's functionality. First the prompts for webcam relocation are given, and the user needs to input the location and size of the webcam frame so these can be used in calculations later.

Webcam position

The first calculation turns the webcam's pixel values into ratios of the video resolution, so the webcam position stays correct at any resolution instead of being tied to fixed pixel amounts. As Batch doesn't handle decimals, I run the calculations in PowerShell. Briefly launching another program causes a slight delay, but it's negligible:

for /f "delims=" %%a in ('powershell -Command %videoWidth%/%camWidth%') do set ratioCamWidth=%%a
for /f "delims=" %%a in ('powershell -Command %videoHeight%/%camHeight%') do set ratioCamHeight=%%a
for /f "delims=" %%a in ('powershell -Command %videoWidth%/%camPosWidth%') do set ratioCamPosWidth=%%a
for /f "delims=" %%a in ('powershell -Command %videoHeight%/%camPosHeight%') do set ratioCamPosHeight=%%a
for /f "delims=" %%a in ('powershell -Command %videoWidth%/%camOffsetHorizontal%') do set ratioCamOffsetHorizontal=%%a
for /f "delims=" %%a in ('powershell -Command %videoHeight%/%camOffsetVertical%') do set ratioCamOffsetVertical=%%a
Calculating various ratio values. For example 1920/535 = 3.588785... for the first line.
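
The idea behind the ratios can be sketched in Python as follows. This is just the same arithmetic written out, not the script's own code; the function names webcam_ratios and webcam_rect are hypothetical, and the numbers are the ones from the usage example earlier in the post.

```python
def webcam_ratios(video_w, video_h, cam_w, cam_h, cam_x, cam_y):
    """Turn pixel measurements into resolution-independent ratios (hypothetical helper)."""
    return {
        "ratioCamWidth": video_w / cam_w,
        "ratioCamHeight": video_h / cam_h,
        "ratioCamPosWidth": video_w / cam_x,
        "ratioCamPosHeight": video_h / cam_y,
    }


def webcam_rect(ratios, iw, ih):
    """Apply the ratios to an input of iw x ih pixels: returns (width, height, x, y)."""
    return (
        iw / ratios["ratioCamWidth"],
        ih / ratios["ratioCamHeight"],
        iw / ratios["ratioCamPosWidth"],
        ih / ratios["ratioCamPosHeight"],
    )


# Measured once on a 1920x1080 frame.
ratios = webcam_ratios(1920, 1080, 520, 293, 1385, 231)
print(webcam_rect(ratios, 1920, 1080))  # roughly the original pixel values
print(webcam_rect(ratios, 1280, 720))   # the same frame region on a 720p source
```

One thing the sketch makes visible: a measurement of 0 (for example a webcam sitting exactly on the left edge) cannot be turned into a ratio this way, since it would mean dividing by zero.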

Crop command

After the webcam position ratios are calculated, the user is asked to optionally trim the input video. Then the following FFMPEG command is run:

ffmpeg -ss %starttime% -to %endtime% -i "%~1" -filter_complex "[0:v]crop=(iw/3.15789):(ih)[vertical],[0:v]crop=(iw/%ratioCamWidth%):(ih/%ratioCamHeight%):(iw/%ratioCamPosWidth%):(ih/%ratioCamPosHeight%)[webcam],[webcam]scale=(iw*%camScale%):(ih*%camScale%)[webcam-scaled],[vertical][webcam-scaled]overlay=(((W-w)/2)+(W/%ratioCamOffsetHorizontal%)):(H/%ratioCamOffsetVertical%)[out]" -map [out] -map 0:a -c:a copy "vertical_%~n1.mp4"
  • -ss %starttime%: start the input file at the value of %starttime%
  • -to %endtime%: end the input file at the value of %endtime%
  • -i "%~1": first input file; the file that was dragged on the Batch script
  • -filter_complex: define the filtergraph that follows
  • [0:v]crop=(iw/3.15789):(ih)[vertical]: crop the video stream of the first input file (0:v) to a portrait width with the input height; e.g. 1920/3.15789 = 608. Call it "vertical"
  • [0:v]crop=(iw/%ratioCamWidth%):(ih/%ratioCamHeight%):(iw/%ratioCamPosWidth%):(ih/%ratioCamPosHeight%)[webcam]: crops the webcam frame from the first input video stream based on the previously calculated ratio values. These ratio values are applied to the input width and height to determine the position and size of the webcam on any input file resolution, which allows for accurately relocating the webcam as long as the initial webcam position values are set correctly. Call the output "webcam"
  • [webcam]scale=(iw*%camScale%):(ih*%camScale%)[webcam-scaled]: scale the "webcam" stream with the user-defined scalar value to scale the webcam frame; call it "webcam-scaled"
  • [vertical][webcam-scaled]overlay=(((W-w)/2)+(W/%ratioCamOffsetHorizontal%)):(H/%ratioCamOffsetVertical%)[out]: overlay "webcam-scaled" on to "vertical" and place it in the top middle of the vertical video along with user-defined offset values to optionally offset its position. Call the output "out"
  • -map [out] -map 0:a: choose the custom "out" stream as the video stream and choose the audio from the first input file as the audio stream for the output file
  • -c:a copy: select the streamcopy encoder with no decoding or encoding for the audio stream; copying the same audio stream to not cause unnecessary quality loss as it was not modified
  • "vertical_%~n1.mp4": output file name and extension; forced to be .mp4 for ease of use and compatibility reasons
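
To see what the expressions in that filtergraph evaluate to, here is a Python sketch that repeats the same arithmetic. It is an approximation under stated assumptions: the function name vertical_layout is hypothetical, and any rounding FFMPEG applies internally to crop and scale sizes is ignored, so real output sizes may differ by a pixel.

```python
def vertical_layout(iw, ih, video_w, video_h, cam_w, cam_h,
                    cam_scale, offset_x, offset_y):
    """Evaluate the crop, scale and overlay expressions for one input size.

    Hypothetical helper; ignores FFMPEG's internal rounding.
    """
    # [vertical]: portrait crop of the source.
    W = iw / 3.15789
    H = ih
    # [webcam-scaled]: webcam crop size multiplied by the user-defined scale.
    w = iw / (video_w / cam_w) * cam_scale
    h = ih / (video_h / cam_h) * cam_scale
    # overlay: centered horizontally, then shifted by the offset ratios.
    x = (W - w) / 2 + W / (video_w / offset_x)
    y = H / (video_h / offset_y)
    return W, H, w, h, x, y


# Values from the usage example: scale 0.7, offsets -370 and 620.
W, H, w, h, x, y = vertical_layout(1920, 1080, 1920, 1080, 520, 293, 0.7, -370, 620)
print(f"canvas {W:.1f}x{H}, webcam {w:.1f}x{h:.1f} at ({x:.1f}, {y:.1f})")
```

Note that the horizontal offset is divided into W, the width of the vertical crop, so an offset of -370 entered against a 1920-wide source moves the webcam by about -117 pixels on the 608-wide canvas.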

Merge Videos

I don't know half of how the Batch script works for this; I found it on GitHub, and it works. I can explain the FFMPEG command and how it works though:

ffmpeg -f concat -safe 0 -i confiles.txt -c copy merged_"!file!"
  • -f concat: demuxes a list of files one after another
  • -safe 0: ensures that the command works with non-relative file paths in the input file
  • -i confiles.txt: input file that was created by the Batch script; holds the file names of the videos to merge
  • -c copy: select the streamcopy encoder with no decoding or encoding; copying the same video and audio streams
  • merged_"!file!": output file with the name and extension inherited from one of the input files
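
For reference, the concat demuxer reads a plain text list with one file '...' line per video. The Python sketch below shows how such a list could be generated. It is an assumption about what confiles.txt looks like rather than a copy of the Batch script's output, and the function name concat_list is hypothetical.

```python
def concat_list(paths):
    """Build the text of a concat demuxer list file (hypothetical helper)."""
    lines = []
    for path in paths:
        # A single quote inside a quoted path is written as '\'' in the list file.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


# The videos are joined in the order they appear in the list.
print(concat_list(["vertical_clip1.mp4", "vertical_clip2.mp4"]), end="")
```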

Add Image Overlays

First we get the duration of the first input file with an ffprobe command so we can later determine when to show the end card:

ffprobe -i "%~1" -show_entries format=duration -v quiet -of csv="p=0" >> duration.txt
Fetches and stores the duration of the first input file in a text file "duration.txt" in the form: ss.SSS

Then a bit of Batch script to declare the fadestart variable based on the video duration.

set /p videoduration=<duration.txt
set /a fadestart=videoduration-3
del duration.txt
  • set videoduration as the text from the duration.txt file
  • set fadestart as the videoduration - 3 so the end card image fade-in begins 3 seconds before the video ends
  • delete duration.txt as it's not needed anymore
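
The same calculation can be written out in Python as a quick sanity check. Since set /a only does integer arithmetic, this sketch assumes the decimal part of the duration is simply dropped before subtracting; that truncation is my assumption about the behaviour, and the function name fade_start is hypothetical.

```python
def fade_start(duration_text, lead_seconds):
    """Whole-second fade start, computed from ffprobe's duration text.

    Hypothetical helper; assumes the fractional seconds are discarded.
    """
    whole_seconds = int(float(duration_text))
    return whole_seconds - lead_seconds


# A 21.5 second clip with the end card starting 3 seconds before the end.
print(fade_start("21.500000", 3))
```

For a 21.5 second clip this gives 18, so under that assumption the fade would begin up to a second earlier than an exact "duration minus 3" would suggest.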

Then the FFMPEG command that wraps it all together. There are two variants of this command with the other one not having the end-card.png input, but this is the full command:

ffmpeg -i "%~1" -i "image-overlay.png" -loop 1 -i "end-card.png" -filter_complex [1:v][0:v]scale=rw:rh[1-scaled],[2:v][0:v]scale=rw:rh[2-scaled],[0:v][1-scaled]overlay=0:0[image-overlay],[2-scaled]fade=in:st=%fadestart%:d=1:alpha=1[end-card],[image-overlay][end-card]overlay[out] -map [out] -map 0:a -c:a copy -shortest "overlay_%~n1.mp4"
  • -i "%~1": first input file; the file that was dragged on the Batch script
  • -i "image-overlay.png": second input file
  • -loop 1 -i "end-card.png": third input file with the loop input option, which repeats the single image as a continuous video stream; without it the image is only one frame long, so there is nothing left to fade in near the end of the video
  • -filter_complex: define the filtergraph that follows
  • [1:v][0:v]scale=rw:rh[1-scaled]: scale the second input (1:v) to the width and height of the first input (0:v) and call it "1-scaled"; makes image-overlay.png the same size as the video to not limit it to a specific resolution
  • [2:v][0:v]scale=rw:rh[2-scaled]: scale the third input (2:v) to the width and height of the first input (0:v) and call it "2-scaled"; same scaling procedure for end-card.png
  • [0:v][1-scaled]overlay=0:0[image-overlay]: overlay the scaled image-overlay.png to the video at coordinates 0,0 from the top left and call it "image-overlay"
  • [2-scaled]fade=in:st=%fadestart%:d=1:alpha=1[end-card]: begin fade-in of end-card.png 3 seconds before the video ends (variable %fadestart%) that takes 1 second and has transparency on, call it "end-card"
  • [image-overlay][end-card]overlay[out]: overlay "end-card" on "image-overlay" and call it "out"
  • -map [out] -map 0:a: choose the custom "out" stream as the video stream and choose the audio from the first input file as the audio stream for the output file
  • -c:a copy: select the streamcopy encoder with no decoding or encoding for the audio stream; copying the same audio stream to not cause unnecessary quality loss as it was not modified
  • -shortest: finish encoding when the shortest output stream ends; stops the video from being infinitely long due to the image input file loop
  • "overlay_%~n1.mp4": output file name and extension; forced to be .mp4 for ease of use and compatibility reasons

Add Music

First we get the duration of the first input file with an ffprobe command so we can later determine when to fade out the music:

ffprobe -i "%~1" -show_entries format=duration -v quiet -of csv="p=0" >> duration.txt
Fetches and stores the duration of the first input file in a text file "duration.txt" in the form: ss.SSS

Then a bit of Batch script to declare the fadestart variable based on the video duration.

set /p videoduration=<duration.txt
set /a fadestart=videoduration-5
del duration.txt
  • set videoduration as the text from the duration.txt file
  • set fadestart as the videoduration - 5 so the music fade-out begins 5 seconds before the video ends
  • delete duration.txt as it's not needed anymore

Then we set a start time and volume for the music and let the FFMPEG command do the rest:

ffmpeg -i "%~1" -ss %starttime% -i "%~2" -filter_complex [1:a]volume=%volume%[music],[music]afade=in:st=0:d=1[music-fade-in],[music-fade-in]afade=out:st=%fadestart%:d=3[music-fade-in-out],[0:a][music-fade-in-out]amix=inputs=2[combined],[combined]volume=2[audio] -map 0:v -map [audio] -c:v copy -shortest "music_%~n1.mp4"
  • -i "%~1": first input file
  • -ss %starttime%: start the second input file at the value of %starttime%
  • -i "%~2": second input file
  • -filter_complex: define the filtergraph that follows
  • [1:a]volume=%volume%[music]: set the volume of the second input file to the value of %volume% and call the new stream "music"
  • [music]afade=in:st=0:d=1[music-fade-in]: begin fade-in of the "music" stream at 0 s for a duration of 1 s and call it "music-fade-in"
  • [music-fade-in]afade=out:st=%fadestart%:d=3[music-fade-in-out]: begin fade-out of the "music-fade-in" stream at the value of %fadestart% (5 s before video end) for 3 seconds and call it "music-fade-in-out"
  • [0:a][music-fade-in-out]amix=inputs=2[combined]: mix the audio of the first input file and the custom stream "music-fade-in-out" and call it "combined"
  • [combined]volume=2[audio]: increase the volume of "combined", as amix by default scales its inputs down when mixing them, which causes a volume drop. Call the new stream "audio"
  • -map 0:v -map [audio]: choose the first input as the video stream and choose the audio from the combined "audio" stream as the audio stream for the output file
  • -c:v copy: select the streamcopy encoder with no decoding or encoding for the video stream; copying the same video stream to not cause unnecessary quality loss as it was not modified
  • -shortest: finish encoding when the shortest output stream ends; stops the output at the end of the video when the music file is longer than the video
  • "music_%~n1.mp4": output file name and extension; forced to be .mp4 for ease of use and compatibility reasons
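
To visualize what the volume and fade filters do to the music over time, here is a Python sketch of the resulting gain curve. It assumes both fades are linear and leaves out the amix and volume=2 steps; the function name music_gain and the 21 second example clip are hypothetical.

```python
def music_gain(t, volume, fadestart, fade_in=1.0, fade_out=3.0):
    """Gain applied to the music at time t in seconds (hypothetical helper).

    Assumes linear fades: 0 -> 1 over the first second, then 1 -> 0
    starting at fadestart and lasting fade_out seconds.
    """
    gain_in = min(max(t / fade_in, 0.0), 1.0)
    gain_out = 1.0 - min(max((t - fadestart) / fade_out, 0.0), 1.0)
    return volume * gain_in * gain_out


# 21 second clip, so fadestart = 21 - 5 = 16, with the music volume set to 0.3.
for t in (0.5, 10, 17.5, 19, 21):
    print(t, round(music_gain(t, 0.3, 16), 3))
```

With these numbers the fade-out runs from 16 s to 19 s, which leaves the last two seconds of the clip without music.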

Downscale Resolution

A prompt is given to enter a scalar value, and then the FFMPEG command is run:

ffmpeg -i "%~1" -vf scale=(iw/%scalar%):-2 -c:a copy "scaled_%~n1%~x1"
  • -i "%~1": first input file
  • -vf scale=(iw/%scalar%):-2: scaling input width of the file divided by the predetermined scalar variable and applying the same aspect ratio to the input height while rounding it to the nearest even pixel; e.g. 1920/1.6 & 1080/1.6 = 1200x675 -> 1200x676. The h264 encoder used often in an .mp4 video container requires even pixel resolutions to work.
  • -c:a copy: select the streamcopy encoder with no decoding or encoding for the audio stream; copying the same audio stream to not cause unnecessary quality loss as it was not modified
  • "scaled_%~n1%~x1": output file name along with the extension taken from the input file to allow for use with all extensions
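
The resolution arithmetic can be checked with a short Python sketch. It assumes the width expression is truncated to a whole pixel and that the -2 height is rounded to the nearest multiple of 2 with halves rounding up; both are assumptions about FFMPEG's behaviour, and the function name scaled_size is hypothetical.

```python
import math


def scaled_size(iw, ih, scalar):
    """Estimate the output size of scale=(iw/scalar):-2 (hypothetical helper)."""
    # Assumed: the width expression is truncated to an integer.
    width = int(iw / scalar)
    # Keep the aspect ratio, then round to the nearest even number of pixels.
    height = math.floor(width * ih / iw / 2 + 0.5) * 2
    return width, height


print(scaled_size(1920, 1080, 1.6))
print(scaled_size(1920, 1080, 1.5))

# An unlucky scalar can produce an odd width, which is why the width needs checking.
width, _ = scaled_size(1920, 1080, 1.7)
print(width, "even" if width % 2 == 0 else "odd")
```

Only the height is corrected by the -2, so it is worth picking a scalar that divides the source width into an even number.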

Thoughts

I hope this can be of use to someone looking to spend little time in producing vertical video clips to share to their audiences. This started as a proof of concept from me just helping a friend out after we thought about how to share short clips of his content on the current shorts-type platforms and it eventually snowballed into me using my current knowledge of FFMPEG and Batch to make it a linear and simple process with no time spent in a video editor.

I can think of ways to make it easier such as a GUI where you can easily select the webcam location, adjust music volume and preview things, but those are beyond my programming capabilities at the time of writing and frankly out of scope for this project, which focuses on simplicity.