Video Standardization and Compression

Origin

It all started with my friend complaining about the lack of space on his drives, and I figured there had to be a way to fix that problem. At first I wrote a comparison program, but it only saved a few GB out of his 12 TB. While mapping the drive I noticed a lot of videos, and they were all over the place. The end goal was a NAS server holding all of his files, which required a certain standardization of the files. Inadvertently, during the process, I found a way to compress the files by an insane rate.

Overview

The program starts at a directory and walks through its subdirectories, collecting the video files and, after some checks, treating the folder as a potential series name. From that information we build a standardized filename for each file and export the results to an Excel document for manual user review. Once the user has reviewed and corrected the entries, they change the second-to-last field to true and switch over to the other Python file to apply the changes. For optimal results we decided to upgrade to libx265 for video and AAC for audio. The program then runs an ffmpeg subcommand for each file, and how long each rewrite takes depends on the CPU. Whether the program finishes or is stopped, the database has to be cleaned afterwards, ideally by running the delete step (or by doing it manually), so that files that were already rewritten are not run again.

Step 1: Setup and Retrieving Files

The setup was a bit of a pain because I had to talk with the customer about which data was important to them.

import os

class fileObject:
	# Class-level defaults; the extraction step fills these in per file
	source = ""
	resolution = -1 # Check metadata
	newFilename = ""
	year = 0 # 0 means no year was found
	season = -1
	episode = -1
	endEpisode = -1
	series = ""
	episodeName = ""
	def __init__(self, filepath, subtitles):
		self.filepath = filepath
		self.fullpath = os.path.abspath(filepath)
		self.filename = os.path.basename(filepath)
		self.parentFold = os.path.dirname(filepath)
		self.extension = os.path.splitext(filepath)[1].strip() # Ex: .mkv
		self.subtitles = subtitles

# Removes files that start with ._ and returns the paths of all accepted video and subtitle files
# containingDir is the root directory, defined elsewhere in the program
def cleaningUp():
	acceptedExt = (".mkv", ".mov", ".mp4", ".wmv", ".m4v", ".avi", ".flv", ".srt", ".ass", ".ssa", ".sub")
	print("Removing all files that start with ._")
	fileList = []
	for root, dirs, files in os.walk(containingDir):
		for name in files:
			# Check the prefix once per file, so a file is never removed twice
			# and a removed file never ends up in the list
			if name.startswith("._"):
				os.remove(os.path.join(root, name))
				print("Removing file " + name)
			# Match the real extension instead of any substring of the name
			elif name.lower().endswith(acceptedExt):
				fileList.append(os.path.join(root, name))
	return fileList

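# fileRecords, isVideoFile and extraction are defined elsewhere in the program
def main():
	os.chdir(containingDir)
	fileList = cleaningUp()
	for f in fileList:
		absCurFilePath = os.path.abspath(f)
		# Subtitle files are listed too, so only video files get a record here
		if isVideoFile(f):
			myCurFile = extraction(absCurFilePath)
			fileRecords.append(myCurFile)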
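
The body of isVideoFile is not shown above. A minimal sketch of what such a check could look like, assuming the video types are simply the non-subtitle entries of acceptedExt (the VIDEO_EXTENSIONS set is my own split, not taken from the original program):

```python
import os

# Assumption: the video half of the accepted extensions listed in cleaningUp
VIDEO_EXTENSIONS = {".mkv", ".mov", ".mp4", ".wmv", ".m4v", ".avi", ".flv"}

def isVideoFile(path):
    """Return True when the path ends in one of the video extensions."""
    extension = os.path.splitext(path)[1].lower()
    return extension in VIDEO_EXTENSIONS
```

Comparing the lowercased extension keeps files such as EPISODE.MKV from being skipped.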

Step 2: Creating new Filename

This one was simple in theory: just extract the data, how hard could that be? Apparently very hard. The naming conventions people used caused many setbacks. Taking it one function at a time made for a very long program. Below is the code of those functions and what they do.

I am most proud of the smartReplace function, which finds all the matching terms regardless of how the words are capitalized. The next step was the fun part: extraction based on various elements.
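
To give an idea of the technique, here is a rough sketch of a case-insensitive replace. This is not the original smartReplace; the name caseInsensitiveReplace and its parameters are made up for illustration.

```python
import re

def caseInsensitiveReplace(text, target, replacement=""):
    """Replace every occurrence of target in text, ignoring letter case."""
    # re.escape keeps characters such as "." in the target literal
    pattern = re.compile(re.escape(target), re.IGNORECASE)
    # A function replacement avoids backslash handling in the replacement text
    return pattern.sub(lambda _match: replacement, text)
```

Calling caseInsensitiveReplace("Show.Name.BluRay.x264", "bluray") strips the tag no matter whether the file used BluRay, BLURAY, or bluray.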

Step 3: Information Extraction

Feel free to refer to the class format or the writing-to-database step to see what information is being extracted. We start by creating the file object.

We then move to the function that extracts the needed information from the name and the parent folder name.

Then we move on to subfunctions to extract tags from the names.
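
As an illustration of tag extraction, the sketch below pulls season, episode, and year out of a name with regular expressions. The patterns and the extractTags name are assumptions on my part; real files needed many more cases than this covers.

```python
import re

# Assumed patterns: S01E02, s1e2, or a range such as S01E02-E03
EPISODE_TAG = re.compile(r"[Ss](\d{1,2})[\s._-]*[Ee](\d{1,3})(?:-?[Ee](\d{1,3}))?")
# A four-digit year from 1900 to 2099 that is not part of a longer number
YEAR_TAG = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")

def extractTags(name):
    """Return the tags found in name, using the same defaults as fileObject."""
    tags = {"season": -1, "episode": -1, "endEpisode": -1, "year": 0}
    episodeMatch = EPISODE_TAG.search(name)
    if episodeMatch:
        tags["season"] = int(episodeMatch.group(1))
        tags["episode"] = int(episodeMatch.group(2))
        if episodeMatch.group(3):
            tags["endEpisode"] = int(episodeMatch.group(3))
    yearMatch = YEAR_TAG.search(name)
    if yearMatch:
        tags["year"] = int(yearMatch.group(1))
    return tags
```

Fields that are not found keep their default value, which makes them easy to spot during the manual review.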

Inferring information based on other factors.

Constructing the new name from the information gathered.
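
A small sketch of how the new name could be assembled from the extracted fields. The "Series - S01E02 - Title" layout and the buildFilename name are assumptions for illustration, not necessarily the format the program writes.

```python
def buildFilename(series, season, episode, extension, episodeName="", endEpisode=-1):
    """Join the extracted fields into one standardized filename."""
    tag = f"S{season:02d}E{episode:02d}"
    # Multi-episode files get a range such as S01E02-E03
    if endEpisode > episode:
        tag += f"-E{endEpisode:02d}"
    parts = [series, tag]
    if episodeName:
        parts.append(episodeName)
    return " - ".join(parts) + extension
```

Zero-padding the season and episode numbers keeps the files sorted correctly in a plain directory listing.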

Step 4: Clean up

Small things got weird in the formatting, so to account for that I included a few functions to deal with them. Feel free to dive into what they do more specifically.

Step 5: Writing to database

Since we didn't have an actual database, and a person had to verify manually that the entries were correct because of human differences, we decided to write to a CSV file as our source of truth.
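
The round trip through the CSV file can be sketched roughly like this. The column names are hypothetical; the only detail taken from the write-up is that the approval flag is the second-to-last field and gets set to true by the reviewer.

```python
import csv

# Hypothetical columns; "approved" is placed second to last on purpose
FIELDS = ["source", "newFilename", "series", "season", "episode", "approved", "notes"]

def writeRecords(csvPath, records):
    """Write one row per record, unapproved unless the record says otherwise."""
    with open(csvPath, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for record in records:
            row = dict(record)
            row.setdefault("approved", "false")
            row.setdefault("notes", "")
            writer.writerow(row)

def readApproved(csvPath):
    """Return only the rows the reviewer marked as true."""
    with open(csvPath, newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle)
                if row["approved"].strip().lower() == "true"]
```

Comparing the flag case-insensitively matters because spreadsheet programs tend to rewrite true as TRUE when the file is saved.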

Step 6: Video Conversion

The hardest part was reading through all of the ffmpeg documentation and finding the settings that gave the most benefit for these videos.

After a lot of research, upgrading to libx265 with a variable frame rate was the best option, since we prioritized file size over the computing time it would take. So I created my own remux for each file.

A close runner-up command changed the buffer and max frame rates. Another notable fix: instead of assuming the first stream was the video stream, I changed it to read the video stream explicitly.
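
A sketch of how such a command could be built and run from Python. The function names are made up, and the CRF and preset values are placeholder assumptions rather than the settings the project ended up with.

```python
import subprocess

def buildFfmpegCommand(source, destination, crf=28, preset="medium"):
    """Build an ffmpeg call that re-encodes video to HEVC and audio to AAC."""
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "libx265",     # HEVC video encoder
        "-crf", str(crf),      # assumed quality target; higher means smaller files
        "-preset", preset,     # slower presets trade CPU time for size
        "-fps_mode", "vfr",    # variable frame rate; builds before 5.1 use "-vsync vfr"
        "-c:a", "aac",         # AAC audio
        destination,
    ]

def convert(source, destination):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(buildFfmpegCommand(source, destination), check=True)
```

Passing the command as a list avoids shell quoting problems with filenames that contain spaces or brackets.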
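
One way to find the video stream instead of assuming it comes first is to ask ffprobe for the stream list as JSON (for example with ffprobe -v error -show_streams -of json) and look at each stream's codec_type. The helper below is an illustrative sketch that only parses that output; its name is my own.

```python
import json

def firstVideoStreamIndex(ffprobeJson):
    """Return the index of the first video stream in ffprobe's JSON, or None."""
    streams = json.loads(ffprobeJson).get("streams", [])
    for stream in streams:
        if stream.get("codec_type") == "video":
            return stream.get("index")
    return None
```

On the ffmpeg side, the stream specifier 0:v:0 (as in -map 0:v:0) selects the first video stream of the first input regardless of its position in the container.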

Step 7: Clean up

Once the new copies are created, we need to verify that they are correct, as some came out as files of less than 10 bytes, which was not right. I would also manually check one file from each season of a series.

Then comes the optional step of getting rid of the old version of the file.
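
The size check can be automated with something like the sketch below. The looksValid name and the 1024-byte threshold are assumptions; anything comfortably above the broken files of a few bytes would do.

```python
import os

def looksValid(convertedPath, minBytes=1024):
    """Return True when the converted file exists and is not suspiciously small."""
    if not os.path.isfile(convertedPath):
        return False
    return os.path.getsize(convertedPath) >= minBytes
```

A passing size check only rules out the obviously broken outputs, so the manual spot check is still worth doing before deleting any originals.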

Final Version

Thanks to alpha tester Arturo, I was able to finish this project and cover many more edge cases.

Notes

  • Series names are required, because the series name is the folder each file is placed into
