I have so many files on my computer. Like, so many files. I noticed that the majority of them were pictures or videos. I don't want to compress most of the videos, because that would mean rewriting each whole file and would take a lot of time. So I figured I would start with photos and see how that goes.
Overview
My objective is to create a program that can compress an image and make the hard decision between lossless and lossy compression.
The Difference
When I explain this to someone, the first question they ask is: why would you ever want lossy compression?
To answer as simply as possible: do you always use everything? Here is an example from psychology: do you remember which cars were around you while driving this morning? Most likely not, so, unlike a computer, you omitted information that was not needed. In this case, that information is the metadata. I wrote a program to preserve the metadata from one file to another, but its only real use is as a backup, like when trying to find where a certain photo was taken or the specs of the camera/phone.
Lossy
The difference between the lossy program and the lossless one is whether the metadata is preserved and, because of variation in how the file is rewritten, a pixel difference of less than 1%. So now, moving on to the program.
Step 1 - Handling metadata
This is the only step that separates my program from other ones. When reading the file, I stored the photo's metadata in a different file so it can be viewed later. I am going to leave that code out because, in my opinion, it isn't that big of a deal.
Exporting Metadata
When I tried to export it from one photo to another, that never worked, so I came up with a solution: exporting the metadata to a JSON file.
    from PIL import Image
    from PIL.ExifTags import TAGS
    import piexif

    filename = "Test (3).jpg"
    new_file = "img\\Test (3)s.jpg"


    # Transfers the Exif
    def getOn(filename, new_file):
        im = Image.open(filename)
        exif_dict = piexif.load(im.info["exif"])
        exif_bytes = piexif.dump(exif_dict)
        im.save(new_file, "jpeg", exif=exif_bytes)


    # Returns Exif Data (Used for printing)
    def get_exif(fn):
        ret = {}
        i = Image.open(fn)
        info = i._getexif()
        for tag, value in info.items():
            decoded = TAGS.get(tag, tag)
            ret[decoded] = value
        return ret


    getOn("0 e.jpg", "img\\0e.jpg")
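The JSON export itself is not shown here, so the following is only a sketch of what such an export could look like. It assumes the tag dictionary comes from something like get_exif. EXIF values are often raw bytes or tuples, which the json module cannot serialize directly, so they are converted first. The names make_json_safe and export_metadata are hypothetical and not part of the original program.

```python
import json


def make_json_safe(value):
    """Convert EXIF-style values into types that json can serialize."""
    if isinstance(value, bytes):
        # Raw bytes (for example a maker note) are stored as a hex string.
        return value.hex()
    if isinstance(value, (list, tuple)):
        return [make_json_safe(v) for v in value]
    if isinstance(value, dict):
        # JSON object keys must be strings, so numeric tag ids are converted.
        return {str(k): make_json_safe(v) for k, v in value.items()}
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    # Anything else (e.g. a library-specific rational type) falls back to text.
    return str(value)


def export_metadata(exif, json_path):
    """Write a tag -> value dictionary to json_path as readable JSON."""
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(make_json_safe(exif), fh, indent=2, sort_keys=True)


# Hypothetical usage, assuming get_exif from the listing above:
#   export_metadata(get_exif("Test (3).jpg"), "Test (3).json")
```

Storing bytes as hex keeps the file readable in a text editor, at the cost of needing bytes.fromhex to get the original value back.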
Step 2 - User Interface
Giving options was the best idea, because sometimes I wanted to do everything and other times just one folder, so that is where I got the menu options idea.
    import ntpath
    import os
    from PIL import Image
    import piexif


    # Author: MasterWard
    def main():
        print("Welcome to cleaning JPG Files episode 10")
        print("How much are we cleaning")
        print("1. One File")  # Tested. Works
        print("2. Multiple Files")  # Tested. Fragile but works
        print("3. One Directory")  # Not Tested
        print("4. Multiple Directories")  # Not Tested
        print("5. One Directory and subdirectories")  # Tested. Works
        print("6. Multiple Directories and subdirectories")  # Semi works. Leaves out some
        print("WARNING: Don't Try anything stupid. This will include the following")
        print("- Try number 6 and expect results")
        print("- Enter full file paths put in a folder without files")
        print("- Lastly anything I wouldn't do")
        co = input()
        if co == '1':
            justOne()
        elif co == '2':
            justTwo()
        elif co == '3':
            justThree()
        elif co == '4':
            justFour()
        elif co == '5':
            justFive()
        elif co == '6':
            justSix()
        else:
            print("That isn't an option. Goodbye")
        input("Press any button to end the program...")
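The if/elif chain works fine for six options. As an alternative, here is a minimal sketch of the same menu as a lookup table, which makes adding a seventh option a one-line change. The function run_choice and the actions dictionary are hypothetical names used only for illustration.

```python
def run_choice(choice, actions):
    """Look up a menu choice and call its handler; return False if unknown."""
    handler = actions.get(choice)
    if handler is None:
        print("That isn't an option. Goodbye")
        return False
    handler()
    return True


# Hypothetical wiring, assuming the justOne ... justSix functions of the program:
#   actions = {'1': justOne, '2': justTwo, '3': justThree,
#              '4': justFour, '5': justFive, '6': justSix}
#   run_choice(input(), actions)
```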
First up are the helper functions, which are called multiple times throughout the program and make it easier to read. These handle simple interface responses and redundant tasks. The comments describe what each one does.
    #===============================================================================
    # Test for positive words
    #===============================================================================
    def yesNo(cases):
        valid = {'yes', 'YES', 'true'}
        return cases in valid


    #===============================================================================
    # Tests for valid file extensions
    #===============================================================================
    def validPhoto(filename):
        valid = ('.jpg', '.jpeg')
        # Compare the extension only, so "jpg" elsewhere in the name does not count
        return filename.lower().endswith(valid)


    #===============================================================================
    # Transfer/Clean the Exif of a file
    #===============================================================================
    def getOn(filename, new_file):
        im = Image.open(filename)
        try:
            exif_dict = piexif.load(im.info["exif"])
            exif_bytes = piexif.dump(exif_dict)
            im.save(new_file, "jpeg", exif=exif_bytes)
        except Exception as err:
            # For example, the image has no Exif block at all
            print("Could not clean " + filename + ": " + str(err))


    #===============================================================================
    # Strips string to file name
    #===============================================================================
    def stripFileName(path):
        head, tail = ntpath.split(path)
        return tail or ntpath.basename(head)


    #===============================================================================
    # Returns an array with all the files that are valid in the directory
    # and its sub directories
    #===============================================================================
    def allThePaths(mypath):
        f = []
        for (dirName, subdirlist, fileList) in os.walk(mypath):
            for fname in fileList:
                if validPhoto(fname):
                    f.append(dirName + "\\" + fname)
        return f


    #===============================================================================
    # Like allThePaths. Only returns files from the directory (no sub directories)
    #===============================================================================
    def shallowPath(mypath):
        f = []
        for (dirName, subdirlist, fileList) in os.walk(mypath):
            for fname in fileList:
                if validPhoto(fname):
                    f.append(dirName + "\\" + fname)
            break  # Stop after the top-level directory
        return f
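The two directory helpers differ only in whether they descend into subdirectories. A minimal sketch of the same idea with pathlib, under the assumption that a single function with a flag is acceptable; find_photos and PHOTO_SUFFIXES are hypothetical names, not part of the program above.

```python
from pathlib import Path

# Extensions treated as JPEG photos (compared case-insensitively).
PHOTO_SUFFIXES = {".jpg", ".jpeg"}


def find_photos(folder, recursive=False):
    """Return JPEG paths in folder, optionally including subdirectories."""
    root = Path(folder)
    # rglob("*") walks the whole tree, iterdir() only the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        str(p) for p in candidates
        if p.is_file() and p.suffix.lower() in PHOTO_SUFFIXES
    )
```

Here find_photos(folder) plays the role of shallowPath and find_photos(folder, recursive=True) the role of allThePaths. Checking the suffix rather than searching the whole name means a file such as notes.jpg.txt is skipped.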
Then we have the function for each menu option. I could split them up, but I feel the comments are enough of an explanation.
    #===============================================================================
    # Implements cleaning up just one file and replacing it or making a copy
    #===============================================================================
    def justOne():
        print("Awesome Just one file")
        filename = input("Enter the file path: ")
        if not validPhoto(filename):
            print("Screwed up you have")
            print("Can't do anything to help you.")
            print("Goodbye")
            exit()
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        new_file = ""
        if not replacing:
            new_file = input("Input the file path ")
            if not os.path.exists(new_file):
                os.mkdir(new_file)
            # Use only the file name, not the full source path
            new_file += "\\" + stripFileName(filename)
        else:
            print("Awesome. One Conversion coming right up")
            new_file = filename
        # Cleans file
        print("Cleaning " + filename)
        getOn(filename, new_file)
        print("Done.")
        print("Have a good day")


    #===============================================================================
    # Implements cleaning up on multiple files but not directories
    #===============================================================================
    def justTwo():
        print("Ok a couple of files not something to hard")
        # Inputting file paths
        filenames = []
        fileMore = ''
        print("Enter in the file name and when you're done enter \"done\"")
        while not fileMore == 'done':
            fileMore = input()
            if os.path.isfile(fileMore):  # If valid File
                print(fileMore + " has been added")
                # append, because += would add the path one character at a time
                filenames.append(fileMore)
            elif os.path.isdir(fileMore):  # If is a directory
                print("That doesn't work because that is a directory")
            elif fileMore != 'done':
                print(fileMore + " does not exist")
                print("Maybe there is a typo. Try again")
        # Creating Target Destination
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        newPath = ""
        if not replacing:
            newPath = input("Input the new folder leaf (Folder copying place) ")
            if not os.path.exists(newPath):
                os.mkdir(newPath)
        else:
            print("Glad to hear that. Less work for me")
        # Cleans Files
        print("Now it is time for me to get to work")
        con = 0
        for fil in filenames:
            print('Cleaning ' + os.path.basename(fil))
            con += 1
            if replacing:
                getOn(fil, fil)
            else:
                getOn(fil, newPath + "\\" + stripFileName(fil))
        print("All done with that\nHave a good day with your " + str(con) + " Cleaned files")


    #===============================================================================
    # Implements cleaning up on multiple files only in 1 directory (No SubDir)
    #===============================================================================
    def justThree():
        print("One directory. Piece of cake after you answer some questions")
        valdir = False
        # Valid directory input
        filename = ""
        while not valdir:
            filename = input("Input the directory name: ")
            if ntpath.isdir(filename):
                valdir = True
            else:
                print("That is not an valid directory. Try again")
        # Creating Target Destination
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        newPath = ""
        if not replacing:
            newPath = input("What are we going to call this new folder destination ")
            if not os.path.exists(newPath):
                os.mkdir(newPath)
        else:
            print("What a joy. Just replacing")
        # Gets files in the directory
        filenames = shallowPath(filename)
        # Cleans Files
        print("Now it is time for me to get to work")
        con = 0
        for fil in filenames:
            print('Cleaning ' + os.path.basename(fil))
            con += 1
            if replacing:
                getOn(fil, fil)
            else:
                getOn(fil, newPath + "\\" + stripFileName(fil))
        print("Ha. I finished.\nHave a good day with your " + str(con) + " Cleaned files")


    #===============================================================================
    # Implements cleaning up multiple files from multiple directories (No subdir)
    #===============================================================================
    def justFour():
        print("More than one directory. Really putting the program to the test")
        # Inputting directory paths
        dirnames = []
        dirMore = ''
        print("Enter in the directory name and when you're done enter \"done\"")
        while not dirMore == 'done':
            dirMore = input()
            if os.path.isdir(dirMore):  # If valid directory
                print(dirMore + " has been added")
                # append, because += would add the path one character at a time
                dirnames.append(dirMore)
            elif os.path.isfile(dirMore):  # If is a file
                print("That doesn't work because that is a file")
            elif dirMore != 'done':
                print(dirMore + " does not exist")
                print("You Done Goof. Try again")
        # Creating Target Destination
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        newPath = ""
        if not replacing:
            newPath = input("What do you call this directory? ")
            if not os.path.exists(newPath):
                os.mkdir(newPath)
        else:
            print("Splendid. Just a replacement")
        # Gets files in the Directories
        filenames = []
        for x in dirnames:
            filenames += shallowPath(x)
        # Cleans Files
        print("We got all the data. Let us now begin")
        con = 0
        for fil in filenames:
            print('Cleaning ' + os.path.basename(fil))
            con += 1
            if replacing:
                getOn(fil, fil)
            else:
                getOn(fil, newPath + "\\" + stripFileName(fil))
        print("That wasn't too hard.\nHave a good day with your " + str(con) + " Cleaned files")


    #===============================================================================
    # Implements cleaning up on multiple files in 1 directory (and SubDirs)
    #===============================================================================
    def justFive():
        print("One directory and their children. That is just great")
        valdir = False
        # Valid directory input
        filename = ""
        while not valdir:
            filename = input("Input the directory name: ")
            if ntpath.isdir(filename):
                valdir = True
            else:
                print("That is not an valid directory. Try again")
        # Creating Target Destination
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        newPath = ""
        if not replacing:
            newPath = input("What is the name of this destination folder ")
            if not os.path.exists(newPath):
                os.mkdir(newPath)
        else:
            print("Perfect. Just replacing")
        # Gets files in the directory
        filenames = allThePaths(filename)
        # Cleans Files
        print("Now lets get down to the good stuff")
        con = 0
        for fil in filenames:
            print('Cleaning ' + os.path.basename(fil))
            con += 1
            if replacing:
                getOn(fil, fil)
            else:
                getOn(fil, newPath + "\\" + stripFileName(fil))
        print("I have finished.\nHave a good day with your " + str(con) + " Cleaned files")


    #===============================================================================
    # Implements cleaning up multiple files from multiple directories (and SubDirs)
    #===============================================================================
    def justSix():
        print("You want the Most. This is the max capability of the program")
        # Inputting directory paths
        dirnames = []
        dirMore = ''
        print("Enter in the directory name and when you're done enter \"done\"")
        while not dirMore == 'done':
            dirMore = input()
            if os.path.isdir(dirMore):  # If valid directory
                print(dirMore + " has been added")
                # append, because += would add the path one character at a time
                dirnames.append(dirMore)
            elif os.path.isfile(dirMore):  # If is a file
                print("That doesn't work because that is a file")
            elif dirMore != 'done':
                print(dirMore + " does not exist")
                print("You Screwed up. Try again")
        # Creating Target Destination
        replace = input("Are you going to be replacing the file? ")
        replacing = yesNo(replace)
        newPath = ""
        if not replacing:
            newPath = input("What do you call this destination directory? ")
            if not os.path.exists(newPath):
                os.mkdir(newPath)
        else:
            print("Zipping Zebras. Just a replacement")
        # Gets files in the Directories
        filenames = []
        for x in dirnames:
            filenames += allThePaths(x)
        # Cleans Files
        print("Galloping Gallardo. We better get started")
        con = 0
        for fil in filenames:
            print('Cleaning ' + os.path.basename(fil))
            con += 1
            if replacing:
                getOn(fil, fil)
            else:
                getOn(fil, newPath + "\\" + stripFileName(fil))
        print("DAMN THAT WAS HARD.\nAt least we are done and cleaned " + str(con) + " files")


    main()
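All six options make the same decision for every file: overwrite it in place, or write a copy into the destination folder under the same file name. A minimal sketch of that decision as one helper, assuming Windows-style paths as in the rest of the program; target_path is a hypothetical name and does not appear in the listing above.

```python
import ntpath


def target_path(source, dest_dir, replacing):
    """Return where the cleaned copy of source should be written."""
    if replacing:
        # Overwrite the original file in place.
        return source
    # ntpath understands both "\\" and "/" separators, like stripFileName.
    name = ntpath.basename(source.rstrip("\\/"))
    return ntpath.join(dest_dir, name)


print(target_path("C:\\pics\\a.jpg", "D:\\clean", replacing=False))
```

Note that files with the same name from different source folders would land on the same destination path, which may be one reason option 6 leaves some files out; that is a guess, not something verified here.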
Step 3 - Verification
Of course, at first I tried to create my own brute-force version, which works... but is incredibly slow. Originally this code was for finding duplicate images...
    # Author MasterWard
    # LinkedList and path come from modules (Old, walkpath) that are not shown in this post
    from Old import *
    from datetime import datetime
    from send2trash import send2trash
    from walkpath import path
    from PIL import Image

    # Imports all the images
    # Using a 100% Check checking every pixel
    file = open("testfile.txt", "w")


    # Writes the data of every node after n to the output file
    def nextNodes(n):
        temp = True
        while temp:
            if n == None:
                temp = False
            else:
                n = n.next
                if n == None:
                    temp = False
                else:
                    file.write(str(n.data) + "\n")


    # Test that the files are valid and compatible
    def same(a, b):
        if ".JPG" in a or ".jpg" in a:
            return ".JPG" in b or ".jpg" in b
        elif ".jpeg" in a or ".JPEG" in a:
            return ".jpeg" in b or ".JPEG" in b
        elif ".png" in a or ".PNG" in a:
            return ".png" in b or ".PNG" in b
        else:
            print("That is not a valid format")
            return False


    def main():
        ar = path("D:\\Photos Test\\Jacob iPhone 6 JPG 1")
        print(datetime.now().time())
        print(len(ar))
        ll = LinkedList()
        x = 0
        cons = 0
        while len(ar) > 0:
            ia = Image.open(ar[x])
            ll2 = LinkedList()
            ll2.add(ar[x], None)
            ar.remove(ar[x])
            iwa, iha = ia.size
            y = 0
            while y < len(ar):
                if same(str(ar[x]), str(ar[y])):
                    ib = Image.open(ar[y])
                    iwb, ihb = ib.size
                y += 1
            ll.add(ll2, None)
            cons += 1
        while len(ar) > 0:
            ll2 = LinkedList()
            ll2.add(ar[0], None)
            ar.remove(ar[0])
            ll.add(ll2, None)
        temper = ll.head
        while temper != None:
            temper = nextNodes(temper)
        file.close()
        print(cons)
        print(datetime.now().time())


    main()
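Comparing every image against every other one grows quadratically with the number of files, which is why the brute-force approach is so slow. For the original goal of finding duplicates, a different and much cheaper technique is to hash each file once and group files by digest. This is a sketch of that swapped-in technique, not the approach used above, and it only catches byte-identical files, not images that merely look the same. The names file_digest and group_duplicates are hypothetical.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(paths):
    """Return lists of paths whose file contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(p)
    # Keep only digests shared by more than one file.
    return [g for g in groups.values() if len(g) > 1]
```

Each file is read exactly once, so the work grows linearly with the number of files instead of with the number of pairs.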
I found someone else's code to measure how similar two images are. Of course, this was nothing compared to the human eye, which was the deciding factor, but knowing the compression rate was very useful.
    # MIT License
    #
    # Copyright (c) 2016 Jonas Hahn <jonas.hahn@datenhahn.de>
    #
    # Permission is hereby granted, free of charge, to any person obtaining a copy
    # of this software and associated documentation files (the "Software"), to deal
    # in the Software without restriction, including without limitation the rights
    # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    # copies of the Software, and to permit persons to whom the Software is
    # furnished to do so, subject to the following conditions:
    #
    # The above copyright notice and this permission notice shall be included in all
    # copies or substantial portions of the Software.
    #
    # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    # SOFTWARE.

    """
    === imagecompare ===

    This little tool compares two images using pillow's ImageChops and then converts the differnce to
    black/white and sums up all found differences by summing up the histogram values of the difference
    pixels.

    Taking the difference between a black and a white image of the same size as base a percentage value
    is calculated.

    Check the tests to see example diffs for different scenarios. Don't expect the diff of two jpg images be
    the same for the same images converted to png. Don't do interformat compares (e.g. JPG with PNG).

    Usage:

    == compare images ==

    same = is_equal("image_a.jpg", "image_b.jpg")

    # use the tolerance parameter to allow a certain diff pass as same
    same = is_equal("image_a.jpg", "image_b.jpg", tolerance=2.5)

    == get the diff percentage ==

    percentage = image_diff_percent("image_a.jpg", "image_b.jpg")

    # or work directly with pillow image instances
    image_a = Image.open("image_a.jpg")
    image_b = Image.open("image_b.jpg")
    percentage = image_diff_percent(image_a, image_b)

    """

    from PIL import Image
    from PIL import ImageChops


    class ImageCompareException(Exception):
        """
        Custom Exception class for imagecompare's exceptions.
        """
        pass


    def pixel_diff(image_a, image_b):
        """
        Calculates a black/white image containing all differences between the two input images.

        :param image_a: input image A
        :param image_b: input image B
        :return: a black/white image containing the differences between A and B
        """

        if image_a.size != image_b.size:
            raise ImageCompareException(
                "different image sizes, can only compare same size images: A=" + str(image_a.size) + " B=" + str(
                    image_b.size))

        if image_a.mode != image_b.mode:
            raise ImageCompareException(
                "different image mode, can only compare same mode images: A=" + str(image_a.mode) + " B=" + str(
                    image_b.mode))

        diff = ImageChops.difference(image_a, image_b)
        diff = diff.convert('L')

        return diff


    def total_histogram_diff(pixel_diff):
        """
        Sums up all histogram values of an image. When used with the black/white pixel-diff image
        this gives the difference "score" of an image.

        :param pixel_diff: the black/white image containing all differences (output of imagecompare.pixel_diff function)
        :return: the total "score" of histogram values (histogram values of found differences)
        """
        return sum(i * n for i, n in enumerate(pixel_diff.histogram()))


    def image_diff(image_a, image_b):
        """
        Calculates the total difference "score" of two images. (see imagecompare.total_histogram_diff).

        :param image_a: input image A
        :param image_b: input image A
        :return: the total difference "score" between two images
        """
        histogram_diff = total_histogram_diff(pixel_diff(image_a, image_b))

        return histogram_diff


    def is_equal(image_a, image_b, tolerance=0.0):
        """
        Compares two image for equalness. By specifying a tolerance a certain diff can
        be allowed to pass as True.

        :param image_a: input image A
        :param image_b: input image B
        :param tolerance: allow up to (including) a certain percentage of diff pass as True
        :return: True if the images are the same, false if they differ
        """
        return image_diff_percent(image_a, image_b) <= tolerance


    def image_diff_percent(image_a, image_b):
        """
        Calculate the difference between two images in percent.

        :param image_a: input image A
        :param image_b: input image B
        :return: the difference between the images A and B as percentage
        """

        # if paths instead of image instances where passed in
        # load the images
        if isinstance(image_a, str):
            image_a = Image.open(image_a)

        if isinstance(image_b, str):
            image_b = Image.open(image_b)

        # first determine difference of input images
        input_images_histogram_diff = image_diff(image_a, image_b)

        # to get the worst possible difference use a black and a white image
        # of the same size and diff them
        black_reference_image = Image.new('RGB', image_a.size, (0, 0, 0))
        white_reference_image = Image.new('RGB', image_a.size, (255, 255, 255))

        worst_bw_diff = image_diff(black_reference_image, white_reference_image)

        percentage_histogram_diff = (input_images_histogram_diff / float(worst_bw_diff)) * 100

        return percentage_histogram_diff
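The arithmetic behind image_diff_percent is easier to see on a tiny example. The sketch below is a simplification that works on flat lists of grayscale values (0 to 255) rather than Pillow images: it sums the absolute per-pixel differences and divides by the worst case, an all-black image against an all-white one. The function diff_percent is a hypothetical stand-in, not part of imagecompare.

```python
def diff_percent(pixels_a, pixels_b, max_value=255):
    """Percentage difference between two equally sized grayscale pixel lists."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    if not pixels_a:
        return 0.0
    # Total absolute difference, the same quantity the histogram sum yields.
    total = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))
    # Worst case: every pixel differs by the full range (black vs. white).
    worst = max_value * len(pixels_a)
    return total / worst * 100


# One of four pixels flips from black to white: a quarter of the worst case.
print(diff_percent([0, 0, 0, 0], [255, 0, 0, 0]))  # 25.0
```

On this scale, the "<1%" figure mentioned earlier means the summed pixel differences stay below one hundredth of the black-versus-white worst case.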
Overall
It was a good program and has proven to be vitally useful in many situations. 50% compression is insane when you think about it. Zip up an image, by comparison, and you get maybe 90%. In the future, I hope to do PNG compression as well.
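To put a number on the savings, the file sizes before and after can be compared directly. This is a minimal sketch, assuming "50% compression" means the new file is half the size of the original; percent_saved and percent_saved_for_files are hypothetical helper names.

```python
import os


def percent_saved(original_size, compressed_size):
    """Return how much smaller the compressed size is, as a percentage."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return (1 - compressed_size / original_size) * 100


def percent_saved_for_files(original_path, compressed_path):
    """Same calculation, using the sizes of two files on disk."""
    return percent_saved(os.path.getsize(original_path),
                         os.path.getsize(compressed_path))


# A 4,000,000-byte photo rewritten as 2,000,000 bytes saves half the space.
print(percent_saved(4_000_000, 2_000_000))  # 50.0
```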
Drawbacks
It was not smooth sailing the whole time.
I wanted to get the metadata transferred over, but that just never worked.
The program can only handle JPEG/JPG files, which is very limiting.
Trying to transfer XMP data was a complete failure and wasted at least 2 hours.
Future ideas
Try different file formats
Print the results cleanly instead of just a validity number
Alternative
I found someone else's compression tool that handles both JPG and PNG files and thought it was worth mentioning.