I have so many files on my computer. Like, so many files. I noticed that the majority of them were pictures or videos. I don't want to compress most of the videos, because that would mean rewriting each whole file and would take a lot of time, so I decided to start with photos and see how that goes.
Overview
My objective is to create a program that can compress images and make the hard decision between lossless and lossy compression.
The Difference
When I explain this to someone, the first question they ask is: why would you ever want lossy compression?
The simplest answer is another question: do you always use everything? To borrow an example from psychology, do you remember which cars were around you while driving this morning? Most likely not. Unlike a computer, you omitted information that was not needed. In this case, that information is the metadata. I wrote a program to preserve the metadata from one file to another, but its only use is as a backup, for example to look up where a certain photo was taken or the specs of the camera or phone.
Lossy
The differences between the lossy program and the lossless one are whether the metadata is preserved and, because the file is written out again, a pixel difference of under 1%. Now, on to the program.
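As a rough illustration of that distinction, here is a minimal sketch using Pillow. The helper name `resave_jpeg` and its parameters are assumptions made for this example, not the program's actual code; the one thing it relies on is that Pillow drops EXIF data when saving a JPEG unless it is passed explicitly.

```python
from PIL import Image


def resave_jpeg(src, dst, keep_metadata=True, quality=75):
    """Re-encode a JPEG, optionally carrying its EXIF block across.

    Hypothetical helper: src and dst may be paths or file-like objects.
    """
    with Image.open(src) as im:
        params = {"quality": quality, "optimize": True}
        # Pillow exposes the raw EXIF bytes of a JPEG under info["exif"].
        exif_bytes = im.info.get("exif")
        if keep_metadata and exif_bytes:
            params["exif"] = exif_bytes
        # Without an explicit exif argument the metadata is not written.
        im.save(dst, "JPEG", **params)
```

Either way the pixels are re-encoded, which is where a small pixel difference comes from; only the metadata handling changes.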
Step 1 - Handling metadata
This is the only step that separates my program from other ones. While reading each file, I stored its metadata in a separate file so it can be looked at later. I am going to leave that code out because, in my opinion, it isn't that big of a deal.
Exporting Metadata
Whenever I tried to export the metadata from one photo to another, it never worked, so I came up with a workaround: exporting the metadata to a JSON file.
from PIL import Image
from PIL.ExifTags import TAGS
import piexif

filename = "Test (3).jpg"
new_file = "img\\Test (3)s.jpg"


# Transfers the Exif
def getOn(filename, new_file):
    im = Image.open(filename)
    exif_dict = piexif.load(im.info["exif"])
    exif_bytes = piexif.dump(exif_dict)
    im.save(new_file, "jpeg", exif=exif_bytes)


# Returns Exif Data (Used for printing)
def get_exif(fn):
    ret = {}
    i = Image.open(fn)
    # _getexif() returns None when the file has no Exif data
    info = i._getexif() or {}
    for tag, value in info.items():
        decoded = TAGS.get(tag, tag)
        ret[decoded] = value
    return ret


getOn("0 e.jpg", "img\\0e.jpg")
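The JSON export itself is not shown in this post. A minimal sketch of what it could look like, using only Pillow's public `getexif()` and the standard `json` module, is below. The function names `exif_to_dict` and `export_exif_json` are hypothetical, and only the top-level EXIF tags are read (nested GPS and camera sub-tables are out of scope here).

```python
import json

from PIL import Image
from PIL.ExifTags import TAGS


def exif_to_dict(source):
    """Return the top-level EXIF tags of an image as a JSON-friendly dict."""
    result = {}
    with Image.open(source) as im:
        for tag_id, value in im.getexif().items():
            # Fall back to the numeric id when the tag has no known name.
            name = str(TAGS.get(tag_id, tag_id))
            # Bytes, rationals and tuples are not JSON types: store them as text.
            if not isinstance(value, (str, int, float, bool)):
                value = str(value)
            result[name] = value
    return result


def export_exif_json(source, json_path):
    """Write the EXIF tags of `source` to `json_path` and return them."""
    data = exif_to_dict(source)
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)
    return data
```

A call such as `export_exif_json("Test (3).jpg", "Test (3).json")` would leave a readable backup of the tags next to the photo.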
Step 2 - User Interface
Giving options was the best idea, because sometimes I wanted to process everything and other times just one folder. That is where the menu options idea came from.
import ntpath
import os
from PIL import Image
import piexif

# Author: MasterWard


def main():
    print("Welcome to cleaning JPG Files episode 10")
    print("How much are we cleaning")
    print("1. One File")  # Tested. Works
    print("2. Multiple Files")  # Tested. Fragile but works
    print("3. One Directory")  # Not Tested
    print("4. Multiple Directories")  # Not Tested
    print("5. One Directory and subdirectories")  # Tested. Works
    print("6. Multiple Directories and subdirectories")  # Semi works. Leaves out some
    print("WARNING: Don't Try anything stupid. This will include the following")
    print("- Try number 6 and expect results")
    print("- Enter full file paths put in a folder without files")
    print("- Lastly anything I wouldn't do")
    co = input()
    if co == '1':
        justOne()
    elif co == '2':
        justTwo()
    elif co == '3':
        justThree()
    elif co == '4':
        justFour()
    elif co == '5':
        justFive()
    elif co == '6':
        justSix()
    else:
        print("That isn't an option. Goodbye")
    input("Press any button to end the program...")
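The if/elif chain above could also be written as a lookup table, which keeps the menu and the functions it calls in one place. This is only a sketch of that design choice; `run_choice` and the `handlers` mapping are hypothetical names, not part of the program.

```python
def run_choice(choice, handlers):
    """Call the handler registered for a menu choice.

    `handlers` maps the text the user types (e.g. '1') to a function
    taking no arguments. Returns True if a handler was found and run.
    """
    handler = handlers.get(choice.strip())
    if handler is None:
        print("That isn't an option. Goodbye")
        return False
    handler()
    return True
```

In the program it would be called as `run_choice(input(), {'1': justOne, '2': justTwo, ...})` with one entry per menu line.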
First up are the helper functions, which are called many times throughout the program and make it easier to read. They handle simple interface responses and repetitive tasks. The comments describe what each one does.
#===============================================================================
# Test for positive words
#===============================================================================
def yesNo(cases):
    valid = {'yes', 'true'}
    # Case-insensitive, so 'yes', 'YES' and 'true' are all accepted
    return cases.strip().lower() in valid


#===============================================================================
# Tests for valid file extensions
#===============================================================================
def validPhoto(filename):
    valid = ('.jpg', '.jpeg')
    # Check only the end of the name, so "jpg_notes.txt" is not accepted
    return filename.lower().endswith(valid)


#===============================================================================
# Transfer/Clean the Exif of a file
#===============================================================================
def getOn(filename, new_file):
    im = Image.open(filename)
    try:
        exif_dict = piexif.load(im.info["exif"])
        exif_bytes = piexif.dump(exif_dict)
        im.save(new_file, "jpeg", exif=exif_bytes)
    except Exception as err:
        # Files without Exif data (KeyError) or with unreadable Exif end up here
        print("Could not clean " + filename + ": " + str(err))


#===============================================================================
# Strips string to file name
#===============================================================================
def stripFileName(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)


#===============================================================================
# Returns an array with all the files that are valid in the directory
# and its sub directories
#===============================================================================
def allThePaths(mypath):
    f = []
    for (dirName, subdirlist, fileList) in os.walk(mypath):
        for fname in fileList:
            if validPhoto(fname):
                f.append(dirName + "\\" + fname)
    return f


#===============================================================================
# Like allThePaths. Only returns files from the directory (no sub directories)
#===============================================================================
def shallowPath(mypath):
    f = []
    for (dirName, subdirlist, fileList) in os.walk(mypath):
        for fname in fileList:
            if validPhoto(fname):
                f.append(dirName + "\\" + fname)
        # os.walk yields the top directory first, so stop after one pass
        break
    return f
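The two path-collecting helpers differ only in whether they stop after the top directory, so they could be folded into one function with a flag. The sketch below is one way to do that using only the standard library; `is_jpeg` and `find_jpegs` are hypothetical names, and `os.path.join` is used in place of a hard-coded backslash so the same code would also run outside Windows.

```python
import os

JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def is_jpeg(path):
    """True when the file extension (case-insensitive) is .jpg or .jpeg."""
    return os.path.splitext(path)[1].lower() in JPEG_EXTENSIONS


def find_jpegs(root, recursive=True):
    """Collect JPEG paths under root, optionally descending into subfolders."""
    found = []
    for dir_name, _subdirs, file_names in os.walk(root):
        for name in file_names:
            if is_jpeg(name):
                found.append(os.path.join(dir_name, name))
        if not recursive:
            # os.walk visits the top directory first; stop before the children.
            break
    return found
```

With this, `find_jpegs(folder)` plays the role of allThePaths and `find_jpegs(folder, recursive=False)` the role of shallowPath.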
Then we have the function behind each menu option. I could split them up, but I feel the comments explain them well enough.
#===============================================================================
# Implements cleaning up just one file and replacing it or making a copy
#===============================================================================
def justOne():
    print("Awesome Just one file")
    filename = input("Enter the file path: ")
    if not validPhoto(filename):
        print("Screwed up you have")
        print("Can't do anything to help you.")
        print("Goodbye")
        return
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    new_file = ""
    if not replacing:
        new_file = input("Input the file path ")
        if not os.path.exists(new_file):
            os.mkdir(new_file)
        # Use only the file name, not the full source path, inside the new folder
        new_file += "\\" + stripFileName(filename)
    else:
        print("Awesome. One Conversion coming right up")
        new_file = filename
    # Cleans file
    print("Cleaning " + filename)
    getOn(filename, new_file)
    print("Done.")
    print("Have a good day")
#===============================================================================
# Implements cleaning up on multiple files but not directories
#===============================================================================
def justTwo():
    print("Ok a couple of files not something too hard")
    # Inputting file paths
    filenames = []
    fileMore = ''
    print("Enter in the file name and when you're done enter \"done\"")
    while fileMore != 'done':
        fileMore = input()
        if os.path.isfile(fileMore):  # If valid File
            print(fileMore + " has been added")
            # append adds the whole path; += would add it one character at a time
            filenames.append(fileMore)
        elif os.path.isdir(fileMore):  # If is a directory
            print("That doesn't work because that is a directory")
        elif fileMore != 'done':
            print(fileMore + " does not exist")
            print("Maybe there is a typo. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    newPath = ""
    if not replacing:
        newPath = input("Input the new folder leaf (Folder copying place) ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Glad to hear that. Less work for me")
    # Cleans Files
    print("Now it is time for me to get to work")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + stripFileName(fil))
    print("All done with that\nHave a good day with your " + str(con) + " Cleaned files")
#===============================================================================
# Implements cleaning up on multiple files only in 1 directory (No SubDir)
#===============================================================================
def justThree():
    print("One directory. Piece of cake after you answer some questions")
    valdir = False  # Valid directory input
    filename = ""
    while not valdir:
        filename = input("Input the directory name: ")
        if ntpath.isdir(filename):
            valdir = True
        else:
            print("That is not a valid directory. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    newPath = ""
    if not replacing:
        newPath = input("What are we going to call this new folder destination ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("What a joy. Just replacing")
    # Gets files in the directory
    filenames = shallowPath(filename)
    # Cleans Files
    print("Now it is time for me to get to work")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("Ha. I finished.\nHave a good day with your " + str(con) + " Cleaned files")
#===============================================================================
# Implements cleaning up multiple files from multiple directories (No subdir)
#===============================================================================
def justFour():
    print("More than one directory. Really putting the program to the test")
    # Inputting directory paths
    dirnames = []
    dirMore = ''
    print("Enter in the directory name and when you're done enter \"done\"")
    while dirMore != 'done':
        dirMore = input()
        if os.path.isdir(dirMore):  # If valid directory
            print(dirMore + " has been added")
            # append adds the whole path; += would add it one character at a time
            dirnames.append(dirMore)
        elif os.path.isfile(dirMore):  # If is a file
            print("That doesn't work because that is a file")
        elif dirMore != 'done':
            print(dirMore + " does not exist")
            print("You Done Goof. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    newPath = ""
    if not replacing:
        newPath = input("What do you call this directory? ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Splendid. Just a replacement")
    # Gets files in the Directories
    filenames = []
    for x in dirnames:
        filenames += shallowPath(x)
    # Cleans Files
    print("We got all the data. Let us now begin")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("That wasn't too hard.\nHave a good day with your " + str(con) + " Cleaned files")
#===============================================================================
# Implements cleaning up on multiple files in 1 directory (and SubDirs)
#===============================================================================
def justFive():
    print("One directory and their children. That is just great")
    valdir = False  # Valid directory input
    filename = ""
    while not valdir:
        filename = input("Input the directory name: ")
        if ntpath.isdir(filename):
            valdir = True
        else:
            print("That is not a valid directory. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    newPath = ""
    if not replacing:
        newPath = input("What is the name of this destination folder ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Perfect. Just replacing")
    # Gets files in the directory
    filenames = allThePaths(filename)
    # Cleans Files
    print("Now lets get down to the good stuff")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("I have finished.\nHave a good day with your " + str(con) + " Cleaned files")
#===============================================================================
# Implements cleaning up multiple files from multiple directories (and SubDirs)
#===============================================================================
def justSix():
    print("You want the Most. This is the max capability of the program")
    # Inputting directory paths
    dirnames = []
    dirMore = ''
    print("Enter in the directory name and when you're done enter \"done\"")
    while dirMore != 'done':
        dirMore = input()
        if os.path.isdir(dirMore):  # If valid directory
            print(dirMore + " has been added")
            # append adds the whole path; += would add it one character at a time
            dirnames.append(dirMore)
        elif os.path.isfile(dirMore):  # If is a file
            print("That doesn't work because that is a file")
        elif dirMore != 'done':
            print(dirMore + " does not exist")
            print("You Screwed up. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace)
    newPath = ""
    if not replacing:
        newPath = input("What do you call this destination directory? ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Zipping Zebras. Just a replacement")
    # Gets files in the Directories
    filenames = []
    for x in dirnames:
        filenames += allThePaths(x)
    # Cleans Files
    print("Galloping Gallardo. We better get started")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("DAMN THAT WAS HARD.\nAt least we are done and cleaned " + str(con) + " files")
main()
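One thing to watch when copying from several directories into a single destination folder: two source files with the same name map to the same target path, so the later one overwrites the earlier one. Whether that is the reason option 6 leaves files out is only a guess on my part. A sketch of one way around it is to keep each file's path relative to the directory it was found in. `target_path` and `ensure_parent` are hypothetical helpers, not part of the program.

```python
import os


def target_path(src, root, dest):
    """Map src (somewhere under root) to the same relative spot under dest."""
    relative = os.path.relpath(src, root)
    return os.path.join(dest, relative)


def ensure_parent(path):
    """Create the folder that will contain path, if it does not exist yet."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
```

For example, files named `img.jpg` in two different subfolders of `photos` would land in two different subfolders of the destination instead of colliding.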
Step 3 - Verification
Of course, at first I tried to create my own brute-force version, which works but is incredibly slow. This code was originally written for finding duplicate images.
# Author MasterWard
from Old import *  # the author's own module; LinkedList presumably comes from here
from datetime import datetime
from send2trash import send2trash
from walkpath import path  # the author's own helper that lists image paths
from PIL import Image

# Imports all the images
# Using a 100% Check checking every pixel
file = open("testfile.txt", "w")


def nextNodes(n):
    temp = True
    while(temp):
        if n == None:
            temp = False
        else:
            n = n.next
            if n == None:
                temp = False
            else:
                file.write(str(n.data) + "\n")


# Test that the files are valid and compatible
def same(a, b):
    if(".JPG" in a or ".jpg" in a):
        return ".JPG" in b or ".jpg" in b
    elif(".jpeg" in a or ".JPEG" in a):
        return ".jpeg" in b or ".JPEG" in b
    elif(".png" in a or ".PNG" in a):
        return ".png" in b or ".PNG" in b
    else:
        print("That is not a valid format")
        return False


def main():
    ar = path("D:\\Photos Test\\Jacob iPhone 6 JPG 1")
    print(datetime.now().time())
    print(len(ar))
    ll = LinkedList()
    x = 0
    cons = 0
    while(len(ar) > 0):
        current = ar[x]  # keep the path before it is removed from the list
        ia = Image.open(current)
        ll2 = LinkedList()
        ll2.add(current, None)
        ar.remove(current)
        iwa, iha = ia.size
        y = 0
        while(y < len(ar)):
            if(same(str(current), str(ar[y]))):
                ib = Image.open(ar[y])
                iwb, ihb = ib.size
                # (the pixel-by-pixel comparison is not included in this listing)
            y += 1
        ll.add(ll2, None)
        cons += 1
    while(len(ar) > 0):
        ll2 = LinkedList()
        ll2.add(ar[0], None)
        ar.remove(ar[0])
        ll.add(ll2, None)
    temper = ll.head
    while(temper != None):
        temper = nextNodes(temper)
    file.close()
    print(cons)
    print(datetime.now().time())


main()
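For exact duplicates specifically, a faster route than comparing every pair of images is to hash each image's pixels once and group the files by hash. This is a sketch of a different technique from the code above, assuming Pillow is installed; `pixel_digest` and `group_exact_duplicates` are hypothetical names. It only catches pixel-identical images, so a recompressed JPEG would not match its original.

```python
import hashlib
from collections import defaultdict

from PIL import Image


def pixel_digest(source):
    """Return a SHA-256 digest of an image's size and RGB pixel data."""
    with Image.open(source) as im:
        rgb = im.convert("RGB")
        digest = hashlib.sha256()
        # Include the size so differently shaped images never collide.
        digest.update(str(rgb.size).encode("ascii"))
        digest.update(rgb.tobytes())
    return digest.hexdigest()


def group_exact_duplicates(paths):
    """Group paths whose decoded pixels are identical; singletons are dropped."""
    groups = defaultdict(list)
    for path in paths:
        groups[pixel_digest(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Each file is opened once, so the work grows with the number of files rather than the number of pairs.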
I found someone else's code to put a number on how similar two images are. Of course, this is no substitute for the human eye, which was the deciding factor, but having a number next to the compression rate was very useful.
# MIT License
#
# Copyright (c) 2016 Jonas Hahn <jonas.hahn@datenhahn.de>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
=== imagecompare ===
This little tool compares two images using pillow's ImageChops and then converts the differnce to
black/white and sums up all found differences by summing up the histogram values of the difference
pixels.
Taking the difference between a black and a white image of the same size as base a percentage value
is calculated.
Check the tests to see example diffs for different scenarios. Don't expect the diff of two jpg images be
the same for the same images converted to png. Don't do interformat compares (e.g. JPG with PNG).
Usage:
== compare images ==
same = is_equal("image_a.jpg", "image_b.jpg")
# use the tolerance parameter to allow a certain diff pass as same
same = is_equal("image_a.jpg", "image_b.jpg", tolerance=2.5)
== get the diff percentage ==
percentage = image_diff_percent("image_a.jpg", "image_b.jpg")
# or work directly with pillow image instances
image_a = Image.open("image_a.jpg")
image_b = Image.open("image_b.jpg")
percentage = image_diff_percent(image_a, image_b)
"""
from PIL import Image
from PIL import ImageChops


class ImageCompareException(Exception):
    """
    Custom Exception class for imagecompare's exceptions.
    """
    pass


def pixel_diff(image_a, image_b):
    """
    Calculates a black/white image containing all differences between the two input images.
    :param image_a: input image A
    :param image_b: input image B
    :return: a black/white image containing the differences between A and B
    """
    if image_a.size != image_b.size:
        raise ImageCompareException(
            "different image sizes, can only compare same size images: A=" + str(image_a.size) + " B=" + str(
                image_b.size))
    if image_a.mode != image_b.mode:
        raise ImageCompareException(
            "different image mode, can only compare same mode images: A=" + str(image_a.mode) + " B=" + str(
                image_b.mode))
    diff = ImageChops.difference(image_a, image_b)
    diff = diff.convert('L')
    return diff


def total_histogram_diff(pixel_diff):
    """
    Sums up all histogram values of an image. When used with the black/white pixel-diff image
    this gives the difference "score" of an image.
    :param pixel_diff: the black/white image containing all differences (output of imagecompare.pixel_diff function)
    :return: the total "score" of histogram values (histogram values of found differences)
    """
    return sum(i * n for i, n in enumerate(pixel_diff.histogram()))


def image_diff(image_a, image_b):
    """
    Calculates the total difference "score" of two images. (see imagecompare.total_histogram_diff).
    :param image_a: input image A
    :param image_b: input image A
    :return: the total difference "score" between two images
    """
    histogram_diff = total_histogram_diff(pixel_diff(image_a, image_b))
    return histogram_diff


def is_equal(image_a, image_b, tolerance=0.0):
    """
    Compares two image for equalness. By specifying a tolerance a certain diff can
    be allowed to pass as True.
    :param image_a: input image A
    :param image_b: input image B
    :param tolerance: allow up to (including) a certain percentage of diff pass as True
    :return: True if the images are the same, false if they differ
    """
    return image_diff_percent(image_a, image_b) <= tolerance


def image_diff_percent(image_a, image_b):
    """
    Calculate the difference between two images in percent.
    :param image_a: input image A
    :param image_b: input image B
    :return: the difference between the images A and B as percentage
    """
    # if paths instead of image instances where passed in
    # load the images
    if isinstance(image_a, str):
        image_a = Image.open(image_a)
    if isinstance(image_b, str):
        image_b = Image.open(image_b)
    # first determine difference of input images
    input_images_histogram_diff = image_diff(image_a, image_b)
    # to get the worst possible difference use a black and a white image
    # of the same size and diff them
    black_reference_image = Image.new('RGB', image_a.size, (0, 0, 0))
    white_reference_image = Image.new('RGB', image_a.size, (255, 255, 255))
    worst_bw_diff = image_diff(black_reference_image, white_reference_image)
    percentage_histogram_diff = (input_images_histogram_diff / float(worst_bw_diff)) * 100
    return percentage_histogram_diff
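To get a feel for the numbers this kind of score produces, the sketch below recompresses a synthetic test image at a few JPEG quality levels and measures the difference each time. It is self-contained, so it uses a compact stand-in, `mean_diff_percent` (a hypothetical name), instead of importing the module above. For RGB inputs it should work out to the same value, since the histogram sum divided by the black/white maximum is the mean difference level divided by 255. The gradient image and the quality levels are arbitrary choices for the demo.

```python
import io

from PIL import Image, ImageChops, ImageStat


def mean_diff_percent(image_a, image_b):
    """Mean absolute pixel difference, as a percentage of black vs. white."""
    diff = ImageChops.difference(image_a.convert("RGB"), image_b.convert("RGB"))
    mean_level = ImageStat.Stat(diff.convert("L")).mean[0]
    return mean_level / 255 * 100


def recompress(image, quality):
    """Round-trip an image through an in-memory JPEG at the given quality."""
    buffer = io.BytesIO()
    image.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    result = Image.open(buffer)
    result.load()  # decode now, while the buffer is still available
    return result


if __name__ == "__main__":
    original = Image.linear_gradient("L").convert("RGB")
    for quality in (95, 75, 50):
        score = mean_diff_percent(original, recompress(original, quality))
        print("quality", quality, "diff %", round(score, 3))
```

A smooth gradient compresses very well, so real photos will generally score higher than this test image.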
Overall
It was a good program and has proven very useful in many situations. A 50% reduction in file size is remarkable when you think about it: zipping up an image gets you to maybe 90% of the original size. In the future, I hope to add PNG compression as well.
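For reference, the percentage here is just the share of the original size that was removed. A small sketch of the arithmetic and of a tidy one-line report per file follows; `percent_saved` and `report_line` are hypothetical helpers, and the byte counts in the usage note are made-up numbers for illustration.

```python
def percent_saved(original_bytes, new_bytes):
    """Share of the original size that was removed, in percent."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (original_bytes - new_bytes) / original_bytes * 100.0


def report_line(name, original_bytes, new_bytes):
    """Format one aligned result line for a cleaned file."""
    saved = percent_saved(original_bytes, new_bytes)
    return "{:<30} {:>12,d} -> {:>12,d} bytes ({:5.1f}% saved)".format(
        name, original_bytes, new_bytes, saved)
```

A file that goes from 4,000,000 to 2,000,000 bytes has 50% saved, while a zip that only gets it down to 3,600,000 bytes has saved 10%. In the program, the two sizes could come from `os.path.getsize` on the source and the cleaned file.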
Drawbacks
It was not smooth sailing the whole time.
I wanted to get the metadata transferred over but that just never worked.
The program can only handle JPEG/JPG files, which is very limiting.
Trying to transfer XMP data was a complete failure and wasted at least 2 hours.
Future ideas
Try different file formats
Print the results cleanly instead of just a number
Alternative
I found someone else's compression tool that handles both JPG and PNG files and thought it was worth mentioning.