Image Compression

Origin

I have so many files on my computer. Like, so many files. I noticed that the majority of them were pictures or videos. I don't want to compress most of the videos, because that would require rewriting the whole file and take a lot of time, so I decided to start with photos and see how that goes.

Overview

My objective is to create a program that can compress an image and make the hard decision between lossless and lossy compression.

The Difference

When I explain this to someone, the first question they ask is: why would you ever want lossy compression? The simplest answer I can give is another question: do you always use everything? Here is an example from psychology: do you remember which cars were around you while driving this morning? Most likely not, because unlike a computer, you dropped the information you did not need. In this case, that information is the metadata. I wrote a program to carry the metadata over from one file to another, but its only real use is as a backup, for example when trying to find the location of a certain photo or the specs of the camera/phone.
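
A minimal sketch of the distinction, using only Python's standard library. It is an illustration, not part of the program described here: the random bytes stand in for pixel values, and the helper names `lossless_roundtrip` and `lossy_quantize`, as well as the step of 16, are assumptions chosen for the example.

```python
import random
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte comes back unchanged."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(data: bytes, step: int = 16) -> bytes:
    """Throw away fine detail by snapping each value down to a multiple of step."""
    return bytes((value // step) * step for value in data)


# Stand-in for pixel data: 10,000 pseudo-random byte values (fixed seed)
rng = random.Random(0)
pixels = bytes(rng.randrange(256) for _ in range(10_000))

restored = lossless_roundtrip(pixels)
quantized = lossy_quantize(pixels)

print("lossless identical:", restored == pixels)
print("lossy identical:   ", quantized == pixels)
print("largest change per value:", max(abs(a - b) for a, b in zip(pixels, quantized)))
print("compressed size, original: ", len(zlib.compress(pixels)))
print("compressed size, quantized:", len(zlib.compress(quantized)))
```

The lossless path returns exactly what went in, while the lossy path gives up a little accuracy per value and compresses to fewer bytes in exchange.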

Lossy

The difference between the lossy program and the lossless one comes down to the preservation of metadata and, because the image is re-encoded when it is written, a pixel difference of less than 1%. Now, on to the program.

Step 1 - Handling metadata

This is the only step that separates my program from other ones. Whenever it reads a file, it stores the metadata in a separate file so it can be viewed later. I am going to leave that code out because, in my opinion, it isn't that big of a deal.

Exporting Metadata

Whenever I tried to export the metadata from one photo to another, it never worked, so I came up with a workaround: exporting the metadata to a JSON file.

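from PIL import Image
from PIL.ExifTags import TAGS
import piexif

# Example input and output paths (Windows-style, as in the rest of the program)
filename = "Test (3).jpg"
new_file = "img\\Test (3)s.jpg"


# Transfers the Exif: re-saves the image to new_file with the original Exif block
def getOn(filename, new_file):
    im = Image.open(filename)
    # im.info only has an "exif" entry when the source file carries Exif data
    exif_dict = piexif.load(im.info["exif"])
    exif_bytes = piexif.dump(exif_dict)
    im.save(new_file, "jpeg", exif=exif_bytes)


# Returns Exif data as a {tag name: value} dict (used for printing)
def get_exif(fn):
    ret = {}
    i = Image.open(fn)
    # _getexif() returns None when the file has no Exif data
    info = i._getexif() or {}
    for tag, value in info.items():
        decoded = TAGS.get(tag, tag)
        ret[decoded] = value
    return ret


getOn("0 e.jpg", "img\\0e.jpg")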
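
The JSON export itself is not shown in the listing above. Here is a minimal sketch of what it could look like, assuming the metadata is a dict like the one `get_exif` returns. The names `to_json_safe` and `export_metadata` and the sample values are hypothetical and only for illustration, not the original code.

```python
import json


def to_json_safe(value):
    """Convert Exif-style values into types the json module can write."""
    if isinstance(value, bytes):
        # Raw bytes (e.g. maker notes) are stored as a hex string
        return value.hex()
    if isinstance(value, dict):
        return {str(key): to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    # Anything else (such as rational number objects) falls back to its text form
    return str(value)


def export_metadata(metadata: dict, json_path: str) -> None:
    """Write a metadata dict to json_path as indented JSON."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(to_json_safe(metadata), handle, indent=2)


# Hypothetical values in the shape get_exif() returns: {tag name: value}
sample = {"Make": "Apple", "ExifVersion": b"0221", "GPSInfo": {1: "N", 2: (40.0, 26.0, 46.3)}}
print(json.dumps(to_json_safe(sample), indent=2))
```

Bytes and tuples are the usual reason `json.dump` fails on Exif data, which is why they get converted first.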

Step 2 - User Interface

Giving options was the best idea, because sometimes I wanted to process everything and other times just one folder. That is where the menu options idea came from.

import ntpath
import os

from PIL import Image
import piexif
# Author: MasterWard

def main():
    print("Welcome to cleaning JPG Files episode 10")
    print("How much are we cleaning")
    print("1. One File")  # Tested. Works
    print("2. Multiple Files")  # Tested. Fragile but works
    print("3. One Directory")  # Not Tested
    print("4. Multiple Directories")  # Not Tested
    print("5. One Directory and subdirectories")  # Tested. Works
    print("6. Multiple Directories and subdirectories")  # Semi works. Leaves out some
    print("WARNING: Don't Try anything stupid. This will include the following")
    print("- Try number 6 and expect results")
    print("- Enter full file paths put in a folder without files")
    print("- Lastly anything I wouldn't do")
          
    co = input()
    if co == '1':
        justOne()
    elif co == '2':
        justTwo()
    elif co == '3':
        justThree()
    elif co == '4':
        justFour()
    elif co == '5':
        justFive()
    elif co == '6':
        justSix()
    else:
        print("That isn't an option. Goodbye")
    input("Press any button to end the program...")

First up are the helper functions, which are called multiple times throughout the program and make it easier to read. They handle simple interface responses and repetitive tasks. The comments describe what each one does.

#===============================================================================
# Test for positive words
#===============================================================================
def yesNo(cases):
    # Accept the positive words in any letter case, ignoring surrounding spaces
    valid = {'yes', 'true'}
    return cases.strip().lower() in valid

#===============================================================================
# Tests for valid file extensions
#===============================================================================
def validPhoto(filename):
    valid = {'.jpg', '.jpeg'}
    # Compare only the extension, case-insensitively, so a name like
    # "jpg_notes.txt" is not mistaken for a photo
    return os.path.splitext(filename)[1].lower() in valid

#===============================================================================
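# Transfer/Clean the Exif of a file
#===============================================================================
def getOn(filename, new_file):
    im = Image.open(filename)
    try:
        exif_dict = piexif.load(im.info["exif"])
        exif_bytes = piexif.dump(exif_dict)
        # Re-saving re-encodes the JPEG at Pillow's default quality (75),
        # which usually shrinks the file
        im.save(new_file, "jpeg", exif=exif_bytes)
    except Exception as err:
        # Typically a KeyError when the file has no Exif block, or a piexif error
        print('Could not clean ' + filename + ': ' + str(err))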

#===============================================================================
# Strips string to file name
#===============================================================================
def stripFileName(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)


#===============================================================================
# Returns an array with all the files that are valid in the directory
# and its sub directories
#===============================================================================
def allThePaths(mypath):
    f = []
    for (dirName, subdirlist, fileList) in os.walk(mypath):
        for fname in fileList:
            if validPhoto(fname):
                f.append(os.path.join(dirName, fname))
    return f

#===============================================================================
# Like allThePaths. Only returns files from the directory (no sub directories)
#===============================================================================
def shallowPath(mypath):
    f = []
    for (dirName, subdirlist, fileList) in os.walk(mypath):
        for fname in fileList:
            if validPhoto(fname):
                f.append(os.path.join(dirName, fname))
        break  # Stop after the top-level directory
    return f
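
For comparison, the same shallow and deep searches can be sketched with `pathlib`. This is an alternative under assumptions, not the code the program uses; `find_photos` and its `recursive` flag are hypothetical names.

```python
import tempfile
from pathlib import Path

PHOTO_EXTENSIONS = {".jpg", ".jpeg"}


def find_photos(directory, recursive=False):
    """Return sorted JPEG paths in directory, optionally including subdirectories."""
    root = Path(directory)
    # rglob("*") walks the whole tree, glob("*") only the top level
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path for path in candidates
        if path.is_file() and path.suffix.lower() in PHOTO_EXTENSIONS
    )


# Demo on a throwaway directory tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    for name in ("a.jpg", "b.JPEG", "notes.txt", "sub/c.jpg"):
        (base / name).touch()

    shallow = [p.relative_to(base).as_posix() for p in find_photos(base)]
    deep = [p.relative_to(base).as_posix() for p in find_photos(base, recursive=True)]

print(shallow)  # ['a.jpg', 'b.JPEG']
print(deep)     # ['a.jpg', 'b.JPEG', 'sub/c.jpg']
```

A single function with a flag covers both cases, so the shallow and deep variants cannot drift apart.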

Then we have the function for each menu option. I could split them up, but I feel the comments explain them well enough.

#===============================================================================
# Implements cleaning up just one file and replacing it or making a copy
#===============================================================================
def justOne():
    print("Awesome Just one file")
    filename = input("Enter the file path: ")
    if not validPhoto(filename):
        print("Screwed up you have")
        print("Can't do anything to help you.")
        print("Goodbye")
        exit()
    replace = input("Are you going to be replacing the file? ")
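    replacing = yesNo(replace)
    new_file = ""
    if not replacing:
        new_file = input("Input the file path ")
        if not os.path.exists(new_file):
            os.mkdir(new_file)
        # Keep only the file name so a full input path still lands in the new folder
        new_file = os.path.join(new_file, stripFileName(filename))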
    else:
        print("Awesome. One Conversion coming right up")
        new_file = filename

    # Cleans file
    print("Cleaning " + filename)
    getOn(filename, new_file)
    print("Done.")
    print("Have a good day")
    
    
#===============================================================================
# Implements cleaning up on multiple files but not directories
#===============================================================================
def justTwo():
    print("Ok a couple of files not something too hard")
    # Inputting file paths
    filenames = []
    fileMore = ''
    print("Enter in the file name and when you're done enter \"done\"")
    while fileMore != 'done':
        fileMore = input()
        if os.path.isfile(fileMore):  # If valid File
            print(fileMore + " has been added")
            filenames.append(fileMore)  # += would add each character separately
        elif os.path.isdir(fileMore):  # If is a directory
            print("That doesn't work because that is a directory")
        elif fileMore != 'done':
            print(fileMore + " does not exist")
            print("Maybe there is a typo. Try again")
    
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace);
    newPath = "";
    if not replacing:
        newPath = input("Input the new folder leaf (Folder copying place) ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Glad to hear that. Less work for me")
        
    # Cleans Files
    print("Now it is time for me to get to work")
    con = 0;
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, os.path.join(newPath, stripFileName(fil)))
    print("All done with that\nHave a good day with your " + str(con) + " Cleaned files")


#===============================================================================
# Implements cleaning up on multiple files only in 1 directory (No SubDir)
#===============================================================================
def justThree():
    print("One directory. Piece of cake after you answer some questions")
    valdir = False  # Valid directory input
    filename = "";
    while not valdir:
        filename = input("Input the directory name: ")
        if ntpath.isdir(filename):
            valdir = True
        else:
            print("That is not a valid directory. Try again")
    
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace);
    newPath = "";
    if not replacing:
        newPath = input("What are we going to call this new folder destination ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("What a joy. Just replacing")
        
    # Gets files in the directory
    filenames = shallowPath(filename);
    # Cleans Files
    print("Now it is time for me to get to work")
    con = 0;
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("Ha. I finished.\nHave a good day with your " + str(con) + " Cleaned files")

    
#===============================================================================
# Implements cleaning up multiple files from multiple directories (No subdir)
#===============================================================================
def justFour():
    print("More than one directory. Really putting the program to the test")
    # Inputting directory paths
    dirnames = []
    dirMore = ''
    print("Enter in the directory name and when you're done enter \"done\"")
    while dirMore != 'done':
        dirMore = input()
        if os.path.isdir(dirMore):  # If valid directory
            print(dirMore + " has been added")
            dirnames.append(dirMore)  # += would add each character separately
        elif os.path.isfile(dirMore):  # If is a file
            print("That doesn't work because that is a file")
        elif dirMore != 'done':
            print(dirMore + " does not exist")
            print("You Done Goof. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace);
    newPath = "";
    if not replacing:
        newPath = input("What do you call this directory? ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Splendid. Just a replacement")
    # Gets files in the Directories
    filenames = []
    for x in dirnames:
        filenames += shallowPath(x)
    # Cleans Files
    print("We got all the data. Let us now begin")
    con = 0;
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("That wasn't too hard.\nHave a good day with your " + str(con) + " Cleaned files")


#===============================================================================
# Implements cleaning up on multiple files in 1 directory (and SubDirs)
#===============================================================================
def justFive():
    print("One directory and their children. That is just great")
    valdir = False  # Valid directory input
    filename = "";
    while not valdir:
        filename = input("Input the directory name: ")
        if ntpath.isdir(filename):
            valdir = True
        else:
            print("That is not a valid directory. Try again")
    
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace);
    newPath = "";
    if not replacing:
        newPath = input("What is the name of this destination folder ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Perfect. Just replacing")
        
    # Gets files in the directory
    filenames = allThePaths(filename);
    # Cleans Files
    print("Now lets get down to the good stuff")
    con = 0;
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("I have finished.\nHave a good day with your " + str(con) + " Cleaned files")


#===============================================================================
# Implements cleaning up multiple files from multiple directories (and SubDirs)
#===============================================================================
def justSix():
    print("You want the Most. This is the max capability of the program")
    # Inputting directory paths
    dirnames = []
    dirMore = ''
    print("Enter in the directory name and when you're done enter \"done\"")
    while dirMore != 'done':
        dirMore = input()
        if os.path.isdir(dirMore):  # If valid directory
            print(dirMore + " has been added")
            dirnames.append(dirMore)  # += would add each character separately
        elif os.path.isfile(dirMore):  # If is a file
            print("That doesn't work because that is a file")
        elif dirMore != 'done':
            print(dirMore + " does not exist")
            print("You Screwed up. Try again")
    # Creating Target Destination
    replace = input("Are you going to be replacing the file? ")
    replacing = yesNo(replace);
    newPath = "";
    if not replacing:
        newPath = input("What do you call this destination directory? ")
        if not os.path.exists(newPath):
            os.mkdir(newPath)
    else:
        print("Zipping Zebras. Just a replacement")
    # Gets files in the Directories
    filenames = []
    for x in dirnames:
        filenames += allThePaths(x)
    # Cleans Files
    print("Galloping Gallardo. We better get started")
    con = 0
    for fil in filenames:
        print('Cleaning ' + os.path.basename(fil))
        con += 1
        if replacing:
            getOn(fil, fil)
        else:
            getOn(fil, newPath + "\\" + (stripFileName(fil)))
    print("DAMN THAT WAS HARD.\nAt least we are done and cleaned " + str(con) + " files")


main()
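
One thing to watch for when copying out of subdirectories: every cleaned file is written straight into the destination folder under its bare file name, so two photos with the same name in different subdirectories would overwrite each other. A possible workaround, sketched under the assumption that mirroring the source folder layout is acceptable. `mirrored_destination` and the example paths are hypothetical, not part of the program.

```python
import ntpath


def mirrored_destination(source_file, source_root, dest_root):
    """Map source_file under source_root to the same relative spot under dest_root."""
    # ntpath applies Windows path rules on any OS, matching the paths used here
    relative = ntpath.relpath(source_file, start=source_root)
    return ntpath.join(dest_root, relative)


first = mirrored_destination("D:\\Photos\\2019\\IMG_0001.jpg", "D:\\Photos", "E:\\Cleaned")
second = mirrored_destination("D:\\Photos\\2020\\IMG_0001.jpg", "D:\\Photos", "E:\\Cleaned")

print(first)   # E:\Cleaned\2019\IMG_0001.jpg
print(second)  # E:\Cleaned\2020\IMG_0001.jpg
```

The subfolder would need to exist before saving, for example via `os.makedirs(ntpath.dirname(first), exist_ok=True)`.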

Step 3 - Verification

Of course, at first I tried to create my own brute-force version, which works... but is incredibly slow. Originally, this code was for finding duplicate images...

# Author MasterWard
from Old import *
from datetime import datetime
from send2trash import send2trash
from walkpath import path
from PIL import Image

# Imports all the images
# Using a 100% Check checking every pixel

file = open("testfile.txt", "w")


def nextNodes(n):
    temp = True
    while(temp):
        if n == None:
            temp = False
        else:
            n = n.next
            if n == None:
                temp = False
            else:
                file.write(str(n.data) + "\n")


# Test that the files are valid and compatible
def same(a , b):
    if(".JPG" in a or ".jpg" in a):
        return ".JPG" in b or ".jpg" in b
    elif(".jpeg" in a or ".JPEG" in a):
        return ".jpeg" in b or ".JPEG" in b
    elif(".png" in a or ".PNG" in a):
        return ".png" in b or ".PNG" in b
    else:
        print("That is not a valid format")
        return False


def main():
    ar = path("D:\\Photos Test\\Jacob iPhone 6 JPG 1")
    print(datetime.now().time())
    print(len(ar))
    ll = LinkedList()
    x = 0
    cons = 0;
    while(len(ar) > 0):
        ia = Image.open(ar[x])
        ll2 = LinkedList()
        ll2.add(ar[x], None)
        ar.remove(ar[x])
        iwa, iha = ia.size
        y = 0
        while(y < len(ar)):
            if(same(str(ar[x]), str(ar[y]))):
                ib = Image.open(ar[y])
                iwb, ihb = ib.size
            
            y += 1
        ll.add(ll2, None)
        cons += 1
    while(len(ar) > 0):
        ll2 = LinkedList()
        ll2.add(ar[0], None)
        ar.remove(ar[0])
        ll.add(ll2, None)
    temper = ll.head
    while(temper != None):
        temper = nextNodes(temper)
    file.close()
    print(cons)
    print(datetime.now().time())
    
main()
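
Since the brute-force comparison is slow, one faster alternative for the duplicate-finding case is to hash each file once and group files by digest. This is a sketch of that swapped-in technique, not the original approach, and it only catches byte-for-byte identical files, not images that merely look the same. `file_digest` and `find_duplicates` are hypothetical helpers.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and return only groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Demo with three small placeholder files, two of which share the same bytes
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.jpg": b"same bytes", "b.jpg": b"same bytes", "c.jpg": b"other bytes"}
    paths = []
    for name, data in contents.items():
        full = os.path.join(tmp, name)
        with open(full, "wb") as handle:
            handle.write(data)
        paths.append(full)

    duplicates = find_duplicates(paths)
    duplicate_names = [[os.path.basename(p) for p in group] for group in duplicates]

print(duplicate_names)  # [['a.jpg', 'b.jpg']]
```

Each file is read once, so the work grows with the number of files rather than with the number of pairs.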

I found someone else's code to measure how similar two images are. Of course, this is no match for the human eye, which was the deciding factor, but knowing the compression rate was very useful.

# MIT License
#
# Copyright (c) 2016 Jonas Hahn <jonas.hahn@datenhahn.de>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

"""
  === imagecompare ===

  This little tool compares two images using pillow's ImageChops and then converts the difference to
  black/white and sums up all found differences by summing up the histogram values of the difference
  pixels.

  Taking the difference between a black and a white image of the same size as base a percentage value
  is calculated.

  Check the tests to see example diffs for different scenarios. Don't expect the diff of two jpg images be
  the same for the same images converted to png. Don't do interformat compares (e.g. JPG with PNG).

  Usage:

  == compare images ==

    same = is_equal("image_a.jpg", "image_b.jpg")

    # use the tolerance parameter to allow a certain diff pass as same
    same = is_equal("image_a.jpg", "image_b.jpg", tolerance=2.5)

  == get the diff percentage ==

    percentage = image_diff_percent("image_a.jpg", "image_b.jpg")

    # or work directly with pillow image instances
    image_a = Image.open("image_a.jpg")
    image_b = Image.open("image_b.jpg")
    percentage = image_diff_percent(image_a, image_b)

"""

from PIL import Image
from PIL import ImageChops


class ImageCompareException(Exception):
    """
    Custom Exception class for imagecompare's exceptions.
    """
    pass


def pixel_diff(image_a, image_b):
    """
    Calculates a black/white image containing all differences between the two input images.

    :param image_a: input image A
    :param image_b: input image B
    :return: a black/white image containing the differences between A and B
    """

    if image_a.size != image_b.size:
        raise ImageCompareException(
            "different image sizes, can only compare same size images: A=" + str(image_a.size) + " B=" + str(
                image_b.size))

    if image_a.mode != image_b.mode:
        raise ImageCompareException(
            "different image mode, can only compare same mode images: A=" + str(image_a.mode) + " B=" + str(
                image_b.mode))

    diff = ImageChops.difference(image_a, image_b)
    diff = diff.convert('L')

    return diff


def total_histogram_diff(pixel_diff):
    """
    Sums up all histogram values of an image. When used with the black/white pixel-diff image
    this gives the difference "score" of an image.

    :param pixel_diff: the black/white image containing all differences (output of imagecompare.pixel_diff function)
    :return: the total "score" of histogram values (histogram values of found differences)
    """
    return sum(i * n for i, n in enumerate(pixel_diff.histogram()))


def image_diff(image_a, image_b):
    """
    Calculates the total difference "score" of two images. (see imagecompare.total_histogram_diff).

    :param image_a: input image A
    :param image_b: input image B
    :return: the total difference "score" between two images
    """
    histogram_diff = total_histogram_diff(pixel_diff(image_a, image_b))

    return histogram_diff


def is_equal(image_a, image_b, tolerance=0.0):
    """
    Compares two image for equalness. By specifying a tolerance a certain diff can
    be allowed to pass as True.

    :param image_a: input image A
    :param image_b: input image B
    :param tolerance: allow up to (including) a certain percentage of diff pass as True
    :return: True if the images are the same, false if they differ
    """
    return image_diff_percent(image_a, image_b) <= tolerance


def image_diff_percent(image_a, image_b):
    """
    Calculate the difference between two images in percent.

    :param image_a: input image A
    :param image_b: input image B
    :return: the difference between the images A and B as percentage
    """

    # if paths instead of image instances were passed in
    # load the images
    if isinstance(image_a, str):
        image_a = Image.open(image_a)

    if isinstance(image_b, str):
        image_b = Image.open(image_b)

    # first determine difference of input images
    input_images_histogram_diff = image_diff(image_a, image_b)

    # to get the worst possible difference use a black and a white image
    # of the same size and diff them

    black_reference_image = Image.new('RGB', image_a.size, (0, 0, 0))
    white_reference_image = Image.new('RGB', image_a.size, (255, 255, 255))

    worst_bw_diff = image_diff(black_reference_image, white_reference_image)

    percentage_histogram_diff = (input_images_histogram_diff / float(worst_bw_diff)) * 100

    return percentage_histogram_diff
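
To see what the percentage means, the same calculation can be written out for plain lists of grayscale values. This is a simplified sketch, not the library's code: it assumes both inputs are equal-length lists of values from 0 to 255, and `diff_percent` is a hypothetical name. The worst case (black against white) is a difference of 255 on every pixel, which is what the black and white reference images in the listing above measure.

```python
def diff_percent(pixels_a, pixels_b):
    """Percentage difference between two equal-length lists of 0-255 values."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both images need the same number of pixels")
    # Sum of absolute per-pixel differences (the "score")
    score = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))
    # Worst possible score: every pixel differs by the full 255
    worst = 255 * len(pixels_a)
    return score / worst * 100


black = [0, 0, 0, 0]
white = [255, 255, 255, 255]
one_changed = [0, 0, 0, 255]

print(diff_percent(black, black))        # 0.0
print(diff_percent(black, white))        # 100.0
print(diff_percent(black, one_changed))  # 25.0
```

One fully changed pixel out of four gives 25%, so a result under 1% means the two images differ only slightly on average.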

Overall

It was a good program and has proven to be very useful in many situations. A 50% reduction is insane when you think about it: zip up an image and the archive is still maybe 90% of the original size. In the future, I hope to add PNG compression as well.
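
For reference, this is one way to put a number on the savings, assuming the figure means the share of bytes removed. `percent_saved` is a hypothetical helper, and the sizes in the example are made up.

```python
def percent_saved(original_size, new_size):
    """Share of the original size, in percent, that compression removed."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (original_size - new_size) / original_size * 100


# Made-up sizes in bytes: a 4 MB photo re-saved at 2 MB
print(percent_saved(4_000_000, 2_000_000))  # 50.0
# The same photo zipped down to 3.6 MB
print(percent_saved(4_000_000, 3_600_000))  # 10.0
```

With real files, the two sizes could come from `os.path.getsize` on the original and the cleaned copy.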

Drawbacks

It was not smooth sailing the whole time. I wanted to get the metadata transferred over, but that just never worked. The program can only handle JPEG/JPG files, which is very limiting. Trying to transfer XMP data was a complete failure and wasted at least 2 hours.

Future ideas

Try different file formats

Print the results cleanly, not just a validity number

Alternative

I found someone else with a compression that does both jpg and png files and thought it was worth mentioning.

GitHub - dhhruv/Pixxia: 🖼 Pixxia uses lossy compression methods to reduce the document size of your JPG/PNG files.
