Gmail Unsubscriber

A tool to help clean up a Gmail inbox, both its current state and future incoming mail

Origin

I was getting sick of going through other people's inboxes with 1k+ emails. I didn't have many myself, so I asked how the tool could benefit me too. I had applied to a lot of jobs, and they send DONOTREPLY emails that stop mattering after a while and could be cut from my mailbox.

Important Information

The code for finding and targeting spam emails works. However, the API Google used at the time has since changed, so the code no longer runs as-is; the setup is now done differently.

First Step: Connecting

Starting out, I have to connect to the inbox through Google's Gmail API. I chose Gmail because, in my experience, it is the most frequently used provider, and more people have asked me to clean up Gmail than any other email type.

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API
    threadsList = service.users().threads().list(userId='me',includeSpamTrash=False,prettyPrint=True).execute()
    print(threadsList)
    
    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])

    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        for label in labels:
            print(label['name'])

if __name__ == '__main__':
    main()

I got all the files and worked them into the code. It was fairly simple, but this was the last simple part.

Second Step: Reading messages

Overview

With thread retrieval working, reading the messages turned up a lot of information. My best skill is knowing which information is needed and which isn't. Storing whole messages would take a lot of memory, so I decided to read only the metadata. I then mapped each thread to its sender's email address. If an email had something suspicious about it, I flagged it and read it more closely. The messages also carried the standard List-Unsubscribe header, which Gmail uses for its built-in unsubscribe link, and I based the detection on that. That gives us the suspicious list; the code below breaks the process down.
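For context, a List-Unsubscribe header value carries one or two angle-bracketed links (mailto and/or https), optionally comma-separated. A minimal sketch with hypothetical values:

# Typical List-Unsubscribe values as they appear in message headers:
#   <https://news.example.com/unsub?u=123>
#   <mailto:unsub@example.com>, <https://example.com/unsub>
header = "<mailto:unsub@example.com>, <https://example.com/unsub>"
links = [part.strip().strip("<>") for part in header.split(",")]
print(links)  # ['mailto:unsub@example.com', 'https://example.com/unsub']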

Reading in all the threads

Reads in the threads page by page, following the page token, to collect every thread ID.

    megaThreadList = [] # All thread IDs found so far
    nextPageToken = None # None fetches the first page
    moreThreads = True
    while moreThreads:
        threadsList = service.users().threads().list(userId='me',includeSpamTrash=False,prettyPrint=True,pageToken=nextPageToken).execute()
        for thread1 in threadsList['threads']:
            megaThreadList.append(thread1['id'])
        if 'nextPageToken' in threadsList:
            nextPageToken = threadsList['nextPageToken']
            print(nextPageToken)
        else:
            moreThreads = False

Thread Determination

Reads in each message's metadata and maps out which headers are needed. (The x and y log files are opened here with assumed names, since the original setup isn't shown.)

    ar = []     # All flagged sender addresses
    unsub = []  # All unsubscribe links
    y = open("senders.txt", "w")    # Sender log (file name assumed)
    x = open("unsubLinks.txt", "w") # Link log (file name assumed)
    for ids in megaThreadList:
        metaMessage = service.users().threads().get(userId='me',id=ids,format="metadata").execute()
        payloads = metaMessage['messages'][0]['payload']
        head = payloads['headers']
        # Name = List-Unsubscribe
        curEmail = ""
        for pay in head:
            if pay['name'] == 'From':
                temp = pay['value']
                ind = -1
                if "<" in temp:
                    ind = temp.index("<")
                if ind < 0:
                    curEmail = temp
                else:
                    curEmail = temp[ind+1:-1] # Address between the angle brackets
                    ar.append(curEmail)
                    y.write(curEmail + "\n")
            if pay['name'] == 'List-Unsubscribe':
                temp = pay['value']
                ind = temp.index("<")
                curLink = temp[ind+1:-1]
                x.write(curLink + "\n")
                unsub.append(curLink)

Narrow Subscribers

Narrows all the flagged emails down to a short list of senders the inbox is actually subscribed to. This shortens the unsubscribe process and avoids duplicate unsubscribes.

    subscribeList = [] # Unique cleaned domains
    for a in unsub:
        if "," in a:
            split = a.split(",")
            cleanDom = cleanDomain(split[1])
            if not (cleanDom in subscribeList):
                subscribeList.append(cleanDom)
        else:
            cleanDom = cleanDomain(a)
            if not (cleanDom in subscribeList):
                subscribeList.append(cleanDom)
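A small hypothetical example of what this narrowing produces, assuming the cleanDomain helper defined in the next section:

# Two flagged links; the second header carries a mailto/https pair.
unsub = ["<https://mail.example.com/u/1>",
         "<mailto:unsub@shop.example>, <https://shop.example/unsub?u=2>"]
# After the loop above, subscribeList would be:
#   ['https://mail.example.com', 'https://shop.example']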

Stripping function

I was lazy with the regex, and some patterns didn't seem to work, so I just hand-coded the specific string methods instead. It was more time-consuming, but I felt it was easier to read for the beginners who would be using the software.

def cleanUrl(url):
    newUrl = url.strip()

    if not("mailto" in url): # A regular URL: just trim whitespace
        return newUrl
    else: # A mailto (email-back) link that needs its brackets removed
        newUrl = newUrl.replace("<","").replace(">","")
        return newUrl

def cleanDomain(url):
    newUrl = url.strip().replace("<","").replace(">","")
    startInd = 0
    if ("mailto" in newUrl):
        startInd = newUrl.index(":") + 1 # Skip past "mailto:"
        if ("?" in newUrl): # Has a subject line too
            endInd = newUrl.index("?")
            return newUrl[startInd:endInd]
        return newUrl[startInd:]
    elif ("https" in newUrl):
        newUrl = newUrl.replace("https://","")
        startInd = newUrl.index("/")
        return "https://" + (newUrl[0:startInd])
    elif ("http" in newUrl):
        newUrl = newUrl.replace("http://","")
        startInd = newUrl.index("/")
        return "http://" + (newUrl[0:startInd])

Third Step: Validating

Overview

Up to this point, we have a list that can contain invalid entries. The problem is I don't want the user to be sent to a lot of dead sites, and many unsubscribe links were going stale after a short time, which became annoying. I had to write some code to check for this. During this stage I also started separating valid senders from the random spam and fake addresses that pop up a lot; I will come back to those in the cleanup stage.

Checking Email Type

From networking, I learned there are different DNS record types, and by resolving them I could get a better understanding of what kind of host I am tracing.

def get_records(domain):
    """
    Try common DNS record types against the domain.
    :param domain: host name to resolve
    :return: the first record type that resolves, or "NA" if none do
    """
    ids = ['A','NS','MD','MF','CNAME','SOA','MB','MG','MR','MX','AAAA']

    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)
            for rdata in answers:
                print(a, ':', rdata.to_text())
                return a

        except Exception as e:
            print(e)  # or pass
    return "NA"

Check DNS

Since we can already fetch DNS records, it is only logical to write a method that uses them to sort a URL into valid or invalid.

def checkDNS(url):
    b = open("valid.txt", "a") # For the shorter valid types
    c = open("invalid.txt", "a") # For the shorter invalid types
    ans = get_records(url)
    if (ans == "NA"):
        c.write(url + "\n")
    else:
        b.write(url + "\n")
    b.close()
    c.close()

Time check

Given a decent connection, I borrowed the timing idea from time-based SQL injection attacks: a short request timeout tells me whether a URL actually responds, even when its format is valid.

def validURL(url):
    try:
        requests.get(url, timeout=.5)
        return True # URL is valid and exists on the internet
    except requests.ConnectionError:
        return False # URL does not exist on the internet
    except Exception:
        return False # Timed out or failed some other way
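The half-second timeout is the trade-off: slow or dead hosts count as invalid, which risks some false negatives but keeps the scan fast. A quick hypothetical check:

print(validURL("https://www.google.com"))                    # True on a live connection
print(validURL("https://this-domain-should-not-exist.test")) # False, nothing resolves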

Fourth Step: Clean up

The code is very messy at this point, and no one really wants to read it. It has a few functions, but everything is lumped into one piece, so now it is time for more modifications. I add some classes and clean up the final segments of the code, tying up loose ends.

Messages Class

The Message class holds an object with the data I need from each message.

class Message:
    thread = ""
    sender = ""
    link = ""

    def validSender(self): # True if the sender has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True

        return False
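A quick hypothetical instance shows how it gets filled in:

m = Message()
m.thread = "179abc0def"              # hypothetical Gmail thread ID
m.sender = "news@example.com"
m.link = "https://example.com/unsub"
print(m.validSender())               # True, since "com" is in the sender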

Site class

The Site class is the helper that ties emails, domains, and senders together for the other functions.

class Site:
    sender = ""
    domainName = ""

    def __init__(self):
        self.messages = [] # Per-instance list so Sites don't share messages

    def getSender(self):
        return self.sender

    def getDomain(self):
        return self.domainName

    def addMessage(self, mail):
        self.messages.append(mail)

    def getString(self):
        return self.sender + " " + self.domainName + " " + str(len(self.messages))

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        return self.messages[0].link
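Continuing the hypothetical Message example above, a Site groups repeat senders:

site = Site()
site.sender = "news@example.com"
site.domainName = "https://example.com"
site.addMessage(m)       # m from the Message example
site.addMessage(m)
print(site.getString())  # news@example.com https://example.com 2
print(site.getLink())    # https://example.com/unsub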

Additional Things

I also cleaned up the existing comments and split a few things in the main class out into functions. Typical housekeeping as the code gets closer to review and final release.

Fifth Step: User Friendly

What is a program if it is not easy to use? Beyond that, the user has to stay in control. No one would let me run a program on their computer anywhere it could delete critical emails. This could be handled with a warning, but I take precautions and put safer code in instead.

Unsubscribe control

The user clicks each link themselves, so they know what they are unsubscribing from. I set the limit to 5 because opening many more tabs could be an issue for the browser.

    readUnsubLinks = open("webSiteFile.txt","r")
    lines = readUnsubLinks.readlines()
    browserCount = 0
    maxBrowsers = 5
    for line in lines:
        if browserCount >= maxBrowsers: # Max tabs to open
            webbrowser.open(line)
        browserCount += 1
    readUnsubLinks.close()

ReadOnly

I found it annoying that no-reply emails stuck around after their time had passed; since they are just reminders, no one really needs them later, so I put in a variable to get rid of them if the user would like.
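A minimal sketch of that toggle (the final code below does essentially this), assuming the noReply list of thread IDs collected while reading the headers:

    noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
    if noReplyResponse.lower().strip() == "yes":
        for threadId in noReply: # Threads whose sender contained "noreply"
            service.users().threads().trash(userId='me', id=threadId).execute()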

Deleting emails

When the program deletes emails, it moves them to the trash rather than deleting them permanently, in case they are needed later. Since I don't read the trash and Gmail empties it after 30 days, I felt that was the safest approach.
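In API terms the difference is a single call; trash is recoverable, delete is not, so the program only ever calls trash:

    # Recoverable: moves the thread to Trash, which Gmail purges after ~30 days.
    service.users().threads().trash(userId='me', id=threadId).execute()
    # Permanent (never used here): bypasses Trash and needs broader permission.
    # service.users().threads().delete(userId='me', id=threadId).execute()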

Final Conclusion

The program was very useful and worked as planned. I really liked the result, and the people who used it liked it too. A run could take a long time depending on the number of emails, but since it also got rid of the unneeded ones, the second and third runs were much faster. It was by no means perfect, but it got the job done for the majority of cases, so I would call it a success.

Setbacks

A few setbacks came up along the way. The major ones: while testing, the first page of threads changed after every deletion run, and for unsubscribing I had to choose the best link; I just chose the most recent one instead of testing each, which could have been an issue. Also, deleting required another scope, which I didn't notice at first, so I lost some time trying something that was restricted without knowing why.

Feedback

The testers wanted a few things implemented. Some wanted to delete all the emails and unsubscribe automatically, since the process was overwhelming and a cap of 5 was too low. Unsubscribing by email reply should have been supported, but it required another scope. Overall they called it a great program, and the one-time or no-reply emails were something they hadn't realized could take up a quarter to half of their total message count.

Final Version 1 Code

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json
from tld import get_tld
import requests
import webbrowser
import dns.resolver


class Message:
    thread = ""
    sender = ""
    link = ""

    def validSender(self): # True if the sender has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True

        return False


class Site:
    sender = ""
    domainName = ""

    def __init__(self):
        self.messages = [] # Per-instance list so Sites don't share messages

    def getSender(self):
        return self.sender

    def getDomain(self):
        return self.domainName

    def addMessage(self, mail):
        self.messages.append(mail)

    def getString(self):
        return self.sender + " " + self.domainName + " " + str(len(self.messages))

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        return self.messages[0].link


# com , org , us , edu , gov , net ,

def get_records(domain):
    """
    Try common DNS record types against the domain.
    :param domain: host name to resolve
    :return: the first record type that resolves, or "NA" if none do
    """
    ids = ['A','NS','MD','MF','CNAME','SOA','MB','MG','MR','MX','AAAA']

    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)
            for rdata in answers:
                print(a, ':', rdata.to_text())
                return a

        except Exception as e:
            print(e)  # or pass
    return "NA"


# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', 'https://www.googleapis.com/auth/gmail.modify']


def checkDNS(url):
    b = open("valid.txt", "a") # For the shorter valid types
    c = open("invalid.txt", "a") # For the shorter invalid types
    ans = get_records(url)
    if (ans == "NA"):
        c.write(url + "\n")
    else:
        b.write(url + "\n")
    b.close()
    c.close()


def validURL(url):
    try:
        requests.get(url, timeout=.5)
        return True # URL responded
    except requests.ConnectionError:
        return False # URL does not exist
    except Exception:
        return False # Timed out or failed some other way


def cleanUrl(url):
    newUrl = url.strip()
    if not("mailto" in url): # A regular URL: just trim whitespace
        return newUrl
    else: # A mailto (email-back) link that needs its brackets removed
        newUrl = newUrl.replace("<","").replace(">","").strip()
        return newUrl

def cleanDomain(url):
    newUrl = url.replace("<","").replace(">","").replace(",","").strip()
    startingIndex = 0
    if ("mailto" in newUrl):
        return url
    elif ("https" in newUrl):
        newUrl = newUrl.replace("https://","")
        startingIndex = newUrl.index("/")
        return "https://" + (newUrl[0:startingIndex]).strip()
    elif ("http" in newUrl):
        newUrl = newUrl.replace("http://","")
        startingIndex = newUrl.index("/")
        return "http://" + (newUrl[0:startingIndex]).strip()



def cleanGmail():
    basePath = "User/Gmail/"
    if not os.path.exists(basePath):
        os.mkdir(basePath)
        print("Making the folder. Make sure to put credentials of Gmail account in the " + basePath + " location")
        return "Path did not exist"
    os.chdir(basePath)
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)


    everythingFile = open("Everything.txt", "w")
    # Call the Gmail API
    megaThreadList = [] # All threads/ Emails
    sitesList = [] # List of all sites
    siteNames = [] # Names of sites for easy search
    noReply = [] # Thread of Noreply emails
    moreThreads = True
    threadsList = service.users().threads().list(userId='me',includeSpamTrash=True,prettyPrint=True).execute()
    nextPageToken = threadsList.get('nextPageToken') # May be absent on a one-page inbox
    if nextPageToken is None:
        moreThreads = False
    for thread1 in threadsList['threads']:
        megaThreadList.append(thread1['id'])

    threadPageCounter = 0
    pageLimit = 10
    while moreThreads:
        threadsList = service.users().threads().list(userId='me',includeSpamTrash=True,prettyPrint=True,pageToken=nextPageToken).execute()
        for thread1 in threadsList['threads']:
            megaThreadList.append(thread1['id'])
        if 'nextPageToken' in threadsList:
            nextPageToken = threadsList['nextPageToken']
            if threadPageCounter >= pageLimit: # Cut off after reaching pageLimit
                moreThreads = False
            threadPageCounter += 1
#            print(nextPageToken)
        else:
            moreThreads = False
#    print(threadPageCounter)
    for ids in megaThreadList:
        metaMessage = service.users().threads().get(userId='me',id=ids,format="metadata").execute()
        payloads = (metaMessage['messages'][0]['payload'])
        payloadHeaders = payloads['headers']
        # Name = List-Unsubscribe
        currentEmail = ""
        currentMessage = Message()
        currentMessage.thread = ids
        unsubscribeLink = "" # The unsubscriber link
        for headers in payloadHeaders:
            if(headers['name'] == 'From'):
                temp = headers['value']
                index = -1
                if "<" in temp:
                    index = temp.index("<")
                if (index < 0):
                    currentEmail = temp
                else:
                    currentEmail = temp[index + 1:-1] # Address between the angle brackets
                currentMessage.sender = currentEmail
                if "noreply" in currentEmail or "no-reply" in currentEmail:
                    noReply.append(ids)

            if(headers['name'] == 'List-Unsubscribe'):
                temp = headers['value']
                index = 0
                if "<" in temp:
                    index = temp.index("<")
                currentUnsubscribeLink = temp[index+1:-1]
                unsubscribeLink = currentUnsubscribeLink
                currentMessage.link = currentUnsubscribeLink

        everythingFile.write(currentMessage.sender + "  "+ currentMessage.link + "\n")

        cleanDomainLink = unsubscribeLink
        if "," in unsubscribeLink:
            split = unsubscribeLink.split(",")
            cleanDomainLink = cleanDomain(split[1])
            currentMessage.link = cleanDomainLink
        else:
            cleanDomainLink = cleanDomain(unsubscribeLink)


        if not(cleanDomainLink is None or "mailto" in cleanDomainLink):
            if(validURL(cleanDomainLink)):
                if cleanDomainLink in siteNames: # Already exist
                    currentIndex = siteNames.index(cleanDomainLink)
                    sitesList[currentIndex].addMessage(currentMessage)
                else: # Create new Site
                    currentSite = Site()
                    siteNames.append(cleanDomainLink)
                    currentSite.domainName = cleanDomainLink
                    currentSite.addMessage(currentMessage)
                    currentSite.sender = currentMessage.sender
                    sitesList.append(currentSite)
    everythingFile.flush()
    everythingFile.close()
    siteInformation = open("SitesFile.txt","w") #Information
    unsubLink = open("unsubscribeLinks.txt", "w") # Unsubscribe Links
    ignoreFile = open("ignored.txt","w")
    # If there is one, do a quick GET and POST request
    for s in sitesList:
        if s.getMessageSize() == 1:
            print("Would you like to delete these messages?")
            ignoreFile.write(s.getSender() + "\n")
            print("Ignoring " + s.getSender())
        else:
            siteInformation.write(s.getString() + "\n")
            unsubLink.write(s.getLink() + "\n")
    siteInformation.close()
    ignoreFile.close()
    unsubLink.close()
    issuesFile = open("issues.txt","w")
    keeping = [] # The ones we are keeping
#    oneTimeResponse = "yes"
#    noReplyResponse = "yes"
    oneTimeResponse = input("Would you like to delete one time messages (yes/no)? ")
    noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
    oneTime = oneTimeResponse.lower().strip() == "yes"
    noReplies = noReplyResponse.lower().strip() == "yes"
    counter = 0
    newSet = []
    if (noReplies):
        for nrthread in noReply:
            try:
                service.users().threads().trash(userId='me',id=nrthread).execute() # Trashing the whole thread
            except:
                print("There was an issue with " + nrthread)

    for s in sitesList:
        try:
            if s.getMessageSize() == 1 and oneTime:
                service.users().threads().trash(userId='me', id=s.messages[0].thread).execute()
            if s.getMessageSize() > 1:
                print(str(counter) + ". " + s.getString())
                newSet.append(s)
                counter += 1
        except:
            issuesFile.write(s.getString())
            print("That message does not exist")

    deletingRecords = open("deleting.txt", "w")
    keeping = input("enter in the number seperated by a , of the ones you want to keep: ")
    issuesFile.write("\nhere is the split\n\n")
    spliting = keeping.split(",")
#    spliting = []
    counter = 0
    for splits in spliting:
        if not (counter == splits):
            deletingRecords.write(newSet[counter].sender + "\n")
            for mes in newSet[counter].messages:
                try:
                    service.users().messages().trash(userId='me',id=mes.thread).execute()
                except:
                    issuesFile.write(mes.getString())
        counter += 1
    # Deleting the messages here
    deletingRecords.flush()
    deletingRecords.close()
    issuesFile.close()

    # Opening up all the unsubscribes
    readUnsubLinks = open("webSiteFile.txt","r")
    lines = readUnsubLinks.readlines()
    browserCount = 0
    maxBrowsers = 5
    for line in lines:
        if browserCount >= maxBrowsers: # Max tabs to open
            webbrowser.open(line)
        browserCount += 1
    readUnsubLinks.close()
    os.chdir("../../")
    return "All done cleaning"
