Software to help clean up Gmail: current and future state
Origin
I was getting sick of digging through other people's inboxes that held 1k+ emails. I didn't have many myself, so I asked how the tool could benefit me too... I had applied to a lot of jobs, and each application generated a DONOTREPLY email that stopped mattering after a while; clearing those out alone could cut down my mailbox.
Important Information
The code for finding and targeting the spam-like emails works. However, the Google API it was written against has since changed, so the code no longer runs as-is; the setup is now done differently.
First Step: Connecting
Starting out, I had to connect to the inbox through Google's Gmail API. I chose Gmail because, in my experience, it is the most frequently used provider, and more people have asked me to clean up Gmail than any other email service.
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API: list the user's threads, then the labels
    threadsList = service.users().threads().list(
        userId='me', includeSpamTrash=False, prettyPrint=True).execute()
    print(threadsList)

    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])
    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        for label in labels:
            print(label['name'])


if __name__ == '__main__':
    main()
I got all the credential files and wired them into the code. It was fairly simple, but this was the last simple thing.
Second Step: Reading messages
Overview
So the thread fetching works, and reading the messages returned a lot of information. My best skill is knowing which information is needed and which isn't. Storing whole messages would take a lot of memory, so I decided to read only the metadata. I then mapped each thread to the sender's email address. If an email had something suspicious about it, I flagged it and read it more closely. The messages also carried a List-Unsubscribe header, the source of the default unsubscribe link Gmail shows, which I based the detection on.
Now we have the suspicious list; the code below breaks that process down.
Reading in all the threads
Reads the threads page by page, following the page token, to collect every message thread.
megaThreadList = []   # Collected thread IDs
nextPageToken = None  # No token on the first request
moreThreads = True
while moreThreads:
    threadsList = service.users().threads().list(
        userId='me', includeSpamTrash=False, prettyPrint=True,
        pageToken=nextPageToken).execute()
    for thread1 in threadsList['threads']:
        megaThreadList.append(thread1['id'])
    if 'nextPageToken' in threadsList:
        nextPageToken = threadsList['nextPageToken']
        print(nextPageToken)
    else:
        moreThreads = False
Thread Determination
Reads the metadata of each thread's first message and records the headers that matter (the lists and file handles used below are assumed to be set up beforehand).
# Assumes ar and unsub are lists, and x and y are file handles opened earlier.
for ids in megaThreadList:
    metaMessage = service.users().threads().get(
        userId='me', id=ids, format="metadata").execute()
    payloads = metaMessage['messages'][0]['payload']
    head = payloads['headers']
    # Headers of interest: From and List-Unsubscribe
    curEmail = ""
    for pay in head:
        if pay['name'] == 'From':
            # "Name <address>" -> keep just the address
            temp = pay['value']
            ind = -1
            if "<" in temp:
                ind = temp.index("<")
            if ind < 0:
                curEmail = temp
            else:
                curEmail = temp[ind + 1:-1]
            ar.append(curEmail)
            y.write(curEmail + "\n")
        if pay['name'] == 'List-Unsubscribe':
            # Header value looks like "<link>" -> strip the angle brackets
            temp = pay['value']
            ind = temp.index("<")
            curLink = temp[ind + 1:-1]
            x.write(curLink + "\n")
            unsub.append(curLink)
Narrow Subscribers
Narrows all the flagged emails down to a short list of senders the account is subscribed to. This shortens the unsubscribe process and avoids duplicate unsubscribes.
subscribeList = []  # One cleaned domain per sender
for a in unsub:
    if "," in a:
        # Header held two links (e.g. a mailto plus a URL); use the second
        split = a.split(",")
        cleanDom = cleanDomain(split[1])
        if cleanDom not in subscribeList:
            subscribeList.append(cleanDom)
    else:
        cleanDom = cleanDomain(a)
        if cleanDom not in subscribeList:
            subscribeList.append(cleanDom)
Stripping function
I was lazy about the regex, and some patterns didn't work, so I just coded the specific string handling myself. It was more time-consuming, but I felt it was easier to read for the beginners who would be using the software.
def cleanUrl(url):
    newUrl = url.strip()
    if "mailto" not in url:  # It is an actual URL; whitespace already stripped
        return newUrl
    else:  # It is a mailto (reply-by-email) link; drop the angle brackets
        newUrl = newUrl.replace("<", "").replace(">", "")
        return newUrl
def cleanDomain(url):
    newUrl = url.strip().replace("<", "").replace(">", "")
    startInd = 0
    if "mailto" in newUrl:
        startInd = newUrl.index(":") + 1  # skip past "mailto:"
        if "?" in newUrl:  # Has a subject line too
            endInd = newUrl.index("?")
            return newUrl[startInd:endInd]
        return newUrl[startInd:]
    elif "https" in newUrl:
        newUrl = newUrl.replace("https://", "")
        startInd = newUrl.index("/")
        return "https://" + newUrl[0:startInd]
    elif "http" in newUrl:
        newUrl = newUrl.replace("http://", "")
        startInd = newUrl.index("/")
        return "http://" + newUrl[0:startInd]
Third Step: Validating
Overview
Up to this point, the list may contain invalid entries. The problem is that I don't want to send the user to a pile of dead sites; a lot of unsubscribe pages were going invalid after a short time, which became annoying, so I had to write code to check for that. Also during this stage, I started separating valid sender addresses from the random fake spam addresses that pop up a lot; I will come back to those in the cleanup stage.
Checking Email Type
From networking I learned that DNS has many record types, and by querying them (including the reverse-lookup types) I could get a better understanding of what kind of host I was tracing.
def get_records(domain):
    """
    Get all the records associated with the domain parameter.
    :param domain: the domain to query
    :return: the first record type that resolves, or "NA"
    """
    ids = ['A', 'NS', 'MD', 'MF', 'CNAME', 'SOA', 'MB', 'MG', 'MR', 'MX', 'AAAA']
    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)
            for rdata in answers:
                print(a, ':', rdata.to_text())
            return a
        except Exception as e:
            print(e)  # or pass
    return "NA"
Check DNS
Since I can fetch DNS records, it is only logical to write a method that sorts domains by whether they resolve.
def checkDNS(url):
    b = open("valid.txt", "a")    # For the domains that resolve
    c = open("invalid.txt", "a")  # For the domains that don't
    ans = get_records(url)
    if ans == "NA":
        c.write(url + "\n")
    else:
        b.write(url + "\n")
    b.close()
    c.close()
Time check
With a decent connection available, I reused an idea from time-based SQL injection: put a tight timeout on the request to determine whether the URL actually responds, even when it is a syntactically valid link.
def validURL(url):
    try:
        requests.get(url, timeout=.5)
        return True   # URL is valid and exists on the internet
    except requests.ConnectionError:
        return False  # URL does not exist on the internet
    except Exception:
        return False  # Timeouts, bad schemes, etc.
Fourth Step: Clean up
The code is very messy right now, and no one really wants to read it. It has a few functions, but everything runs as one big blob, so it is time for more modifications. I added some classes and cleaned up the final segments of the code, tying up loose ends.
Messages Class
The Message class holds an object with the data I need from each message.
class Message:
    def __init__(self):
        self.thread = ""
        self.sender = ""
        self.link = ""

    def validSender(self):  # True if the sender has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True
        return False
Site class
The Site class is the helper that links emails, domains, and senders together for the other functions; a trimmed sketch follows below.
I also cleaned up the comments and split a few pieces of the main function out into their own functions: the typical steps as the code approached review and final release.
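For orientation, here is a trimmed sketch of the class; the full version, with its getters and string dump, appears in the Final Version 1 Code below.
class Site:
    def __init__(self):
        self.sender = ""
        self.domainName = ""
        self.messages = []  # Message objects received from this site

    def addMessage(self, mail):
        self.messages.append(mail)

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        # A site's messages share an unsubscribe target,
        # so the first message's link stands in for the site.
        return self.messages[0].link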
Fifth Step: User Friendly
So, what is a program if it is not easy to use? Beyond that, the user has to stay in control: no one would let me just run a program on their machine in places where it could delete critical emails. A warning could cover this, but I took the extra precaution of putting safer code in.
Unsubscribe control
The user clicks each link themselves, so they know exactly what they are unsubscribing from. I set the limit to 5 tabs because opening many more at once could be an issue for the browser.
readUnsubLinks = open("unsubscribeLinks.txt", "r")  # File written earlier ("webSiteFile.txt" was a stale name)
lines = readUnsubLinks.readlines()
browserCount = 0
maxBrowsers = 5
for line in lines:
    if browserCount < maxBrowsers:  # Stop opening tabs once we hit the max
        webbrowser.open(line)
    browserCount += 1
readUnsubLinks.close()
ReadOnly
I found it annoying that the noreply ("read-only") emails stuck around; once their moment has passed no one really needs them, since they are just reminders, so I put in a variable that gets rid of them if the user wants.
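In the final version this is just a yes/no prompt followed by trashing each collected thread. A minimal sketch, assuming service is the authorized Gmail client and noReply already holds the flagged thread IDs:
# threads().trash() is the thread-level endpoint; the final listing calls
# messages().trash() with the same IDs, which works because Gmail gives a
# thread the same ID as its first message.
noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
if noReplyResponse.lower().strip() == "yes":
    for nrthread in noReply:
        try:
            service.users().threads().trash(userId='me', id=nrthread).execute()
        except Exception:
            print("There was an issue with " + nrthread)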
Deleting emails
When the program deletes emails, it moves them to the trash rather than deleting them permanently, in case they are needed later. Since I never read from the trash and Gmail purges it after 30 days, that felt like the safest approach.
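The difference is only which endpoint gets called; a minimal sketch (msg_id is a placeholder for a real message ID):
# Reversible: moves the message to Trash, which Gmail purges after ~30 days.
service.users().messages().trash(userId='me', id=msg_id).execute()

# Irreversible: skips Trash entirely (and needs the broader
# https://mail.google.com/ scope), so it is deliberately not used here.
# service.users().messages().delete(userId='me', id=msg_id).execute()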
Final Conclusion
The program was very useful and worked exactly as planned. I really liked the result, and the people who used it liked it too. A run could take a long time depending on the email count, but it also got rid of the unneeded mail, so the second and third runs were much faster. It was by no means perfect, but it got the job done for the majority of cases, so I would call it a success.
Setbacks
A few setbacks came up along the way. The major ones: while testing deletions, the first page of results changed on every run, and the unsubscribe step had to pick the best link when a header offered several. I just chose the most recent one instead of testing each, which could have been an issue. Also, deleting required another OAuth scope, which I didn't notice at first; that cost time spent retrying an operation that was restricted without my knowing why.
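That "restricted" failure is visible in the final listing: the read-only scope can list mail but cannot trash it, so the scope list had to grow.
# gmail.readonly is enough to list threads and metadata, but trashing
# messages also requires gmail.modify. After changing SCOPES, delete
# token.pickle so the authorization flow runs again with the new scopes.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']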
Feedback
The testers wanted a few things implemented. Some wanted to simply delete all the emails and unsubscribe automatically, since the manual review was overwhelming and a limit of 5 tabs felt too small.
Unsubscribing by email reply should have been supported, but it required yet another service.
Overall: great program, and the one-time-reply and no-reply emails were something testers didn't realize could take up 1/4 to 1/2 of their total message count.
Final Version 1 Code
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json
from tld import get_tld
import requests
import webbrowser
import dns.resolver
class Message:
    def __init__(self):
        self.thread = ""
        self.sender = ""
        self.link = ""

    def validSender(self):  # True if the sender has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True
        return False
class Site:
    def __init__(self):
        self.sender = ""
        self.domainName = ""
        self.messages = []

    def getSender(self):
        return self.sender

    def getDomain(self):
        return self.domainName  # was self.domain, which doesn't exist

    def addMessage(self, mail):
        self.messages.append(mail)

    def getString(self):
        return self.sender + " " + self.domainName + " " + str(len(self.messages))

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        return self.messages[0].link
# com , org , us , edu , gov , net ,
def get_records(domain):
    """
    Get all the records associated with the domain parameter.
    :param domain: the domain to query
    :return: the first record type that resolves, or "NA"
    """
    ids = ['A', 'NS', 'MD', 'MF', 'CNAME', 'SOA', 'MB', 'MG', 'MR', 'MX', 'AAAA']
    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)  # dnspython >= 2.0 renames query() to resolve()
            for rdata in answers:
                print(a, ':', rdata.to_text())
            return a
        except Exception as e:
            print(e)  # or pass
    return "NA"
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', 'https://www.googleapis.com/auth/gmail.modify']
def checkDNS(url):
    b = open("valid.txt", "a")    # For the domains that resolve
    c = open("invalid.txt", "a")  # For the domains that don't
    ans = get_records(url)
    if ans == "NA":
        c.write(url + "\n")
    else:
        b.write(url + "\n")
    b.close()
    c.close()
def validURL(url):
    try:
        requests.get(url, timeout=.5)
        return True   # URL is valid and responds
    except requests.ConnectionError:
        return False
    except Exception:
        return False  # Timeouts, bad schemes, etc.
def cleanUrl(url):
    newUrl = url.strip()
    if "mailto" not in url:  # It is an actual URL; whitespace already stripped
        return newUrl
    else:  # It is a mailto (reply-by-email) link; drop the angle brackets
        newUrl = newUrl.replace("<", "").replace(">", "").strip()  # .trim() is Java; Python uses .strip()
        return newUrl
def cleanDomain(url):
    newUrl = url.replace("<", "").replace(">", "").replace(",", "").strip()
    startingIndex = 0
    if "mailto" in newUrl:
        return url
    elif "https" in newUrl:
        newUrl = newUrl.replace("https://", "")
        startingIndex = newUrl.index("/")
        return "https://" + newUrl[0:startingIndex].strip()
    elif "http" in newUrl:
        newUrl = newUrl.replace("http://", "")
        startingIndex = newUrl.index("/")
        return "http://" + newUrl[0:startingIndex].strip()
def cleanGmail():
    basePath = "User/Gmail/"
    if not os.path.exists(basePath):
        os.makedirs(basePath)  # makedirs, since the parent "User" folder may not exist yet
        print("Making the folder. Make sure to put the Gmail account credentials in the " + basePath + " location")
        return "Path did not exist"
    os.chdir(basePath)
    # Standard Gmail API authorization (same flow as the quickstart above).
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('gmail', 'v1', credentials=creds)
    everythingFile = open("Everything.txt", "w")
    # Call the Gmail API
    megaThreadList = []  # All thread IDs / emails
    sitesList = []       # List of all sites
    siteNames = []       # Names of sites for easy search
    noReply = []         # Thread IDs of noreply emails
    moreThreads = True
    threadsList = service.users().threads().list(userId='me', includeSpamTrash=True, prettyPrint=True).execute()
    nextPageToken = threadsList.get('nextPageToken')  # .get: the first page may be the only one
    for thread1 in threadsList['threads']:
        megaThreadList.append(thread1['id'])
    threadPageCounter = 0
    pageLimit = 10
    while moreThreads and nextPageToken:
        threadsList = service.users().threads().list(userId='me', includeSpamTrash=True, prettyPrint=True, pageToken=nextPageToken).execute()
        for thread1 in threadsList['threads']:
            megaThreadList.append(thread1['id'])
        if 'nextPageToken' in threadsList:
            nextPageToken = threadsList['nextPageToken']
            if threadPageCounter >= pageLimit:  # Cut off after reaching pageLimit
                moreThreads = False
            threadPageCounter += 1
            # print(nextPageToken)
        else:
            moreThreads = False
    # print(threadPageCounter)
    for ids in megaThreadList:
        metaMessage = service.users().threads().get(userId='me', id=ids, format="metadata").execute()
        payloads = metaMessage['messages'][0]['payload']
        payloadHeaders = payloads['headers']
        # Headers of interest: From and List-Unsubscribe
        currentEmail = ""
        currentMessage = Message()
        currentMessage.thread = ids
        unsubscribeLink = ""  # The unsubscribe link
        for headers in payloadHeaders:
            if headers['name'] == 'From':
                temp = headers['value']
                index = -1
                if "<" in temp:
                    index = temp.index("<")
                if index < 0:
                    currentEmail = temp
                else:  # A missing "else" here made every sender take this branch
                    currentEmail = temp[index + 1:-1]
                currentMessage.sender = currentEmail
                if "noreply" in currentEmail or "no-reply" in currentEmail:
                    noReply.append(ids)
            if headers['name'] == 'List-Unsubscribe':
                temp = headers['value']
                index = 0
                if "<" in temp:
                    index = temp.index("<")
                currentUnsubscribeLink = temp[index + 1:-1]
                unsubscribeLink = currentUnsubscribeLink
                currentMessage.link = currentUnsubscribeLink
        everythingFile.write(currentMessage.sender + " " + currentMessage.link + "\n")
        cleanDomainLink = unsubscribeLink
        if "," in unsubscribeLink:
            split = unsubscribeLink.split(",")
            cleanDomainLink = cleanDomain(split[1])
            currentMessage.link = cleanDomainLink
        else:
            cleanDomainLink = cleanDomain(unsubscribeLink)
        if not (cleanDomainLink is None or "mailto" in cleanDomainLink):
            if validURL(cleanDomainLink):
                if cleanDomainLink in siteNames:  # Already exists
                    currentIndex = siteNames.index(cleanDomainLink)
                    sitesList[currentIndex].addMessage(currentMessage)
                else:  # Create new Site
                    currentSite = Site()
                    siteNames.append(cleanDomainLink)
                    currentSite.domainName = cleanDomainLink
                    currentSite.addMessage(currentMessage)
                    currentSite.sender = currentMessage.sender
                    sitesList.append(currentSite)
    everythingFile.flush()
    everythingFile.close()
    siteInformation = open("SitesFile.txt", "w")       # Site information
    unsubLink = open("unsubscribeLinks.txt", "w")      # Unsubscribe links
    ignoreFile = open("ignored.txt", "w")
    # If there is only one message we could just do a quick GET/POST request
    for s in sitesList:
        if s.getMessageSize() == 1:
            print("Would you like to delete those messages?")
            ignoreFile.write(s.getSender() + "\n")  # was "/n", a typo for "\n"
            print("Ignoring " + s.getSender())
        else:
            siteInformation.write(s.getString() + "\n")
            unsubLink.write(s.getLink() + "\n")
    siteInformation.close()
    ignoreFile.close()
    unsubLink.close()
    issuesFile = open("issues.txt", "w")
    keeping = []  # The ones we are keeping
    # oneTimeResponse = "yes"
    # noReplyResponse = "yes"
    oneTimeResponse = input("Would you like to delete one time messages (yes/no)? ")
    noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
    oneTime = oneTimeResponse.lower().strip() == "yes"
    noReplies = noReplyResponse.lower().strip() == "yes"
    counter = 0
    newSet = []
    if noReplies:
        for nrthread in noReply:
            try:
                service.users().messages().trash(userId='me', id=nrthread).execute()  # trashing the thread
            except Exception:
                print("There was an issue with " + nrthread)
    for s in sitesList:
        try:
            if s.getMessageSize() == 1 and oneTime:
                service.users().messages().trash(userId='me', id=s.messages[0].thread).execute()  # messages is a list
            if s.getMessageSize() > 1:
                print(str(counter) + ". " + s.getString())
                newSet.append(s)
                counter += 1
        except Exception:
            issuesFile.write(s.getString())
            print("That message does not exist")
    deletingRecords = open("deleting.txt", "w")
    keeping = input("Enter the numbers, separated by a comma, of the ones you want to keep: ")
    issuesFile.write("\nhere is the split\n\n")
    spliting = [k.strip() for k in keeping.split(",")]
    # Deleting the messages here: trash every site whose index was not kept
    counter = 0
    for site in newSet:
        if str(counter) not in spliting:
            deletingRecords.write(site.sender + "\n")
            for mes in site.messages:
                try:
                    service.users().messages().trash(userId='me', id=mes.thread).execute()
                except Exception:
                    issuesFile.write(mes.sender + " " + mes.link + "\n")
        counter += 1
    deletingRecords.flush()
    deletingRecords.close()
    issuesFile.close()
    # Opening up all the unsubscribe links
    readUnsubLinks = open("unsubscribeLinks.txt", "r")  # the file written above ("webSiteFile.txt" was a stale name)
    lines = readUnsubLinks.readlines()
    browserCount = 0
    maxBrowsers = 5
    for line in lines:
        if browserCount < maxBrowsers:  # Stop opening tabs once we hit the max
            webbrowser.open(line)
        browserCount += 1
    readUnsubLinks.close()
    os.chdir("../../")
    return "All done cleaning"