Software to help clean up Gmail: current and future state
Origin
I was getting so sick of going through other people's inboxes with 1,000+ emails. I didn't have many myself, so I asked how the idea could benefit me... I had applied to a lot of jobs, and they all send from DONOTREPLY addresses that stop mattering after a while and could be cut from my mailbox.
Important Information
The code for finding and targeting spam emails works. However, the API Google offered at the time has since changed, so the code no longer runs as-is; the setup is now handled differently.
First Step: Connecting
Starting out, I had to connect to the inbox through Google's Gmail API. I chose Gmail because, in my opinion, it is the most frequently used service, and more people have asked me to clean up Gmail than any other kind of email account.
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API
    threadsList = service.users().threads().list(
        userId='me', includeSpamTrash=False, prettyPrint=True).execute()
    print(threadsList)

    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])
    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        for label in labels:
            print(label['name'])

if __name__ == '__main__':
    main()
I got all the credential files and wired them into the code. It was fairly simple, but this was the last simple thing.
Second Step: Reading messages
Overview
So we have the fetching of threads working, and reading the messages returned a lot of information. My best skill is knowing what information is needed and what isn't. Storing whole messages would take a lot of memory, so I decided to read only the metadata. I then mapped each thread to its sender's email address. If an email had something suspicious about it, I flagged it and read it more closely. Also, many messages carry the List-Unsubscribe header, the same link Gmail surfaces with its built-in unsubscribe option, and I based the detection on that.
Now we have the suspect list; the code below breaks that process down.
Reading in all the threads
Reads in the threads, following the page token until all message threads have been collected.
megaThreadList = []   # Collected thread IDs
nextPageToken = None  # None is simply omitted on the first request
moreThreads = True
while moreThreads:
    threadsList = service.users().threads().list(
        userId='me', includeSpamTrash=False, prettyPrint=True,
        pageToken=nextPageToken).execute()
    for thread1 in threadsList['threads']:
        megaThreadList.append(thread1['id'])
    if 'nextPageToken' in threadsList:
        nextPageToken = threadsList['nextPageToken']
        print(nextPageToken)
    else:
        moreThreads = False
Thread Determination
Reads in the metadata of each message and maps out which ones are needed.
# ar collects sender addresses and unsub collects raw unsubscribe links;
# x and y are log files opened earlier in the program.
for ids in megaThreadList:
    metaMessage = service.users().threads().get(
        userId='me', id=ids, format="metadata").execute()
    payloads = metaMessage['messages'][0]['payload']
    head = payloads['headers']
    # Name = List-Unsubscribe
    curEmail = ""
    for pay in head:
        if pay['name'] == 'From':
            temp = pay['value']
            ind = -1
            if "<" in temp:
                ind = temp.index("<")
            if ind < 0:
                curEmail = temp
            else:
                curEmail = temp[ind + 1:-1]
            ar.append(curEmail)
            y.write(curEmail + "\n")
        if pay['name'] == 'List-Unsubscribe':
            temp = pay['value']
            ind = temp.index("<")
            curLink = temp[ind + 1:-1]
            x.write(curLink + "\n")
            unsub.append(curLink)
Narrow Subscribers
Narrows all the flagged emails down to a short list of the senders the account is actually subscribed to. This shortens the unsubscribe process and cuts down on duplicate unsubscribing.
subscribeList = []  # Unique sender domains
for a in unsub:
    if "," in a:
        split = a.split(",")
        cleanDom = cleanDomain(split[1])
        if not (cleanDom in subscribeList):
            subscribeList.append(cleanDom)
    else:
        cleanDom = cleanDomain(a)
        if not (cleanDom in subscribeList):
            subscribeList.append(cleanDom)
Stripping function
I was lazy about the regex, and some patterns didn't seem to work, so I just coded the specific string methods myself. It was more time-consuming, but I felt it was easier to read for the beginners who would be using the software.
def cleanUrl(url):
    newUrl = url.strip()
    if not ("mailto" in url):
        # It is a regular URL; already usable
        return newUrl
    else:
        # It is a mailto link that needs the angle brackets stripped
        url = newUrl
        newUrl = url.replace("<", "").replace(">", "")
        return newUrl

def cleanDomain(url):
    newUrl = url.strip().replace("<", "").replace(">", "")
    startInd = 0
    if "mailto" in newUrl:
        startInd = newUrl.index(":")  # MAYBE +1
        if "?" in newUrl:
            # Has a subject line too
            endInd = newUrl.index("?")
            return newUrl[startInd:endInd]
        return newUrl[startInd:]  # Maybe
    elif "https" in newUrl:
        newUrl = newUrl.replace("https://", "")
        startInd = newUrl.index("/")
        return "https://" + newUrl[0:startInd]
    elif "http" in newUrl:
        newUrl = newUrl.replace("http://", "")
        startInd = newUrl.index("/")
        return "http://" + newUrl[0:startInd]
Third Step: Validating
Overview
Up to this point, we have a list that may contain invalid entries. The problem is I don't want the user being sent to a pile of dead sites; a lot of unsubscribe links were going stale after a short period, which became annoying, so I had to write some code to check for that. Also during this stage, I started separating the legitimate emails from random fake spam addresses, because those pop up a lot. I will come back to them in the cleanup stage.
Checking Email Type
From networking, I learned there are different DNS record types, and by looking them up I could get a better understanding of what kind of host I was tracing.
import dns.resolver  # pip install dnspython

def get_records(domain):
    """
    Get all the records associated to domain parameter.
    :param domain:
    :return:
    """
    ids = ['A', 'NS', 'MD', 'MF', 'CNAME', 'SOA', 'MB', 'MG', 'MR', 'MX', 'AAAA']
    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)
            for rdata in answers:
                print(a, ':', rdata.to_text())
            return a
        except Exception as e:
            print(e)  # or pass
    return "NA"
Check DNS
Since we can already fetch the DNS records, it is only logical to write a method that checks them too.
def checkDNS(url):
    b = open("valid.txt", "a")    # For the shorter valid types
    c = open("invalid.txt", "a")  # For the shorter invalid types
    ans = get_records(url)
    if ans == "NA":
        c.write(url)
    else:
        b.write(url)
    b.close()
    c.close()
Time check
Given a decent connection, I borrowed the timing idea from time-based SQL injection attacks: a short request timeout tells me whether a URL actually responds, even when its format is valid.
import requests

def validURL(url):
    try:
        response = requests.get(url, timeout=.5)
        return True  # URL is valid and exists on the internet
    except requests.ConnectionError:
        return False
    except:
        return False  # URL does not exist on the internet
Fourth Step: Clean up
The code is messy at this point, and no one really wants to read it. It has a few functions, but everything lives in one big block, so now it is time for more modifications. I add some classes and clean up the final segments of the code, tying up the loose ends.
Messages Class
The Message class holds an object with the data I need about each message.
class Message:
    thread = ""
    sender = ""
    link = ""

    def validSender(self):
        # Accept only senders whose address has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True
        return False
Site class
The Site class is the helper that links emails, domains, and senders together for the other functions.
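It ended up as a small container plus accessors; here it is, excerpted from the final version at the end of this post:

class Site:
    def __init__(self):
        self.messages = []   # Message objects grouped under this site's domain
        self.sender = ""
        self.domainName = ""

    def addMessage(self, mail):
        self.messages.append(mail)

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        # Every message in a site shares the domain, so the first link works
        return self.messages[0].link

    def getString(self):
        return self.sender + " " + self.domainName + " " + str(len(self.messages))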
I also cleaned up the comments and separated a few pieces of the main class into functions; the typical steps as the code got closer to review and final release.
Fifth Step: User Friendly
I mean, what is a program if it is not easy to use? Not only that, the user has to have control. No one would let me just run a program on their computer in places where it could delete critical emails. A warning could cover this, but I took precautions and built safer behavior into the code instead.
Unsubscribe control
The user clicks each link themselves so they know exactly what they are unsubscribing from. I set the limit to 5 because opening many more tabs could be an issue for the browser.
import webbrowser

readUnsubLinks = open("webSiteFile.txt", "r")
lines = readUnsubLinks.readlines()
browserCount = 0
maxBrowsers = 5
for line in lines:
    if browserCount >= maxBrowsers:
        # Max tabs to open; stop here
        break
    webbrowser.open(line)
    browserCount += 1
readUnsubLinks.close()
ReadOnly
I found it annoying that no-reply emails stuck around; after a while no one really needs them, since each one is just a reminder. So I put in a variable that gets rid of them if the user would like that.
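The relevant piece, excerpted from the final code: the noReply list collects thread IDs whose From address contains "noreply" or "no-reply", and trashing only happens once the user opts in.

noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
noReplies = noReplyResponse.lower().strip() == "yes"
if noReplies:
    for nrthread in noReply:
        try:
            # Move the whole no-reply thread to the trash
            service.users().messages().trash(userId='me', id=nrthread).execute()
        except:
            print("There was an issue with " + nrthread)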
Deleting emails
When the program does delete emails, it moves them to the trash rather than deleting them permanently, in case they are needed later. Since I never read from the trash and Gmail empties it after 30 days anyway, I felt that was the safest approach.
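For reference, the Gmail API keeps these as separate endpoints. A minimal sketch for comparison; message_id is a placeholder, and to my knowledge the permanent delete endpoint needs the full https://mail.google.com/ scope, which this project never requests:

# Moves a message to the trash; Gmail purges trashed mail after 30 days.
service.users().messages().trash(userId='me', id=message_id).execute()

# Permanent delete skips the trash entirely and cannot be undone.
# It also needs a broader OAuth scope, so this project sticks with trash().
# service.users().messages().delete(userId='me', id=message_id).execute()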
Final Conclusion
The program was very useful and worked exactly as planned. I really liked the result, and the people who used it liked it too. A run could take a long time depending on the email count, but it also got rid of the unneeded mail, so the second and third runs were much faster. It was by no means perfect, but it got the job done for the majority of the need, so I would call it a success.
Setbacks
A few setbacks came up along the way. The major ones showed up during testing: once deletion ran, the first page of threads changed on every pass, and the unsubscribe step had to choose the best link when a message offered several. I just chose the most recent one instead of testing each, which could have been an issue. Also, deleting required another scope, which I didn't notice at first; that cost some time trying something that was restricted without knowing why.
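The fix is visible in the final code: read-only access cannot trash messages, so the scope list had to grow. (And per the code's own comment, changing scopes means deleting token.pickle so the OAuth flow runs again.)

# gmail.readonly alone cannot trash messages; gmail.modify had to be added.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']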
Feedback
The testers wanted a few things implemented. Some wanted to just delete all the emails and unsubscribe automatically, since the process felt overwhelming and the 5-tab limit was too low.
Unsubscribing by email reply should have been supported, but it required yet another scope.
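A minimal sketch of what that could have looked like, assuming the gmail.send scope were added; send_unsubscribe and to_address are hypothetical names, and users.messages.send expects a base64url-encoded MIME message:

import base64
from email.mime.text import MIMEText

def send_unsubscribe(service, to_address):
    # Hypothetical helper: replies to a mailto: unsubscribe address.
    # Would require https://www.googleapis.com/auth/gmail.send,
    # a scope this project never requested.
    msg = MIMEText("")
    msg['to'] = to_address
    msg['subject'] = "unsubscribe"
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    service.users().messages().send(userId='me', body={'raw': raw}).execute()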
Overall they called it a great program, and the one-time and no-reply emails were something they hadn't realized could take up a quarter to a half of their total message count.
Final Version 1 Code
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import json
from tld import get_tld
import requests
import webbrowser
import dns.resolver


class Message:
    thread = ""
    sender = ""
    link = ""

    def validSender(self):
        # Accept only senders whose address has one of these extensions
        accepts = ['com', 'org', 'net', 'gov', 'edu']
        for a in accepts:
            if a in self.sender:
                return True
        return False


class Site:
    def __init__(self):
        self.messages = []   # Message objects grouped under this site's domain
        self.sender = ""
        self.domainName = ""

    def getSender(self):
        return self.sender

    def getDomain(self):
        return self.domainName

    def addMessage(self, mail):
        self.messages.append(mail)

    def getString(self):
        return self.sender + " " + self.domainName + " " + str(len(self.messages))

    def getMessageSize(self):
        return len(self.messages)

    def getLink(self):
        return self.messages[0].link


# com , org , us , edu , gov , net
def get_records(domain):
    """
    Get all the records associated to domain parameter.
    :param domain:
    :return:
    """
    ids = ['A', 'NS', 'MD', 'MF', 'CNAME', 'SOA', 'MB', 'MG', 'MR', 'MX', 'AAAA']
    for a in ids:
        try:
            answers = dns.resolver.query(domain, a)
            for rdata in answers:
                print(a, ':', rdata.to_text())
            return a
        except Exception as e:
            print(e)  # or pass
    return "NA"


# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']


def checkDNS(url):
    b = open("valid.txt", "a")    # For the shorter valid types
    c = open("invalid.txt", "a")  # For the shorter invalid types
    ans = get_records(url)
    if ans == "NA":
        c.write(url)
    else:
        b.write(url)
    b.close()
    c.close()


def validURL(url):
    try:
        response = requests.get(url, timeout=.5)
        return True
    except requests.ConnectionError:
        return False
    except:
        return False


def cleanUrl(url):
    newUrl = url.strip()
    if not ("mailto" in url):
        # It is a regular URL; already usable
        return newUrl
    else:
        # It is a mailto link that needs the angle brackets stripped
        url = newUrl
        newUrl = url.replace("<", "").replace(">", "").strip()
        return newUrl


def cleanDomain(url):
    newUrl = url.replace("<", "").replace(">", "").replace(",", "").strip()
    startingIndex = 0
    if "mailto" in newUrl:
        return url
    elif "https" in newUrl:
        newUrl = newUrl.replace("https://", "")
        startingIndex = newUrl.index("/")
        return "https://" + (newUrl[0:startingIndex]).strip()
    elif "http" in newUrl:
        newUrl = newUrl.replace("http://", "")
        startingIndex = newUrl.index("/")
        return "http://" + (newUrl[0:startingIndex]).strip()


def cleanGmail():
    basePath = "User/Gmail/"
    if not os.path.exists(basePath):
        os.mkdir(basePath)
        print("Making the folder. Make sure to put credentials of Gmail account in the " + basePath + " location")
        return "Path did not exist"
    os.chdir(basePath)

    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)
    everythingFile = open("Everything.txt", "w")

    # Call the Gmail API
    megaThreadList = []  # All threads / emails
    sitesList = []       # List of all sites
    siteNames = []       # Names of sites for easy search
    noReply = []         # Thread IDs of noreply emails
    moreThreads = True
    threadsList = service.users().threads().list(
        userId='me', includeSpamTrash=True, prettyPrint=True).execute()
    nextPageToken = threadsList['nextPageToken']
    for thread1 in threadsList['threads']:
        megaThreadList.append(thread1['id'])
    threadPageCounter = 0
    pageLimit = 10
    while moreThreads:
        threadsList = service.users().threads().list(
            userId='me', includeSpamTrash=True, prettyPrint=True,
            pageToken=nextPageToken).execute()
        for thread1 in threadsList['threads']:
            megaThreadList.append(thread1['id'])
        if 'nextPageToken' in threadsList:
            nextPageToken = threadsList['nextPageToken']
            if threadPageCounter >= pageLimit:
                # Cut off after reaching pageLimit
                moreThreads = False
            threadPageCounter += 1
            # print(nextPageToken)
        else:
            moreThreads = False
    # print(threadPageCounter)

    for ids in megaThreadList:
        metaMessage = service.users().threads().get(
            userId='me', id=ids, format="metadata").execute()
        payloads = metaMessage['messages'][0]['payload']
        payloadHeaders = payloads['headers']
        # Name = List-Unsubscribe
        currentEmail = ""
        currentMessage = Message()
        currentMessage.thread = ids
        unsubscribeLink = ""  # The unsubscribe link
        for headers in payloadHeaders:
            if headers['name'] == 'From':
                temp = headers['value']
                index = -1
                if "<" in temp:
                    index = temp.index("<")
                if index < 0:
                    currentEmail = temp
                else:
                    currentEmail = temp[index + 1:-1]
                currentMessage.sender = currentEmail
                if "noreply" in currentEmail or "no-reply" in currentEmail:
                    noReply.append(ids)
            if headers['name'] == 'List-Unsubscribe':
                temp = headers['value']
                index = 0
                if "<" in temp:
                    index = temp.index("<")
                currentUnsubscribeLink = temp[index + 1:-1]
                unsubscribeLink = currentUnsubscribeLink
                currentMessage.link = currentUnsubscribeLink
        everythingFile.write(currentMessage.sender + " " + currentMessage.link + "\n")
        cleanDomainLink = unsubscribeLink
        if "," in unsubscribeLink:
            split = unsubscribeLink.split(",")
            cleanDomainLink = cleanDomain(split[1])
            currentMessage.link = cleanDomainLink
        else:
            cleanDomainLink = cleanDomain(unsubscribeLink)
        if not (cleanDomainLink is None or "mailto" in cleanDomainLink):
            if validURL(cleanDomainLink):
                if cleanDomainLink in siteNames:
                    # Already exists
                    currentIndex = siteNames.index(cleanDomainLink)
                    sitesList[currentIndex].addMessage(currentMessage)
                else:
                    # Create new Site
                    currentSite = Site()
                    siteNames.append(cleanDomainLink)
                    currentSite.domainName = cleanDomainLink
                    currentSite.addMessage(currentMessage)
                    currentSite.sender = currentMessage.sender
                    sitesList.append(currentSite)
    everythingFile.flush()
    everythingFile.close()

    siteInformation = open("SitesFile.txt", "w")   # Information
    unsubLink = open("unsubscribeLinks.txt", "w")  # Unsubscribe links
    ignoreFile = open("ignored.txt", "w")
    # If there is one, do a get and post request real quick
    for s in sitesList:
        if s.getMessageSize() == 1:
            print("Would you like to delete these messages?")
            ignoreFile.write(s.getSender() + "\n")
            print("Ignoring " + s.getSender())
        else:
            siteInformation.write(s.getString() + "\n")
            unsubLink.write(s.getLink() + "\n")
    siteInformation.close()
    ignoreFile.close()
    unsubLink.close()

    issuesFile = open("issues.txt", "w")
    keeping = []  # The ones we are keeping
    # oneTimeResponse = "yes"
    # noReplyResponse = "yes"
    oneTimeResponse = input("Would you like to delete one time messages (yes/no)? ")
    noReplyResponse = input("Would you like to delete noreply messages (yes/no)? ")
    oneTime = oneTimeResponse.lower().strip() == "yes"
    noReplies = noReplyResponse.lower().strip() == "yes"
    counter = 0
    newSet = []
    if noReplies:
        for nrthread in noReply:
            try:
                # Trashing the thread
                service.users().messages().trash(userId='me', id=nrthread).execute()
            except:
                print("There was an issue with " + nrthread)
    for s in sitesList:
        try:
            if s.getMessageSize() == 1 and oneTime:
                service.users().messages().trash(userId='me', id=s.messages[0].thread).execute()
            if s.getMessageSize() > 1:
                print(str(counter) + ". " + s.getString())
                newSet.append(s)
                counter += 1
        except:
            issuesFile.write(s.getString())
            print("that message does not exist")

    deletingRecords = open("deleting.txt", "w")
    keeping = input("enter in the numbers separated by a , of the ones you want to keep: ")
    issuesFile.write("\nhere is the split\n\n")
    spliting = keeping.split(",")
    # spliting = []
    counter = 0
    # Deleting the messages here
    for site in newSet:
        if not (str(counter) in spliting):  # Skip any site number the user chose to keep
            deletingRecords.write(site.sender + "\n")
            for mes in site.messages:
                try:
                    service.users().messages().trash(userId='me', id=mes.thread).execute()
                except:
                    issuesFile.write(site.getString())
        counter += 1
    deletingRecords.flush()
    deletingRecords.close()
    issuesFile.close()

    # Opening up all the unsubscribes
    readUnsubLinks = open("webSiteFile.txt", "r")
    lines = readUnsubLinks.readlines()
    browserCount = 0
    maxBrowsers = 5
    for line in lines:
        if browserCount >= maxBrowsers:
            # Max tabs to open; stop here
            break
        webbrowser.open(line)
        browserCount += 1
    readUnsubLinks.close()
    os.chdir("../../")
    return "All done cleaning"


if __name__ == '__main__':
    print(cleanGmail())