The onset of the COVID-19 pandemic has given rise to an increase in cyberattacks and cybercrime, particularly phishing attempts. Cybercrime associated with phishing emails can significantly impact victims, who may suffer monetary loss and identity theft. Existing anti-phishing tools do not catch every phishing email, leaving the user to judge an email's legitimacy. Machine learning can identify recurring patterns while coping with variation, which suits anti-phishing techniques because phishing attacks may vary in wording but often follow similar patterns. This paper presents a browser extension called MailTrout, which incorporates machine learning within a usable security tool to assist users in detecting phishing emails. MailTrout demonstrated high levels of accuracy when detecting phishing emails and high levels of usability for end-users.