News Sentiment Analysis Using a Deep Learning Framework

Attention is a deep learning mechanism that has proved very helpful in artificial intelligence, improving performance on a wide range of AI problems. In this paper, I use an attention model to perform sentiment analysis on news articles. After a news article is extracted by a scraper and preprocessed, it is fed into a sentiment analyser that predicts the sentiment of the article at both sentence and document level.


Introduction
News articles and websites are a source of information for all of us, and they should be accurate and unbiased. News headlines influence all of us in one way or another. Some headlines give us positivity, motivation and happiness; others are negative and create an atmosphere of worry, terror and tension; and the rest do not affect us at all. In addition, a few websites and channels are corrupt, work for specific parties and publish biased news, which spreads chaos among people.
Identifying article sources that spread only negativity can help us boycott them and sustain a positive environment. Sentiment analysis [1] helps us understand people's opinions and identify such intentions. The aim of this project is to classify news articles as positive, negative or neutral, with a probability score for each class, using an attention network [2], a deep learning architecture.

Dataset Preparation
Choosing a suitable dataset, and converting unstructured data into a structured format suited to the task, is of utmost importance if we want the model to produce accurate predictions. The preparation of the train and test datasets is described in the following subsections.

Train Dataset (SemEval-2016)
The train dataset was prepared from SemEval-2016 Task 4: Sentiment Analysis in Twitter [3], which provides training, dev and devtest datasets. I merged the Training Dataset files and the Additional Dataset files into a single train dataset file containing 53,368 sentences.
Each sentence in the train dataset carries one of three sentiment tags: positive, negative or neutral. I replaced the sentiment tag column with three binary columns, "Positive", "Negative" and "Neutral" (a one-hot encoding): if a sentence is tagged positive, the Positive column holds 1 and the other two columns hold 0, and likewise for the other tags.
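The merge-and-encode step can be sketched in plain Python as follows. This is a minimal sketch, not the author's code: it assumes each SemEval file is tab-separated with the tweet ID, the sentiment tag and the tweet text in that order (adjust `parse_tsv` if your copy differs), and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple, Union

LABELS = ("Positive", "Negative", "Neutral")


def parse_tsv(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sentence, tag) pairs from lines of the ASSUMED form 'id<TAB>tag<TAB>text'."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        tag, text = parts[1], "\t".join(parts[2:])
        yield text, tag


def merge_and_encode(
    sources: Iterable[Iterable[Tuple[str, str]]],
) -> List[Dict[str, Union[str, int]]]:
    """Merge several (sentence, tag) sources and replace the tag with three binary columns."""
    rows: List[Dict[str, Union[str, int]]] = []
    for source in sources:
        for sentence, tag in source:
            tag = tag.strip().lower()
            if tag not in ("positive", "negative", "neutral"):
                continue  # ignore any label outside the three classes
            row: Dict[str, Union[str, int]] = {"sentence": sentence}
            for label in LABELS:
                # 1 in the column matching the tag, 0 in the other two
                row[label] = 1 if tag == label.lower() else 0
            rows.append(row)
    return rows
```

For example, `merge_and_encode([parse_tsv(open(path)) for path in paths])` would merge every file in a hypothetical `paths` list into one list of encoded rows, which can then be written out as the single train file.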

Test Dataset (News Crawler)
To prepare the test dataset, I built a web crawler using the Python library BeautifulSoup [4], which parses HTML and XML documents. The crawler collects all the articles on a news website and tokenizes them into sentences. It also applies preprocessing steps that produce clean sentences. These sentences are compiled into a clean test dataset, which is fed into the sentiment analyser; the output is a predicted probability score for each sentence.
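A dependency-free sketch of the extraction, sentence-splitting and cleaning steps is shown below. It swaps BeautifulSoup for Python's built-in `html.parser` so the example runs as-is; with BeautifulSoup, the equivalent extraction would be `[p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p")]`. It assumes article text lives in `<p>` tags, uses a simple regex sentence splitter rather than a trained tokenizer, and drops fragments shorter than three words. These choices and all names are illustrative assumptions, not the author's preprocessing; fetching pages and following links to other articles are left out.

```python
import re
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> tags, ignoring <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_p = False
        self._skip = 0  # depth of open <script>/<style> tags
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text from nested inline tags (e.g. <a>) inside a <p> is kept too.
        if self._in_p and not self._skip:
            self._buf.append(data)


# Split after ., ! or ? when followed by whitespace and an uppercase letter or quote.
_SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def clean(sentence: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", sentence).strip()


def article_to_sentences(html: str, min_words: int = 3) -> List[str]:
    """Return the cleaned sentences of one article page, in document order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    sentences: List[str] = []
    for para in parser.paragraphs:
        for piece in _SENT_END.split(para):
            piece = clean(piece)
            if len(piece.split()) >= min_words:  # assumed threshold for fragments
                sentences.append(piece)
    return sentences
```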
Once the train and test datasets have been prepared from SemEval-2016 and with BeautifulSoup respectively, GloVe embeddings [5] are used as the word vector representation, taken from the GloVe 6B release (6B training tokens, a 400K-word uncased vocabulary, available as 50d, 100d, 200d and 300d vectors). Each article in the dataset is represented by all of its sentences in order, followed by the whole article itself. Including the whole document alongside its sentences lets us predict document-level probabilistic scores, which help us detect the intent of the news source.
After preprocessing, an embedding matrix is built and the dataset is split into training and validation sets. The data is then fed into a network that combines an LSTM with an attention network. The summary of the model is shown in the following figure.
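The paper does not state which attention variant or framework the model uses, so the sketch below rests on assumptions: it loads GloVe vectors from their text format, builds an embedding matrix, and applies additive (Bahdanau-style) attention pooling to a sequence of hidden states, followed by a dense softmax layer over the three classes. The LSTM itself is not reimplemented; `hidden` stands in for its per-timestep outputs, and the weights `W`, `b`, `v`, `W_out` and `b_out` would be learned during training. It is written in plain Python to show the arithmetic, not as a training-ready implementation, and all names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def load_glove(lines: Iterable[str], dim: int) -> Dict[str, Vector]:
    """Parse GloVe text-format lines ('word v1 v2 ... v_dim') into a dict."""
    vectors: Dict[str, Vector] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines or vectors of a different size
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int], glove: Dict[str, Vector], dim: int
) -> List[Vector]:
    """Row i holds the GloVe vector of the word with index i (indices assumed 1..N).
    Row 0 is reserved for padding; words missing from GloVe stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in glove:
            matrix[idx] = list(glove[word])
    return matrix


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Vector], W: Sequence[Vector], b: Vector, v: Vector
) -> Tuple[Vector, Vector]:
    """Additive attention over hidden states h_t:
    score_t = v . tanh(W h_t + b), alpha = softmax(scores), context = sum_t alpha_t h_t."""
    scores = []
    for h in hidden:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias) for row, bias in zip(W, b)]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    alpha = softmax(scores)
    context = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(len(hidden[0]))]
    return context, alpha


def classify(context: Vector, W_out: Sequence[Vector], b_out: Vector) -> Vector:
    """Dense layer followed by softmax over (Positive, Negative, Neutral)."""
    logits = [sum(w * c for w, c in zip(row, context)) + bias for row, bias in zip(W_out, b_out)]
    return softmax(logits)
```

In a framework such as Keras, the embedding matrix would typically initialize an `Embedding` layer, and the attention pooling would sit between an LSTM created with `return_sequences=True` and the output layer.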

Sentiment Predictions
The final predictions consist of news sentiment scores for the Positive, Negative and Neutral classes, and the highest of the three determines the predicted sentiment. Scores are produced at both sentence and document level: every line except the last corresponds to a sentence from the news article, and the last line holds the document-level scores. The predictions are shown in the following figure. As depicted in the figure, the last line of the result, the document-level prediction for the news article, shows that the article has a neutral sentiment.
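The way rows are labelled, and the last row read as the document-level result, can be sketched as follows. This assumes, as described in the dataset section, that the input for one article is its sentences in order followed by the whole article, and that the model returns one (Positive, Negative, Neutral) score triple per row; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

LABELS = ("Positive", "Negative", "Neutral")


def build_inputs(sentences: Sequence[str]) -> List[str]:
    """Sentences in article order, followed by the whole article as the final row."""
    return list(sentences) + [" ".join(sentences)]


def label_rows(
    texts: Sequence[str], scores: Sequence[Sequence[float]]
) -> List[Tuple[str, str]]:
    """Assign each row the label with the highest score (the first label wins ties)."""
    if len(texts) != len(scores):
        raise ValueError("need exactly one score triple per row")
    labelled = []
    for text, row in zip(texts, scores):
        best = max(range(len(LABELS)), key=lambda i: row[i])
        labelled.append((text, LABELS[best]))
    return labelled


def split_levels(
    labelled: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Tuple[str, str]]:
    """Separate the sentence-level rows from the final document-level row."""
    return labelled[:-1], labelled[-1]
```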
The code is available on request by email to the address given.

Future Work
This sentiment analyser can be applied to other tasks, such as detecting people's sentiments during a pandemic like COVID-19, and it could be adapted further to detect fake news on platforms such as Twitter and Facebook. Other deep learning models, such as BERT and XLNet, could be used to further increase the model's accuracy.