Influence of Rumours on Stock Market

Motivation:
About two years ago, while I was scrolling through my Instagram feed, I came across a video in which Cristiano Ronaldo moved some Coca-Cola bottles aside and told people to drink water. Shortly afterwards, Coca-Cola's share price dropped noticeably. The guy seemingly moved millions in Coca-Cola's market value single-handedly.
This example suggests that influential people can have a huge impact on stock prices, so one might devise a strategy to buy or sell stocks on the basis of such rumours and make a profit in the stock market (maybe not millions, but a significant amount).
Of course, the effect of market rumours on stock prices depends on various factors, typically including the origin of the rumour, how far it spreads, and the extent to which people can act on it. If it were me instead of Ronaldo, I can say for a fact that the price would not have moved at all. So the effect also depends on who started the rumour. There may be even more such factors to consider.
Hence, in this project, I attempt to build an API that can predict the stock prices of several companies, to a certain level of accuracy, on the basis of rumours surrounding each company and/or the market.





Learning Potential:
Off the top of my head, I believe that to develop such a project, we need a grasp of sentiment analysis, LSTM models, Bayesian inference, various other machine-learning models, and a bit of NLP.
Finally, we must know how to integrate these models and bring them together in an API.
Hence, I will learn some fundamental concepts of NLP through this project. I will also learn data-scraping techniques and how to structure the data to train and test my models. Additionally, I will gain hands-on experience in building an application programming interface.

Edit:
Why stop here? I feel we can do much more than this. Introducing various strategy choices and hypothesis-testing tools, along with real stock-price indicators, alongside our market-rumour objective has the potential to turn this into something useful.
For that, we need to learn some more theory about options and futures (JC Hull ftw), as well as work through a condensed book on algorithmic trading strategies. Now, we are cooking something delicious :D

Edit (03.10.2023):
There should be some gap between what I am trying to build and what is already available.
Before jumping in to incorporate classic backtesting and game-theoretic approaches into our application, I want to first tame our original wild beast, i.e., the sentiment-analysis part. I also want to make that model a bit dynamic. To be clear, I want users to decide for themselves how much control sentiment analysis has over their stock trading.
Backtesting libraries are already available. With some modification and tweaking, it should be possible to incorporate them into our application later on. The same holds for game-theoretic approaches, though the process of innovating on the idea is a bit more cumbersome. Anyway, my point is: "Let's do sentiment analysis first."
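A minimal sketch of such a user-adjustable control, assuming both the sentiment score and some price-based signal (say, a momentum indicator) are already normalised to [-1, 1]. The function name and the `sentiment_weight` and `threshold` parameters are hypothetical, chosen only for illustration:

```python
def trading_signal(price_signal: float, sentiment: float,
                   sentiment_weight: float = 0.5,
                   threshold: float = 0.2) -> str:
    """Blend a price-based signal with a sentiment score into BUY/SELL/HOLD.

    Hypothetical parameters (assumptions, not from any library):
    - price_signal, sentiment: scores assumed to lie in [-1, 1]
    - sentiment_weight: user-chosen share of the decision given to sentiment
      (0 = ignore sentiment, 1 = trade on sentiment alone)
    - threshold: minimum absolute blended score required before acting
    """
    if not 0.0 <= sentiment_weight <= 1.0:
        raise ValueError("sentiment_weight must be in [0, 1]")
    # Convex combination of the two signals, controlled by the user.
    score = (1 - sentiment_weight) * price_signal + sentiment_weight * sentiment
    if score > threshold:
        return "BUY"
    if score < -threshold:
        return "SELL"
    return "HOLD"


# Same market data, two different users:
print(trading_signal(0.1, -0.9, sentiment_weight=0.0))  # HOLD: sentiment ignored
print(trading_signal(0.1, -0.9, sentiment_weight=0.8))  # SELL: sentiment dominates
```

Setting `sentiment_weight` to 0 reduces this to a purely price-driven rule, while 1 trades on sentiment alone; that is the dial I would like the user to control.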

Exploring GitHub, I found some sentiment-analysis strategies applied to stocks here. This is a simple application of sentiment analysis performed on texts and stock rumours scraped from Facebook News articles.
This is a good start. But to make something more extensive, I need to scrape more websites and articles, even some unorthodox related news (remember the Ronaldo example).
So, obviously, a readily available sentiment analyzer will not work. That leaves us with a few problems at hand:

Qualitative:
1. Which sites should we scrape the text from?
2. How do we scrape it from those sites?
3. How much text should we scrape from those sites?
4. Which sites should we scrape the unorthodox data from?
5. How do we scrape it from those sites?
6. How much unorthodox news text should we scrape?

Quantitative:
1. How do we train the sentiment analyzer to extract sentiment from unorthodox news?
2. How influential is a piece of unorthodox news, i.e., how much weight should be given to the sentiment extracted from it?
3. The potential of unorthodox news depends on how trending or well-known it is among the general public. How do we make the model aware of how famous that news is?
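To see why Quantitative problem 1 is real, consider a toy word-list scorer standing in for a generic, off-the-shelf analyzer (the lexicon and both headlines below are made up for illustration, not taken from any library or real news feed). The Ronaldo-style sentence contains no sentiment-bearing words at all, so the scorer calls it neutral even though its market signal was negative:

```python
import re

# Hypothetical toy lexicon standing in for a generic, off-the-shelf analyzer.
LEXICON = {"great": 1, "good": 1, "profit": 1, "love": 1,
           "bad": -1, "loss": -1, "fraud": -1, "lawsuit": -1}


def lexicon_score(text: str) -> float:
    """Average polarity of the lexicon words found in text (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


classical = "Coca-Cola reports a loss amid lawsuit"
unorthodox = "Ronaldo moves Coca-Cola bottles aside and tells fans to drink water"
print(lexicon_score(classical))   # -1.0
print(lexicon_score(unorthodox))  # 0.0, i.e. looks neutral to the lexicon
```

The negative signal lives in the action (swapping the brand for water) and in who performed it, which is what a separate analyzer for unorthodox news would have to learn.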

Ideas so far:
1. For the qualitative part, I feel Reddit, Google Newsstand, and some business articles available on the web should be a good start.
2. For now, I can distinguish unorthodox news from standard news by checking the frequency of the company name and looking for common words typically found in classical news. Of course, that also means I have to discard irrelevant articles from the bucket of unorthodox news. I do not want an ad like "Hungry for food! Burger King is the way to go!" to boost Burger King's stock.
3. I need to create a separate sentiment analyzer for unorthodox news, and I still need to work out exactly how to perform sentiment analysis on it.
4. For unorthodox news, the weight parameter depends on how famous the news is, i.e., directly on its popularity among the general public. So our model needs a popularity metric as well.
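A rough sketch of the heuristic in idea 2, assuming plain-text articles and a known company name. The function name, both word lists, and the ratio thresholds are hypothetical placeholders; in practice they would be derived from the scraped corpus:

```python
import re

# Hypothetical word lists; real ones would be built from the scraped corpus.
CLASSICAL_TERMS = {"earnings", "revenue", "shares", "quarter", "dividend",
                   "analysts", "guidance", "profit", "stock"}
AD_TERMS = {"buy", "order", "offer", "discount", "deal", "hungry",
            "delicious", "now"}


def classify_article(text: str, company: str,
                     min_mentions: int = 1,
                     classical_ratio: float = 0.03,
                     ad_ratio: float = 0.1) -> str:
    """Label text as 'irrelevant', 'ad', 'classical' or 'unorthodox'.

    Thresholds are illustrative assumptions, not tuned values.
    """
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return "irrelevant"
    # Frequency of the company name (whole phrase, case-insensitive).
    mentions = len(re.findall(re.escape(company.lower()), lowered))
    if mentions < min_mentions:
        return "irrelevant"
    n = len(words)
    classical = sum(w in CLASSICAL_TERMS for w in words) / n
    ad = sum(w in AD_TERMS for w in words) / n
    # Filter ads first so promotional text never counts as news.
    if ad >= ad_ratio:
        return "ad"
    if classical >= classical_ratio:
        return "classical"
    return "unorthodox"


print(classify_article("Hungry for food! Burger King is the way to go!", "Burger King"))  # ad
print(classify_article("Burger King parent reports quarter revenue up; "
                       "shares rise as analysts cheer", "Burger King"))           # classical
print(classify_article("Celebrity chef says Burger King fries made him sick "
                       "on a live stream", "Burger King"))                        # unorthodox
```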

Edit (18.10.2023):
The model would classify data into two categories:
I.  Primary Classical data
Data Collection: Business articles / Newspaper articles (Times of India)
II. Secondary Influential data
Data Collection: (Unrestricted) Social-Media (Reddit and Twitter)

Initial Approach:
-> Extract sentiment from Primary Data using convolutional sentiment analysis or Transformers with pre-trained word embeddings.
-> Web-scrape content about the specific company from social-media sites using specific keywords, including
a. the number of followers of the user/corporation that posted the text.
b. the usual extent and direction of influence the user/corporation has on the public. This can be estimated by scraping the retweets/comments and disambiguating their sentiment: the number of retweets/comments indicates the extent, and their sentiment indicates the direction of influence.
Finally, take a weighted average of the sentiments extracted from all sources, weighted on the basis of these "influence-extracted sentiments", to obtain the final sentiment from Secondary Data.
-> Take a weighted average of both extracted sentiments (Primary and Secondary) to generate a final sentiment.
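A sketch of the aggregation steps above, under a few labelled assumptions: each social-media post carries the follower count from (a), plus the number of retweets/comments (extent) and the mean sentiment of those retweets/comments (direction) from (b); a post's influence weight is taken as log(1 + followers) times log(1 + engagement), where the logarithms are my assumption to stop one celebrity account from drowning out everything else; and `alpha`, the weight on Primary Data, is a hypothetical tuning parameter. `SocialPost` and the function names are illustrative only:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SocialPost:
    followers: int             # (a) followers of the posting user/corporation
    engagement: int            # (b) retweets + comments: the extent of influence
    reaction_sentiment: float  # (b) mean sentiment of those retweets/comments in [-1, 1]: the direction


def influence_weight(post: SocialPost) -> float:
    # Assumption: log scaling dampens very large follower/engagement counts.
    return math.log1p(post.followers) * math.log1p(post.engagement)


def secondary_sentiment(posts: List[SocialPost]) -> float:
    """Influence-weighted average of reaction sentiments (0.0 if nothing has influence)."""
    weights = [influence_weight(p) for p in posts]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * p.reaction_sentiment for w, p in zip(weights, posts)) / total


def final_sentiment(primary: float, secondary: float, alpha: float = 0.6) -> float:
    """Weighted average of Primary and Secondary sentiment; alpha is a hypothetical weight."""
    return alpha * primary + (1 - alpha) * secondary


posts = [
    SocialPost(followers=500_000_000, engagement=2_000_000, reaction_sentiment=-0.7),  # celebrity
    SocialPost(followers=300, engagement=2, reaction_sentiment=0.9),                   # ordinary user
]
secondary = secondary_sentiment(posts)
overall = final_sentiment(primary=0.2, secondary=secondary)
print(f"secondary: {secondary:.3f}, final: {overall:.3f}")
```

Here the celebrity's negative reaction dominates the Secondary score, and it is strong enough to pull the final sentiment below zero even though the classical news is mildly positive.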

Modified Approach:
Instead of working on all three of these features separately, an alternative approach is to model them jointly with a neural network.
Assuming Primary and Secondary Data are independent of each other, we can train a simple feedforward network to generate the final sentiment.
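A minimal pure-Python sketch of this Modified Approach: a feedforward network with one tanh hidden layer that maps (primary sentiment, secondary sentiment) to a final sentiment in [-1, 1], trained by stochastic gradient descent on squared error. What the training targets should be is still an open question; here they are synthetic values from an assumed rule, purely so the example runs (in practice they might come from, e.g., subsequent price moves, which is also an assumption). `TinyFeedforward` is a hypothetical name:

```python
import math
import random


class TinyFeedforward:
    """2 inputs (primary, secondary sentiment) -> `hidden` tanh units -> 1 tanh output."""

    def __init__(self, hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = math.tanh(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.05):
        """One SGD step on the loss 0.5 * (y - target)**2; returns that loss."""
        h, y = self.forward(x)
        dz2 = (y - target) * (1 - y * y)             # gradient at the output pre-activation
        for j, hj in enumerate(h):
            dz1 = dz2 * self.w2[j] * (1 - hj * hj)   # uses w2[j] before it is updated
            self.w2[j] -= lr * dz2 * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dz1 * xi
            self.b1[j] -= lr * dz1
        self.b2 -= lr * dz2
        return 0.5 * (y - target) ** 2


# Synthetic data: the "true" final sentiment follows an assumed rule, for illustration only.
data_rng = random.Random(1)
data = []
for _ in range(200):
    p, s = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append(((p, s), math.tanh(0.8 * p + 0.4 * s)))


def mean_loss(model, samples):
    return sum(0.5 * (model.predict(x) - t) ** 2 for x, t in samples) / len(samples)


model = TinyFeedforward()
before = mean_loss(model, data)
for epoch in range(50):
    for x, t in data:
        model.train_step(x, t)
after = mean_loss(model, data)
print(f"mean loss before: {before:.4f}, after: {after:.4f}")
```

For real data, a library implementation such as scikit-learn's `MLPRegressor` would replace the hand-written training loop; the hand-rolled version only shows what the network computes.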

Alright, makes sense (at least to me hehe).



Edit (06.12.2023):
Showing typical erratic ENTP behaviour, I went on a hiatus from this project. I had some personal reasons hehe :)
But I did find many language models that perform some kind of analysis on financial data, such as BloombergGPT, FinGPT, FinBERT, and more. What I really want to know is whether there has been any attempt at analysing Annual Reports (a niche but important concern) and at sentiment analysis of Reddit or similar sites (there is a high chance I will find the latter in those advanced models).
This is my work for today: to find out whether what I am doing is actually different from these models and, if it is, how and what exactly is different about my project. (I might need to scrap my original approach.)
