GameStop Stock (GME) and Reddit Datasets

Wow! What a week it’s been. If you’ve been following the GameStop (GME) stock price in 2021, you’ll know that GME has risen more than 1,500% at the time of writing. Several factors have contributed to this meteoric growth. As an inexperienced investor, I won’t try to explain the market forces at play. But as an avid reader of Reddit, I have been closely following a situation that is still evolving and has captured the world’s attention. I recommend Googling GME stock and the subreddit named “WallStreetBets” (WSB) if you want to understand what’s going on.

For this month’s competition, we have a couple of interesting datasets. First, we have the GME historical stock price, updated through today. You’ll be able to visualize the growth of GME from its IPO until today (this dataset will be updated continually as new data is received). Second, we have a dataset containing each Reddit post from the WSB subreddit mentioning GME over the past month.

Will these datasets allow us to uncover some truth behind the situation or show any correlation at all? I’m not entirely sure. I’ll leave that up to you.

Dataset #1: GME Historical Stock Price

Columns:

  1. Date: The date when the prices were recorded.
  2. Open: The stock price at market open.
  3. Close: The stock price at market close.
  4. Low: The lowest stock price recorded during the trading period.
  5. High: The highest stock price recorded during the trading period.
  6. Volume: The total number of shares traded during the trading period.
  7. Dividends: Dividends paid to shareholders in shares.
  8. Stock Splits: The ratio in which existing shares were split into a larger number of shares, typically to increase liquidity.
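
If you want a quick feel for these columns before building visuals, here is a minimal Python sketch. It assumes the file is a plain .csv with the column names listed above. The sample rows and the helper names (`load_prices`, `percent_change`) are made up for illustration and are not real GME prices or part of the dataset.

```python
import csv
import io

# Hypothetical sample rows that reuse the column names listed above.
# The numbers are invented for illustration; they are NOT real GME data.
SAMPLE_CSV = """Date,Open,Close,Low,High,Volume,Dividends,Stock Splits
2021-01-04,10.0,11.0,9.5,11.5,1000,0,0
2021-01-05,11.0,22.0,10.5,23.0,3000,0,0
"""


def load_prices(text):
    """Parse CSV text into a list of dicts with numeric price fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": row["Date"],
            "open": float(row["Open"]),
            "close": float(row["Close"]),
            "low": float(row["Low"]),
            "high": float(row["High"]),
            "volume": int(float(row["Volume"])),
        })
    return rows


def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


prices = load_prices(SAMPLE_CSV)

# Growth between the first and last closing price in the sample.
growth = percent_change(prices[0]["close"], prices[-1]["close"])

# Intraday range (High minus Low) for each trading period.
ranges = [p["high"] - p["low"] for p in prices]

print(f"Close-to-close growth: {growth:.1f}%")
print("Intraday ranges:", ranges)
```

To try this on the real file, you would read the downloaded .csv from disk instead of `SAMPLE_CSV`; the parsing step should stay the same as long as the column names match.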

This dataset was provided on Kaggle, link here:

Dataset #2: WallStreetBets Reddit Posts

WARNING! Please keep in mind that Reddit is an online forum. The WallStreetBets subreddit, in particular, uses pretty offensive language in many posts and comments.

Updated dataset! This .csv now contains post history going back to 2012. Total size: 169MB.

Columns:

  1. id: The Reddit post ID.
  2. title: The title of the Reddit post.
  3. score: The score of the Reddit post (upvotes – downvotes).
  4. author: Author of the post.
  5. author_flair_text: The author’s flair (a self-chosen tag), if any.
  6. removed_by: Describes who the post was removed by, if anyone.
  7. total_awards: The number of awards the post received.
  8. awarders: Who provided the awards.
  9. created_utc: The UTC timestamp when the post was created. Check out this video if you need to convert timestamps to datetime in Power BI.
  10. full_link: The url of the Reddit post.
  11. num_comments: The total number of comments for the post.
  12. over_18: Whether the post is flagged as over 18 (true or false).
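
If you are working outside Power BI, the created_utc conversion can be sketched in a few lines of Python. This assumes created_utc holds Unix epoch seconds, which is how Reddit timestamps are commonly exported but is worth checking against the actual file. The sample rows and the helper name `to_utc_datetime` are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample rows using a few of the columns listed above.
# ASSUMPTION: created_utc is a Unix timestamp in seconds.
SAMPLE_POSTS = """id,title,score,created_utc
a1,Example post one,10,1611878400
a2,Example post two,5,1611882000
a3,Example post three,7,1611964800
"""


def to_utc_datetime(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


# Count posts per calendar day (UTC), keyed by ISO date string.
posts_per_day = Counter()
for row in csv.DictReader(io.StringIO(SAMPLE_POSTS)):
    created = to_utc_datetime(row["created_utc"])
    posts_per_day[created.date().isoformat()] += 1

for day, count in sorted(posts_per_day.items()):
    print(day, count)
```

Daily post counts keyed by date like this could then be lined up against the Date column of the stock price dataset, which is one possible starting point for looking at any relationship between the two.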

This dataset was provided on Kaggle, link here: