Michael Galarnyk*,
Veer Kejriwal*,
Agam Shah*,
Yash Bhardwaj,
Nicholas Watney Meyer,
Anand Krishnan,
Sudheer Chava
Georgia Institute of Technology
*Authors contributed equally
Social media has amplified the reach of financial influencers known as "finfluencers," who share stock recommendations on platforms like YouTube. Understanding their influence requires analyzing multimodal signals such as tone, delivery style, and facial expressions, which extend beyond text-based financial analysis. We introduce VideoConviction, a multimodal dataset with 6,000+ expert annotations, produced through 457 hours of human effort, to benchmark multimodal large language models (MLLMs) and text-based large language models (LLMs) in financial discourse. Our results show that while multimodal inputs improve stock ticker extraction (e.g., extracting Apple's ticker AAPL), both MLLMs and LLMs struggle to distinguish investment actions and conviction (the strength of belief conveyed through confident delivery and detailed reasoning), often misclassifying general commentary as definitive recommendations. While high-conviction recommendations perform better than low-conviction ones, they still underperform the popular S&P 500 index fund. An inverse strategy of betting against finfluencer recommendations outperforms the S&P 500 by 6.8% in annual returns but carries greater risk (Sharpe ratio of 0.41 vs. 0.65). Our benchmark enables a diverse evaluation of multimodal tasks, comparing model performance on both full video and segmented video inputs, supporting further advances in multimodal financial research.
This repository contains the code and data pipelines used in our research on multimodal financial recommendations from YouTube content. It is organized into five primary subdirectories:
VideoConviction
├── back_testing/
├── data_analysis/
├── process_annotations_pipeline/
├── prompting/
├── youtube_data_pipeline/
├── .gitignore
├── LICENSE
└── README.md
Below is a high-level summary of each subdirectory’s purpose. Please refer to each subdirectory’s README.md
for detailed instructions, usage examples, and more specific documentation.
back_testing/
Purpose:
Implements a comprehensive framework for backtesting equity trading strategies derived from finfluencer recommendations.
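For orientation, the sketch below shows the general shape of such a holding-period backtest. It is not the repository's actual implementation: the column names (`ticker`, `recommendation_date`, `action`), the fixed holding window, and the zero risk-free-rate Sharpe ratio are illustrative assumptions. See `back_testing/README.md` for the real strategies and metrics.

```python
# Minimal sketch of a holding-period backtest over finfluencer recommendations.
# NOT the repository's actual code: column names and the zero risk-free rate
# are illustrative assumptions.
import numpy as np
import pandas as pd

def holding_period_returns(recs: pd.DataFrame, prices: pd.DataFrame,
                           holding_days: int = 30, invert: bool = False) -> pd.Series:
    """Per-recommendation returns over a fixed holding window.

    `recs` is assumed to carry 'ticker', 'recommendation_date', and 'action'
    ('buy'/'sell'); `prices` is a date-indexed DataFrame of daily closes,
    one column per ticker.
    """
    out = []
    for _, rec in recs.iterrows():
        window = prices[rec["ticker"]].loc[rec["recommendation_date"]:].iloc[:holding_days]
        if len(window) < 2:
            continue
        ret = window.iloc[-1] / window.iloc[0] - 1   # long return over the window
        if rec["action"] == "sell":
            ret = -ret                               # short the ticker on sell calls
        out.append(-ret if invert else ret)          # invert=True bets against the call
    return pd.Series(out)

def annualized_sharpe(returns: pd.Series, periods_per_year: int = 12) -> float:
    """Sharpe ratio with the risk-free rate assumed to be zero."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()
```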
data_analysis/
Purpose:
Houses Jupyter notebooks for exploratory data analysis (EDA) on the expert-annotated dataset.
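A quick first look at the merged dataset might resemble the snippet below; the column names (`ticker`, `conviction`) are assumptions for illustration, so check the notebooks in `data_analysis/` for the actual schema.

```python
# Quick EDA sketch (illustrative column names; see data_analysis/ for the real schema).
import pandas as pd

df = pd.read_csv("complete_dataset.csv")
print(df.shape)                                        # number of annotated records
print(df["ticker"].value_counts().head(10))            # most-recommended tickers (assumed column)
print(df["conviction"].value_counts(normalize=True))   # conviction label distribution (assumed column)
```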
process_annotations_pipeline/
Purpose:
Provides a multi-step pipeline to validate, clean, and merge annotations with video transcripts and metadata.
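The core merge step is conceptually similar to the sketch below. The file names and the `video_id` join key are assumptions; the real pipeline performs additional validation and cleaning documented in its README.

```python
# Illustrative merge of annotations, transcripts, and metadata
# (file names and the 'video_id' join key are assumptions).
import pandas as pd

annotations = pd.read_csv("annotations.csv")
transcripts = pd.read_csv("transcripts.csv")
metadata = pd.read_csv("video_metadata.csv")

merged = (annotations
          .merge(transcripts, on="video_id", how="left")
          .merge(metadata, on="video_id", how="left"))

# Basic sanity check before writing the final dataset.
assert merged["video_id"].notna().all(), "every annotation must map to a video"
merged.to_csv("complete_dataset.csv", index=False)
```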
prompting/
Purpose:
Contains code and notebooks for prompt engineering and model inference with large language models (LLMs) and multimodal large language models (MLLMs).
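As a rough idea of the text-only case, a prompt for extracting the ticker, action, and conviction from a transcript segment could look like the sketch below. The OpenAI client usage and model name are assumptions; the repository's actual prompts, models, and multimodal inference code live in `prompting/`.

```python
# Text-only prompting sketch (client choice and model name are assumptions).
from openai import OpenAI

client = OpenAI()

def extract_recommendation(transcript_segment: str) -> str:
    prompt = (
        "From the following finfluencer transcript segment, extract:\n"
        "1. the stock ticker (e.g., AAPL),\n"
        "2. the recommended action (buy, sell, or hold),\n"
        "3. the conviction level (high or low).\n\n"
        f"Transcript: {transcript_segment}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```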
youtube_data_pipeline/
Purpose:
Implements an end-to-end YouTube data pipeline for collecting video metadata, comments, and transcripts.
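The metadata-collection step could be sketched as below using yt-dlp; this is only an illustration of the kind of fields gathered, and the pipeline's actual tooling and outputs are described in `youtube_data_pipeline/README.md`.

```python
# Sketch of fetching video metadata with yt-dlp (tool choice is an assumption).
from yt_dlp import YoutubeDL

def fetch_metadata(video_url: str) -> dict:
    with YoutubeDL({"skip_download": True, "quiet": True}) as ydl:
        info = ydl.extract_info(video_url, download=False)
    return {
        "video_id": info["id"],
        "title": info["title"],
        "upload_date": info["upload_date"],
        "view_count": info.get("view_count"),
    }
```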
- Clone the Repository
  git clone https://github.com/yourusername/VideoConviction.git
  cd VideoConviction
- Install Dependencies
  - If a subdirectory is .py based, it includes its own environment.yaml file and installation instructions (an install.sh script or steps in its README).
  - If a subdirectory uses .ipynb notebooks, the respective notebooks contain the commands needed to install their dependencies.
- Explore Subdirectories
  - Data Collection: Start with youtube_data_pipeline to collect and transcribe videos.
  - Annotation & Merging: Move to process_annotations_pipeline to generate the final annotated dataset (complete_dataset.csv).
  - Analysis & Modeling: Use data_analysis for EDA, prompting for LLM/MLLM inference, and back_testing to test trading strategies based on the recommendations.
- YouTube Data Collection
  - Gather videos (metadata, comments, transcripts) using youtube_data_pipeline.
- Annotation & Merging
  - Combine and clean annotations, transcripts, and metadata in process_annotations_pipeline.
- Exploratory Data Analysis
  - Perform EDA in data_analysis to understand distributions, correlations, and dataset quality.
- Model Prompting & Inference
  - Generate prompts for text or multimodal LLMs in prompting, run inference, and evaluate performance.
- Backtesting
  - Evaluate different trading strategies on the annotated dataset in back_testing, measuring returns, Sharpe ratio, etc.
The dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA 4.0) license, which allows others to share, copy, distribute, and transmit the work, as well as to adapt the work, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.
@inproceedings{galarnyk2025videoconviction,
author = {Michael Galarnyk and Veer Kejriwal and Agam Shah and Yash Bhardwaj and Nicholas Watney Meyer and Anand Krishnan and Sudheer Chava},
title = {VideoConviction: A Multimodal Benchmark for Human Conviction and Stock Market Recommendations},
booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25)},
year = {2025},
location = {Toronto, ON, Canada},
pages = {12},
publisher = {ACM},
doi = {10.1145/3711896.3737417}
}