A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs
Media Content Atlas (MCA) is a first-of-its-kind pipeline that enables large-scale, AI-driven analysis of digital media experiences using multimodal LLMs. It combines recent advances in machine learning and visualization to support both open-ended and hypothesis-driven research into screen content and behavior.
🔗 Website & Demo: mediacontentatlas.github.io
🎥 Quick Video Explanation: Watch on YouTube
📄 Paper: Preprint
⏩ See the Quickstart Tutorial
📎 Citation: Cerit, M., Zelikman, E., Cho, M., Robinson, T. N., Reeves, B., Ram, N., & Haber, N. (2025). Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25). ACM. https://doi.org/10.1145/3706599.3720055
Built on 1.12 million smartphone screenshots collected from 112 adults over a month, MCA enables researchers to:
- Perform content-based clustering and topic modeling using semantic and visual signals
- Automatically generate descriptions of screen content
- Search and retrieve content across individuals and moments
- Visualize digital media behavior with an interactive dashboard
Expert reviewers rated MCA's clustering results 96% relevant and AI-generated descriptions 83% accurate.
The pipeline is fully modular, with standalone scripts and notebooks for each stage:
1. ⏩ Check out the Quickstart Tutorial on Google Colab, which runs on the free T4 GPU.
Stage | Script | Description |
---|---|---|
🖼️ Embedding | anonymized_clip_embedding_generation.py | Generate visual embeddings using CLIP |
📝 Captioning | anonymized_description_generation.py | Generate descriptions using LLaVA-OneVision |
🔠 Embedding | anonymized_description_embedding_generation.py | Generate sentence embeddings using GTE-Large |
🧵 Clustering | anonymized_clustering_topicmodeling_example.py | Cluster and label screenshots using BERTopic + LLaMA2 |
📊 Visualization | anonymized_create_interactive_visualizations.ipynb | Create an interactive dashboard using DataMapPlot |
🔍 Retrieval | anonymized_image_retrieval_app.py | Retrieve screenshots using visual or textual similarity |
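As a rough illustration of the retrieval stage, the sketch below ranks screenshots by cosine similarity between a query embedding and a set of stored embeddings. It is not the code in anonymized_image_retrieval_app.py: the function names, the screenshot IDs, and the 3-dimensional toy vectors are assumptions made for illustration, and real CLIP or GTE-Large embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, embeddings, k=3):
    """Rank stored embeddings by similarity to the query and keep the top k.

    `embeddings` maps a (hypothetical) screenshot ID to its vector.
    """
    scored = [(sid, cosine_similarity(query, vec)) for sid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for precomputed screenshot embeddings (assumed data).
toy_embeddings = {
    "screenshot_a": [1.0, 0.0, 0.0],
    "screenshot_b": [0.9, 0.1, 0.0],
    "screenshot_c": [0.0, 1.0, 0.0],
}

# The query would normally be the embedding of a search text or a query image.
results = retrieve_top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
for screenshot_id, score in results:
    print(screenshot_id, round(score, 3))
```

The same ranking applies whether the stored vectors come from the visual embedding stage or from the description embedding stage, as long as the query is embedded with the same model as the stored items.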
File | Description |
---|---|
anonymized_survey1.py | Survey for cluster label relevance |
anonymized_survey2.py | Survey for description accuracy |
anonymized_survey3.py | Survey for retrieval performance |
We’d love to hear from you! Feel free to:
- 💬 Open an issue for bugs, suggestions, or feature requests
- 📬 Email us: [email protected]
- 🌐 Explore the lite demo: mediacontentatlas.github.io
Here’s what’s next for MCA. Let us know if you'd like to collaborate:
- 🔁 Reproducibility updates for easier setup
- 🧩 Customization utilities (label editing, filters, user tagging)
- 📈 Longitudinal visualizations to explore media patterns over time

Stay tuned! ⭐ Star this repo to keep up with updates.
If you use MCA in your research, please cite the CHI 2025 paper:
@inproceedings{cerit2025mca,
author = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
title = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
year = {2025},
month = {April},
location = {Yokohama, Japan},
publisher = {ACM},
address = {New York, NY, USA},
pages = {19},
doi = {10.1145/3706599.3720055}
}