A Python script that performs sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish Sentiment Analysis Datasets from 2012 to 2022.

sevvalckc/Turkish-SAD

Turkish-SAD

This repository provides a comprehensive collection of Turkish Sentiment Analysis Datasets from 2012 to 2025, covering diverse domains such as social media, e-commerce, news, political commentary, and more. It includes access links for publicly available datasets, contact information for restricted datasets, and detailed reuse references. Additionally, the repository provides a Python script for sentiment analysis using pre-trained transformer models.

Turkish Sentiment Analysis Datasets

To build this repository, we systematically reviewed academic studies indexed in Scopus and other scholarly databases. The search focused on publications that applied sentiment analysis using Turkish-language data or introduced sentiment-labeled Turkish datasets. Inclusion criteria required that papers either:

  • Used classification models on labeled Turkish sentiment datasets and reported results, or
  • Contributed novel Turkish datasets suitable for future modeling.

Search Details:

  • Query: 'sentiment analysis' AND 'Turkish dataset'
  • Databases: Scopus
  • Document Types: Conference papers, journal articles, book chapters
  • Date Range: 2012–2025
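
For reproducibility, the search above can be written as a Scopus advanced-search string. The sketch below is an assumption, not the query the authors actually ran: it places both phrases in the TITLE-ABS-KEY field (the README does not say which fields were searched) and maps the listed document types to the Scopus DOCTYPE codes cp, ar, and ch. `build_scopus_query` is a hypothetical helper.

```python
# Hypothetical helper: rebuilds the search criteria above as a Scopus
# advanced-search string. Using TITLE-ABS-KEY is an assumption.
DOC_TYPE_CODES = {
    "Conference papers": "cp",
    "Journal articles": "ar",
    "Book chapters": "ch",
}


def build_scopus_query(terms, first_year, last_year, doc_types):
    """Return a Scopus advanced-search string for the given criteria."""
    keyword_part = " AND ".join(f'"{term}"' for term in terms)
    # Strict comparisons, widened by one year so both end years are included.
    year_part = f"PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}"
    type_part = " OR ".join(f"DOCTYPE({DOC_TYPE_CODES[d]})" for d in doc_types)
    return f"TITLE-ABS-KEY({keyword_part}) AND {year_part} AND ({type_part})"


query = build_scopus_query(
    terms=["sentiment analysis", "Turkish dataset"],
    first_year=2012,
    last_year=2025,
    doc_types=["Conference papers", "Journal articles", "Book chapters"],
)
print(query)
```

The result can be pasted into Scopus advanced search. Adjusting the field code (for example to search titles only) changes which studies are retrieved.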

The final collection includes 78 studies and over 80 datasets. Among these:

  • More than 30 datasets are publicly available and linked,
  • Others are listed with author contacts for access,
  • Reused datasets are referenced with their original sources.

The repository provides:

  • Links to publicly available datasets
  • Contact information for datasets that are not openly accessible
  • Reuse citations for datasets previously published or used in multiple studies

Contents

  1. List of Datasets
  2. Usage
  3. Requirements
  4. Pre-trained Models
  5. Using Google Colab

List of Datasets

| Author(s), Year | Dataset Name | Source | Availability | Contact |
|---|---|---|---|---|
| Demirtas & Pechenizkiy, 2013 | Turkish Movie Reviews, Turkish Multidomain Product Reviews | Beyazperde.com, Hepsiburada.com | Public | [email protected], [email protected] |
| Cetin et al., 2013 | Telecom Dataset A & B | Twitter | Not Available | [email protected], [email protected] |
| Isguder & Sahin, 2014 | Ekşi Sözlük Technology Brand Comments Dataset | Ekşi Sözlük | Not Available | [email protected], [email protected], [email protected] |
| Turkmenoglu & Cadırci, 2014 | Twitter Dataset, Movie Dataset | Twitter, Beyazperde.com | Not Available | [email protected], [email protected] |
| Coban et al., 2015 | Twt | Twitter | Not Available | [email protected], [email protected], [email protected] |
| Ekinci & Güler, 2016 | Turkcell Twitter Dataset, TTNet Twitter Dataset | Twitter | Not Available | [email protected], [email protected], [email protected] |
| Ogul, 2016 | Hotel Reviews Dataset | Booking.com, Tripadvisor.com | Not Available | [email protected], [email protected] |
| Parlar, 2016 | Turkish Multidomain Product Reviews (reused from Demirtas & Pechenizkiy, 2013) | Hepsiburada.com | Public | [email protected], [email protected] |
| Ucan et al., 2016 | Movie Review, Hotel Review | Beyazperde.com, Otelpuan.com | Public | [email protected], [email protected], [email protected], [email protected] |
| Ayata et al., 2017 | Retail, Telecom, Football, and Banking Tweets | Twitter | Public | [email protected], [email protected] |
| Parlar, Saraç & Özel, 2017 | Turkish Twitter Dataset (reused from Çetin & Amasyalı, 2013) | Twitter | Not Available | [email protected], [email protected], [email protected] |
| Hayran & Sert, 2017 | Turkish Sentiment Dataset | Twitter | Public | [email protected], [email protected] |
| Omurca, Ekinci & Türkmen, 2017 | Turkish Hotel Review Dataset (annotated) | Otelpuan.com | Public | [email protected], [email protected], [email protected] |
| Mulki, Haddad, Ali & Babaoğlu, 2018 | Turkish Movie & Multidomain Product Reviews (reused from Demirtas & Pechenizkiy, 2013) | Hepsiburada.com | Public | [email protected], [email protected], [email protected], [email protected] |
| Yüksel & Tan, 2018 | Foursquare Venue and Comments Data | Foursquare | Public | [email protected] |
| Ay Karakuş, Talo, Hallaç & Aydın, 2018 | Turkish Movie Reviews Dataset | Beyazperde.com | Public | [email protected], [email protected], [email protected], [email protected] |
| Yurtalan, Koyuncu & Turhan, 2019 | Turkish Twitter Dataset | Twitter | Not Available | [email protected] |
| Amasyalı, Taşköprü & Çalışkan, 2018 | Turkish Telecom Twitter Dataset | Twitter | Public | [email protected], [email protected], [email protected] |
| Çiftçi & Apaydın, 2019 | Turkish Product & Movie Reviews Dataset | Hepsiburada.com, Beyazperde.com | Not Available | [email protected], [email protected] |
| Çoban & Özyer, 2018 | VS1 - 3000 Turkish Tweets, VS2 - reused from Hayran & Sert (2017) | Twitter | Public | [email protected], [email protected] |
| Oğul & Güran, 2019 | VS1 - SemEval-2017 Task 4, VS2 - reused from Amasyalı et al. (2018), VS3 - CrowdFlower Airline Dataset | Twitter | Public | [email protected], [email protected] |
| Uslu, Tekin & Aytekin, 2019 | VS1 - YTÜ/Kemik Dataset (reused from Amasyalı et al., 2018), VS2 - Movie Comments, VS3 - Movie Comments, VS4 - Movie Comments | Beyazperde.com | Not Available | [email protected], [email protected], [email protected] |
| Akın & Yıldız, 2019 | VS1 - Restaurant Reviews, VS2 - Product Reviews | Comment | VS1: Not Available, VS2: Publicly Available | [email protected], [email protected] |
| Santur, 2019 | Turkish Movie Sentiment Analysis Dataset | E-commerce Reviews | Public | [email protected] |
| Rumelli et al., 2019 | Hepsiburada Product Reviews | E-commerce Reviews | Not Available | [email protected], [email protected], [email protected], [email protected] |
| Shehu et al., 2019 | Turkish Twitter Dataset | Tweets | Not Available | [email protected], [email protected], [email protected], [email protected] |
| Erşahin et al., 2019 | VS1 - Movie Review (reused from Uçan et al., 2016), VS2 - Hotel Review (reused from Uçan et al., 2016), VS3 - Twitter Dataset (reused from Amasyalı et al., 2018) | Comment & Tweet | Public | [email protected], [email protected], [email protected], [email protected] |
| Karamollaoğlu et al., 2019 | Dataset | Twitter | Not Public | [email protected], [email protected], [email protected], [email protected] |
| Bayraktar, Yavuoğlu & Özbilen, 2019 | SemEval 2016 ABSA Turkish Restaurant Dataset (reused from Pontiki et al., 2016) | Restaurant reviews (SemEval) | Public | [email protected] |
| Demirci, Keskin & Doğan, 2019 | Twitter Dataset | Twitter | Not Available | [email protected], [email protected], [email protected] |
| Güven, Diri & Çakaloğlu, 2020 | Turkish Tweets Dataset | Twitter | Available | [email protected], [email protected], [email protected] |
| Shehu & Tokat, 2020 | Turkish Twitter Dataset | Twitter | Not Available | [email protected], [email protected] |
| Kilimci, 2020 | Turkish Financial Twitter Dataset | Twitter | Not Available | [email protected] |
| Kilimci, Yoruk & Akyokus, 2020 | Turkish Mobile Game Reviews Dataset | Google Play Store | Not Available | [email protected], [email protected], [email protected] |
| Sigirci et al., 2020 | Turkish Google Play Reviews Dataset | Google Play Store | Not Available | [email protected] |
| Açıkalın, Bardak & Kutlu, 2020 | Movie & Hotel Reviews (reused from Uçan et al., 2016) | Beyazperde.com, Otelpuan.com | Public | [email protected] |
| Alqaraleh, 2021 | Turkish Movie Reviews (reused from Demirtas & Pechenizkiy, 2013) | Beyazperde.com | Public | [email protected] |
| Kılıç & Büyükeke, 2021 | TripAdvisor, Blog, and IMDb Turkish Reviews Datasets | TripAdvisor, Blog, IMDb | Not Available | [email protected], [email protected] |
| Eker, Eker & Duru, 2021 | Turkish Tweets Dataset (reused from Güven et al., 2020) | Twitter | Public | [email protected] |
| Salur & Aydın, 2021 | Turkish ABSA Tourism Corpus | TripAdvisor | Public | [email protected] |
| Aydın & Güngör, 2021 | Movie Reviews (reused from Türkmenoğlu & Cadırci, 2014), Twitter Dataset (reused from Amasyalı et al., 2018) | Beyazperde.com, Twitter | Public | [email protected] |
| Zeybek, Koç & Seçer, 2021 | MS-TR Treebank (built upon the Turkish Sentiment Treebank, TSTB) | Movie reviews, opinionated texts | Public | [email protected] |
| Shehu et al., 2021 | Stemmed Turkish Twitter Dataset | Twitter | Available Upon Request | [email protected] |
| Köksal & Özgür, 2021 | BounTi: Turkish Sentiment Twitter Dataset | Twitter | Public | [email protected] |
| Kemaloğlu, Küçüksille & Özgünsür, 2021 | Turkish Social Media Sentiment Dataset | Twitter | Not Available | [email protected], [email protected], [email protected] |
| Aydın, Öztürk & Çiçek, 2021 | Turkish ODE Twitter Dataset | Twitter | Not Available | [email protected] |
| Aygün, Kaya & Kaya, 2021 | COVID-19 Vaccine Sentiment Dataset (TR & EN) | Twitter | Public | [email protected] |
| Aydoğan & Kocaman, 2022 | TRSAv1: Turkish E-commerce Reviews | Turkish e-commerce websites | Public | [email protected] |
| Ballı et al., 2022 | SentimentSet, Public Dataset (reused from Beyaz, 2021) | Twitter | Public | [email protected] |
| Mutlu & Özgür, 2022 | Turkish Targeted Sentiment Twitter Dataset | Twitter | Public (Tweet IDs) | [email protected] |
| Kabakus, 2022 | Turkish COVID-19 Twitter Dataset | Twitter | Available Upon Request | [email protected] |
| Güven, 2022 | TSAD: Turkish Hotel & Movie Reviews (reused from Uçan et al., 2016) | Beyazperde.com, Otelpuan.com | Public | [email protected] |
| Erkan & Güngör, 2023 | SemEval 2016 Turkish Restaurant Reviews (reused from Pontiki et al., 2016), Beyazperde Movie Reviews (reused from Uçan et al., 2016) | Twitter, Beyazperde.com | Public | [email protected] |
| Alnahas et al., 2022 | Turkish E-commerce Reviews Dataset | Turkish e-commerce websites | Not Available | [email protected] |
| Karayiğit et al., 2022 | Turkish Instagram COVID-19 Comments Dataset | Instagram | Public | [email protected] |
| Demir & Bilgin, 2023 | Turkish News Sentiment Dataset | Turkish news articles (source unspecified) | Not Available | [email protected] |
| Abdellatif et al., 2023 | Turkish Twitter & Hepsiburada Dataset | Twitter, Hepsiburada.com | Not Available | [email protected] |
| Altınok, 2023 | Beyazperde Reviews, Supplements Reviews, Corona-mini | Beyazperde.com, Vitaminler.com, Ekşi Sözlük | Public | [email protected] |
| Tohma et al., 2023 | DS1 (reused from Beyaz, 2021), SentimentSet (reused from Özler, 2021), SCD (custom QA dataset) | Twitter, Social Media, QA Dialogues | 2 Public, 1 Not Available | [email protected] |
| Aydın, Güngör & Erkan, 2023 | Movie Reviews, Twitter Dataset | Beyazperde.com, Twitter | Public | [email protected] |
| Yılmaz & Altunay, 2023 | Turkish Smartphone Reviews Dataset | E-commerce Platforms (Trendyol, Hepsiburada, N11, GittiGidiyor, Amazon Türkiye) | Available Upon Request | [email protected], [email protected] |
| Ezin, Kiziltepe & Karakus, 2024 | TRSAv1 (reused from Aydogan & Kocaman, 2023), VSCR (reused from Altinok, 2023) | E-commerce Platforms | Public | [email protected] |
| Özdemir, Giritli & Can, 2024 | Turkish Hotel Reviews Dataset | Booking Platforms | Public | [email protected] |
| Kiziltepe, Ezin & Karakus, 2024 | VSCR (reused from Altinok, 2023), TRSAv1 (reused from Aydogan & Kocaman, 2023) | E-commerce Platforms | Public | [email protected] |
| Polat et al., 2024 | Couple Dialogue Dataset | In-lab conversations (Özyeğin University) | Not Public | [email protected] |
| Ba Alawi & Bozkurt, 2024 | Turkish University Twitter Dataset | Twitter | Not Available | [email protected] |
| Ba Alawi & Bozkurt, 2024 | VS1 - Turkish Higher Education Dataset (THED), VS2 - reused from Ucan et al. (2016) | Twitter (X), Hotel Reviews | THED: Not Public, HRD: Public | [email protected], [email protected] |
| Nasution & Onan, 2024 | DTC (Topic), DTSA (Sentiment), DEC (Emotion) | Newspapers, Twitter, Turkish literature | Not Public | [email protected] |
| Onan & Balbal, 2024 | TRSAv1 (reused from Aydogan & Kocaman, 2023), Turkish Emotions Dataset, MR (Amazon), Swahili News Dataset | E-commerce, Blogs, Amazon Reviews, News Articles | Public, Not Public | [email protected] |
| Bozuyla, 2023 | Turkish Drug Review Dataset | eksisozluk.com, drugs.com (translated) | Not Public | [email protected] |
| Cam et al., 2024 | Financial Turkish Twitter Dataset | Twitter (#Borsaistanbul, #Bist, #Bist30, #Bist100) | Not Public | [email protected] |
| Ba Alawi & Bozkurt, 2024 | Turkish Universities Twitter Dataset | Twitter | Available Upon Request | [email protected] |
| Najafi & Varol, 2023 | VRLSentiment, TSATweets (reused from Kulcu, 2015), Kemik-17bin (reused from Amasyalı et al., 2018), Kemik-3000 (reused from Amasyalı et al., 2018), BOUN (BounTi, reused from Köksal & Özgür, 2021), TSAD (reused from Uçan et al., 2016) | Twitter | Public | [email protected] |
| Zümberoğlu et al., 2025 | FSMTSAD, BOUN (BounTi, reused from Köksal & Özgür, 2021) | Tweets, Product & Service Reviews | Public | [email protected] |
| Özmen & Gündüz, 2025 | Turkish Cosmetic Product Reviews Dataset | E-commerce Reviews (Trendyol) | Not Public | [email protected], [email protected] |
| Kaya, Fidan & Toroslu, 2012 | Turkish Political News Columns Dataset | News Columns (6 Turkish newspapers) | Not Public | [email protected] |
| Sağlam, Sever & Genç, 2016 | SWNetTR (reused from Uçan, 2014), SWNetTR-GDELT, SWNetTR-PLUS, MLTC | News Media (GDELT), Turkish Lexicons | Public, Not Public | [email protected], [email protected], [email protected] |
| Makinist et al., 2018 | Improved Turkish Movie Review Dataset | Turkish movie review website (collected via Apache MCF) | Not Public | [email protected] |
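
To work with the list programmatically, one option is to load its rows into Python and group them by the Availability column. The sketch below uses four rows copied from the table; `ROWS` and `public_datasets` are illustrative names, and a machine-readable export of the full table is assumed rather than provided by the repository. Availability values are not uniform (some rows read "Available Upon Request" or give per-dataset availability such as "VS1: Not Available, VS2: Publicly Available"), so an exact match on "Public" is only a first pass.

```python
from collections import Counter

# Four rows copied from the dataset table: (author(s) and year, dataset name, availability).
ROWS = [
    ("Demirtas & Pechenizkiy, 2013",
     "Turkish Movie Reviews, Turkish Multidomain Product Reviews", "Public"),
    ("Cetin et al., 2013", "Telecom Dataset A & B", "Not Available"),
    ("Köksal & Özgür, 2021", "BounTi: Turkish Sentiment Twitter Dataset", "Public"),
    ("Kabakus, 2022", "Turkish COVID-19 Twitter Dataset", "Available Upon Request"),
]


def public_datasets(rows):
    """Return (authors, dataset) pairs whose availability is exactly "Public"."""
    return [(authors, name) for authors, name, availability in rows
            if availability == "Public"]


# Tally how many rows fall under each availability label.
availability_counts = Counter(availability for _, _, availability in ROWS)

for authors, name in public_datasets(ROWS):
    print(f"{authors}: {name}")
```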

Usage

Steps to Use:

  1. Clone this repository:
    git clone https://github.com/sevvalckc/Turkish-SAD.git
    cd Turkish-SAD
  2. Install required libraries: pip install -r requirements.txt
  3. Ensure your datasets (e.g., data1.csv, data2.csv) are placed in the same directory as the script.
  4. Run the script: python sentiment_analysis.py
  5. The script will output sentiment analysis results to CSV files for each model.
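
The README does not describe the script's internals, so the following is only a sketch of the workflow in steps 3 to 5, not the repository's code. It assumes each input CSV has a text column named `text` and writes one output file per input file and model, named `<input stem>_<model name>.csv`; the column name, the file naming, and the helpers `classify_csv` and `run_all` are all hypothetical. The classifier is passed in as a function so the file handling can be tried without downloading a model. The sketch uses the standard-library `csv` module to stay dependency-free, whereas the requirements list pandas, which the actual script presumably uses.

```python
import csv
from pathlib import Path


def classify_csv(csv_path, classify, model_name, text_column="text", out_dir="."):
    """Label every row of csv_path and write '<stem>_<model_name>.csv' to out_dir.

    `classify` takes a list of strings and returns one {"label": ..., "score": ...}
    dict per string, the same shape a transformers text-classification pipeline
    returns for a list of inputs.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [text_column])

    # One prediction per row, in input order.
    predictions = classify([row[text_column] for row in rows])
    for row, prediction in zip(rows, predictions):
        row["label"] = prediction["label"]
        row["score"] = prediction["score"]

    # Keep the original columns and append the prediction columns if missing.
    out_fields = fieldnames + [c for c in ("label", "score") if c not in fieldnames]
    out_path = Path(out_dir) / f"{Path(csv_path).stem}_{model_name}.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=out_fields)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def run_all(csv_paths, classifiers, out_dir="."):
    """Run every named classifier over every input file; one CSV per (file, model)."""
    return [
        classify_csv(path, classify, name, out_dir=out_dir)
        for name, classify in classifiers.items()
        for path in csv_paths
    ]
```

To plug in a real model, a Hugging Face pipeline can act as the classifier, for example `clf = pipeline("text-classification", model="cardiffnlp/twitter-xlm-roberta-base-sentiment")` followed by `run_all(["data1.csv", "data2.csv"], {"XLM-T": lambda texts: clf(texts, truncation=True)})`. Passing `truncation=True` keeps long reviews within the model's maximum input length.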

Requirements

The script requires the following Python libraries and versions:

  • Pandas version: 2.2.2
  • PyTorch version: 2.5.1+cu121
  • Transformers version: 4.46.2
  • SciPy version: 1.13.1

Install Requirements

To install all required libraries, run: pip install -r requirements.txt
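
After installing, one way to compare your environment against the versions listed under Requirements is `importlib.metadata.version` from the standard library. The helper `version_mismatches` below is hypothetical. Note that the `+cu121` suffix on the listed PyTorch version is a local version label that appears when torch is installed from PyTorch's CUDA 12.1 wheel index; a build from another source typically reports a different string and is flagged here even if it works.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section (pip distribution names).
EXPECTED = {
    "pandas": "2.2.2",
    "torch": "2.5.1+cu121",
    "transformers": "4.46.2",
    "scipy": "1.13.1",
}


def version_mismatches(expected, get_version=version):
    """Return {package: (expected, installed)} for every package that differs.

    Packages that are not installed are reported with installed=None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {installed}")
```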

Pre-trained Models Used

The script uses the following models, given as Hugging Face model IDs:

  • TurkishBERTweet: VRLLab/TurkishBERTweet-Lora-SA
  • TSAM: emre/turkish-sentiment-analysis
  • BERTurk: akoksal/bounti
  • XLM-T: cardiffnlp/twitter-xlm-roberta-base-sentiment
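
A minimal sketch of loading these models with the `transformers` `pipeline` API, assuming each ID can be served by the text-classification task. That assumption may not hold for every entry: the `-Lora-` in `VRLLab/TurkishBERTweet-Lora-SA` suggests a LoRA adapter, which may need extra loading code (for example through the `peft` library), so check each model card. The models also do not necessarily share label names, so their outputs may need mapping to a common set before comparison. `MODELS` and `load_classifier` are illustrative names, not part of the repository.

```python
# The four model IDs listed above, keyed by the short names used in this README.
MODELS = {
    "TurkishBERTweet": "VRLLab/TurkishBERTweet-Lora-SA",
    "TSAM": "emre/turkish-sentiment-analysis",
    "BERTurk": "akoksal/bounti",
    "XLM-T": "cardiffnlp/twitter-xlm-roberta-base-sentiment",
}


def load_classifier(model_id, device=-1):
    """Build a Hugging Face text-classification pipeline for model_id.

    device=-1 runs on the CPU; device=0 uses the first CUDA GPU.
    transformers is imported lazily so this module imports without it.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id, device=device)
```

For example, `load_classifier(MODELS["XLM-T"])(["Bu film harikaydı!"])` downloads the model on first use and returns a list with one `{"label": ..., "score": ...}` dict.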

Using Google Colab

Enabling TPU and High RAM

To use this script on Google Colab with a TPU runtime and high RAM, follow these steps:

  • Open Google Colab: go to https://colab.research.google.com.
  • Upload the script: upload sentiment_analysis.py and your datasets (data1.csv, data2.csv) to Colab.
  • Enable TPU: go to Runtime > Change runtime type and select TPU under Hardware accelerator.
  • Enable High RAM: in the same Change runtime type dialog, turn on the High-RAM option if your Colab plan offers it.
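
After changing the runtime, a short Python cell can confirm what the session actually provides. This is a sketch: `total_ram_gib` reads physical memory through POSIX `os.sysconf` (Colab runs on Linux), and `cuda_available` reports whether PyTorch sees a CUDA GPU; both names are hypothetical. Stock PyTorch does not use a TPU by itself; that requires the separate `torch_xla` package, which the Requirements section does not list, so on a TPU runtime the script may run on the CPU unless it handles this itself.

```python
import os


def total_ram_gib():
    """Total physical memory in GiB, read via POSIX sysconf (Linux/macOS)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def cuda_available():
    """True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print(f"RAM: {total_ram_gib():.1f} GiB")
    print(f"CUDA GPU visible to PyTorch: {cuda_available()}")
```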
