Dataset info
As far as I could find, there is no API for Quora or Reddit that we could use.
I searched for datasets instead and found the following links that can be used as training data:
For Quora:
https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
For Reddit:
http://files.pushshift.io/reddit/
https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
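A minimal sketch of how the Reddit comment dumps could be read for training data. It assumes the monthly files are bz2-compressed with one JSON object per line, and that each comment has "subreddit" and "body" fields. The helper names iter_comments and read_dump are made up for illustration, and the file name in the usage note is only a placeholder.

```python
import bz2
import json


def iter_comments(lines, subreddit=None):
    """Yield (subreddit, body) pairs from an iterable of JSON lines.

    Assumes each non-empty line is one JSON object with "subreddit"
    and "body" keys. Comments without usable text are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        body = comment.get("body", "")
        # Skip comments whose text is no longer available.
        if body in ("", "[deleted]", "[removed]"):
            continue
        # Optionally keep only one subreddit.
        if subreddit is not None and comment.get("subreddit") != subreddit:
            continue
        yield comment.get("subreddit"), body


def read_dump(path, subreddit=None):
    """Stream comments from a bz2-compressed dump without loading it all."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        yield from iter_comments(handle, subreddit)
```

Usage would look like read_dump("RC_2015-01.bz2", subreddit="movies"), reading the file line by line, since the full dumps are far too large to hold in memory.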
My views on using Quora/Reddit for the project:
I do not think it would be possible for us to use the above two, because:
POPLI: I too believe that we should focus only on Twitter, as our main motive is to work on insights and things that haven't been worked on yet.
We should focus on completing the project as soon as possible.
1. Neither supports fetching live streaming data, unlike Twitter, so we would not be able to analyse fresh data.
2. The dataset offered by Quora contains pairs of similar questions. Questions do not give any certainty about emotion/sentiment; rather, they show an intent to find out the real sentiment of people through the answers.
3. Reddit's data can be used; however, Reddit itself cannot be offered as an option for extracting information.
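To make point 2 concrete, here is a small sketch of what the Quora release looks like once loaded. It assumes a tab-separated file with a header row containing the columns "question1", "question2" and "is_duplicate". The function name read_question_pairs and the two sample rows are made up for illustration and are not taken from the real file.

```python
import csv
import io


def read_question_pairs(stream):
    """Return a list of (question1, question2, is_duplicate) tuples.

    Assumes a tab-separated stream with a header row that names the
    columns "question1", "question2" and "is_duplicate" ("0" or "1").
    """
    reader = csv.DictReader(stream, delimiter="\t")
    pairs = []
    for row in reader:
        is_duplicate = row["is_duplicate"] == "1"
        pairs.append((row["question1"], row["question2"], is_duplicate))
    return pairs


# Made-up sample rows in the assumed layout, not real data.
sample = io.StringIO(
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "1\t3\t4\tIs the new phone worth buying?\tHow do magnets work?\t0\n"
)
pairs = read_question_pairs(sample)
```

The only label on each row is whether the two questions are duplicates. Nothing in it says how the asker feels, so we would have to label sentiment ourselves before it could be used for training.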
Also, Facebook data might contain images as posts, so I suggest extracting only Facebook comments instead of posts. There would be less useless data that way,
as discussions on Facebook take place in the comments, even when the post contains photographs. (I hope you guys get what I am trying to say :P)
However, if you still want live data from Quora/Reddit,
it can still be done.
We can use a web scraper to do that.
However, this might violate Quora's policies and terms of use,
so we must keep all of this in mind before doing that.
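One check we could run before scraping anything is whether the site's robots.txt allows the page. This is a minimal sketch using Python's standard urllib.robotparser. The helper name allowed_by_robots, the example rules and the example.com URLs are made up for illustration. Passing this check would not mean the site's terms of service allow scraping, so those would still need to be read separately.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines allow fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration, not any real site's robots.txt.
example_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

print(allowed_by_robots(example_rules, "https://example.com/public/page"))
print(allowed_by_robots(example_rules, "https://example.com/private/page"))
```

For a real site, the parser can download the file itself with set_url() followed by read(), instead of being given the lines by hand.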
POPLI: I got a dataset with 9 classes of 40k tweets.