Commit 77f8b38

converted README to markdown
1 parent 9cf1872 commit 77f8b38

2 files changed

+276
-317
lines changed

README.md

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
# Twitter Toolbox for Python

Often we need to interact with the [Twitter APIs](https://dev.twitter.com/overview/api) to grab some data for research purposes or out of simple curiosity.

The Twitter API is rich and powerful, but for inexperienced users it can be tedious, cumbersome, and tricky to code against, especially if you just want quick and reliable access to the API's methods.

For users who want to avoid programming altogether, this Twitter Toolbox can be very handy. For users who want more programmatic access, the Toolbox is equally suitable.

To start working with the Twitter APIs, all you need to do is:

1. Sign up for your own [Twitter App](https://apps.twitter.com/).
2. Configure the Toolbox with your generated personal access credentials.
3. Use the provided command-line tools.
4. *(optional)* Use the provided higher-level Toolbox API for Python in your own code.

Want to grab the list of followers of user `@insight_centre`? No problem:

    tt-users-get-followers --screen-name insight_centre --output-file followers.ids

Want to turn those user IDs into fully hydrated Twitter User objects? No problem:

    tt-users-get-hydrated --user-ids followers.ids --output-file followers.json

Want to receive some real-time Tweets about `obama` or mentioning `@realDonaldTrump`? No problem:

    tt-streaming-get-filter --track obama @realDonaldTrump --output-file tweets.json

Want to see the text of a current real-time sample of Tweets, and you have the [`jq` tool](https://stedolan.github.io/jq/) installed? No problem:

    tt-streaming-get-sample | jq .text

As the last example shows, you can omit the `--output-file` argument to send the data to standard output.

Finally, many tools have a **bulk processing** variant that lets you download data in batches. For example, if you have a list of user IDs stored in a file, you can download the follower IDs of each user into separate files under a directory with a single command:

    tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt

If any errors occur, simply run the command again and it will resume the bulk processing from where it left off.
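
The idea behind resuming can be pictured with a small sketch. This is not the Toolbox's actual implementation: it assumes, purely for illustration, that each user's results are written to a file named `<user_id>.ids` inside the output directory, and `pending_ids` is a hypothetical helper.

```python
import os


def pending_ids(user_ids, output_dir, suffix=".ids"):
    """Return the IDs that do not yet have an output file in output_dir.

    Hypothetical helper: the <id><suffix> naming scheme is an assumption
    made for this sketch, not the Toolbox's documented layout.
    """
    if not os.path.isdir(output_dir):
        # Nothing has been downloaded yet, so every ID is still pending.
        return list(user_ids)
    done = {
        name[: -len(suffix)]
        for name in os.listdir(output_dir)
        if name.endswith(suffix)
    }
    # Preserve the input order while skipping IDs that already have a file.
    return [uid for uid in user_ids if str(uid) not in done]
```

A scheme like this treats any existing file as complete, so a file left half-written by an interrupted run would be skipped unless it is removed first.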

## Installation

You can use `pip` (or any `PyPI`-compatible package manager) for installation:

    pip install twitter-toolbox

or, if you prefer a local user installation:

    pip install --user twitter-toolbox

For **Microsoft Windows** users, you might need to run `pip` through the Python interpreter:

    python -m pip install twitter-toolbox

## Configuration File

The Twitter Toolbox is globally configured using the simple [configuration language from Python](https://docs.python.org/2/library/configparser.html), stored in a file named `.twtoolbox.cfg` under your home directory (please note the leading period `.`).

You can create a minimal configuration from your Twitter API access credentials using the `tt-config` command-line tool. Example usage:

    $ tt-config
    WARNING: this tool will create a **NEW** config file and
    overwrite any existing previous configuration.

    Consumer Key ...... : <INPUT YOUR CONSUMER KEY HERE>
    Consumer Secret ... : <INPUT YOUR CONSUMER SECRET HERE>
    Access Token Key .. : <INPUT YOUR ACCESS TOKEN KEY HERE>
    Access Token Secret : <INPUT YOUR ACCESS TOKEN SECRET HERE>

After you input your authentication data, a new minimal configuration file will be created in your home directory (replacing any existing file!).

You can further customize this file. The available configuration sections and options are:

* `[twitter]`: **(required)** for configuring your own Twitter API access credentials. Options: `consumer_key`, `consumer_secret`, `access_token_key`, `access_token_secret`.
* `[search]`: for configuring access to the Tweets Search API. Options: `limit`.
* `[search_users]`: for configuring access to the Users Search API. Options: `limit`.
* `[timeline]`: for configuring access to the Users Timeline API. Options: `limit`.
* `[followers]`: for configuring access to the User Followers API. Options: `limit`.
* `[friends]`: for configuring access to the User Friends API. Options: `limit`.
* `[sample]`: for configuring access to the Streaming API's Sample Endpoint. Options: `limit`.
* `[filter]`: for configuring access to the Streaming API's Filter Endpoint. Options: `limit`.
* `[firehose]`: for configuring access to the Streaming API's Firehose Endpoint. Options: `limit`.

All the `limit` options specify the maximum number of results (users, Tweets, IDs) you want to download from Twitter, with `0` meaning *unlimited*. Be very careful with this option: the higher the number, the more easily you will exhaust your [API rate limits](https://dev.twitter.com/rest/public/rate-limiting). It is strongly recommended that you use the Toolbox defaults.

The following is a full example of a suitable configuration file. You can omit any sections or options for which you want the defaults to be used. The bare minimum is the `[twitter]` section with your API credentials.

    [twitter]
    consumer_key=YOUR_CONSUMER_KEY_HERE
    consumer_secret=YOUR_CONSUMER_SECRET_HERE
    access_token_key=YOUR_ACCESS_TOKEN_KEY_HERE
    access_token_secret=YOUR_ACCESS_TOKEN_SECRET_HERE

    [search]
    limit = 0

    [search_users]
    limit = 1000

    [timeline]
    limit = 0

    [followers]
    limit = 30000

    [friends]
    limit = 30000

    [sample]
    limit = 0

    [filter]
    limit = 0

    [firehose]
    limit = 0

The option values under the `[twitter]` section must be replaced by your own **Twitter App credentials**.

If the configuration file, a section, or an option is not specified, built-in defaults are used.
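
To make the fallback rule concrete, here is a minimal sketch of reading a `limit` option with Python 3's standard `configparser` module. The helper `read_limit` and its `default=0` are assumptions for illustration only; the Toolbox's real loading code and built-in defaults may differ.

```python
import configparser
import os


def read_limit(section, path=None, default=0):
    """Read `limit` from `section`, falling back to `default`.

    Hypothetical helper: `default=0` is a placeholder, not the
    Toolbox's actual built-in default.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".twtoolbox.cfg")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    # `fallback` covers both a missing section and a missing option.
    return parser.getint(section, "limit", fallback=default)
```

With the full example file above, `read_limit("followers")` would return `30000`, while a section that is absent from the file would yield the supplied default.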

## Tools for the Streaming API

* `tt-streaming-get-sample`
* `tt-streaming-get-filter`
* `tt-streaming-get-firehose`

All tools have an `--output-file` argument. If omitted, the standard output is used.

Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

    tt-streaming-get-sample --output-file tweets.json
    tt-streaming-get-filter --track obama trump --follow 6456345 --resume
    tt-streaming-get-filter --locations -122.75 36.8 -121.75 37.8 -74 40 -73 41
    tt-streaming-get-firehose
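
The eight `--locations` values in the third example form two bounding boxes. In Twitter's Streaming API, each box is given as four numbers: the longitude and latitude of the south-west corner, followed by those of the north-east corner. The sketch below groups a flat list of numbers into such boxes; `to_bounding_boxes` is a hypothetical helper and not part of the Toolbox.

```python
def to_bounding_boxes(values):
    """Group a flat list of coordinates into (west, south, east, north) boxes."""
    if len(values) % 4 != 0:
        raise ValueError("locations must come in groups of four numbers")
    boxes = []
    for i in range(0, len(values), 4):
        west, south, east, north = values[i:i + 4]
        # Longitudes come first in each corner, then latitudes.
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            raise ValueError("longitude out of range")
        if not (-90 <= south <= 90 and -90 <= north <= 90):
            raise ValueError("latitude out of range")
        boxes.append((west, south, east, north))
    return boxes


boxes = to_bounding_boxes([-122.75, 36.8, -121.75, 37.8, -74, 40, -73, 41])
```

Here the first box roughly covers San Francisco and the second New York City.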

## Tools for Tweets

* `tt-tweets-get-hydrated`
* `tt-tweets-get-retweets`
* `tt-tweets-get-timeline`
* `tt-tweets-search`

All tools have an `--output-file` argument. If omitted, the standard output is used.

Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

    tt-tweets-get-hydrated --tweet-ids tweet_ids.txt --output-file tweets.json
    tt-tweets-get-retweets --tweet-id 64563457564
    tt-tweets-get-timeline --screen-name insight_centre
    tt-tweets-search --query "twitter api" --resume
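
Because `--resume` appends without de-duplicating, a post-processing pass may be useful. The sketch below assumes the output file holds one JSON object per line and that each object carries an `id` field; both are assumptions about the output format, and `dedupe_json_lines` is a hypothetical helper rather than a Toolbox function.

```python
import json


def dedupe_json_lines(lines, key="id"):
    """Yield each non-empty JSON line whose `key` value was not seen before.

    Assumes every line is a JSON object. Objects without `key` are
    always kept, since they cannot be compared.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line).get(key)
        if value is not None:
            if value in seen:
                continue  # duplicate record, skip it
            seen.add(value)
        yield line
```

To use it, pass an open file as `lines` and write each yielded line, followed by a newline, to a new file.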

## Tools for Twitter Users

* `tt-users-get-hydrated`
* `tt-users-get-followers`
* `tt-users-get-friends`
* `tt-users-search`

All tools have an `--output-file` argument. If omitted, the standard output is used.

Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

    tt-users-get-hydrated --user-ids user_ids.txt --screen-names screen_names.txt
    tt-users-get-followers --user-id 54252345
    tt-users-get-friends --screen-name insight_centre --resume
    tt-users-search --query "rte" --output-file users.json

## Tools for Bulk Processing

* `tt-tweets-bulk-get-retweets`
* `tt-tweets-bulk-get-timeline`
* `tt-tweets-bulk-search`
* `tt-users-bulk-get-followers`
* `tt-users-bulk-get-friends`
* `tt-users-bulk-search`

All tools have an `--output-dir` argument. The directory is created automatically if it does not exist. Some tools support resuming the bulk processing based on the files already present in the output directory.

Example usage:

    tt-tweets-bulk-get-retweets --output-dir retweets --tweet-ids tweet_ids.txt
    tt-tweets-bulk-get-timeline --output-dir timelines --screen-names screen_names.txt
    tt-tweets-bulk-search --output-dir searches --queries queries.txt
    tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt
    tt-users-bulk-get-friends --output-dir friends --screen-names screen_names.txt
    tt-users-bulk-search --output-dir searches --queries queries.txt

## Toolbox API

The Twitter Toolbox is contained in the `twtoolbox` module. The command-line tools above are wrappers around the functions listed below. The same semantics apply, including reading the configuration file.

### Streaming API

The following functions are available in the `streaming` submodule:

* `get_sample(writer)`
* `get_filter(writer, follow=None, track=None, locations=None)`
* `get_firehose(writer)`

Example usage:

```python
from twtoolbox import streaming

with open("tweets.json", "w") as writer:
    streaming.get_filter(writer, track=["obama"])
```

### Tweets

The following functions are available in the `tweets` submodule:

* `get_hydrated(writer, tweet_ids)`
* `get_retweets(writer, tweet_id)`
* `get_timeline(writer, user_id=None, screen_name=None, since_id=0)`
* `search(writer, query, since_id=0)`
* `bulk_get_retweets(output_dir, tweet_ids)`
* `bulk_get_timeline(output_dir, user_ids=None, screen_names=None)`
* `bulk_search(output_dir, queries)`

Example usage:

```python
from twtoolbox import tweets

with open("tweets.json", "w") as writer:
    tweets.search(writer, query="twitter api")

tweets.bulk_get_retweets("retweets", [768585599271993344, 768585794458120192])
```

### Users

The following functions are available in the `users` submodule:

* `get_hydrated(writer, user_ids=None, screen_names=None)`
* `get_followers(writer, user_id=None, screen_name=None)`
* `get_friends(writer, user_id=None, screen_name=None)`
* `search(writer, query)`
* `bulk_get_followers(output_dir, user_ids=None, screen_names=None)`
* `bulk_get_friends(output_dir, user_ids=None, screen_names=None)`
* `bulk_search(output_dir, queries)`

Example usage:

```python
from twtoolbox import users

with open("followers.txt", "w") as writer:
    users.get_followers(writer, screen_name="twitter")

users.bulk_get_friends("friends", user_ids=[1635345, 645648754])
```

## License

This software is under the **Apache License 2.0**.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
