| 1 | +# Twitter Toolbox for Python |
| 2 | + |
| 3 | +Often we need to interact with the [Twitter APIs](https://dev.twitter.com/overview/api) to grab some data for research purposes or simple curiosity. |
| 4 | + |
| 5 | +The Twitter API is rich and powerful. However, for inexperienced users it can be tedious, cumbersome, and tricky to code against, especially if you just want quick and reliable access to the API's methods! |
| 6 | + |
| 7 | +For users who want to avoid programming altogether, this Twitter Toolbox can be very handy. For users who want more programmatic access, the Toolbox is also suitable and helpful! |
| 8 | + |
| 9 | +All you need to do to easily start working with the Twitter APIs is to: |
| 10 | + |
| 11 | +1. Sign up for your own [Twitter App](https://apps.twitter.com/). |
| 12 | +2. Configure the Toolbox with your generated personal access credentials. |
| 13 | +3. Use the provided command-line tools. |
| 14 | +4. *(optional)* Use the provided higher-level Toolbox API for Python in your own code. |
| 15 | + |
| 16 | +Want to grab the list of followers of user `@insight_centre`? No problem: |
| 17 | + |
| 18 | + tt-users-get-followers --screen-name insight_centre --output-file followers.ids |
| 19 | + |
| 20 | +Want to turn those user IDs into fully hydrated Twitter User objects? No problem: |
| 21 | + |
| 22 | + tt-users-get-hydrated --user-ids followers.ids --output-file followers.json |
| 23 | + |
| 24 | +Want to receive some real-time Tweets about `obama` or mentioning `@realDonaldTrump`? No problem: |
| 25 | + |
| 26 | + tt-streaming-get-filter --track obama @realDonaldTrump --output-file tweets.json |
| 27 | + |
| 28 | +Want to see the text of a current real-time sample of Tweets, and you have the [`jq` tool](https://stedolan.github.io/jq/) installed? No problem: |
| 29 | + |
| 30 | + tt-streaming-get-sample | jq .text |
| 31 | + |
| 32 | +As shown, you can omit the `--output-file` argument to write the data to standard output. |
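
For readers who prefer Python to `jq`, the same extraction can be sketched as below. This is only an illustration: it assumes the tools emit one JSON object per line with a `text` field, which the `jq .text` example suggests but this README does not state. `iter_tweet_texts` is a hypothetical helper, not part of the Toolbox.

```python
import io
import json


def iter_tweet_texts(lines):
    """Yield the `text` field of each JSON object found in `lines`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        if "text" in tweet:
            yield tweet["text"]


# Stand-in for the tool's output; real data would come from a file or stdin.
sample = io.StringIO('{"id": 1, "text": "hello"}\n\n{"id": 2, "text": "world"}\n')
texts = list(iter_tweet_texts(sample))
print(texts)
```

To process a live stream, `sys.stdin` could be passed in place of the `io.StringIO` object, under the same one-object-per-line assumption.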
| 33 | + |
| 34 | +Finally, many tools have a **bulk processing** variant that lets you download data in batches. For example, if you have a list of user IDs stored in a file, you can download the follower IDs for each of them into separate files under one directory with a single command: |
| 35 | + |
| 36 | + tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt |
| 37 | + |
| 38 | +If any errors occur, simply run the command again and it will resume the bulk processing from where it left off. |
| 39 | + |
| 40 | +## Installation |
| 41 | + |
| 42 | +You can use `pip` (or any `PyPI`-compatible package manager) for installation: |
| 43 | + |
| 44 | + pip install twitter-toolbox |
| 45 | + |
| 46 | +or, if you prefer a local user installation: |
| 47 | + |
| 48 | + pip install --user twitter-toolbox |
| 49 | + |
| 50 | +For **Microsoft Windows** users, you might need to run `pip` through the Python interpreter: |
| 51 | + |
| 52 | + python -m pip install twitter-toolbox |
| 53 | + |
| 54 | +## Configuration File |
| 55 | + |
| 56 | +The Twitter Toolbox is configured globally using Python's simple [configuration file format](https://docs.python.org/2/library/configparser.html), stored in a file named `.twtoolbox.cfg` in your home directory (please note the leading period `.`). |
| 57 | + |
| 58 | +You can easily create a minimal basic configuration from your Twitter API access credentials using the `tt-config` command-line tool. Example usage: |
| 59 | + |
| 60 | + $ tt-config |
| 61 | + WARNING: this tool will create a **NEW** config file and |
| 62 | + overwrite any existing previous configuration. |
| 63 | + |
| 64 | + Consumer Key ...... : <INPUT YOUR CONSUMER KEY HERE> |
| 65 | + Consumer Secret ... : <INPUT YOUR CONSUMER SECRET HERE> |
| 66 | + Access Token Key .. : <INPUT YOUR ACCESS TOKEN KEY HERE> |
| 67 | + Access Token Secret : <INPUT YOUR ACCESS TOKEN SECRET HERE> |
| 68 | + |
| 69 | +After you enter your authentication data, a new minimal configuration file will be created in your home directory (replacing any existing file!). |
| 70 | + |
| 71 | +You can further customize this file using the following configuration sections and options: |
| 72 | + |
| 73 | +* `[twitter]`: **(required)** for configuring your own Twitter API's access credentials. Options: `consumer_key`, `consumer_secret`, `access_token_key`, `access_token_secret`. |
| 74 | +* `[search]`: for configuring access to the Tweets Search API. Options: `limit`. |
| 75 | +* `[search_users]`: for configuring access to the Users Search API. Options: `limit`. |
| 76 | +* `[timeline]`: for configuring access to the Users Timeline API. Options: `limit`. |
| 77 | +* `[followers]`: for configuring access to the User Followers API. Options: `limit`. |
| 78 | +* `[friends]`: for configuring access to the User Friends API. Options: `limit`. |
| 79 | +* `[sample]`: for configuring access to the Streaming API's Sample Endpoint. Options: `limit`. |
| 80 | +* `[filter]`: for configuring access to the Streaming API's Filter Endpoint. Options: `limit`. |
| 81 | +* `[firehose]`: for configuring access to the Streaming API's Firehose Endpoint. Options: `limit`. |
| 82 | + |
| 83 | +All the `limit` options specify the maximum number of results (users, Tweets, IDs) you want to download from Twitter, with `0` meaning *unlimited*. Be very careful with this option: the higher the number, the sooner you will exhaust your [API rate limits](https://dev.twitter.com/rest/public/rate-limiting). It is strongly recommended that you use the Toolbox defaults. |
| 84 | + |
| 85 | +The following is a full example of a suitable configuration file. You can omit any sections or options for which you want the defaults to be used. The bare minimum is the `[twitter]` section with your configured API credentials. |
| 86 | + |
| 87 | + [twitter] |
| 88 | + consumer_key=YOUR_CONSUMER_KEY_HERE |
| 89 | + consumer_secret=YOUR_CONSUMER_SECRET_HERE |
| 90 | + access_token_key=YOUR_ACCESS_TOKEN_KEY_HERE |
| 91 | + access_token_secret=YOUR_ACCESS_TOKEN_SECRET_HERE |
| 92 | + |
| 93 | + [search] |
| 94 | + limit = 0 |
| 95 | + |
| 96 | + [search_users] |
| 97 | + limit = 1000 |
| 98 | + |
| 99 | + [timeline] |
| 100 | + limit = 0 |
| 101 | + |
| 102 | + [followers] |
| 103 | + limit = 30000 |
| 104 | + |
| 105 | + [friends] |
| 106 | + limit = 30000 |
| 107 | + |
| 108 | + [sample] |
| 109 | + limit = 0 |
| 110 | + |
| 111 | + [filter] |
| 112 | + limit = 0 |
| 113 | + |
| 114 | + [firehose] |
| 115 | + limit = 0 |
| 116 | + |
| 117 | +The option values under the `[twitter]` section must be replaced by your own **Twitter App credentials**. |
| 118 | + |
| 119 | +If the configuration file, a section, or an option is not specified, built-in defaults are used. |
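
The fallback behaviour described above can be sketched with the standard-library `configparser` module (Python 3 here). This is a sketch of the idea, not the Toolbox's own loading code: `ASSUMED_DEFAULTS` and `read_limit` are hypothetical, and the real built-in defaults are not listed in this README.

```python
import configparser

# Hypothetical fallback values; the Toolbox's real built-in defaults may differ.
ASSUMED_DEFAULTS = {"followers": 30000, "search": 0}

SAMPLE_CONFIG = """
[twitter]
consumer_key=YOUR_CONSUMER_KEY_HERE
consumer_secret=YOUR_CONSUMER_SECRET_HERE
access_token_key=YOUR_ACCESS_TOKEN_KEY_HERE
access_token_secret=YOUR_ACCESS_TOKEN_SECRET_HERE

[followers]
limit = 500
"""


def read_limit(parser, section):
    """Return the `limit` of `section`, or the assumed default when it is absent."""
    return parser.getint(section, "limit", fallback=ASSUMED_DEFAULTS.get(section, 0))


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)  # parser.read(path) would load a real file

followers_limit = read_limit(parser, "followers")  # set explicitly in the sample
search_limit = read_limit(parser, "search")        # section missing, default used
has_credentials = parser.has_option("twitter", "consumer_key")
print(followers_limit, search_limit, has_credentials)
```

Passing `fallback=` to `getint` covers both a missing section and a missing option, which matches the "any section or option" wording above.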
| 120 | + |
| 121 | +## Tools for the Streaming API |
| 122 | + |
| 123 | +* `tt-streaming-get-sample` |
| 124 | +* `tt-streaming-get-filter` |
| 125 | +* `tt-streaming-get-firehose` |
| 126 | + |
| 127 | +All tools have an `--output-file` argument. If omitted, the standard output pipe is used. |
| 128 | + |
| 129 | +Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data. |
| 130 | + |
| 131 | +Example usage: |
| 132 | + |
| 133 | + tt-streaming-get-sample --output-file tweets.json |
| 134 | + tt-streaming-get-filter --track obama trump --follow 6456345 --resume |
| 135 | + tt-streaming-get-filter --locations -122.75 36.8 -121.75 37.8 -74 40 -73 41 |
| 136 | + tt-streaming-get-firehose |
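
Because `--resume` appends without de-duplicating, a post-processing pass may be useful. The sketch below assumes one JSON object per line, each carrying a unique `id` field; both are assumptions about the output format, and `dedupe_by_id` is a hypothetical helper rather than a Toolbox function.

```python
import io
import json


def dedupe_by_id(lines):
    """Yield each non-blank line once, keeping the first occurrence of every `id`."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = json.loads(line).get("id")
        if key is None:
            yield line  # no id to compare on, so keep the record
            continue
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        yield line


# Stand-in for a resumed output file that contains one repeated Tweet.
raw = io.StringIO(
    '{"id": 1, "text": "a"}\n'
    '{"id": 2, "text": "b"}\n'
    '{"id": 1, "text": "a"}\n'
)
unique = list(dedupe_by_id(raw))
print(len(unique))
```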
| 137 | + |
| 138 | +## Tools for Tweets |
| 139 | + |
| 140 | +* `tt-tweets-get-hydrated` |
| 141 | +* `tt-tweets-get-retweets` |
| 142 | +* `tt-tweets-get-timeline` |
| 143 | +* `tt-tweets-search` |
| 144 | + |
| 145 | +All tools have an `--output-file` argument. If omitted, the standard output is used. |
| 146 | + |
| 147 | +Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data. |
| 148 | + |
| 149 | +Example usage: |
| 150 | + |
| 151 | + tt-tweets-get-hydrated --tweet-ids tweet_ids.txt --output-file tweets.json |
| 152 | + tt-tweets-get-retweets --tweet-id 64563457564 |
| 153 | + tt-tweets-get-timeline --screen-name insight_centre |
| 154 | + tt-tweets-search --query "twitter api" --resume |
| 155 | + |
| 156 | +## Tools for Twitter Users |
| 157 | + |
| 158 | +* `tt-users-get-hydrated` |
| 159 | +* `tt-users-get-followers` |
| 160 | +* `tt-users-get-friends` |
| 161 | +* `tt-users-search` |
| 162 | + |
| 163 | +All tools have an `--output-file` argument. If omitted, the standard output is used. |
| 164 | + |
| 165 | +Additionally, all tools also have a `--resume` flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data. |
| 166 | + |
| 167 | +Example usage: |
| 168 | + |
| 169 | + tt-users-get-hydrated --user-ids user_ids.txt --screen-names screen_names.txt |
| 170 | + tt-users-get-followers --user-id 54252345 |
| 171 | + tt-users-get-friends --screen-name insight_centre --resume |
| 172 | + tt-users-search --query "rte" --output-file users.json |
| 173 | + |
| 174 | +## Tools for Bulk Processing |
| 175 | + |
| 176 | +* `tt-tweets-bulk-get-retweets` |
| 177 | +* `tt-tweets-bulk-get-timeline` |
| 178 | +* `tt-tweets-bulk-search` |
| 179 | +* `tt-users-bulk-get-followers` |
| 180 | +* `tt-users-bulk-get-friends` |
| 181 | +* `tt-users-bulk-search` |
| 182 | + |
| 183 | +All tools have an `--output-dir` argument. The directory is automatically created if not found. Some tools support resuming the bulk processing according to existing files in the output directory. |
| 184 | + |
| 185 | +Example usage: |
| 186 | + |
| 187 | + tt-tweets-bulk-get-retweets --output-dir retweets --tweet-ids tweet_ids.txt |
| 188 | + tt-tweets-bulk-get-timeline --output-dir timelines --screen-names screen_names.txt |
| 189 | + tt-tweets-bulk-search --output-dir searches --queries queries.txt |
| 190 | + tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt |
| 191 | + tt-users-bulk-get-friends --output-dir friends --screen-names screen_names.txt |
| 192 | + tt-users-bulk-search --output-dir searches --queries queries.txt |
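
One way to picture resuming "according to existing files in the output directory" is sketched below. It is an illustration only: the per-id file naming (`<id>.json`) and the `pending_ids` helper are hypothetical, and the Toolbox may name its files and decide what to skip differently.

```python
import os
import tempfile


def pending_ids(output_dir, user_ids, suffix=".json"):
    """Return the ids that do not yet have an output file in `output_dir`."""
    os.makedirs(output_dir, exist_ok=True)  # create the directory if missing
    done = {
        name[: -len(suffix)]
        for name in os.listdir(output_dir)
        if name.endswith(suffix)
    }
    return [uid for uid in user_ids if str(uid) not in done]


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "followers")
    first_run = pending_ids(out_dir, [11, 22, 33])
    # Simulate a run that finished only the first id before failing.
    open(os.path.join(out_dir, "11.json"), "w").close()
    second_run = pending_ids(out_dir, [11, 22, 33])

print(first_run, second_run)
```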
| 193 | + |
| 194 | +## Toolbox API |
| 195 | + |
| 196 | +The Twitter Toolbox is contained in the `twtoolbox` module. The command-line tools above are wrappers around the functions listed below. The same semantics apply, including reading the configuration file. |
| 197 | + |
| 198 | +### Streaming API |
| 199 | + |
| 200 | +The following functions are available in the `streaming` submodule: |
| 201 | + |
| 202 | +* `get_sample(writer)` |
| 203 | +* `get_filter(writer, follow=None, track=None, locations=None)` |
| 204 | +* `get_firehose(writer)` |
| 205 | + |
| 206 | +Example usage: |
| 207 | + |
| 208 | +```python |
| 209 | +from twtoolbox import streaming |
| 210 | + |
| 211 | +with open("tweets.json", "w") as writer: |
| 212 | +    streaming.get_filter(writer, track=["obama"]) |
| 213 | +``` |
| 214 | + |
| 215 | +### Tweets |
| 216 | + |
| 217 | +The following functions are available in the `tweets` submodule: |
| 218 | + |
| 219 | +* `get_hydrated(writer, tweet_ids)` |
| 220 | +* `get_retweets(writer, tweet_id)` |
| 221 | +* `get_timeline(writer, user_id=None, screen_name=None, since_id=0)` |
| 222 | +* `search(writer, query, since_id=0)` |
| 223 | +* `bulk_get_retweets(output_dir, tweet_ids)` |
| 224 | +* `bulk_get_timeline(output_dir, user_ids=None, screen_names=None)` |
| 225 | +* `bulk_search(output_dir, queries)` |
| 226 | + |
| 227 | +Example usage: |
| 228 | + |
| 229 | +```python |
| 230 | +from twtoolbox import tweets |
| 231 | + |
| 232 | +with open("tweets.json", "w") as writer: |
| 233 | +    tweets.search(writer, query="twitter api") |
| 234 | + |
| 235 | +tweets.bulk_get_retweets("retweets", [768585599271993344, 768585794458120192]) |
| 236 | +``` |
| 237 | + |
| 238 | +### Users |
| 239 | + |
| 240 | +The following functions are available in the `users` submodule: |
| 241 | + |
| 242 | +* `get_hydrated(writer, user_ids=None, screen_names=None)` |
| 243 | +* `get_followers(writer, user_id=None, screen_name=None)` |
| 244 | +* `get_friends(writer, user_id=None, screen_name=None)` |
| 245 | +* `search(writer, query)` |
| 246 | +* `bulk_get_followers(output_dir, user_ids=None, screen_names=None)` |
| 247 | +* `bulk_get_friends(output_dir, user_ids=None, screen_names=None)` |
| 248 | +* `bulk_search(output_dir, queries)` |
| 249 | + |
| 250 | +Example usage: |
| 251 | + |
| 252 | +```python |
| 253 | +from twtoolbox import users |
| 254 | + |
| 255 | +with open("followers.txt", "w") as writer: |
| 256 | +    users.get_followers(writer, screen_name="twitter") |
| 257 | + |
| 258 | +users.bulk_get_friends("friends", user_ids=[1635345, 645648754]) |
| 259 | +``` |
| 260 | + |
| 261 | +## License |
| 262 | + |
| 263 | +This software is under the **Apache License 2.0**. |
| 264 | + |
| 265 | + Licensed under the Apache License, Version 2.0 (the "License"); |
| 266 | + you may not use this file except in compliance with the License. |
| 267 | + You may obtain a copy of the License at |
| 268 | + |
| 269 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 270 | + |
| 271 | + Unless required by applicable law or agreed to in writing, software |
| 272 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 273 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 274 | + See the License for the specific language governing permissions and |
| 275 | + limitations under the License. |
| 276 | + |