- Common Crawl: an open repository of web crawl data that can be accessed and analyzed by anyone.
- link: https://github.com/BruceDone/awesome-crawler
- author: Bruce Tang
- note: a collection of awesome web crawlers and spiders in different languages.
- link: https://github.com/shengqiangzhang/examples-of-web-crawlers
- author: Shengqiang Zhang
- note: some interesting, beginner-friendly examples of Python crawlers.
- link: https://github.com/xianyucoder/Crack-JS
- blog: http://xianyucoder.cn/
- author: huangjin
- note: advanced, hands-on Python 3 crawler projects.
- link: https://github.com/striver-ing/wechat-spider
- author: striver-ing
- note: an open-source WeChat spider that crawls all articles of an official account, along with their view counts, like counts, and comments.
- link: https://github.com/crawlab-team/crawlab
- author: Crawlab Team
- note: a distributed crawler management platform that supports any language and framework.
- link: https://github.com/PhosphorylatedRabbits/paperscraper
- author: PhosphorylatedRabbits
- note: tools to scrape publication metadata from PubMed, arXiv, medRxiv, and chemRxiv.
- link: https://github.com/liuyixin-louis/arxiv2latex
- author: Yixin Liu
- note: download the LaTeX source of multiple arXiv papers with one click.
- link: https://github.com/lixi5338619/magical_spider
- author: Li Xi (李玺)
- note: a magical spider 🕷: a scraping solution applicable to almost any web site.
- link: https://github.com/codelucas/newspaper
- author: Lucas Ou-Yang
- note: news, full-text, and article metadata extraction in Python 3.