A project to generate point of interest (POI) data sourced from websites with 'store location' pages. The project uses scrapy, a popular Python-based web scraping framework, to execute individual site spiders that retrieve POI data, publishing the results in a standard format. There are various scrapy tutorials on the Internet and this series on YouTube is reasonable.
Windows users may need to follow some extra steps; please follow the scrapy docs for up-to-date details.
These instructions were tested with Ubuntu 24.04 LTS on 2024-02-21.
- Install uv:

      curl -LsSf https://astral.sh/uv/install.sh | sh
      source $HOME/.local/bin/env

- Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

      git clone git@github.com:alltheplaces/alltheplaces.git

- Use uv to install the project dependencies:

      cd alltheplaces
      uv sync

- Test for successful project installation:

      uv run scrapy

If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.
These instructions were tested with macOS 15.3.2 on 2025-04-01.
- Install uv:

      brew install uv

- Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

      git clone git@github.com:alltheplaces/alltheplaces.git

- Use uv to install the project dependencies:

      cd alltheplaces
      uv sync

- Test for successful project installation:

      uv run scrapy

If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.
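With a working installation, you can list the spiders bundled with the project and run one on its own. The commands below are standard scrapy CLI usage rather than anything specific to All the Places; `some_spider_name` is a placeholder for a name taken from the `scrapy list` output:

      # Show the names of all spiders registered in the project
      uv run scrapy list

      # Run a single spider and write its items to a local JSON file
      # ("some_spider_name" is a placeholder taken from the list above)
      uv run scrapy crawl some_spider_name -O some_spider_name.json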
You can use GitHub Codespaces to run the project: a cloud-based development environment created from the project's repository and pre-configured with all the tools you need to develop the project. To use Codespaces, click the button below:
You can use Docker to run the project: a container-based development environment created from the project's repository and pre-configured with all the tools you need to develop the project.
- Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

      git clone git@github.com:alltheplaces/alltheplaces.git

- Build the Docker image:

      cd alltheplaces
      docker build -t alltheplaces .

- Run the Docker container:

      docker run --rm -it alltheplaces
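Depending on how the image is built, you can also pass a command to the container instead of relying on its default entrypoint. The sketch below is an assumption rather than the project's documented workflow: it supposes the project checkout is the image's working directory and that uv is on the container's PATH (check the Dockerfile if in doubt); the spider name and output paths are placeholders:

      # Mount a host directory for output and run a single spider inside the container.
      # Assumes the project checkout is the working directory and uv is on PATH.
      docker run --rm -it \
        -v "$(pwd)/output:/output" \
        alltheplaces \
        uv run scrapy crawl some_spider_name -O /output/some_spider_name.json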
Many of the sites provide their data in a standard format. Others export their data via simple APIs. We have a number of guides to help you develop spiders (a short scaffolding example follows the list):
- What should I call my spider?
- Using Wikidata and the Name Suggestion Index
- Sitemaps make finding POI pages easier
- Data from many POI pages can be extracted without writing code
- What is expected in a pull request?
- What we do behind the scenes
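As a quick orientation before reading those guides, scrapy's built-in genspider command can scaffold a new spider file from a generic template. This is plain scrapy usage, not the project's prescribed workflow, and the names below are placeholders:

      # Scaffold a new spider from scrapy's generic template
      # ("some_brand" and "example.com" are placeholders)
      uv run scrapy genspider some_brand example.com

The generated file is only a starting point: the guides above describe the spider naming, Wikidata and Name Suggestion Index tagging, and pull request expectations that apply to real submissions.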
The output from running the project is published on a regular cadence to our website: alltheplaces.xyz. Please do not run all the spiders yourself to pick up the output: the less the project "bothers" a website, the more we will be tolerated.
Communication is primarily through tickets on the project's GitHub issue tracker. Many contributors are also present on OSM US Slack; in particular, we watch the #poi channel.
The data generated by our spiders is provided on our website and released under Creative Commons’ CC-0 waiver.
The spider software that produces this data (this repository) is licensed under the MIT license.