This repository contains a small Flask web application that authenticates with LinkedIn, performs staff searches for specified companies, and provides CSV export of the results. It uses the `staffspy` and `linkedin_api` libraries.
- LinkedIn Login: Logs in to LinkedIn using credentials supplied by the user.
- Scraping Staff Profiles:
  - Searches for companies similar to a given target (using the `linkedin_api` library); a minimal sketch of this discovery step follows the feature list.
  - Scrapes staff profiles for either a single specified company or a list of discovered companies.
- Scraping Specific Users: Optionally scrape details for specific user IDs.
- CSV Export: Stores the scraped data in CSV files in the user's Downloads folder (or the corresponding path depending on the operating system).
- Simple Flask UI: A minimal HTML interface to control the scraping process.
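The company-discovery step can be illustrated with the unofficial `linkedin_api` client. This is only a sketch of the idea, not the project's exact code: the credentials are placeholders, and the search call and result field names used here are assumptions about how the app performs the lookup.

```python
from linkedin_api import Linkedin

# Authenticate with the unofficial LinkedIn API client (placeholder credentials).
api = Linkedin("you@example.com", "your-password")

# Look up companies matching a target keyword; the app uses a lookup like this
# to build its list of related companies to scrape (exact call is an assumption).
companies = api.search_companies("openai")

for company in companies[:10]:
    # Result fields vary; "name" is an assumption about the payload shape.
    print(company.get("name"), company)
```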
- Clone the Repository:

  ```bash
  git clone https://github.com/<YourGitHubUsername>/linkedin-flask-scraper.git
  cd linkedin-flask-scraper
  ```
- Create & Activate a Virtual Environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # On Unix systems
  # or on Windows:
  # venv\Scripts\activate
  ```
- Install the Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Your `requirements.txt` might look like:

  ```text
  Flask
  staffspy
  requests
  beautifulsoup4
  linkedin_api
  ```
- Set up LinkedIn credentials (if needed):
  - By default, the code attempts to log into LinkedIn with the email and password provided in the UI.
  - If you encounter captchas often, you can configure a solver API key (e.g., for CapSolver or 2Captcha) by passing it to the `LinkedInAccount` constructor (see the Captcha Solver example in the configuration section below).
- Start the Flask server:

  ```bash
  python app.py
  ```

- The server will start on port `5001` by default, and the application will open automatically in your browser at `http://127.0.0.1:5001`.
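The automatic browser launch typically looks something like the snippet below. This is a sketch of how `app.py` might start the server, not the project's actual code; the one-second delay and the use of a timer thread are assumptions.

```python
import threading
import webbrowser

from flask import Flask

app = Flask(__name__)

def open_browser():
    # Open the local URL shortly after the server starts (delay is an assumption).
    webbrowser.open("http://127.0.0.1:5001")

if __name__ == "__main__":
    threading.Timer(1.0, open_browser).start()
    app.run(port=5001)
```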
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Splash page with a button to start. |
| `/login` | GET/POST | Displays the login form. After a POST with valid credentials, redirects to `/scrape`. |
| `/scrape` | GET/POST | Displays a form for specifying the search parameters. On POST, executes the scraping and redirects to `/results/<rows_saved>`. |
| `/results/<int:rows_saved>` | GET | Shows how many rows of data were saved as a result of the scraping operation. |
| `/shutdown` | POST | Shuts down the Flask server (handy for local usage). |
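The route layout above can be sketched roughly as follows. This is a simplified illustration of the endpoint structure rather than the project's actual handlers: the template names, form handling, and the `run_scrape` helper are assumptions.

```python
from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)

@app.route("/")
def index():
    # Splash page with a button that links to /login (template name is an assumption).
    return render_template("index.html")

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        # Credentials would be validated against LinkedIn here.
        return redirect(url_for("scrape"))
    return render_template("login.html")

@app.route("/scrape", methods=["GET", "POST"])
def scrape():
    if request.method == "POST":
        rows_saved = run_scrape(request.form)  # hypothetical helper that does the scraping
        return redirect(url_for("results", rows_saved=rows_saved))
    return render_template("scrape.html")

@app.route("/results/<int:rows_saved>")
def results(rows_saved):
    return render_template("results.html", rows_saved=rows_saved)

@app.route("/shutdown", methods=["POST"])
def shutdown():
    # Older Werkzeug dev servers expose this shutdown hook; newer versions may not.
    shutdown_func = request.environ.get("werkzeug.server.shutdown")
    if shutdown_func:
        shutdown_func()
    return "Server shutting down..."
```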
Flow:
- Visit the main page (`/`).
- Click “Login” to go to `/login`.
- Enter your LinkedIn credentials; you are then redirected to `/scrape`.
- Enter your search parameters:
  - Company Name
  - Industry & Limit (optional)
  - Search Term (role or position, e.g., "engineer", "marketing", etc.)
  - Location
  - Whether to gather extra profile data
  - Max results to scrape
  - User IDs (comma-separated, if you have any specific LinkedIn user IDs to scrape directly)
- After hitting Submit, you’ll be redirected to the `/results/<int:rows_saved>` page indicating how many rows were saved; a sketch of the underlying scrape call follows this list.
- CSV files will be saved in your Downloads folder (or the corresponding path if on Windows).
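Under the hood, the scrape step relies on `staffspy`. The snippet below is a minimal sketch of that call chain based on staffspy's public API; the example values, the exact parameters the app passes, and the CSV filename are assumptions.

```python
from pathlib import Path

from staffspy import LinkedInAccount

# Log in (or reuse a cached session file).
account = LinkedInAccount(
    username="you@example.com",   # placeholder credentials
    password="your-password",
    session_file="session.pkl",
    log_level=1,
)

# Scrape staff for one company; staffspy returns a pandas DataFrame.
staff = account.scrape_staff(
    company_name="openai",        # example value
    search_term="engineer",
    location="london",
    extra_profile_data=True,      # gather extra profile data
    max_results=50,
)

# Save to the Downloads folder, mirroring the app's CSV export.
out_file = Path.home() / "Downloads" / "openai_staff.csv"
staff.to_csv(out_file, index=False)
print(f"Saved {len(staff)} rows to {out_file}")
```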
The primary options to configure are:
- LinkedIn Credentials: You will be prompted to enter your email and password.
- Captcha Solver (Optional):
  If you need to solve captchas automatically, pass a solver API key to the `LinkedInAccount` constructor:

  ```python
  account = LinkedInAccount(
      username=email,
      password=password,
      solver_api_key="YOUR_CAPSOLVER_API_KEY",
      solver_service=SolverType.CAPSOLVER,
      session_file=str(session_file),
      log_level=1,
  )
  ```
- Download Path:
  By default, CSV files are saved to your system’s Downloads folder. If you need to customize this, adjust the `get_downloads_path()` function.
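A typical implementation of such a helper looks like the sketch below; this is an assumption about how `get_downloads_path()` resolves the folder, not the project's exact code.

```python
from pathlib import Path

def get_downloads_path() -> Path:
    """Return the user's Downloads folder on Windows, macOS, and Linux."""
    # Path.home() resolves to the user profile on all three platforms,
    # and "Downloads" is the common default subfolder name.
    return Path.home() / "Downloads"
```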
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as needed.
- Flask - Lightweight Python web framework.
- staffspy - Library used for staff scraping.
- linkedin_api - Library for interacting with LinkedIn unofficially.
- BeautifulSoup - Library for parsing HTML and XML documents.
- requests - HTTP library for Python.
Enjoy Scraping Responsibly!
Always comply with LinkedIn’s terms of service and scrape responsibly to avoid blocking.