This repository contains a small Flask web application that authenticates with LinkedIn, performs staff searches for specified companies, and provides CSV export of the results. It uses the `staffspy` and `linkedin_api` libraries.
- LinkedIn Login: Logs in to LinkedIn using credentials supplied by the user.
- Scraping Staff Profiles:
  - Searches for companies similar to a given target (using the `linkedin_api` library); a minimal sketch of this discovery step follows the feature list.
  - Scrapes staff profiles for either a single specified company or a list of discovered companies.
- Scraping Specific Users: Optionally scrape details for specific user IDs.
- CSV Export: Stores the scraped data in CSV files in the user's Downloads folder (or the corresponding path depending on the operating system).
- Simple Flask UI: A minimal HTML interface to control the scraping process.
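The company-discovery step can be illustrated with the unofficial `linkedin_api` client. This is only a sketch of the idea, not the project's exact code: the credentials are placeholders, and the search call and result field names used here are assumptions about how the app performs the lookup.

```python
from linkedin_api import Linkedin

# Authenticate with the unofficial LinkedIn API client (placeholder credentials).
api = Linkedin("you@example.com", "your-password")

# Look up companies matching a target keyword; the app uses a lookup like this
# to build its list of related companies to scrape (exact call is an assumption).
companies = api.search_companies("openai")

for company in companies[:10]:
    # Result fields vary; "name" is an assumption about the payload shape.
    print(company.get("name"), company)
```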
- Clone the Repository:

  ```bash
  git clone https://github.com/<YourGitHubUsername>/linkedin-flask-scraper.git
  cd linkedin-flask-scraper
  ```
- Create & Activate a Virtual Environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # On Unix systems
  # or on Windows:
  # venv\Scripts\activate
  ```
- Install the Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Your `requirements.txt` might look like:

  ```text
  Flask
  staffspy
  requests
  beautifulsoup4
  linkedin_api
  ```
- Set up LinkedIn credentials (if needed):
  - By default, the code attempts to log into LinkedIn with the email and password provided in the UI.
  - If you encounter captchas often, you can configure a solver API key (e.g., for CapSolver or 2Captcha) by passing it to the `LinkedInAccount` constructor (see the Captcha Solver example in the configuration section below).
- Start the Flask server:

  ```bash
  python app.py
  ```

- The server will start on port `5001` by default, and the application will open automatically in your browser at `http://127.0.0.1:5001`.
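The automatic browser launch typically looks something like the snippet below. This is a sketch of how `app.py` might start the server, not the project's actual code; the one-second delay and the use of a timer thread are assumptions.

```python
import threading
import webbrowser

from flask import Flask

app = Flask(__name__)

def open_browser():
    # Open the local URL shortly after the server starts (delay is an assumption).
    webbrowser.open("http://127.0.0.1:5001")

if __name__ == "__main__":
    threading.Timer(1.0, open_browser).start()
    app.run(port=5001)
```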
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Splash page with a button to start. |
| `/login` | GET/POST | Displays the login form. After a POST with valid credentials, redirects to `/scrape`. |
| `/scrape` | GET/POST | Displays a form for specifying the search parameters. On POST, executes the scraping and redirects to `/results/<rows_saved>`. |
| `/results/<int:rows_saved>` | GET | Shows how many rows of data were saved as a result of the scraping operation. |
| `/shutdown` | POST | Shuts down the Flask server (handy for local usage). |
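The route layout above can be sketched roughly as follows. This is a simplified illustration of the endpoint structure rather than the project's actual handlers: the template names, form handling, and the `run_scrape` helper are assumptions.

```python
from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)

@app.route("/")
def index():
    # Splash page with a button that links to /login (template name is an assumption).
    return render_template("index.html")

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        # Credentials would be validated against LinkedIn here.
        return redirect(url_for("scrape"))
    return render_template("login.html")

@app.route("/scrape", methods=["GET", "POST"])
def scrape():
    if request.method == "POST":
        rows_saved = run_scrape(request.form)  # hypothetical helper that does the scraping
        return redirect(url_for("results", rows_saved=rows_saved))
    return render_template("scrape.html")

@app.route("/results/<int:rows_saved>")
def results(rows_saved):
    return render_template("results.html", rows_saved=rows_saved)

@app.route("/shutdown", methods=["POST"])
def shutdown():
    # Older Werkzeug dev servers expose this shutdown hook; newer versions may not.
    shutdown_func = request.environ.get("werkzeug.server.shutdown")
    if shutdown_func:
        shutdown_func()
    return "Server shutting down..."
```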
Flow:
- Visit the main page (`/`).
- Click “Login” to go to `/login`.
- Enter your LinkedIn credentials; you are then redirected to `/scrape`.
- Enter your search parameters:
  - Company Name
  - Industry & Limit (optional)
  - Search Term (role or position, e.g., "engineer", "marketing", etc.)
  - Location
  - Whether to gather extra profile data
  - Max results to scrape
  - User IDs (comma-separated, if you have any specific LinkedIn user IDs to scrape directly)
- After hitting Submit, you’ll be redirected to the `/results/<int:rows_saved>` page indicating how many rows were saved; a sketch of the underlying scrape call follows this list.
- CSV files will be saved in your Downloads folder (or the corresponding path if on Windows).
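Under the hood, the scrape step relies on `staffspy`. The snippet below is a minimal sketch of that call chain based on staffspy's public API; the example values, the exact parameters the app passes, and the CSV filename are assumptions.

```python
from pathlib import Path

from staffspy import LinkedInAccount

# Log in (or reuse a cached session file).
account = LinkedInAccount(
    username="you@example.com",   # placeholder credentials
    password="your-password",
    session_file="session.pkl",
    log_level=1,
)

# Scrape staff for one company; staffspy returns a pandas DataFrame.
staff = account.scrape_staff(
    company_name="openai",        # example value
    search_term="engineer",
    location="london",
    extra_profile_data=True,      # gather extra profile data
    max_results=50,
)

# Save to the Downloads folder, mirroring the app's CSV export.
out_file = Path.home() / "Downloads" / "openai_staff.csv"
staff.to_csv(out_file, index=False)
print(f"Saved {len(staff)} rows to {out_file}")
```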
The primary options to configure are:
- LinkedIn Credentials: You will be prompted to enter your email and password.
- Captcha Solver (Optional):
  If you need to solve captchas automatically, pass a solver API key to the `LinkedInAccount` constructor:

  ```python
  account = LinkedInAccount(
      username=email,
      password=password,
      solver_api_key="YOUR_CAPSOLVER_API_KEY",
      solver_service=SolverType.CAPSOLVER,
      session_file=str(session_file),
      log_level=1,
  )
  ```
- Download Path:
  By default, CSV files are saved to your system’s Downloads folder. If you need to customize this, adjust the `get_downloads_path()` function.
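A typical implementation of such a helper looks like the sketch below; this is an assumption about how `get_downloads_path()` resolves the folder, not the project's exact code.

```python
from pathlib import Path

def get_downloads_path() -> Path:
    """Return the user's Downloads folder on Windows, macOS, and Linux."""
    # Path.home() resolves to the user profile on all three platforms,
    # and "Downloads" is the common default subfolder name.
    return Path.home() / "Downloads"
```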
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as needed.
- Flask - Lightweight Python web framework.
- staffspy - Library used for staff scraping.
- linkedin_api - Library for interacting with LinkedIn unofficially.
- BeautifulSoup - Library for parsing HTML and XML documents.
- requests - HTTP library for Python.
Enjoy Scraping Responsibly!
Always comply with LinkedIn’s terms of service and scrape responsibly to avoid blocking.