A Python-based tool for scraping company information from Crunchbase.

afk-procrastinator/crunchbase-scraper

Crunchbase Company Scraper

Features

  • 🔐 Automatic and manual login support
  • 📋 Batch scraping from company list
  • 🤖 Anti-detection measures with randomized delays
  • 💾 CSV export with detailed company information
  • 💱 Currency conversion
  • 🌐 Proxy support via Selenium
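
Randomized delays are a common way to avoid a fixed, machine-like request rhythm. The helper below is a minimal sketch, not the project's code: the name `random_delay` and its 2 to 5 second defaults are hypothetical, and `sleep` is injectable so the function can be tested without actually waiting.

```python
import random
import time
from typing import Callable


def random_delay(
    min_seconds: float = 2.0,
    max_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause for a random duration in [min_seconds, max_seconds] and return it.

    The default bounds are illustrative assumptions, not values from the project.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("require 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)  # uniform jitter between the bounds
    sleep(delay)
    return delay
```

In a Selenium loop it would be called between page loads, for example right after each `driver.get(url)`.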

Data Points Collected

  • Company name and legal name
  • About/Description
  • Funding information
  • Location
  • Employee count
  • Company type (Public/Private)
  • Website
  • Year founded
  • Company ranking
  • Acquisitions count
  • Investments count
  • Exits count
  • Stock symbol
  • Operating status
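
One way to hold these data points is a dataclass with one optional field per item, roughly the kind of thing `src/models.py` might contain. The class name `Company`, the field names, and the field types are assumptions for illustration, not the project's actual model; `None` marks a value that was not found on the page.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Company:
    """One scraped profile; None means the data point was missing."""
    name: str
    legal_name: Optional[str] = None
    description: Optional[str] = None
    funding: Optional[str] = None          # kept as displayed text, e.g. "$10M"
    location: Optional[str] = None
    employee_count: Optional[str] = None   # may be a range such as "11-50"
    company_type: Optional[str] = None     # "Public" or "Private"
    website: Optional[str] = None
    year_founded: Optional[int] = None
    rank: Optional[int] = None
    acquisitions: Optional[int] = None
    investments: Optional[int] = None
    exits: Optional[int] = None
    stock_symbol: Optional[str] = None
    operating_status: Optional[str] = None


def csv_header() -> List[str]:
    """Column names, in declaration order, for a CSV export."""
    return [f.name for f in fields(Company)]
```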

Prerequisites

  • Python 3.8+
  • Chrome browser
  • Crunchbase account

Installation

  1. Clone the repository:
git clone https://github.com/afk-procrastinator/crunchbase-scraper
cd crunchbase-scraper
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up environment variables:
cp .env.template .env

Edit .env with your Crunchbase credentials:

CRUNCHBASE_EMAIL=your-email@example.com
CRUNCHBASE_PASSWORD=your-password
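
The project may use a library such as python-dotenv to read `.env`. As a dependency-free stand-in, the sketch below parses simple `KEY=VALUE` lines and lets real environment variables take precedence over the file. The helper names are hypothetical, and `CRUNCHBASE_EMAIL` is assumed as the name of the email variable.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: KEY="value" -> value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_credentials(path: str = ".env") -> Tuple[str, str]:
    """Return (email, password); real environment variables override the file."""
    file_values = load_env_file(path) if Path(path).exists() else {}
    # CRUNCHBASE_EMAIL is an assumed variable name
    email = os.environ.get("CRUNCHBASE_EMAIL", file_values.get("CRUNCHBASE_EMAIL", ""))
    password = os.environ.get("CRUNCHBASE_PASSWORD", file_values.get("CRUNCHBASE_PASSWORD", ""))
    if not email or not password:
        raise RuntimeError("CRUNCHBASE_EMAIL and CRUNCHBASE_PASSWORD must be set")
    return email, password
```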

Usage

  1. Create a list of companies to scrape in company_list.txt, one company name per line:
Company Name 1
Company Name 2
  2. Run the scraper:
python main.py
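
A minimal sketch of reading `company_list.txt`, assuming one company name per line. `read_company_list` is a hypothetical helper; skipping blank lines and dropping duplicate names are design choices of this sketch, not documented behaviour of `main.py`.

```python
from pathlib import Path
from typing import List


def read_company_list(path: str = "company_list.txt") -> List[str]:
    """Return company names from a newline-separated file.

    Surrounding whitespace and blank lines are ignored; duplicates are dropped
    while keeping the order of first appearance.
    """
    seen = set()
    companies: List[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            companies.append(name)
    return companies
```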

The script will:

  • Log in to Crunchbase
  • Process each company in the list
  • Save results to companies.csv
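
The batch step can be pictured as a loop that scrapes each name and writes the successes with `csv.DictWriter`. This is a sketch, not `main.py` itself: `scrape_all` is hypothetical, and `scrape_one` stands in for the real per-company scraper, assumed to return a dict of data points.

```python
import csv
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def scrape_all(
    companies: Iterable[str],
    scrape_one: Callable[[str], Row],
    out_path: str = "companies.csv",
) -> List[str]:
    """Scrape each company, write one CSV row per success, return failed names."""
    rows: List[Row] = []
    failed: List[str] = []
    for name in companies:
        try:
            rows.append(scrape_one(name))
        except Exception:  # one bad profile should not stop the batch
            failed.append(name)
    # Union of keys, in first-seen order, so rows missing a data point still fit.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
    return failed
```

Collecting failed names instead of aborting lets a long batch finish; the returned list could be saved for a second pass.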

Project Structure

├── src/
│   ├── auth.py         # Authentication handling
│   ├── models.py       # Data models
│   ├── scraper.py      # Core scraping logic
│   ├── selectors.py    # CSS selectors
│   └── utils.py        # Utility functions
├── main.py             # Entry point
├── requirements.txt    # Dependencies
├── .env.template       # Environment template
└── company_list.txt    # Input companies

Error Handling

  • Automatic retry logic for failed requests
  • Manual login fallback if automatic login fails
  • Graceful handling of missing data points
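
Retries and missing-data handling are often written as small wrappers. The sketch below assumes exponential backoff with jitter, which the project does not specify; `retry` and `safe_get` are hypothetical names. With Selenium, `safe_get` would typically wrap a `find_element(...).text` call so that a missing element yields `None` instead of an exception.

```python
import functools
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    attempts: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a call with exponential backoff plus random jitter.

    After failure n it waits base_delay * 2**(n - 1) seconds plus up to half
    that again; the final failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    delay = base_delay * 2 ** (attempt - 1)
                    sleep(delay + random.uniform(0, delay / 2))
            raise AssertionError("unreachable")

        return wrapper

    return decorator


def safe_get(extract: Callable[[], T], default: Optional[T] = None) -> Optional[T]:
    """Return extract(), or `default` if it raises (e.g. a missing page element)."""
    try:
        return extract()
    except Exception:
        return default
```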

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Disclaimer

This tool is for educational purposes only. Please review and comply with Crunchbase's terms of service before use.
