Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
Cullen Watson 286b9e1256 chore: version number 2023-09-21 20:28:57 -05:00

JobSpy is a simple, yet comprehensive, job scraping library.

Not technical? Try out the web scraping tool on our site at usejobspy.com.

Looking to build a data-focused software product? Book a call to work with us.

Check out another project we wrote: HomeHarvest, a Python package for real estate scraping.

Features

  • Scrapes job postings from LinkedIn, Indeed & ZipRecruiter simultaneously
  • Aggregates the job postings in a Pandas DataFrame
  • Proxy support (HTTP/S, SOCKS)

Video Guide for JobSpy - Updated for release v1.1.3

Installation

pip install --upgrade python-jobspy

Python version >= 3.10 required

Usage

from jobspy import scrape_jobs
import pandas as pd

jobs: pd.DataFrame = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="software engineer",
    location="Dallas, TX",
    results_wanted=10,
    
    country_indeed='USA',  # only needed for indeed
    
    # use if you want to use a proxy (3 types)
    # proxy="socks5://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
    # proxy="http://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
    # proxy="https://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
)

# formatting for pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)  # set to None to see full job url / desc

#1 display in Jupyter Notebook (1. pip install jupyter 2. jupyter notebook)
display(jobs)

#2 output to console
#print(jobs)

#3 output to .csv
#jobs.to_csv('jobs.csv', index=False)

Output

SITE           TITLE                             COMPANY_NAME      CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rains mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...

Parameters for scrape_jobs()

Required
├── site_name (List[enum]): linkedin, zip_recruiter, indeed
└── search_term (str)
Optional
├── location (str)
├── distance (int): in miles
├── job_type (enum): fulltime, parttime, internship, contract
├── proxy (str): in format 'http://user:pass@host:port'; https and socks5 URLs are also accepted
├── is_remote (bool)
├── results_wanted (int): number of job results to retrieve for each site specified in 'site_name'
├── easy_apply (bool): filters for jobs that are hosted on LinkedIn
└── country_indeed (enum): filters the country on Indeed (see below for correct spelling)
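
As a rough illustration of how these parameters fit together, the sketch below assembles the keyword arguments and checks the enum-like values before any request is sent. `build_scrape_kwargs` is a hypothetical helper, not part of JobSpy; the parameter names and allowed values are taken from the Usage example and the list above.

```python
# Hypothetical helper (not part of JobSpy): checks the documented enum-like
# values and returns keyword arguments suitable for scrape_jobs().
SITES = {"linkedin", "zip_recruiter", "indeed"}
JOB_TYPES = {"fulltime", "parttime", "internship", "contract"}


def build_scrape_kwargs(site_name, search_term, **optional):
    """Return a dict of keyword arguments for scrape_jobs()."""
    unknown_sites = set(site_name) - SITES
    if unknown_sites:
        raise ValueError(f"unsupported site(s): {sorted(unknown_sites)}")

    job_type = optional.get("job_type")
    if job_type is not None and job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job_type: {job_type!r}")

    # The README states that country_indeed is required when searching Indeed.
    if "indeed" in site_name and "country_indeed" not in optional:
        raise ValueError("country_indeed is required when 'indeed' is selected")

    return {"site_name": list(site_name), "search_term": search_term, **optional}


kwargs = build_scrape_kwargs(
    ["indeed", "linkedin"],
    "software engineer",
    location="Dallas, TX",
    distance=25,          # miles
    job_type="fulltime",
    is_remote=False,
    results_wanted=10,    # per site
    country_indeed="USA",
)

# With python-jobspy installed, the search itself would then be:
# from jobspy import scrape_jobs
# jobs = scrape_jobs(**kwargs)
```

Validating locally like this is only a convenience: it surfaces a misspelled site or job type before any job board is contacted.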

JobPost Schema

JobPost
├── title (str)
├── company (str)
├── job_url (str)
├── location (object)
│   ├── country (str)
│   ├── city (str)
│   └── state (str)
├── description (str)
├── job_type (enum): fulltime, parttime, internship, contract
├── compensation (object)
│   ├── interval (enum): yearly, monthly, weekly, daily, hourly
│   ├── min_amount (int)
│   ├── max_amount (int)
│   └── currency (enum)
└── date_posted (date)
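
For readers who prefer code to a tree, the schema above can be mirrored with standard-library dataclasses. This is only a sketch: the class names follow the tree, but which fields are optional is an assumption (inferred from the `None` values in the sample output), and the library's own model classes may be defined differently.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class JobType(Enum):
    FULL_TIME = "fulltime"
    PART_TIME = "parttime"
    INTERNSHIP = "internship"
    CONTRACT = "contract"


class Interval(Enum):
    YEARLY = "yearly"
    MONTHLY = "monthly"
    WEEKLY = "weekly"
    DAILY = "daily"
    HOURLY = "hourly"


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: Interval
    min_amount: Optional[int] = None
    max_amount: Optional[int] = None
    currency: Optional[str] = None  # the schema only says "enum"; values not listed


@dataclass
class JobPost:
    title: str
    company: str
    job_url: str
    location: Location
    description: Optional[str] = None
    job_type: Optional[JobType] = None      # assumption: may be missing
    compensation: Optional[Compensation] = None
    date_posted: Optional[date] = None


# Example instance loosely based on one row of the sample output.
post = JobPost(
    title="Software Developer",
    company="TEKsystems",
    job_url="https://example.com/jobs/1",  # placeholder URL
    location=Location(country="USA", city="Phoenix", state="AZ"),
    job_type=JobType("fulltime"),
    compensation=Compensation(Interval("hourly"), min_amount=65, max_amount=75),
)
```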

Exceptions

The following exceptions may be raised when using JobSpy:

  • LinkedInException
  • IndeedException
  • ZipRecruiterException
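
One way to keep a failure on one job board from discarding results from the others is to query each site separately and catch the site-specific exceptions. The sketch below does that; `scrape_each_site` is a hypothetical helper, and the import path shown in the trailing comment is an assumption that should be checked against the installed version.

```python
def scrape_each_site(scrape_fn, sites, site_errors, **search_kwargs):
    """Call scrape_fn once per site so one failing board does not lose the rest.

    scrape_fn:   callable accepting the same keyword arguments as scrape_jobs()
    site_errors: tuple of exception classes to treat as per-site failures
    """
    results, failures = {}, {}
    for site in sites:
        try:
            results[site] = scrape_fn(site_name=[site], **search_kwargs)
        except site_errors as exc:
            failures[site] = exc  # keep the error for logging or a later retry
    return results, failures


# With python-jobspy installed, usage might look like this.
# The exception import path below is an assumption, not confirmed by the README.
# from jobspy import scrape_jobs
# from jobspy.scrapers.exceptions import (
#     LinkedInException, IndeedException, ZipRecruiterException,
# )
# results, failures = scrape_each_site(
#     scrape_jobs,
#     ["indeed", "linkedin", "zip_recruiter"],
#     (LinkedInException, IndeedException, ZipRecruiterException),
#     search_term="software engineer",
#     country_indeed="USA",
# )
```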

Supported Countries for Job Searching

LinkedIn

LinkedIn searches globally & uses only the location parameter.

ZipRecruiter

ZipRecruiter searches for jobs in US/Canada & uses only the location parameter.

Indeed

Indeed supports most countries, but the country_indeed parameter is required. Use the location parameter to narrow the search further if necessary, e.g. by city and state.

You can specify the following countries when searching on Indeed (use the exact name):

Argentina             Australia     Austria     Bahrain
Belgium               Brazil        Canada      Chile
China                 Colombia      Costa Rica  Czech Republic
Denmark               Ecuador       Egypt       Finland
France                Germany       Greece      Hong Kong
Hungary               India         Indonesia   Ireland
Israel                Italy         Japan       Kuwait
Luxembourg            Malaysia      Mexico      Morocco
Netherlands           New Zealand   Nigeria     Norway
Oman                  Pakistan      Panama      Peru
Philippines           Poland        Portugal    Qatar
Romania               Saudi Arabia  Singapore   South Africa
South Korea           Spain         Sweden      Switzerland
Taiwan                Thailand      Turkey      Ukraine
United Arab Emirates  UK            USA         Uruguay
Venezuela             Vietnam
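
Because the name has to match exactly, it can help to check a value against the list before starting a search. The sketch below is a hypothetical helper, not part of JobSpy; it treats the list above as authoritative and uses the standard-library `difflib` module to suggest near matches.

```python
import difflib

# Country names copied from the list above.
INDEED_COUNTRIES = (
    "Argentina", "Australia", "Austria", "Bahrain",
    "Belgium", "Brazil", "Canada", "Chile",
    "China", "Colombia", "Costa Rica", "Czech Republic",
    "Denmark", "Ecuador", "Egypt", "Finland",
    "France", "Germany", "Greece", "Hong Kong",
    "Hungary", "India", "Indonesia", "Ireland",
    "Israel", "Italy", "Japan", "Kuwait",
    "Luxembourg", "Malaysia", "Mexico", "Morocco",
    "Netherlands", "New Zealand", "Nigeria", "Norway",
    "Oman", "Pakistan", "Panama", "Peru",
    "Philippines", "Poland", "Portugal", "Qatar",
    "Romania", "Saudi Arabia", "Singapore", "South Africa",
    "South Korea", "Spain", "Sweden", "Switzerland",
    "Taiwan", "Thailand", "Turkey", "Ukraine",
    "United Arab Emirates", "UK", "USA", "Uruguay",
    "Venezuela", "Vietnam",
)


def check_indeed_country(name):
    """Return name unchanged if it is on the list, otherwise raise ValueError."""
    if name in INDEED_COUNTRIES:
        return name
    # Compare case-insensitively only to build a helpful hint.
    by_lower = {c.lower(): c for c in INDEED_COUNTRIES}
    hints = difflib.get_close_matches(name.lower(), list(by_lower), n=3, cutoff=0.6)
    suggestion = ", ".join(by_lower[h] for h in hints) or "no close match"
    raise ValueError(
        f"{name!r} is not a listed Indeed country (closest: {suggestion})"
    )


country = check_indeed_country("USA")  # pass the result as country_indeed
```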

Frequently Asked Questions


Q: Encountering issues with your queries?
A: Try reducing the number of results_wanted and/or broadening the filters. If problems persist, submit an issue.


Q: Received a response code 429?
A: This indicates that you have been blocked by the job board site for sending too many requests. Currently, LinkedIn is particularly aggressive with blocking. We recommend:

  • Waiting a few seconds between requests.
  • Trying a VPN or proxy to change your IP address.
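
The first recommendation can be sketched as a small retry wrapper that waits longer after each failed attempt. `call_with_backoff` is a hypothetical helper, not part of JobSpy, and which exception a blocked request raises is an assumption; pass whichever exception class you actually observe.

```python
import time


def call_with_backoff(fn, *, retries=3, base_delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**attempt, then retry.

    Re-raises the last exception once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... by default


# With python-jobspy installed, usage might look like this
# (assumes a blocked request surfaces as LinkedInException):
# jobs = call_with_backoff(
#     lambda: scrape_jobs(site_name=["linkedin"], search_term="software engineer"),
#     retry_on=(LinkedInException,),
# )
```

Backing off exponentially rather than by a fixed pause is a design choice: it keeps the first retry quick while spacing later attempts further apart.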

Q: Experiencing a "Segmentation fault: 11" on macOS Catalina?
A: This is caused by the tls_client dependency not supporting your architecture. Solutions and workarounds include:

  • Upgrading to a newer version of macOS
  • Reaching out to the maintainers of tls_client for a fix