Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
Cullen Watson 59f739018a
Proxy support (#44)
* add proxy support

* return as data frame
2023-09-07 11:28:17 -05:00
.github/workflows Library Migration (#31) 2023-09-03 09:29:25 -05:00
src Proxy support (#44) 2023-09-07 11:28:17 -05:00
.gitignore Validation error (#35) 2023-09-03 20:05:31 -05:00
JobSpy_Demo.ipynb Proxy support (#44) 2023-09-07 11:28:17 -05:00
LICENSE docs: Create LICENSE 2023-08-26 18:47:48 -05:00
README.md Proxy support (#44) 2023-09-07 11:28:17 -05:00
poetry.lock Library Migration (#31) 2023-09-03 09:29:25 -05:00
pyproject.toml Proxy support (#44) 2023-09-07 11:28:17 -05:00

README.md

JobSpy is a simple, yet comprehensive, job scraping library.

Features

  • Scrapes job postings from LinkedIn, Indeed & ZipRecruiter simultaneously
  • Aggregates the job postings in a Pandas DataFrame

Video Guide for JobSpy


Installation

pip install python-jobspy

Python version >= 3.10 required

Usage

from jobspy import scrape_jobs
import pandas as pd

jobs: pd.DataFrame = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="software engineer",
    location="Dallas, TX",
    results_wanted=10,
    
    country_indeed='USA',  # only needed for indeed
    
    # use if you want to use a proxy
    # proxy="socks5://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
    # proxy="http://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
    # proxy="https://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)  # set to None to see full job url / desc

# 1: print the results
print(jobs)

# 2: display in Jupyter Notebook
# display(jobs)

# 3: write to .csv
# jobs.to_csv('jobs.csv', index=False)

Output

SITE           TITLE                             COMPANY_NAME      CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    150000      200000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    110000      135000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rain's mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...

Parameters for scrape_jobs()

Required
├── site_name (List[enum]): linkedin, zip_recruiter, indeed
└── search_term (str)
Optional
├── location (str)
├── distance (int): in miles
├── job_type (enum): fulltime, parttime, internship, contract
├── is_remote (bool)
├── results_wanted (int): number of job results to retrieve for each site specified in 'site_name'
├── easy_apply (bool): filters for jobs that are hosted on LinkedIn
├── country_indeed (enum): filters the country on Indeed
└── proxy (str): proxy URL in the form shown under Usage (socks5://, http:// or https://)
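
As a rough illustration of how the required and optional parameters fit together, the sketch below validates a set of arguments before they are handed to scrape_jobs(). The helper build_search_kwargs is hypothetical and not part of JobSpy; the keyword names follow the Usage example, and the allowed values are copied from the list above. The actual scrape_jobs() call is left commented out because it needs network access.

```python
from typing import Any

# Allowed values copied from the parameter list in this README.
SITES = {"linkedin", "zip_recruiter", "indeed"}
JOB_TYPES = {"fulltime", "parttime", "internship", "contract"}


def build_search_kwargs(
    site_name: list[str],
    search_term: str,
    **optional: Any,
) -> dict[str, Any]:
    """Hypothetical helper (not part of JobSpy): check arguments up front."""
    unknown = set(site_name) - SITES
    if not site_name or unknown:
        raise ValueError(f"unsupported site(s): {sorted(unknown) or 'none given'}")
    if not search_term.strip():
        raise ValueError("search_term must not be empty")

    job_type = optional.get("job_type")
    if job_type is not None and job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job_type: {job_type!r}")

    # Indeed is the only board that needs an explicit country.
    if "indeed" in site_name and optional.get("country_indeed") is None:
        raise ValueError("country_indeed is required when searching Indeed")

    # Drop options left as None so the library's own defaults apply.
    cleaned = {key: value for key, value in optional.items() if value is not None}
    return {"site_name": site_name, "search_term": search_term, **cleaned}


kwargs = build_search_kwargs(
    ["indeed", "linkedin"],
    "software engineer",
    location="Dallas, TX",
    distance=25,          # miles
    job_type="fulltime",
    is_remote=None,       # left unset on purpose
    results_wanted=10,
    country_indeed="USA",
)
# jobs = scrape_jobs(**kwargs)  # requires python-jobspy and network access
```

Failing early on a misspelled site or job type avoids spending requests on a search that cannot succeed.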

JobPost Schema

JobPost
├── title (str)
├── company (str)
├── job_url (str)
├── location (object)
│   ├── country (str)
│   ├── city (str)
│   └── state (str)
├── description (str)
├── job_type (enum): fulltime, parttime, internship, contract
├── compensation (object)
│   ├── interval (enum): yearly, monthly, weekly, daily, hourly
│   ├── min_amount (int)
│   ├── max_amount (int)
│   └── currency (enum)
└── date_posted (date)
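
For readers who prefer code to a tree, the schema above can be mirrored with plain dataclasses. This is only a sketch of the shape, not JobSpy's actual model classes: the class names, which fields are optional, and the use of a plain string for currency are all assumptions. The sample values come from the LinkedIn row in the Output section.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class JobType(Enum):
    FULLTIME = "fulltime"
    PARTTIME = "parttime"
    INTERNSHIP = "internship"
    CONTRACT = "contract"


class Interval(Enum):
    YEARLY = "yearly"
    MONTHLY = "monthly"
    WEEKLY = "weekly"
    DAILY = "daily"
    HOURLY = "hourly"


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: Interval
    min_amount: Optional[int] = None
    max_amount: Optional[int] = None
    currency: Optional[str] = None  # assumption: the schema only says "enum"


@dataclass
class JobPost:
    title: str
    company: str
    job_url: str
    location: Location
    description: Optional[str] = None
    job_type: Optional[JobType] = None
    compensation: Optional[Compensation] = None
    date_posted: Optional[date] = None


# Values taken from the LinkedIn row of the sample output.
post = JobPost(
    title="Software Engineer - Early Career",
    company="Lockheed Martin",
    job_url="https://www.linkedin.com/jobs/view/3693012711",
    location=Location(city="Sunnyvale", state="CA"),
    job_type=JobType.FULLTIME,
    compensation=Compensation(interval=Interval.YEARLY),
)
```

The nested location and compensation objects correspond to the flattened CITY, STATE, INTERVAL, MIN_AMOUNT and MAX_AMOUNT columns in the sample output.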

Supported Countries for Job Searching

LinkedIn

LinkedIn searches globally & uses only the location parameter

ZipRecruiter

ZipRecruiter searches for jobs in US/Canada & uses only the location parameter

Indeed

For Indeed, the country_indeed parameter is required. Additionally, use the location parameter and include the city or state if necessary.

You can specify the following countries when searching on Indeed (use the exact name):

Argentina Australia Austria Bahrain
Belgium Brazil Canada Chile
China Colombia Costa Rica Czech Republic
Denmark Ecuador Egypt Finland
France Germany Greece Hong Kong
Hungary India Indonesia Ireland
Israel Italy Japan Kuwait
Luxembourg Malaysia Mexico Morocco
Netherlands New Zealand Nigeria Norway
Oman Pakistan Panama Peru
Philippines Poland Portugal Qatar
Romania Saudi Arabia Singapore South Africa
South Korea Spain Sweden Switzerland
Taiwan Thailand Turkey Ukraine
United Arab Emirates UK USA Uruguay
Venezuela Vietnam
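
Because Indeed expects the exact country name, it can help to check user input against the list before searching. The sketch below is one possible way to do that; normalize_indeed_country is a hypothetical helper, not a JobSpy function, and the tuple simply repeats the names listed above.

```python
# Country names copied from the list in this README.
INDEED_COUNTRIES = (
    "Argentina", "Australia", "Austria", "Bahrain",
    "Belgium", "Brazil", "Canada", "Chile",
    "China", "Colombia", "Costa Rica", "Czech Republic",
    "Denmark", "Ecuador", "Egypt", "Finland",
    "France", "Germany", "Greece", "Hong Kong",
    "Hungary", "India", "Indonesia", "Ireland",
    "Israel", "Italy", "Japan", "Kuwait",
    "Luxembourg", "Malaysia", "Mexico", "Morocco",
    "Netherlands", "New Zealand", "Nigeria", "Norway",
    "Oman", "Pakistan", "Panama", "Peru",
    "Philippines", "Poland", "Portugal", "Qatar",
    "Romania", "Saudi Arabia", "Singapore", "South Africa",
    "South Korea", "Spain", "Sweden", "Switzerland",
    "Taiwan", "Thailand", "Turkey", "Ukraine",
    "United Arab Emirates", "UK", "USA", "Uruguay",
    "Venezuela", "Vietnam",
)

# Lower-cased name -> exact spelling expected by country_indeed.
_LOOKUP = {country.lower(): country for country in INDEED_COUNTRIES}


def normalize_indeed_country(name: str) -> str:
    """Hypothetical helper: map loose user input to the exact listed name."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    if key not in _LOOKUP:
        raise ValueError(f"{name!r} is not in the Indeed country list")
    return _LOOKUP[key]


country = normalize_indeed_country("  new   zealand ")
```

The returned string can then be passed as country_indeed.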

Frequently Asked Questions


Q: Encountering issues with your queries?
A: Try reducing the number of results_wanted and/or broadening the filters. If problems persist, submit an issue.


Q: Received a response code 429?
A: This indicates that you have been blocked by the job board site for sending too many requests. Currently, ZipRecruiter is particularly aggressive with blocking. We recommend:

  • Waiting a few seconds between requests.
  • Trying a VPN to change your IP address.

Note: Proxy support is available through the proxy parameter of scrape_jobs(); see the commented-out examples under Usage.
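
The advice to wait between requests can be sketched as a small retry helper. This version uses exponential backoff with jitter, which goes a step beyond a fixed pause of a few seconds. RateLimited is a hypothetical stand-in for however a 429 surfaces in your own code, since this README does not say which exception JobSpy raises; call_with_backoff and flaky_fetch are likewise illustrative names only.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response from a job board."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), waiting longer after each rate-limit failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Wait 2s, 4s, 8s, ... plus up to 1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")


# Demonstration with a fake fetch that is blocked twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited("429 Too Many Requests")
    return "ok"


delays: list[float] = []
result = call_with_backoff(flaky_fetch, sleep=delays.append)  # records delays instead of sleeping
```

In real use, fetch would wrap a scrape_jobs() call, and the except clause would catch whatever error your setup actually reports for a 429.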