<img src="https://github.com/cullenwatson/JobSpy/assets/78247585/ae185b7e-e444-4712-8bb9-fa97f53e896b" width="400">

**JobSpy** is a simple, yet comprehensive, job scraping library.

**Not technical?** Try out the web scraping tool on our site at [usejobspy.com](https://usejobspy.com).

*Looking to build a data-focused software product?* **[Book a call](https://bunsly.com/)** *to work with us.*

## Features
- Scrapes job postings from **LinkedIn**, **Indeed**, **Glassdoor**, & **ZipRecruiter** simultaneously
- Aggregates the job postings in a Pandas DataFrame
- Proxy support

[Video Guide for JobSpy](https://www.youtube.com/watch?v=RuP1HrAZnxs&pp=ygUgam9icyBzY3JhcGVyIGJvdCBsaW5rZWRpbiBpbmRlZWQ%3D) - Updated for release v1.1.3

![jobspy](https://github.com/cullenwatson/JobSpy/assets/78247585/ec7ef355-05f6-4fd3-8161-a817e31c5c57)

### Installation
```
pip install -U python-jobspy
```

_Python version >= [3.10](https://www.python.org/downloads/release/python-3100/) required_

### Usage
```python
import csv

from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "glassdoor"],
    search_term="software engineer",
    location="Dallas, TX",
    results_wanted=20,
    hours_old=72,  # only LinkedIn/Indeed are hour-specific; the others round up to days
    country_indeed="USA",  # only needed for Indeed / Glassdoor
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)  # or jobs.to_excel("jobs.xlsx", index=False)
```

### Output
```
SITE           TITLE                             COMPANY           CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rains mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...
```

### Parameters for `scrape_jobs()`
```plaintext
Optional
├── site_name (list|str): linkedin, zip_recruiter, indeed, glassdoor (default is all four)
├── search_term (str)
├── location (str)
├── distance (int): in miles, default 50
├── job_type (str): fulltime, parttime, internship, contract
├── proxy (str): in format 'http://user:pass@host:port'
├── is_remote (bool)
├── results_wanted (int): number of job results to retrieve for each site specified in 'site_name'
├── easy_apply (bool): filters for jobs that are hosted on the job board site (LinkedIn & Indeed do not allow pairing this with hours_old)
├── linkedin_fetch_description (bool): fetches full description for LinkedIn (slower)
├── linkedin_company_ids (list[int]): searches for LinkedIn jobs with specific company ids
├── description_format (str): markdown, html (format of the job descriptions; default is markdown)
├── country_indeed (str): filters the country on Indeed (see below for correct spelling)
├── offset (int): starts the search from an offset (e.g. 25 will start the search from the 25th result)
├── hours_old (int): filters jobs by the number of hours since the job was posted (ZipRecruiter and Glassdoor round up to the next day; if you use this on Indeed, it will not filter by job_type/is_remote/easy_apply)
├── verbose (int) {0, 1, 2}: controls the verbosity of the runtime printouts (0 prints only errors, 1 is errors+warnings, 2 is all logs; default is 2)
└── hyperlinks (bool): whether to turn `job_url`s into hyperlinks (default is false)
```
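Several of the parameters above interact with `hours_old`. The sketch below is one way to flag those combinations before running a search. `check_filters` is a hypothetical helper, not part of JobSpy, and it only restates the caveats listed above; reading "round up to the next day" as a ceiling to whole days is an assumption.

```python
def check_filters(site_name, hours_old=None, easy_apply=False,
                  job_type=None, is_remote=False):
    """Return warnings for filter combinations flagged in the parameter list.

    Hypothetical helper (not part of JobSpy); it does not call any job board.
    """
    sites = [site_name] if isinstance(site_name, str) else list(site_name)
    warnings = []
    if hours_old is None:
        return warnings  # every caveat in the parameter list involves hours_old

    # LinkedIn does not allow pairing easy_apply with hours_old.
    if "linkedin" in sites and easy_apply:
        warnings.append("linkedin: easy_apply cannot be paired with hours_old")

    # On Indeed, using hours_old means job_type/is_remote/easy_apply are not applied.
    if "indeed" in sites:
        ignored = [name for name, value in (("job_type", job_type),
                                            ("is_remote", is_remote),
                                            ("easy_apply", easy_apply)) if value]
        if ignored:
            warnings.append("indeed: hours_old is set, so these filters are not applied: "
                            + ", ".join(ignored))

    # ZipRecruiter and Glassdoor work in whole days (assumed: ceiling of hours / 24).
    for site in ("zip_recruiter", "glassdoor"):
        if site in sites and hours_old % 24:
            days = -(-hours_old // 24)
            warnings.append(f"{site}: hours_old={hours_old} is rounded up to {days} day(s)")
    return warnings


for message in check_filters(["indeed", "linkedin"], hours_old=72, easy_apply=True):
    print(message)
```

The helper returns plain strings so the caller can decide whether to log them, drop a site, or abort.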
### JobPost Schema
```plaintext
JobPost
├── title (str)
├── company (str)
├── company_url (str)
├── job_url (str)
├── location (object)
│   ├── country (str)
│   ├── city (str)
│   └── state (str)
├── description (str)
├── job_type (str): fulltime, parttime, internship, contract
├── compensation (object)
│   ├── interval (str): yearly, monthly, weekly, daily, hourly
│   ├── min_amount (int)
│   ├── max_amount (int)
│   └── currency (enum)
├── date_posted (date)
├── emails (str)
└── is_remote (bool)

Indeed specific
├── company_country (str)
├── company_addresses (str)
├── company_industry (str)
├── company_employees_label (str)
├── company_revenue_label (str)
├── company_description (str)
├── ceo_name (str)
├── ceo_photo_url (str)
├── logo_photo_url (str)
└── banner_photo_url (str)
```
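Because `compensation.interval` varies between postings (yearly, monthly, weekly, daily, hourly), `min_amount` and `max_amount` are not directly comparable across rows. A minimal sketch of converting them to a yearly figure follows. The multipliers are assumed full-time equivalents chosen for illustration; JobSpy does not define them, and `annualize` is a hypothetical helper.

```python
# Assumed full-time equivalents (40 h/week, 52 weeks/year); not defined by JobSpy.
PERIODS_PER_YEAR = {
    "yearly": 1,
    "monthly": 12,
    "weekly": 52,
    "daily": 260,
    "hourly": 2080,
}


def annualize(amount, interval):
    """Convert an amount paid per `interval` into a yearly amount.

    Returns None when the amount is missing or the interval is not recognized.
    """
    if amount is None or interval not in PERIODS_PER_YEAR:
        return None
    return amount * PERIODS_PER_YEAR[interval]


print(annualize(65, "hourly"))      # 135200
print(annualize(130000, "yearly"))  # 130000
```

With a DataFrame of results, the same function could be applied row by row to the `min_amount`/`max_amount` and `interval` columns.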
## Supported Countries for Job Searching
### **LinkedIn**
LinkedIn searches globally & uses only the `location` parameter.

### **ZipRecruiter**

ZipRecruiter searches for jobs in **US/Canada** & uses only the `location` parameter.

### **Indeed / Glassdoor**

Indeed & Glassdoor support most countries, but the `country_indeed` parameter is required. Additionally, use the `location` parameter to narrow down the location, e.g. city & state, if necessary.

You can specify the following countries when searching on Indeed (use the exact name, * indicates support for Glassdoor):

|                      |              |            |                |
|----------------------|--------------|------------|----------------|
| Argentina            | Australia*   | Austria*   | Bahrain        |
| Belgium*             | Brazil*      | Canada*    | Chile          |
| China                | Colombia     | Costa Rica | Czech Republic |
| Denmark              | Ecuador      | Egypt      | Finland        |
| France*              | Germany*     | Greece     | Hong Kong*     |
| Hungary              | India*       | Indonesia  | Ireland*       |
| Israel               | Italy*       | Japan      | Kuwait         |
| Luxembourg           | Malaysia     | Mexico*    | Morocco        |
| Netherlands*         | New Zealand* | Nigeria    | Norway         |
| Oman                 | Pakistan     | Panama     | Peru           |
| Philippines          | Poland       | Portugal   | Qatar          |
| Romania              | Saudi Arabia | Singapore* | South Africa   |
| South Korea          | Spain*       | Sweden     | Switzerland*   |
| Taiwan               | Thailand     | Turkey     | Ukraine        |
| United Arab Emirates | UK*          | USA*       | Uruguay        |
| Venezuela            | Vietnam*     |            |                |
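
Since `country_indeed` expects one of the exact names in the table, it can help to check the value before starting a scrape. The sketch below copies the names from the table into a lookup. `check_country` is a hypothetical helper, not part of JobSpy, and the strict, case-sensitive match is an assumption based on the "use the exact name" note.

```python
# Names copied from the table above; a trailing * marks Glassdoor support.
_TABLE = """
Argentina, Australia*, Austria*, Bahrain, Belgium*, Brazil*, Canada*, Chile,
China, Colombia, Costa Rica, Czech Republic, Denmark, Ecuador, Egypt, Finland,
France*, Germany*, Greece, Hong Kong*, Hungary, India*, Indonesia, Ireland*,
Israel, Italy*, Japan, Kuwait, Luxembourg, Malaysia, Mexico*, Morocco,
Netherlands*, New Zealand*, Nigeria, Norway, Oman, Pakistan, Panama, Peru,
Philippines, Poland, Portugal, Qatar, Romania, Saudi Arabia, Singapore*,
South Africa, South Korea, Spain*, Sweden, Switzerland*, Taiwan, Thailand,
Turkey, Ukraine, United Arab Emirates, UK*, USA*, Uruguay, Venezuela, Vietnam*
"""

_names = [n.strip() for n in _TABLE.replace("\n", ",").split(",") if n.strip()]
INDEED_COUNTRIES = {n.rstrip("*") for n in _names}
GLASSDOOR_COUNTRIES = {n.rstrip("*") for n in _names if n.endswith("*")}


def check_country(country, site="indeed"):
    """Return True if `country` is an exact name from the table for `site`."""
    allowed = GLASSDOOR_COUNTRIES if site == "glassdoor" else INDEED_COUNTRIES
    return country in allowed


print(check_country("USA", site="glassdoor"))    # True
print(check_country("Japan", site="glassdoor"))  # False
```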
## Notes
* Indeed is currently the best scraper, with no rate limiting.
* All the job board endpoints are capped at around 1000 jobs on a given search.
* LinkedIn is the most restrictive and usually rate limits around the 10th page.
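
Given the cap of roughly 1000 jobs per search, one option is to split a large request into smaller batches with the `offset` and `results_wanted` parameters. The sketch below only plans the batches. `plan_batches` is a hypothetical helper, the batch size of 100 is arbitrary, and treating the cap as exactly 1000 with a zero-based offset is an assumption.

```python
def plan_batches(total_wanted, batch_size=100, cap=1000):
    """Yield (offset, results_wanted) pairs covering at most `cap` results."""
    total = min(total_wanted, cap)
    for offset in range(0, total, batch_size):
        yield offset, min(batch_size, total - offset)


for offset, count in plan_batches(250):
    # Each pair could be passed on to a search, e.g.:
    # scrape_jobs(site_name="indeed", search_term="software engineer",
    #             country_indeed="USA", offset=offset, results_wanted=count)
    print(offset, count)
```

Pausing between batches is advisable on the more restrictive sites, LinkedIn in particular.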
## Frequently Asked Questions
---

**Q: Encountering issues with your queries?**

**A:** Try reducing the number of `results_wanted` and/or broadening the filters. If problems persist, [submit an issue](https://github.com/Bunsly/JobSpy/issues).

---

**Q: Received a response code 429?**

**A:** This indicates that you have been blocked by the job board site for sending too many requests. All of the job board sites are aggressive with blocking. We recommend:

- Waiting some time between scrapes (site-dependent).
- Trying a VPN or proxy to change your IP address.

---
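The "wait between scrapes" advice for 429 responses can be sketched as a retry wrapper with growing pauses. `scrape_with_backoff` is a hypothetical helper, not part of JobSpy. The delays are arbitrary, and because this README does not say which exception JobSpy raises when blocked, the sketch catches any `Exception`.

```python
import time


def scrape_with_backoff(scrape, attempts=4, base_delay=30.0, sleep=time.sleep):
    """Call `scrape()`; on failure wait base_delay * 2**attempt seconds and retry.

    Re-raises the last error once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return scrape()
        except Exception:  # assumed: the exact error type for a 429 is not documented here
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Possible usage with the documented proxy parameter (not executed here):
# jobs = scrape_with_backoff(lambda: scrape_jobs(
#     site_name="linkedin",
#     search_term="software engineer",
#     proxy="http://user:pass@host:port",
# ))
```

The `sleep` argument is injectable so the wrapper can be exercised without actually waiting.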