Jobs scraper library for LinkedIn, Indeed, Glassdoor & ZipRecruiter
Cullen Watson d72d14db02
Docs/readme (#16)
* docs: readme

* update readme

* docs: update readme

* docs: update readme

* docs(readme): colorcode json
2023-08-26 21:40:09 -05:00
.vscode chore: postman 2023-08-26 18:46:52 -05:00
api update resp schema (#15) 2023-08-26 20:30:00 -05:00
postman chore: postman 2023-08-26 18:46:52 -05:00
.gitignore feat(auth): add auth to jobs endpoint 2023-07-10 22:23:05 -05:00
LICENSE docs: Create LICENSE 2023-08-26 18:47:48 -05:00
README.md Docs/readme (#16) 2023-08-26 21:40:09 -05:00
main.py chore: clean up 2023-08-19 18:46:03 -05:00
requirements.txt feat(users): add register route 2023-07-10 22:23:16 -05:00
settings.py feat: optional auth 2023-08-19 20:31:10 -05:00

README.md

JobSpy AIO Scraper

Features

  • Scrapes job postings from LinkedIn, Indeed & ZipRecruiter simultaneously
  • Returns jobs with title, location, company, description & other data
  • Optional JWT authorization

API

POST /api/v1/jobs/

Request Schema

Request
├── Required
│   ├── site_type (List[enum]): linkedin, zip_recruiter, indeed
│   └── search_term (str)
└── Optional
    ├── location (str)
    ├── distance (int)
    ├── job_type (enum): fulltime, parttime, internship, contract
    ├── is_remote (bool)
    ├── results_wanted (int): per site_type
    └── easy_apply (bool): only for linkedin
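
A client can check a payload against the schema above before sending it. The following is a minimal sketch, not the server's own validation: validate_request is a hypothetical helper, and it only checks the required fields and the two enums listed in the schema.

```python
# Enum values copied from the request schema above.
SITE_TYPES = {"linkedin", "zip_recruiter", "indeed"}
JOB_TYPES = {"fulltime", "parttime", "internship", "contract"}


def validate_request(payload):
    """Return a list of problems found in a jobs request payload (empty if none).

    Hypothetical client-side helper; the server may apply different rules.
    """
    problems = []

    # Required: site_type must be a list of known site names.
    sites = payload.get("site_type")
    if not isinstance(sites, list):
        problems.append("site_type must be a list")
    else:
        unknown = sorted(set(sites) - SITE_TYPES)
        if unknown:
            problems.append(f"unknown site_type: {unknown}")

    # Required: search_term must be a string.
    if not isinstance(payload.get("search_term"), str):
        problems.append("search_term must be a string")

    # Optional: job_type, when present, must be one of the listed values.
    job_type = payload.get("job_type")
    if job_type is not None and job_type not in JOB_TYPES:
        problems.append(f"unknown job_type: {job_type!r}")

    return problems
```

An empty list means the payload passed these checks; it does not guarantee the server will accept it.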

Request Example

{
  "site_type": ["indeed", "linkedin"],
  "search_term": "software engineer",
  "location": "austin, tx",
  "distance": 10,
  "job_type": "fulltime",
  "results_wanted": 15
}
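
The request above could be sent from Python roughly as follows. This is a sketch under assumptions: the server is taken to be running locally on uvicorn's default port 8000, build_jobs_request is a hypothetical helper, and the Bearer header format for the optional JWT is an assumption, not something this README confirms.

```python
import json
import urllib.request

# Assumption: the server runs locally on uvicorn's default port.
BASE_URL = "http://localhost:8000"


def build_jobs_request(payload, token=None):
    """Build (but do not send) a POST request for /api/v1/jobs/."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: with auth enabled, the JWT is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {
    "site_type": ["indeed", "linkedin"],
    "search_term": "software engineer",
    "location": "austin, tx",
    "distance": 10,
    "job_type": "fulltime",
    "results_wanted": 15,
}
request = build_jobs_request(payload)

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     result = json.loads(resp.read())
```

Building the request separately from sending it keeps the example usable without a running server.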

Response Schema

site_type (enum)
└── response (SiteResponse)
    ├── success (bool)
    ├── error (str)
    ├── jobs (List[JobPost])
    │   └── JobPost
    │       ├── title (str)
    │       ├── company_name (str)
    │       ├── job_url (str)
    │       ├── location (object)
    │       │   ├── country (str)
    │       │   ├── city (str)
    │       │   ├── state (str)
    │       │   ├── postal_code (str)
    │       │   └── address (str)
    │       ├── description (str)
    │       ├── job_type (enum)
    │       ├── compensation (object)
    │       │   ├── interval (CompensationInterval): yearly, monthly, weekly, daily, hourly
    │       │   ├── min_amount (float)
    │       │   ├── max_amount (float)
    │       │   └── currency (str): default is "USD"
    │       └── date_posted (datetime)
    ├── total_results (int)
    └── returned_results (int)

Response Example

{
    "indeed": {
        "success": true,
        "error": null,
        "jobs": [
            {
                "title": "Software Engineer",
                "company_name": "INTEL",
                "job_url": "https://www.indeed.com/jobs/viewjob?jk=a2cfbb98d2002228",
                "location": {
                    "country": "US",
                    "city": "Austin",
                    "state": "TX",
                    "postal_code": null,
                    "address": null
                },
                "description": "Job Description Designs, develops, tests, and debugs...",
                "job_type": "fulltime",
                "compensation": {
                    "interval": "yearly",
                    "min_amount": 139480.0,
                    "max_amount": 209760.0,
                    "currency": "USD"
                },
                "date_posted": "2023-08-18T00:00:00"
            }, ...
        ]
    },
    "linkedin": {
        "success": true,
        "error": null,
        "jobs": [
            {
                "title": "Software Engineer 1",
                "company_name": "Public Partnerships | PPL",
                "job_url": "https://www.linkedin.com/jobs/view/3690013792",
                "location": {
                    "country": "US",
                    "city": "Austin",
                    "state": "TX",
                    "postal_code": null,
                    "address": null
                },
                "description": "Public Partnerships LLC supports individuals with disabilities...",
                "job_type": null,
                "compensation": null,
                "date_posted": "2023-07-31T00:00:00"
            }, ...
        ],
        "total_results": 2000,
        "returned_results": 15
    }
}
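
A response shaped like the example above can be walked site by site. The sketch below assumes the response has already been parsed into a dict; summarize is a hypothetical helper and the sample is a trimmed copy of the LinkedIn entry above.

```python
def summarize(response):
    """Return one summary line per site, plus one line per job."""
    lines = []
    for site, result in response.items():
        if not result.get("success"):
            lines.append(f"{site}: failed ({result.get('error')})")
            continue
        jobs = result.get("jobs") or []
        lines.append(f"{site}: {len(jobs)} job(s)")
        for job in jobs:
            # location and its fields may be null, so fall back gracefully.
            loc = job.get("location") or {}
            place = ", ".join(p for p in (loc.get("city"), loc.get("state")) if p)
            lines.append(f"  {job['title']} at {job['company_name']} ({place})")
    return lines


# Trimmed sample based on the response example above.
sample = {
    "linkedin": {
        "success": True,
        "error": None,
        "jobs": [
            {
                "title": "Software Engineer 1",
                "company_name": "Public Partnerships | PPL",
                "location": {"country": "US", "city": "Austin", "state": "TX"},
                "compensation": None,
            }
        ],
    }
}

for line in summarize(sample):
    print(line)
```

Fields such as compensation and job_type can be null, as in the LinkedIn entry, so client code should not assume they are present.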

Installation

Python >= 3.10 required

  1. Clone this repository with git clone https://github.com/cullenwatson/jobspy
  2. Install the dependencies with pip install -r requirements.txt
  3. Run the server with uvicorn main:app --reload

Usage

Swagger UI:

To interact with the API documentation, navigate to localhost:8000/docs.

Postman:

To use Postman:

  1. Locate the files in the /postman/ directory.
  2. Import the Postman collection and environment JSON files.

FAQ

I'm having issues with my queries. What should I do?

Broadening your filters can often help. Additionally, try reducing the number of results_wanted.
If problems persist, feel free to submit an issue.
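
The advice to reduce results_wanted can be automated on the client side. This is only a sketch: fetch_with_backoff is a hypothetical helper, fetch stands for any function that sends the payload and returns the parsed response, and halving is an arbitrary choice.

```python
def fetch_with_backoff(fetch, payload, min_results=1):
    """Call fetch(payload), halving results_wanted after each failed attempt.

    Returns the first response in which every site reports success, or the
    last response once results_wanted has reached min_results.
    """
    wanted = payload["results_wanted"]
    while True:
        attempt = {**payload, "results_wanted": wanted}
        response = fetch(attempt)
        if all(site.get("success") for site in response.values()):
            return response
        if wanted <= min_results:
            # Nothing left to reduce; hand back the failing response.
            return response
        wanted = max(min_results, wanted // 2)
```

The caller still needs to inspect the success flags on the returned response, since the last attempt may also have failed.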

How do I enable auth?

Change AUTH_REQUIRED in /settings.py to True.

Auth uses Supabase. Create a project with a users table and disable row-level security (RLS).

Add these three environment variables:

  • SUPABASE_URL: go to project settings -> API -> Project URL
  • SUPABASE_KEY: go to project settings -> API -> service_role secret
  • JWT_SECRET_KEY: run openssl rand -hex 32 in a terminal to create a 32-byte secret key
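
If openssl is not available, a secret of the same shape can be generated from Python, and the three variables can be checked before starting the server. This is a sketch: missing_env_vars is a hypothetical helper and is not part of this repository.

```python
import os
import secrets

# 32 random bytes, hex-encoded to 64 characters
# (the same shape of output as openssl rand -hex 32).
jwt_secret = secrets.token_hex(32)

# Variable names taken from the list above.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY", "JWT_SECRET_KEY")


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

The generated value still has to be exported as JWT_SECRET_KEY in the environment the server runs in.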

Use these endpoints to register and get an access token:

[image: endpoints for registering and getting an access token]