# JobSpy AIO Scraper
## Features
- Scrapes job postings from **LinkedIn**, **Indeed** & **ZipRecruiter** simultaneously
- Returns jobs with title, location, company, description & other data
- Optional JWT authorization
### API
POST `/api/v1/jobs/`
### Request Schema
```plaintext
Request
├── Required
│   ├── site_type (List[enum]): linkedin, zip_recruiter, indeed
│   └── search_term (str)
└── Optional
    ├── location (str)
    ├── distance (int)
    ├── job_type (enum): fulltime, parttime, internship, contract
    ├── is_remote (bool)
    ├── results_wanted (int): per site_type
    └── easy_apply (bool): only for linkedin
```
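The schema above can be checked on the client before a request is sent. The sketch below is a minimal, hypothetical helper (`validate_job_request` is not part of JobSpy); it only enforces what the schema states: the two required fields, the documented enum values, integer `distance`/`results_wanted`, and that `easy_apply` only makes sense when `linkedin` is among the requested sites.

```python
from typing import Any

# Allowed values, taken from the request schema above.
SITE_TYPES = {"linkedin", "zip_recruiter", "indeed"}
JOB_TYPES = {"fulltime", "parttime", "internship", "contract"}


def validate_job_request(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems in a /api/v1/jobs/ request body (empty if it looks valid)."""
    problems: list[str] = []

    # Required: site_type is a non-empty list of known sites.
    sites = payload.get("site_type")
    if not isinstance(sites, list) or not sites:
        problems.append("site_type must be a non-empty list")
    else:
        unknown = [s for s in sites if s not in SITE_TYPES]
        if unknown:
            problems.append(f"unknown site_type values: {unknown}")

    # Required: search_term is a non-empty string.
    term = payload.get("search_term")
    if not isinstance(term, str) or not term.strip():
        problems.append("search_term must be a non-empty string")

    # Optional: job_type must be one of the documented enum values.
    job_type = payload.get("job_type")
    if job_type is not None and job_type not in JOB_TYPES:
        problems.append(f"unknown job_type: {job_type!r}")

    # Optional integers (bool is a subclass of int in Python, so exclude it explicitly).
    for key in ("distance", "results_wanted"):
        value = payload.get(key)
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} must be an integer")

    # easy_apply is documented as LinkedIn-only.
    if payload.get("easy_apply") and (not isinstance(sites, list) or "linkedin" not in sites):
        problems.append("easy_apply only applies to linkedin")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a client report every issue in one pass.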
### Request Example
```json
{
  "site_type": ["indeed", "linkedin"],
  "search_term": "software engineer",
  "location": "austin, tx",
  "distance": 10,
  "job_type": "fulltime",
  "results_wanted": 15
}
```
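One way to send this body from Python with only the standard library is sketched below. It assumes the server is running locally on uvicorn's default port 8000; `build_jobs_request` and `search_jobs` are illustrative names, not part of the project, and the timeout value is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: the API runs locally on uvicorn's default port 8000.
BASE_URL = "http://localhost:8000"


def build_jobs_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/jobs/ with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_jobs(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (one key per requested site)."""
    with urllib.request.urlopen(build_jobs_request(payload, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


example = {
    "site_type": ["indeed", "linkedin"],
    "search_term": "software engineer",
    "location": "austin, tx",
    "distance": 10,
    "job_type": "fulltime",
    "results_wanted": 15,
}
req = build_jobs_request(example)
print(req.get_method(), req.full_url)  # POST http://localhost:8000/api/v1/jobs/
# results = search_jobs(example)  # needs the server from the Installation section running
```

Building the request separately from sending it makes the payload easy to inspect or test without a running server.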
### Response Schema
```plaintext
site_type (enum)
└── response (SiteResponse)
    ├── success (bool)
    ├── error (str)
    ├── jobs (List[JobPost])
    │   └── JobPost
    │       ├── title (str)
    │       ├── company_name (str)
    │       ├── job_url (str)
    │       ├── location (object)
    │       │   ├── country (str)
    │       │   ├── city (str)
    │       │   └── state (str)
    │       ├── description (str)
    │       ├── job_type (enum)
    │       ├── compensation (object)
    │       │   ├── interval (CompensationInterval): yearly, monthly, weekly, daily, hourly
    │       │   ├── min_amount (float)
    │       │   ├── max_amount (float)
    │       │   └── currency (str): default is "US"
    │       └── date_posted (datetime)
    ├── total_results (int)
    └── returned_results (int)
```
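To work with typed objects instead of raw dictionaries, the schema can be mirrored with dataclasses. This is a client-side sketch, not the server's own models; the class and function names are hypothetical, and optional fields default to `None` so that `null` values (such as a missing `compensation`) parse cleanly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: str  # yearly, monthly, weekly, daily or hourly
    min_amount: Optional[float] = None
    max_amount: Optional[float] = None
    currency: Optional[str] = None


@dataclass
class JobPost:
    title: str
    company_name: str
    job_url: str
    location: Location
    description: Optional[str] = None
    job_type: Optional[str] = None
    compensation: Optional[Compensation] = None
    date_posted: Optional[datetime] = None

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "JobPost":
        comp = d.get("compensation")
        posted = d.get("date_posted")
        return cls(
            title=d["title"],
            company_name=d["company_name"],
            job_url=d["job_url"],
            location=Location(**(d.get("location") or {})),
            description=d.get("description"),
            job_type=d.get("job_type"),
            compensation=Compensation(**comp) if comp else None,
            # date_posted is serialized as ISO 8601, e.g. "2023-08-18T00:00:00".
            date_posted=datetime.fromisoformat(posted) if posted else None,
        )


@dataclass
class SiteResponse:
    success: bool
    error: Optional[str]
    jobs: list[JobPost]
    total_results: Optional[int] = None
    returned_results: Optional[int] = None

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "SiteResponse":
        return cls(
            success=d["success"],
            error=d.get("error"),
            jobs=[JobPost.from_dict(j) for j in d.get("jobs") or []],
            total_results=d.get("total_results"),
            returned_results=d.get("returned_results"),
        )


def parse_response(body: dict[str, Any]) -> dict[str, SiteResponse]:
    """Map each site_type key (e.g. "indeed") to its parsed SiteResponse."""
    return {site: SiteResponse.from_dict(resp) for site, resp in body.items()}
```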
### Response Example
```json
{
  "indeed": {
    "success": true,
    "error": null,
    "jobs": [
      {
        "title": "Software Engineer",
        "company_name": "INTEL",
        "job_url": "https://www.indeed.com/jobs/viewjob?jk=a2cfbb98d2002228",
        "location": {
          "country": "USA",
          "city": "Austin",
          "state": "TX"
        },
        "description": "Job Description Designs, develops, tests, and debugs...",
        "job_type": "fulltime",
        "compensation": {
          "interval": "yearly",
          "min_amount": 139480.0,
          "max_amount": 209760.0,
          "currency": "USD"
        },
        "date_posted": "2023-08-18T00:00:00"
      }, ...
    ],
    "total_results": 845,
    "returned_results": 15
  },
  "linkedin": {
    "success": true,
    "error": null,
    "jobs": [
      {
        "title": "Software Engineer 1",
        "company_name": "Public Partnerships | PPL",
        "job_url": "https://www.linkedin.com/jobs/view/3690013792",
        "location": {
          "country": "USA",
          "city": "Austin",
          "state": "TX"
        },
        "description": "Public Partnerships LLC supports individuals with disabilities...",
        "job_type": null,
        "compensation": null,
        "date_posted": "2023-07-31T00:00:00"
      }, ...
    ],
    "total_results": 2000,
    "returned_results": 15
  }
}
```
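Because each site reports its own `success` and `error`, a client will usually want to merge the jobs from the sites that succeeded and surface the errors from those that did not. A minimal sketch follows; `collect_jobs` is a hypothetical helper, and the extra `"site"` key it adds to each job is not part of the API response.

```python
from typing import Any


def collect_jobs(body: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Merge jobs from sites that succeeded; collect error messages from sites that failed."""
    jobs: list[dict[str, Any]] = []
    errors: dict[str, str] = {}
    for site, resp in body.items():
        if not resp.get("success"):
            errors[site] = resp.get("error") or "unknown error"
            continue
        for job in resp.get("jobs") or []:
            # Tag each job with the site it came from (this key is added client-side).
            jobs.append({**job, "site": site})
    # date_posted is an ISO 8601 string, so string order matches chronological order
    # as long as every value uses the same format. Newest first; missing dates sort last.
    jobs.sort(key=lambda j: j.get("date_posted") or "", reverse=True)
    return jobs, errors
```

For a response shaped like the example above, `collect_jobs` returns the Indeed and LinkedIn jobs in a single list, newest first, together with an empty `errors` dict.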
## Installation
_Python >= 3.10 required_
1. Clone this repository: `git clone https://github.com/cullenwatson/jobspy`
2. Install the dependencies with `pip install -r requirements.txt`
3. Run the server with `uvicorn main:app --reload`
## Usage
### Swagger UI:
To interact with the API documentation, navigate to [localhost:8000/docs](http://localhost:8000/docs).
### Postman:
To use Postman:
1. Locate the files in the `/postman/` directory.
2. Import the Postman collection and environment JSON files.
## FAQ
### I'm having issues with my queries. What should I do?
Broadening your filters often helps. You can also try lowering `results_wanted`.
If issues persist, feel free to submit an issue.
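The advice above can be automated by generating progressively broader request bodies. The sketch below is one possible strategy, not built-in API behavior: `relaxed_queries` is a hypothetical helper, and both the order in which filters are dropped (`OPTIONAL_FILTERS`) and the `min_results` floor are assumptions to tune.

```python
from typing import Any, Iterator

# Hypothetical relaxation order: optional filters are dropped one at a time, in this order.
OPTIONAL_FILTERS = ("job_type", "distance", "is_remote", "easy_apply")


def relaxed_queries(payload: dict[str, Any], min_results: int = 5) -> Iterator[dict[str, Any]]:
    """Yield progressively broader copies of a request body.

    The original body comes first; then optional filters are removed one by one;
    finally results_wanted is halved while it stays at or above min_results.
    The caller's dict is never modified.
    """
    current = dict(payload)
    yield dict(current)
    for key in OPTIONAL_FILTERS:
        if key in current:
            del current[key]
            yield dict(current)
    wanted = current.get("results_wanted")
    while isinstance(wanted, int) and wanted // 2 >= min_results:
        wanted //= 2
        current["results_wanted"] = wanted
        yield dict(current)
```

Each yielded body can be sent in turn until a response contains enough jobs; stopping at the first success keeps the number of scraping requests low.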
### How do I enable auth?

Set `AUTH_REQUIRED` in `/settings.py` to `True`.

The auth uses [Supabase](https://supabase.com). Create a project with a `users` table and disable RLS (row-level security) on it.

<img src="https://github.com/cullenwatson/jobspy/assets/78247585/03af18e1-5386-49ad-a2cf-d34232d9d747" width="500">

Add these three environment variables:
- `SUPABASE_URL`: go to project settings -> API -> Project URL
- `SUPABASE_KEY`: go to project settings -> API -> service_role secret
- `JWT_SECRET_KEY`: run `openssl rand -hex 32` in a terminal to generate a 32-byte secret key

Use these endpoints to register and get an access token:
![image](https://github.com/cullenwatson/jobspy/assets/78247585/c84c33ec-1fe8-4152-9c8c-6c4334aecfc3)
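The auth setup can also be scripted in Python. The sketch below generates a secret of the same form as `openssl rand -hex 32`, checks that the three environment variables are set, and attaches an access token to a jobs request. The register and login endpoint paths appear only in the screenshot, so they are not hard-coded here, and sending the token as `Authorization: Bearer <token>` is an assumption (the common scheme for JWT access tokens), not something this README states. All function names are hypothetical.

```python
import os
import secrets
import urllib.request
from typing import Mapping, Optional

# The three variables listed above.
REQUIRED_AUTH_ENV = ("SUPABASE_URL", "SUPABASE_KEY", "JWT_SECRET_KEY")


def generate_jwt_secret() -> str:
    """Same form as `openssl rand -hex 32`: 32 random bytes as 64 lowercase hex characters."""
    return secrets.token_hex(32)


def missing_auth_env(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required auth variables that are unset or empty (checks os.environ by default)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AUTH_ENV if not env.get(name)]


def authorized_jobs_request(
    body: bytes, token: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST to /api/v1/jobs/ carrying an access token.

    Assumption: the token goes in a standard `Authorization: Bearer <token>` header.
    """
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs/",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    print("JWT_SECRET_KEY candidate:", generate_jwt_secret())
    print("Missing variables:", missing_auth_env() or "none")
```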