Library Migration (#31)

2026-03-04 19:44:30 -08:00 · 2023-09-03 07:29:25 -07:00
parent 7efece8fe9
commit 153ac35248
36 changed files with 3604 additions and 1473 deletions
--- a/README.md
+++ b/README.md
@@ -1,240 +1,100 @@
-# JobSpy AIO Scraper
+# JobSpy

+**JobSpy** is a simple, yet comprehensive, job scraping library.
 ## Features

 - Scrapes job postings from **LinkedIn**, **Indeed** & **ZipRecruiter** simultaneously
- Returns jobs as JSON or CSV with title, location, company, description & other data
- Imports directly into **Google Sheets**
- Optional JWT authorization
+- Aggregates the job postings in a Pandas DataFrame

-![jobspy_gsheet](https://github.com/cullenwatson/JobSpy/assets/78247585/9f0a997c-4e33-4167-b04e-31ab1f606edb)
+### Installation
+`pip install jobscrape`  
+  
+  _Python version >= [3.10](https://www.python.org/downloads/release/python-3100/) required_ 
+
+### Usage
+
+```python
+from jobscrape import scrape_jobs
+import pandas as pd
+
+jobs: pd.DataFrame = scrape_jobs(
+    site_name=["indeed", "linkedin", "zip_recruiter"],
+    search_term="software engineer",
+    results_wanted=10
+)
+
+if jobs.empty:
+    print("No jobs found.")
+else:
+
+    # 1. Print to the console
+    pd.set_option('display.max_columns', None)
+    pd.set_option('display.max_rows', None)
+    pd.set_option('display.width', None)
+    pd.set_option('display.max_colwidth', 50)  # set to 0 to see the full job URL / description
+    print(jobs)
+
+    # 2. Display in a Jupyter Notebook
+    #    (display() is only defined inside Jupyter/IPython; uncomment there)
+    # display(jobs)
+
+    # 3. Write to CSV
+    jobs.to_csv('jobs.csv', index=False)
+```
+
+### Output
+```
+             site                                              title                    company_name                 city state   job_type interval min_amount max_amount                                            job_url                                        description
+           indeed                                  Software Engineer                AMERICAN SYSTEMS            Arlington    VA       None   yearly     200000     150000  https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS! ...
+           indeed                           Senior Software Engineer                TherapyNotes.com         Philadelphia    PA   fulltime   yearly     135000     110000  https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
+         linkedin                   Software Engineer - Early Career                 Lockheed Martin            Sunnyvale    CA   fulltime   yearly       None       None      https://www.linkedin.com/jobs/view/3693012711  Description:By bringing together people that u...
+         linkedin                       Full-Stack Software Engineer                            Rain             New York    NY   fulltime   yearly       None       None      https://www.linkedin.com/jobs/view/3696158877  Rain’s mission is to create the fastest and ea...
+    zip_recruiter                       Software Engineer - New Grad                    ZipRecruiter         Santa Monica    CA   fulltime   yearly     130000     150000  https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
+    zip_recruiter                                 Software Developer                      TEKsystems              Phoenix    AZ   fulltime   hourly         65         75  https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details• 6 years of Java developme...
+```
+### Parameters for `scrape_jobs()`

-### API

-POST `/api/v1/jobs/`
-### Request Schema
 ```plaintext
 Required
 ├── site_type (List[enum]): linkedin, zip_recruiter, indeed
 └── search_term (str)
 Optional
 ├── location (str)
-├── distance (int)
+├── distance (int): in miles
 ├── job_type (enum): fulltime, parttime, internship, contract
 ├── is_remote (bool)
-├── results_wanted (int): per site_type
-├── easy_apply (bool): only for linkedin
-└── output_format (enum): json, csv, gsheet
-```
-### Request Example
-```json
-"site_type": ["indeed", "linkedin"],
-"search_term": "software engineer",
-"location": "austin, tx",
-"distance": 10,
-"job_type": "fulltime",
-"results_wanted": 15
-"output_format": "gsheet"
+├── results_wanted (int): number of job results to retrieve for each site specified in 'site_type'
+├── easy_apply (bool): filters for jobs on LinkedIn that have the 'Easy Apply' option
 ```
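The parameter list above can be checked before any request is sent. The following is a minimal sketch, not part of the library: `build_scrape_kwargs` is a hypothetical helper, and the allowed values are copied from the list above. It assumes the keyword for the site list is `site_name`, as in the usage example, even though the schema labels it `site_type`.

```python
from typing import Any

# Allowed values, copied from the parameter list above.
ALLOWED_SITES = {"linkedin", "zip_recruiter", "indeed"}
ALLOWED_JOB_TYPES = {"fulltime", "parttime", "internship", "contract"}


def build_scrape_kwargs(
    sites: list[str],
    search_term: str,
    **optional: Any,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments before calling scrape_jobs()."""
    unknown_sites = set(sites) - ALLOWED_SITES
    if unknown_sites:
        raise ValueError(f"unsupported site(s): {sorted(unknown_sites)}")
    if not search_term.strip():
        raise ValueError("search_term must not be empty")

    job_type = optional.get("job_type")
    if job_type is not None and job_type not in ALLOWED_JOB_TYPES:
        raise ValueError(f"unsupported job_type: {job_type!r}")

    # Assumption: the keyword is `site_name`, as in the usage example.
    return {"site_name": list(sites), "search_term": search_term, **optional}


kwargs = build_scrape_kwargs(
    ["indeed", "linkedin"],
    "software engineer",
    location="austin, tx",
    distance=10,  # miles
    job_type="fulltime",
    results_wanted=15,
)
# jobs = scrape_jobs(**kwargs)  # requires the library and network access
```

Validating locally keeps typos such as an unknown site name from turning into a failed or empty scrape.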
+
 ### Response Schema
 ```plaintext
-site_type (enum): 
-JobResponse
-├── success (bool)
-├── error (str)
-├── jobs (List[JobPost])
-│   └── JobPost
-│       ├── title (str)
-│       ├── company_name (str)
-│       ├── job_url (str)
-│       ├── location (object)
-│       │   ├── country (str)
-│       │   ├── city (str)
-│       │   ├── state (str)
-│       ├── description (str)
-│       ├── job_type (enum)
-│       ├── compensation (object)
-│       │   ├── interval (CompensationInterval): yearly, monthly, weekly, daily, hourly
-│       │   ├── min_amount (float)
-│       │   ├── max_amount (float)
-│       │   └── currency (str)
-│       └── date_posted (datetime)
-│
-├── total_results (int)
-└── returned_results (int) 
-```
-### Response Example (GOOGLE SHEETS)
-```json
-{
-    "status": "Successfully uploaded to Google Sheets",
-    "error": null,
-    "linkedin": null,
-    "indeed": null,
-    "zip_recruiter": null
-}
-```
-### Response Example (JSON)
-```json
-{
-    "indeed": {
-        "success": true,
-        "error": null,
-        "jobs": [
-            {
-                "title": "Software Engineer",
-                "company_name": "INTEL",
-                "job_url": "https://www.indeed.com/jobs/viewjob?jk=a2cfbb98d2002228",
-                "location": {
-                    "country": "USA",
-                    "city": "Austin",
-                    "state": "TX",
-                },
-                "description": "Job Description Designs, develops, tests, and debugs..."
-                "job_type": "fulltime",
-                "compensation": {
-                    "interval": "yearly",
-                    "min_amount": 209760.0,
-                    "max_amount": 139480.0,
-                    "currency": "USD"
-                },
-                "date_posted": "2023-08-18T00:00:00"
-            }, ...
-        ],
-        "total_results": 845,
-        "returned_results": 15
-    },
-    "linkedin": {
-        "success": true,
-        "error": null,
-        "jobs": [
-            {
-                "title": "Software Engineer 1",
-                "company_name": "Public Partnerships | PPL",
-                "job_url": "https://www.linkedin.com/jobs/view/3690013792",
-                "location": {
-                    "country": "USA",
-                    "city": "Austin",
-                    "state": "TX",
-                },
-                "description": "Public Partnerships LLC supports individuals with disabilities..."
-                "job_type": null,
-                "compensation": null,
-                "date_posted": "2023-07-31T00:00:00"
-            }, ...
-        ],
-        "total_results": 2000,
-        "returned_results": 15
-    }
-}
-```
-### Response Example (CSV)
-```
-Site, Title, Company Name, Job URL, Country, City, State, Job Type, Compensation Interval, Min Amount, Max Amount, Currency, Date Posted, Description
-indeed, Software Engineer, INTEL, https://www.indeed.com/jobs/viewjob?jk=a2cfbb98d2002228, USA, Austin, TX, fulltime, yearly, 209760.0, 139480.0, USD, 2023-08-18T00:00:00, Job Description Designs...
-linkedin, Software Engineer 1, Public Partnerships | PPL, https://www.linkedin.com/jobs/view/3690013792, USA, Austin, TX, , , , , , 2023-07-31T00:00:00, Public Partnerships LLC supports...
+JobPost
+├── title (str)
+├── company_name (str)
+├── job_url (str)
+├── location (object)
+│   ├── country (str)
+│   ├── city (str)
+│   └── state (str)
+├── description (str)
+├── job_type (enum)
+├── compensation (object)
+│   ├── interval (CompensationInterval): yearly, monthly, weekly, daily, hourly
+│   ├── min_amount (float)
+│   ├── max_amount (float)
+│   └── currency (str)
+└── date_posted (datetime)
+
 ```
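For readers who prefer typed objects over a tree diagram, the schema above could be mirrored with dataclasses. This is only an illustrative sketch: the class names follow the schema, but which fields may be missing is an assumption (the sample output shows `None` for job type and salary on some rows), and the library's actual models may differ.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Location:
    country: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None


@dataclass
class Compensation:
    interval: str  # yearly, monthly, weekly, daily, hourly
    min_amount: float
    max_amount: float
    currency: str


@dataclass
class JobPost:
    title: str
    company_name: str
    job_url: str
    # Assumption: everything below may be absent for a given posting.
    location: Optional[Location] = None
    description: Optional[str] = None
    job_type: Optional[str] = None
    compensation: Optional[Compensation] = None
    date_posted: Optional[datetime] = None


# One row from the sample output, expressed as a JobPost.
job = JobPost(
    title="Software Engineer - Early Career",
    company_name="Lockheed Martin",
    job_url="https://www.linkedin.com/jobs/view/3693012711",
    location=Location(city="Sunnyvale", state="CA"),
    job_type="fulltime",
)

# asdict() recursively converts nested dataclasses into plain dicts.
job_dict = asdict(job)
print(job_dict["location"])
```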

-## Installation
-### Docker Setup
-_Requires [Docker Desktop](https://www.docker.com/products/docker-desktop/)_

-[JobSpy API Image](https://ghcr.io/cullenwatson/jobspy:latest) is continuously updated and available on GitHub Container Registry.
+### FAQ
  
-To pull the Docker image:
-
-```bash
-docker pull ghcr.io/cullenwatson/jobspy:latest
-```
+#### Encountering issues with your queries?
  
-#### Params
+Try reducing the number of `results_wanted` and/or broadening the filters. If problems persist, please submit an issue.
  
-By default:
-* Port: `8000`
-* Google sheet name: `JobSpy`
-* Relative path of `client_secret.json` (for Google Sheets, see below to obtain)
-
+#### Received a response code 429?
+This means the job board site has blocked you for sending too many requests. Wait a few seconds before retrying, or try using a VPN. Proxy support is coming soon.
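The "wait and retry" advice can be automated with exponential backoff. The sketch below is generic and makes no claim about how the library reports a 429: `RateLimitedError` and `retry_with_backoff` are hypothetical names, and you would need to adapt the `except` clause to whatever error your scrape call actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for however a 429 response surfaces."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay * 2**attempt seconds after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo with a fake scraper that is blocked twice before succeeding.
calls = {"count": 0}


def flaky_scrape() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitedError("429")
    return "ok"


waits: list[float] = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky_scrape, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the demo instant; in real use, leave it at the default `time.sleep`.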
  
-To run the image with these default settings, use:
-    
-Example (Cmd Prompt - Windows):
-```bash
-docker run -v %cd%/client_secret.json:/app/client_secret.json -p 8000:8000 ghcr.io/cullenwatson/jobspy
-```
-  
-Example (Unix):
-```bash
-docker run -v $(pwd)/client_secret.json:/app/client_secret.json -p 8000:8000 ghcr.io/cullenwatson/jobspy
-```
-  
-#### Using custom params
-
-  Example: 
-   * Port: `8030`
-   * Google sheet name: `CustomName`
-   * Absolute path of `client_secret.json`: `C:\config\client_secret.json`
-
-  To pass these custom params:
-```bash
-docker run -v C:\config\client_secret.json:/app/client_secret.json -e GSHEET_NAME=CustomName -e PORT=8030 -p 8030:8030 ghcr.io/cullenwatson/jobspy
-```
-  
-### Python installation (alternative to Docker)
-_Python version >= [3.10](https://www.python.org/downloads/release/python-3100/) required_  
-1. Clone this repository `git clone https://github.com/cullenwatson/jobspy`
-2. Install the dependencies with `pip install -r requirements.txt`
-4. Run the server with `uvicorn main:app --reload`
-  
-### Google Sheets Setup
-  
-#### Obtaining an Access Key: [Video Guide](https://youtu.be/w533wJuilao?si=5u3m50pRtdhqkg9Z&t=43)
-  * Enable the [Google Sheets & Google Drive API](https://console.cloud.google.com/)
-  * Create credentials -> service account -> create & continue
-  * Select role -> basic: editor -> done
-  * Click on the email you just created in the service account list
-  * Go to the Keys tab -> add key -> create new key -> JSON -> Create
-  
-#### Using the key in the repo
-  * Copy the key file into the JobSpy repo as `client_secret.json`
-  * Go to [my template sheet](https://docs.google.com/spreadsheets/d/1mOgb-ZGZy_YIhnW9OCqIVvkFwiKFvhMBjNcbakW7BLo/edit?usp=sharing): File -> Make a Copy -> Rename to JobSpy
-  * Share the Google sheet with the email located in the field `client_email` in the `client_secret.json` above with editor rights
-  * If you changed the name of the sheet:
-    - Python install: add `.env` in the repo and add `GSHEET_NAME` param with the sheet name as the value, e.g. `GSHEET_NAME=CustomName`
-    - Docker install: use custom param `-e GSHEET_NAME=CustomName` in `docker run` (see above)
-  
-### How to call the API
-  
-#### [Postman](https://www.postman.com/downloads/) (preferred):
-To use Postman:
-1. Locate the files in the `/postman/` directory.
-2. Import the Postman collection and environment JSON files.
-  
-#### Swagger UI:
-Or you can call the API with the interactive documentation at [localhost:8000/docs](http://localhost:8000/docs).
-  
-## FAQ
-  
-### I'm having issues with my queries. What should I do?
-  
-Try reducing the number of `results_wanted` and/or broadening the filters. If issues still persist, feel free to submit an issue. 
-  
-### I'm getting response code 429. What should I do?
-You have been blocked by the job board site for sending too many requests. Wait a couple seconds or use a VPN.
-  
-### How to enable auth?
-  
-Change `AUTH_REQUIRED` in `/settings.py` to `True`
-  
-The auth uses [supabase](https://supabase.com). Create a project with a `users` table and disable RLS.  
-  
-<img src="https://github.com/cullenwatson/jobspy/assets/78247585/03af18e1-5386-49ad-a2cf-d34232d9d747" width="500">
-  
-Add these three environment variables:
-  
- `SUPABASE_URL`: go to project settings -> API -> Project URL  
- `SUPABASE_KEY`: go to project settings -> API -> service_role secret
- `JWT_SECRET_KEY` - type `openssl rand -hex 32` in terminal to create a 32 byte secret key
-  
-Use these endpoints to register and get an access token: 
-  
-![image](https://github.com/cullenwatson/jobspy/assets/78247585/c84c33ec-1fe8-4152-9c8c-6c4334aecfc3)
-