Compare commits


34 Commits

Author SHA1 Message Date
Cullen Watson
ccb0c17660 enh: ziprecruiter full description (#162) 2024-06-09 16:21:01 -05:00
Cullen Watson
df339610fa docs: readme 2024-05-29 19:32:32 -05:00
Cullen Watson
c501006bd8 docs: readme 2024-05-28 16:04:26 -05:00
Cullen Watson
89a3ee231c enh(li): job function (#160) 2024-05-28 16:01:29 -05:00
Cullen
6439f71433 chore: version 2024-05-28 15:39:24 -05:00
adamagassi
7f6271b2e0 LinkedIn scraper fixes: (#159)
Correct initial page offset calculation
Separate page variable from request counter
Fix job offset starting value
Increment offset by number of jobs returned instead of expected value
2024-05-28 15:38:13 -05:00
Cullen Watson
5cb7ffe5fd enh: proxies (#157)
* enh: proxies

* enh: proxies
2024-05-25 14:04:09 -05:00
Cullen Watson
cd29f79796 docs: readme 2024-05-25 11:46:23 -05:00
Cullen Watson
65d2e5e707 Update pyproject.toml 2024-05-20 11:46:36 -05:00
fasih hussain
08d63a87a2 chore: id added for JobPost schema (#152) 2024-05-20 11:45:52 -05:00
Cullen
1ffdb1756f fix: dup line 2024-04-30 12:11:48 -05:00
Cullen Watson
1185693422 delete empty file 2024-04-30 12:06:20 -05:00
Lluís Salord Quetglas
dcd7144318 FIX: Allow Indeed search term with complex syntax (#139) 2024-04-30 12:05:43 -05:00
Cullen Watson
bf73c061bd enh: linkedin company logo (#141) 2024-04-30 12:03:10 -05:00
Lluís Salord Quetglas
8dd08ed9fd FEAT: Allow LinkedIn scraper to get external job apply url (#140) 2024-04-30 11:36:01 -05:00
Cullen Watson
5d3df732e6 docs: readme 2024-03-12 20:46:25 -05:00
Kellen Mace
86f858e06d Update scrape_jobs() parameters info in readme (#130) 2024-03-12 20:45:13 -05:00
Cullen
1089d1f0a5 docs: readme 2024-03-11 21:30:57 -05:00
Cullen
3e93454738 fix(indeed): readd param 2024-03-11 21:23:20 -05:00
Cullen Watson
0d150d519f docs: readme 2024-03-11 14:52:20 -05:00
Cullen Watson
cc3497f929 docs: readme 2024-03-11 14:45:17 -05:00
Cullen Watson
5986f75346 docs: readme 2024-03-11 14:41:12 -05:00
VitaminB16
4b7bdb9313 feat: Adjust log verbosity via verbose arg (#128) 2024-03-11 14:38:44 -05:00
Cullen Watson
80213f28d2 chore: version 2024-03-11 09:43:12 -05:00
Cullen Watson
ada38532c3 fix: indeed empty location term 2024-03-11 09:42:43 -05:00
Cullen Watson
3b0017964c fix: indeed empty search term 2024-03-11 09:21:11 -05:00
VitaminB16
94d8f555fd format: Apply Black formatter to the codebase (#127) 2024-03-10 23:36:27 -05:00
Cullen Watson
e8b4b376b8 docs: readme 2024-03-09 13:40:34 -06:00
Cullen Watson
54ac1bad16 docs: readme 2024-03-09 01:49:05 -06:00
Cullen Watson
0a669e9ba8 enh: indeed more fields (#126) 2024-03-09 01:40:01 -06:00
gigaSec
a4f6851c32 Fix GlassDoor Country Vietnam(#122) 2024-03-04 17:35:57 -06:00
troy-conte
db01bc6bbb log search updates, fix glassdoor (#120) 2024-03-04 16:39:38 -06:00
Cullen Watson
f8a4eccc6b Remove pandas warning (#118) 2024-02-29 21:30:56 -06:00
Cullen Watson
ba3a16b228 Description format (#107) 2024-02-14 16:04:23 -06:00
15 changed files with 2621 additions and 2141 deletions

.pre-commit-config.yaml (new file, +7)

@@ -0,0 +1,7 @@
repos:
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
        language_version: python
        args: [--line-length=88, --quiet]

README.md (139 changes)

@@ -11,17 +11,14 @@ work with us.*
- Scrapes job postings from **LinkedIn**, **Indeed**, **Glassdoor**, & **ZipRecruiter** simultaneously
- Aggregates the job postings in a Pandas DataFrame
- Proxy support (HTTP/S, SOCKS)
[Video Guide for JobSpy](https://www.youtube.com/watch?v=RuP1HrAZnxs&pp=ygUgam9icyBzY3JhcGVyIGJvdCBsaW5rZWRpbiBpbmRlZWQ%3D) - Updated for release v1.1.3
- Proxies support
![jobspy](https://github.com/cullenwatson/JobSpy/assets/78247585/ec7ef355-05f6-4fd3-8161-a817e31c5c57)
### Installation
```
pip install python-jobspy
pip install -U python-jobspy
```
_Python version >= [3.10](https://www.python.org/downloads/release/python-3100/) required_
@@ -37,18 +34,22 @@ jobs = scrape_jobs(
search_term="software engineer",
location="Dallas, TX",
results_wanted=20,
hours_old=72, # (only linkedin is hour specific, others round up to days old)
country_indeed='USA' # only needed for indeed / glassdoor
hours_old=72, # (only Linkedin/Indeed is hour specific, others round up to days old)
country_indeed='USA', # only needed for indeed / glassdoor
# linkedin_fetch_description=True # get full description and direct job url for linkedin (slower)
# proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False) # to_xlsx
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False) # to_excel
```
### Output
```
SITE TITLE COMPANY_NAME CITY STATE JOB_TYPE INTERVAL MIN_AMOUNT MAX_AMOUNT JOB_URL DESCRIPTION
SITE TITLE COMPANY CITY STATE JOB_TYPE INTERVAL MIN_AMOUNT MAX_AMOUNT JOB_URL DESCRIPTION
indeed Software Engineer AMERICAN SYSTEMS Arlington VA None yearly 200000 150000 https://www.indeed.com/viewjob?jk=5e409e577046... THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed Senior Software Engineer TherapyNotes.com Philadelphia PA fulltime yearly 135000 110000 https://www.indeed.com/viewjob?jk=da39574a40cb... About Us TherapyNotes is the national leader i...
linkedin Software Engineer - Early Career Lockheed Martin Sunnyvale CA fulltime yearly None None https://www.linkedin.com/jobs/view/3693012711 Description:By bringing together people that u...
@@ -60,24 +61,71 @@ zip_recruiter Software Developer TEKsystems Phoenix
### Parameters for `scrape_jobs()`
```plaintext
Required
├── site_type (List[enum]): linkedin, zip_recruiter, indeed, glassdoor
└── search_term (str)
Optional
├── location (int)
├── distance (int): in miles
├── job_type (enum): fulltime, parttime, internship, contract
├── proxy (str): in format 'http://user:pass@host:port' or [https, socks]
├── site_name (list|str):
| linkedin, zip_recruiter, indeed, glassdoor
| (default is all four)
├── search_term (str)
├── location (str)
├── distance (int):
| in miles, default 50
├── job_type (str):
| fulltime, parttime, internship, contract
├── proxies (list):
| in format ['user:pass@host:port', 'localhost']
| each job board will round robin through the proxies
├── is_remote (bool)
├── full_description (bool): fetches full description for LinkedIn (slower)
├── results_wanted (int): number of job results to retrieve for each site specified in 'site_type'
├── easy_apply (bool): filters for jobs that are hosted on the job board site
├── linkedin_company_ids (list[int): searches for linkedin jobs with specific company ids
├── country_indeed (enum): filters the country on Indeed (see below for correct spelling)
├── offset (num): starts the search from an offset (e.g. 25 will start the search from the 25th result)
├── hours_old (int): filters jobs by the number of hours since the job was posted (all but LinkedIn rounds up to next day)
├── results_wanted (int):
| number of job results to retrieve for each site specified in 'site_name'
├── easy_apply (bool):
| filters for jobs that are hosted on the job board site
├── description_format (str):
| markdown, html (Format type of the job descriptions. Default is markdown.)
├── offset (int):
| starts the search from an offset (e.g. 25 will start the search from the 25th result)
├── hours_old (int):
| filters jobs by the number of hours since the job was posted
| (ZipRecruiter and Glassdoor round up to next day.)
├── verbose (int) {0, 1, 2}:
| Controls the verbosity of the runtime printouts
| (0 prints only errors, 1 is errors+warnings, 2 is all logs. Default is 2.)
├── linkedin_fetch_description (bool):
| fetches full description and direct job url for LinkedIn (Increases requests by O(n))
├── linkedin_company_ids (list[int]):
| searches for linkedin jobs with specific company ids
|
├── country_indeed (str):
| filters the country on Indeed & Glassdoor (see below for correct spelling)
```
```
├── Indeed limitations:
| Only one from this list can be used in a search:
| - hours_old
| - job_type & is_remote
| - easy_apply
└── LinkedIn limitations:
| Only one from this list can be used in a search:
| - hours_old
| - easy_apply
```
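To make these limitations concrete, here is a minimal sketch of an Indeed search that filters by posting age alone, since `hours_old` cannot be combined with `job_type`/`is_remote` or `easy_apply` in a single search:
```python
from jobspy import scrape_jobs

# hours_old is used on its own here; adding job_type, is_remote,
# or easy_apply to the same Indeed search is not supported
jobs = scrape_jobs(
    site_name=["indeed"],
    search_term="software engineer",
    location="Dallas, TX",
    hours_old=72,
    country_indeed="USA",
)
```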
### JobPost Schema
```plaintext
@@ -92,31 +140,34 @@ JobPost
│ ├── state (str)
├── description (str)
├── job_type (str): fulltime, parttime, internship, contract
├── job_function (str)
├── compensation (object)
│ ├── interval (str): yearly, monthly, weekly, daily, hourly
│ ├── min_amount (int)
│ ├── max_amount (int)
│ └── currency (enum)
├── date_posted (date)
├── emails (str)
└── num_urgent_words (int)
├── date_posted (date)
├── emails (str)
└── is_remote (bool)
Indeed specific
├── company_country (str)
├── company_addresses (str)
├── company_industry (str)
├── company_employees_label (str)
├── company_revenue_label (str)
├── company_description (str)
├── ceo_name (str)
├── ceo_photo_url (str)
├── logo_photo_url (str)
└── banner_photo_url (str)
```
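As a quick illustration, a sketch of reading a few of these fields off the returned DataFrame (assuming the column names mirror the schema above; note that `company_name` is exposed as the `company` column):
```python
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed"],
    search_term="software engineer",
    country_indeed="USA",
)
# column names assumed to mirror the schema fields above
print(jobs[["title", "company", "company_industry", "company_num_employees"]].head())
```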
### Exceptions
The following exceptions may be raised when using JobSpy:
* `LinkedInException`
* `IndeedException`
* `ZipRecruiterException`
* `GlassdoorException`
## Supported Countries for Job Searching
### **LinkedIn**
LinkedIn searches globally & uses only the `location` parameter. You can only fetch 1000 jobs max from the LinkedIn endpoint we're using
LinkedIn searches globally & uses only the `location` parameter.
### **ZipRecruiter**
@@ -146,10 +197,14 @@ You can specify the following countries when searching on Indeed (use the exact
| South Korea | Spain* | Sweden | Switzerland* |
| Taiwan | Thailand | Turkey | Ukraine |
| United Arab Emirates | UK* | USA* | Uruguay |
| Venezuela | Vietnam | | |
| Venezuela | Vietnam* | | |
Glassdoor can only fetch 900 jobs from the endpoint we're using on a given search.
## Notes
* Indeed is the best scraper currently with no rate limiting.
* All the job board endpoints are capped at around 1000 jobs on a given search.
* LinkedIn is the most restrictive and usually rate-limits around the 10th page from a single IP; proxies are essentially a must.
## Frequently Asked Questions
---
@@ -163,11 +218,7 @@ persist, [submit an issue](https://github.com/Bunsly/JobSpy/issues).
**Q: Received a response code 429?**
**A:** This indicates that you have been blocked by the job board site for sending too many requests. All of the job board sites are aggressive with blocking. We recommend:
- Waiting some time between scrapes (site-dependent).
- Trying a VPN or proxy to change your IP address.
- Wait some time between scrapes (site-dependent).
- Try using the proxies param to change your IP address.
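For example, a sketch of the `proxies` param in use (the endpoints below are placeholders):
```python
from jobspy import scrape_jobs

# placeholder proxies; each job board round-robins through the list
jobs = scrape_jobs(
    site_name=["linkedin", "indeed"],
    search_term="software engineer",
    proxies=["user:pass@203.0.113.10:8080", "user:pass@203.0.113.11:8080"],
)
```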
---


@@ -1,30 +0,0 @@
from jobspy import scrape_jobs
import pandas as pd
jobs: pd.DataFrame = scrape_jobs(
site_name=["indeed", "linkedin", "zip_recruiter", "glassdoor"],
search_term="software engineer",
location="Dallas, TX",
results_wanted=25,  # be wary: the higher this is, the more likely you'll get blocked (a rotating proxy can help, though)
country_indeed="USA",
# proxy="http://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
)
# formatting for pandas
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", 50) # set to 0 to see full job url / desc
# 1: output to console
print(jobs)
# 2: output to .csv
jobs.to_csv("./jobs.csv", index=False)
print("outputted to jobs.csv")
# 3: output to .xlsx
# jobs.to_excel('jobs.xlsx', index=False)
# 4: display in Jupyter Notebook (1. pip install jupyter 2. jupyter notebook)
# display(jobs)


@@ -1,167 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "00a94b47-f47b-420f-ba7e-714ef219c006",
"metadata": {},
"outputs": [],
"source": [
"from jobspy import scrape_jobs\n",
"import pandas as pd\n",
"from IPython.display import display, HTML"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f773e6c-d9fc-42cc-b0ef-63b739e78435",
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_rows', None)\n",
"pd.set_option('display.width', None)\n",
"pd.set_option('display.max_colwidth', 50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1253c1f8-9437-492e-9dd3-e7fe51099420",
"metadata": {},
"outputs": [],
"source": [
"# example 1 (no hyperlinks, USA)\n",
"jobs = scrape_jobs(\n",
" site_name=[\"linkedin\"],\n",
" location='san francisco',\n",
" search_term=\"engineer\",\n",
" results_wanted=5,\n",
"\n",
" # use if you want to use a proxy\n",
" # proxy=\"socks5://jobspy:5a4vpWtj4EeJ2hoYzk@us.smartproxy.com:10001\",\n",
" proxy=\"http://jobspy:5a4vpWtj4EeJ2hoYzk@us.smartproxy.com:10001\",\n",
" #proxy=\"https://jobspy:5a4vpWtj4EeJ2hoYzk@us.smartproxy.com:10001\",\n",
")\n",
"display(jobs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a581b2d-f7da-4fac-868d-9efe143ee20a",
"metadata": {},
"outputs": [],
"source": [
"# example 2 - remote USA & hyperlinks\n",
"jobs = scrape_jobs(\n",
" site_name=[\"linkedin\", \"zip_recruiter\", \"indeed\"],\n",
" # location='san francisco',\n",
" search_term=\"software engineer\",\n",
" country_indeed=\"USA\",\n",
" hyperlinks=True,\n",
" is_remote=True,\n",
" results_wanted=5, \n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe8289bc-5b64-4202-9a64-7c117c83fd9a",
"metadata": {},
"outputs": [],
"source": [
"# use if hyperlinks=True\n",
"html = jobs.to_html(escape=False)\n",
"# change max-width: 200px to show more or less of the content\n",
"truncate_width = f'<style>.dataframe td {{ max-width: 200px; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }}</style>{html}'\n",
"display(HTML(truncate_width))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "951c2fe1-52ff-407d-8bb1-068049b36777",
"metadata": {},
"outputs": [],
"source": [
"# example 3 - with hyperlinks, international - linkedin (no zip_recruiter)\n",
"jobs = scrape_jobs(\n",
" site_name=[\"linkedin\"],\n",
" location='berlin',\n",
" search_term=\"engineer\",\n",
" hyperlinks=True,\n",
" results_wanted=5,\n",
" easy_apply=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e37a521-caef-441c-8fc2-2eb5b2e7da62",
"metadata": {},
"outputs": [],
"source": [
"# use if hyperlinks=True\n",
"html = jobs.to_html(escape=False)\n",
"# change max-width: 200px to show more or less of the content\n",
"truncate_width = f'<style>.dataframe td {{ max-width: 200px; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }}</style>{html}'\n",
"display(HTML(truncate_width))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0650e608-0b58-4bf5-ae86-68348035b16a",
"metadata": {},
"outputs": [],
"source": [
"# example 4 - international indeed (no zip_recruiter)\n",
"jobs = scrape_jobs(\n",
" site_name=[\"indeed\"],\n",
" search_term=\"engineer\",\n",
" country_indeed = \"China\",\n",
" hyperlinks=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "40913ac8-3f8a-4d7e-ac47-afb88316432b",
"metadata": {},
"outputs": [],
"source": [
"# use if hyperlinks=True\n",
"html = jobs.to_html(escape=False)\n",
"# change max-width: 200px to show more or less of the content\n",
"truncate_width = f'<style>.dataframe td {{ max-width: 200px; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }}</style>{html}'\n",
"display(HTML(truncate_width))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,77 +0,0 @@
from jobspy import scrape_jobs
import pandas as pd
import os
import time
# create a new CSV filename if jobs.csv already exists
csv_filename = "jobs.csv"
counter = 1
while os.path.exists(csv_filename):
csv_filename = f"jobs_{counter}.csv"
counter += 1
# results wanted and offset
results_wanted = 1000
offset = 0
all_jobs = []
# max retries
max_retries = 3
# number of results in each iteration
results_in_each_iteration = 30
while len(all_jobs) < results_wanted:
retry_count = 0
while retry_count < max_retries:
print("Doing from", offset, "to", offset + results_in_each_iteration, "jobs")
try:
jobs = scrape_jobs(
site_name=["indeed"],
search_term="software engineer",
# New York, NY
# Dallas, TX
# Los Angeles, CA
location="Los Angeles, CA",
results_wanted=min(results_in_each_iteration, results_wanted - len(all_jobs)),
country_indeed="USA",
offset=offset,
# proxy="http://jobspy:5a4vpWtj8EeJ2hoYzk@ca.smartproxy.com:20001",
)
# Add the scraped jobs to the list
all_jobs.extend(jobs.to_dict('records'))
# Increment the offset for the next page of results
offset += results_in_each_iteration
# Add a delay to avoid rate limiting (you can adjust the delay time as needed)
print(f"Scraped {len(all_jobs)} jobs")
print("Sleeping secs", 100 * (retry_count + 1))
time.sleep(100 * (retry_count + 1))  # sleep between requests to avoid rate limiting
break # Break out of the retry loop if successful
except Exception as e:
print(f"Error: {e}")
retry_count += 1
print("Sleeping secs before retry", 100 * (retry_count + 1))
time.sleep(100 * (retry_count + 1))
if retry_count >= max_retries:
print("Max retries reached. Exiting.")
break
# DataFrame from the collected job data
jobs_df = pd.DataFrame(all_jobs)
# Formatting
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", 50)
print(jobs_df)
jobs_df.to_csv(csv_filename, index=False)
print(f"Outputted to {csv_filename}")

poetry.lock (generated, 2205 changes): diff suppressed because it is too large.


@@ -1,6 +1,6 @@
[tool.poetry]
name = "python-jobspy"
version = "1.1.44"
version = "1.1.56"
description = "Job scraper for LinkedIn, Indeed, Glassdoor & ZipRecruiter"
authors = ["Zachary Hampton <zachary@bunsly.com>", "Cullen Watson <cullen@bunsly.com>"]
homepage = "https://github.com/Bunsly/JobSpy"
@@ -13,17 +13,24 @@ packages = [
[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.31.0"
tls-client = "*"
beautifulsoup4 = "^4.12.2"
pandas = "^2.1.0"
NUMPY = "1.24.2"
pydantic = "^2.3.0"
tls-client = "^1.0.1"
markdownify = "^0.11.6"
regex = "^2024.4.28"
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.1"
jupyter = "^1.0.0"
black = "*"
pre-commit = "*"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[tool.black]
line-length = 88


@@ -1,8 +1,11 @@
from __future__ import annotations
import pandas as pd
from typing import Tuple
from concurrent.futures import ThreadPoolExecutor, as_completed
from .jobs import JobType, Location
from .scrapers.utils import logger, set_logger_level
from .scrapers.indeed import IndeedScraper
from .scrapers.ziprecruiter import ZipRecruiterScraper
from .scrapers.glassdoor import GlassdoorScraper
@@ -15,40 +18,41 @@ from .scrapers.exceptions import (
GlassdoorException,
)
SCRAPER_MAPPING = {
Site.LINKEDIN: LinkedInScraper,
Site.INDEED: IndeedScraper,
Site.ZIP_RECRUITER: ZipRecruiterScraper,
Site.GLASSDOOR: GlassdoorScraper,
}
def _map_str_to_site(site_name: str) -> Site:
return Site[site_name.upper()]
def scrape_jobs(
site_name: str | list[str] | Site | list[Site] | None = None,
search_term: str | None = None,
location: str | None = None,
distance: int | None = None,
distance: int | None = 50,
is_remote: bool = False,
job_type: str | None = None,
easy_apply: bool | None = None,
results_wanted: int = 15,
country_indeed: str = "usa",
hyperlinks: bool = False,
proxy: str | None = None,
full_description: bool | None = False,
proxies: list[str] | str | None = None,
description_format: str = "markdown",
linkedin_fetch_description: bool | None = False,
linkedin_company_ids: list[int] | None = None,
offset: int | None = 0,
hours_old: int = None,
verbose: int = 2,
**kwargs,
) -> pd.DataFrame:
"""
Simultaneously scrapes job data from multiple job sites.
:return: results_wanted: pandas dataframe containing job data
:return: pandas dataframe containing job data
"""
SCRAPER_MAPPING = {
Site.LINKEDIN: LinkedInScraper,
Site.INDEED: IndeedScraper,
Site.ZIP_RECRUITER: ZipRecruiterScraper,
Site.GLASSDOOR: GlassdoorScraper,
}
set_logger_level(verbose)
def map_str_to_site(site_name: str) -> Site:
return Site[site_name.upper()]
def get_enum_from_value(value_str):
for job_type in JobType:
@@ -61,12 +65,12 @@ def scrape_jobs(
def get_site_type():
site_types = list(Site)
if isinstance(site_name, str):
site_types = [_map_str_to_site(site_name)]
site_types = [map_str_to_site(site_name)]
elif isinstance(site_name, Site):
site_types = [site_name]
elif isinstance(site_name, list):
site_types = [
_map_str_to_site(site) if isinstance(site, str) else site
map_str_to_site(site) if isinstance(site, str) else site
for site in site_name
]
return site_types
@@ -82,32 +86,21 @@ def scrape_jobs(
is_remote=is_remote,
job_type=job_type,
easy_apply=easy_apply,
full_description=full_description,
description_format=description_format,
linkedin_fetch_description=linkedin_fetch_description,
results_wanted=results_wanted,
linkedin_company_ids=linkedin_company_ids,
offset=offset,
hours_old=hours_old
hours_old=hours_old,
)
def scrape_site(site: Site) -> Tuple[str, JobResponse]:
scraper_class = SCRAPER_MAPPING[site]
scraper = scraper_class(proxy=proxy)
try:
scraped_data: JobResponse = scraper.scrape(scraper_input)
except (LinkedInException, IndeedException, ZipRecruiterException) as lie:
raise lie
except Exception as e:
if site == Site.LINKEDIN:
raise LinkedInException(str(e))
if site == Site.INDEED:
raise IndeedException(str(e))
if site == Site.ZIP_RECRUITER:
raise ZipRecruiterException(str(e))
if site == Site.GLASSDOOR:
raise GlassdoorException(str(e))
else:
raise e
scraper = scraper_class(proxies=proxies)
scraped_data: JobResponse = scraper.scrape(scraper_input)
cap_name = site.value.capitalize()
site_name = "ZipRecruiter" if cap_name == "Zip_recruiter" else cap_name
logger.info(f"{site_name} finished scraping")
return site.value, scraped_data
site_to_jobs_dict = {}
@@ -130,9 +123,8 @@ def scrape_jobs(
for site, job_response in site_to_jobs_dict.items():
for job in job_response.jobs:
job_data = job.dict()
job_data[
"job_url_hyper"
] = f'<a href="{job_data["job_url"]}">{job_data["job_url"]}</a>'
job_url = job_data["job_url"]
job_data["job_url_hyper"] = f'<a href="{job_url}">{job_url}</a>'
job_data["site"] = site
job_data["company"] = job_data["company_name"]
job_data["job_type"] = (
@@ -168,13 +160,20 @@ def scrape_jobs(
jobs_dfs.append(job_df)
if jobs_dfs:
jobs_df = pd.concat(jobs_dfs, ignore_index=True)
desired_order: list[str] = [
"job_url_hyper" if hyperlinks else "job_url",
# Step 1: Filter out all-NA columns from each DataFrame before concatenation
filtered_dfs = [df.dropna(axis=1, how="all") for df in jobs_dfs]
# Step 2: Concatenate the filtered DataFrames
jobs_df = pd.concat(filtered_dfs, ignore_index=True)
# Desired column order
desired_order = [
"id",
"site",
"job_url_hyper" if hyperlinks else "job_url",
"job_url_direct",
"title",
"company",
"company_url",
"location",
"job_type",
"date_posted",
@@ -183,13 +182,31 @@ def scrape_jobs(
"max_amount",
"currency",
"is_remote",
"num_urgent_words",
"benefits",
"job_function",
"emails",
"description",
"company_url",
"company_url_direct",
"company_addresses",
"company_industry",
"company_num_employees",
"company_revenue",
"company_description",
"logo_photo_url",
"banner_photo_url",
"ceo_name",
"ceo_photo_url",
]
jobs_formatted_df = jobs_df[desired_order]
else:
jobs_formatted_df = pd.DataFrame()
return jobs_formatted_df.sort_values(by=['site', 'date_posted'], ascending=[True, False])
# Step 3: Ensure all desired columns are present, adding missing ones as empty
for column in desired_order:
if column not in jobs_df.columns:
jobs_df[column] = None # Add missing columns as empty
# Reorder the DataFrame according to the desired order
jobs_df = jobs_df[desired_order]
# Step 4: Sort the DataFrame as required
return jobs_df.sort_values(by=["site", "date_posted"], ascending=[True, False])
else:
return pd.DataFrame()


@@ -1,3 +1,5 @@
from __future__ import annotations
from typing import Optional
from datetime import date
from enum import Enum
@@ -57,7 +59,7 @@ class JobType(Enum):
class Country(Enum):
"""
Gets the subdomain for Indeed and Glassdoor.
The second item in the tuple is the subdomain for Indeed
The second item in the tuple is the subdomain (and API country code if there's a ':' separator) for Indeed
The third item in the tuple is the subdomain (and tld if there's a ':' separator) for Glassdoor
"""
@@ -118,11 +120,11 @@ class Country(Enum):
TURKEY = ("turkey", "tr")
UKRAINE = ("ukraine", "ua")
UNITEDARABEMIRATES = ("united arab emirates", "ae")
UK = ("uk,united kingdom", "uk", "co.uk")
USA = ("usa,us,united states", "www", "com")
UK = ("uk,united kingdom", "uk:gb", "co.uk")
USA = ("usa,us,united states", "www:us", "com")
URUGUAY = ("uruguay", "uy")
VENEZUELA = ("venezuela", "ve")
VIETNAM = ("vietnam", "vn")
VIETNAM = ("vietnam", "vn", "com")
# internal for ziprecruiter
US_CANADA = ("usa/ca", "www")
@@ -132,7 +134,10 @@ class Country(Enum):
@property
def indeed_domain_value(self):
return self.value[1]
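# e.g. UK = ("uk,united kingdom", "uk:gb", "co.uk") -> ("uk", "GB");
# a value without ':' such as "fr" yields ("fr", "FR")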
subdomain, _, api_country_code = self.value[1].partition(":")
if subdomain and api_country_code:
return subdomain, api_country_code.upper()
return self.value[1], self.value[1].upper()
@property
def glassdoor_domain_value(self):
@@ -145,7 +150,7 @@ class Country(Enum):
else:
raise Exception(f"Glassdoor is not available for {self.name}")
def get_url(self):
def get_glassdoor_url(self):
return f"https://{self.glassdoor_domain_value}/"
@classmethod
@@ -153,7 +158,7 @@ class Country(Enum):
"""Convert a string to the corresponding Country enum."""
country_str = country_str.strip().lower()
for country in cls:
country_names = country.value[0].split(',')
country_names = country.value[0].split(",")
if country_str in country_names:
return country
valid_countries = [country.value for country in cls]
@@ -163,7 +168,7 @@ class Country(Enum):
class Location(BaseModel):
country: Country | None = None
country: Country | str | None = None
city: Optional[str] = None
state: Optional[str] = None
@@ -173,7 +178,12 @@ class Location(BaseModel):
location_parts.append(self.city)
if self.state:
location_parts.append(self.state)
if self.country and self.country not in (Country.US_CANADA, Country.WORLDWIDE):
if isinstance(self.country, str):
location_parts.append(self.country)
elif self.country and self.country not in (
Country.US_CANADA,
Country.WORLDWIDE,
):
country_name = self.country.value[0]
if "," in country_name:
country_name = country_name.split(",")[0]
@@ -210,23 +220,42 @@ class Compensation(BaseModel):
currency: Optional[str] = "USD"
class DescriptionFormat(Enum):
MARKDOWN = "markdown"
HTML = "html"
class JobPost(BaseModel):
id: str | None = None
title: str
company_name: str
company_name: str | None
job_url: str
job_url_direct: str | None = None
location: Optional[Location]
description: str | None = None
company_url: str | None = None
company_url_direct: str | None = None
job_type: list[JobType] | None = None
compensation: Compensation | None = None
date_posted: date | None = None
benefits: str | None = None
emails: list[str] | None = None
num_urgent_words: int | None = None
is_remote: bool | None = None
# company_industry: str | None = None
# indeed specific
company_addresses: str | None = None
company_industry: str | None = None
company_num_employees: str | None = None
company_revenue: str | None = None
company_description: str | None = None
ceo_name: str | None = None
ceo_photo_url: str | None = None
logo_photo_url: str | None = None
banner_photo_url: str | None = None
# linkedin only atm
job_function: str | None = None
class JobResponse(BaseModel):


@@ -1,4 +1,15 @@
from ..jobs import Enum, BaseModel, JobType, JobResponse, Country
from __future__ import annotations
from abc import ABC, abstractmethod
from ..jobs import (
Enum,
BaseModel,
JobType,
JobResponse,
Country,
DescriptionFormat,
)
class Site(Enum):
@@ -18,17 +29,19 @@ class ScraperInput(BaseModel):
is_remote: bool = False
job_type: JobType | None = None
easy_apply: bool | None = None
full_description: bool = False
offset: int = 0
linkedin_fetch_description: bool = False
linkedin_company_ids: list[int] | None = None
description_format: DescriptionFormat | None = DescriptionFormat.MARKDOWN
results_wanted: int = 15
hours_old: int | None = None
class Scraper:
def __init__(self, site: Site, proxy: list[str] | None = None):
class Scraper(ABC):
def __init__(self, site: Site, proxies: list[str] | None = None):
self.proxies = proxies
self.site = site
self.proxy = (lambda p: {"http": p, "https": p} if p else None)(proxy)
@abstractmethod
def scrape(self, scraper_input: ScraperInput) -> JobResponse: ...
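# concrete scrapers (Indeed, LinkedIn, ZipRecruiter, Glassdoor) subclass
# Scraper and must implement scrape(); proxies are stored on the instance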


@@ -4,16 +4,24 @@ jobspy.scrapers.glassdoor
This module contains routines to scrape Glassdoor.
"""
from __future__ import annotations
import re
import json
import requests
from typing import Optional
from typing import Optional, Tuple
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor, as_completed
from ..utils import count_urgent_words, extract_emails_from_text
from .. import Scraper, ScraperInput, Site
from ..utils import extract_emails_from_text
from ..exceptions import GlassdoorException
from ..utils import create_session
from ..utils import (
create_session,
markdown_converter,
logger,
)
from ...jobs import (
JobPost,
Compensation,
@@ -21,84 +29,154 @@ from ...jobs import (
Location,
JobResponse,
JobType,
DescriptionFormat,
)
class GlassdoorScraper(Scraper):
def __init__(self, proxy: Optional[str] = None):
def __init__(self, proxies: list[str] | str | None = None):
"""
Initializes GlassdoorScraper with the Glassdoor job search url
"""
site = Site(Site.GLASSDOOR)
super().__init__(site, proxy=proxy)
super().__init__(site, proxies=proxies)
self.url = None
self.base_url = None
self.country = None
self.session = None
self.scraper_input = None
self.jobs_per_page = 30
self.max_pages = 30
self.seen_urls = set()
def fetch_jobs_page(
def scrape(self, scraper_input: ScraperInput) -> JobResponse:
"""
Scrapes Glassdoor for jobs with scraper_input criteria.
:param scraper_input: Information about job search criteria.
:return: JobResponse containing a list of jobs.
"""
self.scraper_input = scraper_input
self.scraper_input.results_wanted = min(900, scraper_input.results_wanted)
self.base_url = self.scraper_input.country.get_glassdoor_url()
self.session = create_session(proxies=self.proxies, is_tls=True, has_retry=True)
token = self._get_csrf_token()
self.headers["gd-csrf-token"] = token if token else self.fallback_token
location_id, location_type = self._get_location(
scraper_input.location, scraper_input.is_remote
)
if location_type is None:
logger.error("Glassdoor: location not parsed")
return JobResponse(jobs=[])
all_jobs: list[JobPost] = []
cursor = None
range_start = 1 + (scraper_input.offset // self.jobs_per_page)
tot_pages = (scraper_input.results_wanted // self.jobs_per_page) + 2
range_end = min(tot_pages, self.max_pages + 1)
for page in range(range_start, range_end):
logger.info(f"Glassdoor search page: {page}")
try:
jobs, cursor = self._fetch_jobs_page(
scraper_input, location_id, location_type, page, cursor
)
all_jobs.extend(jobs)
if not jobs or len(all_jobs) >= scraper_input.results_wanted:
all_jobs = all_jobs[: scraper_input.results_wanted]
break
except Exception as e:
logger.error(f"Glassdoor: {str(e)}")
break
return JobResponse(jobs=all_jobs)
def _fetch_jobs_page(
self,
scraper_input: ScraperInput,
location_id: int,
location_type: str,
page_num: int,
cursor: str | None,
) -> (list[JobPost], str | None):
) -> Tuple[list[JobPost], str | None]:
"""
Scrapes a page of Glassdoor for jobs with scraper_input criteria
"""
jobs = []
self.scraper_input = scraper_input
try:
payload = self.add_payload(
scraper_input, location_id, location_type, page_num, cursor
)
payload = self._add_payload(location_id, location_type, page_num, cursor)
response = self.session.post(
f"{self.url}/graph", headers=self.headers(), timeout=10, data=payload
f"{self.base_url}/graph",
headers=self.headers,
timeout_seconds=15,
data=payload,
)
if response.status_code != 200:
raise GlassdoorException(
f"bad response status code: {response.status_code}"
)
exc_msg = f"bad response status code: {response.status_code}"
raise GlassdoorException(exc_msg)
res_json = response.json()[0]
if "errors" in res_json:
raise ValueError("Error encountered in API response")
except Exception as e:
raise GlassdoorException(str(e))
except (
requests.exceptions.ReadTimeout,
GlassdoorException,
ValueError,
Exception,
) as e:
logger.error(f"Glassdoor: {str(e)}")
return jobs, None
jobs_data = res_json["data"]["jobListings"]["jobListings"]
jobs = []
with ThreadPoolExecutor(max_workers=self.jobs_per_page) as executor:
future_to_job_data = {executor.submit(self.process_job, job): job for job in jobs_data}
future_to_job_data = {
executor.submit(self._process_job, job): job for job in jobs_data
}
for future in as_completed(future_to_job_data):
try:
job_post = future.result()
if job_post:
jobs.append(job_post)
except Exception as exc:
raise GlassdoorException(f'Glassdoor generated an exception: {exc}')
raise GlassdoorException(f"Glassdoor generated an exception: {exc}")
return jobs, self.get_cursor_for_page(
res_json["data"]["jobListings"]["paginationCursors"], page_num + 1
)
def process_job(self, job_data):
"""Processes a single job and fetches its description."""
def _get_csrf_token(self):
"""
Fetches csrf token needed for API by visiting a generic page
"""
res = self.session.get(
f"{self.base_url}/Job/computer-science-jobs.htm", headers=self.headers
)
pattern = r'"token":\s*"([^"]+)"'
matches = re.findall(pattern, res.text)
token = None
if matches:
token = matches[0]
return token
def _process_job(self, job_data):
"""
Processes a single job and fetches its description.
"""
job_id = job_data["jobview"]["job"]["listingId"]
job_url = f'{self.url}job-listing/j?jl={job_id}'
job_url = f"{self.base_url}job-listing/j?jl={job_id}"
if job_url in self.seen_urls:
return None
self.seen_urls.add(job_url)
job = job_data["jobview"]
title = job["job"]["jobTitleText"]
company_name = job["header"]["employerNameFromSearch"]
company_id = job_data['jobview']['header']['employer']['id']
company_id = job_data["jobview"]["header"]["employer"]["id"]
location_name = job["header"].get("locationName", "")
location_type = job["header"].get("locationType", "")
age_in_days = job["header"].get("ageInDays")
is_remote, location = False, None
date_posted = (datetime.now() - timedelta(days=age_in_days)).date() if age_in_days is not None else None
date_diff = (datetime.now() - timedelta(days=age_in_days)).date()
date_posted = date_diff if age_in_days is not None else None
if location_type == "S":
is_remote = True
@@ -106,15 +184,15 @@ class GlassdoorScraper(Scraper):
location = self.parse_location(location_name)
compensation = self.parse_compensation(job["header"])
try:
description = self.fetch_job_description(job_id)
description = self._fetch_job_description(job_id)
except:
description = None
job_post = JobPost(
company_url = f"{self.base_url}Overview/W-EI_IE{company_id}.htm"
return JobPost(
id=str(job_id),
title=title,
company_url=f"{self.url}Overview/W-EI_IE{company_id}.htm" if company_id else None,
company_url=company_url if company_id else None,
company_name=company_name,
date_posted=date_posted,
job_url=job_url,
@@ -123,62 +201,20 @@ class GlassdoorScraper(Scraper):
is_remote=is_remote,
description=description,
emails=extract_emails_from_text(description) if description else None,
num_urgent_words=count_urgent_words(description) if description else None,
)
return job_post
def scrape(self, scraper_input: ScraperInput) -> JobResponse:
def _fetch_job_description(self, job_id):
"""
Scrapes Glassdoor for jobs with scraper_input criteria.
:param scraper_input: Information about job search criteria.
:return: JobResponse containing a list of jobs.
Fetches the job description for a single job ID.
"""
scraper_input.results_wanted = min(900, scraper_input.results_wanted)
self.country = scraper_input.country
self.url = self.country.get_url()
location_id, location_type = self.get_location(
scraper_input.location, scraper_input.is_remote
)
all_jobs: list[JobPost] = []
cursor = None
max_pages = 30
self.session = create_session(self.proxy, is_tls=False, has_retry=True)
self.session.get(self.url)
try:
for page in range(
1 + (scraper_input.offset // self.jobs_per_page),
min(
(scraper_input.results_wanted // self.jobs_per_page) + 2,
max_pages + 1,
),
):
try:
jobs, cursor = self.fetch_jobs_page(
scraper_input, location_id, location_type, page, cursor
)
all_jobs.extend(jobs)
if len(all_jobs) >= scraper_input.results_wanted:
all_jobs = all_jobs[: scraper_input.results_wanted]
break
except Exception as e:
raise GlassdoorException(str(e))
except Exception as e:
raise GlassdoorException(str(e))
return JobResponse(jobs=all_jobs)
def fetch_job_description(self, job_id):
"""Fetches the job description for a single job ID."""
url = f"{self.url}/graph"
url = f"{self.base_url}/graph"
body = [
{
"operationName": "JobDetailQuery",
"variables": {
"jl": job_id,
"queryString": "q",
"pageTypeEnum": "SERP"
"pageTypeEnum": "SERP",
},
"query": """
query JobDetailQuery($jl: Long!, $queryString: String, $pageTypeEnum: PageTypeEnum) {
@@ -193,22 +229,89 @@ class GlassdoorScraper(Scraper):
__typename
}
}
"""
""",
}
]
response = requests.post(url, json=body, headers=GlassdoorScraper.headers())
if response.status_code != 200:
res = requests.post(url, json=body, headers=self.headers)
if res.status_code != 200:
return None
data = response.json()[0]
desc = data['data']['jobview']['job']['description']
data = res.json()[0]
desc = data["data"]["jobview"]["job"]["description"]
if self.scraper_input.description_format == DescriptionFormat.MARKDOWN:
desc = markdown_converter(desc)
return desc
def _get_location(self, location: str, is_remote: bool) -> (int, str):
if not location or is_remote:
return "11047", "STATE" # remote options
url = f"{self.base_url}/findPopularLocationAjax.htm?maxLocationsToReturn=10&term={location}"
res = self.session.get(url, headers=self.headers)
if res.status_code != 200:
if res.status_code == 429:
err = f"429 Response - Blocked by Glassdoor for too many requests"
logger.error(err)
return None, None
else:
err = f"Glassdoor response status code {res.status_code}"
err += f" - {res.text}"
logger.error(f"Glassdoor response status code {res.status_code}")
return None, None
items = res.json()
if not items:
raise ValueError(f"Location '{location}' not found on Glassdoor")
location_type = items[0]["locationType"]
if location_type == "C":
location_type = "CITY"
elif location_type == "S":
location_type = "STATE"
elif location_type == "N":
location_type = "COUNTRY"
return int(items[0]["locationId"]), location_type
def _add_payload(
self,
location_id: int,
location_type: str,
page_num: int,
cursor: str | None = None,
) -> str:
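# fromage is Glassdoor's posting-age filter in days, derived from
# hours_old with a minimum of one day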
fromage = None
if self.scraper_input.hours_old:
fromage = max(self.scraper_input.hours_old // 24, 1)
filter_params = []
if self.scraper_input.easy_apply:
filter_params.append({"filterKey": "applicationType", "values": "1"})
if fromage:
filter_params.append({"filterKey": "fromAge", "values": str(fromage)})
payload = {
"operationName": "JobSearchResultsQuery",
"variables": {
"excludeJobListingIds": [],
"filterParams": filter_params,
"keyword": self.scraper_input.search_term,
"numJobsToShow": 30,
"locationType": location_type,
"locationId": int(location_id),
"parameterUrlInput": f"IL.0,12_I{location_type}{location_id}",
"pageNumber": page_num,
"pageCursor": cursor,
"fromage": fromage,
"sort": "date",
},
"query": self.query_template,
}
if self.scraper_input.job_type:
payload["variables"]["filterParams"].append(
{"filterKey": "jobType", "values": self.scraper_input.job_type.value[0]}
)
return json.dumps([payload])
@staticmethod
def parse_compensation(data: dict) -> Optional[Compensation]:
pay_period = data.get("payPeriod")
adjusted_pay = data.get("payPeriodAdjustedPay")
currency = data.get("payCurrency", "USD")
if not pay_period or not adjusted_pay:
return None
@@ -219,7 +322,6 @@ class GlassdoorScraper(Scraper):
interval = CompensationInterval.get_interval(pay_period)
min_amount = int(adjusted_pay.get("p10") // 1)
max_amount = int(adjusted_pay.get("p90") // 1)
return Compensation(
interval=interval,
min_amount=min_amount,
@@ -227,59 +329,44 @@ class GlassdoorScraper(Scraper):
currency=currency,
)
def get_location(self, location: str, is_remote: bool) -> (int, str):
if not location or is_remote:
return "11047", "STATE" # remote options
url = f"{self.url}/findPopularLocationAjax.htm?maxLocationsToReturn=10&term={location}"
session = create_session(self.proxy, has_retry=True)
response = session.get(url)
if response.status_code != 200:
raise GlassdoorException(
f"bad response status code: {response.status_code}"
)
items = response.json()
if not items:
raise ValueError(f"Location '{location}' not found on Glassdoor")
location_type = items[0]["locationType"]
if location_type == "C":
location_type = "CITY"
elif location_type == "S":
location_type = "STATE"
elif location_type == 'N':
location_type = "COUNTRY"
return int(items[0]["locationId"]), location_type
@staticmethod
def get_job_type_enum(job_type_str: str) -> list[JobType] | None:
for job_type in JobType:
if job_type_str in job_type.value:
return [job_type]
@staticmethod
def add_payload(
scraper_input,
location_id: int,
location_type: str,
page_num: int,
cursor: str | None = None,
) -> str:
# `fromage` is the posting time filter in days
fromage = max(scraper_input.hours_old // 24, 1) if scraper_input.hours_old else None
filter_params = []
if scraper_input.easy_apply:
filter_params.append({"filterKey": "applicationType", "values": "1"})
if fromage:
filter_params.append({"filterKey": "fromAge", "values": str(fromage)})
payload = {
"operationName": "JobSearchResultsQuery",
"variables": {
"excludeJobListingIds": [],
"filterParams": filter_params,
"keyword": scraper_input.search_term,
"numJobsToShow": 30,
"locationType": location_type,
"locationId": int(location_id),
"parameterUrlInput": f"IL.0,12_I{location_type}{location_id}",
"pageNumber": page_num,
"pageCursor": cursor,
"fromage": fromage,
"sort": "date"
},
"query": """
def parse_location(location_name: str) -> Location | None:
if not location_name or location_name == "Remote":
return
city, _, state = location_name.partition(", ")
return Location(city=city, state=state)
@staticmethod
def get_cursor_for_page(pagination_cursors, page_num):
for cursor_data in pagination_cursors:
if cursor_data["pageNumber"] == page_num:
return cursor_data["cursor"]
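# static token used when _get_csrf_token() cannot scrape a fresh one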
fallback_token = "Ft6oHEWlRZrxDww95Cpazw:0pGUrkb2y3TyOpAIqF2vbPmUXoXVkD3oEGDVkvfeCerceQ5-n8mBg3BovySUIjmCPHCaW0H2nQVdqzbtsYqf4Q:wcqRqeegRUa9MVLJGyujVXB7vWFPjdaS1CtrrzJq-ok"
headers = {
"authority": "www.glassdoor.com",
"accept": "*/*",
"accept-language": "en-US,en;q=0.9",
"apollographql-client-name": "job-search-next",
"apollographql-client-version": "4.65.5",
"content-type": "application/json",
"origin": "https://www.glassdoor.com",
"referer": "https://www.glassdoor.com/",
"sec-ch-ua": '"Chromium";v="118", "Google Chrome";v="118", "Not=A?Brand";v="99"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"macOS"',
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
}
query_template = """
query JobSearchResultsQuery(
$excludeJobListingIds: [Long!],
$keyword: String,
@@ -444,55 +531,4 @@ class GlassdoorScraper(Scraper):
}
__typename
}
"""
}
if scraper_input.job_type:
payload["variables"]["filterParams"].append(
{"filterKey": "jobType", "values": scraper_input.job_type.value[0]}
)
return json.dumps([payload])
@staticmethod
def get_job_type_enum(job_type_str: str) -> list[JobType] | None:
for job_type in JobType:
if job_type_str in job_type.value:
return [job_type]
@staticmethod
def parse_location(location_name: str) -> Location | None:
if not location_name or location_name == "Remote":
return
city, _, state = location_name.partition(", ")
return Location(city=city, state=state)
@staticmethod
def get_cursor_for_page(pagination_cursors, page_num):
for cursor_data in pagination_cursors:
if cursor_data["pageNumber"] == page_num:
return cursor_data["cursor"]
@staticmethod
def headers() -> dict:
"""
Returns headers needed for requests
:return: dict - Dictionary containing headers
"""
return {
"authority": "www.glassdoor.com",
"accept": "*/*",
"accept-language": "en-US,en;q=0.9",
"apollographql-client-name": "job-search-next",
"apollographql-client-version": "4.65.5",
"content-type": "application/json",
"gd-csrf-token": "Ft6oHEWlRZrxDww95Cpazw:0pGUrkb2y3TyOpAIqF2vbPmUXoXVkD3oEGDVkvfeCerceQ5-n8mBg3BovySUIjmCPHCaW0H2nQVdqzbtsYqf4Q:wcqRqeegRUa9MVLJGyujVXB7vWFPjdaS1CtrrzJq-ok",
"origin": "https://www.glassdoor.com",
"referer": "https://www.glassdoor.com/",
"sec-ch-ua": '"Chromium";v="118", "Google Chrome";v="118", "Not=A?Brand";v="99"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"macOS"',
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
}
"""


@@ -4,24 +4,21 @@ jobspy.scrapers.indeed
This module contains routines to scrape Indeed.
"""
import re
import math
import json
import requests
from typing import Any
from datetime import datetime
from bs4 import BeautifulSoup
from bs4.element import Tag
from __future__ import annotations
import math
from typing import Tuple
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, Future
from ..exceptions import IndeedException
from .. import Scraper, ScraperInput, Site
from ..utils import (
count_urgent_words,
extract_emails_from_text,
create_session,
get_enum_from_job_type,
logger
markdown_converter,
logger,
create_session,
)
from ...jobs import (
JobPost,
@@ -30,122 +27,26 @@ from ...jobs import (
Location,
JobResponse,
JobType,
DescriptionFormat,
)
from .. import Scraper, ScraperInput, Site
class IndeedScraper(Scraper):
def __init__(self, proxy: str | None = None):
def __init__(self, proxies: list[str] | str | None = None):
"""
Initializes IndeedScraper with the Indeed job search url
Initializes IndeedScraper with the Indeed API url
"""
self.url = None
self.country = None
site = Site(Site.INDEED)
super().__init__(site, proxy=proxy)
super().__init__(Site.INDEED, proxies=proxies)
self.jobs_per_page = 25
self.session = create_session(proxies=self.proxies, is_tls=False)
self.scraper_input = None
self.jobs_per_page = 100
self.num_workers = 10
self.seen_urls = set()
def scrape_page(
self, scraper_input: ScraperInput, page: int
) -> list[JobPost]:
"""
Scrapes a page of Indeed for jobs with scraper_input criteria
:param scraper_input:
:param page:
:return: jobs found on page, total number of jobs found for search
"""
job_list = []
self.country = scraper_input.country
domain = self.country.indeed_domain_value
self.url = f"https://{domain}.indeed.com"
try:
session = create_session(self.proxy)
response = session.get(
f"{self.url}/m/jobs",
headers=self.get_headers(),
params=self.add_params(scraper_input, page),
allow_redirects=True,
timeout_seconds=10,
)
if response.status_code not in range(200, 400):
raise IndeedException(
f"bad response with status code: {response.status_code}"
)
except Exception as e:
if "Proxy responded with" in str(e):
logger.error(f'Indeed: Bad proxy')
else:
logger.error(f'Indeed: {str(e)}')
return job_list
soup = BeautifulSoup(response.content, "html.parser")
if "did not match any jobs" in response.text:
return job_list
jobs = IndeedScraper.parse_jobs(
soup
) #: can raise exception, handled by main scrape function
if (
not jobs.get("metaData", {})
.get("mosaicProviderJobCardsModel", {})
.get("results")
):
raise IndeedException("No jobs found.")
def process_job(job: dict, job_detailed: dict) -> JobPost | None:
job_url = f'{self.url}/m/jobs/viewjob?jk={job["jobkey"]}'
job_url_client = f'{self.url}/viewjob?jk={job["jobkey"]}'
if job_url in self.seen_urls:
return None
self.seen_urls.add(job_url)
description = job_detailed['description']['html']
job_type = IndeedScraper.get_job_type(job)
timestamp_seconds = job["pubDate"] / 1000
date_posted = datetime.fromtimestamp(timestamp_seconds)
date_posted = date_posted.strftime("%Y-%m-%d")
job_post = JobPost(
title=job["normTitle"],
description=description,
company_name=job["company"],
company_url=f"{self.url}{job_detailed['employer']['relativeCompanyPageUrl']}" if job_detailed['employer'] else None,
location=Location(
city=job.get("jobLocationCity"),
state=job.get("jobLocationState"),
country=self.country,
),
job_type=job_type,
compensation=self.get_compensation(job, job_detailed),
date_posted=date_posted,
job_url=job_url_client,
emails=extract_emails_from_text(description) if description else None,
num_urgent_words=count_urgent_words(description)
if description
else None,
is_remote=IndeedScraper.is_job_remote(job, job_detailed, description)
)
return job_post
workers = 10
jobs = jobs["metaData"]["mosaicProviderJobCardsModel"]["results"]
job_keys = [job['jobkey'] for job in jobs]
jobs_detailed = self.get_job_details(job_keys)
with ThreadPoolExecutor(max_workers=workers) as executor:
job_results: list[Future] = [
executor.submit(process_job, job, job_detailed['job']) for job, job_detailed in zip(jobs, jobs_detailed)
]
job_list = [result.result() for result in job_results if result.result()]
return job_list
self.headers = None
self.api_country_code = None
self.base_url = None
self.api_url = "https://apis.indeed.com/graphql"
def scrape(self, scraper_input: ScraperInput) -> JobResponse:
"""
@@ -153,284 +54,381 @@ class IndeedScraper(Scraper):
:param scraper_input:
:return: job_response
"""
job_list = self.scrape_page(scraper_input, 0)
pages_processed = 1
self.scraper_input = scraper_input
domain, self.api_country_code = self.scraper_input.country.indeed_domain_value
self.base_url = f"https://{domain}.indeed.com"
self.headers = self.api_headers.copy()
self.headers["indeed-co"] = self.scraper_input.country.indeed_domain_value
job_list = []
page = 1
while len(self.seen_urls) < scraper_input.results_wanted:
pages_to_process = math.ceil((scraper_input.results_wanted - len(self.seen_urls)) / self.jobs_per_page)
new_jobs = False
with ThreadPoolExecutor(max_workers=10) as executor:
futures: list[Future] = [
executor.submit(self.scrape_page, scraper_input, page + pages_processed)
for page in range(pages_to_process)
]
for future in futures:
jobs = future.result()
if jobs:
job_list += jobs
new_jobs = True
if len(self.seen_urls) >= scraper_input.results_wanted:
break
pages_processed += pages_to_process
if not new_jobs:
cursor = None
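# the API serves 100 jobs per page, so convert the requested offset into whole pages to skip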
offset_pages = math.ceil(self.scraper_input.offset / 100)
for _ in range(offset_pages):
logger.info(f"Indeed skipping search page: {page}")
__, cursor = self._scrape_page(cursor)
if not __:
logger.info(f"Indeed found no jobs on page: {page}")
break
while len(self.seen_urls) < scraper_input.results_wanted:
logger.info(f"Indeed search page: {page}")
jobs, cursor = self._scrape_page(cursor)
if not jobs:
logger.info(f"Indeed found no jobs on page: {page}")
break
job_list += jobs
page += 1
return JobResponse(jobs=job_list[: scraper_input.results_wanted])
if len(self.seen_urls) > scraper_input.results_wanted:
job_list = job_list[:scraper_input.results_wanted]
def _scrape_page(self, cursor: str | None) -> Tuple[list[JobPost], str | None]:
"""
Scrapes a page of Indeed for jobs with scraper_input criteria
:param cursor:
:return: jobs found on page, next page cursor
"""
jobs = []
new_cursor = None
filters = self._build_filters()
search_term = (
self.scraper_input.search_term.replace('"', '\\"')
if self.scraper_input.search_term
else ""
)
query = self.job_search_query.format(
what=(f'what: "{search_term}"' if search_term else ""),
location=(
f'location: {{where: "{self.scraper_input.location}", radius: {self.scraper_input.distance}, radiusUnit: MILES}}'
if self.scraper_input.location
else ""
),
dateOnIndeed=self.scraper_input.hours_old,
cursor=f'cursor: "{cursor}"' if cursor else "",
filters=filters,
)
payload = {
"query": query,
}
api_headers = self.api_headers.copy()
api_headers["indeed-co"] = self.api_country_code
response = self.session.post(
self.api_url,
headers=api_headers,
json=payload,
timeout=10,
)
if response.status_code != 200:
logger.info(
f"Indeed responded with status code: {response.status_code} (submit GitHub issue if this appears to be a bug)"
)
return jobs, new_cursor
data = response.json()
jobs = data["data"]["jobSearch"]["results"]
new_cursor = data["data"]["jobSearch"]["pageInfo"]["nextCursor"]
return JobResponse(jobs=job_list)
with ThreadPoolExecutor(max_workers=self.num_workers) as executor:
job_results: list[Future] = [
executor.submit(self._process_job, job["job"]) for job in jobs
]
job_list = [result.result() for result in job_results if result.result()]
return job_list, new_cursor
def _build_filters(self):
"""
Builds the filters dict for job type/is_remote. If hours_old is provided, composite filter for job_type/is_remote is not possible.
IndeedApply: filters: { keyword: { field: "indeedApplyScope", keys: ["DESKTOP"] } }
"""
filters_str = ""
if self.scraper_input.hours_old:
filters_str = """
filters: {{
date: {{
field: "dateOnIndeed",
start: "{start}h"
}}
}}
""".format(
start=self.scraper_input.hours_old
)
elif self.scraper_input.easy_apply:
filters_str = """
filters: {
keyword: {
field: "indeedApplyScope",
keys: ["DESKTOP"]
}
}
"""
elif self.scraper_input.job_type or self.scraper_input.is_remote:
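# Indeed's internal attribute keys for each job type filter
# ("DSQF7" below is the remote-work attribute)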
job_type_key_mapping = {
JobType.FULL_TIME: "CF3CP",
JobType.PART_TIME: "75GKK",
JobType.CONTRACT: "NJXCK",
JobType.INTERNSHIP: "VDTG7",
}
keys = []
if self.scraper_input.job_type:
key = job_type_key_mapping[self.scraper_input.job_type]
keys.append(key)
if self.scraper_input.is_remote:
keys.append("DSQF7")
if keys:
keys_str = '", "'.join(keys) # Prepare your keys string
filters_str = f"""
filters: {{
composite: {{
filters: [{{
keyword: {{
field: "attributes",
keys: ["{keys_str}"]
}}
}}]
}}
}}
"""
return filters_str
def _process_job(self, job: dict) -> JobPost | None:
"""
Parses the job dict into JobPost model
:param job: dict to parse
:return: JobPost if it's a new job
"""
job_url = f'{self.base_url}/viewjob?jk={job["key"]}'
if job_url in self.seen_urls:
return
self.seen_urls.add(job_url)
description = job["description"]["html"]
if self.scraper_input.description_format == DescriptionFormat.MARKDOWN:
description = markdown_converter(description)
job_type = self._get_job_type(job["attributes"])
timestamp_seconds = job["datePublished"] / 1000
date_posted = datetime.fromtimestamp(timestamp_seconds).strftime("%Y-%m-%d")
employer = job["employer"].get("dossier") if job["employer"] else None
employer_details = employer.get("employerDetails", {}) if employer else {}
rel_url = job["employer"]["relativeCompanyPageUrl"] if job["employer"] else None
return JobPost(
id=str(job["key"]),
title=job["title"],
description=description,
company_name=job["employer"].get("name") if job.get("employer") else None,
company_url=(f"{self.base_url}{rel_url}" if job["employer"] else None),
company_url_direct=(
employer["links"]["corporateWebsite"] if employer else None
),
location=Location(
city=job.get("location", {}).get("city"),
state=job.get("location", {}).get("admin1Code"),
country=job.get("location", {}).get("countryCode"),
),
job_type=job_type,
compensation=self._get_compensation(job),
date_posted=date_posted,
job_url=job_url,
job_url_direct=(
job["recruit"].get("viewJobUrl") if job.get("recruit") else None
),
emails=extract_emails_from_text(description) if description else None,
is_remote=self._is_job_remote(job, description),
company_addresses=(
employer_details["addresses"][0]
if employer_details.get("addresses")
else None
),
company_industry=(
employer_details["industry"]
.replace("Iv1", "")
.replace("_", " ")
.title()
if employer_details.get("industry")
else None
),
company_num_employees=employer_details.get("employeesLocalizedLabel"),
company_revenue=employer_details.get("revenueLocalizedLabel"),
company_description=employer_details.get("briefDescription"),
ceo_name=employer_details.get("ceoName"),
ceo_photo_url=employer_details.get("ceoPhotoUrl"),
logo_photo_url=(
employer["images"].get("squareLogoUrl")
if employer and employer.get("images")
else None
),
banner_photo_url=(
employer["images"].get("headerImageUrl")
if employer and employer.get("images")
else None
),
)
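# A small self-contained sketch of the datePublished handling above: Indeed
# returns epoch milliseconds, hence the division by 1000 (timestamp hypothetical).
from datetime import datetime
print(datetime.fromtimestamp(1717200000000 / 1000).strftime("%Y-%m-%d"))  # e.g. 2024-06-01, local time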
@staticmethod
def _get_job_type(attributes: list) -> list[JobType]:
    """
    Parses the attributes to get list of job types
    :param attributes:
    :return: list of JobType
    """
    job_types: list[JobType] = []
    for attribute in attributes:
        job_type_str = attribute["label"].replace("-", "").replace(" ", "").lower()
        job_type = get_enum_from_job_type(job_type_str)
        if job_type:
            job_types.append(job_type)
    return job_types
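# For example (a sketch; label value hypothetical), an attribute label like
# "Full-time" normalizes to "fulltime" before the enum lookup:
label = "Full-time"
assert label.replace("-", "").replace(" ", "").lower() == "fulltime"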
@staticmethod
def _get_compensation(job: dict) -> Compensation | None:
    """
    Parses the job to get compensation
    :param job:
    :return: compensation object
    """
    comp = job["compensation"]["baseSalary"]
    if not comp:
        return None
    interval = IndeedScraper._get_compensation_interval(comp["unitOfWork"])
    if not interval:
        return None
    min_range = comp["range"].get("min")
    max_range = comp["range"].get("max")
    return Compensation(
        interval=interval,
        min_amount=round(min_range, 2) if min_range is not None else None,
        max_amount=round(max_range, 2) if max_range is not None else None,
        currency=job["compensation"]["currencyCode"],
    )
@staticmethod
def _is_job_remote(job: dict, description: str) -> bool:
    """
    Searches the description, location, and attributes to check if job is remote
    """
    remote_keywords = ["remote", "work from home", "wfh"]
    is_remote_in_attributes = any(
        any(keyword in attr["label"].lower() for keyword in remote_keywords)
        for attr in job["attributes"]
    )
    is_remote_in_description = any(
        keyword in description.lower() for keyword in remote_keywords
    )
    is_remote_in_location = any(
        keyword in job["location"]["formatted"]["long"].lower()
        for keyword in remote_keywords
    )
    return (
        is_remote_in_attributes or is_remote_in_description or is_remote_in_location
    )
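# A quick illustration of the attribute check above on a hypothetical payload:
remote_keywords = ["remote", "work from home", "wfh"]
attributes = [{"label": "Remote"}, {"label": "Full-time"}]
assert any(
    any(keyword in attr["label"].lower() for keyword in remote_keywords)
    for attr in attributes
)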
@staticmethod
def _get_compensation_interval(interval: str) -> CompensationInterval:
    interval_mapping = {
        "DAY": "DAILY",
        "YEAR": "YEARLY",
        "HOUR": "HOURLY",
        "WEEK": "WEEKLY",
        "MONTH": "MONTHLY",
    }
    mapped_interval = interval_mapping.get(interval.upper(), None)
    if mapped_interval and mapped_interval in CompensationInterval.__members__:
        return CompensationInterval[mapped_interval]
    else:
        raise ValueError(f"Unsupported interval: {interval}")
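# Usage sketch for the mapping above (assumes CompensationInterval defines these
# members, as the __members__ lookup implies):
interval_mapping = {"DAY": "DAILY", "YEAR": "YEARLY", "HOUR": "HOURLY", "WEEK": "WEEKLY", "MONTH": "MONTHLY"}
assert interval_mapping.get("year".upper()) == "YEARLY"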
api_headers = {
"Host": "apis.indeed.com",
"content-type": "application/json",
"indeed-api-key": "161092c2017b5bbab13edb12461a62d5a833871e7cad6d9d475304573de67ac8",
"accept": "application/json",
"indeed-locale": "en-US",
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Indeed App 193.1",
"indeed-app-info": "appv=193.1; appid=com.indeed.jobsearch; osv=16.6.1; os=ios; dtype=phone",
}
job_search_query = """
query GetJobData {{
jobSearch(
{what}
{location}
includeSponsoredResults: NONE
limit: 100
sort: DATE
{cursor}
{filters}
) {{
pageInfo {{
nextCursor
}}
results {{
trackingKey
job {{
key
title
datePublished
dateOnIndeed
description {{
html
}}
location {{
countryName
countryCode
admin1Code
city
postalCode
streetAddress
formatted {{
short
long
}}
}}
compensation {{
baseSalary {{
unitOfWork
range {{
... on Range {{
min
max
}}
}}
}}
currencyCode
}}
attributes {{
key
label
}}
employer {{
relativeCompanyPageUrl
name
dossier {{
employerDetails {{
addresses
industry
employeesLocalizedLabel
revenueLocalizedLabel
briefDescription
ceoName
ceoPhotoUrl
}}
images {{
headerImageUrl
squareLogoUrl
}}
links {{
corporateWebsite
}}
}}
}}
recruit {{
viewJobUrl
detailedSalary
workSchedule
}}
}}
}}
}}
}}
"""


@@ -4,48 +4,62 @@ jobspy.scrapers.linkedin
This module contains routines to scrape LinkedIn.
"""
from __future__ import annotations
import time
import random
import regex as re
from typing import Optional
from datetime import datetime
import requests
from bs4.element import Tag
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urlunparse, unquote
from .. import Scraper, ScraperInput, Site
from ..exceptions import LinkedInException
from ..utils import create_session, remove_attributes
from ...jobs import (
    JobPost,
    Location,
    JobResponse,
    JobType,
    Country,
    Compensation,
    DescriptionFormat,
)
from ..utils import (
    logger,
    extract_emails_from_text,
    get_enum_from_job_type,
    currency_parser,
    markdown_converter,
)
class LinkedInScraper(Scraper):
    base_url = "https://www.linkedin.com"
    delay = 3
    band_delay = 4
    jobs_per_page = 25
    def __init__(self, proxies: list[str] | str | None = None):
        """
        Initializes LinkedInScraper with the LinkedIn job search url
        """
        super().__init__(Site.LINKEDIN, proxies=proxies)
        self.session = create_session(
            proxies=self.proxies,
            is_tls=False,
            has_retry=True,
            delay=5,
            clear_cookies=True,
        )
        self.session.headers.update(self.headers)
        self.scraper_input = None
        self.country = "worldwide"
        self.job_url_direct_regex = re.compile(r'(?<=\?url=)[^"]+')
def scrape(self, scraper_input: ScraperInput) -> JobResponse:
"""
@@ -53,67 +67,65 @@ class LinkedInScraper(Scraper):
:param scraper_input:
:return: job_response
"""
self.scraper_input = scraper_input
job_list: list[JobPost] = []
seen_urls = set()
page = scraper_input.offset // 10 * 10 if scraper_input.offset else 0
request_count = 0
seconds_old = (
    scraper_input.hours_old * 3600 if scraper_input.hours_old else None
)
continue_search = (
    lambda: len(job_list) < scraper_input.results_wanted and page < 1000
)
while continue_search():
    request_count += 1
    logger.info(f"LinkedIn search page: {request_count}")
params = {
    "keywords": scraper_input.search_term,
    "location": scraper_input.location,
    "distance": scraper_input.distance,
    "f_WT": 2 if scraper_input.is_remote else None,
    "f_JT": (
        self.job_type_code(scraper_input.job_type)
        if scraper_input.job_type
        else None
    ),
    "pageNum": 0,
    "start": page,
    "f_AL": "true" if scraper_input.easy_apply else None,
    "f_C": (
        ",".join(map(str, scraper_input.linkedin_company_ids))
        if scraper_input.linkedin_company_ids
        else None
    ),
}
if seconds_old is not None:
    params["f_TPR"] = f"r{seconds_old}"
params = {k: v for k, v in params.items() if v is not None}
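# Hypothetical example: a remote, full-time search posted within 24 hours
# (job_type_code(JobType.FULL_TIME) == "F"; 24 * 3600 == 86400) yields:
# {"keywords": "python developer", "location": "Texas", "f_WT": 2, "f_JT": "F",
#  "pageNum": 0, "start": 0, "f_TPR": "r86400"}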
try:
    response = self.session.get(
        f"{self.base_url}/jobs-guest/jobs/api/seeMoreJobPostings/search?",
        params=params,
        allow_redirects=True,
        timeout=10,
    )
    if response.status_code not in range(200, 400):
        if response.status_code == 429:
            err = (
                f"429 Response - Blocked by LinkedIn for too many requests"
            )
        else:
            err = f"LinkedIn response status code {response.status_code}"
            err += f" - {response.text}"
        logger.error(err)
        return JobResponse(jobs=job_list)
except Exception as e:
    if "Proxy responded with" in str(e):
        logger.error(f"LinkedIn: Bad proxy")
    else:
        logger.error(f"LinkedIn: {str(e)}")
    return JobResponse(jobs=job_list)
soup = BeautifulSoup(response.text, "html.parser")
job_cards = soup.find_all("div", class_="base-search-card")
@@ -126,30 +138,32 @@ class LinkedInScraper(Scraper):
if href_tag and "href" in href_tag.attrs:
href = href_tag.attrs["href"].split("?")[0]
job_id = href.split("-")[-1]
job_url = f"{self.url}/jobs/view/{job_id}"
job_url = f"{self.base_url}/jobs/view/{job_id}"
with url_lock:
if job_url in seen_urls:
continue
seen_urls.add(job_url)
# Call process_job directly without threading
if job_url in seen_urls:
continue
seen_urls.add(job_url)
try:
job_post = self.process_job(job_card, job_url, scraper_input.full_description)
fetch_desc = scraper_input.linkedin_fetch_description
job_post = self._process_job(job_card, job_url, fetch_desc)
if job_post:
job_list.append(job_post)
if not continue_search():
break
except Exception as e:
raise LinkedInException("Exception occurred while processing jobs")
raise LinkedInException(str(e))
if continue_search():
time.sleep(random.uniform(LinkedInScraper.DELAY, LinkedInScraper.DELAY + 2))
page += 25
time.sleep(random.uniform(self.delay, self.delay + self.band_delay))
page += len(job_list)
job_list = job_list[: scraper_input.results_wanted]
return JobResponse(jobs=job_list)
def _process_job(
    self, job_card: Tag, job_url: str, full_descr: bool
) -> Optional[JobPost]:
    salary_tag = job_card.find("span", class_="job-search-card__salary-info")
    compensation = None
    if salary_tag:
@@ -178,26 +192,26 @@ class LinkedInScraper(Scraper):
company = company_a_tag.get_text(strip=True) if company_a_tag else "N/A"
metadata_card = job_card.find("div", class_="base-search-card__metadata")
location = self._get_location(metadata_card)
datetime_tag = (
    metadata_card.find("time", class_="job-search-card__listdate")
    if metadata_card
    else None
)
date_posted = None
if datetime_tag and "datetime" in datetime_tag.attrs:
    datetime_str = datetime_tag["datetime"]
    try:
        date_posted = datetime.strptime(datetime_str, "%Y-%m-%d")
    except:
        date_posted = None
job_details = {}
if full_descr:
    job_details = self._get_job_details(job_url)
return JobPost(
    id=self._get_id(job_url),
    title=title,
    company_name=company,
    company_url=company_url,
@@ -205,31 +219,37 @@ class LinkedInScraper(Scraper):
    date_posted=date_posted,
    job_url=job_url,
    compensation=compensation,
    job_type=job_details.get("job_type"),
    description=job_details.get("description"),
    job_url_direct=job_details.get("job_url_direct"),
    emails=extract_emails_from_text(job_details.get("description")),
    logo_photo_url=job_details.get("logo_photo_url"),
    job_function=job_details.get("job_function"),
)
def _get_id(self, url: str):
    """
    Extracts the job id from the job url
    :param url:
    :return: str
    """
    if not url:
        return None
    return url.split("/")[-1]
def _get_job_details(self, job_page_url: str) -> dict:
    """
    Retrieves job description and other job details by going to the job page url
    :param job_page_url:
    :return: dict
    """
    try:
        response = self.session.get(job_page_url, timeout=5)
        response.raise_for_status()
    except:
        return {}
    if "linkedin.com/signup" in response.url:
        return {}
soup = BeautifulSoup(response.text, "html.parser")
div_content = soup.find(
@@ -237,44 +257,33 @@ class LinkedInScraper(Scraper):
)
description = None
if div_content is not None:
    div_content = remove_attributes(div_content)
    description = div_content.prettify(formatter="html")
    if self.scraper_input.description_format == DescriptionFormat.MARKDOWN:
        description = markdown_converter(description)
h3_tag = soup.find(
    "h3", text=lambda text: text and "Job function" in text.strip()
)
job_function = None
if h3_tag:
    job_function_span = h3_tag.find_next(
        "span", class_="description__job-criteria-text"
    )
    if job_function_span:
        job_function = job_function_span.text.strip()
return {
    "description": description,
    "job_type": self._parse_job_type(soup),
    "job_url_direct": self._parse_job_url_direct(soup),
    "logo_photo_url": soup.find("img", {"class": "artdeco-entity-image"}).get(
        "data-delayed-url"
    ),
    "job_function": job_function,
}
def _get_location(self, metadata_card: Optional[Tag]) -> Location:
"""
Extracts the location data from the job metadata card.
:param metadata_card
@@ -296,28 +305,67 @@ class LinkedInScraper(Scraper):
    )
elif len(parts) == 3:
    city, state, country = parts
    country = Country.from_string(country)
    location = Location(city=city, state=state, country=country)
return location
@staticmethod
def _parse_job_type(soup_job_type: BeautifulSoup) -> list[JobType] | None:
"""
Gets the job type from job page
:param soup_job_type:
:return: JobType
"""
h3_tag = soup_job_type.find(
"h3",
class_="description__job-criteria-subheader",
string=lambda text: "Employment type" in text,
)
employment_type = None
if h3_tag:
employment_type_span = h3_tag.find_next_sibling(
"span",
class_="description__job-criteria-text description__job-criteria-text--criteria",
)
if employment_type_span:
employment_type = employment_type_span.get_text(strip=True)
employment_type = employment_type.lower()
employment_type = employment_type.replace("-", "")
return [get_enum_from_job_type(employment_type)] if employment_type else []
def _parse_job_url_direct(self, soup: BeautifulSoup) -> str | None:
"""
Gets the job url direct from job page
:param soup:
:return: str
"""
job_url_direct = None
job_url_direct_content = soup.find("code", id="applyUrl")
if job_url_direct_content:
job_url_direct_match = self.job_url_direct_regex.search(
job_url_direct_content.decode_contents().strip()
)
if job_url_direct_match:
job_url_direct = unquote(job_url_direct_match.group())
return job_url_direct
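# A self-contained sketch of the applyUrl extraction above (payload hypothetical):
import re
from urllib.parse import unquote
pattern = re.compile(r'(?<=\?url=)[^"]+')
content = 'external?url=https%3A%2F%2Fexample.com%2Fapply"'
assert unquote(pattern.search(content).group()) == "https://example.com/apply"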
@staticmethod
def job_type_code(job_type_enum: JobType) -> str:
    return {
        JobType.FULL_TIME: "F",
        JobType.PART_TIME: "P",
        JobType.INTERNSHIP: "I",
        JobType.CONTRACT: "C",
        JobType.TEMPORARY: "T",
    }.get(job_type_enum, "")
headers = {
"authority": "www.linkedin.com",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en-US,en;q=0.9",
"cache-control": "max-age=0",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}


@@ -1,35 +1,149 @@
from __future__ import annotations
import re
import logging
from itertools import cycle
import requests
import tls_client
import numpy as np
from markdownify import markdownify as md
from requests.adapters import HTTPAdapter, Retry
from ..jobs import JobType
logger = logging.getLogger("JobSpy")
logger.propagate = False
if not logger.handlers:
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler()
    format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    formatter = logging.Formatter(format)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
class RotatingProxySession:
    def __init__(self, proxies=None):
        if isinstance(proxies, str):
            self.proxy_cycle = cycle([self.format_proxy(proxies)])
        elif isinstance(proxies, list):
            self.proxy_cycle = (
                cycle([self.format_proxy(proxy) for proxy in proxies])
                if proxies
                else None
            )
        else:
            self.proxy_cycle = None
    @staticmethod
    def format_proxy(proxy):
        """Utility method to format a proxy string into a dictionary."""
        if proxy.startswith("http://") or proxy.startswith("https://"):
            return {"http": proxy, "https": proxy}
        return {"http": f"http://{proxy}", "https": f"http://{proxy}"}
class RequestsRotating(RotatingProxySession, requests.Session):
def __init__(self, proxies=None, has_retry=False, delay=1, clear_cookies=False):
RotatingProxySession.__init__(self, proxies=proxies)
requests.Session.__init__(self)
self.clear_cookies = clear_cookies
self.allow_redirects = True
self.setup_session(has_retry, delay)
def setup_session(self, has_retry, delay):
if has_retry:
retries = Retry(
total=3,
connect=3,
status=3,
status_forcelist=[500, 502, 503, 504, 429],
backoff_factor=delay,
)
adapter = HTTPAdapter(max_retries=retries)
self.mount("http://", adapter)
self.mount("https://", adapter)
def request(self, method, url, **kwargs):
if self.clear_cookies:
self.cookies.clear()
if self.proxy_cycle:
next_proxy = next(self.proxy_cycle)
if next_proxy["http"] != "http://localhost":
self.proxies = next_proxy
else:
self.proxies = {}
return requests.Session.request(self, method, url, **kwargs)
class TLSRotating(RotatingProxySession, tls_client.Session):
def __init__(self, proxies=None):
RotatingProxySession.__init__(self, proxies=proxies)
tls_client.Session.__init__(self, random_tls_extension_order=True)
def execute_request(self, *args, **kwargs):
if self.proxy_cycle:
next_proxy = next(self.proxy_cycle)
if next_proxy["http"] != "http://localhost":
self.proxies = next_proxy
else:
self.proxies = {}
response = tls_client.Session.execute_request(self, *args, **kwargs)
response.ok = response.status_code in range(200, 400)
return response
def create_session(
*,
proxies: dict | str | None = None,
is_tls: bool = True,
has_retry: bool = False,
delay: int = 1,
clear_cookies: bool = False,
) -> requests.Session:
"""
Creates a requests session with optional tls, proxy, and retry settings.
:return: A session object
"""
if is_tls:
session = TLSRotating(proxies=proxies)
else:
session = RequestsRotating(
proxies=proxies,
has_retry=has_retry,
delay=delay,
clear_cookies=clear_cookies,
)
return session
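# A short usage sketch (proxy addresses hypothetical): each request rotates to
# the next proxy in the cycle.
session = create_session(
    proxies=["user:pass@1.2.3.4:8080", "5.6.7.8:3128"],
    is_tls=False,
    has_retry=True,
    delay=2,
)
response = session.get("https://www.indeed.com")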
def set_logger_level(verbose: int = 2):
"""
Adjusts the logger's level. This function allows the logging level to be changed at runtime.
Parameters:
- verbose: int {0, 1, 2} (default=2, all logs)
"""
if verbose is None:
return
level_name = {2: "INFO", 1: "WARNING", 0: "ERROR"}.get(verbose, "INFO")
level = getattr(logging, level_name.upper(), None)
if level is not None:
logger.setLevel(level)
else:
raise ValueError(f"Invalid log level: {level_name}")
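# Usage sketch: silence INFO chatter while keeping warnings and errors.
set_logger_level(1)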
def markdown_converter(description_html: str):
if description_html is None:
return None
markdown = md(description_html)
return markdown.strip()
def extract_emails_from_text(text: str) -> list[str] | None:
@@ -39,37 +153,6 @@ def extract_emails_from_text(text: str) -> list[str] | None:
return email_regex.findall(text)
def get_enum_from_job_type(job_type_str: str) -> JobType | None:
"""
Given a string, returns the corresponding JobType enum member if a match is found.
@@ -84,17 +167,21 @@ def get_enum_from_job_type(job_type_str: str) -> JobType | None:
def currency_parser(cur_str):
    # Remove any non-numerical characters
    # except for ',' '.' or '-' (e.g. EUR)
    cur_str = re.sub("[^-0-9.,]", "", cur_str)
    # Remove any 000s separators (either , or .)
    cur_str = re.sub("[.,]", "", cur_str[:-3]) + cur_str[-3:]
    if "." in list(cur_str[-3:]):
        num = float(cur_str)
    elif "," in list(cur_str[-3:]):
        num = float(cur_str.replace(",", "."))
    else:
        num = float(cur_str)
    return np.round(num, 2)
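# Worked examples (inputs hypothetical): separators before the last three
# characters are stripped, then a trailing comma is treated as the decimal mark.
assert currency_parser("1.000,50 EUR") == 1000.5
assert currency_parser("$2,500") == 2500.0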
def remove_attributes(tag):
for attr in list(tag.attrs):
del tag[attr]
return tag


@@ -4,35 +4,86 @@ jobspy.scrapers.ziprecruiter
This module contains routines to scrape ZipRecruiter.
"""
from __future__ import annotations
import json
import math
import re
import time
from datetime import datetime
from typing import Optional, Tuple, Any
from concurrent.futures import ThreadPoolExecutor
from bs4 import BeautifulSoup
from .. import Scraper, ScraperInput, Site
from ..exceptions import ZipRecruiterException
from ..utils import (
    logger,
    extract_emails_from_text,
    create_session,
    markdown_converter,
    remove_attributes,
)
from ...jobs import (
    JobPost,
    Compensation,
    Location,
    JobResponse,
    JobType,
    Country,
    DescriptionFormat,
)
class ZipRecruiterScraper(Scraper):
    base_url = "https://www.ziprecruiter.com"
    api_url = "https://api.ziprecruiter.com"
    def __init__(self, proxies: list[str] | str | None = None):
        """
        Initializes ZipRecruiterScraper with the ZipRecruiter job search url
        """
        super().__init__(Site.ZIP_RECRUITER, proxies=proxies)
        self.scraper_input = None
        self.session = create_session(proxies=proxies)
        self._get_cookies()
        self.delay = 5
        self.jobs_per_page = 20
        self.seen_urls = set()
def scrape(self, scraper_input: ScraperInput) -> JobResponse:
"""
Scrapes ZipRecruiter for jobs with scraper_input criteria.
:param scraper_input: Information about job search criteria.
:return: JobResponse containing a list of jobs.
"""
self.scraper_input = scraper_input
job_list: list[JobPost] = []
continue_token = None
max_pages = math.ceil(scraper_input.results_wanted / self.jobs_per_page)
for page in range(1, max_pages + 1):
if len(job_list) >= scraper_input.results_wanted:
break
if page > 1:
time.sleep(self.delay)
logger.info(f"ZipRecruiter search page: {page}")
jobs_on_page, continue_token = self._find_jobs_in_page(
scraper_input, continue_token
)
if jobs_on_page:
job_list.extend(jobs_on_page)
else:
break
if not continue_token:
break
return JobResponse(jobs=job_list[: scraper_input.results_wanted])
def _find_jobs_in_page(
self, scraper_input: ScraperInput, continue_token: str | None = None
) -> Tuple[list[JobPost], Optional[str]]:
"""
@@ -41,73 +92,54 @@ class ZipRecruiterScraper(Scraper):
:param continue_token:
:return: jobs found on page
"""
jobs_list = []
params = self._add_params(scraper_input)
if continue_token:
    params["continue_from"] = continue_token
try:
    res = self.session.get(
        f"{self.api_url}/jobs-app/jobs", headers=self.headers, params=params
    )
    if res.status_code not in range(200, 400):
        if res.status_code == 429:
            err = "429 Response - Blocked by ZipRecruiter for too many requests"
        else:
            err = f"ZipRecruiter response status code {res.status_code}"
            err += f" with response: {res.text}"  # ZipRecruiter likely not available in EU
        logger.error(err)
        return jobs_list, ""
except Exception as e:
    if "Proxy responded with" in str(e):
        logger.error("ZipRecruiter: Bad proxy")
    else:
        logger.error(f"ZipRecruiter: {str(e)}")
    return jobs_list, ""
res_data = res.json()
jobs_list = res_data.get("jobs", [])
next_continue_token = res_data.get("continue", None)
with ThreadPoolExecutor(max_workers=self.jobs_per_page) as executor:
    job_results = [executor.submit(self._process_job, job) for job in jobs_list]
job_list = list(filter(None, (result.result() for result in job_results)))
return job_list, next_continue_token
def _process_job(self, job: dict) -> JobPost | None:
    """
    Processes an individual job dict from the response
    """
title = job.get("name")
job_url = f"https://www.ziprecruiter.com/jobs//j?lvk={job['listing_key']}"
job_url = f"{self.base_url}/jobs//j?lvk={job['listing_key']}"
if job_url in self.seen_urls:
return
self.seen_urls.add(job_url)
description = job.get("job_description", "").strip()
description = (
markdown_converter(description)
if self.scraper_input.description_format == DescriptionFormat.MARKDOWN
else description
)
company = job.get("hiring_company", {}).get("name")
country_value = "usa" if job.get("job_country") == "US" else "canada"
country_enum = Country.from_string(country_value)
@@ -115,86 +147,106 @@ class ZipRecruiterScraper(Scraper):
location = Location(
    city=job.get("job_city"), state=job.get("job_state"), country=country_enum
)
job_type = self._get_job_type_enum(
    job.get("employment_type", "").replace("_", "").lower()
)
date_posted = datetime.fromisoformat(job["posted_time"].rstrip("Z")).date()
comp_interval = job.get("compensation_interval")
comp_interval = "yearly" if comp_interval == "annual" else comp_interval
comp_min = int(job["compensation_min"]) if "compensation_min" in job else None
comp_max = int(job["compensation_max"]) if "compensation_max" in job else None
comp_currency = job.get("compensation_currency")
description_full, job_url_direct = self._get_descr(job_url)
return JobPost(
    id=str(job["listing_key"]),
    title=title,
    company_name=company,
    location=location,
    job_type=job_type,
    compensation=Compensation(
        interval=comp_interval,
        min_amount=comp_min,
        max_amount=comp_max,
        currency=comp_currency,
    ),
    date_posted=date_posted,
    job_url=job_url,
    description=description_full if description_full else description,
    emails=extract_emails_from_text(description) if description else None,
    job_url_direct=job_url_direct,
)
def _get_descr(self, job_url):
res = self.session.get(job_url, headers=self.headers, allow_redirects=True)
description_full = job_url_direct = None
if res.ok:
soup = BeautifulSoup(res.text, "html.parser")
job_descr_div = soup.find("div", class_="job_description")
company_descr_section = soup.find("section", class_="company_description")
job_description_clean = (
remove_attributes(job_descr_div).prettify(formatter="html")
if job_descr_div
else ""
)
company_description_clean = (
remove_attributes(company_descr_section).prettify(formatter="html")
if company_descr_section
else ""
)
description_full = job_description_clean + company_description_clean
script_tag = soup.find("script", type="application/json")
if script_tag:
job_json = json.loads(script_tag.string)
job_url_val = job_json["model"]["saveJobURL"]
m = re.search(r"job_url=(.+)", job_url_val)
if m:
job_url_direct = m.group(1)
if self.scraper_input.description_format == DescriptionFormat.MARKDOWN:
description_full = markdown_converter(description_full)
return description_full, job_url_direct
def _get_cookies(self):
data = "event_type=session&logged_in=false&number_of_retry=1&property=model%3AiPhone&property=os%3AiOS&property=locale%3Aen_us&property=app_build_number%3A4734&property=app_version%3A91.0&property=manufacturer%3AApple&property=timestamp%3A2024-01-12T12%3A04%3A42-06%3A00&property=screen_height%3A852&property=os_version%3A16.6.1&property=source%3Ainstall&property=screen_width%3A393&property=device_model%3AiPhone%2014%20Pro&property=brand%3AApple"
url = f"{self.api_url}/jobs-app/event"
self.session.post(url, data=data, headers=self.headers)
@staticmethod
def _get_job_type_enum(job_type_str: str) -> list[JobType] | None:
for job_type in JobType:
if job_type_str in job_type.value:
return [job_type]
return None
@staticmethod
def _add_params(scraper_input) -> dict[str, str | Any]:
    params = {
        "search": scraper_input.search_term,
        "location": scraper_input.location,
    }
    if scraper_input.hours_old:
        params["days"] = max(scraper_input.hours_old // 24, 1)
    job_type_map = {JobType.FULL_TIME: "full_time", JobType.PART_TIME: "part_time"}
    if scraper_input.job_type:
        job_type = scraper_input.job_type
        params["employment_type"] = job_type_map.get(job_type, job_type.value[0])
    if scraper_input.easy_apply:
        params["zipapply"] = 1
    if scraper_input.is_remote:
        params["remote"] = 1
    if scraper_input.distance:
        params["radius"] = scraper_input.distance
    params = {k: v for k, v in params.items() if v is not None}
    return params
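# Hypothetical example: a remote, part-time search posted within 48 hours and
# within 25 miles produces:
# {"search": "nurse", "location": "Miami, FL", "days": 2,
#  "employment_type": "part_time", "remote": 1, "radius": 25}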
headers = {
    "Host": "api.ziprecruiter.com",
    "accept": "*/*",
    "x-zr-zva-override": "100000000;vid:ZT1huzm_EQlDTVEc",
    "x-pushnotificationid": "0ff4983d38d7fc5b3370297f2bcffcf4b3321c418f5c22dd152a0264707602a0",
    "x-deviceid": "D77B3A92-E589-46A4-8A39-6EF6F1D86006",
    "user-agent": "Job Search/87.0 (iPhone; CPU iOS 16_6_1 like Mac OS X)",
    "authorization": "Basic YTBlZjMyZDYtN2I0Yy00MWVkLWEyODMtYTI1NDAzMzI0YTcyOg==",
    "accept-language": "en-US,en;q=0.9",
}