Compare commits


33 Commits

Author SHA1 Message Date
zacharyhampton
8a6ac96db4 Refactor scraper to use direct requests and bump to 0.8.18
- Replace session-based approach with direct requests calls
- Move headers to module-level DEFAULT_HEADERS constant
- Temporarily disable extra_property_data feature

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 00:29:53 -07:00
zacharyhampton
129ab37dff Version bump to 0.8.17
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 19:11:10 -07:00
zacharyhampton
9a0cac650e Version bump to 0.8.16 2025-12-21 16:22:03 -07:00
zacharyhampton
a1c1bcc822 Version bump to 0.8.15
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 16:03:57 -07:00
zacharyhampton
6f3faceb27 Version bump to 0.8.14
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 14:32:59 -07:00
zacharyhampton
cab0216f29 Version bump to 0.8.13
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 12:30:46 -07:00
zacharyhampton
8ee720ce5c Version bump to 0.8.12
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 15:30:26 -07:00
zacharyhampton
8eb138ee1a Version bump to 0.8.11
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 22:42:01 -07:00
Zachary Hampton
ef6db606fd Version bump to 0.8.10
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 18:32:33 -08:00
zacharyhampton
9406c92a66 Version bump to 0.8.9
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 17:55:33 -08:00
zacharyhampton
fefacdd264 Version bump to 0.8.8
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 17:32:06 -08:00
Zachary Hampton
3579c10196 Merge pull request #147 from ZacharyHampton/feature/ios-mobile-headers
Improve API stability and reliability
2025-12-05 19:30:25 -08:00
Zachary Hampton
f5784e0191 Update to iOS mobile app headers for improved API stability
- Replace browser-based headers with iOS mobile app headers
- Update GraphQL query names to match iOS app conventions (1:1 alignment)
- Add _graphql_post() wrapper to centralize GraphQL calls with dynamic operation names
- Simplify session management by removing unnecessary thread-local complexity
- Add test_parallel_search_consistency test to verify concurrent request stability
- Bump version from 0.8.6b to 0.8.7

Changes fix API flakiness under concurrent load - parallel consistency test now passes 100% (5/5 runs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 19:27:47 -08:00
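A `_graphql_post()` wrapper with dynamic operation names, as described above, might centralize payload construction along these lines. The operation name and query text below are illustrative, not the actual iOS app values.

```python
def build_graphql_payload(operation_name: str, query: str, variables: dict) -> dict:
    """Centralize GraphQL payload construction so each call site only
    supplies its operation name, query text, and variables."""
    return {
        "operationName": operation_name,
        "query": query,
        "variables": variables,
    }

# Hypothetical call site: one wrapper, operation name passed per call
payload = build_graphql_payload(
    "ConsumerSearchQuery",  # illustrative operation name
    "query ConsumerSearchQuery($q: String!) { search(q: $q) { total } }",
    {"q": "Phoenix, AZ"},
)
```

Routing every query through one builder is what makes aligning operation names 1:1 with the mobile app a single-point change.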
Zachary Hampton
57093f5d17 Merge pull request #145 from ZacharyHampton/fix/realtor-403-error
Fix 403 error from Realtor.com API changes
2025-12-04 23:10:32 -08:00
zacharyhampton
406ff97260 - version bump 2025-12-04 23:08:37 -08:00
zacharyhampton
a8c9d0fd66 Replace REST autocomplete with GraphQL Search_suggestions query
- Replace /suggest REST endpoint with GraphQL Search_suggestions query
- Use search_location field instead of individual city/county/state/postal_code fields
- Fix coordinate order to [lon, lat] (GeoJSON standard) for radius searches
- Extract mpr_id from addr: prefix for single address lookups

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 21:08:01 -08:00
Zachary Hampton
0b283e18bd Fix 403 error from Realtor.com API changes
- Update GraphQL endpoint to api.frontdoor.realtor.com
- Update HTTP headers with newer Chrome version and correct client name/version
- Improve error handling in handle_home method
- Fix response validation for missing/null data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 18:56:10 -08:00
Zachary Hampton
8bf1f9e24b Add regression test for listing_type=None including sold listings
Adds test_listing_type_none_includes_sold() to verify that when listing_type=None, sold listings are included in the results. This prevents regression of issue #142.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 13:38:48 -08:00
Zachary Hampton
79b2b648f5 Fix sold listings not included when listing_type=None (issue #142)
When listing_type=None, sold listings were excluded despite documentation stating all types should be returned. This fix includes two changes:

1. Explicitly include common listing types (for_sale, for_rent, sold, pending, off_market) when listing_type=None instead of sending empty status parameter
2. Fix or_filters logic to only apply for PENDING when not mixed with other types like SOLD, preventing unintended filtering

Updated README documentation to accurately reflect that None returns common listing types rather than all 8 types.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 13:30:54 -08:00
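The two-part fix above can be sketched as follows; the helper names are hypothetical, but the logic mirrors the commit message: `None` expands to an explicit list of common statuses, and the pending-specific `or_filters` apply only when PENDING is the sole requested type.

```python
COMMON_LISTING_TYPES = ["for_sale", "for_rent", "sold", "pending", "off_market"]

def resolve_status_filter(listing_type):
    """Expand listing_type=None to the common statuses instead of
    sending an empty status parameter (which dropped sold listings)."""
    if listing_type is None:
        return list(COMMON_LISTING_TYPES)
    if isinstance(listing_type, str):
        return [listing_type]
    return list(listing_type)

def should_apply_pending_or_filters(statuses):
    """Apply the pending/contingent or_filters only when PENDING is the
    sole requested status, so mixed queries (e.g. with SOLD) aren't narrowed."""
    return statuses == ["pending"]
```
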
Zachary Hampton
c2f01df1ad Add configurable parallel/sequential pagination with parallel parameter
- Add `parallel: bool = True` parameter to control pagination strategy
- Parallel mode (default): Fetches all pages in parallel for maximum speed
- Sequential mode: Fetches pages one-by-one with early termination checks
- Early termination stops pagination when time-based filters indicate no more matches
- Useful for rate limiting and narrow time windows
- Simplified pagination logic by removing hybrid first-page pre-check
- Updated README with usage example and parameter documentation
- Version bump to 0.8.4
- All 54 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 10:36:47 -08:00
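The two pagination strategies described above can be sketched as one function; `fetch_page` and `keep_fetching` are stand-ins for the scraper's internals, not its real API.

```python
from concurrent.futures import ThreadPoolExecutor

def paginate(fetch_page, total_pages, parallel=True, keep_fetching=None):
    """Parallel mode fetches every page at once for speed; sequential
    mode fetches one page at a time and stops early when the
    keep_fetching check says no later page can match."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            pages = list(pool.map(fetch_page, range(total_pages)))  # preserves page order
        return [row for page in pages for row in page]

    results = []
    for page_num in range(total_pages):
        page = fetch_page(page_num)
        results.extend(page)
        if keep_fetching is not None and not keep_fetching(page):
            break  # time-based filter says no more matches can follow
    return results
```

Sequential mode trades latency for fewer requests, which is the useful property under rate limiting or narrow time windows.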
Zachary Hampton
9b61a89c77 Fix timezone handling for all date parameters
- Treat naive datetimes as local time and convert to UTC automatically
- Support both naive and timezone-aware datetimes for updated_since, date_from, date_to
- Fix timezone comparison bug that caused incorrect filtering with naive datetimes
- Update documentation with clear timezone handling examples
- Add comprehensive timezone tests for naive and aware datetimes
- Bump version to 0.8.3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 17:40:21 -08:00
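The timezone rule above (naive means local time, everything normalized to UTC) is a one-liner with the standard library, since `astimezone()` on a naive datetime assumes the system local zone:

```python
from datetime import datetime, timezone

def to_utc(dt: datetime) -> datetime:
    """Naive datetimes are treated as local time; aware datetimes are
    converted. Either way the result is timezone-aware UTC."""
    return dt.astimezone(timezone.utc)
```

Normalizing both the filter bounds and the property dates this way is what makes comparisons between naive and aware inputs well-defined.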
Zachary Hampton
7065f8a0d4 Optimize time-based filtering with auto-sort and early termination
## Performance Optimizations

### Auto-Apply Optimal Sort
- Auto-apply `sort_by="last_update_date"` when using `updated_since` or `updated_in_past_hours`
- Auto-apply `sort_by="pending_date"` when using PENDING listings with date filters
- Ensures API returns properties in chronological order for efficient filtering
- Users can still override by specifying different `sort_by`

### Early Termination
- Pre-check page 1 before launching parallel pagination
- If last property is outside time window, stop pagination immediately
- Avoids 95%+ of unnecessary API calls for narrow time windows
- Only applies when conditions guarantee correctness (date sort + time filter)

## Impact
- 10x faster for narrow time windows (2-3 seconds vs 30+ seconds)
- Fixes inefficiency where 10,000 properties fetched to return 10 matches
- Maintains backward compatibility - falls back when optimization unavailable

## Changes
- homeharvest/__init__.py: Auto-sort logic for time filters
- homeharvest/core/scrapers/realtor/__init__.py: `_should_fetch_more_pages()` method + early termination in pagination
- tests/test_realtor.py: Tests for optimization behavior
- README.md: Updated parameters documentation with all 8 listing types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:52:49 -08:00
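The early-termination check above relies on the auto-applied date sort: with results ordered newest-first, once the last property on a page falls outside the time window, no later page can contain a match. A minimal sketch (field name and page shape are illustrative):

```python
from datetime import datetime

def should_fetch_more_pages(page, cutoff, date_key="last_update_date"):
    """Assumes results are sorted newest-first on date_key. If the last
    (oldest) property on this page is already before the cutoff, every
    property on later pages is too, so pagination can stop."""
    if not page:
        return False
    return page[-1][date_key] >= cutoff
```

This is only correct when a date sort is in effect, which is why the commit gates the optimization on "date sort + time filter".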
Zachary Hampton
d88f781b47 - readme 2025-11-11 15:34:28 -08:00
Zachary Hampton
282064d8be - readme 2025-11-11 15:21:08 -08:00
Zachary Hampton
3a5066466b Merge pull request #141 from ZacharyHampton/feature/flexible-listing-type-and-last-update-date
Add flexible listing_type support and last_update_date field
2025-11-11 15:33:27 -07:00
Zachary Hampton
a8926915b6 - readme 2025-11-11 14:33:06 -08:00
Zachary Hampton
f0c332128e Fix test failures after date parameter consolidation
- Fix validate_dates() to allow date_from or date_to individually
- Update test_datetime_filtering to use date_from/date_to instead of datetime_from/datetime_to
- Fix test_return_type zip code (66642 -> 85281) to ensure rental availability
- Rewrite test_realtor_without_extra_details assertions to check specific fields
- Add empty DataFrame check in test_last_status_change_date_field

All 48 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 12:52:15 -08:00
Zachary Hampton
2326d8cee9 - delete cli & version bump 2025-11-11 12:20:29 -08:00
Zachary Hampton
c7a0d6d398 Consolidate date_from/date_to parameters - remove datetime_from/datetime_to
Simplified the time filtering interface by consolidating datetime_from/datetime_to
into date_from/date_to with automatic precision detection.

Changes:
- Remove datetime_from and datetime_to parameters (confusing to have both)
- Update date_from/date_to to accept multiple formats:
  - Date strings: "2025-01-20" (day precision)
  - Datetime strings: "2025-01-20T14:30:00" (hour precision)
  - date objects: date(2025, 1, 20) (day precision)
  - datetime objects: datetime(2025, 1, 20, 9, 0) (hour precision)
- Add detect_precision_and_convert() helper to automatically detect precision
- Add date_from_precision and date_to_precision fields to track precision level
- Update filtering logic to use precision fields instead of separate parameters
- Update README to remove datetime_from/datetime_to examples
- Update validation to accept ISO datetime strings

Benefits:
- Single, intuitive parameter name (date_from/date_to)
- Automatic precision detection based on input format
- Reduced API surface area and cognitive load
- More Pythonic - accept multiple input types

All changes are backward compatible for existing date_from/date_to string usage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 12:19:15 -08:00
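The automatic precision detection described above can be reimplemented as a small sketch; this is an illustration of the behavior the commit message documents, not the library's actual helper.

```python
from datetime import date, datetime

def detect_precision_and_convert(value):
    """Return (datetime, precision) where precision is 'day' or 'hour',
    inferred from the input's type or string format."""
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        return value, "hour"
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day), "day"
    if "T" in value:  # e.g. "2025-01-20T14:30:00"
        return datetime.fromisoformat(value), "hour"
    return datetime.fromisoformat(value), "day"  # e.g. "2025-01-20"
```

Note the `isinstance` ordering: because `datetime` is a subclass of `date`, testing `date` first would misclassify every datetime as day precision.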
Zachary Hampton
940b663011 Update README with new features
- Add examples for multiple listing types
- Add examples for filtering by last_update_date
- Add examples for Pythonic datetime/timedelta usage
- Update basic usage example with new parameters
- Add sort_by last_update_date example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 12:02:35 -08:00
Zachary Hampton
a6fe0d2675 Add last_update_date filtering and improve time interface DX
Part A: Add last_update_date filtering (client-side)
- Add updated_since parameter (accepts datetime object or ISO string)
- Add updated_in_past_hours parameter (accepts int or timedelta)
- Implement _apply_last_update_date_filter() method for client-side filtering
- Add mutual exclusion validation for updated_* parameters

Part B: Improve time interface DX
- Accept datetime/timedelta objects for datetime_from, datetime_to
- Accept timedelta objects for past_hours, past_days
- Add type conversion helper functions in utils.py
- Improve validation error messages with specific examples
- Update validate_datetime to accept datetime objects

Helper functions added:
- convert_to_datetime_string() - Converts datetime objects to ISO strings
- extract_timedelta_hours() - Extracts hours from timedelta objects
- extract_timedelta_days() - Extracts days from timedelta objects
- validate_last_update_filters() - Validates last_update_date parameters

All changes are backward compatible - existing string/int parameters still work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 12:00:15 -08:00
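The type-conversion helpers listed above boil down to accepting either the legacy primitive or the richer object and normalizing. A sketch of two of them, assuming the behavior their names imply:

```python
from datetime import datetime, timedelta

def extract_timedelta_hours(value):
    """Accept an int (hours) or a timedelta and return hours as a number."""
    if isinstance(value, timedelta):
        return value.total_seconds() / 3600
    return value

def convert_to_datetime_string(value):
    """Accept a datetime object or an ISO string and return an ISO string."""
    if isinstance(value, datetime):
        return value.isoformat()
    return value
```

Because each helper passes primitives through unchanged, existing string/int call sites keep working, which is the backward-compatibility claim in the commit.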
Zachary Hampton
3a0e91b876 Add flexible listing_type support and last_update_date field
- Add support for str, list[str], and None as listing_type values
  - Single string: maintains backward compatibility (e.g., "for_sale")
  - List of strings: returns properties matching ANY status (OR logic)
  - None: returns all property types (omits status filter)

- Expand ListingType enum with all GraphQL HomeStatus values
  - Add OFF_MARKET, NEW_COMMUNITY, OTHER, READY_TO_BUILD

- Add last_update_date field support
  - Add to GraphQL query, Property model, and processors
  - Add to sort validation and datetime field sorting
  - Field description: "Last time the home was updated"

- Update GraphQL query construction to support status arrays
  - Single type: status: for_sale
  - Multiple types: status: [for_sale, sold]
  - None: omit status parameter entirely

- Update validation logic to handle new parameter types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 11:28:35 -08:00
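The three-way status construction above (single value, array, or omitted) can be sketched as a small renderer; the exact GraphQL fragment syntax here follows the examples in the commit message.

```python
def render_status_filter(listing_type):
    """Render the status portion of the GraphQL query:
    single type -> 'status: for_sale', list -> 'status: [for_sale, sold]',
    None -> omit the parameter entirely."""
    if listing_type is None:
        return ""
    if isinstance(listing_type, str):
        return f"status: {listing_type}"
    return "status: [" + ", ".join(listing_type) + "]"
```
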
Zachary Hampton
4e6e144617 Fix exclude_pending and mls_only filters not working with raw return type
When return_type="raw" was specified, the exclude_pending and mls_only
parameters were ignored because these filters only existed in
process_property(), which is bypassed for raw data returns.

Changes:
- Added _apply_raw_data_filters() method to handle client-side filtering
  for raw data
- Applied the filter in search() method after sorting but before returning
- Fixed exclude_pending to check flags.is_pending and flags.is_contingent
- Fixed mls_only to check source.id (not mls.id which doesn't exist in raw data)
- Added comprehensive tests for both filters with raw data

Fixes #140

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 11:21:28 -08:00
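Since `process_property()` is bypassed for raw returns, the client-side filter described above has to walk the raw payload directly. A sketch, using the field paths the commit names (`flags.is_pending` / `flags.is_contingent` and `source.id`); the raw dict shape is otherwise an assumption:

```python
def apply_raw_data_filters(results, exclude_pending=False, mls_only=False):
    """Client-side filtering for return_type='raw', where process_property()
    (and the filters it carries) is skipped."""
    filtered = []
    for prop in results:
        flags = prop.get("flags") or {}
        if exclude_pending and (flags.get("is_pending") or flags.get("is_contingent")):
            continue
        # mls_only checks source.id; raw data has no mls.id field
        if mls_only and not (prop.get("source") or {}).get("id"):
            continue
        filtered.append(prop)
    return filtered
```
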
11 changed files with 1804 additions and 467 deletions

README.md

@@ -7,9 +7,13 @@
 ## HomeHarvest Features
-- **Source**: Fetches properties directly from **Realtor.com**.
-- **Data Format**: Structures data to resemble MLS listings.
-- **Export Flexibility**: Options to save as either CSV or Excel.
+- **Source**: Fetches properties directly from **Realtor.com**
+- **Data Format**: Structures data to resemble MLS listings
+- **Export Options**: Save as CSV, Excel, or return as Pandas/Pydantic/Raw
+- **Flexible Filtering**: Filter by beds, baths, price, sqft, lot size, year built
+- **Time-Based Queries**: Search by hours, days, or specific date ranges
+- **Multiple Listing Types**: Query for_sale, for_rent, sold, pending, or all at once
+- **Sorting**: Sort results by price, date, size, or last update
 
 ![homeharvest](https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/b3d5d727-e67b-4a9f-85d8-1e65fd18620a)
@@ -26,134 +30,78 @@ pip install -U homeharvest
 ```py
 from homeharvest import scrape_property
-from datetime import datetime
-
-# Generate filename based on current timestamp
-current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-filename = f"HomeHarvest_{current_timestamp}.csv"
 
 properties = scrape_property(
     location="San Diego, CA",
-    listing_type="sold",  # or (for_sale, for_rent, pending)
-    past_days=30,  # sold in last 30 days - listed in last 30 days if (for_sale, for_rent)
-    # property_type=['single_family','multi_family'],
-    # date_from="2023-05-01", # alternative to past_days
-    # date_to="2023-05-28",
-    # foreclosure=True
-    # mls_only=True,  # only fetch MLS listings
+    listing_type="sold",  # for_sale, for_rent, pending
+    past_days=30
 )
-print(f"Number of properties: {len(properties)}")
 
-# Export to csv
-properties.to_csv(filename, index=False)
-print(properties.head())
+properties.to_csv("results.csv", index=False)
+print(f"Found {len(properties)} properties")
 ```
 ### Flexible Location Formats
 
 ```py
-# HomeHarvest supports any of these location formats:
-properties = scrape_property(location="92104")  # Just zip code
-properties = scrape_property(location="San Diego")  # Just city
-properties = scrape_property(location="San Diego, CA")  # City, state
-properties = scrape_property(location="San Diego, California")  # Full state name
-properties = scrape_property(location="1234 Main St, San Diego, CA 92104")  # Full address
-
-# You can also search for properties within a radius of a specific address
+# Accepts: zip code, city, "city, state", full address, etc.
 properties = scrape_property(
-    location="1234 Main St, San Diego, CA 92104",
-    radius=5.0  # 5 mile radius
+    location="San Diego, CA",  # or "92104", "San Diego", "1234 Main St, San Diego, CA 92104"
+    radius=5.0  # Optional: search within radius (miles) of address
 )
 ```
 ### Advanced Filtering Examples
 
-#### Hour-Based Filtering
+#### Time-Based Filtering
 ```py
-# Get properties listed in the last 24 hours
+from datetime import datetime, timedelta
+
+# Filter by hours or use datetime/timedelta objects
 properties = scrape_property(
     location="Austin, TX",
     listing_type="for_sale",
-    past_hours=24
-)
-
-# Get properties listed during specific hours (e.g., business hours)
-properties = scrape_property(
-    location="Dallas, TX",
-    listing_type="for_sale",
-    datetime_from="2025-01-20T09:00:00",
-    datetime_to="2025-01-20T17:00:00"
+    past_hours=24,  # or timedelta(hours=24) for Pythonic approach
+    # date_from=datetime.now() - timedelta(days=7),  # Alternative: datetime objects
+    # date_to=datetime.now(),  # Automatic hour precision detection
 )
 ```
 #### Property Filters
 ```py
-# Filter by bedrooms, bathrooms, and square footage
+# Combine any filters: beds, baths, sqft, price, lot_sqft, year_built
 properties = scrape_property(
     location="San Francisco, CA",
     listing_type="for_sale",
-    beds_min=2,
-    beds_max=4,
+    beds_min=3, beds_max=5,
     baths_min=2.0,
-    sqft_min=1000,
-    sqft_max=2500
-)
-
-# Filter by price range
-properties = scrape_property(
-    location="Phoenix, AZ",
-    listing_type="for_sale",
-    price_min=200000,
-    price_max=500000
-)
-
-# Filter by year built
-properties = scrape_property(
-    location="Seattle, WA",
-    listing_type="for_sale",
+    sqft_min=1500, sqft_max=3000,
+    price_min=300000, price_max=800000,
     year_built_min=2000,
-    beds_min=3
-)
-
-# Combine multiple filters
-properties = scrape_property(
-    location="Denver, CO",
-    listing_type="for_sale",
-    beds_min=3,
-    baths_min=2.0,
-    sqft_min=1500,
-    price_min=300000,
-    price_max=600000,
-    year_built_min=1990,
     lot_sqft_min=5000
 )
 ```
-#### Sorting Results
+#### Sorting & Listing Types
 ```py
-# Sort by price (cheapest first)
+# Sort options: list_price, list_date, sqft, beds, baths, last_update_date
+# Listing types: "for_sale", "for_rent", "sold", "pending", "off_market", list, or None (common types)
 properties = scrape_property(
     location="Miami, FL",
-    listing_type="for_sale",
-    sort_by="list_price",
-    sort_direction="asc",
+    listing_type=["for_sale", "pending"],  # Single string, list, or None
+    sort_by="list_price",  # Sort field
+    sort_direction="asc",  # "asc" or "desc"
     limit=100
 )
+```
 
-# Sort by newest listings
-properties = scrape_property(
-    location="Boston, MA",
-    listing_type="for_sale",
-    sort_by="list_date",
-    sort_direction="desc"
-)
-
-# Sort by square footage (largest first)
+#### Pagination Control
+```py
+# Sequential mode with early termination (more efficient for narrow filters)
 properties = scrape_property(
     location="Los Angeles, CA",
     listing_type="for_sale",
-    sort_by="sqft",
-    sort_direction="desc"
+    updated_in_past_hours=2,  # Narrow time window
+    parallel=False  # Fetch pages sequentially, stop when filters no longer match
 )
 ```
@@ -192,30 +140,38 @@ for prop in properties[:5]:
 ```
 Required
 ├── location (str): Flexible location search - accepts any of these formats:
 │    - ZIP code: "92104"
 │    - City: "San Diego" or "San Francisco"
 │    - City, State (abbreviated or full): "San Diego, CA" or "San Diego, California"
 │    - Full address: "1234 Main St, San Diego, CA 92104"
 │    - Neighborhood: "Downtown San Diego"
 │    - County: "San Diego County"
+│    - State (no support for abbreviated): "California"
 │
-├── listing_type (option): Choose the type of listing.
-│    - 'for_rent'
-│    - 'for_sale'
-│    - 'sold'
-│    - 'pending' (for pending/contingent sales)
+├── listing_type (str | list[str] | None): Choose the type of listing.
+│    - 'for_sale'
+│    - 'for_rent'
+│    - 'sold'
+│    - 'pending'
+│    - 'off_market'
+│    - 'new_community'
+│    - 'other'
+│    - 'ready_to_build'
+│    - List of strings returns properties matching ANY status: ['for_sale', 'pending']
+│    - None returns common listing types (for_sale, for_rent, sold, pending, off_market)
 
 Optional
 ├── property_type (list): Choose the type of properties.
 │    - 'single_family'
 │    - 'multi_family'
 │    - 'condos'
 │    - 'condo_townhome_rowhome_coop'
 │    - 'condo_townhome'
 │    - 'townhomes'
 │    - 'duplex_triplex'
 │    - 'farm'
 │    - 'land'
 │    - 'mobile'
 
 ├── return_type (option): Choose the return type.
 │    - 'pandas' (default)
@@ -228,19 +184,28 @@ Optional
 ├── past_days (integer): Number of past days to filter properties. Utilizes 'last_sold_date' for 'sold' listing types, and 'list_date' for others (for_rent, for_sale).
 │    Example: 30 (fetches properties listed/sold in the last 30 days)
 
-├── past_hours (integer): Number of past hours to filter properties (more precise than past_days). Uses client-side filtering.
-│    Example: 24 (fetches properties from the last 24 hours)
+├── past_hours (integer | timedelta): Number of past hours to filter properties (more precise than past_days). Uses client-side filtering.
+│    Example: 24 or timedelta(hours=24) (fetches properties from the last 24 hours)
 │    Note: Cannot be used together with past_days or date_from/date_to
 
 ├── date_from, date_to (string): Start and end dates to filter properties listed or sold, both dates are required.
 │    (use this to get properties in chunks as there's a 10k result limit)
-│    Format for both must be "YYYY-MM-DD".
-│    Example: "2023-05-01", "2023-05-15" (fetches properties listed/sold between these dates)
+│    Accepts multiple formats with automatic precision detection:
+│    - Date strings: "YYYY-MM-DD" (day precision)
+│    - Datetime strings: "YYYY-MM-DDTHH:MM:SS" (hour precision, uses client-side filtering)
+│    - date objects: date(2025, 1, 20) (day precision)
+│    - datetime objects: datetime(2025, 1, 20, 9, 0) (hour precision)
+│    Examples:
+│    Day precision: "2023-05-01", "2023-05-15"
+│    Hour precision: "2025-01-20T09:00:00", "2025-01-20T17:00:00"
 
-├── datetime_from, datetime_to (string): ISO 8601 datetime strings for hour-precise filtering. Uses client-side filtering.
-│    Format: "YYYY-MM-DDTHH:MM:SS" or "YYYY-MM-DD"
-│    Example: "2025-01-20T09:00:00", "2025-01-20T17:00:00" (fetches properties between 9 AM and 5 PM)
-│    Note: Cannot be used together with date_from/date_to
+├── updated_since (datetime | str): Filter properties updated since a specific date/time (based on last_update_date field)
+│    Accepts datetime objects or ISO 8601 strings
+│    Example: updated_since=datetime(2025, 11, 10, 9, 0) or "2025-11-10T09:00:00"
+
+├── updated_in_past_hours (integer | timedelta): Filter properties updated in the past X hours (based on last_update_date field)
+│    Accepts integer (hours) or timedelta object
+│    Example: updated_in_past_hours=24 or timedelta(hours=24)
 
 ├── beds_min, beds_max (integer): Filter by number of bedrooms
 │    Example: beds_min=2, beds_max=4 (2-4 bedrooms)
@@ -261,7 +226,7 @@ Optional
 │    Example: year_built_min=2000, year_built_max=2024 (built between 2000-2024)
 
 ├── sort_by (string): Sort results by field
-│    Options: 'list_date', 'sold_date', 'list_price', 'sqft', 'beds', 'baths'
+│    Options: 'list_date', 'sold_date', 'list_price', 'sqft', 'beds', 'baths', 'last_update_date'
 │    Example: sort_by='list_price'
 
 ├── sort_direction (string): Sort direction, default is 'desc'
@@ -280,7 +245,9 @@ Optional
 ├── limit (integer): Limit the number of properties to fetch. Max & default is 10000.
 
-└── offset (integer): Starting position for pagination within the 10k limit. Use with limit to fetch results in chunks.
+├── offset (integer): Starting position for pagination within the 10k limit. Use with limit to fetch results in chunks.
+
+└── parallel (True/False): Controls pagination strategy. Default is True (fetch pages in parallel for speed). Set to False for sequential fetching with early termination (useful for rate limiting or narrow time windows).
 ```
 ### Property Schema
 
@@ -327,6 +294,7 @@ Property
 │ ├── sold_price
 │ ├── last_sold_date  # datetime (full timestamp: YYYY-MM-DD HH:MM:SS)
 │ ├── last_status_change_date  # datetime (full timestamp: YYYY-MM-DD HH:MM:SS)
+│ ├── last_update_date  # datetime (full timestamp: YYYY-MM-DD HH:MM:SS)
 │ ├── last_sold_price
 │ ├── price_per_sqft
 │ ├── new_construction


@@ -1,31 +1,37 @@
 import warnings
 import pandas as pd
+from datetime import datetime, timedelta, date
 from .core.scrapers import ScraperInput
-from .utils import process_result, ordered_properties, validate_input, validate_dates, validate_limit, validate_offset, validate_datetime, validate_filters, validate_sort
+from .utils import (
+    process_result, ordered_properties, validate_input, validate_dates, validate_limit,
+    validate_offset, validate_datetime, validate_filters, validate_sort, validate_last_update_filters,
+    convert_to_datetime_string, extract_timedelta_hours, extract_timedelta_days, detect_precision_and_convert
+)
 from .core.scrapers.realtor import RealtorScraper
 from .core.scrapers.models import ListingType, SearchPropertyType, ReturnType, Property
 from typing import Union, Optional, List
 
 
 def scrape_property(
     location: str,
-    listing_type: str = "for_sale",
+    listing_type: str | list[str] | None = None,
     return_type: str = "pandas",
     property_type: Optional[List[str]] = None,
     radius: float = None,
     mls_only: bool = False,
-    past_days: int = None,
+    past_days: int | timedelta = None,
     proxy: str = None,
-    date_from: str = None,
-    date_to: str = None,
+    date_from: datetime | date | str = None,
+    date_to: datetime | date | str = None,
     foreclosure: bool = None,
     extra_property_data: bool = True,
    exclude_pending: bool = False,
     limit: int = 10000,
     offset: int = 0,
     # New date/time filtering parameters
-    past_hours: int = None,
-    datetime_from: str = None,
-    datetime_to: str = None,
+    past_hours: int | timedelta = None,
+    # New last_update_date filtering parameters
+    updated_since: datetime | str = None,
+    updated_in_past_hours: int | timedelta = None,
     # New property filtering parameters
     beds_min: int = None,
     beds_max: int = None,
@@ -42,12 +48,16 @@ def scrape_property(
     # New sorting parameters
     sort_by: str = None,
     sort_direction: str = "desc",
+    # Pagination control
+    parallel: bool = True,
 ) -> Union[pd.DataFrame, list[dict], list[Property]]:
     """
     Scrape properties from Realtor.com based on a given location and listing type.
 
     :param location: Location to search (e.g. "Dallas, TX", "85281", "2530 Al Lipscomb Way")
-    :param listing_type: Listing Type (for_sale, for_rent, sold, pending)
+    :param listing_type: Listing Type - can be a string, list of strings, or None.
+        Options: for_sale, for_rent, sold, pending, off_market, new_community, other, ready_to_build
+        Examples: "for_sale", ["for_sale", "pending"], None (returns all types)
     :param return_type: Return type (pandas, pydantic, raw)
     :param property_type: Property Type (single_family, multi_family, condos, condo_townhome_rowhome_coop, condo_townhome, townhomes, duplex_triplex, farm, land, mobile)
     :param radius: Get properties within _ (e.g. 1.0) miles. Only applicable for individual addresses.
@@ -57,7 +67,15 @@ def scrape_property(
         - PENDING: Filters by pending_date. Contingent properties without pending_date are included.
         - SOLD: Filters by sold_date (when property was sold)
         - FOR_SALE/FOR_RENT: Filters by list_date (when property was listed)
-    :param date_from, date_to: Get properties sold or listed (dependent on your listing_type) between these dates. format: 2021-01-28
+    :param date_from, date_to: Get properties sold or listed (dependent on your listing_type) between these dates.
+        Accepts multiple formats for flexible precision:
+        - Date strings: "2025-01-20" (day-level precision)
+        - Datetime strings: "2025-01-20T14:30:00" (hour-level precision)
+        - date objects: date(2025, 1, 20) (day-level precision)
+        - datetime objects: datetime(2025, 1, 20, 14, 30) (hour-level precision)
+        The precision is automatically detected based on the input format.
+        Timezone handling: Naive datetimes are treated as local time and automatically converted to UTC.
+        Timezone-aware datetimes are converted to UTC. For best results, use timezone-aware datetimes.
     :param foreclosure: If set, fetches only foreclosure listings.
     :param extra_property_data: Increases requests by O(n). If set, this fetches additional property data (e.g. agent, broker, property evaluations etc.)
     :param exclude_pending: If true, this excludes pending or contingent properties from the results, unless listing type is pending.
@@ -65,49 +83,102 @@ def scrape_property(
    :param offset: Starting position for pagination within the 10k limit (offset + limit cannot exceed 10,000). Use with limit to fetch results in chunks (e.g., offset=200, limit=200 fetches results 200-399). Should be a multiple of 200 (page size) for optimal performance. Default is 0. Note: Cannot be used to bypass the 10k API limit - use date ranges (date_from/date_to) to narrow searches and fetch more data.
    New parameters:
-   :param past_hours: Get properties in the last _ hours (requires client-side filtering)
+   :param past_hours: Get properties in the last _ hours (requires client-side filtering). Accepts int or timedelta.
-   :param datetime_from, datetime_to: ISO 8601 datetime strings for precise time filtering (e.g. "2025-01-20T14:30:00")
+   :param updated_since: Filter by last_update_date (when property was last updated). Accepts datetime object or ISO 8601 string (client-side filtering).
+       Timezone handling: Naive datetimes (like datetime.now()) are treated as local time and automatically converted to UTC.
+       Timezone-aware datetimes are converted to UTC. Examples:
+       - datetime.now() - uses your local timezone
+       - datetime.now(timezone.utc) - uses UTC explicitly
+   :param updated_in_past_hours: Filter by properties updated in the last _ hours. Accepts int or timedelta (client-side filtering)
    :param beds_min, beds_max: Filter by number of bedrooms
    :param baths_min, baths_max: Filter by number of bathrooms
    :param sqft_min, sqft_max: Filter by square footage
    :param price_min, price_max: Filter by listing price
    :param lot_sqft_min, lot_sqft_max: Filter by lot size
    :param year_built_min, year_built_max: Filter by year built
-   :param sort_by: Sort results by field (list_date, sold_date, list_price, sqft, beds, baths)
+   :param sort_by: Sort results by field (list_date, sold_date, list_price, sqft, beds, baths, last_update_date)
    :param sort_direction: Sort direction (asc, desc)
+   :param parallel: Controls pagination strategy. True (default) = fetch all pages in parallel for maximum speed.
+       False = fetch pages sequentially with early termination checks (useful for rate limiting or narrow time windows).
+       Sequential mode will stop paginating as soon as time-based filters indicate no more matches are possible.
+   Note: past_days and past_hours also accept timedelta objects for more Pythonic usage.
    """
    validate_input(listing_type)
-   validate_dates(date_from, date_to)
    validate_limit(limit)
    validate_offset(offset, limit)
-   validate_datetime(datetime_from)
-   validate_datetime(datetime_to)
    validate_filters(
        beds_min, beds_max, baths_min, baths_max, sqft_min, sqft_max,
        price_min, price_max, lot_sqft_min, lot_sqft_max, year_built_min, year_built_max
    )
    validate_sort(sort_by, sort_direction)
+   # Validate new last_update_date filtering parameters
+   validate_last_update_filters(
+       convert_to_datetime_string(updated_since),
+       extract_timedelta_hours(updated_in_past_hours)
+   )
+   # Convert listing_type to appropriate format
+   if listing_type is None:
+       converted_listing_type = None
+   elif isinstance(listing_type, list):
+       converted_listing_type = [ListingType(lt.upper()) for lt in listing_type]
+   else:
+       converted_listing_type = ListingType(listing_type.upper())
+   # Convert date_from/date_to with precision detection
+   converted_date_from, date_from_precision = detect_precision_and_convert(date_from)
+   converted_date_to, date_to_precision = detect_precision_and_convert(date_to)
+   # Validate converted dates
+   validate_dates(converted_date_from, converted_date_to)
+   # Convert datetime/timedelta objects to appropriate formats
+   converted_past_days = extract_timedelta_days(past_days)
+   converted_past_hours = extract_timedelta_hours(past_hours)
+   converted_updated_since = convert_to_datetime_string(updated_since)
+   converted_updated_in_past_hours = extract_timedelta_hours(updated_in_past_hours)
+   # Auto-apply optimal sort for time-based filters (unless user specified a different sort)
+   if (converted_updated_since or converted_updated_in_past_hours) and not sort_by:
+       sort_by = "last_update_date"
+       if not sort_direction:
+           sort_direction = "desc"  # Most recent first
+   # Auto-apply optimal sort for PENDING listings with date filters.
+   # PENDING API filtering is broken, so we rely on client-side filtering;
+   # sorting by pending_date ensures efficient pagination with early termination.
+   elif (converted_listing_type == ListingType.PENDING and
+         (converted_past_days or converted_past_hours or converted_date_from) and
+         not sort_by):
+       sort_by = "pending_date"
+       if not sort_direction:
+           sort_direction = "desc"  # Most recent first
    scraper_input = ScraperInput(
        location=location,
-       listing_type=ListingType(listing_type.upper()),
+       listing_type=converted_listing_type,
        return_type=ReturnType(return_type.lower()),
        property_type=[SearchPropertyType[prop.upper()] for prop in property_type] if property_type else None,
        proxy=proxy,
        radius=radius,
        mls_only=mls_only,
-       last_x_days=past_days,
+       last_x_days=converted_past_days,
-       date_from=date_from,
+       date_from=converted_date_from,
-       date_to=date_to,
+       date_to=converted_date_to,
+       date_from_precision=date_from_precision,
+       date_to_precision=date_to_precision,
        foreclosure=foreclosure,
        extra_property_data=extra_property_data,
        exclude_pending=exclude_pending,
        limit=limit,
        offset=offset,
        # New date/time filtering
-       past_hours=past_hours,
+       past_hours=converted_past_hours,
-       datetime_from=datetime_from,
-       datetime_to=datetime_to,
+       # New last_update_date filtering
+       updated_since=converted_updated_since,
+       updated_in_past_hours=converted_updated_in_past_hours,
        # New property filtering
        beds_min=beds_min,
        beds_max=beds_max,
@@ -124,6 +195,8 @@ def scrape_property(
        # New sorting
        sort_by=sort_by,
        sort_direction=sort_direction,
+       # Pagination control
+       parallel=parallel,
    )
    site = RealtorScraper(scraper_input)
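The conversion step above leans on helpers (`detect_precision_and_convert`, `extract_timedelta_hours`) whose bodies are not shown in this diff. A minimal sketch of what they might look like, assuming the behavior the docstring describes — the names mirror the diff, but the implementations here are illustrative:

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple, Union


def detect_precision_and_convert(
    value: Union[str, date, datetime, None]
) -> Tuple[Optional[str], Optional[str]]:
    """Normalize a date/datetime input to an ISO string plus a precision tag."""
    if value is None:
        return None, None
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.isoformat(), "hour"
    if isinstance(value, date):
        return value.isoformat(), "day"
    # String input: a time component ("T" separator) implies hour precision
    return value, "hour" if "T" in value else "day"


def extract_timedelta_hours(value: Union[int, timedelta, None]) -> Optional[int]:
    """Accept an int (hours) or a timedelta and return whole hours."""
    if value is None:
        return None
    if isinstance(value, timedelta):
        return int(value.total_seconds() // 3600)
    return int(value)
```

This is why `past_hours=timedelta(days=2)` and `past_hours=48` are interchangeable in the public API.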


@@ -1,85 +0,0 @@
import argparse
import datetime
from homeharvest import scrape_property
def main():
parser = argparse.ArgumentParser(description="Home Harvest Property Scraper")
parser.add_argument("location", type=str, help="Location to scrape (e.g., San Francisco, CA)")
parser.add_argument(
"-l",
"--listing_type",
type=str,
default="for_sale",
choices=["for_sale", "for_rent", "sold", "pending"],
help="Listing type to scrape",
)
parser.add_argument(
"-o",
"--output",
type=str,
default="excel",
choices=["excel", "csv"],
help="Output format",
)
parser.add_argument(
"-f",
"--filename",
type=str,
default=None,
help="Name of the output file (without extension)",
)
parser.add_argument("-p", "--proxy", type=str, default=None, help="Proxy to use for scraping")
parser.add_argument(
"-d",
"--days",
type=int,
default=None,
help="Sold/listed in last _ days filter.",
)
parser.add_argument(
"-r",
"--radius",
type=float,
default=None,
help="Get comparable properties within _ (eg. 0.0) miles. Only applicable for individual addresses.",
)
parser.add_argument(
"-m",
"--mls_only",
action="store_true",
help="If set, fetches only MLS listings.",
)
args = parser.parse_args()
result = scrape_property(
args.location,
args.listing_type,
radius=args.radius,
proxy=args.proxy,
mls_only=args.mls_only,
past_days=args.days,
)
if not args.filename:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
args.filename = f"HomeHarvest_{timestamp}"
if args.output == "excel":
output_filename = f"{args.filename}.xlsx"
result.to_excel(output_filename, index=False)
print(f"Excel file saved as {output_filename}")
elif args.output == "csv":
output_filename = f"{args.filename}.csv"
result.to_csv(output_filename, index=False)
print(f"CSV file saved as {output_filename}")
if __name__ == "__main__":
main()


@@ -2,8 +2,6 @@ from __future__ import annotations
from typing import Union
import requests
-from requests.adapters import HTTPAdapter
-from urllib3.util.retry import Retry
import uuid
from ...exceptions import AuthenticationError
from .models import Property, ListingType, SiteName, SearchPropertyType, ReturnType
@@ -11,9 +9,30 @@ import json
from pydantic import BaseModel
DEFAULT_HEADERS = {
'Content-Type': 'application/json',
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache',
'Origin': 'https://www.realtor.com',
'Pragma': 'no-cache',
'Referer': 'https://www.realtor.com/',
'rdc-client-name': 'RDC_WEB_SRP_FS_PAGE',
'rdc-client-version': '3.0.2515',
'sec-ch-ua': '"Google Chrome";v="135", "Not-A.Brand";v="8", "Chromium";v="135"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"macOS"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-site',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36',
'x-is-bot': 'false',
}
class ScraperInput(BaseModel):
    location: str
-   listing_type: ListingType
+   listing_type: ListingType | list[ListingType] | None
    property_type: list[SearchPropertyType] | None = None
    radius: float | None = None
    mls_only: bool | None = False
@@ -21,6 +40,8 @@ class ScraperInput(BaseModel):
    last_x_days: int | None = None
    date_from: str | None = None
    date_to: str | None = None
+   date_from_precision: str | None = None  # "day" or "hour"
+   date_to_precision: str | None = None  # "day" or "hour"
    foreclosure: bool | None = False
    extra_property_data: bool | None = True
    exclude_pending: bool | None = False
@@ -30,8 +51,10 @@ class ScraperInput(BaseModel):
    # New date/time filtering parameters
    past_hours: int | None = None
-   datetime_from: str | None = None
-   datetime_to: str | None = None
+   # New last_update_date filtering parameters
+   updated_since: str | None = None
+   updated_in_past_hours: int | None = None
    # New property filtering parameters
    beds_min: int | None = None
@@ -51,10 +74,11 @@ class ScraperInput(BaseModel):
    sort_by: str | None = None
    sort_direction: str = "desc"
+   # Pagination control
+   parallel: bool = True
class Scraper:
-   session = None
    def __init__(
        self,
        scraper_input: ScraperInput,
@@ -62,40 +86,8 @@ class Scraper:
        self.location = scraper_input.location
        self.listing_type = scraper_input.listing_type
        self.property_type = scraper_input.property_type
-       if not self.session:
-           Scraper.session = requests.Session()
-           retries = Retry(
-               total=3, backoff_factor=4, status_forcelist=[429, 403], allowed_methods=frozenset(["GET", "POST"])
-           )
-           adapter = HTTPAdapter(max_retries=retries)
-           Scraper.session.mount("http://", adapter)
-           Scraper.session.mount("https://", adapter)
-           Scraper.session.headers.update(
-               {
-                   "accept": "application/json, text/javascript",
-                   "accept-language": "en-US,en;q=0.9",
-                   "cache-control": "no-cache",
-                   "content-type": "application/json",
-                   "origin": "https://www.realtor.com",
-                   "pragma": "no-cache",
-                   "priority": "u=1, i",
-                   "rdc-ab-tests": "commute_travel_time_variation:v1",
-                   "sec-ch-ua": '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
-                   "sec-ch-ua-mobile": "?0",
-                   "sec-ch-ua-platform": '"Windows"',
-                   "sec-fetch-dest": "empty",
-                   "sec-fetch-mode": "cors",
-                   "sec-fetch-site": "same-origin",
-                   "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
-               }
-           )
-       if scraper_input.proxy:
-           proxy_url = scraper_input.proxy
-           proxies = {"http": proxy_url, "https": proxy_url}
-           self.session.proxies.update(proxies)
+       self.proxy = scraper_input.proxy
+       self.proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
        self.listing_type = scraper_input.listing_type
        self.radius = scraper_input.radius
@@ -103,8 +95,10 @@ class Scraper:
        self.mls_only = scraper_input.mls_only
        self.date_from = scraper_input.date_from
        self.date_to = scraper_input.date_to
+       self.date_from_precision = scraper_input.date_from_precision
+       self.date_to_precision = scraper_input.date_to_precision
        self.foreclosure = scraper_input.foreclosure
-       self.extra_property_data = scraper_input.extra_property_data
+       self.extra_property_data = False  # TODO: temporarily disabled
        self.exclude_pending = scraper_input.exclude_pending
        self.limit = scraper_input.limit
        self.offset = scraper_input.offset
@@ -112,8 +106,10 @@ class Scraper:
        # New date/time filtering
        self.past_hours = scraper_input.past_hours
-       self.datetime_from = scraper_input.datetime_from
-       self.datetime_to = scraper_input.datetime_to
+       # New last_update_date filtering
+       self.updated_since = scraper_input.updated_since
+       self.updated_in_past_hours = scraper_input.updated_in_past_hours
        # New property filtering
        self.beds_min = scraper_input.beds_min
@@ -133,6 +129,9 @@ class Scraper:
        self.sort_by = scraper_input.sort_by
        self.sort_direction = scraper_input.sort_direction
+       # Pagination control
+       self.parallel = scraper_input.parallel
    def search(self) -> list[Union[Property | dict]]: ...
    @staticmethod
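The docstring changes above promise that naive datetimes are treated as local time and converted to UTC, while aware datetimes are converted directly. The `convert_to_datetime_string` helper that enforces this is not shown in the diff; a hedged sketch of that behavior, using only the stdlib (the helper name matches the diff, the body is an assumption):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


def convert_to_datetime_string(value: Union[str, datetime, None]) -> Optional[str]:
    """Return a UTC ISO 8601 string; naive datetimes are assumed to be local time."""
    if value is None or isinstance(value, str):
        return value  # strings are passed through unchanged
    if value.tzinfo is None:
        # astimezone() on a naive datetime attaches the system local timezone
        value = value.astimezone()
    return value.astimezone(timezone.utc).isoformat()
```

Passing an aware datetime (e.g. `datetime.now(timezone.utc)`) sidesteps any ambiguity about the host machine's timezone, which is why the docstring recommends it.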


@@ -43,6 +43,10 @@ class ListingType(Enum):
    FOR_RENT = "FOR_RENT"
    PENDING = "PENDING"
    SOLD = "SOLD"
OFF_MARKET = "OFF_MARKET"
NEW_COMMUNITY = "NEW_COMMUNITY"
OTHER = "OTHER"
READY_TO_BUILD = "READY_TO_BUILD"
class PropertyType(Enum):
@@ -193,6 +197,7 @@ class Property(BaseModel):
    pending_date: datetime | None = Field(None, description="The date listing went into pending state")
    last_sold_date: datetime | None = Field(None, description="Last time the Home was sold")
    last_status_change_date: datetime | None = Field(None, description="Last time the status of the listing changed")
last_update_date: datetime | None = Field(None, description="Last time the home was updated")
    prc_sqft: int | None = None
    new_construction: bool | None = Field(None, description="Search for new construction homes")
    hoa_fee: int | None = Field(None, description="Search for homes where HOA fee is known and falls within specified range")


@@ -8,25 +8,27 @@ This module implements the scraper for realtor.com
from __future__ import annotations
import json
+import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
+from datetime import datetime
from json import JSONDecodeError
from typing import Dict, Union
from tenacity import (
    retry,
    retry_if_exception_type,
+   retry_if_not_exception_type,
    wait_exponential,
    stop_after_attempt,
)
-from .. import Scraper
+from .. import Scraper, DEFAULT_HEADERS
+from ....exceptions import AuthenticationError
from ..models import (
    Property,
    ListingType,
    ReturnType
)
-from .queries import GENERAL_RESULTS_QUERY, SEARCH_HOMES_DATA, HOMES_DATA, HOME_FRAGMENT
+from .queries import GENERAL_RESULTS_QUERY, HOMES_DATA, SEARCH_SUGGESTIONS_QUERY
from .processors import (
    process_property,
    process_extra_property_details,
@@ -35,56 +37,127 @@ from .processors import (
class RealtorScraper(Scraper):
-   SEARCH_GQL_URL = "https://www.realtor.com/api/v1/rdc_search_srp?client_id=rdc-search-new-communities&schema=vesta"
+   SEARCH_GQL_URL = "https://www.realtor.com/frontdoor/graphql"
-   PROPERTY_URL = "https://www.realtor.com/realestateandhomes-detail/"
-   PROPERTY_GQL = "https://graph.realtor.com/graphql"
-   ADDRESS_AUTOCOMPLETE_URL = "https://parser-external.geo.moveaws.com/suggest"
    NUM_PROPERTY_WORKERS = 20
    DEFAULT_PAGE_SIZE = 200
    def __init__(self, scraper_input):
        super().__init__(scraper_input)
-   def handle_location(self):
-       params = {
-           "input": self.location,
-           "client_id": self.listing_type.value.lower().replace("_", "-"),
-           "limit": "1",
-           "area_types": "city,state,county,postal_code,address,street,neighborhood,school,school_district,university,park",
-       }
-       response = self.session.get(
-           self.ADDRESS_AUTOCOMPLETE_URL,
-           params=params,
-       )
-       response_json = response.json()
-       result = response_json["autocomplete"]
-       if not result:
+   @staticmethod
+   def _minify_query(query: str) -> str:
+       """Minify GraphQL query by collapsing whitespace to single spaces."""
+       # Split on whitespace, filter empty strings, join with single space
+       return ' '.join(query.split())
+   def _graphql_post(self, query: str, variables: dict, operation_name: str) -> dict:
+       """
+       Execute a GraphQL query.
+       Args:
+           query: GraphQL query string (must define an operation matching operation_name)
+           variables: Query variables dictionary
+           operation_name: Name of the GraphQL operation
+       Returns:
+           Response JSON dictionary
+       """
+       payload = {
+           "operationName": operation_name,
+           "query": self._minify_query(query),
+           "variables": variables,
+       }
+       response = requests.post(
+           self.SEARCH_GQL_URL,
+           headers=DEFAULT_HEADERS,
+           data=json.dumps(payload, separators=(',', ':')),
+           proxies=self.proxies
+       )
+       if response.status_code == 403:
+           if not self.proxy:
+               raise AuthenticationError(
+                   "Received 403 Forbidden from Realtor.com API.",
+                   response=response
+               )
+           else:
+               raise Exception("Received 403 Forbidden, retrying...")
+       return response.json()
@retry(
retry=retry_if_exception_type(Exception),
wait=wait_exponential(multiplier=1, min=1, max=4),
stop=stop_after_attempt(3),
)
def handle_location(self):
variables = {
"searchInput": {
"search_term": self.location
}
}
response_json = self._graphql_post(SEARCH_SUGGESTIONS_QUERY, variables, "Search_suggestions")
if (
response_json is None
or "data" not in response_json
or response_json["data"] is None
or "search_suggestions" not in response_json["data"]
or response_json["data"]["search_suggestions"] is None
or "geo_results" not in response_json["data"]["search_suggestions"]
or not response_json["data"]["search_suggestions"]["geo_results"]
):
# If we got a 400 error with "Required parameter is missing", raise to trigger retry
if response_json and "errors" in response_json:
error_msgs = [e.get("message", "") for e in response_json.get("errors", [])]
if any("Required parameter is missing" in msg for msg in error_msgs):
raise Exception(f"Transient API error: {error_msgs}")
            return None
-       return result[0]
+       geo_result = response_json["data"]["search_suggestions"]["geo_results"][0]
geo = geo_result.get("geo", {})
result = {
"text": geo_result.get("text"),
"area_type": geo.get("area_type"),
"city": geo.get("city"),
"state_code": geo.get("state_code"),
"postal_code": geo.get("postal_code"),
"county": geo.get("county"),
"centroid": geo.get("centroid"),
}
if geo.get("area_type") == "address":
# Try to get mpr_id directly from API response first
if geo.get("mpr_id"):
result["mpr_id"] = geo.get("mpr_id")
else:
# Fallback: extract from _id field if it has addr: prefix
geo_id = geo.get("_id", "")
if geo_id.startswith("addr:"):
result["mpr_id"] = geo_id.replace("addr:", "")
return result
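The `_minify_query` helper introduced above shrinks the POST body by collapsing every run of whitespace (indentation, newlines) to a single space. A standalone sketch of the same one-liner, showing its effect on a multi-line query (note that it would also collapse whitespace runs inside GraphQL string literals, which none of these queries contain):

```python
def minify_query(query: str) -> str:
    # str.split() with no argument splits on any whitespace run and drops
    # empty strings, so join(...) yields a single-spaced, trimmed string.
    return ' '.join(query.split())


q = """
query GetPropertyListingId($property_id: ID!) {
  property(id: $property_id) {
    listings { listing_id }
  }
}
"""
minified = minify_query(q)
```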
    def get_latest_listing_id(self, property_id: str) -> str | None:
-       query = """query Property($property_id: ID!) {
+       query = """
+       fragment ListingFragment on Listing {
+           listing_id
+           primary
+       }
+       query GetPropertyListingId($property_id: ID!) {
            property(id: $property_id) {
                listings {
-                   listing_id
-                   primary
+                   ...ListingFragment
                }
            }
        }
        """
        variables = {"property_id": property_id}
-       payload = {
-           "query": query,
-           "variables": variables,
-       }
-       response = self.session.post(self.SEARCH_GQL_URL, json=payload)
-       response_json = response.json()
+       response_json = self._graphql_post(query, variables, "GetPropertyListingId")
        property_info = response_json["data"]["property"]
        if property_info["listings"] is None:
@@ -100,31 +173,40 @@ class RealtorScraper(Scraper):
        return property_info["listings"][0]["listing_id"]
    def handle_home(self, property_id: str) -> list[Property]:
+       """Fetch single home with proper error handling."""
        query = (
-           """query Home($property_id: ID!) {
+           """query GetHomeDetails($property_id: ID!) {
                home(property_id: $property_id) %s
            }"""
            % HOMES_DATA
        )
        variables = {"property_id": property_id}
-       payload = {
-           "query": query,
-           "variables": variables,
-       }
-       response = self.session.post(self.SEARCH_GQL_URL, json=payload)
-       response_json = response.json()
-       property_info = response_json["data"]["home"]
-       if self.return_type != ReturnType.raw:
-           return [process_property(property_info, self.mls_only, self.extra_property_data,
-                   self.exclude_pending, self.listing_type, get_key, process_extra_property_details)]
-       else:
-           return [property_info]
+       try:
+           data = self._graphql_post(query, variables, "GetHomeDetails")
+           # Check for errors or missing data
+           if "errors" in data or "data" not in data:
+               return []
+           if data["data"] is None or "home" not in data["data"]:
+               return []
+           property_info = data["data"]["home"]
+           if property_info is None:
+               return []
+           # Process based on return type
+           if self.return_type != ReturnType.raw:
+               return [process_property(property_info, self.mls_only, self.extra_property_data,
+                                        self.exclude_pending, self.listing_type, get_key,
+                                        process_extra_property_details)]
+           else:
+               return [property_info]
+       except Exception:
+           return []
    def general_search(self, variables: dict, search_type: str) -> Dict[str, Union[int, Union[list[Property], list[dict]]]]:
        """
@@ -134,34 +216,56 @@ class RealtorScraper(Scraper):
        date_param = ""
        # Determine date field based on listing type
-       if self.listing_type == ListingType.SOLD:
-           date_field = "sold_date"
-       elif self.listing_type in [ListingType.FOR_SALE, ListingType.FOR_RENT]:
-           date_field = "list_date"
-       else:  # PENDING
-           # Skip server-side date filtering for PENDING as both pending_date and contract_date
-           # filters are broken in the API. Client-side filtering will be applied later.
-           date_field = None
+       # Convert listing_type to list for uniform handling
+       if self.listing_type is None:
+           # When None, return all common listing types as documented
+           # Note: NEW_COMMUNITY, OTHER, and READY_TO_BUILD are excluded as they typically return no results
+           listing_types = [
+               ListingType.FOR_SALE,
+               ListingType.FOR_RENT,
+               ListingType.SOLD,
+               ListingType.PENDING,
+               ListingType.OFF_MARKET,
+           ]
+           date_field = None  # When no listing_type is specified, skip date filtering
+       elif isinstance(self.listing_type, list):
+           listing_types = self.listing_type
+           # For multiple types, we'll use a general date field or skip
+           date_field = None  # Skip date filtering for mixed types
+       else:
+           listing_types = [self.listing_type]
+           # Determine date field for single type
+           if self.listing_type == ListingType.SOLD:
+               date_field = "sold_date"
+           elif self.listing_type in [ListingType.FOR_SALE, ListingType.FOR_RENT]:
+               date_field = "list_date"
+           else:  # PENDING or other types
+               # Skip server-side date filtering for PENDING as both pending_date and contract_date
+               # filters are broken in the API. Client-side filtering will be applied later.
+               date_field = None
        # Build date parameter (expand to full days if hour-based filtering is used)
        if date_field:
-           if self.datetime_from or self.datetime_to:
+           # Check if we have hour precision (need to extract date part for API, then filter client-side)
+           has_hour_precision = (self.date_from_precision == "hour" or self.date_to_precision == "hour")
+           if has_hour_precision and (self.date_from or self.date_to):
                # Hour-based datetime filtering: extract date parts for API, client-side filter by hours
                from datetime import datetime
                min_date = None
                max_date = None
-               if self.datetime_from:
+               if self.date_from:
                    try:
-                       dt_from = datetime.fromisoformat(self.datetime_from.replace('Z', '+00:00'))
+                       dt_from = datetime.fromisoformat(self.date_from.replace('Z', '+00:00'))
                        min_date = dt_from.strftime("%Y-%m-%d")
                    except (ValueError, AttributeError):
                        pass
-               if self.datetime_to:
+               if self.date_to:
                    try:
-                       dt_to = datetime.fromisoformat(self.datetime_to.replace('Z', '+00:00'))
+                       dt_to = datetime.fromisoformat(self.date_to.replace('Z', '+00:00'))
                        max_date = dt_to.strftime("%Y-%m-%d")
                    except (ValueError, AttributeError):
                        pass
@@ -250,13 +354,19 @@ class RealtorScraper(Scraper):
        # Build sort parameter
        if self.sort_by:
            sort_param = f"sort: [{{ field: {self.sort_by}, direction: {self.sort_direction} }}]"
-       elif self.listing_type == ListingType.SOLD:
+       elif isinstance(self.listing_type, ListingType) and self.listing_type == ListingType.SOLD:
            sort_param = "sort: [{ field: sold_date, direction: desc }]"
        else:
            sort_param = ""  #: prioritize normal fractal sort from realtor
+       # Handle PENDING with or_filters.
+       # Only use or_filters when PENDING is the only type or mixed only with FOR_SALE;
+       # using or_filters with other types (SOLD, FOR_RENT, etc.) will exclude those types.
+       has_pending = ListingType.PENDING in listing_types
+       other_types = [lt for lt in listing_types if lt not in [ListingType.PENDING, ListingType.FOR_SALE]]
+       use_or_filters = has_pending and len(other_types) == 0
        pending_or_contingent_param = (
-           "or_filters: { contingent: true, pending: true }" if self.listing_type == ListingType.PENDING else ""
+           "or_filters: { contingent: true, pending: true }" if use_or_filters else ""
        )
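The or_filters rule above is subtle: PENDING requires `or_filters: { contingent: true, pending: true }`, but attaching it when unrelated statuses (SOLD, FOR_RENT, ...) are also requested would silently drop those statuses from the results. A self-contained sketch of just the decision, with a minimal stand-in for the library's `ListingType` enum:

```python
from enum import Enum


class ListingType(Enum):  # minimal stand-in for the library's enum
    FOR_SALE = "FOR_SALE"
    FOR_RENT = "FOR_RENT"
    PENDING = "PENDING"
    SOLD = "SOLD"


def should_use_or_filters(listing_types: list[ListingType]) -> bool:
    """or_filters only when PENDING appears alone or only alongside FOR_SALE."""
    has_pending = ListingType.PENDING in listing_types
    other_types = [lt for lt in listing_types
                   if lt not in (ListingType.PENDING, ListingType.FOR_SALE)]
    return has_pending and len(other_types) == 0
```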
        # Build bucket parameter (only use fractal sort if no custom sort is specified)
@@ -264,7 +374,27 @@ class RealtorScraper(Scraper):
if not self.sort_by: if not self.sort_by:
bucket_param = 'bucket: { sort: "fractal_v1.1.3_fr" }' bucket_param = 'bucket: { sort: "fractal_v1.1.3_fr" }'
listing_type = ListingType.FOR_SALE if self.listing_type == ListingType.PENDING else self.listing_type # Build status parameter
# For PENDING, we need to query as FOR_SALE with or_filters for pending/contingent
status_types = []
for lt in listing_types:
if lt == ListingType.PENDING:
if ListingType.FOR_SALE not in status_types:
status_types.append(ListingType.FOR_SALE)
else:
if lt not in status_types:
status_types.append(lt)
# Build status parameter string
if status_types:
status_values = [st.value.lower() for st in status_types]
if len(status_values) == 1:
status_param = f"status: {status_values[0]}"
else:
status_param = f"status: [{', '.join(status_values)}]"
else:
status_param = "" # No status parameter means return all types
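The status mapping introduced above can be sketched standalone; the `ListingType` enum below is a stand-in for the package's own enum, not its actual definition:

```python
from enum import Enum

class ListingType(Enum):  # stand-in for the package's enum
    FOR_SALE = "FOR_SALE"
    FOR_RENT = "FOR_RENT"
    SOLD = "SOLD"
    PENDING = "PENDING"

def build_status_param(listing_types: list[ListingType]) -> str:
    """PENDING is queried as for_sale (plus an or_filters clause), so it
    collapses into FOR_SALE when building the status list."""
    status_types: list[ListingType] = []
    for lt in listing_types:
        mapped = ListingType.FOR_SALE if lt == ListingType.PENDING else lt
        if mapped not in status_types:
            status_types.append(mapped)
    values = [st.value.lower() for st in status_types]
    if not values:
        return ""  # no status filter means all types are returned
    if len(values) == 1:
        return f"status: {values[0]}"
    return f"status: [{', '.join(values)}]"

print(build_status_param([ListingType.PENDING]))                     # status: for_sale
print(build_status_param([ListingType.FOR_SALE, ListingType.SOLD]))  # status: [for_sale, sold]
```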
         is_foreclosure = ""

         if variables.get("foreclosure") is True:
@@ -273,19 +403,19 @@ class RealtorScraper(Scraper):
             is_foreclosure = "foreclosure: false"

         if search_type == "comps":  #: comps search, came from an address
-            query = """query Property_search(
+            query = """query GetHomeSearch(
                 $coordinates: [Float]!
                 $radius: String!
                 $offset: Int!,
             ) {
-                home_search(
+                homeSearch: home_search(
                     query: {
                         %s
                         nearby: {
                             coordinates: $coordinates
                             radius: $radius
                         }
-                        status: %s
+                        %s
                         %s
                         %s
                         %s
@@ -297,7 +427,7 @@ class RealtorScraper(Scraper):
                 ) %s
             }""" % (
                 is_foreclosure,
-                listing_type.value.lower(),
+                status_param,
                 date_param,
                 property_type_param,
                 property_filters_param,
@@ -306,21 +436,15 @@ class RealtorScraper(Scraper):
                 GENERAL_RESULTS_QUERY,
             )
         elif search_type == "area":  #: general search, came from a general location
-            query = """query Home_search(
-                $city: String,
-                $county: [String],
-                $state_code: String,
-                $postal_code: String
-                $offset: Int,
+            query = """query GetHomeSearch(
+                $search_location: SearchLocation,
+                $offset: Int
             ) {
-                home_search(
+                homeSearch: home_search(
                     query: {
                         %s
-                        city: $city
-                        county: $county
-                        postal_code: $postal_code
-                        state_code: $state_code
-                        status: %s
+                        search_location: $search_location
+                        %s
                         %s
                         %s
                         %s
@@ -333,7 +457,7 @@ class RealtorScraper(Scraper):
                 ) %s
             }""" % (
                 is_foreclosure,
-                listing_type.value.lower(),
+                status_param,
                 date_param,
                 property_type_param,
                 property_filters_param,
@@ -344,11 +468,11 @@ class RealtorScraper(Scraper):
             )
         else:  #: general search, came from an address
             query = (
-                """query Property_search(
+                """query GetHomeSearch(
                     $property_id: [ID]!
                     $offset: Int!,
                 ) {
-                    home_search(
+                    homeSearch: home_search(
                         query: {
                             property_id: $property_id
                         }
@@ -359,14 +483,8 @@ class RealtorScraper(Scraper):
                 % GENERAL_RESULTS_QUERY
             )
-        payload = {
-            "query": query,
-            "variables": variables,
-        }
-
-        response = self.session.post(self.SEARCH_GQL_URL, json=payload)
-        response_json = response.json()
-
-        search_key = "home_search" if "home_search" in query else "property_search"
+        response_json = self._graphql_post(query, variables, "GetHomeSearch")
+        search_key = "homeSearch"

         properties: list[Union[Property, dict]] = []
@@ -455,24 +573,16 @@ class RealtorScraper(Scraper):
         if not location_info.get("centroid"):
             return []

-            coordinates = list(location_info["centroid"].values())
+            centroid = location_info["centroid"]
+            coordinates = [centroid["lon"], centroid["lat"]]  # GeoJSON order: [lon, lat]
             search_variables |= {
                 "coordinates": coordinates,
                 "radius": "{}mi".format(self.radius),
             }
-        elif location_type == "postal_code":
-            search_variables |= {
-                "postal_code": location_info.get("postal_code"),
-            }
-        else:  #: general search, location
+        else:  #: general search (city, county, postal_code, etc.)
             search_variables |= {
-                "city": location_info.get("city"),
-                "county": location_info.get("county"),
-                "state_code": location_info.get("state_code"),
-                "postal_code": location_info.get("postal_code"),
+                "search_location": {"location": location_info.get("text")},
             }

         if self.foreclosure:
@@ -482,52 +592,80 @@ class RealtorScraper(Scraper):
         total = result["total"]
         homes = result["properties"]

-        with ThreadPoolExecutor() as executor:
-            # Store futures with their offsets to maintain proper sort order
-            # Start from offset + page_size and go up to offset + limit
-            futures_with_offsets = [
-                (i, executor.submit(
-                    self.general_search,
-                    variables=search_variables | {"offset": i},
-                    search_type=search_type,
-                ))
-                for i in range(
-                    self.offset + self.DEFAULT_PAGE_SIZE,
-                    min(total, self.offset + self.limit),
-                    self.DEFAULT_PAGE_SIZE,
-                )
-            ]
-
-            # Collect results and sort by offset to preserve API sort order across pages
-            results = []
-            for offset, future in futures_with_offsets:
-                results.append((offset, future.result()["properties"]))
-
-            # Sort by offset and concatenate in correct order
-            results.sort(key=lambda x: x[0])
-            for offset, properties in results:
-                homes.extend(properties)
+        # Fetch remaining pages based on parallel parameter
+        if self.offset + self.DEFAULT_PAGE_SIZE < min(total, self.offset + self.limit):
+            if self.parallel:
+                # Parallel mode: Fetch all remaining pages in parallel
+                with ThreadPoolExecutor() as executor:
+                    futures_with_offsets = [
+                        (i, executor.submit(
+                            self.general_search,
+                            variables=search_variables | {"offset": i},
+                            search_type=search_type,
+                        ))
+                        for i in range(
+                            self.offset + self.DEFAULT_PAGE_SIZE,
+                            min(total, self.offset + self.limit),
+                            self.DEFAULT_PAGE_SIZE,
+                        )
+                    ]
+
+                    # Collect results and sort by offset to preserve API sort order
+                    results = []
+                    for offset, future in futures_with_offsets:
+                        results.append((offset, future.result()["properties"]))
+
+                    results.sort(key=lambda x: x[0])
+                    for offset, properties in results:
+                        homes.extend(properties)
+            else:
+                # Sequential mode: Fetch pages one by one with early termination checks
+                for current_offset in range(
+                    self.offset + self.DEFAULT_PAGE_SIZE,
+                    min(total, self.offset + self.limit),
+                    self.DEFAULT_PAGE_SIZE,
+                ):
+                    # Check if we should continue based on time-based filters
+                    if not self._should_fetch_more_pages(homes):
+                        break
+
+                    result = self.general_search(
+                        variables=search_variables | {"offset": current_offset},
+                        search_type=search_type,
+                    )
+                    page_properties = result["properties"]
+                    homes.extend(page_properties)

         # Apply client-side hour-based filtering if needed
         # (API only supports day-level filtering, so we post-filter for hour precision)
-        if self.past_hours or self.datetime_from or self.datetime_to:
+        has_hour_precision = (self.date_from_precision == "hour" or self.date_to_precision == "hour")
+        if self.past_hours or has_hour_precision:
             homes = self._apply_hour_based_date_filter(homes)
         # Apply client-side date filtering for PENDING properties
         # (server-side filters are broken in the API)
         elif self.listing_type == ListingType.PENDING and (self.last_x_days or self.date_from):
             homes = self._apply_pending_date_filter(homes)

+        # Apply client-side filtering by last_update_date if specified
+        if self.updated_since or self.updated_in_past_hours:
+            homes = self._apply_last_update_date_filter(homes)
+
         # Apply client-side sort to ensure results are properly ordered
         # This is necessary after filtering and to guarantee sort order across page boundaries
         if self.sort_by:
             homes = self._apply_sort(homes)

+        # Apply raw data filters (exclude_pending and mls_only) for raw return type
+        # These filters are normally applied in process_property() but are bypassed for raw data
+        if self.return_type == ReturnType.raw:
+            homes = self._apply_raw_data_filters(homes)
+
         return homes
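The parallel branch above hinges on keeping `(offset, future)` pairs so pages can be reassembled in API order even though futures complete out of order. A minimal sketch, with `fetch_page` standing in for `general_search` and an assumed page size:

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 200  # assumed stand-in for DEFAULT_PAGE_SIZE

def fetch_page(offset: int) -> list[int]:
    # stand-in for general_search(); returns fake "properties" for one page
    return list(range(offset, offset + 3))

def fetch_all(total: int, limit: int, offset: int = 0) -> list[int]:
    homes = fetch_page(offset)  # first page is fetched up front
    offsets = range(offset + PAGE_SIZE, min(total, offset + limit), PAGE_SIZE)
    with ThreadPoolExecutor() as pool:
        # Futures may finish out of order; pairing each with its offset lets
        # us re-sort so pages are concatenated in API order.
        pairs = [(i, pool.submit(fetch_page, i)) for i in offsets]
        results = sorted((i, f.result()) for i, f in pairs)
    for _, page in results:
        homes.extend(page)
    return homes

print(fetch_all(total=500, limit=600))
```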
     def _apply_hour_based_date_filter(self, homes):
         """Apply client-side hour-based date filtering for all listing types.

-        This is used when past_hours, datetime_from, or datetime_to are specified,
+        This is used when past_hours or date_from/date_to have hour precision,
         since the API only supports day-level filtering.
         """
         if not homes:
@@ -541,17 +679,17 @@ class RealtorScraper(Scraper):
         if self.past_hours:
             cutoff_datetime = datetime.now() - timedelta(hours=self.past_hours)
             date_range = {'type': 'since', 'date': cutoff_datetime}
-        elif self.datetime_from or self.datetime_to:
+        elif self.date_from or self.date_to:
             try:
                 from_datetime = None
                 to_datetime = None

-                if self.datetime_from:
-                    from_datetime_str = self.datetime_from.replace('Z', '+00:00') if self.datetime_from.endswith('Z') else self.datetime_from
+                if self.date_from:
+                    from_datetime_str = self.date_from.replace('Z', '+00:00') if self.date_from.endswith('Z') else self.date_from
                     from_datetime = datetime.fromisoformat(from_datetime_str).replace(tzinfo=None)

-                if self.datetime_to:
-                    to_datetime_str = self.datetime_to.replace('Z', '+00:00') if self.datetime_to.endswith('Z') else self.datetime_to
+                if self.date_to:
+                    to_datetime_str = self.date_to.replace('Z', '+00:00') if self.date_to.endswith('Z') else self.date_to
                     to_datetime = datetime.fromisoformat(to_datetime_str).replace(tzinfo=None)

                 if from_datetime and to_datetime:
@@ -684,17 +822,66 @@ class RealtorScraper(Scraper):
                 return getattr(home.flags, 'is_contingent', False)
             return False
def _apply_last_update_date_filter(self, homes):
"""Apply client-side filtering by last_update_date.
This is used when updated_since or updated_in_past_hours are specified.
Filters properties based on when they were last updated.
"""
if not homes:
return homes
from datetime import datetime, timedelta, timezone
# Determine date range for last_update_date filtering
date_range = None
if self.updated_in_past_hours:
# Use UTC now, strip timezone to match naive property dates
cutoff_datetime = (datetime.now(timezone.utc) - timedelta(hours=self.updated_in_past_hours)).replace(tzinfo=None)
date_range = {'type': 'since', 'date': cutoff_datetime}
elif self.updated_since:
try:
since_datetime_str = self.updated_since.replace('Z', '+00:00') if self.updated_since.endswith('Z') else self.updated_since
since_datetime = datetime.fromisoformat(since_datetime_str).replace(tzinfo=None)
date_range = {'type': 'since', 'date': since_datetime}
except (ValueError, AttributeError):
return homes # If parsing fails, return unfiltered
if not date_range:
return homes
filtered_homes = []
for home in homes:
# Extract last_update_date from the property
property_date = self._extract_date_from_home(home, 'last_update_date')
# Skip properties without last_update_date
if property_date is None:
continue
# Check if property date falls within the specified range
if self._is_datetime_in_range(property_date, date_range):
filtered_homes.append(home)
return filtered_homes
     def _get_date_range(self):
         """Get the date range for filtering based on instance parameters."""
-        from datetime import datetime, timedelta
+        from datetime import datetime, timedelta, timezone

         if self.last_x_days:
-            cutoff_date = datetime.now() - timedelta(days=self.last_x_days)
+            # Use UTC now, strip timezone to match naive property dates
+            cutoff_date = (datetime.now(timezone.utc) - timedelta(days=self.last_x_days)).replace(tzinfo=None)
             return {'type': 'since', 'date': cutoff_date}
         elif self.date_from and self.date_to:
             try:
-                from_date = datetime.fromisoformat(self.date_from)
-                to_date = datetime.fromisoformat(self.date_to)
+                # Parse and strip timezone to match naive property dates
+                from_date_str = self.date_from.replace('Z', '+00:00') if self.date_from.endswith('Z') else self.date_from
+                to_date_str = self.date_to.replace('Z', '+00:00') if self.date_to.endswith('Z') else self.date_to
+                from_date = datetime.fromisoformat(from_date_str).replace(tzinfo=None)
+                to_date = datetime.fromisoformat(to_date_str).replace(tzinfo=None)
                 return {'type': 'range', 'from_date': from_date, 'to_date': to_date}
             except ValueError:
                 return None
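The filters above all normalize ISO 8601 strings the same way: tolerate a trailing `Z` (which `datetime.fromisoformat` rejects before Python 3.11) and strip the timezone so comparisons against naive property dates don't raise. A distilled helper, shown here as an illustration only:

```python
from datetime import datetime

def parse_naive(value: str) -> datetime:
    """Parse an ISO 8601 string, tolerating a trailing 'Z', and strip the
    timezone so the result compares cleanly against naive property dates."""
    if value.endswith('Z'):
        value = value.replace('Z', '+00:00')
    return datetime.fromisoformat(value).replace(tzinfo=None)

print(parse_naive("2025-01-20T14:30:00Z"))  # 2025-01-20 14:30:00
print(parse_naive("2025-01-20"))            # 2025-01-20 00:00:00
```

Comparing an aware datetime with a naive one raises `TypeError`, which is why every cutoff in the diff is normalized to naive before use.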
@@ -746,6 +933,74 @@ class RealtorScraper(Scraper):
                 return date_range['from_date'] <= date_obj <= date_range['to_date']
         return False
def _should_fetch_more_pages(self, first_page):
"""Determine if we should continue pagination based on first page results.
This optimization prevents unnecessary API calls when using time-based filters
with date sorting. If the last property on page 1 is already outside the time
window, all future pages will also be outside (due to sort order).
Args:
first_page: List of properties from the first page
Returns:
bool: True if we should continue pagination, False to stop early
"""
from datetime import datetime, timedelta, timezone
# Check for last_update_date filters
if (self.updated_since or self.updated_in_past_hours) and self.sort_by == "last_update_date":
if not first_page:
return False
last_property = first_page[-1]
last_date = self._extract_date_from_home(last_property, 'last_update_date')
if not last_date:
return True
# Build date range for last_update_date filter
if self.updated_since:
try:
cutoff_datetime = datetime.fromisoformat(self.updated_since.replace('Z', '+00:00') if self.updated_since.endswith('Z') else self.updated_since)
# Strip timezone to match naive datetimes from _parse_date_value
cutoff_datetime = cutoff_datetime.replace(tzinfo=None)
date_range = {'type': 'since', 'date': cutoff_datetime}
except ValueError:
return True
elif self.updated_in_past_hours:
# Use UTC now, strip timezone to match naive property dates
cutoff_datetime = (datetime.now(timezone.utc) - timedelta(hours=self.updated_in_past_hours)).replace(tzinfo=None)
date_range = {'type': 'since', 'date': cutoff_datetime}
else:
return True
return self._is_datetime_in_range(last_date, date_range)
# Check for PENDING date filters
if (self.listing_type == ListingType.PENDING and
(self.last_x_days or self.past_hours or self.date_from) and
self.sort_by == "pending_date"):
if not first_page:
return False
last_property = first_page[-1]
last_date = self._extract_date_from_home(last_property, 'pending_date')
if not last_date:
return True
# Build date range for pending date filter
date_range = self._get_date_range()
if not date_range:
return True
return self._is_datetime_in_range(last_date, date_range)
# No optimization applicable, continue pagination
return True
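The early-termination check above relies on one invariant: when results are sorted by date descending, the last item on a page is the oldest, so once it falls outside the window every later page does too. A reduced sketch using bare datetimes in place of property records:

```python
from datetime import datetime

def should_fetch_more(page: list[datetime], cutoff: datetime) -> bool:
    """With results sorted by date descending, once the last item on a page
    is older than the cutoff, every later page is older too, so stop."""
    if not page:
        return False
    return page[-1] >= cutoff

# Page sorted newest-first; the oldest entry decides whether to continue.
page = [datetime(2025, 1, 3), datetime(2025, 1, 2), datetime(2025, 1, 1)]
print(should_fetch_more(page, cutoff=datetime(2025, 1, 1)))  # True
print(should_fetch_more(page, cutoff=datetime(2025, 1, 2)))  # False
```

This is why the optimization only fires when `sort_by` matches the filtered field; with any other sort order, skipping pages would silently drop matches.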
     def _apply_sort(self, homes):
         """Apply client-side sorting to ensure results are properly ordered.
@@ -764,6 +1019,8 @@ class RealtorScraper(Scraper):
         def get_sort_key(home):
             """Extract the sort field value from a home (handles both dict and Property object)."""
+            from datetime import datetime
+
             if isinstance(home, dict):
                 value = home.get(self.sort_by)
             else:
@@ -776,23 +1033,26 @@ class RealtorScraper(Scraper):
                 return (1, 0) if self.sort_direction == "desc" else (1, float('inf'))

             # For datetime fields, convert string to datetime for proper sorting
-            if self.sort_by in ['list_date', 'sold_date', 'pending_date']:
+            if self.sort_by in ['list_date', 'sold_date', 'pending_date', 'last_update_date']:
                 if isinstance(value, str):
                     try:
-                        from datetime import datetime
                         # Handle timezone indicators
                         date_value = value
                         if date_value.endswith('Z'):
                             date_value = date_value[:-1] + '+00:00'
                         parsed_date = datetime.fromisoformat(date_value)
-                        return (0, parsed_date)
+                        # Normalize to timezone-naive for consistent comparison
+                        return 0, parsed_date.replace(tzinfo=None)
                     except (ValueError, AttributeError):
                         # If parsing fails, treat as None
                         return (1, 0) if self.sort_direction == "desc" else (1, float('inf'))
-                return (0, value)
+                # Handle datetime objects directly (normalize timezone)
+                if isinstance(value, datetime):
+                    return 0, value.replace(tzinfo=None)
+                return 0, value

             # For numeric fields, ensure we can compare
-            return (0, value)
+            return 0, value

         # Sort the homes
         reverse = (self.sort_direction == "desc")
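The two-part `(missing_flag, value)` key above is what lets `sorted()` handle `None` values without raising. A stripped-down version for ascending sorts (the real code also picks a direction-dependent sentinel for descending):

```python
def make_sort_key(field: str, direction: str):
    """Two-part key: (missing_flag, value). For ascending sorts, missing
    values get flag 1 plus an infinite sentinel so they land at the end;
    present values keep flag 0 and compare normally."""
    def key(home: dict):
        value = home.get(field)
        if value is None:
            return (1, 0) if direction == "desc" else (1, float('inf'))
        return (0, value)
    return key

homes = [{"list_price": 300}, {"list_price": None}, {"list_price": 100}]
ordered = sorted(homes, key=make_sort_key("list_price", "asc"))
print([h["list_price"] for h in ordered])  # [100, 300, None]
```

Without the flag, `sorted()` would raise `TypeError` the first time it compared `None` against a number.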
@@ -800,11 +1060,52 @@ class RealtorScraper(Scraper):
return sorted_homes return sorted_homes
def _apply_raw_data_filters(self, homes):
"""Apply exclude_pending and mls_only filters for raw data returns.
These filters are normally applied in process_property(), but that function
is bypassed when return_type="raw", so we need to apply them here instead.
Args:
homes: List of properties (either dicts or Property objects)
Returns:
Filtered list of properties
"""
if not homes:
return homes
# Only filter raw data (dict objects)
# Property objects have already been filtered in process_property()
if homes and not isinstance(homes[0], dict):
return homes
filtered_homes = []
for home in homes:
# Apply exclude_pending filter
if self.exclude_pending and self.listing_type != ListingType.PENDING:
flags = home.get('flags', {})
is_pending = flags.get('is_pending', False)
is_contingent = flags.get('is_contingent', False)
if is_pending or is_contingent:
continue # Skip this property
# Apply mls_only filter
if self.mls_only:
source = home.get('source', {})
if not source or not source.get('id'):
continue # Skip this property
filtered_homes.append(home)
return filtered_homes
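The raw-data filters above can be illustrated in isolation; this mirrors the dict-level checks, not the package's `process_property()` itself:

```python
def apply_raw_filters(homes: list[dict], exclude_pending: bool, mls_only: bool) -> list[dict]:
    """Drop pending/contingent listings and listings without an MLS source id,
    the same two checks applied to raw dict results."""
    kept = []
    for home in homes:
        flags = home.get("flags", {})
        if exclude_pending and (flags.get("is_pending") or flags.get("is_contingent")):
            continue
        if mls_only and not (home.get("source") or {}).get("id"):
            continue
        kept.append(home)
    return kept

homes = [
    {"flags": {"is_pending": True}, "source": {"id": "MLS1"}},
    {"flags": {}, "source": {}},
    {"flags": {}, "source": {"id": "MLS2"}},
]
print(apply_raw_filters(homes, exclude_pending=True, mls_only=True))
```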
     @retry(
-        retry=retry_if_exception_type(JSONDecodeError),
-        wait=wait_exponential(min=4, max=10),
+        retry=retry_if_exception_type((JSONDecodeError, Exception)) & retry_if_not_exception_type(AuthenticationError),
+        wait=wait_exponential(multiplier=1, min=1, max=10),
         stop=stop_after_attempt(3),
     )
     def get_bulk_prop_details(self, property_ids: list[str]) -> dict:
@@ -817,24 +1118,25 @@ class RealtorScraper(Scraper):
         property_ids = list(set(property_ids))

-        # Construct the bulk query
         fragments = "\n".join(
-            f'home_{property_id}: home(property_id: {property_id}) {{ ...HomeData }}'
+            f'home_{property_id}: home(property_id: {property_id}) {HOMES_DATA}'
             for property_id in property_ids
         )
-        query = f"""{HOME_FRAGMENT}
-
-        query GetHomes {{
-            {fragments}
-        }}"""
+        query = f"""query GetHome {{
+            {fragments}
+        }}"""

-        response = self.session.post(self.SEARCH_GQL_URL, json={"query": query})
-        data = response.json()
+        data = self._graphql_post(query, {}, "GetHome")

-        if "data" not in data:
+        if "data" not in data or data["data"] is None:
+            # If we got a 400 error with "Required parameter is missing", raise to trigger retry
+            if data and "errors" in data:
+                error_msgs = [e.get("message", "") for e in data.get("errors", [])]
+                if any("Required parameter is missing" in msg for msg in error_msgs):
+                    raise Exception(f"Transient API error: {error_msgs}")
             return {}

         properties = data["data"]
-        return {data.replace('home_', ''): properties[data] for data in properties if properties[data]}
+        return {key.replace('home_', ''): properties[key] for key in properties if properties[key]}
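The bulk query builds on GraphQL field aliases: each `home(...)` selection is aliased as `home_<id>`, so one request returns every property keyed by id. A hypothetical miniature of the builder (the `selection` argument stands in for `HOMES_DATA`):

```python
def build_bulk_query(property_ids: list[str], selection: str = "{ property_id status }") -> str:
    """Alias each home(...) field as home_<id> so a single GraphQL request
    fetches many properties, and each result maps back to its id."""
    fragments = "\n".join(
        f"home_{pid}: home(property_id: {pid}) {selection}"
        for pid in property_ids
    )
    return f"query GetHome {{\n{fragments}\n}}"

print(build_bulk_query(["1001", "1002"]))
```

Stripping the `home_` prefix from each response key, as the return statement above does, recovers the original property ids.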


@@ -126,6 +126,7 @@ def process_property(result: dict, mls_only: bool = False, extra_property_data:
             last_sold_date=(datetime.fromisoformat(result["last_sold_date"].replace('Z', '+00:00') if result["last_sold_date"].endswith('Z') else result["last_sold_date"]) if result.get("last_sold_date") else None),
             pending_date=(datetime.fromisoformat(result["pending_date"].replace('Z', '+00:00') if result["pending_date"].endswith('Z') else result["pending_date"]) if result.get("pending_date") else None),
             last_status_change_date=(datetime.fromisoformat(result["last_status_change_date"].replace('Z', '+00:00') if result["last_status_change_date"].endswith('Z') else result["last_status_change_date"]) if result.get("last_status_change_date") else None),
+            last_update_date=(datetime.fromisoformat(result["last_update_date"].replace('Z', '+00:00') if result["last_update_date"].endswith('Z') else result["last_update_date"]) if result.get("last_update_date") else None),
             new_construction=result["flags"].get("is_new_construction") is True,
             hoa_fee=(result["hoa"]["fee"] if result.get("hoa") and isinstance(result["hoa"], dict) else None),
             latitude=(result["location"]["address"]["coordinate"].get("lat") if able_to_get_lat_long else None),


@@ -1,3 +1,193 @@
SEARCH_RESULTS_FRAGMENT = """
fragment PropertyResult on SearchHome {
__typename
pending_date
listing_id
property_id
href
permalink
list_date
status
mls_status
last_sold_price
last_sold_date
last_status_change_date
last_update_date
list_price
list_price_max
list_price_min
price_per_sqft
tags
open_houses {
start_date
end_date
description
time_zone
dst
href
methods
}
details {
category
text
parent_category
}
pet_policy {
cats
dogs
dogs_small
dogs_large
__typename
}
units {
availability {
date
__typename
}
description {
baths_consolidated
baths
beds
sqft
__typename
}
photos(https: true) {
title
href
tags {
label
}
}
list_price
__typename
}
flags {
is_contingent
is_pending
is_new_construction
}
description {
type
sqft
beds
baths_full
baths_half
lot_sqft
year_built
garage
type
name
stories
text
}
source {
id
listing_id
}
hoa {
fee
}
location {
address {
street_direction
street_number
street_name
street_suffix
line
unit
city
state_code
postal_code
coordinate {
lon
lat
}
}
county {
name
fips_code
}
neighborhoods {
name
}
}
tax_record {
cl_id
public_record_id
last_update_date
apn
tax_parcel_id
}
primary_photo(https: true) {
href
}
advertisers {
email
broker {
name
fulfillment_id
}
type
name
fulfillment_id
builder {
name
fulfillment_id
}
phones {
ext
primary
type
number
}
office {
name
email
fulfillment_id
href
phones {
number
type
primary
ext
}
mls_set
}
corporation {
specialties
name
bio
href
fulfillment_id
}
mls_set
nrds_id
state_license
rental_corporation {
fulfillment_id
}
rental_management {
name
href
fulfillment_id
}
}
current_estimates {
__typename
source {
__typename
type
name
}
estimate
estimateHigh: estimate_high
estimateLow: estimate_low
date
isBestHomeValue: isbest_homevalue
}
}
"""
 _SEARCH_HOMES_DATA_BASE = """{
     pending_date
     listing_id
@@ -10,6 +200,7 @@ _SEARCH_HOMES_DATA_BASE = """{
     last_sold_price
     last_sold_date
     last_status_change_date
+    last_update_date
     list_price
     list_price_max
     list_price_min
@@ -180,8 +371,189 @@ _SEARCH_HOMES_DATA_BASE = """{
 HOME_FRAGMENT = """
-fragment HomeData on Home {
+fragment PropertyResult on Home {
__typename
pending_date
listing_id
property_id property_id
href
permalink
list_date
status
mls_status
last_sold_price
last_sold_date
last_status_change_date
last_update_date
list_price
list_price_max
list_price_min
price_per_sqft
tags
open_houses {
start_date
end_date
description
time_zone
dst
href
methods
}
details {
category
text
parent_category
}
pet_policy {
cats
dogs
dogs_small
dogs_large
__typename
}
units {
availability {
date
__typename
}
description {
baths_consolidated
baths
beds
sqft
__typename
}
photos(https: true) {
title
href
tags {
label
}
}
list_price
__typename
}
flags {
is_contingent
is_pending
is_new_construction
}
description {
type
sqft
beds
baths_full
baths_half
lot_sqft
year_built
garage
type
name
stories
text
}
source {
id
listing_id
}
hoa {
fee
}
location {
address {
street_direction
street_number
street_name
street_suffix
line
unit
city
state_code
postal_code
coordinate {
lon
lat
}
}
county {
name
fips_code
}
neighborhoods {
name
}
parcel {
parcel_id
}
}
tax_record {
cl_id
public_record_id
last_update_date
apn
tax_parcel_id
}
primary_photo(https: true) {
href
}
photos(https: true) {
title
href
tags {
label
}
}
advertisers {
email
broker {
name
fulfillment_id
}
type
name
fulfillment_id
builder {
name
fulfillment_id
}
phones {
ext
primary
type
number
}
office {
name
email
fulfillment_id
href
phones {
number
type
primary
ext
}
mls_set
}
corporation {
specialties
name
bio
href
fulfillment_id
}
mls_set
nrds_id
state_license
rental_corporation {
fulfillment_id
}
rental_management {
name
href
fulfillment_id
}
}
     nearbySchools: nearby_schools(radius: 5.0, limit_per_level: 3) {
         __typename schools { district { __typename id name } }
     }
@@ -197,11 +569,6 @@ fragment HomeData on Home {
             last_n_days
         }
     }
-    location {
-        parcel {
-            parcel_id
-        }
-    }
     taxHistory: tax_history { __typename tax year assessment { __typename building land total } }
     property_history {
         date
@@ -226,6 +593,18 @@ fragment HomeData on Home {
         text
         category
     }
estimates {
__typename
currentValues: current_values {
__typename
source { __typename type name }
estimate
estimateHigh: estimate_high
estimateLow: estimate_low
date
isBestHomeValue: isbest_homevalue
}
}
     }
 """
@@ -299,8 +678,128 @@ current_estimates {
     }
 }""" % _SEARCH_HOMES_DATA_BASE

-GENERAL_RESULTS_QUERY = """{
+# Query body using inline fields (kept for backward compatibility)
+GENERAL_RESULTS_QUERY_BODY = """{
     count
     total
     results %s
 }""" % SEARCH_HOMES_DATA
GENERAL_RESULTS_QUERY = """{
__typename
count
total
results %s
}""" % SEARCH_HOMES_DATA
LISTING_PHOTOS_FRAGMENT = """
fragment ListingPhotosFragment on SearchHome {
__typename
photos(https: true) {
__typename
title
href
tags {
__typename
label
probability
}
}
}
"""
SEARCH_SUGGESTIONS_QUERY = """query Search_suggestions($searchInput: SearchSuggestionsInput!) {
search_suggestions(search_input: $searchInput) {
raw_input_parser_result
typeahead_results {
display_string
display_geo
geo {
_id
_score
mpr_id
area_type
city
state_code
state
postal_code
country
lat
lon
county
counties {
name
fips
state_code
}
slug_id
geo_id
score
name
city_slug_id
centroid {
lat
lon
}
county_needed_for_uniq
street
line
school
school_id
school_district
has_catchment
university
university_id
neighborhood
park
}
url
}
geo_results {
type
text
geo {
_id
_score
mpr_id
area_type
city
state_code
state
postal_code
country
lat
lon
county
counties {
name
fips
state_code
}
slug_id
geo_id
score
name
city_slug_id
centroid {
lat
lon
}
county_needed_for_uniq
street
line
school
school_id
school_district
has_catchment
university
university_id
neighborhood
park
}
}
no_matches
has_results
original_string
}
}"""


@@ -38,6 +38,7 @@ ordered_properties = [
     "last_sold_date",
     "last_sold_price",
     "last_status_change_date",
+    "last_update_date",
     "assessed_value",
     "estimated_value",
     "tax",
@@ -156,24 +157,45 @@ def process_result(result: Property) -> pd.DataFrame:
     return properties_df[ordered_properties]
-def validate_input(listing_type: str) -> None:
-    if listing_type.upper() not in ListingType.__members__:
-        raise InvalidListingType(f"Provided listing type, '{listing_type}', does not exist.")
+def validate_input(listing_type: str | list[str] | None) -> None:
+    if listing_type is None:
+        return  # None is valid - returns all types
+
+    if isinstance(listing_type, list):
+        for lt in listing_type:
+            if lt.upper() not in ListingType.__members__:
+                raise InvalidListingType(f"Provided listing type, '{lt}', does not exist.")
+    else:
+        if listing_type.upper() not in ListingType.__members__:
+            raise InvalidListingType(f"Provided listing type, '{listing_type}', does not exist.")

 def validate_dates(date_from: str | None, date_to: str | None) -> None:
-    if isinstance(date_from, str) != isinstance(date_to, str):
-        raise InvalidDate("Both date_from and date_to must be provided.")
-
-    if date_from and date_to:
-        try:
-            date_from_obj = datetime.strptime(date_from, "%Y-%m-%d")
-            date_to_obj = datetime.strptime(date_to, "%Y-%m-%d")
-
-            if date_to_obj < date_from_obj:
-                raise InvalidDate("date_to must be after date_from.")
-        except ValueError:
-            raise InvalidDate(f"Invalid date format or range")
+    # Allow either date_from or date_to individually, or both together
+    try:
+        # Validate and parse date_from if provided
+        date_from_obj = None
+        if date_from:
+            date_from_str = date_from.replace('Z', '+00:00') if date_from.endswith('Z') else date_from
+            date_from_obj = datetime.fromisoformat(date_from_str)
+
+        # Validate and parse date_to if provided
+        date_to_obj = None
+        if date_to:
+            date_to_str = date_to.replace('Z', '+00:00') if date_to.endswith('Z') else date_to
+            date_to_obj = datetime.fromisoformat(date_to_str)
+
+        # If both provided, ensure date_to is after date_from
+        if date_from_obj and date_to_obj and date_to_obj < date_from_obj:
+            raise InvalidDate(f"date_to ('{date_to}') must be after date_from ('{date_from}').")
+    except ValueError as e:
+        # Provide specific guidance on the expected format
+        raise InvalidDate(
+            f"Invalid date format. Expected ISO 8601 format. "
+            f"Examples: '2025-01-20' (date only) or '2025-01-20T14:30:00' (with time). "
+            f"Got: date_from='{date_from}', date_to='{date_to}'. Error: {e}"
+        )
def validate_limit(limit: int) -> None: def validate_limit(limit: int) -> None:
@@ -213,21 +235,53 @@ def validate_offset(offset: int, limit: int = 10000) -> None:
    )
def validate_datetime(datetime_value) -> None:
    """Validate datetime value (accepts datetime objects or ISO 8601 strings)."""
    if datetime_value is None:
        return

    # Already a datetime object - valid
    from datetime import datetime as dt, date
    if isinstance(datetime_value, (dt, date)):
        return

    # Must be a string - validate ISO 8601 format
    if not isinstance(datetime_value, str):
        raise InvalidDate(
            f"Invalid datetime value. Expected datetime object, date object, or ISO 8601 string. "
            f"Got: {type(datetime_value).__name__}"
        )

    try:
        # Try parsing as ISO 8601 datetime
        datetime.fromisoformat(datetime_value.replace('Z', '+00:00'))
    except (ValueError, AttributeError):
        raise InvalidDate(
            f"Invalid datetime format: '{datetime_value}'. "
            f"Expected ISO 8601 format (e.g., '2025-01-20T14:30:00' or '2025-01-20')."
        )
def validate_last_update_filters(updated_since: str | None, updated_in_past_hours: int | None) -> None:
    """Validate last_update_date filtering parameters."""
    if updated_since and updated_in_past_hours:
        raise ValueError(
            "Cannot use both 'updated_since' and 'updated_in_past_hours' parameters together. "
            "Please use only one method to filter by last_update_date."
        )

    # Validate updated_since format if provided
    if updated_since:
        validate_datetime(updated_since)

    # Validate updated_in_past_hours range if provided
    if updated_in_past_hours is not None:
        if updated_in_past_hours < 1:
            raise ValueError(
                f"updated_in_past_hours must be at least 1. Got: {updated_in_past_hours}"
            )
def validate_filters(
    beds_min: int | None = None,
    beds_max: int | None = None,
@@ -259,7 +313,7 @@ def validate_filters(
def validate_sort(sort_by: str | None, sort_direction: str | None = "desc") -> None:
    """Validate sort parameters."""
    valid_sort_fields = ["list_date", "sold_date", "list_price", "sqft", "beds", "baths", "last_update_date"]
    valid_directions = ["asc", "desc"]

    if sort_by and sort_by not in valid_sort_fields:
@@ -273,3 +327,159 @@ def validate_sort(sort_by: str | None, sort_direction: str | None = "desc") -> N
            f"Invalid sort_direction value: '{sort_direction}'. "
            f"Valid options: {', '.join(valid_directions)}"
        )
def convert_to_datetime_string(value) -> str | None:
    """
    Convert datetime object or string to ISO 8601 string format with UTC timezone.

    Accepts:
    - datetime.datetime objects (naive or timezone-aware)
      - Naive datetimes are treated as local time and converted to UTC
      - Timezone-aware datetimes are converted to UTC
    - datetime.date objects (treated as midnight UTC)
    - ISO 8601 strings (returned as-is)
    - None (returns None)

    Returns ISO 8601 formatted string with UTC timezone or None.

    Examples:
        >>> # Naive datetime (treated as local time)
        >>> convert_to_datetime_string(datetime(2025, 1, 20, 14, 30))
        '2025-01-20T22:30:00+00:00'  # Assuming PST (UTC-8)

        >>> # Timezone-aware datetime
        >>> convert_to_datetime_string(datetime(2025, 1, 20, 14, 30, tzinfo=timezone.utc))
        '2025-01-20T14:30:00+00:00'
    """
    if value is None:
        return None

    # Already a string - return as-is
    if isinstance(value, str):
        return value

    from datetime import datetime, date, timezone

    # datetime.datetime object
    if isinstance(value, datetime):
        # Handle naive datetime - treat as local time and convert to UTC
        if value.tzinfo is None:
            # Convert naive datetime to aware local time, then to UTC
            local_aware = value.astimezone()
            utc_aware = local_aware.astimezone(timezone.utc)
            return utc_aware.isoformat()
        else:
            # Already timezone-aware, convert to UTC
            utc_aware = value.astimezone(timezone.utc)
            return utc_aware.isoformat()

    # datetime.date object (convert to datetime at midnight UTC)
    if isinstance(value, date):
        utc_datetime = datetime.combine(value, datetime.min.time()).replace(tzinfo=timezone.utc)
        return utc_datetime.isoformat()

    raise ValueError(
        f"Invalid datetime value. Expected datetime object, date object, or ISO 8601 string. "
        f"Got: {type(value).__name__}"
    )
def extract_timedelta_hours(value) -> int | None:
    """
    Extract hours from int or timedelta object.

    Accepts:
    - int (returned as-is)
    - timedelta objects (converted to total hours)
    - None (returns None)

    Returns integer hours or None.
    """
    if value is None:
        return None

    # Already an int - return as-is
    if isinstance(value, int):
        return value

    # timedelta object - convert to hours
    from datetime import timedelta
    if isinstance(value, timedelta):
        return int(value.total_seconds() / 3600)

    raise ValueError(
        f"Invalid past_hours value. Expected int or timedelta object. "
        f"Got: {type(value).__name__}"
    )
def extract_timedelta_days(value) -> int | None:
    """
    Extract days from int or timedelta object.

    Accepts:
    - int (returned as-is)
    - timedelta objects (converted to total days)
    - None (returns None)

    Returns integer days or None.
    """
    if value is None:
        return None

    # Already an int - return as-is
    if isinstance(value, int):
        return value

    # timedelta object - convert to days
    from datetime import timedelta
    if isinstance(value, timedelta):
        return int(value.total_seconds() / 86400)  # 86400 seconds in a day

    raise ValueError(
        f"Invalid past_days value. Expected int or timedelta object. "
        f"Got: {type(value).__name__}"
    )
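Both extractors above rely on `timedelta.total_seconds()` rather than the `.seconds` attribute; a quick sketch of why:

```python
from datetime import timedelta

# total_seconds() folds days, seconds, and microseconds into one number;
# the .seconds attribute alone wraps at 24 hours and would undercount.
window = timedelta(days=2, hours=12)
print(int(window.total_seconds() / 3600))   # 60 hours
print(int(window.total_seconds() / 86400))  # 2 days (int() truncates the .5)
print(window.seconds)                       # 43200 - only the sub-day part
```

Note that `int()` truncates toward zero, so a 2.5-day window counts as 2 days.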
def detect_precision_and_convert(value):
    """
    Detect if input has time precision and convert to ISO string.

    Accepts:
    - datetime.datetime objects → (ISO string, "hour")
    - datetime.date objects → (ISO string at midnight, "day")
    - ISO 8601 datetime strings with time → (string as-is, "hour")
    - Date-only strings "YYYY-MM-DD" → (string as-is, "day")
    - None → (None, None)

    Returns:
        tuple: (iso_string, precision) where precision is "day" or "hour"
    """
    if value is None:
        return (None, None)

    from datetime import datetime as dt, date

    # datetime.datetime object - has time precision
    if isinstance(value, dt):
        return (value.isoformat(), "hour")

    # datetime.date object - day precision only
    if isinstance(value, date):
        # Convert to datetime at midnight
        return (dt.combine(value, dt.min.time()).isoformat(), "day")

    # String - detect if it has time component
    if isinstance(value, str):
        # ISO 8601 datetime with time component (has 'T' and time)
        if 'T' in value:
            return (value, "hour")
        # Date-only string
        else:
            return (value, "day")

    raise ValueError(
        f"Invalid date value. Expected datetime object, date object, or ISO 8601 string. "
        f"Got: {type(value).__name__}"
    )
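The ordering of the `isinstance` checks above is load-bearing, because `datetime.datetime` is a subclass of `datetime.date`. A condensed sketch of the same precision rule (the helper name `precision_of` is illustrative):

```python
from datetime import datetime, date

def precision_of(value) -> str:
    # Check datetime before date: every datetime instance is also a
    # date instance, so reversing these checks would misclassify it.
    if isinstance(value, datetime):
        return "hour"
    if isinstance(value, date):
        return "day"
    # Strings: a 'T' separator signals a time component.
    return "hour" if "T" in value else "day"

print(precision_of("2025-01-20"))           # day
print(precision_of("2025-01-20T14:30:00"))  # hour
print(precision_of(date(2025, 1, 20)))      # day
```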

View File

@@ -1,14 +1,11 @@
[tool.poetry]
name = "homeharvest"
version = "0.8.18"
description = "Real estate scraping library"
authors = ["Zachary Hampton <zachary@bunsly.com>", "Cullen Watson <cullen@bunsly.com>"]
homepage = "https://github.com/ZacharyHampton/HomeHarvest"
readme = "README.md"
[tool.poetry.scripts]
homeharvest = "homeharvest.cli:main"
[tool.poetry.dependencies]
python = ">=3.9"
requests = "^2.32.4"

View File

@@ -1,3 +1,6 @@
import pytz
from concurrent.futures import ThreadPoolExecutor, as_completed
from homeharvest import scrape_property, Property
import pandas as pd
@@ -85,6 +88,25 @@ def test_realtor_date_range_sold():
    )
def test_listing_type_none_includes_sold():
    """Test that listing_type=None includes sold listings (issue #142)"""
    # Get properties with listing_type=None (should include all common types)
    result_none = scrape_property(
        location="Warren, MI",
        listing_type=None
    )

    # Verify we got results
    assert result_none is not None and len(result_none) > 0

    # Verify sold listings are included
    status_types = set(result_none['status'].unique())
    assert 'SOLD' in status_types, "SOLD listings should be included when listing_type=None"

    # Verify we get multiple listing types (not just one)
    assert len(status_types) > 1, "Should return multiple listing types when listing_type=None"
def test_realtor_single_property():
    results = [
        scrape_property(
@@ -169,7 +191,13 @@ def test_realtor_without_extra_details():
        ),
    ]
    # When extra_property_data=False, these fields should be None
    extra_fields = ["nearby_schools", "assessed_value", "tax", "tax_history"]

    # Check that all extra fields are None when extra_property_data=False
    for field in extra_fields:
        if field in results[0].columns:
            assert results[0][field].isna().all(), f"Field '{field}' should be None when extra_property_data=False"
def test_pr_zip_code():
@@ -280,13 +308,37 @@ def test_phone_number_matching():
    assert row["agent_phones"].values[0] == matching_row["agent_phones"].values[0]
def test_parallel_search_consistency():
    """Test that the same search executed 3 times in parallel returns consistent results"""
    def search_task():
        return scrape_property(
            location="Phoenix, AZ",
            listing_type="for_sale",
            limit=100
        )

    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(search_task) for _ in range(3)]
        results = [future.result() for future in as_completed(futures)]

    # Verify all results are valid
    assert all([result is not None for result in results])
    assert all([isinstance(result, pd.DataFrame) for result in results])
    assert all([len(result) > 0 for result in results])

    # Verify all results have the same length (primary consistency check)
    lengths = [len(result) for result in results]
    assert len(set(lengths)) == 1, \
        f"All parallel searches should return same number of results, got lengths: {lengths}"
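The fan-out pattern used by this test, reduced to a self-contained sketch (the trivial `task` body stands in for the real `scrape_property` call):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(n: int) -> int:
    return n * n  # stand-in for the real search

# Submit the same work three times, then collect results as they finish.
# as_completed yields futures in completion order, not submission order,
# which is fine here because every task is identical.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, 3) for _ in range(3)]
    results = [f.result() for f in as_completed(futures)]

print(results)  # [9, 9, 9]
```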
def test_return_type():
    results = {
        "pandas": [scrape_property(location="Surprise, AZ", listing_type="for_rent", limit=100)],
        "pydantic": [scrape_property(location="Surprise, AZ", listing_type="for_rent", limit=100, return_type="pydantic")],
        "raw": [
            scrape_property(location="Surprise, AZ", listing_type="for_rent", limit=100, return_type="raw"),
            scrape_property(location="85281", listing_type="for_rent", limit=100, return_type="raw"),
        ],
    }
@@ -607,7 +659,7 @@ def test_past_hours_all_listing_types():
def test_datetime_filtering():
    """Test date_from and date_to parameters with hour precision"""
    from datetime import datetime, timedelta

    # Get a recent date range (e.g., yesterday)
@@ -618,28 +670,28 @@ def test_datetime_filtering():
    result = scrape_property(
        location="Dallas, TX",
        listing_type="for_sale",
        date_from=f"{date_str}T09:00:00",
        date_to=f"{date_str}T17:00:00",
        limit=30
    )

    assert result is not None

    # Test with only date_from
    result_from_only = scrape_property(
        location="Houston, TX",
        listing_type="for_sale",
        date_from=f"{date_str}T00:00:00",
        limit=30
    )

    assert result_from_only is not None

    # Test with only date_to
    result_to_only = scrape_property(
        location="Austin, TX",
        listing_type="for_sale",
        date_to=f"{date_str}T23:59:59",
        limit=30
    )
@@ -1106,8 +1158,10 @@ def test_last_status_change_date_field():
    )

    assert result_pending is not None

    # Only check columns if we have results (empty DataFrame has no columns)
    if len(result_pending) > 0:
        assert "last_status_change_date" in result_pending.columns, \
            "last_status_change_date column should be present in PENDING results"
    # Test 3: Field is present in FOR_SALE listings
    result_for_sale = scrape_property(
@@ -1270,3 +1324,317 @@ def test_last_status_change_date_hour_filtering():
                    f"PENDING property pending_date {pending_date} should be within 48 hours of {cutoff_time}"
            except (ValueError, TypeError):
                pass  # Skip if parsing fails
def test_exclude_pending_with_raw_data():
    """Test that exclude_pending parameter works correctly with return_type='raw'"""
    # Query for sale properties with exclude_pending=True and raw data
    result = scrape_property(
        location="Phoenix, AZ",
        listing_type="for_sale",
        exclude_pending=True,
        return_type="raw",
        limit=50
    )

    assert result is not None and len(result) > 0

    # Verify that no pending or contingent properties are in the results
    for prop in result:
        flags = prop.get('flags', {})
        is_pending = flags.get('is_pending', False)
        is_contingent = flags.get('is_contingent', False)
        assert not is_pending, f"Property {prop.get('property_id')} should not be pending when exclude_pending=True"
        assert not is_contingent, f"Property {prop.get('property_id')} should not be contingent when exclude_pending=True"


def test_mls_only_with_raw_data():
    """Test that mls_only parameter works correctly with return_type='raw'"""
    # Query with mls_only=True and raw data
    result = scrape_property(
        location="Dallas, TX",
        listing_type="for_sale",
        mls_only=True,
        return_type="raw",
        limit=50
    )

    assert result is not None and len(result) > 0

    # Verify that all properties have MLS IDs (stored in source.id)
    for prop in result:
        source = prop.get('source', {})
        mls_id = source.get('id') if source else None
        assert mls_id is not None and mls_id != "", \
            f"Property {prop.get('property_id')} should have an MLS ID (source.id) when mls_only=True, got: {mls_id}"


def test_combined_filters_with_raw_data():
    """Test that both exclude_pending and mls_only work together with return_type='raw'"""
    # Query with both filters enabled and raw data
    result = scrape_property(
        location="Austin, TX",
        listing_type="for_sale",
        exclude_pending=True,
        mls_only=True,
        return_type="raw",
        limit=30
    )

    assert result is not None and len(result) > 0

    # Verify both filters are applied
    for prop in result:
        # Check exclude_pending filter
        flags = prop.get('flags', {})
        is_pending = flags.get('is_pending', False)
        is_contingent = flags.get('is_contingent', False)
        assert not is_pending, f"Property {prop.get('property_id')} should not be pending"
        assert not is_contingent, f"Property {prop.get('property_id')} should not be contingent"

        # Check mls_only filter
        source = prop.get('source', {})
        mls_id = source.get('id') if source else None
        assert mls_id is not None and mls_id != "", \
            f"Property {prop.get('property_id')} should have an MLS ID (source.id)"


def test_updated_since_filtering():
    """Test the updated_since parameter for filtering by last_update_date"""
    from datetime import datetime, timedelta

    # Test 1: Filter by last update in past 10 minutes (user's example)
    cutoff_time = datetime.now() - timedelta(minutes=10)
    result_10min = scrape_property(
        location="California",
        updated_since=cutoff_time,
        sort_by="last_update_date",
        sort_direction="desc",
        limit=100
    )

    assert result_10min is not None
    print(f"\n10-minute window returned {len(result_10min)} properties")

    # Test 2: Verify all results have last_update_date within range
    if len(result_10min) > 0:
        for idx in range(min(10, len(result_10min))):
            update_date_str = result_10min.iloc[idx]["last_update_date"]
            if pd.notna(update_date_str):
                try:
                    # Handle timezone-aware datetime strings
                    date_str = str(update_date_str)
                    if '+' in date_str or date_str.endswith('Z'):
                        # Remove timezone for comparison with naive cutoff_time
                        date_str = date_str.replace('+00:00', '').replace('Z', '')
                    update_date = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
                    assert update_date >= cutoff_time, \
                        f"Property last_update_date {update_date} should be >= {cutoff_time}"
                    print(f"Property {idx}: last_update_date = {update_date} (valid)")
                except (ValueError, TypeError) as e:
                    print(f"Warning: Could not parse date {update_date_str}: {e}")

    # Test 3: Compare different time windows
    result_1hour = scrape_property(
        location="California",
        updated_since=datetime.now() - timedelta(hours=1),
        limit=50
    )
    result_24hours = scrape_property(
        location="California",
        updated_since=datetime.now() - timedelta(hours=24),
        limit=50
    )

    print(f"1-hour window: {len(result_1hour)} properties")
    print(f"24-hour window: {len(result_24hours)} properties")

    # Longer time window should return same or more results
    if len(result_1hour) > 0 and len(result_24hours) > 0:
        assert len(result_1hour) <= len(result_24hours), \
            "1-hour filter should return <= 24-hour results"

    # Test 4: Verify sorting works with filtering
    if len(result_10min) > 1:
        # Get non-null dates
        dates = []
        for idx in range(len(result_10min)):
            date_str = result_10min.iloc[idx]["last_update_date"]
            if pd.notna(date_str):
                try:
                    # Handle timezone-aware datetime strings
                    clean_date_str = str(date_str)
                    if '+' in clean_date_str or clean_date_str.endswith('Z'):
                        clean_date_str = clean_date_str.replace('+00:00', '').replace('Z', '')
                    dates.append(datetime.strptime(clean_date_str, "%Y-%m-%d %H:%M:%S"))
                except (ValueError, TypeError):
                    pass

        if len(dates) > 1:
            # Check if sorted descending
            for i in range(len(dates) - 1):
                assert dates[i] >= dates[i + 1], \
                    f"Results should be sorted by last_update_date descending: {dates[i]} >= {dates[i+1]}"


def test_updated_since_optimization():
    """Test that updated_since optimization works (auto-sort + early termination)"""
    from datetime import datetime, timedelta
    import time

    # Test 1: Verify auto-sort is applied when using updated_since without explicit sort
    start_time = time.time()
    result = scrape_property(
        location="California",
        updated_since=datetime.now() - timedelta(minutes=5),
        # NO sort_by specified - should auto-apply sort_by="last_update_date"
        limit=50
    )
    elapsed_time = time.time() - start_time

    print(f"\nAuto-sort test: {len(result)} properties in {elapsed_time:.2f}s")

    # Should complete quickly due to early termination optimization (<5 seconds)
    assert elapsed_time < 5.0, f"Query should be fast with optimization, took {elapsed_time:.2f}s"

    # Verify results are sorted by last_update_date (proving auto-sort worked)
    if len(result) > 1:
        dates = []
        for idx in range(min(10, len(result))):
            date_str = result.iloc[idx]["last_update_date"]
            if pd.notna(date_str):
                try:
                    clean_date_str = str(date_str)
                    if '+' in clean_date_str or clean_date_str.endswith('Z'):
                        clean_date_str = clean_date_str.replace('+00:00', '').replace('Z', '')
                    dates.append(datetime.strptime(clean_date_str, "%Y-%m-%d %H:%M:%S"))
                except (ValueError, TypeError):
                    pass

        if len(dates) > 1:
            # Verify descending order (most recent first)
            for i in range(len(dates) - 1):
                assert dates[i] >= dates[i + 1], \
                    "Auto-applied sort should order by last_update_date descending"

    print("Auto-sort optimization verified ✓")


def test_pending_date_optimization():
    """Test that PENDING + date filters get auto-sort and early termination"""
    from datetime import datetime, timedelta
    import time

    # Test: Verify auto-sort is applied for PENDING with past_days
    start_time = time.time()
    result = scrape_property(
        location="California",
        listing_type="pending",
        past_days=7,
        # NO sort_by specified - should auto-apply sort_by="pending_date"
        limit=50
    )
    elapsed_time = time.time() - start_time

    print(f"\nPENDING auto-sort test: {len(result)} properties in {elapsed_time:.2f}s")

    # Should complete quickly due to optimization (<10 seconds)
    assert elapsed_time < 10.0, f"PENDING query should be fast with optimization, took {elapsed_time:.2f}s"

    # Verify results are sorted by pending_date (proving auto-sort worked)
    if len(result) > 1:
        dates = []
        for idx in range(min(10, len(result))):
            date_str = result.iloc[idx]["pending_date"]
            if pd.notna(date_str):
                try:
                    clean_date_str = str(date_str)
                    if '+' in clean_date_str or clean_date_str.endswith('Z'):
                        clean_date_str = clean_date_str.replace('+00:00', '').replace('Z', '')
                    dates.append(datetime.strptime(clean_date_str, "%Y-%m-%d %H:%M:%S"))
                except (ValueError, TypeError):
                    pass

        if len(dates) > 1:
            # Verify descending order (most recent first)
            for i in range(len(dates) - 1):
                assert dates[i] >= dates[i + 1], \
                    "PENDING auto-applied sort should order by pending_date descending"

    print("PENDING optimization verified ✓")


def test_basic_last_update_date():
    from datetime import datetime, timedelta

    # Test with naive datetime (treated as local time)
    now = datetime.now()
    properties = scrape_property(
        "California",
        updated_since=now - timedelta(minutes=10),
        sort_by="last_update_date",
        sort_direction="desc"
    )

    # Convert now to timezone-aware for comparison with UTC dates in DataFrame
    now_utc = now.astimezone(tz=pytz.timezone("UTC"))

    # Check all last_update_date values are <= now
    assert (properties["last_update_date"] <= now_utc).all()

    # Verify we got some results
    assert len(properties) > 0


def test_timezone_aware_last_update_date():
    """Test that timezone-aware datetimes work correctly for updated_since"""
    from datetime import datetime, timedelta, timezone

    # Test with timezone-aware datetime (explicit UTC)
    now_utc = datetime.now(timezone.utc)
    properties = scrape_property(
        "California",
        updated_since=now_utc - timedelta(minutes=10),
        sort_by="last_update_date",
        sort_direction="desc"
    )

    # Check all last_update_date values are <= now
    assert (properties["last_update_date"] <= now_utc).all()

    # Verify we got some results
    assert len(properties) > 0


def test_timezone_handling_date_range():
    """Test timezone handling for date_from and date_to parameters"""
    from datetime import datetime, timedelta

    # Test with naive datetimes for date range (PENDING properties)
    now = datetime.now()
    three_days_ago = now - timedelta(days=3)
    properties = scrape_property(
        "California",
        listing_type="pending",
        date_from=three_days_ago,
        date_to=now
    )

    # Verify we got results and they're within the date range
    if len(properties) > 0:
        # Convert now to UTC for comparison
        now_utc = now.astimezone(tz=pytz.timezone("UTC"))
        assert (properties["pending_date"] <= now_utc).all()