[chore]: clean up

pull/31/head
Cullen Watson 2023-10-04 08:58:55 -05:00
parent f8c0dd766d
commit 51bde20c3c
8 changed files with 277 additions and 348 deletions

README.md

@@ -1,6 +1,6 @@
<img src="https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/d1a2bf8b-09f5-4c57-b33a-0ada8a34f12d" width="400">
**HomeHarvest** is a simple, yet comprehensive, real estate scraping library.
**HomeHarvest** is a simple, yet comprehensive, real estate scraping library that extracts and formats data in the style of MLS listings.
[![Try with Replit](https://replit.com/badge?caption=Try%20with%20Replit)](https://replit.com/@ZacharyHampton/HomeHarvestDemo)
@@ -11,10 +11,14 @@
Check out another project we wrote: **[JobSpy](https://github.com/cullenwatson/JobSpy)**, a Python package for job scraping
## Features
## HomeHarvest Features
- Scrapes properties from **Zillow**, **Realtor.com** & **Redfin** simultaneously
- Aggregates the properties in a Pandas DataFrame
- **Source**: Fetches properties directly from **Realtor.com**.
- **Data Format**: Structures data to resemble MLS listings.
- **Export Flexibility**: Options to save as either CSV or Excel.
- **Usage Modes**:
- **CLI**: For users who prefer command-line operations.
- **Python**: For those who'd like to integrate scraping into their Python scripts.
[Video Guide for HomeHarvest](https://youtu.be/JnV7eR2Ve2o) - _updated for release v0.2.7_
@@ -29,21 +33,6 @@ pip install homeharvest
## Usage
### Python
```py
from homeharvest import scrape_property
import pandas as pd
properties: pd.DataFrame = scrape_property(
location="85281",
listing_type="for_rent" # for_sale / sold
)
#: Note, to export to CSV or Excel, use properties.to_csv() or properties.to_excel().
print(properties)
```
### CLI
```
@@ -55,7 +44,6 @@ positional arguments:
location Location to scrape (e.g., San Francisco, CA)
options:
-h, --help show this help message and exit
-l {for_sale,for_rent,sold}, --listing_type {for_sale,for_rent,sold}
Listing type to scrape
-o {excel,csv}, --output {excel,csv}
@@ -72,104 +60,107 @@ options:
> homeharvest "San Francisco, CA" -l for_rent -o excel -f HomeHarvest
```
## Output
### Python
```py
>>> properties.head()
property_url site_name listing_type apt_min_price apt_max_price ...
0 https://www.redfin.com/AZ/Tempe/1003-W-Washing... redfin for_rent 1666.0 2750.0 ...
1 https://www.redfin.com/AZ/Tempe/VELA-at-Town-L... redfin for_rent 1665.0 3763.0 ...
2 https://www.redfin.com/AZ/Tempe/Camden-Tempe/a... redfin for_rent 1939.0 3109.0 ...
3 https://www.redfin.com/AZ/Tempe/Emerson-Park/a... redfin for_rent 1185.0 1817.0 ...
4 https://www.redfin.com/AZ/Tempe/Rio-Paradiso-A... redfin for_rent 1470.0 2235.0 ...
[5 rows x 41 columns]
from homeharvest import scrape_property
from datetime import datetime
# Generate filename based on current timestamp
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"output/{current_timestamp}.csv"
properties = scrape_property(
location="San Diego, CA",
listing_type="sold", # for_sale, for_rent
)
print(f"Number of properties: {len(properties)}")
properties.to_csv(filename, index=False)
```
### Parameters for `scrape_properties()`
## Output
```plaintext
>>> properties.head()
MLS MLS # Status Style ... COEDate LotSFApx PrcSqft Stories
0 SDCA 230018348 SOLD CONDOS ... 2023-10-03 290110 803 2
1 SDCA 230016614 SOLD TOWNHOMES ... 2023-10-03 None 838 3
2 SDCA 230016367 SOLD CONDOS ... 2023-10-03 30056 649 1
3 MRCA NDP2306335 SOLD SINGLE_FAMILY ... 2023-10-03 7519 661 2
4 SDCA 230014532 SOLD CONDOS ... 2023-10-03 None 752 1
[5 rows x 22 columns]
```
### Parameters for `scrape_property()`
```
Required
├── location (str): address in various formats, e.g. zip code alone, full address, or city/state
└── listing_type (enum): for_rent, for_sale, sold
Optional
├── site_name (list[enum], default=all three sites): zillow, realtor.com, redfin
├── proxy (str): in format 'http://user:pass@host:port' or [https, socks]
└── keep_duplicates (bool, default=False): whether to keep or remove duplicate properties based on address
├── radius_for_comps (float): Radius in miles to find comparable properties based on individual addresses.
├── sold_last_x_days (int): Number of past days to filter sold properties.
├── proxy (str): in format 'http://user:pass@host:port'
```
### Property Schema
```plaintext
Property
├── Basic Information:
│ ├── property_url (str)
├── site_name (enum): zillow, redfin, realtor.com
├── listing_type (enum): for_sale, for_rent, sold
└── property_type (enum): house, apartment, condo, townhouse, single_family, multi_family, building
├── mls (str)
├── mls_id (str)
└── status (str)
├── Address Details:
│ ├── street_address (str)
│ ├── street (str)
│ ├── unit (str)
│ ├── city (str)
│ ├── state (str)
│ ├── zip_code (str)
│ ├── unit (str)
│ └── country (str)
│ └── zip (str)
├── House for Sale Features:
│ ├── tax_assessed_value (int)
│ ├── lot_area_value (float)
│ ├── lot_area_unit (str)
│ ├── stories (int)
├── Property Description:
│ ├── style (str)
│ ├── beds (int)
│ ├── baths_full (int)
│ ├── baths_half (int)
│ ├── sqft (int)
│ ├── lot_sqft (int)
│ ├── sold_price (int)
│ ├── year_built (int)
│ └── price_per_sqft (int)
│ ├── garage (float)
│ └── stories (int)
├── Building for Sale and Apartment Details:
│ ├── bldg_name (str)
│ ├── beds_min (int)
│ ├── beds_max (int)
│ ├── baths_min (float)
│ ├── baths_max (float)
│ ├── sqft_min (int)
│ ├── sqft_max (int)
│ ├── price_min (int)
│ ├── price_max (int)
│ ├── area_min (int)
│ └── unit_count (int)
├── Property Listing Details:
│ ├── list_price (int)
│ ├── list_date (str)
│ ├── last_sold_date (str)
│ ├── prc_sqft (int)
│ └── hoa_fee (int)
├── Miscellaneous Details:
│ ├── mls_id (str)
│ ├── agent_name (str)
│ ├── img_src (str)
│ ├── description (str)
│ ├── status_text (str)
│ └── posted_time (str)
└── Location Details:
├── latitude (float)
└── longitude (float)
├── Location Details:
│ ├── latitude (float)
│ ├── longitude (float)
│ └── neighborhoods (str)
```
## Supported Countries for Property Scraping
* **Zillow**: contains listings in the **US** & **Canada**
* **Realtor.com**: mainly from the **US** but also has international listings
* **Redfin**: listings mainly in the **US**, **Canada**, & has expanded to some areas in **Mexico**
### Exceptions
The following exceptions may be raised when using HomeHarvest:
- `InvalidSite` - valid options: `zillow`, `redfin`, `realtor.com`
- `InvalidListingType` - valid options: `for_sale`, `for_rent`, `sold`
- `NoResultsFound` - no properties found from your input
- `GeoCoordsNotFound` - if Zillow scraper is not able to derive geo-coordinates from the location you input
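To see how these exceptions surface in practice, here is a minimal, self-contained sketch. The exception classes and the `fetch` helper are redefined locally for illustration only; they mirror the documented behavior but are not HomeHarvest's actual internals:

```python
# Hypothetical sketch: classes mirror the documented exceptions,
# redefined here so the example is self-contained.
class InvalidListingType(Exception):
    """Raised when a provided listing type does not exist."""

class NoResultsFound(Exception):
    """Raised when no properties are found for the given location."""

VALID_LISTING_TYPES = {"for_sale", "for_rent", "sold"}

def fetch(location: str, listing_type: str) -> list:
    # Validate inputs up front, as the library does.
    if listing_type not in VALID_LISTING_TYPES:
        raise InvalidListingType(f"'{listing_type}' is not a valid listing type")
    results = []  # a real call would query the site here
    if not results:
        raise NoResultsFound(f"no properties found for {location!r}")
    return results

try:
    fetch("85281", "for_lease")
except InvalidListingType as e:
    print(f"bad input: {e}")
```

Catching these explicitly lets a script distinguish user error (`InvalidListingType`) from an empty result set (`NoResultsFound`).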
## Frequently Asked Questions
---
**Q: Encountering issues with your queries?**
**A:** Try a single site and/or broaden the location. If problems persist, [submit an issue](https://github.com/ZacharyHampton/HomeHarvest/issues).
**Q: Encountering issues with your searches?**
**A:** Try to broaden the location. If problems persist, [submit an issue](https://github.com/ZacharyHampton/HomeHarvest/issues).
---
**Q: Received a Forbidden 403 response code?**
**A:** This indicates that you have been blocked by the real estate site for sending too many requests. Currently, **Zillow** is particularly aggressive with blocking. We recommend:
**A:** This indicates that you have been blocked by Realtor.com for sending too many requests. We recommend:
- Waiting a few seconds between requests.
- Trying a VPN to change your IP address.
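The "wait a few seconds between requests" advice can be sketched as a retry-with-backoff wrapper. This is illustrative only: `fetch_with_backoff`, the delay values, and the use of `PermissionError` to stand in for a 403 are assumptions, not part of HomeHarvest:

```python
import time

def fetch_with_backoff(requester, max_retries=3, base_delay=2.0, _sleep=time.sleep):
    """Call `requester()` and retry on a 403, doubling the wait each time.

    `requester` is any zero-argument callable that raises PermissionError
    to signal a 403; `_sleep` is injectable so tests need not really wait.
    """
    for attempt in range(max_retries):
        try:
            return requester()
        except PermissionError:
            if attempt == max_retries - 1:
                raise  # out of retries, let the caller see the 403
            _sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...

# Demo: a fake requester that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise PermissionError("403 Forbidden")
    return "ok"

print(fetch_with_backoff(flaky, _sleep=lambda s: None))  # prints: ok
```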

homeharvest/__init__.py

@@ -1,5 +1,4 @@
import pandas as pd
from typing import Union
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor
@@ -7,7 +6,7 @@ from .core.scrapers import ScraperInput
from .utils import process_result, ordered_properties
from .core.scrapers.realtor import RealtorScraper
from .core.scrapers.models import ListingType, Property, SiteName
from .exceptions import InvalidSite, InvalidListingType
from .exceptions import InvalidListingType
_scrapers = {
@@ -15,10 +14,7 @@ _scrapers = {
}
def _validate_input(site_name: str, listing_type: str) -> None:
if site_name.lower() not in _scrapers:
raise InvalidSite(f"Provided site, '{site_name}', does not exist.")
def _validate_input(listing_type: str) -> None:
if listing_type.upper() not in ListingType.__members__:
raise InvalidListingType(f"Provided listing type, '{listing_type}', does not exist.")
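The membership check against `ListingType.__members__` works because an `Enum` exposes its member *names* (upper case) as a mapping, hence the `.upper()` before the lookup. A self-contained sketch of the same pattern (the enum values are assumed from the documented options):

```python
from enum import Enum

class ListingType(Enum):  # values assumed from the documented options
    FOR_SALE = "for_sale"
    FOR_RENT = "for_rent"
    SOLD = "sold"

class InvalidListingType(Exception):
    """Raised when a provided listing type does not exist."""

def validate_listing_type(listing_type: str) -> None:
    # __members__ maps member names ("FOR_SALE", ...) to members,
    # so user input must be upper-cased before the lookup.
    if listing_type.upper() not in ListingType.__members__:
        raise InvalidListingType(f"Provided listing type, '{listing_type}', does not exist.")

validate_listing_type("for_rent")  # passes silently
```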
@@ -27,7 +23,7 @@ def _scrape_single_site(location: str, site_name: str, listing_type: str, radius
"""
Helper function to scrape a single site.
"""
_validate_input(site_name, listing_type)
_validate_input(listing_type)
scraper_input = ScraperInput(
location=location,
@@ -40,6 +36,7 @@ def _scrape_single_site(location: str, site_name: str, listing_type: str, radius
site = _scrapers[site_name.lower()](scraper_input)
results = site.search()
print(f"found {len(results)}")
properties_dfs = [process_result(result) for result in results]
if not properties_dfs:
@@ -50,22 +47,19 @@ def _scrape_single_site(location: str, site_name: str, listing_type: str, radius
def scrape_property(
location: str,
#: site_name: Union[str, list[str]] = "realtor.com",
listing_type: str = "for_sale",
radius: float = None,
sold_last_x_days: int = None,
proxy: str = None,
) -> pd.DataFrame:
"""
Scrape property from various sites from a given location and listing type.
Scrape properties from Realtor.com based on a given location and listing type.
:param sold_last_x_days: Sold in last x days
:param radius: Radius in miles to find comparable properties on individual addresses
:param keep_duplicates:
:param proxy:
:param location: US Location (e.g. 'San Francisco, CA', 'Cook County, IL', '85281', '2530 Al Lipscomb Way')
:param site_name: Site name or list of site names (e.g. ['realtor.com', 'zillow'], 'redfin')
:param listing_type: Listing type (e.g. 'for_sale', 'for_rent', 'sold')
:param listing_type: Listing type (e.g. 'for_sale', 'for_rent', 'sold'). Default is 'for_sale'.
:param radius: Radius in miles to find comparable properties on individual addresses. Optional.
:param sold_last_x_days: Number of past days to filter sold properties. Optional.
:param proxy: Proxy IP address to be used for scraping. Optional.
:returns: pd.DataFrame containing properties
"""
site_name = "realtor.com"

homeharvest/cli.py

@@ -38,7 +38,8 @@ def main():
parser.add_argument(
"-r",
"--radius",
"--sold-properties-radius",
dest="sold_properties_radius",  # ensures the parsed argument is stored as args.sold_properties_radius
type=float,
default=None,
help="Get comparable properties within _ (e.g. 0.0) miles. Only applicable for individual addresses."
@@ -46,7 +47,7 @@ def main():
args = parser.parse_args()
result = scrape_property(args.location, args.listing_type, proxy=args.proxy)
result = scrape_property(args.location, args.listing_type, radius=args.sold_properties_radius, proxy=args.proxy)
if not args.filename:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

homeharvest/core/scrapers/models.py

@@ -32,39 +32,34 @@ class Address:
@dataclass
class Agent:
name: str
phone: str | None = None
email: str | None = None
class Description:
style: str | None = None
beds: int | None = None
baths_full: int | None = None
baths_half: int | None = None
sqft: int | None = None
lot_sqft: int | None = None
sold_price: int | None = None
year_built: int | None = None
garage: float | None = None
stories: int | None = None
@dataclass
class Property:
property_url: str | None = None
property_url: str
mls: str | None = None
mls_id: str | None = None
status: str | None = None
style: str | None = None
beds: int | None = None
baths_full: int | None = None
baths_half: int | None = None
list_price: int | None = None
list_date: str | None = None
sold_price: int | None = None
last_sold_date: str | None = None
prc_sqft: float | None = None
est_sf: int | None = None
lot_sf: int | None = None
hoa_fee: int | None = None
address: Address | None = None
yr_blt: int | None = None
list_price: int | None = None
list_date: str | None = None
last_sold_date: str | None = None
prc_sqft: int | None = None
hoa_fee: int | None = None
description: Description | None = None
latitude: float | None = None
longitude: float | None = None
stories: int | None = None
prkg_gar: float | None = None
neighborhoods: Optional[str] = None

homeharvest/core/scrapers/realtor/__init__.py

@@ -2,38 +2,26 @@
homeharvest.realtor.__init__
~~~~~~~~~~~~
This module implements the scraper for relator.com
This module implements the scraper for realtor.com
"""
from ..models import Property, Address, ListingType
from typing import Dict, Union, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
from .. import Scraper
from ....exceptions import NoResultsFound
from concurrent.futures import ThreadPoolExecutor, as_completed
from ..models import Property, Address, ListingType, Description
class RealtorScraper(Scraper):
SEARCH_URL = "https://www.realtor.com/api/v1/rdc_search_srp?client_id=rdc-search-new-communities&schema=vesta"
PROPERTY_URL = "https://www.realtor.com/realestateandhomes-detail/"
ADDRESS_AUTOCOMPLETE_URL = "https://parser-external.geo.moveaws.com/suggest"
def __init__(self, scraper_input):
self.counter = 1
super().__init__(scraper_input)
self.search_url = (
"https://www.realtor.com/api/v1/rdc_search_srp?client_id=rdc-search-new-communities&schema=vesta"
)
def handle_location(self):
headers = {
"authority": "parser-external.geo.moveaws.com",
"accept": "*/*",
"accept-language": "en-US,en;q=0.9",
"origin": "https://www.realtor.com",
"referer": "https://www.realtor.com/",
"sec-ch-ua": '"Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Windows"',
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "cross-site",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
}
params = {
"input": self.location,
"client_id": self.listing_type.value.lower().replace("_", "-"),
@@ -42,9 +30,8 @@ class RealtorScraper(Scraper):
}
response = self.session.get(
"https://parser-external.geo.moveaws.com/suggest",
self.ADDRESS_AUTOCOMPLETE_URL,
params=params,
headers=headers,
)
response_json = response.json()
@@ -70,22 +57,19 @@ class RealtorScraper(Scraper):
stories
}
address {
address_validation_code
city
country
county
line
postal_code
state_code
street_direction
street_name
street_number
street_name
street_suffix
street_post_direction
unit_value
unit
unit_descriptor
zip
city
state_code
postal_code
location {
coordinate {
lat
lon
}
}
}
basic {
baths
@@ -113,25 +97,24 @@ class RealtorScraper(Scraper):
"variables": variables,
}
response = self.session.post(self.search_url, json=payload)
response = self.session.post(self.SEARCH_URL, json=payload)
response_json = response.json()
property_info = response_json["data"]["property"]
return [
Property(
property_url="https://www.realtor.com/realestateandhomes-detail/"
+ property_info["details"]["permalink"],
stories=property_info["details"]["stories"],
mls_id=property_id,
property_url=f"{self.PROPERTY_URL}{property_info['details']['permalink']}",
address=self._parse_address(property_info, search_type="handle_address"),
description=self._parse_description(property_info)
)
]
def general_search(self, variables: dict, search_type: str, return_total: bool = False) -> list[Property] | int:
def general_search(self, variables: dict, search_type: str) -> Dict[str, Union[int, list[Property]]]:
"""
Handles a location area & returns a list of properties
"""
results_query = """{
count
total
@@ -141,86 +124,87 @@ class RealtorScraper(Scraper):
status
last_sold_price
last_sold_date
hoa {
fee
}
list_price
price_per_sqft
description {
sqft
beds
baths_full
baths_half
beds
lot_sqft
sqft
sold_price
year_built
garage
sold_price
type
sub_type
name
stories
}
source {
raw {
area
status
style
}
last_update_date
contract_date
id
listing_id
name
type
listing_href
community_id
management_id
corporation_id
subdivision_status
spec_id
plan_id
tier_rank
feed_type
}
hoa {
fee
}
location {
address {
street_number
street_name
street_suffix
unit
city
country
line
postal_code
state_code
state
postal_code
coordinate {
lon
lat
}
street_direction
street_name
street_number
street_post_direction
street_suffix
unit
}
neighborhoods {
name
}
}
list_price
price_per_sqft
style_category_tags {
exterior
}
source {
id
}
}
}
}"""
sold_date_param = ('sold_date: { min: "$today-%sD" }' % self.sold_last_x_days
if self.listing_type == ListingType.SOLD and self.sold_last_x_days is not None
if self.listing_type == ListingType.SOLD and self.sold_last_x_days
else "")
sort_param = ('sort: [{ field: sold_date, direction: desc }]'
if self.listing_type == ListingType.SOLD
else 'sort: [{ field: list_date, direction: desc }]')
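The interpolation above composes GraphQL filter and sort fragments as plain strings. A self-contained sketch of the same pattern (names mirror the diff, but this builds strings only and hits no API):

```python
def build_query_fragments(listing_type, sold_last_x_days=None):
    # The sold_date filter only applies to sold searches with a day window.
    sold_date_param = (
        'sold_date: { min: "$today-%sD" }' % sold_last_x_days
        if listing_type == "sold" and sold_last_x_days
        else ""
    )
    # Sold searches sort by sale date; everything else by list date.
    sort_param = (
        "sort: [{ field: sold_date, direction: desc }]"
        if listing_type == "sold"
        else "sort: [{ field: list_date, direction: desc }]"
    )
    return sold_date_param, sort_param

print(build_query_fragments("sold", 30)[0])  # sold_date: { min: "$today-30D" }
```

One design note: because the fragments are spliced into the query with `%`, an empty `sold_date_param` simply disappears from the final query rather than producing an invalid filter.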
if search_type == "area":
if search_type == "comps":
print('general - comps')
query = (
"""query Property_search(
$coordinates: [Float]!
$radius: String!
$offset: Int!,
) {
property_search(
query: {
nearby: {
coordinates: $coordinates
radius: $radius
}
status: %s
%s
}
%s
limit: 200
offset: $offset
) %s""" % (
self.listing_type.value.lower(),
sold_date_param,
sort_param,
results_query
)
)
else:
print('general - not comps')
query = (
"""query Home_search(
$city: String,
@@ -238,60 +222,27 @@ class RealtorScraper(Scraper):
status: %s
%s
}
%s
limit: 200
offset: $offset
) %s"""
% (
self.listing_type.value.lower(),
sold_date_param,
sort_param,
results_query
)
)
elif search_type == "comp_address":
query = (
"""query Property_search(
$coordinates: [Float]!
$radius: String!
$offset: Int!,
) {
property_search(
query: {
nearby: {
coordinates: $coordinates
radius: $radius
}
%s
}
limit: 200
offset: $offset
) %s""" % (sold_date_param, results_query))
else:
query = (
"""query Property_search(
$property_id: [ID]!
$offset: Int!,
) {
property_search(
query: {
property_id: $property_id
%s
}
limit: 200
offset: $offset
) %s""" % (sold_date_param, results_query))
payload = {
"query": query,
"variables": variables,
}
response = self.session.post(self.search_url, json=payload)
response = self.session.post(self.SEARCH_URL, json=payload)
response.raise_for_status()
response_json = response.json()
search_key = "home_search" if search_type == "area" else "property_search"
if return_total:
return response_json["data"][search_key]["total"]
search_key = "property_search" if search_type == "comps" else "home_search"
properties: list[Property] = []
@@ -303,7 +254,7 @@ class RealtorScraper(Scraper):
or response_json["data"][search_key] is None
or "results" not in response_json["data"][search_key]
):
return []
return {"total": 0, "properties": []}
for result in response_json["data"][search_key]["results"]:
self.counter += 1
@@ -312,16 +263,90 @@ class RealtorScraper(Scraper):
if "source" in result and isinstance(result["source"], dict)
else None
)
mls_id = (
result["source"].get("listing_id")
if "source" in result and isinstance(result["source"], dict)
else None
)
if not mls_id:
if not mls:
continue
# not type
able_to_get_lat_long = result and result.get("location") and result["location"].get("address") and result["location"]["address"].get("coordinate")
realty_property = Property(
mls=mls,
mls_id=result["source"].get("listing_id") if "source" in result and isinstance(result["source"], dict) else None,
property_url=f"{self.PROPERTY_URL}{result['property_id']}",
status=result["status"].upper(),
list_price=result["list_price"],
list_date=result["list_date"].split("T")[0] if result.get("list_date") else None,
prc_sqft=result.get("price_per_sqft"),
last_sold_date=result.get("last_sold_date"),
hoa_fee=result["hoa"]["fee"] if result.get("hoa") and isinstance(result["hoa"], dict) else None,
latitude=result["location"]["address"]["coordinate"].get("lat") if able_to_get_lat_long else None,
longitude=result["location"]["address"]["coordinate"].get("lon") if able_to_get_lat_long else None,
address=self._parse_address(result, search_type="general_search"),
neighborhoods=self._parse_neighborhoods(result),
description=self._parse_description(result)
)
properties.append(realty_property)
# print(response_json["data"]["property_search"], variables["offset"])
# print(response_json["data"]["home_search"]["total"], variables["offset"])
return {
"total": response_json["data"][search_key]["total"],
"properties": properties,
}
def search(self):
location_info = self.handle_location()
location_type = location_info["area_type"]
search_variables = {
"offset": 0,
}
search_type = "comps" if self.radius and location_type == "address" else "area"
print(search_type)
if location_type == "address":
if not self.radius: #: single address search, non comps
property_id = location_info["mpr_id"]
search_variables |= {"property_id": property_id}
return self.handle_address(property_id)
else: #: general search, comps (radius)
coordinates = list(location_info["centroid"].values())
search_variables |= {
"coordinates": coordinates,
"radius": "{}mi".format(self.radius),
}
else: #: general search, location
search_variables |= {
"city": location_info.get("city"),
"county": location_info.get("county"),
"state_code": location_info.get("state_code"),
"postal_code": location_info.get("postal_code"),
}
result = self.general_search(search_variables, search_type=search_type)
total = result["total"]
homes = result["properties"]
with ThreadPoolExecutor(max_workers=10) as executor:
futures = [
executor.submit(
self.general_search,
variables=search_variables | {"offset": i},
search_type=search_type,
)
for i in range(200, min(total, 10000), 200)
]
for future in as_completed(futures):
homes.extend(future.result()["properties"])
return homes
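The pagination math in `search()` — first page fetched inline, remaining offsets fanned out to worker threads, capped at 10,000 results in pages of 200 — can be sketched in isolation (the cap and page size are taken from the diff; the helper name is ours):

```python
def remaining_offsets(total: int, page_size: int = 200, cap: int = 10_000) -> list[int]:
    # The first page (offset 0) is fetched inline; threads cover the rest.
    # APIs like this one commonly cap pagination depth, hence `cap`.
    return list(range(page_size, min(total, cap), page_size))

print(remaining_offsets(650))        # [200, 400, 600]
print(remaining_offsets(150))        # [] (everything fit in the first page)
print(len(remaining_offsets(50_000)))  # 49 (offsets 200, 400, ..., 9800)
```

Each offset then becomes one `executor.submit(...)` call, and `as_completed` collects pages as they finish, so page order is not preserved in `homes`.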
@staticmethod
def _parse_neighborhoods(result: dict) -> Optional[str]:
neighborhoods_list = []
neighborhoods = result["location"].get("neighborhoods", [])
@@ -331,103 +356,38 @@ class RealtorScraper(Scraper):
if name:
neighborhoods_list.append(name)
neighborhoods_str = (
", ".join(neighborhoods_list) if neighborhoods_list else None
)
return ", ".join(neighborhoods_list) if neighborhoods_list else None
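The helper above collapses a list of `{"name": ...}` dicts into one comma-separated string, or `None` when nothing is named. A self-contained mirror of that logic (standalone function for illustration; the real one is a static method on the scraper):

```python
from typing import Optional

def parse_neighborhoods(result: dict) -> Optional[str]:
    """Collapse [{'name': ...}, ...] into 'A, B', or None if no names exist."""
    names = []
    # `or []` guards against the key being present but holding None.
    for hood in result.get("location", {}).get("neighborhoods") or []:
        name = hood.get("name")
        if name:
            names.append(name)
    return ", ".join(names) if names else None

payload = {"location": {"neighborhoods": [{"name": "North Park"}, {"name": "University Heights"}]}}
print(parse_neighborhoods(payload))  # North Park, University Heights
```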
able_to_get_lat_long = result and result.get("location") and result["location"].get("address") and result["location"]["address"].get("coordinate")
realty_property = Property(
property_url="https://www.realtor.com/realestateandhomes-detail/"
+ result["property_id"],
mls=mls,
mls_id=mls_id,
status=result["status"].upper(),
style=result["description"]["type"].upper(),
beds=result["description"]["beds"],
baths_full=result["description"]["baths_full"],
baths_half=result["description"]["baths_half"],
est_sf=result["description"]["sqft"],
lot_sf=result["description"]["lot_sqft"],
list_price=result["list_price"],
list_date=result["list_date"].split("T")[0]
if result["list_date"]
else None,
sold_price=result["description"]["sold_price"],
prc_sqft=result["price_per_sqft"],
last_sold_date=result["last_sold_date"],
hoa_fee=result["hoa"]["fee"] if result.get("hoa") and isinstance(result["hoa"], dict) else None,
address=Address(
@staticmethod
def _parse_address(result: dict, search_type):
if search_type == "general_search":
return Address(
street=f"{result['location']['address']['street_number']} {result['location']['address']['street_name']} {result['location']['address']['street_suffix']}",
unit=result["location"]["address"]["unit"],
city=result["location"]["address"]["city"],
state=result["location"]["address"]["state_code"],
zip=result["location"]["address"]["postal_code"],
),
yr_blt=result["description"]["year_built"],
latitude=result["location"]["address"]["coordinate"].get("lat") if able_to_get_lat_long else None,
longitude=result["location"]["address"]["coordinate"].get("lon") if able_to_get_lat_long else None,
prkg_gar=result["description"]["garage"],
stories=result["description"]["stories"],
neighborhoods=neighborhoods_str,
)
properties.append(realty_property)
return properties
def search(self):
location_info = self.handle_location()
location_type = location_info["area_type"]
is_for_comps = self.radius is not None and location_type == "address"
offset = 0
search_variables = {
"offset": offset,
}
search_type = "comp_address" if is_for_comps \
else "address" if location_type == "address" and not is_for_comps \
else "area"
if location_type == "address" and not is_for_comps: #: single address search, non comps
property_id = location_info["mpr_id"]
search_variables = search_variables | {"property_id": property_id}
general_search = self.general_search(search_variables, search_type)
if general_search:
return general_search
else:
return self.handle_address(property_id) #: TODO: support single address search for query by property address (can go from property -> listing to get better data)
elif not is_for_comps: #: area search
search_variables = search_variables | {
"city": location_info.get("city"),
"county": location_info.get("county"),
"state_code": location_info.get("state_code"),
"postal_code": location_info.get("postal_code"),
}
else: #: comps search
coordinates = list(location_info["centroid"].values())
search_variables = search_variables | {
"coordinates": coordinates,
"radius": "{}mi".format(self.radius),
}
total = self.general_search(search_variables, return_total=True, search_type=search_type)
homes = []
with ThreadPoolExecutor(max_workers=10) as executor:
futures = [
executor.submit(
self.general_search,
variables=search_variables | {"offset": i},
return_total=False,
search_type=search_type,
return Address(
street=f"{result['address']['street_number']} {result['address']['street_name']} {result['address']['street_suffix']}",
unit=result['address']['unit'],
city=result['address']['city'],
state=result['address']['state_code'],
zip=result['address']['postal_code'],
)
for i in range(0, total, 200)
]
for future in as_completed(futures):
homes.extend(future.result())
return homes
@staticmethod
def _parse_description(result: dict) -> Description:
description_data = result.get("description", {})
return Description(
style=description_data.get("type", "").upper(),
beds=description_data.get("beds"),
baths_full=description_data.get("baths_full"),
baths_half=description_data.get("baths_half"),
sqft=description_data.get("sqft"),
lot_sqft=description_data.get("lot_sqft"),
sold_price=description_data.get("sold_price"),
year_built=description_data.get("year_built"),
garage=description_data.get("garage"),
stories=description_data.get("stories"),
)
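One subtlety in the `.get` chains above: `result.get("description", {})` still returns `None` when the key is present but holds `None`, which would break the subsequent `.get` calls. A defensive variant (our sketch, not the committed code) uses `or {}` instead:

```python
def parse_description(result: dict) -> dict:
    # `result.get("description", {})` returns None when the key exists
    # but holds None; `or {}` guards both the missing and the None case.
    d = result.get("description") or {}
    return {
        "style": (d.get("type") or "").upper(),
        "beds": d.get("beds"),
        "sqft": d.get("sqft"),
        "year_built": d.get("year_built"),
    }

print(parse_description({"description": None}))  # no crash: empty style, None fields
```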

homeharvest/exceptions.py

@@ -1,18 +1,6 @@
class InvalidSite(Exception):
    """Raised when a provided site does not exist."""
class InvalidListingType(Exception):
    """Raised when a provided listing type does not exist."""
class NoResultsFound(Exception):
"""Raised when no results are found for the given location"""
class GeoCoordsNotFound(Exception):
"""Raised when no property is found for the given address"""
class SearchTooBroad(Exception):
"""Raised when the search is too broad"""

homeharvest/utils.py

@@ -39,7 +39,6 @@ def process_result(result: Property) -> pd.DataFrame:
prop_data["MLS"] = prop_data["mls"]
prop_data["MLS #"] = prop_data["mls_id"]
prop_data["Status"] = prop_data["status"]
prop_data["Style"] = prop_data["style"]
if "address" in prop_data:
address_data = prop_data["address"]
@@ -49,26 +48,27 @@ def process_result(result: Property) -> pd.DataFrame:
prop_data["State"] = address_data.state
prop_data["Zip"] = address_data.zip
prop_data["Community"] = prop_data["neighborhoods"]
prop_data["Beds"] = prop_data["beds"]
prop_data["FB"] = prop_data["baths_full"]
prop_data["NumHB"] = prop_data["baths_half"]
prop_data["EstSF"] = prop_data["est_sf"]
prop_data["ListPrice"] = prop_data["list_price"]
prop_data["Lst Date"] = prop_data["list_date"]
prop_data["Sold Price"] = prop_data["sold_price"]
prop_data["COEDate"] = prop_data["last_sold_date"]
prop_data["LotSFApx"] = prop_data["lot_sf"]
prop_data["PrcSqft"] = prop_data["prc_sqft"]
prop_data["HOAFee"] = prop_data["hoa_fee"]
if prop_data.get("prc_sqft") is not None:
prop_data["PrcSqft"] = round(prop_data["prc_sqft"], 2)
description = result.description
prop_data["Style"] = description.style
prop_data["Beds"] = description.beds
prop_data["FB"] = description.baths_full
prop_data["NumHB"] = description.baths_half
prop_data["EstSF"] = description.sqft
prop_data["LotSFApx"] = description.lot_sqft
prop_data["Sold Price"] = description.sold_price
prop_data["YrBlt"] = description.year_built
prop_data["PrkgGar"] = description.garage
prop_data["Stories"] = description.stories
prop_data["YrBlt"] = prop_data["yr_blt"]
prop_data["LATITUDE"] = prop_data["latitude"]
prop_data["LONGITUDE"] = prop_data["longitude"]
prop_data["Stories"] = prop_data["stories"]
prop_data["PrkgGar"] = prop_data["prkg_gar"]
prop_data["Community"] = prop_data["neighborhoods"]
properties_df = pd.DataFrame([prop_data])
properties_df = properties_df.reindex(columns=ordered_properties)