Compare commits

22 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Cullen | be20258535 | fix: redfin | 2024-04-04 17:05:41 -05:00 |
| Cullen | d05bc5d79f | fix: redfin | 2024-04-04 17:05:00 -05:00 |
| Zachary Hampton | 01c53f9399 | - redfin bug fix; - add recent features for issues | 2023-09-28 15:19:43 -07:00 |
| Zachary Hampton | 9200c17df2 | - version bump | 2023-09-23 10:55:50 -07:00 |
| Zachary Hampton | 9e262bf214 | Merge remote-tracking branch 'origin/master' | 2023-09-23 10:55:29 -07:00 |
| Zachary Hampton | 82f78fb578 | - zillow bug fix | 2023-09-23 10:55:14 -07:00 |
| Cullen Watson | b0e40df00a | Update pyproject.toml | 2023-09-22 09:51:24 -05:00 |
| Cullen Watson | 2fc40e0dad | fix: cookie | 2023-09-22 09:47:37 -05:00 |
| Zachary Hampton | 254f3a68a1 | - redfin bug fix | 2023-09-21 18:54:03 -07:00 |
| Zachary Hampton | 05713c76b0 | - redfin bug fix; - .get | 2023-09-21 11:27:12 -07:00 |
| Cullen Watson | 9120cc9bfe | fix: remove line | 2023-09-21 13:10:14 -05:00 |
| Cullen Watson | eee4b19515 | Merge branch 'master' of https://github.com/ZacharyHampton/HomeHarvest | 2023-09-21 13:06:15 -05:00 |
| Cullen Watson | c25961eded | fix: KeyEror : [minBaths] | 2023-09-21 13:06:06 -05:00 |
| Zachary Hampton | 0884c3d163 | Update README.md | 2023-09-21 09:55:29 -07:00 |
| Cullen Watson | 8f37bfdeb8 | chore: version number | 2023-09-21 11:19:23 -05:00 |
| Cullen Watson | 48c2338276 | fix: keyerror | 2023-09-21 11:18:37 -05:00 |
| Cullen Watson | f58a1f4a74 | docs: tryhomeharvest.com | 2023-09-21 10:57:11 -05:00 |
| Zachary Hampton | 4cef926d7d | Merge pull request #14 from ZacharyHampton/keep_duplicates_flag; Keep duplicates flag | 2023-09-20 20:27:08 -07:00 |
| Cullen Watson | e82eeaa59f | docs: add keep duplicates flag | 2023-09-20 20:25:50 -05:00 |
| Cullen Watson | 644f16b25b | feat: keep duplicates flag | 2023-09-20 20:24:18 -05:00 |
| Cullen Watson | e9ddc6df92 | docs: update tutorial vid for release v0.2.7 | 2023-09-19 22:18:49 -05:00 |
| Cullen Watson | 50fb1c391d | docs: update property schema | 2023-09-19 21:35:37 -05:00 |
12 changed files with 169 additions and 87 deletions

@@ -4,13 +4,19 @@
[![Try with Replit](https://replit.com/badge?caption=Try%20with%20Replit)](https://replit.com/@ZacharyHampton/HomeHarvestDemo)
\
**Not technical?** Try out the web scraping tool on our site at [tryhomeharvest.com](https://tryhomeharvest.com).
*Looking to build a data-focused software product?* **[Book a call](https://calendly.com/zachary-products/15min)** *to work with us.*
Check out another project we wrote: ***[JobSpy](https://github.com/cullenwatson/JobSpy)** a Python package for job scraping*
## Features
- Scrapes properties from **Zillow**, **Realtor.com** & **Redfin** simultaneously
- Aggregates the properties in a Pandas DataFrame
[Video Guide for HomeHarvest](https://www.youtube.com/watch?v=HCoHoiJdWQY)
[Video Guide for HomeHarvest](https://youtu.be/JnV7eR2Ve2o) - _updated for release v0.2.7_
![homeharvest](https://github.com/ZacharyHampton/HomeHarvest/assets/78247585/b3d5d727-e67b-4a9f-85d8-1e65fd18620a)
@@ -37,6 +43,7 @@ By default:
- The `-o` or `--output` default format is `excel`. Options are `csv` or `excel`.
- If `-f` or `--filename` is left blank, the default is `HomeHarvest_<current_timestamp>`.
- If `-p` or `--proxy` is not provided, the scraper uses the local IP.
- Use `-k` or `--keep_duplicates` to keep duplicate properties based on address. If not provided, duplicates will be removed.
### Python
```py
@@ -71,8 +78,9 @@ Required
├── location (str): address in various formats e.g. just zip, full address, city/state, etc.
└── listing_type (enum): for_rent, for_sale, sold
Optional
├── site_name (List[enum], default=all three sites): zillow, realtor.com, redfin
├── site_name (list[enum], default=all three sites): zillow, realtor.com, redfin
├── proxy (str): in format 'http://user:pass@host:port' or [https, socks]
└── keep_duplicates (bool, default=False): whether to keep or remove duplicate properties based on address
```
### Property Schema
@@ -81,7 +89,7 @@ Property
├── Basic Information:
│ ├── property_url (str)
│ ├── site_name (enum): zillow, redfin, realtor.com
│ ├── listing_type (enum: ListingType)
│ ├── listing_type (enum): for_sale, for_rent, sold
│ └── property_type (enum): house, apartment, condo, townhouse, single_family, multi_family, building
├── Address Details:
@@ -92,45 +100,38 @@ Property
│ ├── unit (str)
│ └── country (str)
├── Property Features:
│ ├── price (int)
├── House for Sale Features:
│ ├── tax_assessed_value (int)
│ ├── currency (str)
│ ├── square_feet (int)
│ ├── beds (int)
│ ├── baths (float)
│ ├── lot_area_value (float)
│ ├── lot_area_unit (str)
│ ├── stories (int)
── year_built (int)
── year_built (int)
│ └── price_per_sqft (int)
├── Building for Sale and Apartment Details:
│ ├── bldg_name (str)
│ ├── beds_min (int)
│ ├── beds_max (int)
│ ├── baths_min (float)
│ ├── baths_max (float)
│ ├── sqft_min (int)
│ ├── sqft_max (int)
│ ├── price_min (int)
│ ├── price_max (int)
│ ├── area_min (int)
│ └── unit_count (int)
├── Miscellaneous Details:
│ ├── price_per_sqft (int)
│ ├── mls_id (str)
│ ├── agent_name (str)
│ ├── img_src (str)
│ ├── description (str)
│ ├── status_text (str)
── latitude (float)
│ ├── longitude (float)
│ └── posted_time (str) [Only for Zillow]
── posted_time (str)
── Building Details (for property_type: building):
├── bldg_name (str)
── bldg_unit_count (int)
│ ├── bldg_min_beds (int)
│ ├── bldg_min_baths (float)
│ └── bldg_min_area (int)
└── Apartment Details (for property type: apartment):
├── apt_min_beds: int
├── apt_max_beds: int
├── apt_min_baths: float
├── apt_max_baths: float
├── apt_min_price: int
├── apt_max_price: int
├── apt_min_sqft: int
├── apt_max_sqft: int
── Location Details:
├── latitude (float)
── longitude (float)
```
## Supported Countries for Property Scraping
@@ -144,7 +145,7 @@ The following exceptions may be raised when using HomeHarvest:
- `InvalidSite` - valid options: `zillow`, `redfin`, `realtor.com`
- `InvalidListingType` - valid options: `for_sale`, `for_rent`, `sold`
- `NoResultsFound` - no properties found from your input
- `GeoCoordsNotFound` - if Zillow scraper is not able to create geo-coordinates from the location you input
- `GeoCoordsNotFound` - if Zillow scraper is not able to derive geo-coordinates from the location you input
## Frequently Asked Questions

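The README documents `site_name` as accepting a single site, a list of sites, or nothing (meaning all three sites), and lists `InvalidSite` for unknown names. A minimal sketch of that normalization, assuming a hypothetical helper `normalize_site_names` and a local stand-in for `InvalidSite` (HomeHarvest's own validation code is not part of this diff):

```python
from typing import Optional, Union


class InvalidSite(Exception):
    """Local stand-in for homeharvest.exceptions.InvalidSite."""


VALID_SITES = ("zillow", "redfin", "realtor.com")


def normalize_site_names(site_name: Optional[Union[str, list[str]]] = None) -> list[str]:
    """Hypothetical helper: turn str | list[str] | None into a validated list."""
    if site_name is None:
        # README default: scrape all three sites.
        return list(VALID_SITES)
    names = [site_name] if isinstance(site_name, str) else list(site_name)
    for name in names:
        # Case-insensitive matching is an assumption of this sketch.
        if name.lower() not in VALID_SITES:
            raise InvalidSite(f"{name!r} is not one of: {', '.join(VALID_SITES)}")
    return [name.lower() for name in names]
```

Accepting both a bare string and a list keeps single-site calls such as `site_name="redfin"` (as used in the tests) as short as multi-site ones.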
example.py (new file, 11 lines)

@@ -0,0 +1,11 @@
from homeharvest import scrape_property
import pandas as pd
properties: pd.DataFrame = scrape_property(
site_name=["redfin"],
location="85281",
listing_type="for_rent" # for_sale / sold
)
print(properties)
properties.to_csv('properties.csv', index=False)

@@ -57,6 +57,10 @@ def _get_ordered_properties(result: Property) -> list[str]:
"stories",
"year_built",
"agent_name",
"agent_phone",
"agent_email",
"days_on_market",
"sold_date",
"mls_id",
"img_src",
"latitude",
@@ -84,6 +88,18 @@ def _process_result(result: Property) -> pd.DataFrame:
del prop_data["address"]
if "agent" in prop_data and prop_data["agent"] is not None:
agent_data = prop_data["agent"]
prop_data["agent_name"] = agent_data.name
prop_data["agent_phone"] = agent_data.phone
prop_data["agent_email"] = agent_data.email
del prop_data["agent"]
else:
prop_data["agent_name"] = None
prop_data["agent_phone"] = None
prop_data["agent_email"] = None
properties_df = pd.DataFrame([prop_data])
properties_df = properties_df[_get_ordered_properties(result)]
@@ -119,6 +135,7 @@ def scrape_property(
site_name: Union[str, list[str]] = None,
listing_type: str = "for_sale",
proxy: str = None,
keep_duplicates: bool = False
) -> pd.DataFrame:
"""
Scrape property from various sites from a given location and listing type.
@@ -165,5 +182,6 @@ def scrape_property(
if col not in final_df.columns:
final_df[col] = None
final_df = final_df.drop_duplicates(subset=columns_to_track, keep="first")
if not keep_duplicates:
final_df = final_df.drop_duplicates(subset=columns_to_track, keep="first")
return final_df

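The `keep_duplicates` change above only wraps the existing `drop_duplicates(subset=columns_to_track, keep="first")` call in a condition. The contents of `columns_to_track` are not visible in this hunk, so the stdlib-only sketch below assumes a hypothetical set of address columns and mirrors pandas' `keep="first"` behavior on a list of row dicts:

```python
from typing import Iterable

# Hypothetical address columns; the real `columns_to_track` is not shown in the diff.
ADDRESS_COLUMNS = ("street_address", "unit", "city")


def dedupe_rows(rows: Iterable[dict], keep_duplicates: bool = False,
                subset: tuple[str, ...] = ADDRESS_COLUMNS) -> list[dict]:
    """Drop rows whose address columns repeat an earlier row, unless keep_duplicates is set."""
    rows = list(rows)
    if keep_duplicates:
        # Flag set: every row is returned untouched.
        return rows
    seen: set[tuple] = set()
    unique: list[dict] = []
    for row in rows:
        key = tuple(row.get(col) for col in subset)
        if key not in seen:
            # Like keep="first": the first row with a given address wins.
            seen.add(key)
            unique.append(row)
    return unique
```

Because the first occurrence wins, the order in which site results are concatenated decides which site's copy of a listing survives deduplication.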

@@ -42,11 +42,18 @@ def main():
help="Name of the output file (without extension)",
)
parser.add_argument(
"-k",
"--keep_duplicates",
action="store_true",
help="Keep duplicate properties based on address"
)
parser.add_argument("-p", "--proxy", type=str, default=None, help="Proxy to use for scraping")
args = parser.parse_args()
result = scrape_property(args.location, args.site_name, args.listing_type, proxy=args.proxy)
result = scrape_property(args.location, args.site_name, args.listing_type, proxy=args.proxy, keep_duplicates=args.keep_duplicates)
if not args.filename:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

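`action="store_true"` makes `-k` a boolean switch that defaults to `False`; its value is then forwarded as `keep_duplicates=args.keep_duplicates`. The sketch below reproduces that flag and the README's `HomeHarvest_<current_timestamp>` fallback with a hypothetical, trimmed-down parser. The CLI's positional arguments are not shown in this hunk, so the single `location` positional here is an assumption:

```python
import argparse
import datetime
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical trimmed-down parser; -f, -k and -p mirror the diff and README.
    parser = argparse.ArgumentParser(description="HomeHarvest CLI sketch")
    parser.add_argument("location", type=str, help="Location to scrape")
    parser.add_argument("-f", "--filename", type=str, default=None,
                        help="Name of the output file (without extension)")
    parser.add_argument("-k", "--keep_duplicates", action="store_true",
                        help="Keep duplicate properties based on address")
    parser.add_argument("-p", "--proxy", type=str, default=None,
                        help="Proxy to use for scraping")
    return parser


def output_name(filename: Optional[str], now: Optional[datetime.datetime] = None) -> str:
    # Fall back to HomeHarvest_<YYYYmmdd_HHMMSS>, using the diff's strftime pattern.
    if filename:
        return filename
    now = now or datetime.datetime.now()
    return f"HomeHarvest_{now.strftime('%Y%m%d_%H%M%S')}"
```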

@@ -17,6 +17,7 @@ class Scraper:
self.listing_type = scraper_input.listing_type
self.session = requests.Session()
self.session.headers.update({"user-agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36'})
if scraper_input.proxy:
proxy_url = scraper_input.proxy
proxies = {"http": proxy_url, "https": proxy_url}

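The base `Scraper` maps the same proxy URL to both the `http` and `https` keys because `requests` picks a proxy from that mapping by the scheme of the URL being requested. A small sketch of building the mapping with a sanity check. The helper name and the accepted schemes are assumptions drawn from the README's "[https, socks]" note; SOCKS proxies in `requests` additionally need the `requests[socks]` extra:

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed set of proxy schemes; the socks ones require requests[socks].
ALLOWED_SCHEMES = {"http", "https", "socks4", "socks4a", "socks5", "socks5h"}


def build_proxies(proxy: Optional[str]) -> Optional[dict[str, str]]:
    """Hypothetical helper: requests-style proxies mapping, or None to use the local IP."""
    if not proxy:
        return None
    parts = urlsplit(proxy)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise ValueError(f"unsupported proxy URL: {proxy!r}")
    # Same URL for both keys, as in the diff.
    return {"http": proxy, "https": proxy}
```

A non-`None` result could then be applied to a session with `session.proxies.update(...)`.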

@@ -1,6 +1,7 @@
from dataclasses import dataclass
from enum import Enum
from typing import Tuple
from datetime import datetime
class SiteName(Enum):
@@ -64,6 +65,13 @@ class Address:
zip_code: str | None = None
@dataclass
class Agent:
name: str
phone: str | None = None
email: str | None = None
@dataclass
class Property:
property_url: str
@@ -81,11 +89,11 @@ class Property:
price_per_sqft: int | None = None
mls_id: str | None = None
agent_name: str | None = None
agent: Agent | None = None
img_src: str | None = None
description: str | None = None
status_text: str | None = None
posted_time: str | None = None
posted_time: datetime | None = None
# building for sale
bldg_name: str | None = None
@@ -107,3 +115,6 @@ class Property:
latitude: float | None = None
longitude: float | None = None
sold_date: datetime | None = None
days_on_market: int | None = None

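With `agent_name` replaced by a nested `Agent` object, `_process_result` flattens it back into `agent_name`, `agent_phone` and `agent_email` columns and fills `None` when no agent is present. A self-contained sketch of that flattening using `dataclasses.asdict`, with trimmed-down stand-ins for the `Agent` and `Property` models (only a few fields kept; `flatten_agent` is a hypothetical name):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None


@dataclass
class Property:  # trimmed-down stand-in for the real model
    property_url: str
    agent: Optional[Agent] = None
    days_on_market: Optional[int] = None


def flatten_agent(prop: Property) -> dict:
    """Return a flat row dict with agent_* columns instead of a nested agent."""
    row = asdict(prop)  # asdict also converts the nested Agent into a dict
    agent = row.pop("agent")
    for field_name in ("name", "phone", "email"):
        row[f"agent_{field_name}"] = agent[field_name] if agent else None
    return row
```

Keeping the agent as one object in the model and flattening only at DataFrame time lets scrapers fill whichever contact fields they have (Zillow sets only `name` in this diff) without changing the output columns.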

@@ -8,8 +8,9 @@ import json
from typing import Any
from .. import Scraper
from ....utils import parse_address_two, parse_address_one
from ..models import Property, Address, PropertyType, ListingType, SiteName
from ....exceptions import NoResultsFound
from ..models import Property, Address, PropertyType, ListingType, SiteName, Agent
from ....exceptions import NoResultsFound, SearchTooBroad
from datetime import datetime
class RedfinScraper(Scraper):
@@ -30,6 +31,8 @@ class RedfinScraper(Scraper):
return "6" #: city
elif match_type == "1":
return "address" #: address, needs to be handled differently
elif match_type == "11":
return "state"
if "exactMatch" not in response_json["payload"]:
raise NoResultsFound("No results found for location: {}".format(self.location))
@@ -74,6 +77,8 @@ class RedfinScraper(Scraper):
else:
lot_size = lot_size_data
lat_long = get_value("latLong")
return Property(
site_name=self.site_name,
listing_type=self.listing_type,
@@ -88,15 +93,20 @@ class RedfinScraper(Scraper):
sqft_min=get_value("sqFt"),
sqft_max=get_value("sqFt"),
stories=home["stories"] if "stories" in home else None,
agent_name=get_value("listingAgent"),
agent=Agent( #: listingAgent, some have sellingAgent as well
name=home['listingAgent'].get('name') if 'listingAgent' in home else None,
phone=home['listingAgent'].get('phone') if 'listingAgent' in home else None,
),
description=home["listingRemarks"] if "listingRemarks" in home else None,
year_built=get_value("yearBuilt") if not single_search else home["yearBuilt"],
year_built=get_value("yearBuilt") if not single_search else home.get("yearBuilt"),
lot_area_value=lot_size,
property_type=PropertyType.from_int_code(home.get("propertyType")),
price_per_sqft=get_value("pricePerSqFt"),
price_per_sqft=get_value("pricePerSqFt") if type(home.get("pricePerSqFt")) != int else home.get("pricePerSqFt"),
mls_id=get_value("mlsId"),
latitude=home["latLong"]["latitude"] if "latLong" in home and "latitude" in home["latLong"] else None,
longitude=home["latLong"]["longitude"] if "latLong" in home and "longitude" in home["latLong"] else None,
latitude=lat_long.get('latitude') if lat_long else None,
longitude=lat_long.get('longitude') if lat_long else None,
sold_date=datetime.fromtimestamp(home['soldDate'] / 1000) if 'soldDate' in home else None,
days_on_market=get_value("dom")
)
def _handle_rentals(self, region_id, region_type):
@@ -183,7 +193,7 @@ class RedfinScraper(Scraper):
),
property_url="https://www.redfin.com{}".format(building["url"]),
listing_type=self.listing_type,
unit_count=building["numUnitsForSale"],
unit_count=building.get("numUnitsForSale"),
)
def handle_address(self, home_id: str):
@@ -207,6 +217,9 @@ class RedfinScraper(Scraper):
def search(self):
region_id, region_type = self._handle_location()
if region_type == "state":
raise SearchTooBroad("State searches are not supported, please use a more specific location.")
if region_type == "address":
home_id = region_id
return self.handle_address(home_id)
@@ -220,7 +233,14 @@ class RedfinScraper(Scraper):
url = f"https://www.redfin.com/stingray/api/gis?al=1&region_id={region_id}&region_type={region_type}&sold_within_days=30&num_homes=100000"
response = self.session.get(url)
response_json = json.loads(response.text.replace("{}&&", ""))
homes = [self._parse_home(home) for home in response_json["payload"]["homes"]] + [
self._parse_building(building) for building in response_json["payload"]["buildings"].values()
]
return homes
if "payload" in response_json:
homes_list = response_json["payload"].get("homes", [])
buildings_list = response_json["payload"].get("buildings", {}).values()
homes = [self._parse_home(home) for home in homes_list] + [
self._parse_building(building) for building in buildings_list
]
return homes
else:
return []

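The Redfin scraper strips a `{}&&` prefix from the stingray response before `json.loads`, and the new code reads `homes` and `buildings` with `.get` defaults so a missing key yields an empty result instead of a `KeyError`. The sketch below restates that with hypothetical helper names. It uses `str.removeprefix` (Python 3.9+) so only the leading guard is removed, whereas `replace` would also alter any later occurrence. It also converts the millisecond `soldDate` with an explicit UTC timezone; the diff's `datetime.fromtimestamp(...)` returns a naive local time instead:

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

GUARD = "{}&&"  # prefix the scraper strips before parsing


def parse_stingray(text: str) -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: return (homes, buildings) from a stingray GIS response."""
    data: dict[str, Any] = json.loads(text.removeprefix(GUARD))
    payload = data.get("payload")
    if not payload:
        return [], []
    homes = payload.get("homes", [])
    # "buildings" is a mapping in the diff, hence .values()
    buildings = list(payload.get("buildings", {}).values())
    return homes, buildings


def parse_sold_date(sold_ms: Optional[int]) -> Optional[datetime]:
    # The diff divides soldDate by 1000, i.e. treats it as epoch milliseconds.
    if sold_ms is None:
        return None
    return datetime.fromtimestamp(sold_ms / 1000, tz=timezone.utc)
```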

@@ -9,12 +9,13 @@ import json
from .. import Scraper
from ....utils import parse_address_one, parse_address_two
from ....exceptions import GeoCoordsNotFound, NoResultsFound
from ..models import Property, Address, ListingType, PropertyType
from ..models import Property, Address, ListingType, PropertyType, Agent
class ZillowScraper(Scraper):
def __init__(self, scraper_input):
super().__init__(scraper_input)
self.cookies = None
if not self.is_plausible_location(self.location):
raise NoResultsFound("Invalid location input: {}".format(self.location))
@@ -135,6 +136,7 @@ class ZillowScraper(Scraper):
}
resp = self.session.put(url, headers=self._get_headers(), json=payload)
resp.raise_for_status()
self.cookies = resp.cookies
a = resp.json()
return self._parse_properties(resp.json())
@@ -147,26 +149,26 @@ class ZillowScraper(Scraper):
if "hdpData" in result:
home_info = result["hdpData"]["homeInfo"]
address_data = {
"address_one": parse_address_one(home_info["streetAddress"])[0],
"address_one": parse_address_one(home_info.get("streetAddress"))[0],
"address_two": parse_address_two(home_info["unit"]) if "unit" in home_info else "#",
"city": home_info["city"],
"state": home_info["state"],
"zip_code": home_info["zipcode"],
"city": home_info.get("city"),
"state": home_info.get("state"),
"zip_code": home_info.get("zipcode"),
}
property_obj = Property(
site_name=self.site_name,
address=Address(**address_data),
property_url=f"https://www.zillow.com{result['detailUrl']}",
tax_assessed_value=int(home_info["taxAssessedValue"]) if "taxAssessedValue" in home_info else None,
property_type=PropertyType(home_info["homeType"]),
property_type=PropertyType(home_info.get("homeType")),
listing_type=ListingType(
home_info["statusType"] if "statusType" in home_info else self.listing_type
),
status_text=result.get("statusText"),
posted_time=result["variableData"]["text"]
posted_time=result["variableData"]["text"] #: TODO: change to datetime
if "variableData" in result
and "text" in result["variableData"]
and result["variableData"]["type"] == "TIME_ON_INFO"
and "text" in result["variableData"]
and result["variableData"]["type"] == "TIME_ON_INFO"
else None,
price_min=home_info.get("price"),
price_max=home_info.get("price"),
@@ -198,18 +200,17 @@ class ZillowScraper(Scraper):
site_name=self.site_name,
property_type=PropertyType("BUILDING"),
listing_type=ListingType(result["statusType"]),
img_src=result["imgSrc"],
img_src=result.get("imgSrc"),
address=self._extract_address(result["address"]),
baths_min=result["minBaths"],
baths_min=result.get("minBaths"),
area_min=result.get("minArea"),
bldg_name=result.get("communityName"),
status_text=result["statusText"],
beds_min=result["minBeds"],
price_min=price_value if "+/mo" in result["price"] else None,
price_max=price_value if "+/mo" in result["price"] else None,
latitude=result["latLong"]["latitude"],
longitude=result["latLong"]["longitude"],
unit_count=result["unitCount"],
status_text=result.get("statusText"),
price_min=price_value if "+/mo" in result.get("price") else None,
price_max=price_value if "+/mo" in result.get("price") else None,
latitude=result.get("latLong", {}).get("latitude"),
longitude=result.get("latLong", {}).get("longitude"),
unit_count=result.get("unitCount"),
)
properties_list.append(building_obj)
@@ -238,14 +239,16 @@ class ZillowScraper(Scraper):
return Property(
site_name=self.site_name,
property_url=url,
property_type=PropertyType(property_type),
property_type=PropertyType(property_type) if property_type in PropertyType.__members__ else None,
listing_type=self.listing_type,
address=address,
year_built=property_data.get("yearBuilt"),
tax_assessed_value=property_data.get("taxAssessedValue"),
lot_area_value=property_data.get("lotAreaValue"),
lot_area_unit=property_data["lotAreaUnits"].lower() if "lotAreaUnits" in property_data else None,
agent_name=property_data.get("attributionInfo", {}).get("agentName"),
agent=Agent(
name=property_data.get("attributionInfo", {}).get("agentName")
),
stories=property_data.get("resoFacts", {}).get("stories"),
mls_id=property_data.get("attributionInfo", {}).get("mlsId"),
beds_min=property_data.get("bedrooms"),
@@ -295,21 +298,23 @@ class ZillowScraper(Scraper):
zip_code=zip_code,
)
@staticmethod
def _get_headers():
return {
"authority": "www.zillow.com",
"accept": "*/*",
"accept-language": "en-US,en;q=0.9",
"content-type": "application/json",
"cookie": 'zjs_user_id=null; zg_anonymous_id=%220976ab81-2950-4013-98f0-108b15a554d2%22; zguid=24|%246b1bc625-3955-4d1e-a723-e59602e4ed08; g_state={"i_p":1693611172520,"i_l":1}; zgsession=1|d48820e2-1659-4d2f-b7d2-99a8127dd4f3; zjs_anonymous_id=%226b1bc625-3955-4d1e-a723-e59602e4ed08%22; JSESSIONID=82E8274D3DC8AF3AB9C8E613B38CF861; search=6|1697585860120%7Crb%3DDallas%252C-TX%26rect%3D33.016646%252C-96.555516%252C32.618763%252C-96.999347%26disp%3Dmap%26mdm%3Dauto%26sort%3Ddays%26listPriceActive%3D1%26fs%3D1%26fr%3D0%26mmm%3D0%26rs%3D0%26ah%3D0%26singlestory%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%263dhome%3D0%26commuteMode%3Ddriving%26commuteTimeOfDay%3Dnow%09%0938128%09%7B%22isList%22%3Atrue%2C%22isMap%22%3Atrue%7D%09%09%09%09%09; AWSALB=gAlFj5Ngnd4bWP8k7CME/+YlTtX9bHK4yEkdPHa3VhL6K523oGyysFxBEpE1HNuuyL+GaRPvt2i/CSseAb+zEPpO4SNjnbLAJzJOOO01ipnWN3ZgPaa5qdv+fAki; AWSALBCORS=gAlFj5Ngnd4bWP8k7CME/+YlTtX9bHK4yEkdPHa3VhL6K523oGyysFxBEpE1HNuuyL+GaRPvt2i/CSseAb+zEPpO4SNjnbLAJzJOOO01ipnWN3ZgPaa5qdv+fAki; search=6|1697587741808%7Crect%3D33.37188814545521%2C-96.34484483007813%2C32.260490641365685%2C-97.21001816992188%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26sort%3Ddays%26z%3D1%26listPriceActive%3D1%26fs%3D1%26fr%3D0%26mmm%3D0%26rs%3D0%26ah%3D0%26singlestory%3D0%26housing-connector%3D0%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26featuredMultiFamilyBuilding%3D0%26commuteMode%3Ddriving%26commuteTimeOfDay%3Dnow%09%09%09%7B%22isList%22%3Atrue%2C%22isMap%22%3Atrue%7D%09%09%09%09%09',
"origin": "https://www.zillow.com",
"referer": "https://www.zillow.com",
"sec-ch-ua": '"Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Windows"',
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
def _get_headers(self):
headers = {
'authority': 'www.zillow.com',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-US,en;q=0.9',
'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
}
if self.cookies:
headers['Cookie'] = self.cookies
return headers

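In the rewritten `_get_headers`, `self.cookies` (assigned from `resp.cookies`) is placed directly in the `Cookie` header. `requests` expects header values to be strings, so a cookie jar would need serializing first; `requests.utils.dict_from_cookiejar` turns a jar into a plain dict. The sketch below serializes a name/value mapping into one header string and also guards the building-price check, since `"+/mo" in result.get("price")` raises `TypeError` when `price` is absent. All function names are hypothetical:

```python
from typing import Any, Mapping, Optional


def cookie_header(cookies: Mapping[str, str]) -> str:
    """Serialize name/value pairs into a single Cookie header string."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_headers(cookies: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    # Hypothetical subset of the browser-like headers in the diff.
    headers = {
        "accept-language": "en-US,en;q=0.9",
        "upgrade-insecure-requests": "1",
    }
    if cookies:
        headers["Cookie"] = cookie_header(cookies)
    return headers


def is_monthly_price(result: Mapping[str, Any]) -> bool:
    # Falling back to "" keeps the substring check safe when "price" is missing.
    return "+/mo" in (result.get("price") or "")
```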

@@ -12,3 +12,7 @@ class NoResultsFound(Exception):
class GeoCoordsNotFound(Exception):
"""Raised when no property is found for the given address"""
class SearchTooBroad(Exception):
"""Raised when the search is too broad"""

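The new `SearchTooBroad` exception is raised by `RedfinScraper.search` when the location resolves to a state (match type `"11"`). A stand-alone sketch of that guard, with a local stand-in exception and a hypothetical `resolve_region_type` that covers only the match types visible in the diff (`"1"` for an address, `"11"` for a state); treating every other match type as an ordinary region is an assumption:

```python
class SearchTooBroad(Exception):
    """Local stand-in for homeharvest.exceptions.SearchTooBroad."""


def resolve_region_type(match_type: str) -> str:
    # Only the mappings visible in the diff; everything else is an
    # ordinary region in this sketch (assumption).
    if match_type == "1":
        return "address"
    if match_type == "11":
        return "state"
    return "region"


def check_search_scope(match_type: str) -> str:
    """Reject state-wide searches before any listing request is made."""
    region_type = resolve_region_type(match_type)
    if region_type == "state":
        raise SearchTooBroad("State searches are not supported, please use a more specific location.")
    return region_type
```

Failing fast here avoids requesting an entire state's listings in a single GIS call.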

@@ -1,6 +1,6 @@
[tool.poetry]
name = "homeharvest"
version = "0.2.7"
version = "0.2.15"
description = "Real estate scraping library supporting Zillow, Realtor.com & Redfin."
authors = ["Zachary Hampton <zachary@zacharysproducts.com>", "Cullen Watson <cullen@cullen.ai>"]
homepage = "https://github.com/ZacharyHampton/HomeHarvest"

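The bump from `0.2.7` to `0.2.15` is a case where plain string comparison gives the wrong answer: `"0.2.15" < "0.2.7"` is true because strings compare character by character. A minimal sketch that compares numeric components instead, assuming purely numeric dotted versions (`version_tuple` is a hypothetical name; `packaging.version.Version` is the usual tool for full PEP 440 versions):

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Split a plain dotted release such as '0.2.15' into integers (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


# Tuples compare element by element, so 15 > 7 decides the third component.
is_newer = version_tuple("0.2.15") > version_tuple("0.2.7")
```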

@@ -4,11 +4,13 @@ from homeharvest.exceptions import (
InvalidListingType,
NoResultsFound,
GeoCoordsNotFound,
SearchTooBroad,
)
def test_redfin():
results = [
scrape_property(location="San Diego", site_name="redfin", listing_type="for_sale"),
scrape_property(location="2530 Al Lipscomb Way", site_name="redfin", listing_type="for_sale"),
scrape_property(location="Phoenix, AZ, USA", site_name=["redfin"], listing_type="for_rent"),
scrape_property(location="Dallas, TX, USA", site_name="redfin", listing_type="sold"),
@@ -24,9 +26,10 @@ def test_redfin():
location="abceefg ju098ot498hh9",
site_name="redfin",
listing_type="for_sale",
)
),
scrape_property(location="Florida", site_name="redfin", listing_type="for_rent"),
]
except (InvalidSite, InvalidListingType, NoResultsFound, GeoCoordsNotFound):
except (InvalidSite, InvalidListingType, NoResultsFound, GeoCoordsNotFound, SearchTooBroad):
assert True
assert all([result is None for result in bad_results])

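If, as the hunk suggests, all the bad inputs sit in one list inside a single `try` block, the first call that raises aborts the list, so later cases such as the new `"Florida"` search may never run. A stdlib-only sketch that checks each input separately, using local stand-in exceptions and a hypothetical `fake_scrape` in place of `scrape_property`:

```python
from typing import Callable, Iterable


class SearchTooBroad(Exception):
    """Local stand-in for homeharvest.exceptions.SearchTooBroad."""


class NoResultsFound(Exception):
    """Local stand-in for homeharvest.exceptions.NoResultsFound."""


def fake_scrape(location: str) -> list:
    # Hypothetical scraper used only to exercise the helper below.
    if location == "Florida":
        raise SearchTooBroad(location)
    raise NoResultsFound(location)


def raises_expected(fn: Callable, args: Iterable,
                    expected: tuple[type[BaseException], ...]) -> list[bool]:
    """Call fn once per argument; record whether each call raised one of `expected`.

    Exceptions outside `expected` propagate, so unexpected failures still fail the test.
    """
    outcomes = []
    for arg in args:
        try:
            fn(arg)
        except expected:
            outcomes.append(True)
        else:
            outcomes.append(False)
    return outcomes
```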

@@ -13,6 +13,7 @@ def test_zillow():
scrape_property(location="Phoenix, AZ, USA", site_name=["zillow"], listing_type="for_rent"),
scrape_property(location="Dallas, TX, USA", site_name="zillow", listing_type="sold"),
scrape_property(location="85281", site_name="zillow"),
scrape_property(location="3268 88th st s, Lakewood", site_name="zillow", listing_type="for_rent"),
]
assert all([result is not None for result in results])