HomeHarvest is a simple, yet comprehensive, real estate scraping library.
Not technical? Try out the web scraping tool on our site at tryhomeharvest.com.
Looking to build a data-focused software product? Book a call to work with us.
Check out another project we wrote: JobSpy – a Python package for job scraping
Features
- Scrapes properties from Zillow, Realtor.com & Redfin simultaneously
- Aggregates the properties in a Pandas DataFrame
Video Guide for HomeHarvest - updated for release v0.2.7
Installation
pip install homeharvest
Python version >= 3.10 required
Usage
CLI
homeharvest "San Francisco, CA" -s zillow realtor.com redfin -l for_rent -o excel -f HomeHarvest
This will scrape properties from the specified sites for the given location and listing type, and save the results to an Excel file named HomeHarvest.xlsx.
By default:
- If -s or --site_name is not provided, it will scrape from all available sites.
- If -l or --listing_type is left blank, the default is for_sale. Other options are for_rent or sold.
- The -o or --output default format is excel. Options are csv or excel.
- If -f or --filename is left blank, the default is HomeHarvest_<current_timestamp>.
- If -p or --proxy is not provided, the scraper uses the local IP.
- Use -k or --keep_duplicates to keep duplicate properties based on address. If not provided, duplicates will be removed.
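To make the defaults above concrete, here is a minimal argparse sketch that mirrors the documented flags. It is not HomeHarvest's actual CLI source: build_parser and resolve_filename are hypothetical names, and the timestamp format is an assumption, since the README only says the default filename ends in the current timestamp.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults documented above."""
    parser = argparse.ArgumentParser(prog="homeharvest")
    parser.add_argument("location", help="e.g. 'San Francisco, CA' or a zip code")
    # None stands for "scrape all available sites".
    parser.add_argument("-s", "--site_name", nargs="*", default=None)
    parser.add_argument("-l", "--listing_type", default="for_sale",
                        choices=["for_sale", "for_rent", "sold"])
    parser.add_argument("-o", "--output", default="excel", choices=["excel", "csv"])
    parser.add_argument("-f", "--filename", default=None)
    # None stands for "use the local IP".
    parser.add_argument("-p", "--proxy", default=None)
    parser.add_argument("-k", "--keep_duplicates", action="store_true")
    return parser


def resolve_filename(args: argparse.Namespace) -> str:
    """Apply the filename default; the timestamp format here is an assumption."""
    stem = args.filename or f"HomeHarvest_{datetime.now():%Y%m%d_%H%M%S}"
    extension = "xlsx" if args.output == "excel" else "csv"
    return f"{stem}.{extension}"
```

Parsing only a location with this sketch yields listing_type for_sale, output excel, no proxy, and duplicates removed, which matches the list above.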
Python
from homeharvest import scrape_property
import pandas as pd

properties: pd.DataFrame = scrape_property(
    site_name=["zillow", "realtor.com", "redfin"],
    location="85281",
    listing_type="for_rent",  # for_sale / sold
)

# Note: to export to CSV or Excel, use properties.to_csv() or properties.to_excel().
print(properties)
Output
>>> properties.head()
property_url site_name listing_type apt_min_price apt_max_price ...
0 https://www.redfin.com/AZ/Tempe/1003-W-Washing... redfin for_rent 1666.0 2750.0 ...
1 https://www.redfin.com/AZ/Tempe/VELA-at-Town-L... redfin for_rent 1665.0 3763.0 ...
2 https://www.redfin.com/AZ/Tempe/Camden-Tempe/a... redfin for_rent 1939.0 3109.0 ...
3 https://www.redfin.com/AZ/Tempe/Emerson-Park/a... redfin for_rent 1185.0 1817.0 ...
4 https://www.redfin.com/AZ/Tempe/Rio-Paradiso-A... redfin for_rent 1470.0 2235.0 ...
[5 rows x 41 columns]
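Because the result is an ordinary Pandas DataFrame, the usual filtering and export calls apply. The sketch below uses a small stand-in DataFrame with a few of the columns shown in the output above; the URLs and the budget value are illustrative, not scraped data.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_property(); values are illustrative.
properties = pd.DataFrame(
    {
        "property_url": ["https://example.com/listing-1", "https://example.com/listing-2"],
        "site_name": ["redfin", "redfin"],
        "listing_type": ["for_rent", "for_rent"],
        "apt_min_price": [1666.0, 1185.0],
        "apt_max_price": [2750.0, 1817.0],
    }
)

# Keep rentals whose lowest advertised price is under a budget, cheapest first.
budget = 1500
affordable = (
    properties[properties["apt_min_price"] < budget]
    .sort_values("apt_min_price")
    .reset_index(drop=True)
)

# to_csv() with no path returns the CSV text; pass a filename to write a file instead.
csv_text = affordable.to_csv(index=False)
print(csv_text)
```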
Parameters for scrape_property()
Required
├── location (str): address in various formats e.g. just zip, full address, city/state, etc.
└── listing_type (enum): for_rent, for_sale, sold
Optional
├── site_name (list[enum], default=all three sites): zillow, realtor.com, redfin
├── proxy (str): proxy URL in the format 'http://user:pass@host:port'; the scheme may also be https or socks
└── keep_duplicates (bool, default=False): whether to keep or remove duplicate properties based on address
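As a rough illustration of what removing duplicates based on address can look like, the sketch below drops rows that share the same address columns. It is not the library's implementation: ADDRESS_COLUMNS and drop_duplicate_addresses are hypothetical names, and which columns HomeHarvest actually compares is an assumption. Exact matching is also a simplification, since different sites may format the same address differently.

```python
import pandas as pd

# Assumption: these columns together identify an address; the library's real key may differ.
ADDRESS_COLUMNS = ["street_address", "unit", "city", "state", "zip_code"]


def drop_duplicate_addresses(properties: pd.DataFrame) -> pd.DataFrame:
    """Keep the first row seen for each address and renumber the index."""
    deduplicated = properties.drop_duplicates(subset=ADDRESS_COLUMNS, keep="first")
    return deduplicated.reset_index(drop=True)


# Illustrative data: the same address reported by two sites, plus one other listing.
listings = pd.DataFrame(
    {
        "site_name": ["zillow", "redfin", "redfin"],
        "street_address": ["1 Main St", "1 Main St", "2 Oak Ave"],
        "unit": ["", "", ""],
        "city": ["Tempe", "Tempe", "Tempe"],
        "state": ["AZ", "AZ", "AZ"],
        "zip_code": ["85281", "85281", "85281"],
    }
)
unique_listings = drop_duplicate_addresses(listings)
```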
Property Schema
Property
├── Basic Information:
│ ├── property_url (str)
│ ├── site_name (enum): zillow, redfin, realtor.com
│ ├── listing_type (enum): for_sale, for_rent, sold
│ └── property_type (enum): house, apartment, condo, townhouse, single_family, multi_family, building
├── Address Details:
│ ├── street_address (str)
│ ├── city (str)
│ ├── state (str)
│ ├── zip_code (str)
│ ├── unit (str)
│ └── country (str)
├── House for Sale Features:
│ ├── tax_assessed_value (int)
│ ├── lot_area_value (float)
│ ├── lot_area_unit (str)
│ ├── stories (int)
│ ├── year_built (int)
│ └── price_per_sqft (int)
├── Building for Sale and Apartment Details:
│ ├── bldg_name (str)
│ ├── beds_min (int)
│ ├── beds_max (int)
│ ├── baths_min (float)
│ ├── baths_max (float)
│ ├── sqft_min (int)
│ ├── sqft_max (int)
│ ├── price_min (int)
│ ├── price_max (int)
│ ├── area_min (int)
│ └── unit_count (int)
├── Miscellaneous Details:
│ ├── mls_id (str)
│ ├── agent_name (str)
│ ├── img_src (str)
│ ├── description (str)
│ ├── status_text (str)
│ └── posted_time (str)
└── Location Details:
    ├── latitude (float)
    └── longitude (float)
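If you only need a handful of these fields downstream, a small typed wrapper can make the rows easier to work with. The dataclass below is a hypothetical convenience, not part of HomeHarvest; it covers a subset of the schema above and ignores every other column.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class PropertySummary:
    """Hypothetical typed view over a few columns from the schema above."""

    property_url: str
    site_name: str
    listing_type: str
    street_address: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "PropertySummary":
        # Keep only the keys this class declares; other schema columns are ignored.
        known = {field.name for field in fields(cls)}
        return cls(**{key: value for key, value in row.items() if key in known})
```

With a scraped DataFrame, a row could be converted with PropertySummary.from_row(properties.iloc[0].to_dict()).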
Supported Countries for Property Scraping
- Zillow: contains listings in the US & Canada
- Realtor.com: mainly from the US but also has international listings
- Redfin: listings mainly in the US and Canada, and has expanded to some areas in Mexico
Exceptions
The following exceptions may be raised when using HomeHarvest:
- InvalidSite - valid options: zillow, redfin, realtor.com
- InvalidListingType - valid options: for_sale, for_rent, sold
- NoResultsFound - no properties found from your input
- GeoCoordsNotFound - if the Zillow scraper is not able to derive geo-coordinates from the location you input
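One way to handle the data-related exceptions is to treat them as "no data" and carry on. The helper below is a generic sketch: scrape_or_none is a hypothetical name, and the scraping function and exception classes are passed in as arguments because this README does not state where the exception classes are imported from.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def scrape_or_none(
    scrape: Callable[..., T],
    recoverable: Tuple[Type[Exception], ...],
    **kwargs,
) -> Optional[T]:
    """Call scrape(**kwargs); return None if one of the recoverable exceptions is raised."""
    try:
        return scrape(**kwargs)
    except recoverable as exc:
        # Report which exception occurred and for which location, then continue.
        print(f"No data for {kwargs.get('location')!r}: {type(exc).__name__}: {exc}")
        return None
```

With HomeHarvest you would pass scrape_property as the first argument and a tuple such as (NoResultsFound, GeoCoordsNotFound) as the second, once you have imported those classes from the package. InvalidSite and InvalidListingType point to a mistake in the arguments, so it is usually better to let them propagate.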
Frequently Asked Questions
Q: Encountering issues with your queries?
A: Try a single site and/or broaden the location. If problems persist, submit an issue.
Q: Received a Forbidden 403 response code?
A: This indicates that you have been blocked by the real estate site for sending too many requests. Currently, Zillow is particularly aggressive with blocking. We recommend:
- Waiting a few seconds between requests.
- Trying a VPN to change your IP address.
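The first recommendation can be sketched as a small loop that pauses between consecutive calls. scrape_spaced is a hypothetical helper, and the 3-second default is only an illustrative value for "a few seconds"; the sleep function is a parameter so the pause can be swapped out in tests.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def scrape_spaced(
    locations: Iterable[str],
    scrape_one: Callable[[str], T],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call scrape_one for each location, pausing between consecutive calls."""
    results: List[T] = []
    for index, location in enumerate(locations):
        if index > 0:
            sleep(delay_seconds)  # pause only between requests, not before the first
        results.append(scrape_one(location))
    return results
```

For example, scrape_spaced(["85281", "85282"], lambda loc: scrape_property(location=loc, listing_type="for_rent")) would run the two scrapes with a pause in between.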