Streaming Real Estate Data

Big data in real estate is a prime use case for real-time stream processing, a programming paradigm that lets us respond to data as it arrives. It is the antithesis of batch processing, in which data is collected in full before it is processed. Real estate is a business where time truly is money: if an appraisal is a day late, or the transaction process is inefficient, a deal can collapse, putting the livelihood of all parties involved at stake.

The time-sensitive nature of real estate demands immediacy, whether responding to client enquiries (now typically automated by chatbots), validating big data as it is imported into a data lake, or generating an estimate of property value upon offer request (in the iBuyer space). Real-time streaming and data processing provide the mechanisms for organizations to generate business value from their data and outperform the competition.
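To make the contrast between the two paradigms concrete, here is a minimal Python sketch. The listing records, field names, and the listing_feed helper are all hypothetical stand-ins for a real-time source; the point is only that the streaming version produces an updated answer after every record, while the batch version waits for the full set.

```python
from typing import Iterable, Iterator

# Hypothetical listing records; the field names are illustrative only.
LISTINGS = [
    {"id": 1, "price": 450_000},
    {"id": 2, "price": 610_000},
    {"id": 3, "price": 525_000},
]

def listing_feed(records: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for a real-time source: yields one listing at a time."""
    for record in records:
        yield record

def batch_average(records: Iterable[dict]) -> float:
    """Batch style: collect every record first, then compute once."""
    collected = list(records)
    return sum(r["price"] for r in collected) / len(collected)

def streaming_average(records: Iterable[dict]) -> Iterator[float]:
    """Streaming style: emit an updated average as each record arrives."""
    count, total = 0, 0
    for record in records:
        count += 1
        total += record["price"]
        yield total / count

# One running value per incoming listing; the last one matches the batch result.
running = list(streaming_average(listing_feed(LISTINGS)))
```

Because streaming_average is a generator, a consumer can act on each intermediate value (for example, refreshing a price estimate) without waiting for the feed to finish.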


Multiple Databases with Rails 6 and RDS

Rails 6 shipped with the ability to use multiple databases in one application, making automatic connection switching as simple as adding a connects_to call in the respective class. To go a step further, we’ll set up an Amazon RDS instance. This gives team members consistent access to the same database (which could contain a copy of production data that is useful to test against), avoids per-developer environment configuration, and makes horizontal scaling easier.

AWS offers a free tier for RDS, with 750 hours of db.t2.micro instance usage, 20 GB of General Purpose (SSD) DB Storage, and 20 GB of backup storage for automated database backups. The free tier is available for 12 months from the account creation date.

“The service handles time-consuming database management tasks so you can pursue higher value application development.”


Background Processing with RETS and Sidekiq

Managing large quantities of real estate data is computationally intensive and well suited to background processing. The task involves importing thousands of listings from a RETS database into a Redis in-memory data structure store, geocoding them through an open government API, and associating them with other models. A lot can go wrong, so it’s important to isolate these functions according to the single responsibility principle and separation of concerns.

This is an attempt to find the optimal Heroku Redis setup with regard to concurrency and pool size, while gracefully dealing with Timeout, 429 Too Many Requests, and ERR max number of clients reached errors. I’ve predominantly worked with two libraries that tie neatly into Rails’ ActiveJob: Resque and Sidekiq. My preference leans toward Sidekiq, not only for its sweet karate logo, but also for its creator, who open-sourced the software and charged for Pro features, which allowed him to quit his job.
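The general idea behind handling Timeout and 429 Too Many Requests errors gracefully can be sketched as retrying with exponential backoff. The Python below is a language-neutral illustration, not Sidekiq's or Resque's API: RateLimitError, with_backoff, and flaky_geocode are hypothetical names, and the coordinates are placeholder values.

```python
import time

class RateLimitError(Exception):
    """Hypothetical error raised when an API answers 429 Too Many Requests."""

def with_backoff(task, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run task(); on RateLimitError or TimeoutError wait, then retry.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except (RateLimitError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)

calls = {"count": 0}

def flaky_geocode():
    """Hypothetical geocoding call: rate limited twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"lat": 43.65, "lng": -79.38}  # placeholder coordinates

# Record the delays instead of actually sleeping, to keep the demo fast.
delays = []
result = with_backoff(flaky_geocode, sleep=delays.append)
```

Passing the sleep function in as a parameter keeps the retry logic separate from the waiting itself, which makes it easy to exercise without real delays.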

Machine Learning

Python for Real Estate

“Maintainable code is more important than clever code.”

– Guido van Rossum, creator of Python

During this lockdown, I’ve been spending time taking online courses, specifically in the areas of Data Science, Machine Learning, and Python. My go-to platform right now is Coursera. I managed to complete three university-grade courses within a week for free. As long as you stay within the one-week trial period, you don’t incur any fees.

Over the last 2+ years, I’ve been fully immersed in Ruby on Rails, both for work and personal projects. When building large-scale applications, Rails’ convention over configuration is ideal. Ruby has an incredibly expressive, human-readable syntax, and is truly a joy to program in alongside a team.

Learning Python seemed to be the natural next step toward becoming a well-rounded engineer. In the data-intensive real estate industry, it is the predominant language, with an extensive collection of data-crunching libraries. Zillow, HouseCanary, and Opendoor all use Python as their preferred dynamic language, as machine learning and predictive analytics are central to their business models.

So what is it that makes Python such a powerful language for analyzing real estate data? A simple demonstration can show how little overhead is required to extract insights from large datasets.
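As one possible illustration, the sketch below computes the median price per square foot by area using only the standard library. The rows are made-up sample data and the field names are assumptions; a real analysis would typically load a much larger dataset with a dedicated data library.

```python
from collections import defaultdict
from statistics import median

# Made-up sample rows for illustration; not real market data.
listings = [
    {"area": "Downtown", "price": 520_000, "sqft": 650},
    {"area": "Downtown", "price": 610_000, "sqft": 800},
    {"area": "Suburbs", "price": 450_000, "sqft": 1500},
    {"area": "Suburbs", "price": 480_000, "sqft": 1600},
    {"area": "Suburbs", "price": 510_000, "sqft": 1700},
]

# Group price-per-square-foot values by area.
by_area = defaultdict(list)
for row in listings:
    by_area[row["area"]].append(row["price"] / row["sqft"])

# Reduce each group to its median, rounded to two decimals.
median_ppsf = {area: round(median(values), 2) for area, values in by_area.items()}
```

The grouping and the summary each take only a few lines, which is the kind of low overhead the question above is pointing at.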