Managing large quantities of real estate data is computationally intensive and well suited to background processing. The task involves importing thousands of MLS® listings into Redis, an in-memory data structure store, geocoding each one through an open government API, and associating the results with other models. A lot can go wrong, so it's important to isolate these functions according to the single responsibility principle and separation of concerns.
This is an attempt to find the optimal setup with Heroku Redis in terms of concurrency and pool size, while gracefully handling Timeout, 429 Too Many Requests, and ERR max number of clients reached errors. I've predominantly worked with two libraries that tie neatly into Rails' ActiveJob: Resque and Sidekiq. My preference leans toward Sidekiq, not only for its sweet karate logo but for its creator, who open-sourced the software and charged for Pro features, which eventually allowed him to quit his day job:
“I’ve been working daily for the last 5 years as a solo entrepreneur, building as much value into my commercial products and automating my business as much as possible. It’s time to take a vacation and enjoy my success for a few months — relax and enjoy life while the products sell themselves.”
– Mike Perham
Suffice it to say, that enthusiasm for software engineering and independence is reflected in the product, and it helps that he frequently answers questions on Stack Overflow when you run into issues (there's also a happy hour for support). Sidekiq integrates tightly with ActiveJob, which has worked well so far.
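Before tuning anything, it helps to pin down the connection math: each Sidekiq server process needs roughly concurrency + 5 Redis connections, and every web dyno's client pool adds more, all of which must fit under the Heroku Redis plan's max clients limit (the source of those ERR max number of clients reached errors). A minimal initializer sketch, assuming the REDIS_URL set by the Heroku add-on — the numbers here are illustrative, not this app's actual config:

```ruby
# config/initializers/sidekiq.rb — a sketch, not the original app's config.
# Heroku Redis sets REDIS_URL; smaller plans cap connections (e.g. at 20),
# so keep (concurrency + 5) per worker dyno plus the web pools under that cap.
Sidekiq.configure_server do |config|
  config.redis = { url: ENV['REDIS_URL'], network_timeout: 5 }
end

Sidekiq.configure_client do |config|
  # Web dynos only enqueue jobs, so a small pool is plenty.
  config.redis = { url: ENV['REDIS_URL'], size: 1 }
end
```

Concurrency itself is set per process (e.g. `sidekiq -c 10` or `config/sidekiq.yml`); lowering it is the simplest way to stay under the client cap.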
Jobs within a Job
This one took me a while to figure out. It doesn't make much sense to perform a request to a third-party API outside of a job only to pass the result into a job. That request could Timeout or respond with a 400, and it isn't how background processing is meant to be used. I ended up creating one job that connects to the RETS client using Estately's RETS library, loops over all the records, and queues a new job for every row.
Connect to RETS client:
module Rets
  extend ActiveSupport::Concern

  def connect
    retries = 5
    @client = Rets::Client.new(
      login_url: :endpoint,
      username: :user,
      password: :password,
      version: 'RETS/1.5',
      max_retries: retries
    )
    @client.login
  rescue Timeout::Error => e
    Rails.logger.error(e)
    # Decrement before retrying, otherwise the counter never changes
    # and a persistent timeout retries forever.
    retries -= 1
    retry if retries.positive?
  end

  def disconnect
    @client.logout
  end
end
Import records:
module Import
  class ListingJob < ApplicationJob
    include Rets

    queue_as :priority
    before_perform :connect
    after_perform :disconnect
    sidekiq_options retry: 5

    def perform(**args)
      records = @client.find(
        :all,
        search_type: args[:search_type],
        class: args[:property_class],
        resolve: true
      )
      return if records.blank?

      records.each do |record|
        Insert::ListingJob.perform_later(record)
      end
    rescue StandardError => e
      Rails.logger.error(e)
      Raven.capture_exception(e)
      raise # re-raise so the retry mechanism can kick in
    end
  end
end
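Note that retries only fire if the exception propagates out of perform. Since these jobs inherit from ApplicationJob, ActiveJob's own retry_on (Rails 5.1+) is another option, and it gives you per-exception backoff — handy for rate-limited geocoding calls. A sketch only; GeocodeJob and RateLimitError are hypothetical names, not from this app:

```ruby
# Hypothetical wrapper exception for HTTP 429 responses from the geocoder.
class RateLimitError < StandardError; end

module Import
  class GeocodeJob < ApplicationJob
    queue_as :low

    # Back off exponentially when the geocoding API rate-limits us,
    # and retry transient timeouts a few times before giving up.
    retry_on RateLimitError, wait: :exponentially_longer, attempts: 5
    retry_on Timeout::Error, wait: 30.seconds, attempts: 3

    def perform(listing_id)
      # Geocode the listing via the open government API here.
    end
  end
end
```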
Insert record:
module Insert
  class ListingJob < ApplicationJob
    queue_as :priority

    def perform(record)
      Listings::Create.call(record) if record.present?
    end
  end
end
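The Listings::Create service itself isn't shown in this post; a minimal sketch of such a PORO, assuming a Listing ActiveRecord model keyed by MLS number — the field names in FIELD_MAP are illustrative, since real RETS field names vary by board:

```ruby
module Listings
  class Create
    # Illustrative field map — real RETS field names vary by MLS board.
    FIELD_MAP = {
      'ListingKey' => :mls_number,
      'ListPrice'  => :price,
      'City'       => :city
    }.freeze

    def self.call(record)
      attributes = map_attributes(record)
      return if attributes[:mls_number].nil?

      # Upsert so re-imports update rather than duplicate listings.
      Listing.find_or_initialize_by(mls_number: attributes[:mls_number])
             .update(attributes)
    end

    # Translate a raw RETS row into model attributes.
    def self.map_attributes(record)
      FIELD_MAP.each_with_object({}) do |(rets_key, attr), memo|
        memo[attr] = record[rets_key]
      end
    end
  end
end
```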
Sidekiq then calls a Plain Old Ruby Object (PORO) service to handle the interaction with the database. The operation can be seen through Sidekiq’s sleek dashboard:
Rails.application.routes.draw do
  require 'sidekiq/web'
  require 'sidekiq-scheduler/web'

  mount Sidekiq::Web => '/sidekiq'
end

The jobs can be controlled through the UI, or programmatically through the Rails console, which makes it super easy to manage:
2.7.1 > queue = Sidekiq::Queue.new('priority')
2.7.1 > queue.each do |job|
2.7.1 >   job.delete
2.7.1 > end
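The same API reaches the retry and scheduled sets, which is handy when a bad deploy fills the retry set with thousands of doomed jobs. These calls operate on the live Redis instance, so run them with care:

```ruby
# From the Rails console; sidekiq/api is not loaded by default.
require 'sidekiq/api'

Sidekiq::RetrySet.new.clear      # drop every queued retry
Sidekiq::ScheduledSet.new.clear  # drop every scheduled job
Sidekiq::Stats.new.reset         # zero the processed/failed counters
```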