Categories
Programming

Transferring Data in Postgres

“We believe that databases need to excel at more than simple selects to be useful for complex tasks, and our positive experiences with PostgreSQL has done nothing but reinforce that philosophy.”

– David McNett

Migrating data rarely comes without headaches and the odd “oh shit” moment, but once you get the hang of it, you gain a great deal of control.

Postgres comes with two utilities that make it simple to extract a database and restore it to another destination, with flexible options to select which parts of the data you want restored:

  • pg_dump is a utility for consistent back-ups of a PostgreSQL database, even if the database is being used concurrently.
  • pg_restore is a utility for restoring a PostgreSQL database from an archive created by pg_dump in one of the non-plain-text formats.
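
As a rough sketch of how the two utilities fit together, the Ruby helpers below only build the argument lists for a dump in the custom archive format and a selective restore. The database names, archive path, and table name in the usage comment are hypothetical placeholders, and nothing is executed until you pass the result to `system`.

```ruby
# Build the pg_dump argument list for a custom-format archive,
# which is one of the non-plain-text formats pg_restore can read.
def pg_dump_command(database:, archive:)
  ["pg_dump", "--format=custom", "--file=#{archive}", database]
end

# Build the pg_restore argument list. Passing table names restores
# only those tables; an empty list restores the whole archive.
def pg_restore_command(database:, archive:, tables: [])
  cmd = ["pg_restore", "--dbname=#{database}", "--no-owner"]
  tables.each { |table| cmd << "--table=#{table}" }
  cmd << archive
end

# Usage (hypothetical names; requires the PostgreSQL client tools):
#   system(*pg_dump_command(database: "app_production", archive: "app.dump"), exception: true)
#   system(*pg_restore_command(database: "app_development", archive: "app.dump", tables: ["listings"]), exception: true)
```

Passing the command as an array rather than a single string avoids shell interpolation of database or file names.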

JWT Authentication

Implementing authentication for third-party access is no small feat, but it is imperative in order to compete in a complex API economy and expand business capabilities. With every application, securing protected resources poses a unique challenge, particularly considering how rapidly technology evolves. New solutions come along and customers expect a level of consistency across apps, which is important to keep in mind for reducing friction (and generating revenue).

The OAuth 2.0 protocol is the industry standard for authorization. It focuses on client/developer simplicity, and enables secure access for desktop and mobile applications. Nearly everyone has come across it through Single Sign-On (SSO) options from companies like Google, Apple, or LinkedIn, which keep you logged in across all of their products.
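
To make the JWT part of the title concrete, here is a minimal sketch of signing and verifying a token with only the Ruby standard library. It assumes an HS256 token with a shared secret and an optional `exp` claim. The method names are my own for illustration, and a production app would more likely rely on a maintained JWT library than hand-rolled code like this.

```ruby
require "openssl"
require "json"

# Base64url without padding, as used by the three JWT segments.
def b64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

def b64url_decode(str)
  padded = str + "=" * ((4 - str.length % 4) % 4)
  padded.tr("-_", "+/").unpack1("m0")
end

# Compare two strings without returning early on the first mismatch.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Sign header.payload with HMAC-SHA256 and append the signature.
def encode_jwt(payload, secret)
  header = { alg: "HS256", typ: "JWT" }
  signing_input = "#{b64url(JSON.generate(header))}.#{b64url(JSON.generate(payload))}"
  signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)
  "#{signing_input}.#{b64url(signature)}"
end

# Return the payload hash, or nil if the token is malformed,
# carries a bad signature, or has expired.
def decode_jwt(token, secret, now: Time.now.to_i)
  header_b64, payload_b64, signature_b64 = token.split(".")
  return nil unless header_b64 && payload_b64 && signature_b64

  expected = b64url(OpenSSL::HMAC.digest("SHA256", secret, "#{header_b64}.#{payload_b64}"))
  return nil unless secure_equal?(expected, signature_b64)

  payload = JSON.parse(b64url_decode(payload_b64))
  return nil if payload["exp"] && payload["exp"] < now
  payload
end
```

The signature is checked before the payload is parsed, so a tampered token is rejected without its contents ever being trusted.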


Multiple Databases with Rails 6 and RDS

Rails 6 shipped with the ability to use multiple databases in one application, making automatic connection switching as simple as adding a connects_to call in the respective class. To go a step further, we’ll set up an Amazon RDS instance. It gives team members consistent access to the same database, which could contain a copy of production data that is useful to test against. It also avoids development environment configuration and improves horizontal scaling.
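
For reference, a connects_to declaration might look like the fragment below. It is a sketch that only makes sense inside a Rails 6 app, and it assumes config/database.yml defines entries named primary and primary_replica; those names, and the Listing model in the comment, are placeholders rather than anything from this setup.

```ruby
# app/models/application_record.rb (Rails 6+)
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Assumed database.yml entries: `primary` (writer) and
  # `primary_replica` (read replica, for example an RDS replica).
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Elsewhere in the app, a block can be pinned to the reading role:
#   ActiveRecord::Base.connected_to(role: :reading) do
#     Listing.count # hypothetical model
#   end
```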

AWS offers a free tier for RDS, with 750 hours of db.t2.micro instance usage, 20 GB of General Purpose (SSD) DB Storage, and 20 GB of backup storage for automated database backups. The free tier is available for 12 months from the account creation date.

“The service handles time-consuming database management tasks so you can pursue higher value application development.”

– AWS

AWS Lambda Functions for Python and Ruby

I love to program in Ruby as well as Python, and AWS Lambda functions provide the perfect way to combine both languages’ capabilities without additional server configuration. The aws-sdk-lambda library makes the serverless workflow dead simple by providing a gateway to a Lambda function that runs Python code and returns the results to a Rails application.

“Run code without thinking about servers. Pay only for the compute time you consume.”

– AWS

The primary motive for integrating Python is that it provides a rich set of machine learning tools to analyze real estate data. Much of what I wanted to accomplish could have been done with one script in a Jupyter Notebook, but integrating that functionality at scale would require a lot of overhead and building another API.
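
On the Ruby side, the call into Lambda could be sketched roughly like this. The helper takes the client as an argument, so it works with a real `Aws::Lambda::Client` from the aws-sdk-lambda gem or with a stub in tests. The function name and payload in the usage comment are hypothetical, and I am assuming the Python function returns a JSON document.

```ruby
require "json"

# Invoke a Lambda function synchronously and parse its JSON reply.
# `client` is expected to respond to #invoke the way
# Aws::Lambda::Client does.
def invoke_lambda(client, function_name, payload)
  response = client.invoke(
    function_name: function_name,
    invocation_type: "RequestResponse", # wait for the result
    payload: JSON.generate(payload)
  )
  # function_error is set when the function itself raised.
  raise "Lambda error: #{response.function_error}" if response.function_error

  JSON.parse(response.payload.string)
end

# Usage (assumes the aws-sdk-lambda gem and configured credentials):
#   require "aws-sdk-lambda"
#   client = Aws::Lambda::Client.new(region: "us-east-1")
#   invoke_lambda(client, "estimate_price", { "sqft" => 900 })
```

Injecting the client keeps the helper easy to exercise without touching AWS.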

AWS does much of the heavy lifting, such as server provisioning and management, which can be monitored through their web interface.


Background Processing with RETS and Sidekiq

Managing large quantities of real estate data is computationally intensive, and well suited for background processing. The task involves importing thousands of listings from a RETS database into a Redis in-memory data structure store, using an open government API for geocoding, and associating the results with other models. A lot can go wrong, so it’s important to isolate these functions according to the single responsibility principle and separation of concerns.

This is an attempt to find the optimal Heroku Redis setup with regard to concurrency and pool size, while gracefully dealing with Timeout, 429 Too Many Requests, and ERR max number of clients reached errors. I’ve predominantly worked with two libraries that tie perfectly into Rails’ Active Job: Resque and Sidekiq. My preference leans toward Sidekiq, not only for its sweet karate logo, but also for its creator, who open-sourced the software and charged money for Pro features, which allowed him to quit his job.
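
One way to deal with timeouts and rate limiting is to retry with exponential backoff. The helper below is a plain-Ruby illustration of that idea, not Sidekiq's own retry mechanism, which already reschedules failed jobs. The `RateLimited` error class is hypothetical and stands in for whatever a geocoding client raises on a 429 response.

```ruby
require "timeout"

# Hypothetical error raised by an API client on 429 Too Many Requests.
class RateLimited < StandardError; end

# Run the block, retrying on rate limits and timeouts with delays of
# base_delay, 2 * base_delay, 4 * base_delay, and so on. The sleeper
# is injectable so the waiting can be skipped in tests.
def with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue RateLimited, Timeout::Error
    raise if attempt >= max_attempts # give up and surface the error

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage (geocode is a hypothetical method):
#   with_backoff { geocode(listing.address) }
```

Only transient errors are retried; anything else propagates immediately so real bugs are not masked.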


Ruby 2.7.0

As per tradition, a new version of Ruby was released on Christmas Day. Out of the 4,190 file changes since 2.6.0, Ruby 2.7.0 introduces some notable improvements.
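
The excerpt does not say which improvements it has in mind, so the following is only my own sampling of features that arrived in 2.7: `Enumerable#tally`, `filter_map`, numbered block parameters, beginless ranges, and pattern matching (experimental in 2.7, so that version prints a warning). The listing data is made up for illustration.

```ruby
# Enumerable#tally counts occurrences of each element.
cities = %w[Austin Denver Austin Boise Denver Austin]
counts = cities.tally

# filter_map combines select and map; _1 is the first block argument.
squares_of_evens = (1..6).filter_map { _1 * _1 if _1.even? }

# Beginless range: everything up to and including 300_000.
budget = (..300_000)

# Pattern matching with case/in destructures a hash by shape and type.
def describe(listing)
  case listing
  in { price: Integer => price, city: String => city }
    "#{city}: $#{price}"
  else
    "unknown"
  end
end

puts counts.inspect
puts squares_of_evens.inspect
puts budget.cover?(250_000)
puts describe({ price: 250_000, city: "Austin" })
```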