OCT 13, 2019
By Brad Apps

Queuing in Software

I am an enthusiastic advocate of queuing due to the benefits I’ve seen it bring to ECConnect's technology stack.

Queuing, in the realm of software, is a mechanism for making a software application asynchronous. Making software asynchronous means splitting up large business processes and operations, or decoupling parts or modules of an application. These smaller operations or tasks that the software needs to perform are referred to as ‘jobs’ or ‘messages’. A ‘producer’ is a function or part of the software which creates the jobs. Jobs are stored in the queue, and the queue ‘pollers’ (sometimes referred to as ‘workers’) pick up the jobs and send them to ‘consumers’, which process them.
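
The roles described above can be sketched in a few lines of Python. This is a minimal in-memory illustration and not ECConnect's implementation: it uses the standard library's queue.Queue, whereas the engine described in this article stores its jobs in a database. The function names and job fields are assumptions made for the example.

```python
import queue
import threading


def producer(job_queue, service_ids):
    """Create one job per service and store it in the queue."""
    for service_id in service_ids:
        job_queue.put({"type": "renew", "service_id": service_id})


def consumer(job, results, lock):
    """Process a single job; here it only records that the work was done."""
    with lock:
        results.append(f"renewed service {job['service_id']}")


def poller(job_queue, results, lock):
    """Pick jobs up from the queue and hand them to the consumer."""
    while True:
        job = job_queue.get()
        try:
            if job is None:  # sentinel: no more work for this poller
                return
            consumer(job, results, lock)
        finally:
            job_queue.task_done()


def run_demo(service_ids, poller_count=2):
    """Wire the pieces together and return what the consumers produced."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=poller, args=(job_queue, results, lock))
        for _ in range(poller_count)
    ]
    for thread in threads:
        thread.start()
    producer(job_queue, service_ids)
    for _ in threads:
        job_queue.put(None)  # one sentinel per poller
    for thread in threads:
        thread.join()
    return sorted(results)
```

Calling run_demo([101, 102, 103]) runs two pollers over three jobs. The producer never waits for a job to finish, which is the decoupling the paragraph above describes.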

By splitting up a business process into multiple 'jobs', rather than performing multiple functions within a single execution, we make the individual steps easier to test, debug, evolve and scale. Breaking applications into smaller blocks suits modern software architecture and results in simpler, more manageable software platforms. It also helps to increase performance, reliability, fault tolerance and scalability.

Various ECConnect software products utilise a queuing architecture. ECConnect uses a fast database, tuned specifically for the job, which stores jobs waiting to be handled as well as jobs which succeeded or failed, together with their results. This provides transparency and reporting ability, giving the team a granular, in-depth view of the relevant processes.

The ECConnect queuing system also provides the following features:

  • Multi-threading
  • Categorising the jobs into different types
  • Pausing of a job type
  • Scheduling jobs to run at a specified time in the future
  • Retrying jobs
  • Setting the priority for processing based on the job type
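
As a rough illustration of how three of these features (job types with priorities, scheduling, and pausing) might fit together, here is a small in-memory sketch. The class name JobQueue, its methods, and the rule that a lower number means higher priority are all assumptions for the example; the article does not describe how the ECConnect engine implements them.

```python
import heapq
import itertools


class JobQueue:
    """Hypothetical in-memory queue ordered by job-type priority, then due time."""

    def __init__(self, priorities):
        self._priorities = priorities      # job type -> priority (lower runs first)
        self._heap = []
        self._counter = itertools.count()  # tie-breaker, keeps insertion order
        self._paused = set()

    def add(self, job_type, payload, run_at=0.0):
        """Queue a job; run_at is the earliest time it may be processed."""
        entry = (self._priorities[job_type], run_at, next(self._counter),
                 job_type, payload)
        heapq.heappush(self._heap, entry)

    def pause(self, job_type):
        self._paused.add(job_type)

    def resume(self, job_type):
        self._paused.discard(job_type)

    def next_job(self, now):
        """Return the best job that is due and not paused, or None."""
        skipped = []
        found = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, run_at, _, job_type, payload = entry
            if job_type in self._paused or run_at > now:
                skipped.append(entry)  # not eligible yet, keep it for later
                continue
            found = (job_type, payload)
            break
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return found
```

A poller would call next_job with the current time. Paused or future-dated jobs stay in the queue untouched until they become eligible.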

Queuing is especially important when communicating with external APIs. These could be facilities provided by other parties (payment gateways, address validation, credit checking, etc.) or systems hosted elsewhere, in the cloud or in remote data centres. It is important to make these calls asynchronous to mitigate delays, latency and outages, so that they do not affect an entire business process.

I'll provide an example of a commonly used business process in the ECConnect systems for MVNOs to demonstrate the benefits we gained by splitting apart a large business process. The particular process is the monthly renewals of a prepaid mobile product, such as a $25 product giving Unlimited Calls and SMS and 5 GB of data, which lasts 30 days. Every 30 days this product needs to be automatically renewed.

The process is as follows:

  • Check if the service has any credit on the carrier which could be used for the renewal
  • If not, process a credit card payment with the payment provider
  • If successful, create an invoice PDF
  • Then, renew the product with the carrier
  • Lastly, send an email and/or SMS to the customer

Initially this business process was synchronous, meaning everything was done as a single execution. Delays or outages with the carrier or payment provider would cause the entire process to stop. Each execution was also lengthy, and as the number of renewals grew, the full run took many hours to complete.

We then implemented multi-threading of the process, giving us the ability to run several of these monolithic business processes at the same time. This solved the problem of the total time taken to process all of the renewals. However, it didn't solve the issue of external provider delays or outages.

The ECConnect team then split apart the steps into smaller pieces and utilised the queuing mechanism which has in-built multi-threading. The process was split into these steps:

  • Check
  • Pay and invoice
  • Renew
  • Message

Each step is multi-threaded and individually processed. Because the queuing system allows pausing of a job type, if the payment provider has an outage we can pause that specific job type. The system can still process the first step (and the final steps for services not needing a payment), so the whole process is not broken or delayed.
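
The effect of pausing one job type can be shown with a simplified, single-threaded sketch of the four steps. Everything here is illustrative: the step names follow the list above, but the has_credit field, the function names and the in-memory queues are assumptions, and the real steps would call the carrier and payment provider.

```python
from collections import deque

STEPS = ("check", "pay_and_invoice", "renew", "message")


def next_step(step, service):
    """Decide which job type follows the one that just completed."""
    if step == "check":
        # Services with carrier credit skip the payment step entirely.
        return "renew" if service["has_credit"] else "pay_and_invoice"
    if step == "pay_and_invoice":
        return "renew"
    if step == "renew":
        return "message"
    return None  # "message" is the final step


def run_pipeline(services, paused=()):
    """Process every job that is not blocked by a paused job type.

    Returns the ids that finished all steps and the ids still waiting,
    grouped by the step they are waiting on.
    """
    queues = {step: deque() for step in STEPS}
    queues["check"].extend(services)
    completed = []
    progressed = True
    while progressed:
        progressed = False
        for step in STEPS:
            if step in paused:
                continue  # jobs of a paused type simply stay queued
            while queues[step]:
                service = queues[step].popleft()
                following = next_step(step, service)
                if following is None:
                    completed.append(service["id"])
                else:
                    queues[following].append(service)
                progressed = True
    waiting = {step: [s["id"] for s in q] for step, q in queues.items() if q}
    return completed, waiting
```

With "pay_and_invoice" paused, a service that has carrier credit still runs through check, renew and message, while a service needing payment waits in the payment queue until the job type is resumed.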

A "quick win" of jobs remaining in the queue, flagged as successful or failed, is that failed jobs can easily be reset to be processed again. When our client asked us to implement a retry for failed payments, the team was easily able to leverage the scheduling feature of the queuing engine.
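
A retry built on scheduling could look something like the sketch below. The Job fields, the one-day delay and the limit of three attempts are assumptions chosen for the example; the article does not state the retry policy that was actually used.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical job record as it might be kept in the queue."""
    job_id: int
    job_type: str
    status: str          # "pending", "succeeded" or "failed"
    attempts: int = 0    # how many times the job has been tried so far
    run_at: float = 0.0  # earliest time (epoch seconds) the job may run


def schedule_retry(job: Job, now: float,
                   delay_seconds: float = 86400.0,
                   max_attempts: int = 3) -> Optional[Job]:
    """Reset a failed job to pending, scheduled for a later time.

    Returns None when the job has not failed or has used up its attempts.
    """
    if job.status != "failed" or job.attempts >= max_attempts:
        return None
    return replace(job, status="pending", run_at=now + delay_seconds)
```

Because the failed job is still in the queue with its status recorded, retrying needs nothing more than a status reset and a new scheduled time.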

A queuing system should be reliable and must never process a job more than once. The ECConnect queuing engine achieves this by leveraging the database's locking mechanism for updates: we update the next job to be processed instead of selecting it and then updating it. The latter approach could return the same record to more than one poller if multiple database queries run at the same time.
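
The "update instead of select then update" idea can be sketched as follows. The article does not name the database ECConnect uses, so SQLite (via Python's sqlite3 module) stands in here, and the table layout, the claimed_by column and the function names are assumptions for illustration. Each poller marks the next pending job with its own unique token in a single UPDATE statement, then reads back the row carrying that token.

```python
import sqlite3
import uuid


def create_queue(conn):
    """Create a hypothetical jobs table for the example."""
    conn.execute(
        """CREATE TABLE jobs (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               claimed_by TEXT
           )"""
    )


def claim_next_job(conn):
    """Claim the oldest pending job, or return None if there is none.

    The claim is one UPDATE statement, so the database's write locking
    decides which poller gets the row; there is no gap between choosing
    the job and marking it as taken.
    """
    token = uuid.uuid4().hex  # unique to this claim attempt
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            """UPDATE jobs
                  SET status = 'processing', claimed_by = ?
                WHERE id = (SELECT id FROM jobs
                             WHERE status = 'pending'
                             ORDER BY id
                             LIMIT 1)""",
            (token,),
        )
        if cursor.rowcount == 0:
            return None
        return conn.execute(
            "SELECT id, payload FROM jobs WHERE claimed_by = ?", (token,)
        ).fetchone()
```

Other databases offer their own ways to express the same single-statement claim, so the exact SQL would differ from this SQLite version.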

To control the size of our queuing database, we implemented automated scripts which run daily to carry out archiving, leaving only a set number of days of records in the tables. This proactively manages growth and keeps the database performing well.
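
A daily archiving script of this kind might look like the sketch below. It again uses SQLite as a stand-in, and the table names, the finished_at column and the 30-day default are assumptions. Whether ECConnect moves old records to an archive table or removes them another way is not stated in this article.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def create_tables(conn):
    """Create hypothetical live and archive tables with the same columns."""
    for table in ("jobs", "jobs_archive"):
        conn.execute(
            f"""CREATE TABLE {table} (
                    id INTEGER PRIMARY KEY,
                    status TEXT NOT NULL,
                    finished_at REAL
                )"""
        )


def archive_old_jobs(conn, now, keep_days=30):
    """Move finished jobs older than keep_days into the archive table.

    Pending jobs are never archived. Returns the number of rows moved.
    """
    cutoff = now - keep_days * SECONDS_PER_DAY
    condition = "status IN ('succeeded', 'failed') AND finished_at < ?"
    with conn:  # copy and delete succeed or fail together
        conn.execute(
            f"INSERT INTO jobs_archive SELECT * FROM jobs WHERE {condition}",
            (cutoff,),
        )
        cursor = conn.execute(f"DELETE FROM jobs WHERE {condition}", (cutoff,))
    return cursor.rowcount
```

Running the copy and the delete in one transaction means a failure part-way through cannot lose records or leave duplicates behind.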

By utilising a queuing methodology and implementing a mechanism to facilitate this, ECConnect has achieved great scalability and transparency into relevant business processes. Since implementing the queuing engine into our software the team has been able to migrate numerous legacy processes which now benefit from the system, as well as build new and innovative capabilities using the technology.

Don't hesitate to reach out to find out more or see how ECConnect can help your business.

Brad Apps
Founder, CEO & Director