We successfully optimized a customer's processing time from over 14 hours to under an hour. Allow me to share the details of how we achieved this.
The system
The system receives data from external systems; that data can be categorized into different dimensions for analysis. The analysis process involves externalizing the important formulas, giving users better access to the data, and applying various algorithms to extract information from the dimensions generated for each record.
In terms of implementation, the system's back end is built in Java, with MongoDB as the storage layer. The JVM allows embedding additional languages, such as JavaScript, Groovy, and Python, which we used to let users specify formulas.
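For illustration, a formula written in JavaScript can be evaluated from Java through the standard javax.script API. This is a minimal, hypothetical sketch rather than the actual implementation, and it assumes a JavaScript engine (such as GraalJS or Nashorn) is available on the classpath:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.util.Map;

// Hypothetical sketch: evaluate a user-supplied formula against a record's fields.
public class FormulaEvaluator {
    private final ScriptEngine engine =
            new ScriptEngineManager().getEngineByName("JavaScript");

    public Object evaluate(String formula, Map<String, Object> record)
            throws ScriptException {
        // Expose each record field as a variable the formula can reference,
        // e.g. evaluate("price * quantity", record).
        record.forEach(engine::put);
        return engine.eval(formula);
    }
}
```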
The problem
As we examined the system, we identified several issues contributing to significant delays in execution time. These issues can be broadly categorized as follows:
- The process consumed excessive memory, causing the system to slow down.
- The system processed large datasets far from the storage, further compounding the problem.
These issues are particularly challenging when they occur together, resulting in even longer execution times.
A memory-hungry, slow execution process
The polyglot system supports multiple programming languages, allowing users to write formulas in their language of choice, such as plain JavaScript or MS Excel-style formulas. This provides great flexibility, as people with different skill sets can express models in the language they are comfortable with.
However, this feature places an extra load on the system, as it has to support multiple execution runtimes within a single process.
Processing large datasets away from storage
The second aspect of the problem surfaced from the solution to the first issue. The system was processing a massive dataset comprising more than 1,000,000 records, applying logical operations to summarize the results and calculating derived data by examining each record.
The solution journey
We noticed that a particular process was the bottleneck in the system's performance, so we worked on bringing it up to an acceptable level. To address the slow per-record processing, we explored different options and observed gradual improvements along the way.
Java multi-threading
As the system was developed in Java, our initial solution was a multi-threaded approach. However, we soon ran into memory limitations and frequent garbage collection, which meant slower performance and ever-larger Java heap sizes.
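For context, here is a minimal sketch of the kind of in-process fan-out we started with (the names are illustrative, not the actual code). Because every record is evaluated inside one JVM, each task keeps its runtime and intermediate results on the shared heap, which is exactly what drove the memory pressure and garbage collection:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the in-process, fixed-pool approach.
public class InProcessBatch {

    public static void process(List<Object> records, Consumer<Object> evaluate)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        for (Object record : records) {
            pool.submit(() -> evaluate.accept(record)); // per-record formula evaluation
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the whole batch to drain
    }
}
```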
Since this process is not always running and is just one of many use cases, it is not cost-effective to scale up the entire system to handle it. After several attempts at optimizing the JVM and increasing memory, we explored other alternatives.
Implementing Serverless computing
Most cloud providers offer serverless computing, and since we were already on AWS, we decided to use AWS Lambda for concurrent execution. This freed us from the memory constraints and from dedicating large instances to the application. We fanned out one Lambda invocation per record; each invocation updated the database and sent a completion signal once it was done, letting us consolidate the results while waiting for processing to finish.
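A rough sketch of the fan-out using the AWS SDK for Java v2 follows; the function name and payload are illustrative, and the completion signaling and result consolidation are omitted:

```java
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.InvocationType;
import software.amazon.awssdk.services.lambda.model.InvokeRequest;

import java.util.List;

// Hypothetical sketch: fan out one asynchronous Lambda invocation per record.
public class LambdaFanOut {

    public static void fanOut(List<String> recordsAsJson) {
        try (LambdaClient lambda = LambdaClient.create()) {
            for (String recordJson : recordsAsJson) {
                lambda.invoke(InvokeRequest.builder()
                        .functionName("process-record")          // illustrative name
                        .invocationType(InvocationType.EVENT)    // async, fire-and-forget
                        .payload(SdkBytes.fromUtf8String(recordJson))
                        .build());
            }
        }
    }
}
```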
Issue with concurrency
With the Lambda fan-out in place, we saw a significant improvement in the time taken to import records, and that time became almost predictable regardless of the number of records. However, during testing we ran into conflicting database updates: multiple concurrent executions tried to update the same shared record, and the database rejected the updates to prevent data corruption.
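To make the failure mode concrete, here is a hedged illustration assuming an optimistic-locking pattern with a version field (the exact conflict-detection mechanism is not shown here). When two concurrent writers read the same version and both try to bump it, only one update matches the filter; the other is rejected and must retry:

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;

// Hypothetical illustration of a conflicting update on a shared record.
public class OptimisticUpdate {

    public static boolean addToSharedRecord(MongoCollection<Document> target,
                                            Object id, long readVersion, double amount) {
        UpdateResult result = target.updateOne(
                Filters.and(Filters.eq("_id", id), Filters.eq("version", readVersion)),
                Updates.combine(Updates.inc("total", amount),
                                Updates.inc("version", 1)));
        return result.getModifiedCount() == 1; // false => lost the race; retry or stage
    }
}
```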
Moving Objects Around Costs
Although we could process each record in approximately five milliseconds, it would still take us 15 hours to consolidate the results and enrich the data. Even if we managed to reduce the processing time to around 1.6 milliseconds per record, it would still take us around 6 hours to complete the task.
We faced a challenge in isolating records to allow for easy concurrency. To solve this, we wrote a new record to a staging table for each conflicting update, so that we could consolidate all records once the concurrent execution had completed.
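A minimal sketch of the staging write, with illustrative field names: each contribution is appended as its own document, so concurrent invocations never contend on the shared target record, and consolidation happens afterwards:

```java
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Date;

// Hypothetical sketch: append one staging document per would-be conflicting update.
public class StagingWriter {

    public static void recordContribution(MongoCollection<Document> staging,
                                          String targetId, double amount) {
        // Inserts do not contend on a shared document, so concurrent Lambda
        // invocations can write freely; the consolidation step runs later.
        staging.insertOne(new Document("targetId", targetId)
                .append("amount", amount)
                .append("createdAt", new Date()));
    }
}
```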
Although this was an excellent solution in theory, we soon realized that the intermediary collection had generated far more records than expected. We ended up with roughly 1,100,000 additional records from only 13,000 input records.
Processing each staging record, including reading it, enriching it, and consolidating it into a target record, was time consuming. For every record we would need to read it, sum up the fields that could cause concurrent update conflicts, and calculate the values of some additional fields in the target record.
Get Closer to Data
The lesson we learned was to move the logic closer to the data. Nowadays, most database management systems offer some form of data summarization and consolidation query capability within the database itself, and that is where they shine.
We used MongoDB's aggregation framework to bring our logic closer to the data. The outcome was fantastic: several hours of work shrank to just a few minutes. MongoDB's aggregation pipeline has several thoughtfully constructed operators that filter, consolidate, join, and produce the desired document structure (output record). It also let us store the output in a target collection (table) without the data ever leaving the database.
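As a rough sketch of the approach with the MongoDB Java driver (database, collection, and field names are illustrative, not the customer's schema), a pipeline can filter, group, and write the consolidated output to a target collection entirely on the server:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.Arrays;

// Hypothetical sketch: consolidate staging records inside MongoDB and write the
// result straight into a target collection, so no documents cross the network.
public class ConsolidateInDatabase {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> staging =
                    client.getDatabase("analytics").getCollection("staging");

            staging.aggregate(Arrays.asList(
                    Aggregates.match(Filters.eq("status", "pending")),   // filter
                    Aggregates.group("$targetId",                        // consolidate
                            Accumulators.sum("total", "$amount"),
                            Accumulators.first("dimension", "$dimension")),
                    Aggregates.merge("target")                           // store output
            )).toCollection(); // run the whole pipeline server-side
        }
    }
}
```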
Bulk updates implemented
We made good progress, but some calculations were not well represented in the aggregation framework. While figuring out how to fit that logic into the framework, we noticed that the same calculation could be applied to a large set of records sharing specific attributes.
Summarizing approximately 76,000 resulting records into only 30 groups gave us a way to improve the processing. We used the aggregation framework to query the records that needed to be enriched, performed the calculations on the derived fields externally, and then applied them as mass updates, all while maintaining excellent performance.
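A hedged sketch of what such a grouped mass update can look like with the Java driver, assuming an illustrative groupKey field and derived values computed outside the database, one per group:

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateManyModel;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one bulk update per group instead of one update per record.
public class GroupedBulkUpdate {

    public static void apply(MongoCollection<Document> target,
                             Map<String, Double> derivedPerGroup) {
        List<WriteModel<Document>> updates = new ArrayList<>();
        derivedPerGroup.forEach((groupKey, derived) ->
                updates.add(new UpdateManyModel<>(
                        Filters.eq("groupKey", groupKey),   // all records in the group
                        new Document("$set", new Document("derived", derived)))));
        target.bulkWrite(updates); // ~30 update statements instead of ~76,000
    }
}
```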