Case Study

Halving AWS Lambda Costs with Python Parallel Processing

How a two-line code change reduced compute times by 65% and cut our AWS bill in half - a lesson in the real-world value of code optimisation.

Project: Web Data Pipeline
Tech: Python, AWS Lambda, RDS
Completed: November 2019

The Results

A simple optimisation to our Python code delivered dramatic improvements across the board:

Compute time reduction: -65%
AWS bill reduction: -53%
Lines of code changed: 2
[Figure: CloudWatch graph of AWS Lambda average compute time, before and after optimisation]

The dramatic drop at the end of October 2019 shows the immediate impact of deploying the parallel processing optimisation. Average compute time fell from ~23 seconds to under 8 seconds.

The Challenge

I had set up a series of AWS Lambda functions to gather information from the web periodically and write it to an RDS instance. The functions were running fine, but the average invocation time was around 23 seconds - not unusual for an I/O-heavy process where most of the time is spent waiting for server responses.

However, with AWS Lambda, you pay for what you use. Billing is calculated based on both the number of invocations and the compute time (measured in GB-seconds). Every second of execution time directly impacts costs.

The Lambda Billing Model

AWS Lambda charges for execution time in GB-seconds, rounded up to the billing granularity (100ms at the time of this project; 1ms today). A function running for 23 seconds costs roughly 3x as much as one running for 8 seconds - making optimisation directly profitable.
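As a rough illustration of how duration drives cost (the memory size and unit price below are assumptions for the example; check current AWS pricing for real figures):

```python
# Rough sketch of Lambda's GB-second billing model.
GB_SECOND_PRICE = 0.0000166667  # USD per GB-second (illustrative)
MEMORY_GB = 1.0                 # a 1024 MB function (illustrative)

def invocation_cost(duration_s: float) -> float:
    """Duration component of a single invocation's cost, in USD."""
    return duration_s * MEMORY_GB * GB_SECOND_PRICE

before = invocation_cost(23)
after = invocation_cost(8)
print(f"23s invocation: ${before:.6f}")
print(f"8s invocation:  ${after:.6f}")
print(f"Ratio: {before / after:.1f}x")  # ~2.9x
```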


The Solution

The key insight was recognising that our workload was I/O-bound, not CPU-bound. Most of the 23 seconds was spent waiting for external servers to respond - time during which the Lambda function was essentially idle.

This is the perfect use case for parallel processing. Instead of making requests sequentially (wait for response 1, then request 2, wait for response 2, etc.), we could make all requests simultaneously and process responses as they arrived.

The Implementation

Python's concurrent.futures module makes parallel processing remarkably simple. Using a ThreadPoolExecutor as a context manager, the entire optimisation required just two lines of code:

The Two-Line Optimisation
import concurrent.futures  # module-level import

with concurrent.futures.ThreadPoolExecutor(max_workers=60) as executor:
    result = executor.map(core_function, iterable)

This elegant solution handles all the complexity of thread management automatically:

Thread Pool Creation

Python creates a pool of up to 60 worker threads, ready to execute tasks concurrently.

Parallel Execution

The executor.map() function distributes the items of the iterable across the worker threads, applying core_function to each in parallel.

Automatic Cleanup

The context manager ensures all threads are properly closed once processing completes - no manual cleanup required.
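Putting the three steps together, a minimal handler might look like the sketch below. The URL list, fetch_page helper, and the injectable fetch parameter are hypothetical stand-ins for illustration; the original pipeline's details aren't shown here.

```python
import concurrent.futures
import urllib.request

URLS = [
    "https://example.com/a",  # hypothetical targets for illustration
    "https://example.com/b",
]

def fetch_page(url: str) -> bytes:
    """Fetch one URL; each worker thread runs this concurrently."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def handler(event, context, fetch=fetch_page):
    """Lambda entry point: fan requests out across up to 60 threads."""
    urls = event.get("urls", URLS)
    # Threads blocked on network responses release the GIL, so the
    # requests overlap instead of running back to back.
    with concurrent.futures.ThreadPoolExecutor(max_workers=60) as executor:
        results = list(executor.map(fetch, urls))
    # ...write `results` to the RDS instance here...
    return {"fetched": len(results)}
```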

Why Threads Work Here

Python's Global Interpreter Lock (GIL) normally prevents threads from executing Python bytecode in parallel. For I/O-bound tasks, however, threads release the GIL while waiting for responses - making ThreadPoolExecutor ideal for web scraping and API calls.
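The effect is easy to demonstrate with a simulated I/O wait, since time.sleep releases the GIL just like a blocking network call would:

```python
import concurrent.futures
import time

def simulated_request(_):
    """Stand-in for a network call: blocks for 0.2s, releasing the GIL."""
    time.sleep(0.2)
    return "ok"

# Sequential: five 0.2s waits happen one after another (~1.0s total).
start = time.perf_counter()
for i in range(5):
    simulated_request(i)
sequential = time.perf_counter() - start

# Threaded: all five waits overlap (~0.2s total).
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(simulated_request, range(5)))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

For CPU-bound work the same experiment shows no speed-up, because the GIL is never released while bytecode is executing.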

The Transformation

Before: ~23s average execution time

After: ~8s average execution time

The impact was immediate and visible in CloudWatch the moment the optimised code was deployed. What had been a steady line at around 23 seconds dropped sharply to under 8 seconds - and stayed there.

More importantly, the AWS bill reflected this improvement. With Lambda billing based on GB-seconds, reducing execution time by 65% translated into a 53% reduction in the overall bill - per-request charges and other line items don't scale with duration, which accounts for the difference.

Key Takeaways

This project reinforced several important lessons about writing production code:

1. Understand Your Workload Type

The optimisation strategy depends entirely on whether your code is I/O-bound or CPU-bound. For I/O-bound tasks (API calls, web scraping, database queries), threading works brilliantly. For CPU-bound tasks, you'd need multiprocessing instead.
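The rule of thumb can be stated in a few lines (pick_executor is a made-up helper name for illustration):

```python
import concurrent.futures

def pick_executor(io_bound: bool):
    """Rule of thumb: threads for I/O-bound work (the GIL is released
    while waiting), separate processes for CPU-bound work (each process
    has its own interpreter and GIL)."""
    if io_bound:
        # API calls, web scraping, database queries
        return concurrent.futures.ThreadPoolExecutor
    # Number crunching, parsing, image processing
    return concurrent.futures.ProcessPoolExecutor
```

Both classes share the same Executor interface, so switching between them once you've identified the workload type is usually a one-word change.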

2. Serverless Amplifies Optimisation ROI

In traditional server infrastructure, an inefficient function just wastes cycles. In serverless, every wasted millisecond is billed. This makes code optimisation directly profitable - the ROI is immediate and measurable.

3. Python Makes It Easy

Setting up parallel processing can be complex in many languages. Python's concurrent.futures module abstracts away the complexity, making it accessible with just two lines of code.

Need Help Optimising Your Cloud Infrastructure?

I help businesses reduce cloud costs and improve performance through data-driven analysis and smart engineering.

Get in Touch