event, Context context) {
    LambdaLogger logger = context.getLogger();
    String bucket = event.get("bucket");
    String key = event.get("key");
    // gson is assumed to be a Gson instance held as a field on the handler class.
    System.out.println("Event: " + gson.toJson(event));
    S3Client s3Client = S3Client.builder().build();
    GetObjectRequest getObjectRequest = GetObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .build();
    InputStream responseBody = s3Client.getObject(getObjectRequest);
    try {
        // Decompress the zstd-compressed S3 object on the fly as it streams in.
        ZstdInputStream decompressStream = new ZstdInputStream(
                new BufferedInputStream(responseBody));
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new BufferedInputStream(decompressStream)));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            ++count;
            // jsoniter's static deserialize lazily parses the line into an Any.
            Any jsonObject = JsonIterator.deserialize(line);
        }
        System.out.println("num_log_events=" + count);
    } catch (IOException ex) {
        System.err.println("ERROR: " + ex.toString());
        return "500 Internal Server Error";
    }
    return "200 OK";
}
dependencies {
    implementation 'com.amazonaws:aws-lambda-java-core:1.2.1'
    implementation 'com.google.code.gson:gson:2.8.9'
    implementation platform('software.amazon.awssdk:bom:2.19.33')
    implementation 'software.amazon.awssdk:s3'
    implementation 'com.github.luben:zstd-jni:1.5.2-5'
    implementation 'org.json:json:20220924'
    implementation 'com.jsoniter:jsoniter:0.9.9'
    testImplementation 'org.apache.logging.log4j:log4j-api:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-core:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-slf4j18-impl:[2.17.1,)'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.6.0'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.6.0'
}

test {
    useJUnitPlatform()
}

// Package compiled classes plus runtime dependencies into a Lambda-compatible zip.
task buildZip(type: Zip) {
    from compileJava
    from processResources
    into('lib') {
        from configurations.runtimeClasspath
    }
}

java {
    sourceCompatibility = JavaVersion.VERSION_1_8
    targetCompatibility = JavaVersion.VERSION_1_8
}

build.dependsOn buildZip
We built the zip file with gradle build -i.
We used the AWS console to create our Lambda function and upload our zip code files, but you can use the AWS CLI in the same way described in the “Rust” section of this post above. You may need to change a few flags when you use aws lambda create-function, like these:
- --runtime java11
- --handler example.Handler::handleRequest
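Putting those flags together, the full call might look roughly like this (the function name, memory size, and zip path are illustrative; by default, Gradle's Zip task writes the archive to build/distributions):

aws lambda create-function \
    --function-name lambda_langs_test_java \
    --runtime java11 \
    --handler example.Handler::handleRequest \
    --memory-size 640 \
    --timeout 900 \
    --zip-file fileb://build/distributions/lambda_langs_test_java.zip \
    --role ${LAMBDA_IAM_ROLE}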
Given that Java probably has the strongest AWS SDK of all languages, it is tempting to use Java in this Lambda function to process JSON logs. However, we were disappointed with the slow performance we saw, so we would not recommend using Java for this use case.
We tried running both JSON parsers we found, org.json and jsoniter, under each of the two CPU architectures.
Here are our takeaways:
- arm64 was as fast or faster than x86_64 in all cases. Since arm64 is 20% cheaper to run per unit of compute time, it’s the smarter CPU architecture choice if you are using Java for this use case.
- Running the org.json parser was ~2x faster on arm64 than on x86_64.
- Running the jsoniter parser was basically equally fast on arm64 and x86_64.
- Unfortunately, Java was much slower than Rust and Go.
- While Rust and Go can process 1GB of logs in ~2 seconds, Java is far slower, taking around 8-10 seconds for the same task.
- Performance improved as we increased memory allocation before reaching a plateau around 1.7GB of RAM.
In the world of AWS Lambda functions, Java’s cold start times are notoriously slow. To help address this, AWS recently released a new feature called SnapStart for Java Lambda functions. SnapStart takes a snapshot of a “warmed-up” version of your Java program and restores that snapshot during cold starts.
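Enabling SnapStart is a configuration change rather than a code change. A minimal sketch, using the illustrative function name from above, looks like this (SnapStart snapshots are taken when you publish a version, so you invoke the published version or an alias to benefit):

aws lambda update-function-configuration \
    --function-name lambda_langs_test_java \
    --snap-start ApplyOn=PublishedVersions

aws lambda publish-version --function-name lambda_langs_test_java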
We saw that cold start times improved dramatically in Java when we used SnapStart. The chart above shows the average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
- Normal cold start time: 583ms on average
- SnapStart restore time: 104ms on average, a 5.6x improvement
With SnapStart enabled, Java’s cold start times were better than Python’s but still slower than Go’s and Rust’s. The chart above compares cold start times across all four languages: Python, Java with SnapStart, Go, and Rust. Again, we are showing average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB. Here is roughly what the cold start times looked like for each language:
- Python: ~325ms
- Java: ~100ms
- Go: ~45ms
- Rust: ~30ms
SnapStart is a good option if you need to use Java and want to minimize cold start times. However, the cold start times for Go and Rust are still 2-3x faster, and there are a few important SnapStart limitations you should consider. In particular, SnapStart does not support:
- The arm64 CPU architecture
- Provisioned concurrency
- Using AWS Elastic File System
- Using ephemeral storage above 512MB in size
In general, we were disappointed with Java’s performance in Lambda functions for this bursty data-processing use case, so we recommend trying Rust or Go instead if you can.
Of the four languages, Python’s Lambda function code is the simplest. It executes an S3 GET request for the given bucket and key fields from the input event, downloads the response body as a stream, and decompresses the data on the fly. We read the data in 10KB chunks, split on newlines, and parse each line as a JSON object. The newline chunk splitting approach is not optimally efficient, but we intend to use the Python code here as a performance baseline against which we can compare the other languages.
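As a rough sketch, the handler described above might look something like the following, assuming requirements.txt pulls in boto3 and the zstandard package (the real app.py may differ in its details):

# app.py – illustrative sketch, not the exact benchmark code
import json

import boto3
import zstandard as zstd

CHUNK_SIZE = 10 * 1024  # read the decompressed stream in 10KB chunks


def handle_request(event, context):
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=event["bucket"], Key=event["key"])

    # Decompress the zstd-compressed body on the fly as it streams in.
    reader = zstd.ZstdDecompressor().stream_reader(response["Body"])

    count = 0
    buffer = b""
    while True:
        chunk = reader.read(CHUNK_SIZE)
        if not chunk:
            break
        buffer += chunk
        # Split on newlines; the last element may be a partial line,
        # so carry it over to the next chunk.
        lines = buffer.split(b"\n")
        buffer = lines.pop()
        for line in lines:
            if line:
                json.loads(line)
                count += 1
    if buffer:
        json.loads(buffer)
        count += 1

    print(f"num_log_events={count}")
    return "200 OK"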
We chose to deploy the Python code to our Lambda function as a Docker container. Here is the Dockerfile and a push_container.sh script to build the container and push it to the AWS Elastic Container Registry. We are using docker buildx, the extended BuildKit tool set, to build for a specific CPU architecture, namely arm64.
FROM public.ecr.aws/lambda/python:3.8
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD ["app.handle_request"]
# push_container.sh
# Assumes Docker is already authenticated to ECR
# (e.g. via aws ecr get-login-password piped to docker login).
docker buildx build --platform linux/arm64/v8 . -t lambda_langs_test_python
docker tag lambda_langs_test_python:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest

aws lambda create-function \
    --function-name lambda_langs_test_python \
    --memory-size 640 \
    --architectures arm64 \
    --package-type Image \
    --code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest \
    --timeout 900 \
    --role ${LAMBDA_IAM_ROLE}
The chart above shows the average task duration across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
Here are our takeaways:
- x86_64 was slightly faster than arm64, but only by 3-4%. Since arm64 is 20% cheaper per unit of compute time, it looks like arm64 is the smarter choice here for Python.
- Performance improves as we add more memory to the Lambda function until we reach a plateau around 1.7GB.
- Python is quite slow for this use case. At best, it looks like it will take around 12 seconds to process 1GB of JSON logs using Python.
At Scanner, we use Lambda functions to scan through S3 data at scale, so we are very interested in the maximum S3 performance we can expect from an individual function invocation.
To test this, we wrote a Rust program to read 1GB of raw data from S3 (no decompression, no parsing), and we measured the S3 throughput.
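That measurement program isn’t reproduced here, but in spirit it is just a timed streaming read. A minimal sketch using the aws-sdk-s3 crate might look like this (the bucket and key are placeholders):

use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load credentials and region from the environment, as in a Lambda runtime.
    let config = aws_config::load_from_env().await;
    let client = aws_sdk_s3::Client::new(&config);

    let start = Instant::now();
    let mut object = client
        .get_object()
        .bucket("my-benchmark-bucket") // placeholder bucket
        .key("one_gigabyte_object")    // placeholder key
        .send()
        .await?;

    // Drain the body stream chunk by chunk without retaining it.
    let mut total_bytes = 0usize;
    while let Some(chunk) = object.body.try_next().await? {
        total_bytes += chunk.len();
    }

    let secs = start.elapsed().as_secs_f64();
    println!(
        "read {total_bytes} bytes at {:.1} MB/sec",
        total_bytes as f64 / 1_000_000.0 / secs
    );
    Ok(())
}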
In the graph above, we show S3 read throughput averaged over 10 runs at various memory allocation levels, from 128MB to 10GB in discrete jumps.
Here are the interesting takeaways:
- Increasing memory allocation also increased S3 read throughput, but an obvious plateau was reached at 640MB of memory allocation.
- The maximum S3 read throughput for a Lambda function reading a single object was 85-90MB per second.
It seemed that, as long as our Lambda function used 640MB of memory or more, we got optimal S3 read throughput.
Although Rust with a SIMD-accelerated JSON parser can process a lot of data quickly, we can do even better.
Mozilla maintains a library called bincode, which is used for inter-process communication in Firefox. It is particularly good at parsing binary data into Rust data structures. We leverage bincode in Scanner’s index file data format, which gives us a 4x performance improvement over SIMD-accelerated JSON parsing.
The chart above shows Lambda function task duration (scanning 2GB of Scanner index file records) using various memory allocation levels.
Scanner index file records are quite a bit more complex than typical JSON log events, so even SIMD-accelerated JSON parsing struggles to be fast. By using bincode in our index file format, we get extremely fast performance. This comes with an important trade-off: the bincode format is specific to Rust, which means reading it from other languages is difficult.
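To give a flavor of what this looks like, here is a minimal sketch of round-tripping a record through bincode with serde (the struct is illustrative, not Scanner’s actual index record format):

use serde::{Deserialize, Serialize};

// Illustrative record type; Scanner's real index records are more complex.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct IndexRecord {
    term: String,
    doc_ids: Vec<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let record = IndexRecord {
        term: "error".to_string(),
        doc_ids: vec![3, 17, 42],
    };

    // Encode straight to a compact binary representation...
    let bytes: Vec<u8> = bincode::serialize(&record)?;

    // ...and decode directly back into the Rust struct, with no
    // intermediate text parsing step as there would be with JSON.
    let decoded: IndexRecord = bincode::deserialize(&bytes)?;
    assert_eq!(record, decoded);
    Ok(())
}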
If you are interested in learning more about the trade-offs between the most popular data serialization formats available in the Rust ecosystem, check out this excellent blog post from LogRocket. They cover JSON, bincode, MessagePack, and more, with plenty of data about the performance and usability differences.
If you want to process S3 data at scale using Lambda functions, here are our recommendations:
- For optimal performance, try using Rust or Go. They are basically equally fast for this use case – specifically, 4x faster than Java and 6x faster than Python.
- You may get better data parsing performance with x86_64 than with arm64, especially if your parsing library leverages specialized SIMD instructions.
- Data parsing could be the most important bottleneck that you will need to optimize.
- If you are parsing JSON files:
  - With Go, use fastjson instead of the standard library’s encoding/json to get 10x better performance.
  - With Rust, use simdjson instead of serde_json to get 3x better performance.
- Allocate at least 640MB of memory to your Lambda function to maximize your S3 read throughput, which is roughly 90MB/sec.
- If you need to squeeze every possible drop of performance out of your data processing system, consider exploring one of the more esoteric data formats, like bincode.
  - There are quite a few formats that parse faster than JSON.
  - Beware the trade-offs. For example, bincode is very fast, but it is Rust-specific and not portable between languages.
If you process massive amounts of data using Lambda functions, and you feel like there are important things we’ve missed, we would love to hear from you. Reach out to me on Twitter and tell me about the cool things you’re building.
Also, if you would like to add fast full-text search to your S3 data lake, sign up to try out the Scanner beta and let us know what you think.
Scanner is a next-gen search engine and API for logs in object storage. Scanner’s cloud indexing uses S3 storage, serverless compute, and a novel index file format to reduce log management costs by up to 90% compared to traditional tools like Datadog and Splunk – all while keeping search, time series queries, and threat detections fast. Scanner can be deployed into your own AWS account, eliminating the need to ship logs over the public internet, which means zero data transfer costs and no vendor lock-in. Scanner uses its own proprietary search index system that runs on top of S3, and there’s no need to build a pipeline to first transform the logs to fit a predefined SQL table schema.
Cliff is the CEO and co-founder of Scanner.dev, a next-gen search engine and API for logs in object storage. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it’s mostly love these days.