event, Context context) {
    LambdaLogger logger = context.getLogger();
    String bucket = event.get("bucket");
    String key = event.get("key");
    // gson is assumed to be a Gson instance held as a field on the handler class.
    System.out.println("Event: " + gson.toJson(event));
    S3Client s3Client = S3Client.builder().build();
    GetObjectRequest getObjectRequest = GetObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .build();
    InputStream responseBody = s3Client.getObject(getObjectRequest);
    try {
        // Decompress the zstd-compressed S3 object on the fly as it streams in.
        ZstdInputStream decompressStream = new ZstdInputStream(
                new BufferedInputStream(responseBody));
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new BufferedInputStream(decompressStream)));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            ++count;
            // jsoniter's static deserialize lazily parses the line into an Any.
            Any jsonObject = JsonIterator.deserialize(line);
        }
        System.out.println("num_log_events=" + count);
    } catch (IOException ex) {
        System.err.println("ERROR: " + ex.toString());
        return "500 Internal Server Error";
    }
    return "200 OK";
}
dependencies {
    implementation 'com.amazonaws:aws-lambda-java-core:1.2.1'
    implementation 'com.google.code.gson:gson:2.8.9'
    implementation platform('software.amazon.awssdk:bom:2.19.33')
    implementation 'software.amazon.awssdk:s3'
    implementation 'com.github.luben:zstd-jni:1.5.2-5'
    implementation 'org.json:json:20220924'
    implementation 'com.jsoniter:jsoniter:0.9.9'
    testImplementation 'org.apache.logging.log4j:log4j-api:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-core:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-slf4j18-impl:[2.17.1,)'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.6.0'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.6.0'
}

test {
    useJUnitPlatform()
}

// Package compiled classes plus runtime dependencies into a Lambda-compatible zip.
task buildZip(type: Zip) {
    from compileJava
    from processResources
    into('lib') {
        from configurations.runtimeClasspath
    }
}

java {
    sourceCompatibility = JavaVersion.VERSION_1_8
    targetCompatibility = JavaVersion.VERSION_1_8
}

build.dependsOn buildZip
We built the zip file with gradle build -i.
We used the AWS console to create our Lambda function and upload our zip code files, but you can use the AWS CLI in the same way described in the “Rust” section of this post above. You may need to change a few flags when you use aws lambda create-function, like these:
- --runtime java11
- --handler example.Handler::handleRequest
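Putting those flags together, the full call might look roughly like this (the function name, memory size, and zip path are illustrative; by default, Gradle's Zip task writes the archive to build/distributions):

aws lambda create-function \
    --function-name lambda_langs_test_java \
    --runtime java11 \
    --handler example.Handler::handleRequest \
    --memory-size 640 \
    --timeout 900 \
    --zip-file fileb://build/distributions/lambda_langs_test_java.zip \
    --role ${LAMBDA_IAM_ROLE}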
Given that Java probably has the strongest AWS SDK of all languages, it is tempting to use Java in this Lambda function to process JSON logs. However, we were disappointed with the slow performance we saw, so we would not recommend using Java for this use case.
We tried running both JSON parsers we found, org.json and jsoniter, under each of the two CPU architectures.
Here are our takeaways:
- arm64 was as fast or faster than x86_64 in all cases. Since arm64 is 20% cheaper to run per unit of compute time, it’s the smarter CPU architecture choice if you are using Java for this use case.
- Running the org.json parser was ~2x faster on arm64 than on x86_64.
- Running the jsoniter parser was basically equally fast on arm64 and x86_64.
- Unfortunately, Java was much slower than Rust and Go.
- While Rust and Go can process 1GB of logs in ~2 seconds, Java is far slower, taking around 8-10 seconds for the same task.
- Performance improved as we increased memory allocation before reaching a plateau around 1.7GB of RAM.
In the world of AWS Lambda functions, Java’s cold start times are notoriously slow. To help address this, AWS recently released a new feature called SnapStart for Java Lambda functions. SnapStart takes a snapshot of a “warmed-up” version of your Java program and restores that snapshot during cold starts.
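Enabling SnapStart is a configuration change rather than a code change. A minimal sketch, using the illustrative function name from above, looks like this (SnapStart snapshots are taken when you publish a version, so you invoke the published version or an alias to benefit):

aws lambda update-function-configuration \
    --function-name lambda_langs_test_java \
    --snap-start ApplyOn=PublishedVersions

aws lambda publish-version --function-name lambda_langs_test_java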
We saw that cold start times improved dramatically in Java when we used SnapStart. The chart above shows the average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
- Normal cold start time: 583ms on average
- SnapStart restore time: 104ms on average, a 5.6x improvement
With SnapStart enabled, Java’s cold start times were better than Python’s but still slower than Go’s and Rust’s. The chart above compares cold start times across all four languages: Python, Java with SnapStart, Go, and Rust. Again, we are showing average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB. Here is roughly what the cold start times looked like for each language:
- Python: ~325ms
- Java: ~100ms
- Go: ~45ms
- Rust: ~30ms
SnapStart is a good option if you need to use Java and want to minimize cold start times. However, the cold start times for Go and Rust are still 2-3x faster, and there are a few important SnapStart limitations you should consider. In particular, SnapStart does not support:
- The arm64 CPU architecture
- Provisioned concurrency
- Using AWS Elastic File System
- Using ephemeral storage above 512MB in size
In general, we were disappointed with Java’s performance in Lambda functions for this bursty data-processing use case, so we recommend trying Rust or Go instead if you can.
Of the four languages, Python’s Lambda function code is the simplest. It executes an S3 GET request for the given bucket and key fields from the input event, downloads the response body as a stream, and decompresses the data on the fly. We read the data in 10KB chunks, split on newlines, and parse each line as a JSON object. The newline chunk splitting approach is not optimally efficient, but we intend to use the Python code here as a performance baseline against which we can compare the other languages.
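As a rough sketch, the handler described above might look something like the following, assuming requirements.txt pulls in boto3 and the zstandard package (the real app.py may differ in its details):

# app.py – illustrative sketch, not the exact benchmark code
import json

import boto3
import zstandard as zstd

CHUNK_SIZE = 10 * 1024  # read the decompressed stream in 10KB chunks


def handle_request(event, context):
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=event["bucket"], Key=event["key"])

    # Decompress the zstd-compressed body on the fly as it streams in.
    reader = zstd.ZstdDecompressor().stream_reader(response["Body"])

    count = 0
    buffer = b""
    while True:
        chunk = reader.read(CHUNK_SIZE)
        if not chunk:
            break
        buffer += chunk
        # Split on newlines; the last element may be a partial line,
        # so carry it over to the next chunk.
        lines = buffer.split(b"\n")
        buffer = lines.pop()
        for line in lines:
            if line:
                json.loads(line)
                count += 1
    if buffer:
        json.loads(buffer)
        count += 1

    print(f"num_log_events={count}")
    return "200 OK"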
We chose to deploy the Python code to our Lambda function as a Docker container. Here is the Dockerfile and a push_container.sh script to build the container and push it to the AWS Elastic Container Registry. We are using docker buildx, the extended BuildKit tool set, to build for a specific CPU architecture, namely arm64.
FROM public.ecr.aws/lambda/python:3.8
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD ["app.handle_request"]
# push_container.sh
# Assumes Docker is already authenticated to ECR
# (e.g. via aws ecr get-login-password piped to docker login).
docker buildx build --platform linux/arm64/v8 . -t lambda_langs_test_python
docker tag lambda_langs_test_python:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest

aws lambda create-function \
    --function-name lambda_langs_test_python \
    --memory-size 640 \
    --architectures arm64 \
    --package-type Image \
    --code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest \
    --timeout 900 \
    --role ${LAMBDA_IAM_ROLE}
The chart above shows the average task duration across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
Here are our takeaways:
- x86_64 was slightly faster than arm64, but only by 3-4%. Since arm64 is 20% cheaper per unit of compute time, it looks like arm64 is the smarter choice here for Python.
- Performance improves as we add more memory to the Lambda function until we reach a plateau around 1.7GB.
- Python is quite slow for this use case. At best, it looks like it will take around 12 seconds to process 1GB of JSON logs using Python.
At Scanner, we use Lambda functions to scan through S3 data at scale, so we are very interested in the maximum S3 performance we can expect from an individual function invocation.
To test this, we wrote a Rust program to read 1GB of raw data from S3 (no decompression, no parsing), and we measured the S3 throughput.
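That measurement program isn’t reproduced here, but in spirit it is just a timed streaming read. A minimal sketch using the aws-sdk-s3 crate might look like this (the bucket and key are placeholders):

use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load credentials and region from the environment, as in a Lambda runtime.
    let config = aws_config::load_from_env().await;
    let client = aws_sdk_s3::Client::new(&config);

    let start = Instant::now();
    let mut object = client
        .get_object()
        .bucket("my-benchmark-bucket") // placeholder bucket
        .key("one_gigabyte_object")    // placeholder key
        .send()
        .await?;

    // Drain the body stream chunk by chunk without retaining it.
    let mut total_bytes = 0usize;
    while let Some(chunk) = object.body.try_next().await? {
        total_bytes += chunk.len();
    }

    let secs = start.elapsed().as_secs_f64();
    println!(
        "read {total_bytes} bytes at {:.1} MB/sec",
        total_bytes as f64 / 1_000_000.0 / secs
    );
    Ok(())
}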
In the graph above, we show S3 read throughput averaged over 10 runs at various memory allocation levels, from 128MB to 10GB in discrete jumps.
Here are the interesting takeaways:
- Increasing memory allocation also increased S3 read throughput, but an obvious plateau was reached at 640MB of memory allocation.
- The maximum S3 read throughput for a Lambda function reading a single object was 85-90MB per second.
It seemed that, as long as our Lambda function used 640MB of memory or more, we got optimal S3 read throughput.
Although Rust with a SIMD-accelerated JSON parser can process a lot of data quickly, we can do even better.
Mozilla maintains a library called bincode, which is used for inter-process communication in Firefox. It is particularly good at parsing binary data into Rust data structures. We leverage bincode in Scanner’s index file data format, which gives us a 4x performance improvement over SIMD-accelerated JSON parsing.
The chart above shows Lambda function task duration (scanning 2GB of Scanner index file records) using various memory allocation levels.
Scanner index file records are quite a bit more complex than typical JSON log events, so even SIMD-accelerated JSON parsing struggles to be fast. By using bincode in our index file format, we get extremely fast performance. This comes with an important trade-off: the bincode format is specific to Rust, which means reading it from other languages is difficult.
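To give a flavor of what this looks like, here is a minimal sketch of round-tripping a record through bincode with serde (the struct is illustrative, not Scanner’s actual index record format):

use serde::{Deserialize, Serialize};

// Illustrative record type; Scanner's real index records are more complex.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct IndexRecord {
    term: String,
    doc_ids: Vec<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let record = IndexRecord {
        term: "error".to_string(),
        doc_ids: vec![3, 17, 42],
    };

    // Encode straight to a compact binary representation...
    let bytes: Vec<u8> = bincode::serialize(&record)?;

    // ...and decode directly back into the Rust struct, with no
    // intermediate text parsing step as there would be with JSON.
    let decoded: IndexRecord = bincode::deserialize(&bytes)?;
    assert_eq!(record, decoded);
    Ok(())
}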
If you are interested in learning more about the trade-offs between the most popular data serialization formats available in the Rust ecosystem, check out this excellent blog post from LogRocket. They cover JSON, bincode, MessagePack, and more, with plenty of data about the performance and usability differences.
If you want to process S3 data at scale using Lambda functions, here are our recommendations:
- For optimal performance, try using Rust or Go. They are basically equally fast for this use case – specifically, 4x faster than Java and 6x faster than Python.
- You may get better data parsing performance with x86_64 than with arm64, especially if your parsing library leverages specialized SIMD instructions.
- Data parsing could be the most important bottleneck that you will need to optimize.
- If you are parsing JSON files:
  - With Go, use fastjson instead of the standard library’s encoding/json to get 10x better performance.
  - With Rust, use simdjson instead of serde_json to get 3x better performance.
- Allocate at least 640MB of memory to your Lambda function to maximize your S3 read throughput, which is roughly 90MB/sec.
- If you need to squeeze every possible drop of performance out of your data processing system, consider exploring one of the more esoteric data formats, like bincode.
  - There are quite a few formats that parse faster than JSON.
  - Beware the trade-offs. For example, bincode is very fast, but it is Rust-specific and not portable between languages.
If you process massive amounts of data using Lambda functions, and you feel like there are important things we’ve missed, we would love to hear from you. Reach out to me on Twitter and tell me about the cool things you’re building.
Also, if you would like to add fast full-text search to your S3 data lake, sign up to try out the Scanner beta and let us know what you think.
Scanner is a next-gen search engine and API for logs in object storage. Scanner’s cloud indexing uses S3 storage, serverless compute, and a novel index file format to reduce log management costs by up to 90% compared to traditional tools like Datadog and Splunk – all while keeping search, time series queries, and threat detections fast. Scanner can be deployed into your own AWS account, eliminating the need to ship logs over the public internet, which means zero data transfer costs and no vendor lock-in. Scanner uses its own proprietary search index system that runs on top of S3, and there’s no need to build a pipeline to first transform the logs to fit a predefined SQL table schema.
Cliff is the CEO and co-founder of Scanner.dev, a next-gen search engine and API for logs in object storage. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it’s mostly love these days.