aws-sdk-java-v2: First DynamoDB request is very slow
Expected Behavior
All DynamoDB queries should take in the low 10s of milliseconds.
Current Behavior
DynamoDB queries are generally in the 10s of milliseconds however the first time a Dynamo function is executed I am seeing response times around 2s, subsequent queries then fall back to the expected levels.
Possible Solution
Unsure, I haven’t been able to identify anything from stepping through the code
Steps to Reproduce (for bugs)
I have written the following code to replicate the creation of the DynamoDbClient
and the execution of four put requests. It assumes you have a table named “item” in your AWS account.
The results are below.
import java.util.UUID;
import org.junit.jupiter.api.Test;
import com.google.common.collect.ImmutableMap;
import com.ixxus.cnc.utils.SimpleTimer;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbClientBuilder;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
class DynamoClientCT {
private static final SimpleTimer TIMER = new SimpleTimer();
private static final String CONTEXT_ALL = "CONTEXT_ALL";
private static final String CONTEXT_HTTP_CLIENT_BUILD = "CONTEXT_HTTP_CLIENT_BUILD";
private static final String CONTEXT_CREDENTIALS_PROVIDER_BUILD = "CONTEXT_CREDENTIALS_PROVIDER_BUILD";
private static final String CONTEXT_DYNAMO_CLIENT_BUILD = "CONTEXT_DYNAMO_CLIENT_BUILD";
private static final String CONTEXT_DYNAMO_CLIENT_EXECUTE = "CONTEXT_DYNAMO_CLIENT_EXECUTE_";
@Test
void test() {
TIMER.start(CONTEXT_ALL);
createClientAndExecuteRequest();
TIMER.stop(CONTEXT_ALL);
TIMER.logAllSummaries("DynamoClientCT");
}
private void createClientAndExecuteRequest() {
TIMER.start(CONTEXT_HTTP_CLIENT_BUILD);
final SdkHttpClient httpClient = UrlConnectionHttpClient.builder().build();
TIMER.stop(CONTEXT_HTTP_CLIENT_BUILD);
TIMER.start(CONTEXT_CREDENTIALS_PROVIDER_BUILD);
final DefaultCredentialsProvider credentialsProvider = DefaultCredentialsProvider.create();
TIMER.stop(CONTEXT_CREDENTIALS_PROVIDER_BUILD);
TIMER.start(CONTEXT_DYNAMO_CLIENT_BUILD);
final DynamoDbClientBuilder builder = DynamoDbClient.builder();
builder.httpClient(httpClient);
builder.region(Region.EU_WEST_1);
builder.credentialsProvider(credentialsProvider);
final DynamoDbClient client = builder.build();
TIMER.stop(CONTEXT_DYNAMO_CLIENT_BUILD);
putItem(client, 1);
putItem(client, 2);
putItem(client, 3);
putItem(client, 4);
}
private void putItem(final DynamoDbClient client, final int count) {
TIMER.start(CONTEXT_DYNAMO_CLIENT_EXECUTE + count);
final PutItemRequest request = PutItemRequest.builder()
.tableName("item")
.item(ImmutableMap.of("id", AttributeValue.builder().s(UUID.randomUUID().toString()).build()))
.build();
client.putItem(request);
TIMER.stop(CONTEXT_DYNAMO_CLIENT_EXECUTE + count);
}
}
The following is the code for the SimpleTimer
:
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class SimpleTimer {
private static final Logger LOGGER = LoggerFactory.getLogger(SimpleTimer.class);
Map<String, List<Long>> times = new TreeMap<String, List<Long>>();
Map<String, Long> currentTime = new HashMap<>();
public void start(final String context) {
currentTime.put(context, System.currentTimeMillis());
}
public void stop(final String context) {
if (!currentTime.containsKey(context)) {
throw new IllegalStateException("Start has not been called for context: " + context);
}
times.computeIfAbsent(context, s -> new ArrayList<Long>())
.add(System.currentTimeMillis() - currentTime.get(context));
}
public DoubleSummaryStatistics getSummaryStatistics(final String context) {
return times.getOrDefault(context, new ArrayList<Long>())
.stream()
.mapToDouble(a -> a)
.summaryStatistics();
}
public void logAllSummaries(final String prefix) {
LOGGER.debug("logAllSummaries: {}");
times.keySet().forEach(ctx -> LOGGER.info("{} - {} - {}", prefix, ctx, getSummaryStatistics(ctx)));
}
}
Results:
2019-07-11 15:14:17.746 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_ALL - DoubleSummaryStatistics{count=1, sum=1633.000000, min=1633.000000, average=1633.000000, max=1633.000000}
2019-07-11 15:14:17.747 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_CREDENTIALS_PROVIDER_BUILD - DoubleSummaryStatistics{count=1, sum=4.000000, min=4.000000, average=4.000000, max=4.000000}
2019-07-11 15:14:17.748 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_BUILD - DoubleSummaryStatistics{count=1, sum=594.000000, min=594.000000, average=594.000000, max=594.000000}
2019-07-11 15:14:17.748 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_1 - DoubleSummaryStatistics{count=1, sum=884.000000, min=884.000000, average=884.000000, max=884.000000}
2019-07-11 15:14:17.748 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_2 - DoubleSummaryStatistics{count=1, sum=29.000000, min=29.000000, average=29.000000, max=29.000000}
2019-07-11 15:14:17.749 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_3 - DoubleSummaryStatistics{count=1, sum=70.000000, min=70.000000, average=70.000000, max=70.000000}
2019-07-11 15:14:17.749 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_4 - DoubleSummaryStatistics{count=1, sum=29.000000, min=29.000000, average=29.000000, max=29.000000}
2019-07-11 15:14:17.750 INFO SimpleTimer:49 - DynamoClientCT - CONTEXT_HTTP_CLIENT_BUILD - DoubleSummaryStatistics{count=1, sum=23.000000, min=23.000000, average=23.000000, max=23.000000}
As you can see the first put takes 884ms then the three subsequent ones take 29, 70, and 29ms making the first request at least an order of magnitude slower. Add in the 600ms for the DynamoDbClient
creation and that’s quite a long time.
Context
This code is used in a Lambda function serving a REST API which is one of the canonical use cases for Lambdas. The expectation is that the function should be quick to respond and as shown by the subsequent requests, this is a reasonable expectation. The results above are actually far better than we are seeing when deployed to Lambda - our first Dynamo execution is consistently around 2s +/-0.2s. We are not using a VPC and the Lambdas have 1024MB memory.
Your Environment
The above code was run on my machine to get the provided results but we are deploying virtually identical code to an AWS Lambda with 1024MB memory.
- AWS Java SDK version used: 2.5.48
- JDK version used: 1.8
- Operating System and version: N/A - using AWS Lambda
I have some questions:
- Are there other settings or configuration we could be providing to reduce the time to build the client and execute the first request?
- What causes the first request to be so much slower than subsequent requests?
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 8
- Comments: 32 (12 by maintainers)
Hi @ocind. I’m quite surprised more people haven’t encountered the problem, it fundamentally undermines one of the main reference architectures for AWS Lambda IMO.
The only way we have found to reduce the time is by warming the Lambdas. However the normal Lambda warming (for example, using, serverless-plugin-warmup) isn’t enough; you need to make a Dynamo request so that the connection is established to do the SSL negotiation, setup connection pooling, etc. We use
DynamoDbClient#describeEndpoints
for this as it is a harmless operation. Note that you’ll need to do something similar for this for every AWS service you use from Lambda. For example we also use Cognito and callCognitoIdentityProviderClient#listUserPools
as part of our warming process.I also have encountered this issue - I tried a few things but found that increasing the memory size was the only thing that seemed to make an impact. My findings:
We are using the
DynamoDBMapper
, but I tried switching over to usingQuerySpec
instead and manually mapping the result to a POJO and it didn’t make a noticeable performance difference. I’d love to know if anyone has found a more consistent solution, but this at least helped make the times more reasonable (we were at 256 MB to start and it was unusable on cold starts).hi @stuartleylandcole , thank you so much for a very detailed explanation and benchmark. I am actually having the same issue, and your investigation results above is really helpful. I would like to know if you have found some other way to shave off some more time in regards to the first request to DynamoDb. Have you gained any updates from Support or other forums?
I think it is unacceptable that a 1024MB lambda takes 2s to execute first DynamoDb request. This is actually worse than AWS Lambda cold start itself 😦
@stuartleylandcole Sure! Below are the patterns we use in the archetype to reduce SDK startup time:
UrlConnectionHttpClient
to reduce jar sizecredentialsProvider
to beEnvironmentVariableCredentialsProvider
netty-nio-client
andapache-client
dependencies from service client dependency (they are included by default) to reduce jar size.https://github.com/aws/aws-sdk-java-v2/blob/c72d2b554de92258b9cfced4f8f54f3ba714cf92/archetypes/archetype-lambda/src/test/resources/projects/urlhttpclient/reference/pom.xml#L37-L46
Update: please check out https://aws.amazon.com/blogs/developer/tuning-the-aws-java-sdk-2-x-to-reduce-startup-time/ for best practices to reduce SDK startup time
Thanks for replying @sl-omideidivandi. i have set tcpKeepAlive as true. But still same behaviour. i mean after 2-3 idle time if request is made, first the connection is shutting down then reestablishing the connection. Well this problem exist, but it is ok since reestablishing the connection is taking some 20-30 ms.
Now i noticed some different issue. i am firing request to dynamodb after every 5 mins of idle time. And i found 20% of request randomly goes more than 1s. which totally unacceptable. BTW i fired around 100 request.
To avoid this above issue i am warming up the DB by making dummy DB call every 1 min. But this solution seems not full proof. i mean now the 20% has decrease to 1% but still this issue exist i am afraid this issue is not related to property changes. Seems like some other issue.
I tried to investigate more by printing SDK log. and found logs similar to @stuartleylandcole. only that after DynamoDBEnhancedClient call it is going to ExecutionInterceptorChain -Creating an interceptor chain that will apply interceptors. i found whenever it is taking more than 1s it is taking in above two steps. Not sure if it is some sdk issue.
should i be creating a bug for this.
BTW i am doing all this from ECS. Lambda is not involve in my case.
Hi @biletnikov, from your previous comment, it seems you are using DynamoDBMapper from AWS Java SDK 1.11.x. We are aware of the slow startup issue in 1.11.x, and unfortunately there is not much we can do to improve it while still maintaining backwards compatibility.
We are actively working on the implementation of Dynamodb mapper 2.x, and it’s currently in preview. https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/dynamodb-enhanced
Hi @biletnikov, we don’t have a public timeline at the moment, but DynamoDbMapper v2 is definitely one of our top priorities right now. The work is tracked in #35
Hi @Nxtra Sure! One of the main reasons is that 1.11.x SDK uses ApacheHttpClient under the hood and initializing it can be expensive. Checkout this comment https://github.com/aws/aws-sdk-java/issues/1774#issuecomment-426431217 for more info. Let me know if you have further questions!
Hi all, we have released a maven archetype
archetype-lambda
that helps customers bootstrap a lambda function faster with SDK recommended practices to achieve minimal SDK startup time.See https://github.com/aws/aws-sdk-java-v2/blob/master/archetypes/archetype-lambda/README.md#maven-archetype-for-lambda-function-using-aws-sdk-for-java-2x for more info
Please try it out and let us know if you have any feedback! 😃
hi @stuartleylandcole I think my use case is very similar to yours. We also use Dynamo and Cognito. On lambda warmup calls, currently we also initialize all the sdk clients and try to make a dummy request for each one.
In one of our Lambda functions, we actually have 2 Dynamo tables to read values from. I did some experiment couple months ago, and we found that while writing to different tables, first request to each table would involve setting up SSL handshake again. Currently we warm the connections by performing
describe-table
request for each table.Hmmm, I didn’t know that
describeEndpoints
works in this case, I will check it out.Thank you for the insights
Hi @debora-ito @dagnir. Thank you for your reply and apologies for not responding sooner.
I think my question has been answered, at least in relation to the SDK - I have others concerns about the viability of Lambdas for a REST API backed by DynamoDB given the timings shown above but I don’t think this is the place to raise them. Do you happen to know the correct place to raise this?
Many thanks for your help! I’m happy for this issue to be closed now.
Hi @dagnir, thank you for your replies! I’ve been doing some further investigation and wanted to report back my findings and ask for some further assistance.
With a 1024MB Lambda the first DynamoDB request reliably takes 2s from executing the request to receiving a response. Note that this does not include time to build
DynamoDbClient
nor any of its dependencies (SdkHttpClient
, etc). BuildingDynamoDbClient
itself takes 400ms. Here is a log showing the client build and request. Note that I have used theApacheHttpClient
instead ofUrlConnectionClient
here so I could enable trace logging - using this client has not affected the times taken to make the DynamoDB request, they are inline with my original post, although I concede it probably takes longer to initialise so I will revert back when deploying for real.I have also run the same code on a 2048MB Lambda to compare times and they are roughly half the time for the 1024MB Lambda, except for building
DynamoDbClient
- that still takes 400ms. Here is the log:A quick summary (all times in ms):
DynamoDbClient
buildEven with the timings on the 2048MB sized Lambda this makes it quite difficult to run a REST API using Lambdas as a cold start will incur both the normal Lambda cold start time (1-2s) plus the DynamoDB overhead (1s). Equally these Lambdas do not require anything like 2GB of memory, they are using less than 200MB but I realise that CPU scales with memory which is why I performed this test - ideally I would be using 512MB or less.
Some questions to finish with:
DynamoDbClient
creation that takes 400ms? Why would this be so consistent across the two different Lambdas when other timings are roughly proportionalDynamoDbClient#getItem
:Thank you for any help you can provide. Hopefully we are able to reduce these times as at the moment I’m not sure that Lambdas providing a REST API in front of DynamoDB is a viable architecture.