serverless: MSK support as an event to trigger Lambdas

Lambda now supports Amazon MSK as an event source, so functions can consume Kafka messages and integrate with downstream serverless workflows. It would be great to have a Kafka event that triggers functions directly from Serverless Framework.

https://aws.amazon.com/es/blogs/compute/using-amazon-msk-as-an-event-source-for-aws-lambda/

Proposed solution

(Updated as the discussion progressed, to reflect the final agreement):

Event name msk, with support for the following properties:

  • batchSize: optional, maps to BatchSize
  • arn: required, maps to EventSourceArn
  • enabled: optional, maps to Enabled
  • startingPosition: optional (but required in AWS), maps to StartingPosition with default set to TRIM_HORIZON
  • topic: required (technically optional in AWS, but it’s due to implied support for other stream sources), maps to Topics[0]
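A minimal sketch of how the agreed msk properties could be translated into the Properties of an AWS::Lambda::EventSourceMapping resource. The helper name msk_event_to_cfn_properties and its function_arn parameter are hypothetical and do not come from Serverless Framework. Optional properties are emitted only when the user sets them, so AWS defaults apply otherwise.

```python
from typing import Any, Dict


def msk_event_to_cfn_properties(event: Dict[str, Any], function_arn: Any) -> Dict[str, Any]:
    """Hypothetical helper: msk event config -> EventSourceMapping Properties."""
    # arn and topic are required by the proposal
    for key in ("arn", "topic"):
        if key not in event:
            raise ValueError(f"msk event is missing required property '{key}'")

    props: Dict[str, Any] = {
        "FunctionName": function_arn,
        "EventSourceArn": event["arn"],
        # CloudFormation expects a Topics list holding a single topic
        "Topics": [event["topic"]],
        # Required by AWS; the proposal defaults it to TRIM_HORIZON
        "StartingPosition": event.get("startingPosition", "TRIM_HORIZON"),
    }
    # Optional properties are copied only when present in the config
    if "batchSize" in event:
        props["BatchSize"] = event["batchSize"]
    if "enabled" in event:
        props["Enabled"] = event["enabled"]
    return props
```

If you call it with only arn and topic, you get Topics set to a one-element list and StartingPosition set to TRIM_HORIZON. No BatchSize or Enabled keys are emitted.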

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 28 (16 by maintainers)

Most upvoted comments

From my point of view, ARN and Topic should be mandatory. We could define some default values for the other two properties. What do you think @pgrzesik, @medikoo?

Thanks for sharing your finding @pedrocava 🙇 It looks like this was added a bit later than MSK support was introduced on the Framework side, as I remember it wasn’t available at first. Would you like to open a separate issue that proposes adding this feature?

@pgrzesik go ahead; for now, I am content to take part in the pull request review and learn a bit. If you need help with anything, just tell me.

Do you see any problem with using it as the default for msk?

I don’t; I believe we should stick to the current default. I think it’s application-specific, and if the current default has fared well for 4 years now, it sounds like the perfect choice 👍

@safv12 do you plan to implement support for msk based on the above implementation proposal? If not, I’d be happy to give it a try over the weekend

Many thanks @safv12 and @pgrzesik for the valuable insight!

One note on the topic parameter: in CloudFormation it’s implemented as a Topics list that accepts at most one topic.

I’d say let’s follow what we did for EFS and stick to the singular notation (unless we have a clear hint from AWS that this is likely to change soon).

Summarizing the above (and after going through the AWS docs), I think in the context of MSK we may have the following options (please point out anything you feel I got wrong or that could be improved):

  • batchSize: optional, maps to BatchSize
  • bisectBatchOnFunctionError: optional, maps to BisectBatchOnFunctionError
  • onFailureDestination: optional, maps to DestinationConfig.OnFailure
  • arn: required, maps to EventSourceArn
  • maximumBatchingWindow: optional, maps to MaximumBatchingWindowInSeconds
  • maximumRecordAge: optional, maps to MaximumRecordAgeInSeconds
  • maximumRetryAttempts: optional, maps to MaximumRetryAttempts
  • parallelizationFactor: optional, maps to ParallelizationFactor
  • startingPosition: optional (but required in AWS), maps to StartingPosition, and we should use LATEST as the default
  • topic: required (technically optional in AWS, but it’s due to implied support for other stream sources), maps to Topics[0]
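The outline above writes one of its targets as a nested path (DestinationConfig.OnFailure). One way to express such a mapping table is with dotted CloudFormation paths, as in this sketch. The table and function names are hypothetical, and placing the ARN under OnFailure.Destination is an assumption. The sketch mirrors the outline as written and does not check which of these properties AWS actually accepts for MSK sources.

```python
from typing import Any, Dict

# Config key -> CloudFormation property path, taken from the outline above.
# Assumption: onFailureDestination is an ARN stored under OnFailure.Destination.
MSK_OPTION_PATHS: Dict[str, str] = {
    "batchSize": "BatchSize",
    "bisectBatchOnFunctionError": "BisectBatchOnFunctionError",
    "onFailureDestination": "DestinationConfig.OnFailure.Destination",
    "arn": "EventSourceArn",
    "maximumBatchingWindow": "MaximumBatchingWindowInSeconds",
    "maximumRecordAge": "MaximumRecordAgeInSeconds",
    "maximumRetryAttempts": "MaximumRetryAttempts",
    "parallelizationFactor": "ParallelizationFactor",
    "startingPosition": "StartingPosition",
}


def set_path(target: Dict[str, Any], path: str, value: Any) -> None:
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def map_msk_options(event: Dict[str, Any]) -> Dict[str, Any]:
    # LATEST is the default proposed in this outline; a user value overwrites it
    props: Dict[str, Any] = {"StartingPosition": "LATEST"}
    for key, path in MSK_OPTION_PATHS.items():
        if key in event:
            set_path(props, path, event[key])
    # topic is required; a missing topic raises KeyError here
    props["Topics"] = [event["topic"]]
    return props
```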

Having outlined that, I wonder whether, before adding support for MSK, we should refactor the currently supported streaming events (those backed by AWS::Lambda::EventSourceMapping).

Currently, we have a stream event that backs Kinesis and DynamoDB streams, and an sqs event for SQS queues.

Additionally, I see the following issues and inconsistencies in them:

  • In stream event (dynamodb, kinesis):
    • We set the default batchSize to 10, while the AWS default is 100
    • We don’t support a value of 0 for batchWindow (maps to MaximumBatchingWindowInSeconds), although AWS accepts it
    • Naming is confusing (e.g. batchWindow and maximumRecordAgeInSeconds don’t seem to follow the same convention)
  • The sqs event lacks support for the bisectBatchOnFunctionError, maximumRetryAttempts, maximumRecordAgeInSeconds, batchWindow and destinations properties
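The thread doesn’t say why a batchWindow of 0 is rejected. It could be a schema minimum, or it could be a truthiness check. As a hypothetical illustration of the second pitfall, and of leaving BatchSize to the AWS default instead of hard-coding 10, consider the helpers below. None of them are taken from the framework.

```python
from typing import Any, Dict, Optional


def window_props_truthy(batch_window: Optional[int]) -> Dict[str, Any]:
    # Pitfall: 0 is falsy, so a valid batchWindow of 0 is silently dropped
    return {"MaximumBatchingWindowInSeconds": batch_window} if batch_window else {}


def window_props_explicit(batch_window: Optional[int]) -> Dict[str, Any]:
    # Omit the property only when the user did not set it at all
    if batch_window is None:
        return {}
    return {"MaximumBatchingWindowInSeconds": batch_window}


def batch_size_props(batch_size: Optional[int]) -> Dict[str, Any]:
    # Emitting nothing lets AWS apply its own default instead of a framework-chosen 10
    if batch_size is None:
        return {}
    return {"BatchSize": batch_size}
```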

Implementation proposal

1. Introduce dynamodb and kinesis events, and deprecate stream event

By doing that, we can fix the stream event issues listed above and adopt better naming (as I proposed when outlining the possible options for MSK). I think separating the two will also lead to a better design: it will be clear up front which stream the function is attached to, and internally we will no longer have to deduce the type from the ARN (when the type property is not provided).
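Here is a hypothetical sketch, not the framework’s implementation, of the ARN-based deduction that separate dynamodb and kinesis events would make unnecessary. It reads the service segment of arn:partition:service:region:account:resource. It has to give up when the ARN is a CloudFormation intrinsic such as Fn::GetAtt, which is why an explicit type is needed in that case.

```python
from typing import Any, Optional


def deduce_stream_type(arn: Any, explicit_type: Optional[str] = None) -> str:
    """Return 'kinesis' or 'dynamodb' for a stream event (hypothetical helper)."""
    if explicit_type is not None:
        return explicit_type
    if not isinstance(arn, str):
        # e.g. {"Fn::GetAtt": [...]}: the service is only known at deploy time
        raise ValueError("cannot deduce the stream type from a non-string ARN; set 'type'")
    parts = arn.split(":")  # arn:partition:service:region:account:resource
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn}")
    if parts[2] in ("kinesis", "dynamodb"):
        return parts[2]
    raise ValueError(f"unsupported stream source: {parts[2]}")
```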

Both events could use one generic, self-contained AWS::Lambda::EventSourceMapping resource generator, which we can then reuse for sqs (thereby adding support for the currently unsupported properties) and later for msk.
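A minimal sketch of what a generic, self-contained EventSourceMapping generator might look like. It assumes each event type supplies its own table of supported properties, mapping each config key to a CloudFormation key plus a value converter. All names here are hypothetical. The point is that sqs and msk would differ only in their tables and defaults.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

Converter = Callable[[Any], Any]


def identity(value: Any) -> Any:
    return value


# Hypothetical per-event tables: config key -> (CloudFormation key, converter)
SQS_PROPERTIES: Dict[str, Tuple[str, Converter]] = {
    "arn": ("EventSourceArn", identity),
    "batchSize": ("BatchSize", identity),
    "enabled": ("Enabled", identity),
}
MSK_PROPERTIES: Dict[str, Tuple[str, Converter]] = {
    **SQS_PROPERTIES,
    "startingPosition": ("StartingPosition", identity),
    "topic": ("Topics", lambda t: [t]),  # singular config key -> one-element list
}


def build_event_source_mapping(
    function_ref: Any,
    event: Mapping[str, Any],
    supported: Mapping[str, Tuple[str, Converter]],
    defaults: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    # Reject keys this event type does not support
    unknown = set(event) - set(supported)
    if unknown:
        raise ValueError(f"unsupported properties: {sorted(unknown)}")
    props: Dict[str, Any] = {"FunctionName": function_ref, **(defaults or {})}
    for key, value in event.items():
        cfn_key, convert = supported[key]
        props[cfn_key] = convert(value)  # user values override defaults
    return {"Type": "AWS::Lambda::EventSourceMapping", "Properties": props}
```

For instance, an msk event could pass {"Fn::GetAtt": ["HelloLambdaFunction", "Arn"]} as function_ref (HelloLambdaFunction being a hypothetical logical ID), MSK_PROPERTIES as the table, and {"StartingPosition": "TRIM_HORIZON"} as its defaults.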

We should also introduce a config schema that fully covers the dynamodb, kinesis and deprecated stream events.

The documentation should be updated as well.

2. Upgrade sqs event implementation

Make it rely on the generic AWS::Lambda::EventSourceMapping resource generator (introduced in the previous step). This will fix the missing support for some properties.

Additionally, we should introduce a config schema for the sqs event and update the documentation so it covers the new properties.

3. Introduce support for msk event

It would rely on the generic AWS::Lambda::EventSourceMapping resource generator and come with its own config schema and documentation.
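Below is a sketch of a possible config schema for the msk event, covering the final property list at the top of the issue. It comes with a tiny hand-rolled validator so the example stays dependency-free. The schema is JSON Schema-style but simplified and hypothetical. arn is typed as a plain string, although a real schema would likely also need to accept CloudFormation intrinsic functions. The startingPosition enum lists only the two values discussed in this thread.

```python
from typing import Any, Dict, List

# Hypothetical, simplified JSON Schema-style definition for the msk event
MSK_EVENT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "arn": {"type": "string"},
        "topic": {"type": "string"},
        "batchSize": {"type": "integer", "minimum": 1},
        "enabled": {"type": "boolean"},
        "startingPosition": {"enum": ["LATEST", "TRIM_HORIZON"]},
    },
    "required": ["arn", "topic"],
    "additionalProperties": False,
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(config: Dict[str, Any], schema: Dict[str, Any] = MSK_EVENT_SCHEMA) -> List[str]:
    """Return a list of error messages (empty when the config is valid)."""
    errors = [f"missing required property '{k}'" for k in schema["required"] if k not in config]
    for key, value in config.items():
        rule = schema["properties"].get(key)
        if rule is None:
            if not schema.get("additionalProperties", True):
                errors.append(f"unknown property '{key}'")
            continue
        # Checks are ordered so that minimum is only compared on well-typed values
        if "type" in rule and not TYPE_CHECKS[rule["type"]](value):
            errors.append(f"'{key}' must be of type {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"'{key}' must be one of {rule['enum']}")
        elif "minimum" in rule and value < rule["minimum"]:
            errors.append(f"'{key}' must be >= {rule['minimum']}")
    return errors
```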


To avoid making things too complex, I think it would be best to cover these 3 steps in 3 separate, consecutive PRs.

What do you think?

@pgrzesik great thanks for sharing that.

In my opinion, msk could be replaced by kafka as well.

I fully agree, that seems right to me.

Given that, we could follow a convention like this

Which properties do you think should be mandatory for the msk event type, and which should be optional?