Understand your Lambda Event Retries

Understanding the differences between the various event types and their retries is important, because it will change how you design and code your application. If you're developing a serverless application in Lambda, you will have to deal with retries. They are a necessary evil of a distributed system, which all serverless applications are.

Cost

Lambda retries cost you money. In a failure scenario, it's not uncommon for a function to run until it hits its timeout limit, which maximises the extra cost to you. In most applications it's not going to break the bank (thanks to Lambda's generous Free Tier), but retries also clog your logs with noise (i.e. log lines that add no value), which makes troubleshooting harder and slower. The worst-case scenario is that the retries impact downstream systems, changing data or pushing extra load onto them.

By understanding the ins and outs of Lambda's retry behaviour, you'll be much less surprised when it counts most: running and troubleshooting your serverless applications in production.

Events

This page in the docs is technically helpful, but also a bit dry.
Here's what you need to know:

Synchronous Push

Retries: No

Examples: Alexa, Invoke API

Asynchronous Push

Retries: Two (i.e. the function runs up to three times in total). If you have a DLQ configured (see below), the event is sent there once the final attempt fails.

Examples: S3, Invoke API
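To make the "two retries, three runs" arithmetic concrete, here is a small local simulation. It is only a sketch of the behaviour described above, not AWS code: `simulate_async_invoke` and the `dead_letters` list are hypothetical names, and the delays Lambda waits between attempts are left out.

```python
def simulate_async_invoke(handler, event, max_retries=2, dead_letters=None):
    """Local stand-in for the asynchronous retry behaviour (not AWS code).

    Calls the handler once, then up to `max_retries` more times if it raises.
    Returns the number of attempts made.
    """
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        try:
            handler(event, None)
            return attempts  # success, so no further retries
        except Exception:
            continue  # failure, so try again if attempts remain
    # Every attempt failed: hand the event to the DLQ stand-in, if there is one.
    if dead_letters is not None:
        dead_letters.append(event)
    return attempts


def always_fails(event, context):
    raise RuntimeError("downstream unavailable")


dlq = []
total_attempts = simulate_async_invoke(always_fails, {"key": "a.txt"}, dead_letters=dlq)
```

With a handler that always raises, the simulation makes three attempts and then places the event in `dlq`.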

Stream (Pull)

Lambda pulls records from the stream by polling (every 250ms), then invokes your function synchronously with a batch of records (i.e. a push).

Retries: Until the event expires. Processing does not progress past the failing events, so the records behind them have to wait.

Examples: DynamoDB, Kinesis.
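Because a failing batch holds up everything behind it, one option is to catch failures you know are permanent and set those records aside, while letting anything unexpected raise so the batch is retried. The handler below is a minimal sketch for a Kinesis batch, assuming JSON payloads; `process` is hypothetical business logic, and whether skipping a record is acceptable depends entirely on your data.

```python
import base64
import json


def process(payload):
    """Hypothetical business logic: rejects payloads without an id."""
    if "id" not in payload:
        raise ValueError("missing id")
    return payload["id"]


def handler(event, context):
    processed, skipped = [], []
    for record in event["Records"]:
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            processed.append(process(json.loads(raw)))
        except ValueError:
            # Malformed JSON also raises a ValueError subclass. Retrying will
            # never fix this record, so note it and move on instead of
            # failing the whole batch.
            skipped.append(record["kinesis"]["sequenceNumber"])
    # Any other exception propagates, and the whole batch is retried.
    return {"processed": processed, "skipped": skipped}
```

In a real function, the skipped records would usually be written somewhere durable (a queue or a log you monitor) rather than just returned.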

Dead Letter Queue

The dead letter queue (DLQ) lets you redirect failed events to an SQS queue or SNS topic, so that you can catch the specific messages that are causing retries. Events are sent there after both retries fail. It is not used for synchronous invocations.

It's a good idea to have these set up on your functions, especially in production.
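As a rough sketch, a DLQ can be attached to an existing function with boto3's `update_function_configuration` call. The helper name `configure_dlq`, the function name, and the ARN in the usage comment are placeholders for illustration, not values from a real account.

```python
def configure_dlq(lambda_client, function_name, target_arn):
    """Point a function's dead letter queue at an SQS queue or SNS topic.

    `lambda_client` is expected to be a boto3 Lambda client,
    i.e. boto3.client("lambda").
    """
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        DeadLetterConfig={"TargetArn": target_arn},
    )


# Example usage (placeholder names, requires AWS credentials):
#
#   import boto3
#   configure_dlq(
#       boto3.client("lambda"),
#       "my-function",
#       "arn:aws:sqs:us-east-1:123456789012:my-function-dlq",
#   )
```

The function's execution role also needs permission to write to the target (`sqs:SendMessage` for a queue, `sns:Publish` for a topic), otherwise the failed events cannot be delivered.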

Idempotent Actions

As shown above, there are a variety of ways for Lambda to retry processing your events. This makes it worthwhile to spend the time and effort making your processing idempotent.

Idempotent
Describing an action which, when performed multiple times, has no further effect on its subject after the first time it is performed.

Making your actions idempotent will likely take more work than not doing it, but you will be grateful you took the time when something goes wrong.
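One common way to get there is to record an ID for each event you have handled and skip any you have seen before. The sketch below uses an in-memory set purely for illustration; `ProcessedStore` and `apply_once` are hypothetical names, and a real function would need a durable store with an atomic "insert if absent" operation, since Lambda containers do not share memory.

```python
class ProcessedStore:
    """In-memory stand-in for a durable store of handled event IDs."""

    def __init__(self):
        self._seen = set()

    def claim(self, event_id):
        """Return True if this ID has not been seen before, and record it."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True

    def release(self, event_id):
        """Forget an ID so that a later retry can claim it again."""
        self._seen.discard(event_id)


def apply_once(store, event_id, action):
    """Run `action` only if `event_id` has not already been handled."""
    if not store.claim(event_id):
        return False  # duplicate delivery, nothing to do
    try:
        action()
    except Exception:
        # The work did not complete, so allow the retry to run it again.
        store.release(event_id)
        raise
    return True


store = ProcessedStore()
charges = []
first = apply_once(store, "order-42", lambda: charges.append("order-42"))
second = apply_once(store, "order-42", lambda: charges.append("order-42"))
```

Here the second call is a no-op, so a retried event leaves `charges` with a single entry.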

A bunch of the content for this came from Cecilia Deng's session at re:Invent 2016 (slides), which I was fortunate enough to see.