Webhooks are great until your destination server crashes or the network connection drops. By nature they are close to “fire and forget”, but in critical business processes (such as receiving payments), “forgetting” is not an option.

In this article, we will examine how to make your webhook architecture more resilient and how to bring data loss as close to zero as possible.

1. Timeout Management

The service sending the webhook does not wait for a response forever. Providers typically enforce a timeout, often in the range of 5-10 seconds. If your endpoint does not respond within that window, the delivery is considered failed.

Solution: Never perform long-running operations (creating PDFs, sending emails, running database reports) inside the endpoint that receives the webhook. The flow should be:

Receive the request.
Put it in a queue.
Return 200 OK to the sender.
Process it later in a background worker.
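
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: it assumes an in-process `queue.Queue` (a real deployment would normally use a durable broker so events survive a restart), and the names `handle_webhook`, `worker`, and `processed` are hypothetical.

```python
import queue
import threading

# Assumption: an in-memory queue stands in for a durable message broker.
events = queue.Queue()
processed = []


def handle_webhook(payload):
    """Endpoint body: enqueue the event and acknowledge immediately."""
    events.put(payload)
    return 200  # respond before any slow work happens


def worker():
    """Background worker: drains the queue and does the slow work."""
    while True:
        payload = events.get()
        try:
            if payload is None:  # sentinel used to stop the worker
                return
            # Placeholder for the slow part (PDF, email, report, ...).
            processed.append(payload["id"])
        finally:
            events.task_done()
```

The worker would be started once at application startup, for example with `threading.Thread(target=worker, daemon=True).start()`. The endpoint itself never touches the slow code path, so its response time stays well inside the sender's timeout.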

2. Retry Mechanism

Every system can fail. Your server might be under maintenance, or there might be a transient network error. In these cases, the webhook should not be lost.

Exponential Backoff: Instead of retrying a failed request immediately, wait longer before each successive attempt. This gives the failing system time to recover instead of flooding it with more requests.

1st Attempt: Error
2nd Attempt: After 5 seconds
3rd Attempt: After 30 seconds
4th Attempt: After 5 minutes
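
A schedule like this can be computed rather than hard-coded. The sketch below is one possible formulation, assuming a fixed multiplier and an upper cap. The function name `backoff_delay` and the default values (`base=5`, `factor=6`, `cap=300`) are illustrative choices, so the third retry lands at 3 minutes rather than the 5 minutes listed above.

```python
import random


def backoff_delay(attempt, base=5.0, factor=6.0, cap=300.0, jitter=0.0):
    """Seconds to wait before retry number `attempt` (1 = first retry).

    The delay grows geometrically and is clamped to `cap`.
    `jitter` is a fraction of the delay added at random, so that many
    clients retrying at once do not all hit the server at the same moment.
    """
    delay = min(cap, base * factor ** (attempt - 1))
    return delay + random.uniform(0, jitter * delay)


# With the defaults and no jitter: 5 s, 30 s, 180 s, then capped at 300 s.
schedule = [backoff_delay(n) for n in range(1, 5)]
```

The cap matters in practice: without it, a geometric schedule quickly produces waits of hours, which may exceed how long the sender is willing to keep an event around.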

Review the retry policy of your webhook provider (e.g., Stripe) so you know how often and for how long it retries. If you send webhooks yourself, implement a retry strategy for your own deliveries as well.

3. Idempotency

The retry mechanism is great but has a dangerous side effect: the same request can arrive more than once. If the 200 OK response does not reach the sender due to a network error, the sender sends the same webhook again. If your code is not prepared for this, you might charge the customer twice or create duplicate records in the database.

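Solution: Most providers include a unique ID (Event ID) in every webhook request. Store this ID in your database or Redis, and check it before processing so that a repeated delivery is acknowledged but not processed again.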

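# Sketch assuming redis-py: `r` is a redis.Redis client, `event_id` comes from the payload.
# SET with nx=True is atomic, so only the first delivery of an event claims the key.
if not r.set(f"webhook:{event_id}", "1", nx=True, ex=86400):
    return 200  # Already seen, don't process it again.
try:
    process_payment()
except Exception:
    r.delete(f"webhook:{event_id}")  # Release the claim so a retry can succeed.
    raise
return 200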
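
To see the effect of a duplicate delivery without a Redis server, the same idea can be tried with an in-memory set. This is only a sketch: the class name `IdempotentProcessor` and the event shape (`id`, `amount`) are hypothetical, and an in-memory set does not survive a restart or work across multiple servers.

```python
import threading


class IdempotentProcessor:
    """Runs `handler` at most once per event ID (in-memory sketch)."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()
        self._lock = threading.Lock()

    def handle(self, event):
        event_id = event["id"]
        # Check and claim under one lock so two concurrent deliveries
        # of the same event cannot both pass the check.
        with self._lock:
            if event_id in self._seen:
                return 200  # duplicate: acknowledge without reprocessing
            self._seen.add(event_id)
        try:
            self._handler(event)
        except Exception:
            with self._lock:
                self._seen.discard(event_id)  # let the sender's retry try again
            return 500
        return 200
```

Delivering the same event twice to `handle` calls the handler only once, while both deliveries receive 200. A failed handler returns 500 and releases the ID, so the sender's next retry is processed normally.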

4. Security and Verification (HMAC)

Since your endpoint is public, attackers can send fake webhook requests. To prevent this, verify the HMAC (Hash-based Message Authentication Code) signature of every request. The sender computes an HMAC of the payload with a shared secret key and sends the resulting signature in a header. You compute the same HMAC over the raw request body and check that the two values match, ideally with a constant-time comparison.
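
A minimal verification sketch using Python's standard `hmac` module follows. It assumes HMAC-SHA256 with a hex-encoded signature carried alone in a header. Real providers differ in algorithm, encoding, and header format, so treat `sign` and `verify` as hypothetical helpers and check your provider's documentation.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, secret: bytes, signature_header: str) -> bool:
    """True if the header value matches the signature we compute ourselves."""
    expected = sign(payload, secret)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Note that `payload` must be the raw bytes exactly as received. Parsing the JSON and serializing it again can change whitespace or key order, which makes a valid signature fail.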

5. Dead Letter Queue (DLQ)

What happens if all retry attempts fail? Instead of deleting the data, move it to a separate area called a “Dead Letter Queue”. You can examine the failed events there manually and process them again once the underlying problem is resolved.
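
The idea can be sketched as follows, assuming a plain list as the dead letter store and a limit of four attempts. The names `deliver`, `replay`, `dead_letters`, and `MAX_ATTEMPTS` are hypothetical, and a real system would keep the DLQ in durable storage and wait between attempts.

```python
MAX_ATTEMPTS = 4  # assumption: chosen for illustration

# Assumption: a list stands in for a durable dead letter store.
dead_letters = []


def deliver(event, handler, max_attempts=MAX_ATTEMPTS):
    """Try `handler` up to `max_attempts` times; dead-letter the event on failure."""
    last_error = ""
    for _ in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:
            last_error = repr(exc)
            # A real worker would wait here, with increasing delays.
    # Keep the payload and the last error so the failure can be investigated.
    dead_letters.append(
        {"event": event, "error": last_error, "attempts": max_attempts}
    )
    return False


def replay(handler):
    """Re-run dead-lettered events; ones that fail again return to the DLQ."""
    pending = dead_letters[:]
    dead_letters.clear()
    return sum(deliver(item["event"], handler) for item in pending)
```

Storing the last error next to the payload is a deliberate choice: when someone inspects the queue later, the reason for the failure is usually the first thing they need.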

Summary

Building a reliable webhook architecture requires queue management, retry strategies, and security measures.

If you don’t want to build all this infrastructure (Queue, Retry, DLQ, Logging) from scratch, you can use a Webhook Gateway like WebhookIO. WebhookIO receives all incoming requests for you, queues them, and securely delivers them to you when your endpoint is ready.

Critical Measures to Avoid Losing Webhook Requests