Dev Tips: Simulating Service Errors

A reliable network service has to be robust, and a key part of robustness is tolerating problems in the backing services it depends on. For example, a service should:

  • continue running when a backing service goes down
  • operate in a partially degraded mode if possible
  • properly report errors during the outage, with network responses and log messages
  • resume normal operation automatically when the backing service recovers
  • deal with a slow or unresponsive backing service smoothly and with appropriate timeouts

Sometimes this robustness comes for free from a library, such as a database connection pool, which is great. In that case the application code only needs to understand the library's failure states and the exceptions it may throw, and represent them properly in the service's responses. At other times we have to handle backing service failures in our own code, possibly via a library we write ourselves.

When doing this, simulating and testing the many kinds of failure conditions can be tough. For unit testing, most languages and libraries support mocking errors and failures to some degree, but in integration testing it can be hard to force a simulated error state. Very few services offer a feature flag or configuration option that puts them into an error simulation mode.

We can still run a few basic simulations with socat, a powerful Swiss Army knife for networking. Below are a few basic cases in a Docker environment.

Case: Backing Service Fully Down

This one is easy, since any service can simply be stopped. Stop the backing service, then start the main service:

cd my-backing-service
docker-compose stop
cd ../my-main-service
docker-compose up
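
With the main service running against a stopped dependency, the recovery behavior from the list at the top can be checked as well: start the backing service again (for example with docker-compose start in its directory) and poll the main service until it answers. The following is only a minimal sketch, assuming a POSIX shell. The helper name wait_until_ok and the health URL in the commented usage line are hypothetical and not part of any tool mentioned here.

```shell
# Hypothetical helper: retry a check command until it succeeds
# or the allowed number of attempts is used up.
#   $1 = maximum attempts, $2 = delay between attempts in seconds,
#   remaining arguments = the check command to run
wait_until_ok() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "recovered after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still failing after ${attempts} attempt(s)"
  return 1
}

# Usage (assumed URL, adjust to whatever your main service exposes):
# wait_until_ok 30 1 wget -q -T 2 -O /dev/null http://localhost:3000/health
```

If the helper reports that the check is still failing long after the backing service is back, the main service probably needs a restart to reconnect, which is exactly the behavior this test is meant to catch.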

Case: Backing Service Unresponsive

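Here we can simulate a non-responsive service using the alpine/socat Docker image. socat listens on port 6000 and accepts connections, but sends nothing back unless you type something into its terminal: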

docker run --rm --interactive --tty \
  --network reaction.localhost \
  --name my-backing-service \
  alpine/socat readline TCP-LISTEN:6000,crlf

Next, in a separate terminal, simulate your service connecting to it over HTTP:

docker run --rm --interactive --tty \
  --network reaction.localhost \
  busybox \
  wget my-backing-service.reaction.localhost:6000

Because the backing service is unresponsive, the wget request should now stall, waiting for a response. Use this technique to check that your main service handles the situation well: it should not crash, and it should time out after an interval that is neither too short nor too long.
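
One way to turn the "not too long" part into a repeatable check is to run the client under a hard deadline and see whether it gave up on its own. The sketch below assumes a POSIX shell and a timeout command such as the one in GNU coreutils, which exits with status 124 when the deadline is hit. The helper name check_deadline is hypothetical.

```shell
# Hypothetical helper: run a command under a hard deadline and report
# whether it finished by itself or was still running at the deadline.
#   $1 = deadline in seconds, remaining arguments = the command to run
check_deadline() {
  deadline="$1"
  shift
  timeout "$deadline" "$@" >/dev/null 2>&1
  status=$?
  # GNU timeout exits with 124 when the deadline is hit. A status of 143
  # means the command was killed by SIGTERM, which is how some other
  # timeout implementations may surface the same event (assumption).
  if [ "$status" -eq 124 ] || [ "$status" -eq 143 ]; then
    echo "STALLED: still running after ${deadline}s"
    return 1
  fi
  echo "FINISHED: exit status ${status}"
  return 0
}

# Usage: a client with its own 5-second timeout should give up
# well before the 15-second deadline.
# check_deadline 15 wget -T 5 -O /dev/null http://my-backing-service.reaction.localhost:6000
```

A FINISHED result with a non-zero exit status is the desired outcome here: the client noticed the stall and failed on its own terms instead of hanging.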

Case: Backing Service Error Response

We should also ensure we are robust against misbehaving backing services. In this example, we'll simulate an HTTP-level error. Start up the mock backing service as before:

docker run --rm --interactive --tty \
  --network reaction.localhost \
  --name my-backing-service \
  alpine/socat readline TCP-LISTEN:6000,crlf

Then in a separate terminal, start your client again as before:

docker run --rm --interactive --tty \
  --network reaction.localhost \
  busybox \
  wget my-backing-service.reaction.localhost:6000

Now back in the socat terminal, type the following:

HTTP/1.0 555 Oops!

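Press Enter to send the line; wget should then report the error response instead of stalling. As needed, you can also craft malformed responses, specific error responses, and so on ahead of time.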
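
For responses prepared ahead of time, a small shell function can print the raw bytes so they do not have to be typed by hand. This is a minimal sketch, assuming a POSIX shell. The function name canned_response is hypothetical, and the commented docker command is an untested variation on the commands above that feeds socat from standard input instead of readline.

```shell
# Hypothetical helper: print a complete HTTP/1.0 response with CRLF line endings.
#   $1 = status code, $2 = reason phrase, $3 = plain-text body
canned_response() {
  status="$1"
  reason="$2"
  body="$3"
  printf 'HTTP/1.0 %s %s\r\n' "$status" "$reason"
  printf 'Content-Type: text/plain\r\n'
  # ${#body} counts characters, so this is only exact for ASCII bodies.
  printf 'Content-Length: %s\r\n' "${#body}"
  # An empty line ends the headers.
  printf '\r\n'
  printf '%s' "$body"
}

# Usage (assumption: "-" makes socat read the response from standard input,
# and --tty is dropped because input comes from a pipe):
# canned_response 503 'Service Unavailable' 'down for maintenance' |
#   docker run --rm --interactive \
#     --network reaction.localhost \
#     --name my-backing-service \
#     alpine/socat - TCP-LISTEN:6000
```

Deliberately breaking the output, for example by dropping the empty line after the headers or by announcing a Content-Length larger than the body, gives you malformed responses to test against.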

socat reference

socat is a sophisticated low-level tool, so there is a fairly steep learning curve to everything it can do. The examples here should be enough to get you started, but check out the socat manual if you want to dig into more features.
