Is there any way to skip a failed job and continue with the next flows?

In our case, when a job fails, we fix the problem offline. After fixing it, we expect the flow to continue, ignoring the failed job. The resolveIncident API retries the failed job, which is not what we want in our case. Is there any way to skip a failed job and continue with the next flows?

The first idea that occurs to me is to put a facade on your workers.

In that facade you check for a flag in the payload, like __skip__. If it is set to true, the worker completes the job immediately with { __skip__: false }; otherwise execution passes to the worker handler.

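This way, when you resolve the incident, you set this flag in the payload variables. The job is still retried, but the facade sees the flag and completes it without running the handler, so the failed task is effectively skipped. Because the worker sets the flag back to false, further workers will ignore it.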

If you implement this as a facade around your worker implementations, then you can be sure that every worker has it baked in.
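
Here is a rough sketch of the facade idea, just to make the flow concrete. It does not use any real client library: workers are modelled as plain functions that take the job variables and return the variables to complete the job with. The names with_skip, charge_card and ship_order are made up for illustration, and how you register handlers and update the variables before resolving the incident depends on the client you use.

```python
from typing import Any, Callable, Dict

# Assumption: a worker handler receives the job variables and returns
# the variables to complete the job with. A real client will differ.
Variables = Dict[str, Any]
Handler = Callable[[Variables], Variables]

SKIP_FLAG = "__skip__"


def with_skip(handler: Handler) -> Handler:
    """Wrap a handler so that a job carrying __skip__ == True is
    completed immediately without running the handler."""

    def facade(variables: Variables) -> Variables:
        if variables.get(SKIP_FLAG) is True:
            # Complete right away and reset the flag so that the
            # next workers in the flow run normally.
            return {SKIP_FLAG: False}
        return handler(variables)

    return facade


# Hypothetical handlers, for illustration only.
def charge_card(variables: Variables) -> Variables:
    raise RuntimeError("payment gateway down")


def ship_order(variables: Variables) -> Variables:
    return {"shipped": True}


if __name__ == "__main__":
    # Variables as they would look after the problem was fixed offline
    # and the flag was set before resolving the incident.
    variables: Variables = {"orderId": 1, SKIP_FLAG: True}

    # The retried job is skipped by the facade.
    variables.update(with_skip(charge_card)(variables))

    # The flag is now False, so the next worker runs its handler.
    variables.update(with_skip(ship_order)(variables))
    print(variables)
```

Wrapping every handler through one function like this at registration time is what gives you the "baked in" guarantee: no individual worker has to remember to check the flag.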