·5 min read

Parallelism and Rate Limit for Workflow And QStash

Sancar KoyunluSancar KoyunluSenior Software Engineer @Upstash

Parallelism and Rate Limits for Workflow and QStash

We are excited to announce the release of Flow-Control, a new feature that lets you set Rate and Parallelism limits for QStash Publish and Workflow.

This blog is divided into four sections for easier reading:

Motivation

Our community drives our development. We’ve heard your feedback on two key points:

  • Workflow: Our Workflow product is gaining traction. However, it launched without built-in rate or parallelism limits. The existing parallelism control via Queues was not suitable for Workflow.
  • Queue Limitations: Many users relied on Queue Parallelism to prevent bursts on their endpoints. However, queues had per-plan limits due to memory/CPU allocation constraints. The new design removes these restrictions, allowing you to configure limits based on your application’s needs without worrying about plan limits.

To solve these issues, we developed Flow-Control, which works for both Workflow and QStash.

How It Works

  • Rate Limit: This defines the maximum number of calls per second. Calls within the same FlowControl key contribute to the rate count. Instead of rejecting excess calls, QStash delays them for execution in later seconds, respecting the limit.
  • Parallelism Limit: This controls the number of concurrent executions. Unlike rate limiting, execution duration matters. At no time will there be more than the specified number of active calls. If the limit is reached, other publishes will wait until a slot is available and, once one finishes, they will continue.
  • Using Rate and Parallelism Together: Both parameters can be combined. For example, with a rate of 10 per second and parallelism of 20, if each request takes a minute to complete, QStash will trigger 10 calls in the first second and another 10 in the next. Since none of them will have finished, the system will wait until one completes before triggering another.

How to Use It

QStash

Here’s an example of using Flow-Control with QStash:

const client = new Client({ token: "<QStash_TOKEN>" });
 
await client.publishJSON({
    url: "https://my-api...",
    body: { hello: "world" },
    flowControl: { key: "app1", parallelism: 3, rate: 10 },
});

For more details and other SDKs, see the documentation here.

Workflow

There are two main use cases for Flow-Control in Workflow:

  1. Limiting the Workflow Environment: Controlling the execution environment.
  2. Limiting External API Calls: Preventing excessive requests to external services.

Limiting the Workflow Environment

To limit the execution environment, you need to configure both the serve and trigger methods. When configured, all the steps of the workflow will respect the limits. Due to the nature of the Workflow SDK, QStash calls the serve method multiple times. This means that to stay within the limits of the deployed environments, the given rate will be applied to all calls going from QStash servers to the serve method.

Note that if there are multiple Workflows running in the same environment, their steps can interleave, but overall rate and parallelism limits will be respected if they share the same flowControl key.

  • In the serve method:
export const { POST } = serve<string>(
    async (context) => {
        await context.run("step-1", async () => {
            return someWork();
        });
    },
    {
        flowControl: { key: "app1", parallelism: 3, rate: 10 }
    }
);

For more details, see the documentation on Rate and Parallelism.

  • In the trigger method:
import { Client } from "@upstash/workflow";
 
const client = new Client({ token: "<QStash_TOKEN>" });
const { workflowRunId } = await client.trigger({
    url: "https://workflow-endpoint.com",
    body: "hello there!",
    flowControl: { key: "app1", parallelism: 3, rate: 10 }
});

For more details, see the documentation here.

Limiting External API Calls

To limit requests to an external API, use context.call:

import { serve } from "@upstash/workflow/nextjs";
 
export const { POST } = serve<{ topic: string }>(async (context) => {
  const request = context.requestPayload;
 
  const response = await context.call(
    "generate-long-essay", 
    {
      url: "https://api.openai.com/v1/chat/completions",
      method: "POST",
      body: {/*****/},
      flowControl: { key: "app1", parallelism: 3, rate: 10 }
    }
  );
});

For more details, see the documentation here.

What Happened to Queues?

Previously, parallelism control were managed through Queues. However, this approach was hard to manage for our users. The main issue was that a single failed message could block the queue.
The reason is that QStash Queues were initially designed for FIFO use cases with single parallelism. This means that until a message has either been successfully processed or exhausted all its configured retries, the queue does not proceed to the next message. While this design remains useful for strict FIFO requirements, it does not work well for rate-limiting and concurrency use cases.

With Flow-Control, rate and parallelism limits are applied without blocking new publishes due to failures.

Queues are still available for FIFO (first-in, first-out) processing. If strict message ordering is required, queues with parallelism set to 1 should be used.
We plan to phase out parallelism option of the Queues, once all users have migrated to Flow-Control.
If you are using queue parallelism greater than 1, we recommend switching to the new feature.

Conclusion

We hope this feature improves your experience! If you have any feedback or suggestions, join us on Discord and let us know. We’re here to build what you need.