High-availability Namespaces

Temporal Cloud's replicated Namespaces provide disaster-tolerant deployment for workloads where availability is critical to your operations. When you enable high availability, Temporal Cloud automatically keeps a primary and a fallback Namespace in sync. Should an incident occur, Temporal will fail over your Namespace. This allows your Workflow Executions and Schedules to shift from the active availability zone to the synchronized replica in the fallback availability zone.

Advantages of using Temporal Cloud’s High Availability features:

  • No manual deployment or configuration needed, just simple push-button operations.
  • Existing Workflows resume seamlessly in the replica with minimal interruption and data loss.
  • No changes needed for Worker and Workflow code during setup or failover.
  • 99.99% contractual SLA.

High availability options

Temporal currently offers the following high availability features, which you configure at a Namespace level:

  • Replication: Workflows are replicated to a different isolation domain within the same region as the Namespace, such as "us-east-1". Choose this option for applications architected for a single region. You will fail over within the same region to a separate isolation domain.
  • Multi-region replication: Workflows are replicated to a different region that you choose. Choose this option when your business requires multi-regional availability and the higher level of resilience that separated locations offer. You will fail over from one region to a separate region.
note

Please note that replication charges apply when enabling high availability features. For pricing details, visit Temporal Cloud's Pricing page.

Replication and replicas

High Availability features in Temporal Cloud simplify deployment, ensuring operational continuity and data integrity even during unexpected events impacting Namespace operations. They rely on a process called replication, which asynchronously copies Workflow Executions from an active Namespace to its replica. The replica is physically located in another isolation domain, either within the same region or in another region on the same continent. If an incident affects the active Namespace, your replica is ready to take over. Temporal Cloud transitions control from the active Namespace to the replica via a "failover".

Isolation domains and replicas

An isolation domain is a physically isolated data center within a deployment region for a given cloud provider. Regions consist of multiple isolation domains, providing redundancy and fault tolerance. The fallback domain may be in the same region as the primary or in a different region altogether, depending on your deployment configuration.

High availability simplifies deployment, ensuring operational continuity and data integrity even during unexpected events. Incidents that affect the data centers within a specific isolation domain may occur. High availability allows processing to shift from the affected domain to an already-synchronized fallback domain.

This synchronized domain is called a "replica." The process of duplicating all Workflow data ensures that your replica, which serves as the standby Namespace, is always available and ready to take on the active role. When necessary, Temporal Cloud smoothly transitions control from the active to the standby using a process called "failover".

High availability and business continuity

For many organizations, ensuring high availability is critical to maintaining business continuity. Temporal Cloud's high availability Namespace feature includes a 99.99% contractual Service Level Agreement (SLA). The SLA covers 99.99% availability and a 99.99% guarantee against service errors.

A high availability Namespace creates a single logical Namespace that operates across two physical isolation domains: one active and one standby. Replicated Namespaces expose both domains through a unified Namespace endpoint. As Workflows progress in the active domain, history events are asynchronously replicated to the standby domain, ensuring continuity and data integrity.

In the event of an incident or outage in the active isolation domain, Temporal Cloud will fail over to your standby replica. Failovers allow existing Workflow Executions to continue running and new Workflow Executions to be started. Once failover occurs, the roles of the active and standby domains switch: the standby domain becomes active, and the previously active domain becomes the standby. After the issue is resolved, the Namespace "fails back" from the replica to the original domain.
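
The role swap described above can be pictured with a small toy model. This is only a conceptual sketch, not Temporal Cloud's implementation or API: the `ReplicatedNamespace` class, its method names, and the domain names are hypothetical, and the model assumes a clean handover in which replication catches up before the roles switch. It does not model divergent histories or conflict resolution.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedNamespace:
    """Toy model: one logical Namespace backed by two isolation domains."""

    active: str
    standby: str
    # Per-domain list of history events (hypothetical in-memory stand-in).
    histories: dict = field(init=False)

    def __post_init__(self) -> None:
        self.histories = {self.active: [], self.standby: []}

    def append_event(self, event: str) -> None:
        # Writes always land in the currently active domain.
        self.histories[self.active].append(event)

    def replicate(self) -> int:
        # Asynchronous replication: copy events the standby has not seen yet.
        source = self.histories[self.active]
        target = self.histories[self.standby]
        pending = source[len(target):]
        target.extend(pending)
        return len(pending)

    def failover(self) -> None:
        # Assumption: a clean handover, so the standby catches up first.
        self.replicate()
        # The roles switch: the standby becomes active and vice versa.
        self.active, self.standby = self.standby, self.active


ns = ReplicatedNamespace(active="domain-a", standby="domain-b")
ns.append_event("WorkflowExecutionStarted")
ns.append_event("ActivityTaskScheduled")
ns.failover()
ns.append_event("ActivityTaskCompleted")
print(ns.active, ns.histories[ns.active])
```

After the failover, new events are written to `domain-b`, and a later call to `ns.failover()` would model the "fail back" to `domain-a`.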

Should you choose high availability?

Should you be using high availability Namespaces? It depends on your availability requirements:

  • High availability Namespaces offer a 99.99% contractual SLA for workloads with strict high availability needs. They use two Namespaces in two isolation domains to support standby recovery. In the event of an incident, Temporal Cloud automatically fails over the Namespace to the standby replica.
  • Namespaces without high availability include a 99.9% contractual Service Level Agreement (SLA). In this configuration, Temporal clients connect to a single Namespace in one deployment domain. For many applications, this offers sufficient availability.

Temporal Cloud provides 99.99% service availability for all Namespaces, both single-region and high availability.
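
To put the 99.9% and 99.99% figures in perspective, the following sketch converts an availability percentage into the downtime it permits. The 30-day window and the `allowed_downtime_minutes` helper are assumptions made for illustration. The actual measurement period and the definition of downtime are set by the SLA itself.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime that still meet the given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for sla in (99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% -> {minutes:.2f} minutes per 30 days")
```

Over an assumed 30-day window, 99.9% allows roughly 43.2 minutes of downtime, while 99.99% allows roughly 4.32 minutes.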

SLA guarantees

High availability Namespaces offer 99.99% availability, enforced by Temporal Cloud's service error rates SLA. Our system is designed to limit data loss after recovery when the incident triggering the failover is resolved.

Our recovery point objective (RPO) is near-zero. There may be a short period of time during an incident or forced failover when some data is unavailable in the replica. Some Workflow History data won't arrive until network issues are fixed, enabling the History to finish replicating and the divergent History branches to reconcile.

Temporal Cloud proactively responds to incidents by triggering failovers. Our recovery time objective (RTO) is 20 minutes or less per incident.

info

During a disaster scenario in which the data on the hard drives in the active Namespace cannot be recovered, the window of lost data may be as large as the replication lag at the time of the disaster.
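
The relationship between replication lag and potential data loss can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: events are represented only by their write times in seconds, replication lag is taken to be the time since the last replicated event, and the `replication_gap` helper is hypothetical rather than part of any Temporal tooling.

```python
def replication_gap(event_times, last_replicated_time, disaster_time):
    """Return (lag_seconds, events_at_risk) at the moment of a disaster.

    Assumes the replica holds every event written at or before
    last_replicated_time and nothing written after it.
    """
    written = [t for t in event_times if t <= disaster_time]
    at_risk = [t for t in written if t > last_replicated_time]
    lag = disaster_time - last_replicated_time
    return lag, at_risk


# Events written at these times (seconds); the replica has caught up to t=20.
lag, at_risk = replication_gap([0, 10, 20, 25], last_replicated_time=20, disaster_time=26)
print(f"lag={lag}s, events at risk={at_risk}")
```

In this example the replica is 6 seconds behind, and only the event written at t=25 would be lost if the active domain's data were unrecoverable.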

Regional availability

Multi-region Namespaces are one of the high availability options you can choose. They are available in all existing Temporal Cloud regions.

tip

Namespace pairing is currently limited to regions within the same continent. South America is excluded as only one region is available.
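
The same-continent pairing rule can be expressed as a simple check. The sketch below is illustrative only: the `REGION_CONTINENT` map and the `can_pair` helper are hypothetical, and the regions listed are sample cloud region names, not an authoritative list of regions that Temporal Cloud supports.

```python
# Hypothetical sample map; NOT the list of regions Temporal Cloud supports.
REGION_CONTINENT = {
    "us-east-1": "north-america",
    "us-west-2": "north-america",
    "eu-west-1": "europe",
    "eu-central-1": "europe",
    "sa-east-1": "south-america",
}


def can_pair(primary: str, replica: str) -> bool:
    """Check whether two distinct regions sit on the same continent."""
    if primary == replica:
        # Multi-region replication needs two different regions.
        return False
    primary_continent = REGION_CONTINENT.get(primary)
    replica_continent = REGION_CONTINENT.get(replica)
    # Unknown regions never pair; known ones must share a continent.
    return primary_continent is not None and primary_continent == replica_continent


print(can_pair("us-east-1", "us-west-2"))
print(can_pair("us-east-1", "eu-west-1"))
```

With only one South American region in the sample map, `sa-east-1` has no valid partner, which mirrors the exclusion described above.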