Online HSMs
To deliver high availability and resilience, Secubit operates its Hardware Security Modules (HSMs) in multiple datacenters across distinct geographic locations. This setup ensures continuity of service while meeting strict uptime guarantees.
Secubit’s design targets an SLA of 99% uptime. To achieve this, two datacenters are provisioned, each equipped with two redundant HSMs. Across these sites, one HSM is designated as the primary, while the other three act as standby fail-over devices. If the primary HSM becomes unavailable, Secubit automatically redirects operations to a standby unit, preserving service availability without manual intervention.
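The fail-over behaviour described above can be illustrated with a minimal sketch. It assumes a fixed priority order (the primary first, then the standbys) and a simple boolean health flag. The `Hsm` class, the `select_active` function, and the site labels are hypothetical names used for illustration, not part of Secubit's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hsm:
    name: str            # hypothetical identifier, e.g. "HSM 1"
    datacenter: str      # hypothetical site label
    healthy: bool = True  # assumed to be set by some external health check


def select_active(hsms: list[Hsm]) -> Optional[Hsm]:
    """Return the first healthy HSM in priority order, or None if all are down."""
    for hsm in hsms:
        if hsm.healthy:
            return hsm
    return None


# Priority order: the primary first, then the three standbys.
cluster = [
    Hsm("HSM 1", "DC1"),
    Hsm("HSM 2", "DC1"),
    Hsm("HSM 3", "DC2"),
    Hsm("HSM 4", "DC2"),
]

active_before = select_active(cluster).name  # the primary serves requests
cluster[0].healthy = False                   # the primary becomes unavailable
active_after = select_active(cluster).name   # a standby takes over automatically
```

In this sketch, losing the primary moves operations to the next healthy unit, and losing both units in one datacenter moves them to the other site. How health is actually detected and how requests are rerouted is not specified in this section.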
Synchronization between HSMs is handled continuously and securely. The system propagates Merkle root updates across datacenters so that all HSMs share a consistent view of wallets' state. If a fail-over occurs, the standby HSM can take over immediately without data loss or state divergence. This synchronization protocol is described in detail in HSMs Synchronization.
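As a rough illustration of how a Merkle root update might be propagated, the sketch below has each replica store a version counter alongside the root and accept only strictly newer updates. The `Replica` class, the version counter, and the `propagate` and `in_sync` helpers are assumptions made for this example. The real protocol, including how updates are authenticated, is the one described in HSMs Synchronization.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    version: int = 0                            # assumed monotonic update counter
    root: bytes = hashlib.sha256(b"").digest()  # placeholder initial root

    def apply(self, version: int, root: bytes) -> bool:
        """Accept an update only if it is newer than the one already stored."""
        if version <= self.version:
            return False  # stale or replayed update is ignored
        self.version, self.root = version, root
        return True


def propagate(replicas: list[Replica], version: int, root: bytes) -> None:
    """Send the same (version, root) pair to every replica."""
    for replica in replicas:
        replica.apply(version, root)


def in_sync(replicas: list[Replica]) -> bool:
    """True when every replica holds the same version and root."""
    return len({(r.version, r.root) for r in replicas}) == 1


replicas = [Replica(f"HSM {i}") for i in range(1, 5)]
new_root = hashlib.sha256(b"wallet state v1").digest()
propagate(replicas, version=1, root=new_root)
synced = in_sync(replicas)

# A replayed older update does not change the replica's state.
stale_accepted = replicas[0].apply(0, hashlib.sha256(b"old").digest())
```

When every replica reports the same version and root, any of them can take over without state divergence, which is the property the paragraph above relies on.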
flowchart LR
    subgraph DC1["Datacenter 1"]
        HSM1["🔒 HSM 1 <br/> (primary)"]
        HSM2["🔒 HSM 2 <br/> (standby)"]
    end
    DC1 --- S
    subgraph DC2["Datacenter 2"]
        HSM3["🔒 HSM 3 <br/> (standby)"]
        HSM4["🔒 HSM 4 <br/> (standby)"]
    end
    DC2 --- S
    S("☁️ <br/> Secubit <br/> Cloud Service")
All cryptographic keys used by Secubit are generated in a formal key ceremony, ensuring that master secrets are created, verified, and cloned to each HSM under strict security controls. By replicating keys during the ceremony and maintaining state consistency through automatic synchronization, Secubit guarantees that any HSM in the cluster can serve as a trusted anchor of security.
The datacenters hosting Secubit’s HSMs are SOC 2 certified and operated under strict physical protections, including 24/7 surveillance, controlled access, and monitoring, providing assurance that hardware remains physically secure at all times.
Secubit’s architecture is designed to scale. It requires minimal HSM storage (only a few keys and a single Merkle root), and HSM performance for key operations and policy checking far exceeds the system’s requirements, so a single HSM can already support millions of wallets. Additional HSMs are deployed primarily for redundancy and fail-over. The design also makes it straightforward to add more HSMs and datacenters in the future if needed, without disrupting existing operations.
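To show why a single Merkle root can stand in for the state of many wallets, here is a minimal sketch of a generic binary Merkle tree over SHA-256. The verifier stores only the 32-byte root and checks a short inclusion proof for each wallet record. The tree construction (duplicating the last node on odd levels, no domain separation) and the names `merkle_root`, `merkle_proof`, and `verify` are assumptions for illustration and are not necessarily how Secubit builds its tree.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs of an even-length level into the level above."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_left) pairs from leaf to root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof; only the root is stored."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


wallets = [f"wallet-{i}".encode() for i in range(5)]  # hypothetical wallet records
root = merkle_root(wallets)
proof = merkle_proof(wallets, 4)
ok = verify(wallets[4], proof, root)
```

The proof length grows with the logarithm of the number of leaves, so in a construction like this a million wallets would need a proof of about 20 hashes while the stored root stays at 32 bytes.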
This distributed online HSM architecture provides the foundation for Secubit’s institutional custody platform: a system that is fault-tolerant, resilient against datacenter failures, and continuously available without compromising cryptographic integrity.