Cloud fault tolerance is the ability of a system to continue operating when part of it fails. A fault-tolerant design assumes failures will happen and plans around them.
How fault tolerance is built
Common methods include redundancy, automatic failover, retries, data replication, deployment across multiple availability zones, health checks, queues, and graceful degradation.
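Several of these methods can be combined in a few lines. The sketch below is a minimal illustration in Python, not a production pattern: it retries each redundant replica with exponential backoff, fails over to the next replica when one keeps failing, and degrades gracefully to a fallback value if all of them fail. The function name call_with_failover and its parameters are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    replicas: Sequence[Callable[[], T]],
    fallback: T,
    attempts_per_replica: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each replica in order with retries; return fallback if all fail.

    All names and defaults here are illustrative assumptions.
    """
    for replica in replicas:  # failover: move on to the next redundant replica
        for attempt in range(attempts_per_replica):
            try:
                return replica()
            except Exception:
                # A real system would catch only errors it knows are transient.
                # Back off before retrying, but not after the last attempt.
                if attempt < attempts_per_replica - 1:
                    sleep(base_delay * 2 ** attempt)
    # Graceful degradation: every replica failed, so serve a reduced result.
    return fallback
```

A caller might pass functions that query a primary and a secondary zone, with a cached response as the fallback. Health checks and queues are not shown; they would typically decide which replicas are in the list and buffer work while a replica recovers.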
Fault tolerance and high availability
High availability is the goal of staying reachable. Fault tolerance is one design approach that helps achieve that goal.
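One way to see how a fault-tolerant design supports the availability goal is a rough calculation. The sketch below assumes replicas fail independently and that the service stays reachable as long as at least one replica is up; both are simplifying assumptions that real deployments often violate, since failures in the same zone tend to be correlated. The function name combined_availability is hypothetical.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that needs at least one of N replicas up.

    Assumes independent failures, which is an idealization.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - replica_availability) ** replicas
```

Under these assumptions, two replicas that are each available 99% of the time give a combined availability of roughly 99.99%.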
Fault tolerance is a mindset: plan for failure before it becomes an incident.