Paper: Designing Disaster Tolerant High Availability Clusters
A very detailed (339-page) paper on using HP products to build a highly available cluster. It is somewhat dated and concentrates on HP products, but it is still a good source of information.
Table of contents:
1. Disaster Tolerance and Recovery in a Serviceguard Cluster
2. Building an Extended Distance Cluster Using ServiceGuard
3. Designing a Metropolitan Cluster
4. Designing a Continental Cluster
5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
7. Cascading Failover in a Continental Cluster
1. Disaster Tolerance and Recovery in a Serviceguard Cluster
Evaluating the Need for Disaster Tolerance
What is a Disaster Tolerant Architecture?
Types of Disaster Tolerant Clusters
Extended Distance Clusters
Metropolitan Cluster
Continental Cluster
Continental Cluster With Cascading Failover
Disaster Tolerant Architecture Guidelines
Protecting Nodes through Geographic Dispersion
Protecting Data through Replication
Using Alternative Power Sources
Creating Highly Available Networking
Disaster Tolerant Cluster Limitations
Managing a Disaster Tolerant Environment
Using this Guide with Your Disaster Tolerant Cluster Products
2. Building an Extended Distance Cluster Using ServiceGuard
Types of Data Link for Storage and Networking
Two Data Center Architecture
Two Data Center FibreChannel Implementations
Advantages and Disadvantages of a Two-Data-Center Architecture
Three Data Center Architectures
Rules for Separate Network and Data Links
Guidelines on DWDM Links for Network and Data
3. Designing a Metropolitan Cluster
Designing a Disaster Tolerant Architecture for use with Metrocluster Products
Single Data Center
Two Data Centers and Third Location with Arbitrator(s)
Additional EMC SRDF Configurations
Setting up Hardware for 1 by 1 Configurations
Setting up Hardware for M by N Configurations
Worksheets
Disaster Tolerant Checklist
Cluster Configuration Worksheet
Package Configuration Worksheet
Next Steps
4. Designing a Continental Cluster
Understanding Continental Cluster Concepts
Mutual Recovery Configuration
Application Recovery in a Continental Cluster
Monitoring over a Wide Area Network
Cluster Events
Interpreting the Significance of Cluster Events
How Notifications Work
Alerts
Alarms
Creating Notifications for Failure Events
Creating Notifications for Events that Indicate a Return of Service
Performing Cluster Recovery
Notes on Packages in a Continental Cluster
How Serviceguard commands work in a Continentalcluster
Designing a Disaster Tolerant Architecture for use with Continentalclusters
Mutual Recovery
Serviceguard Clusters
Data Replication
Highly Available Wide Area Networking
Data Center Processes
Continentalclusters Worksheets
Preparing the Clusters
Setting up and Testing Data Replication
Configuring a Cluster without Recovery Packages
Configuring a Cluster with Recovery Packages
Building the Continentalclusters Configuration
Preparing Security Files
Creating the Monitor Package
Editing the Continentalclusters Configuration File
Checking and Applying the Continentalclusters Configuration
Starting the Continentalclusters Monitor Package
Validating the Configuration
Documenting the Recovery Procedure
Reviewing the Recovery Procedure
Testing the Continental Cluster
Testing Individual Packages
Testing Continentalclusters Operations
Switching to the Recovery Packages in Case of Disaster
Receiving Notification
Verifying that Recovery is Needed
Using the Recovery Command to Switch All Packages
How the cmrecovercl Command Works
Forcing a Package to Start
Restoring Disaster Tolerance
Restore Clusters to their Original Roles
Primary Packages Remain on the Surviving Cluster
Primary Packages Remain on the Surviving Cluster using cmswitchconcl
Newly Created Cluster Will Run Primary Packages
Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups
Maintaining a Continental Cluster
Adding a Node to a Cluster or Removing a Node from a Cluster
Adding a Package to the Continental Cluster
Removing a Package from the Continental Cluster
Changing Monitoring Definitions
Checking the Status of Clusters, Nodes, and Packages
Reviewing Messages and Log Files
Deleting a Continental Cluster Configuration
Renaming a Continental Cluster
Checking Java File Versions
Next Steps
Support for Oracle RAC Instances in a Continentalclusters Environment
Configuring the Environment for Continentalclusters to Support Oracle RAC
Initial Startup of Oracle RAC Instance in a Continentalclusters Environment
Failover of Oracle RAC Instances to the Recovery Site
Failback of Oracle RAC Instances After a Failover
5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
Files for Integrating XP Disk Arrays with Serviceguard Clusters
Overview of Continuous Access XP Concepts
PVOLs and SVOLs
Device Groups and Fence Levels
Creating the Cluster
Preparing the Cluster for Data Replication
Creating the RAID Manager Configuration
Defining Storage Units
Configuring Packages for Disaster Recovery
Completing and Running a Metrocluster Solution with Continuous Access XP
Maintaining a Cluster that uses Metrocluster/CA
XP/CA Device Group Monitor
Completing and Running a Continental Cluster Solution with Continuous Access XP
Setting up a Primary Package on the Primary Cluster
Setting up a Recovery Package on the Recovery Cluster
Setting up the Continental Cluster Configuration
Switching to the Recovery Cluster in Case of Disaster
Failback Scenarios
Maintaining the Continuous Access XP Data Replication Environment
6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
Files for Integrating ServiceGuard with EMC SRDF
Overview of EMC and SRDF Concepts
Preparing the Cluster for Data Replication
Installing the Necessary Software
Building the Symmetrix CLI Database
Determining Symmetrix Device Names on Each Node
Building a Metrocluster Solution with EMC SRDF
Setting up 1 by 1 Configurations
Grouping the Symmetrix Devices at Each Data Center
Setting up M by N Configurations
Configuring Serviceguard Packages for Automatic Disaster Recovery
Maintaining a Cluster that Uses Metrocluster/SRDF
Managing Business Continuity Volumes
R1/R2 Swapping
Building a Continental Cluster Solution with EMC SRDF
Setting up a Primary Package on the Primary Cluster
Setting up a Recovery Package on the Recovery Cluster
Setting up the Continental Cluster Configuration
Switching to the Recovery Cluster in Case of Disaster
Failback Scenarios
Maintaining the EMC SRDF Data Replication Environment
R1/R2 Swapping
7. Cascading Failover in a Continental Cluster
Overview
Symmetrix Configuration
Using Template Files
Data Storage Setup
Setting Up Symmetrix Device Groups
Setting up Volume Groups
Testing the Volume Groups
Primary Cluster Package Setup
Recovery Cluster Package Setup
Continental Cluster Configuration
Data Replication Procedures
Data Initialization Procedures
Data Refresh Procedures in the Steady State
Data Replication in Failover and Failback Scenarios