Service Hints & Tips

Document ID: DJON-3H2347

Servers - Microsoft Enterprise Server clustering information

Applicable to: World-Wide

Definition of Clustering:

- A group of independent network servers that present themselves to a network as a single system
- Manageable as a single system
- Common name for clustered servers
- Services available "cluster wide"
- Can tolerate component failures
- Components can be added transparently to users.

Cluster Basics:

- Every physical machine in a cluster is considered a system; systems are sometimes called nodes.
- A cluster consists of two or more systems.
- Each system runs one instance of the Cluster Service.
- Each system's cluster service is paired with one or more resource monitor(s).
- Resource monitors import a resource DLL for each resource type.

Failover
Failover is the process of transferring control of one or more client resources (applications, disks, print spoolers, and so on) from one node to another.

When the failing node is replaced or returned to working condition, some or all of the same resources can "fail back," which means that they are transferred back to the control of the original computer.

Failovers can be caused by many types of failure, most often at the operating system, hardware, and application levels.

Resources
A resource is any physical or logical entity used to provide a service to the cluster clients, such as an application or small computer system interface (SCSI)-attached disk. It is the basic unit managed by the Resource Manager within the Cluster Service.

Resources and Nodes
Resources can be owned by only one node at a time. Resources can be configured to run on multiple nodes, but only one at a time.
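
The ownership rule above can be illustrated with a minimal sketch. The class and method names (Resource, possible_owners, bring_online, fail_over) are hypothetical and chosen only for illustration; they are not part of the Cluster Server API, and choosing the first remaining node as the failover target is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Illustrative model: a resource is owned by at most one node at a time."""
    name: str
    possible_owners: List[str]      # nodes the resource is configured to run on
    owner: Optional[str] = None     # the single node that currently owns it

    def bring_online(self, node: str) -> None:
        if node not in self.possible_owners:
            raise ValueError(f"{self.name} is not configured to run on {node}")
        self.owner = node           # assigning replaces any previous owner

    def fail_over(self, failed_node: str) -> Optional[str]:
        """Move the resource off failed_node to another configured node, if any."""
        if self.owner != failed_node:
            return self.owner       # this failure does not affect the resource
        candidates = [n for n in self.possible_owners if n != failed_node]
        # Assumption: take the first remaining configured node.
        self.owner = candidates[0] if candidates else None
        return self.owner
```

For example, a disk configured for NODE1 and NODE2 and online on NODE1 would move to NODE2 when fail_over("NODE1") is called, and would have no owner if NODE1 were its only configured node.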

Dependency Relationships
- A dependency is a two-way association between resources.
- A resource may depend on any number of other resources.
- A resource is brought online after all resources it depends on are brought online.
- A resource is taken offline before any resource it depends on is taken offline.
- Resources and all their dependent resources must fail over together.
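
The online and offline ordering rules above amount to a topological sort of the dependency graph. The following is a minimal sketch using Python's standard graphlib module (Python 3.9 or later); the resource names and the depends_on mapping describe a hypothetical file-share group and are not taken from a real cluster configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical group: each resource maps to the resources it depends on.
depends_on = {
    "Physical Disk": [],
    "IP Address": [],
    "Network Name": ["IP Address"],
    "File Share": ["Physical Disk", "Network Name"],
}


def online_order(deps):
    """Order in which resources come online: dependencies first."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Order in which resources go offline: dependents first."""
    return list(reversed(online_order(deps)))
```

In this sketch the File Share always comes online last and goes offline first, because every other resource in the group is one of its direct or indirect dependencies.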

Resource Types
Microsoft Enterprise Server clustering implements the following resource types:
- Generic application
- Generic service
- Internet Information Server (IIS) virtual root
- Network name
- Physical disk
- Print spooler
- File share
- TCP/IP address

Groups
A group is a basic unit of failover managed by the failover manager within the cluster service.

Shared Resources
All of the resources described thus far have been non-shared resources; that is, resources owned by one system at a time. All resources managed by the cluster software MUST be non-shared resources.

Resource Properties and Failover
- Group Membership
  - Member of only one group at a time
  - Group online to only one system at a time
- Online State
  - Available for client or other resource
- Dependencies
  - Dependents brought online last
  - Set when adding resources or at any later time

Quorum Resource
- "Tie-breaker" for non-communicating nodes
- Quorum-capable resource
  - Arbitrates for a resource by supporting the challenge/defense protocol
  - Capable of storing log data
- Configuration change log
  - Tracks changes to the configuration database when nodes are not communicating
  - Prevents configuration partitions in time
- Additionally, the quorum disk contains a log of recorded cluster configuration changes in a file located in a master file table (MFT).
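
The tie-breaker role can be sketched as follows. This is a simplified illustration only: it assumes arbitration has already decided which node holds the quorum resource, and it does not model the challenge/defense protocol itself. The function name and the node names are hypothetical.

```python
def surviving_nodes(partitions, quorum_winner):
    """Return the nodes allowed to continue as the cluster.

    partitions    -- groups of nodes that can still talk to each other
    quorum_winner -- the node that won arbitration for the quorum resource
    """
    for partition in partitions:
        if quorum_winner in partition:
            return set(partition)   # only this side keeps running as the cluster
    return set()                    # no partition holds the quorum resource
```

With two nodes that cannot communicate, surviving_nodes([{"NODE1"}, {"NODE2"}], "NODE2") yields only NODE2, so a single consistent copy of the configuration remains.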

SCSI IDs
Each device on the SCSI bus, including the SCSI interface, must have a unique identification called a SCSI ID. These IDs identify each device for properly routing data and commands along the bus and directing signals to a specific device.

System Requirements
- PCI-based Intel Pentium computer on Windows NT 4.0 HCL
- Windows NT 4.0 Server with Service Pack 3 (SP3)
- Same network
- Same domain
- Administrator account
- Not domain controllers

Note: Windows NT 4.0 with SP3 must be installed entirely on each node's non-shared disk(s). All paging files and system files must be on non-shared disks.

SCSI Bus, Cables and Disks
The SCSI IDs of the two controllers on a shared bus must be different. SCSI controllers are assigned ID 7 by default. One of them must be changed to another value (such as 6) before they are connected to the same bus.
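
As a planning aid, the uniqueness rule can be checked before cabling. The sketch below is hypothetical: the device labels are made up, and the IDs would have to be read from each controller's configuration utility by hand.

```python
from collections import defaultdict


def find_scsi_id_conflicts(devices):
    """Map each SCSI ID used more than once to the labels of the devices sharing it.

    devices -- dict of device label -> SCSI ID on one shared bus
    """
    by_id = defaultdict(list)
    for label, scsi_id in devices.items():
        by_id[scsi_id].append(label)
    return {i: sorted(labels) for i, labels in by_id.items() if len(labels) > 1}


# Both controllers left at the default ID 7: a conflict is reported.
bus = {"Node 1 controller": 7, "Node 2 controller": 7, "Shared disk": 0}
print(find_scsi_id_conflicts(bus))   # {7: ['Node 1 controller', 'Node 2 controller']}
```

Changing one controller to ID 6, as described above, makes the function return an empty dictionary.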

Networking
One static TCP/IP address is required for the cluster. This address must be part of the IP subnet used on one of the shared LANs. Additional static IP addresses are required for clients to use application services on a cluster.
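
Whether a candidate cluster address lies on one of the shared LAN subnets can be checked with Python's standard ipaddress module. This is a minimal sketch; the addresses and subnets in the usage note are placeholders, not recommended values.

```python
import ipaddress


def on_shared_subnet(cluster_ip, shared_subnets):
    """Return True if cluster_ip belongs to at least one of the shared LAN subnets.

    cluster_ip     -- candidate static address, for example "192.168.1.50"
    shared_subnets -- subnets in CIDR notation, for example ["192.168.1.0/24"]
    """
    address = ipaddress.ip_address(cluster_ip)
    return any(address in ipaddress.ip_network(subnet) for subnet in shared_subnets)
```

For instance, on_shared_subnet("192.168.1.50", ["192.168.1.0/24"]) is true, while an address such as "172.16.0.5" would be rejected for the same subnet list.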

Parameters Property Page
Image Name - Name of the application executable.
Command Line - Command line switches.
Current Directory - Default application directory.

Print Spooler
The Print Spooler resource is used to create printers (spool directories) that can fail over in the Windows NT Cluster. See the README files on the product CD for instructions on installing the print spooler.

Disk Signature
The unique identifier present on all hard disks must be used to identify additional physical disks. This number can be obtained using the FTEDIT utility, which is available in the Windows NT 4.0 Resource Kit.

Client/Server Issues
Cluster Clients:
- Must use TCP/IP or NBT as transport protocol
- Must reconnect or retry after failure
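
The reconnect-or-retry requirement can be met with a simple retry loop on the client side. The sketch below is one possible approach, not a prescribed one: the function name, the number of attempts, and the delay are assumptions, and the operation argument stands for whatever call the client application makes to the cluster.

```python
import time


def call_with_retry(operation, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call operation(), retrying on network errors while a failover completes.

    operation -- zero-argument callable that talks to the cluster
    attempts  -- total number of tries before giving up (assumed value)
    sleep     -- injectable so the wait can be skipped in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:          # includes ConnectionError and timeouts
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)    # give the surviving node time to take over
    raise last_error
```

Because the client reconnects to the same cluster name or IP address after a failover, the retried operation does not need to know which node is now serving the resource.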

Cluster Servers:
- Must bind to all local IP interfaces

Transport/Protocol Issues
- NT clusters currently support only TCP/IP and NBT transport protocols
- DHCP: Cluster nodes and IP address resources must use static IP addresses, not DHCP-assigned addresses

Disk Controllers
- All the devices on a SCSI bus MUST have a unique address
- Some SCSI controllers reset the SCSI bus when they initialize

Security
Share Security:
- NTFS permissions must include the System Account with read permissions

Domain Security vs. Local Security:
- File share resource failovers bring the current ACL with them to the other system
- Both nodes must be in the same domain.

Preparing the Hardware
SCSI drives will not power up.
When the physical SCSI devices are not powering up or spinning, the Cluster Service will be unable to initialize any quorum resource. If cables and power are correctly connected, check the following:
1) If the SCSI hard drive is configured to receive a start unit command from the SCSI adapter, the drive will not spin up before receiving this signal from the controller.
2) Enable the "Send start" command to the desired device in the SCSI controller configuration or configure the drive to automatically power up.
3) Try taking one or more of the drives in your server off the SCSI chain and see if the rest of the drives will spin up when you power on the server.

SCSI bus drive not recognized in BIOS scan.
This symptom may manifest itself as one of several errors, depending on the attached SCSI controller. It is normally accompanied by a one- to two-minute delay in booting and an error indicating the failure of some device.

1) Are computers attached to the SCSI bus and running?
This situation is not necessarily a problem. Many times, the second computer to be powered up will not recognize the shared SCSI bus during the BIOS scan, if the first computer is running. This situation may be manifested in a "Device not ready" error being generated by the SCSI controller, or substantial delays during boot.

Disable the option to scan for SCSI devices on the shared SCSI controller. If this option is not present on your computer, the following ordered events may alleviate the problem:
a. Start the computer without Cluster Server installed.
b. When the NT boot loader countdown screen is reached, press the space bar. This will effectively halt the boot process until a selection is made and the Enter key is pressed.
c. Start the second computer and allow it to boot. This should be the computer that has Cluster Server installed, or the first node on which Cluster Server was installed.

2) Is one node of the cluster service running? Is the quorum resource online?
When Cluster Server starts, it may automatically mount the available quorum resource, locking its use. To fix this problem, take the disk resource offline while the other node is booting.

SCSI bus drive not recognized in NT Disk Administrator.
Under normal cluster operations, the node that owns a quorum resource will lock the shared SCSI drive set, blocking any use of the device from the other node. If you find that the owning node of an operational cluster cannot access configuration information through Disk Administrator, check the following:
1) A device does not have physical connectivity and power.
a. Reseat SCSI cards; reseat cables; make sure the drive spins up on boot.
b. Are the SCSI IDs properly configured?
c. Check the SCSI cards and devices to ensure that there are no conflicting SCSI IDs. Each SCSI card and device must have a unique SCSI ID.

2) Is Windows NT being installed with both servers attached to the shared drive(s)? You must have the Cluster Server on one node installed BEFORE you attach both servers to the shared drive(s). Attaching the drive to both nodes before you have the cluster installed could seriously damage the drive and/or the SCSI cards that are attached to it.

SCSI devices do not respond.
Check the following:
1) The SCSI bus is not terminated at both ends, or the SCSI bus is terminated early. Terminate the bus at both ends.
2) The SCSI cable length is excessive according to the SCSI specification. Replace it with a shorter cable.
3) The SCSI cable might be damaged. Check for bent pins and loose connectors on the cable and replace it, if necessary.

Connecting the Second Node to the Cluster
If the second node cannot join the cluster, check the following:
1) Proper cluster or node name?
2) Cluster Name Resource started?
3) First node powered up fully?
4) Network or IP connectivity?

Client Connectivity to the Cluster
If the client cannot attach to a cluster's SMB share, ensure that WINS is properly configured.

If the client cannot access a cluster resource, check the following:
1) IP resource started on cluster? If not, start the IP resource on the cluster.
2) Client or cluster computer configured for WINS? If not, ensure that cluster computers are configured to use WINS services.
3) Client accessing cluster from a different subnet? If so, add a DNS address record for the cluster in the DNS database.

Keywords: clustering, Windows NT

Hint Category: Microsoft Cluster Server
Date Created: 13-05-97
Last Updated: 22-09-98
Revision Date: 21-10-98
Brand: IBM PC Server
Product Family: Clustering
Machine Type: Various
Model: Various
