Service Hints & Tips

Document ID: DDSE-3YSKAT

Netfinity 7000 - Isolating problems in an IBM Cluster running Oracle Parallel Server

Applicable to: World-Wide

IBM has created the following section to help you isolate problems if and when they occur. This solution contains many parts, some of which are not from IBM. To expedite problem resolution, use the following problem determination sequence. It will help you resolve problems and identify whom you should contact for support.

This problem map will assist you with the isolation and resolution of problems with the hardware, the IBM Cluster Enabler (OSD) software, and the Oracle Parallel Server (OPS) software. The error you receive could be application generated or application related. If, while stepping through this guide, you determine that an error is application generated or application related, contact the appropriate vendor for that application.

Step 1. Validating Device Drivers and Microcode
Verify that the server (node) in the cluster is operating with the appropriate version of the controller microcode, adapter microcode, system BIOS, and device driver levels. A list of supported versions can be found at the Web site: http://www.pc.ibm.com/us/support
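
One way to track this check is to record the levels installed on each node and compare them with the supported list. The following is a minimal sketch only: the component names and level strings are hypothetical placeholders, not values taken from the IBM support site.

```python
def find_level_mismatches(installed, supported):
    """Return {component: (installed_level, supported_levels)} for every
    component whose installed level is missing or not in the supported list."""
    mismatches = {}
    for component, allowed in supported.items():
        level = installed.get(component)  # None if the level was not recorded
        if level not in allowed:
            mismatches[component] = (level, sorted(allowed))
    return mismatches


# Hypothetical placeholder values, for illustration only.
supported = {
    "system_bios": {"level-A", "level-B"},
    "adapter_microcode": {"level-C"},
    "device_driver": {"level-D"},
}
installed = {"system_bios": "level-A", "adapter_microcode": "level-X"}

report = find_level_mismatches(installed, supported)
```

An empty report means every recorded level is on the supported list. A component with no recorded level is reported as a mismatch so that it is not overlooked.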

Step 2. Isolating the Disk Subsystem
The following list of problem attributes will aid you in isolating the disk subsystem.
Cannot see the RAID adapter.
Cannot see shared storage disk drives from the Windows NT disk administrator.
Errors from SYMplicity Manager.
Amber lights on hard disk drives.
SYMArray messages in the system log of the Windows NT event log.
LP6NDS35 messages in the system log of the Windows NT event log.

If any of the above conditions or messages occur, do the following:
a. Use the recovery steps in the SYMplicity Storage Manager to correct the condition or message.
b. Refer to Symbios documentation for further problem analysis.
c. Contact Symbios support.
d. Continue with step 6.

If none of the conditions or messages occur, continue with step 3.
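
If the system log has been exported for review, the two message sources named in step 2 can be picked out with a simple filter. This is a sketch under stated assumptions: the record layout (a dict with a "source" key) and the case-insensitive match are choices made for the example, and it covers only the event log symptoms, not the adapter, disk administrator, SYMplicity Manager, or amber light symptoms.

```python
# Event sources named in step 2 of this document.
DISK_SUBSYSTEM_SOURCES = {"SYMArray", "LP6NDS35"}


def disk_subsystem_events(events):
    """Return the events whose source is one of the disk subsystem sources.

    `events` is an iterable of dicts with a "source" key; this record
    layout is an assumption made for the sketch.
    """
    wanted = {name.lower() for name in DISK_SUBSYSTEM_SOURCES}
    return [e for e in events if e.get("source", "").lower() in wanted]


# Hypothetical exported records, for illustration only.
sample = [
    {"source": "SYMArray", "message": "example text"},
    {"source": "Serial", "message": "example text"},
    {"source": "lp6nds35", "message": "example text"},
]
hits = disk_subsystem_events(sample)
```

If the result is not empty, follow actions a through d above; otherwise check the remaining symptoms before moving on to step 3.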

Step 3. Isolating the IBM Netfinity Server Hardware
IBM Netfinity Server hardware problems appear as one of the following:
POST error messages.
Error messages in the Windows NT event log.
Blue screen (trap) errors before you are prompted to log on.

If any of the above conditions or messages occur, do the following:
a. Consult the hardware documentation for the component that appears to be failing.
b. Check the IBM online technical tips at: http://www.pc.ibm.com/us/support
Note: Search for text relating to the component that appears to be failing.
c. Contact IBM support.
d. Continue with step 6.

If none of the conditions or messages occur, continue with step 4.

Step 4. Isolating Start Up Software Errors
To complete the software isolation steps for problem determination, you must determine the prior state of the server (node) producing the error. The prior state is determined by the node's participation in the cluster since either the OSD or the OPS software was installed. The list below specifies the states and the conditions that determine the node's status.

Start Up:
The server (node) has been installed but has never participated in a cluster and operated correctly (continue with the checks below in step 4).

Up and Running:
The server (node) has participated in a cluster and operated correctly. Neither the IBM Cluster Enabler (OSD) nor the Oracle Parallel Server (OPS) software has been installed (continue with step 5).

Check the following items in the order shown:
a. Validate that the critical Windows NT services are running to ensure the OPS solution is functioning.

b. If the Windows NT Services window is already open, close it and open it again.

c. Validate that the following four services are present and in a running state:
IBMCoreClusterService
OracleTNSListener80
OraclePGMSService
OracleService<SID>
(where the value for the SID is one number higher than the node number)
If any of the above services is not in a running state, restart that service and all subsequent services in the list above.

Note: Each service may take from 2 to 3 minutes to start.

d. Validate that all servers (nodes) are using the same port number referenced in "Step 11: Set Port Numbers in Windows NT Services File" on page 49.

e. Verify that the following three files are located in: %your_installation_dir%\IBMOSD\Config
CSMETA.CFG
CSCLUSTER.CFG
CSCOMPUTER.CFG
If any of these files is not present, reinstall the OSD software on that specific server (node).

f. Verify that the environment variable CS_Install_Dir is present and that its path points to the OSD directory. If this environment variable is not present, or does not show the correct path, do not attempt to change the environment variable; reinstall the IBM Cluster Enabler software.

g. Validate that the server (node) names, IDs, and IP addresses are filled in and correct in the CSCLUSTER.CFG file.

h. Validate that the first line of file CSCOMPUTER.CFG identifies the ID number of the server (node) this file came from.
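
The restart rule in item c can be sketched as follows. This is an illustration, not an IBM or Oracle tool: it assumes the running state of each service has already been noted from the Windows NT Services window, and it reads "subsequent services" as the services listed after the one that is not running. The <SID> placeholder is left as written in this document.

```python
SERVICE_ORDER = [
    "IBMCoreClusterService",
    "OracleTNSListener80",
    "OraclePGMSService",
    "OracleService<SID>",  # replace <SID> with the value for this node
]


def services_to_restart(states, order=SERVICE_ORDER):
    """Return the services to restart, in list order.

    `states` maps a service name to True if it is running. The first
    service that is not running is returned together with every service
    listed after it. A service missing from `states` counts as not running.
    """
    for index, name in enumerate(order):
        if not states.get(name, False):
            return order[index:]
    return []


# Hypothetical observation: the listener is stopped, the rest are running.
states = {
    "IBMCoreClusterService": True,
    "OracleTNSListener80": False,
    "OraclePGMSService": True,
    "OracleService<SID>": True,
}
plan = services_to_restart(states)
```

Because each service may take 2 to 3 minutes to start, allow for that delay between restarts when working through the returned list.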

If none of the above checks or suggested actions corrects the problem, do the following:
1. Reinstall the IBM Cluster Enabler software.
2. Continue with step 6.
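
Items e and f of step 4 can also be checked with a short script. The sketch below assumes that file names should be compared without regard to case and that an existing directory is a sufficient first test for CS_Install_Dir; it does not confirm that the directory is the OSD directory, and it reads the variable without changing it.

```python
import os

REQUIRED_FILES = ("CSMETA.CFG", "CSCLUSTER.CFG", "CSCOMPUTER.CFG")


def missing_config_files(config_dir):
    """Return the required file names not found in config_dir.

    Names are compared case-insensitively. If the directory is missing
    or unreadable, all three files are reported as missing.
    """
    try:
        present = {name.upper() for name in os.listdir(config_dir)}
    except OSError:
        present = set()
    return [name for name in REQUIRED_FILES if name not in present]


def install_dir_is_set(environ=os.environ, name="CS_Install_Dir"):
    """Return True if the variable is present and names an existing directory."""
    value = environ.get(name)
    return bool(value) and os.path.isdir(value)
```

Pass the IBMOSD\Config directory under your installation directory to missing_config_files. A non-empty result, or a False result from install_dir_is_set, leads to the same action as in the text: reinstall the software rather than editing files or variables by hand.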

Step 5. Isolating Up and Running Software Errors
Check the following items in the order shown:
a. Validate that the critical Windows NT services are running to ensure that the OPS solution is functioning.

b. If the Windows NT Services window is already open, close it and open it again.

c. Validate that the following four services are present and in a running state:
IBMCoreClusterService
OracleTNSListener80
OraclePGMSService
OracleService<SID>
(where the value for the SID is one number higher than the node number)
If any of the above services is not in a running state, restart that service and all subsequent services in the list above.

Note: Each service may take from 2 to 3 minutes to start.

d. Validate that all servers (nodes) are using the same port number referenced in "Step 11: Set Port Numbers in Windows NT Services File" on page 49.

e. Check the Windows NT event log for hardware errors.

f. Correct any hardware errors as necessary.

g. Restart the server (node).

If none of the above steps corrects the problem, do the following:
1. Contact Oracle support for ORA-xxxxx errors or for Oracle database errors reported at the Oracle command prompt.
2. Continue with step 6.
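
The port check in item d of steps 4 and 5 can be sketched by reading the entry from a copy of each node's Windows NT services file and comparing the results. The service name example_svc and the port values below are hypothetical; use the entry that was set in "Step 11: Set Port Numbers in Windows NT Services File". The parser assumes the usual services file layout of name, port/protocol, optional aliases, and an optional # comment.

```python
def find_port(services_text, service_name):
    """Return the port for service_name from services-file text, or None."""
    for line in services_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, split fields
        if len(entry) >= 2 and entry[0].lower() == service_name.lower():
            port, _, _protocol = entry[1].partition("/")
            if port.isdigit():
                return int(port)
    return None


def group_nodes_by_port(ports_by_node):
    """Group node names by port; more than one group means a mismatch."""
    groups = {}
    for node, port in ports_by_node.items():
        groups.setdefault(port, []).append(node)
    return groups


# Hypothetical file contents for two nodes, for illustration only.
node1 = "example_svc   9999/tcp   # hypothetical entry\n"
node2 = "example_svc   9998/tcp\n"
ports = {
    "node1": find_port(node1, "example_svc"),
    "node2": find_port(node2, "example_svc"),
}
groups = group_nodes_by_port(ports)
consistent = len(groups) == 1
```

A node whose file has no matching entry is grouped under None, so a missing entry shows up as a mismatch rather than being skipped.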

Step 6. Contacting Support
To assist with problem resolution, have the following information ready before contacting support, whether electronically or by voice.

Contacting IBM Support:
a. Machine Type Model and Serial Number of the failing server.
b. Operating system, service pack, and IBM Cluster Enabler (OSD) software level.
c. BIOS, microcode, and device driver levels for the system unit and the Symbios disk subsystem.
d. Configuration of Symbios disk subsystem.
e. Number of servers (nodes) in the cluster showing similar symptoms.

Contacting Symbios Support:
a. Server type and configuration.
b. Operating System and service pack level.
c. Name and version of SYMplicity Storage Manager software.
d. Model number of array and serial number on the Symbios CM1000 command module.

Contacting Oracle Support:
a. Refer to the Oracle web site at: http://www.oracle.com/support
b. Customer number.
c. Operating System version number.
d. Oracle database version number.
e. ORA-xxxxx error information, if any.
f. Error or log file information resulting from the problem.

Note: The Oracle Guide to Customer Support publication included with your Oracle software will provide the most current information on customer support.

Hint Category: Oracle Parallel Server
Date Created: 29-09-98
Last Updated: 30-09-98
Revision Date: 29-09-99
Brand: IBM PC Server
Product Family: Clustering, Netfinity 7000
Machine Type: Various, 8651
Model: TypeModel