Service Hints & Tips

Document ID: RMIE-38BGM6

This document is provided to IBM and our Business Partners to help sell and/or service IBM products. It is not to be distributed beyond that audience or used for any other purpose.

Servers - Netware server NMI problem determination procedure.

Applicable to: World-Wide

Service Information:
NMI (Non-Maskable Interrupt) errors that occur while running Novell Netware are difficult to diagnose because they may be caused by four different error conditions. However, the information below is very useful in identifying the actual error condition that caused the NMI and in resolving the problem more efficiently.
The following facts should be understood:
1. The four causes of NMI's are:
a. Main memory (or possibly L2 cache) parity error.
b. Channel check (Micro Channel systems).
c. Watchdog timeout.
d. DMA (Direct Memory Access) time-out error.
2. Various operating systems refer to NMI's using different terms. OS/2 calls them "TRAP 0002"; SCO UNIX refers to them as "Panic errors"; Novell Netware references them as NMI's.
3. NMI's ARE NOT ALWAYS caused by system memory! Servicers SHOULD NOT automatically replace memory while performing problem determination of NMI errors.
Current technology produces a more reliable memory than that available only a few years ago. This is particularly true in the case of systems using ECC memory.
Note: Erroneous information has been distributed in certain operating system appendices which gives 3 causes of NMI errors, one being the power supply, the other two being channel and memory parity. That information is not consistent with the facts on IBM products and should be disregarded.
4. The operating systems currently available are limited in their ability to differentiate the four causes of an NMI.
An NMI is a catastrophic error detected by the system, which then raises the NMI line to the processor. The processor executes the NMI handler and tries to save the information as to what caused the NMI. Usually, the software interrogates the registers to try to find the cause of the NMI. Sometimes the system cannot do this because the error is too severe (e.g., an adapter "hanging" on the bus). Not all operating systems even try to do this.
Because of this limitation, some operating system error handling routines DEFAULT to pointing out memory as the most likely cause of any NMI. Hence, inappropriate service practices have developed around the fallacy that, "All NMI errors are caused by memory."

With this information in mind, the following procedure is the most efficient method of identifying the actual cause of the NMI when Novell Netware is the operating system.
1. Check the hardware System Error Log (available in the system partition at the main menu under "More Utilities"). It is not available on some models of 8590/8595, and the logging feature is not available on the 8580 or 8640. If the System Error Log is empty, the NMI error was probably NOT CAUSED by a memory parity error.
Furthermore, if ECC memory is installed in the system, all single bit errors are corrected (they will not cause NMI's, but are logged in the System Error Log). Double bit errors are detected and logged in the hardware System Error Log and will cause an NMI, just as a parity error does. So, IF THE SYSTEM ERROR LOG IS EMPTY, DO NOT REPLACE MEMORY; continue with the NMI problem resolution procedure below.
2. Run diagnostics on the system. If they run error-free do not replace any hardware at this time.
3. It is possible to identify the NMI error source by using the following process and collecting information from NVRAM (Non-Volatile Random Access Memory).
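
As a rough restatement of steps 1 and 2 above, the decision can be sketched in Python. The function name, its arguments, and the wording of the returned text are hypothetical; the handling of a failed diagnostic run is an assumption, since the procedure only describes the error-free case.

```python
from typing import Optional

def next_action(error_log_entries: Optional[int], diagnostics_passed: bool) -> str:
    """Hypothetical helper restating steps 1 and 2 as a decision.

    error_log_entries: number of entries found in the hardware System
    Error Log, or None on models without the logging feature.
    """
    if not diagnostics_passed:
        # Assumption: the procedure itself only covers the error-free case.
        return "diagnostics reported an error; follow that error first"
    if error_log_entries is None:
        return "no error log on this model; replace nothing, collect port data"
    if error_log_entries == 0:
        return "log empty; do not replace memory, collect port data"
    return "log has entries; review them, then collect port data"

print(next_action(0, True))
```

In every error-free case the outcome is the same: no hardware is replaced until the port data from step 3 has been collected.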

Procedure for Reporting and analyzing Errors on PC Servers
RUNNING NOVELL NETWARE (8590/95, 9585, 9590/95 and 8641 systems only).

Customer Name:__________________________ Date:__ __ __
Location:_______________________________ Time:__ __ __
Telephone#___ ___ ____ FAX#___ ___ ____
Contact name:___________________________

When the Abend and NMI error are displayed on the screen:
(If the # sign is already on the screen, step 1 below is not required.)
1. Invoke the Netware Debugger by simultaneously depressing and holding down the Left-Shift and Right-Shift keys, then depress the ALT and ESC keys. Release all four keys together.
2. At the # sign, enter the following key sequences followed by the ENTER key. Write down the REGISTER INFORMATION DISPLAYED ON THE SCREEN after each entry in the space provided below.
i (SPACE) 61 (ENTER) Port 61 =______________________
i (SPACE) 90 (ENTER) Port 90 =______________________
i (SPACE) 92 (ENTER) Port 92 =______________________
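
Once the three values are written down, they can be checked against the four NMI causes listed earlier. The sketch below is illustrative only: it assumes the conventional PS/2 Micro Channel bit assignments (port 61 bit 7 = parity check, bit 6 = channel check; port 90 bit 5 = bus time-out; port 92 bit 4 = watchdog time-out), which are not stated in this document and should be verified against the technical reference for the machine. The function name and the sample values are made up.

```python
# ASSUMED bit assignments (conventional PS/2 Micro Channel layout).
# They are not given in this tip; verify before relying on them.
NMI_BITS = {
    0x61: {7: "memory parity check", 6: "channel check"},
    0x90: {5: "bus (DMA) time-out"},
    0x92: {4: "watchdog time-out"},
}

def decode_nmi_ports(readings):
    """Map recorded port values to the NMI causes whose bits are set.

    readings: dict of port number -> byte value from the worksheet.
    """
    causes = []
    for port in sorted(readings):
        for bit in sorted(NMI_BITS.get(port, {}), reverse=True):
            if readings[port] & (1 << bit):
                causes.append(NMI_BITS[port][bit])
    return causes

# Made-up worksheet values, for illustration only.
sample = {0x61: 0x40, 0x90: 0x00, 0x92: 0x02}
print(decode_nmi_ports(sample))
```

The debugger displays values in hexadecimal, so a value copied from the worksheet can be converted with `int("40", 16)` before being passed in.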

For 8590/95, 9590/95 & 8641 (Type 3 & 4 Processor) Systems only, do the following after the initial three steps above:

o (SPACE) E0 = B1 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = B2 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = B3 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = B4 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = B7 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = C0 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = C1 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = C2 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________
o (SPACE) E0 = C3 (ENTER) ________________________
i (SPACE) E4 (ENTER) ________________________

Note: This routine reads out the error information (B1 - B4, B7) and the address information (C0 - C3).
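
The routine follows a select-then-read pattern: a selector value is written to port E0 with the debugger's "o" command, and the selected register is then read from port E4 with "i". As a sketch, the full key sequence can be generated as text for a checklist. This only builds strings and performs no port I/O; the function and variable names are hypothetical.

```python
# Selector values from the worksheet: error information B1-B4 and B7,
# address information C0-C3.
SELECTORS = ["B1", "B2", "B3", "B4", "B7", "C0", "C1", "C2", "C3"]

def debugger_commands(selectors=SELECTORS, index_port="E0", data_port="E4"):
    """Return the debugger entries as text, two per selector."""
    commands = []
    for sel in selectors:
        commands.append(f"o {index_port} = {sel}")  # select the register
        commands.append(f"i {data_port}")           # read back its value
    return commands

for entry in debugger_commands():
    print(entry)
```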

For SERVER 300/320 8640 systems:
This procedure will also work for these systems if the following two steps are substituted for the first three steps above.

i (SPACE) 61 (ENTER) Port 61 =________________________
i (SPACE) 461 (ENTER) Port 461 =________________________
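
The two values recorded for the 8640 can be decoded in the same spirit. This sketch assumes that port 461 is the EISA extended NMI status register (bit 7 = fail-safe timer, bit 6 = bus time-out, bit 5 = software NMI) and that port 61 uses the usual AT layout (bit 7 = parity error, bit 6 = I/O channel check). None of these bit assignments come from this document; confirm them against the system's technical reference. The names and the sample values are hypothetical.

```python
# ASSUMED EISA-style bit assignments; not stated in this tip.
EISA_NMI_BITS = {
    0x61: {7: "memory parity error", 6: "I/O channel check"},
    0x461: {7: "fail-safe (watchdog) timer",
            6: "bus time-out",
            5: "software NMI via I/O port"},
}

def decode_eisa_nmi(readings):
    """Return one line per status bit that is set in the recorded values."""
    found = []
    for port in sorted(readings):
        for bit in sorted(EISA_NMI_BITS.get(port, {}), reverse=True):
            if readings[port] & (1 << bit):
                found.append(f"port {port:X} bit {bit}: {EISA_NMI_BITS[port][bit]}")
    return found

# Made-up worksheet values, for illustration only.
print(decode_eisa_nmi({0x61: 0x00, 0x461: 0x4C}))
```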

3. Print out the SYSTEM CONFIGURATION: Either boot from the REFERENCE DISKETTE or press CTRL-ALT-INS or the F1 key to bring up the SYSTEM PARTITION (Reference Diskette Image). (Use CTRL-ALT-S for Servers 300 and 320.) Use the "PRINT SCREEN" key on the keyboard to copy ALL of the hardware configuration screens to a printer attached to the printer port.
Dumping the hardware configuration screens to a printer will provide exact information regarding the adapters installed, such as the Micro Channel slot, interrupt setting, arbitration level, etc.
BE CERTAIN to include the "DISPLAY REVISION LEVELS" screen, which is found under "More Utilities" at the system partition MAIN MENU. This provides the IML Reference Diskette and BIOS versions and other machine information.
Have this information available to FAX to the appropriate Service Support Group responsible for Novell Netware support.

Search Keywords

Miscellaneous, Netware

Hint Category

Retain

Date Created

14-09-95

Last Updated

04-02-99

Revision Date

04-02-2000

Brand

IBM PC Server

Product Family

PC Server 300, PC Server 310, PC Server 315, PC Server 320, PC Server 325, PC Server 330, PC Server 500, PC Server 520, PC Server 704, PC Server 720, Server 85, Server 95, Server 195, Server 295

Machine Type

8640, 8639, 8638, 8641, 8650, 8642, 8600, 9585, 9595, 8595, Various

Model

Various

TypeModel

Retain Tip (if applicable)

H13823
