Technical Memo

Subject: Interrupts in the Symmetric Multi-Processing Environment.

Intended Audience: Novell development and testing. Third party driver developers.

Date: June 26, 1996


Introduction:

With the introduction of Multi Processor (MP) systems there are a number of new issues that affect systems running NetWare and the device drivers running on NetWare. Intel's Multi Processor Specification (MPS) was designed to establish an MP platform interface standard that extends the performance of the existing PC/AT platform beyond the traditional single processor limit. This specification is based on the Intel Advanced Programmable Interrupt Controller (APIC). The APIC provides basic functionality similar in nature to that of the Programmable Interrupt Controller (PIC); however, its features and capabilities far surpass those of the PIC in order to support the needs of a multi processor system.

An MP system does not have to be Intel MPS compliant or use APICs to run with NetWare. However, MPS compliant systems based on the APIC are likely to dominate the industry as some Pentium and all Pentium Pro processors have APIC functionality embedded in the processor.

The rest of this memo is dedicated to a discussion of interrupt issues in NetWare SMP.


The Advanced Programmable Interrupt Controller (APIC):

Systems which adhere to the Intel Multi-Processor Specification employ the use of the APIC. The APIC allows for the distribution of interrupts amongst the various processors in the system. The APIC is divided into two functional units called the local APIC and the IO APIC. There is one local APIC per processor in the system. The local APIC's registers are only visible to the processor with which they are associated. The local APIC may reside in the processor or may be part of an external device such as the Intel 82489DX. There may be multiple IO APICs in a system. The registers of each IO APIC are visible to all processors.

The purpose of the IO APIC is to receive an interrupt from one of the many devices in the system and present the interrupt to all local APICs in the system. Interrupt delivery may be preset to include one processor, all processors, or any subset of processors, depending upon the needs of a particular OS.

Interrupts are sent serially from the IO APIC to the local APICs over a 3 wire bus: two wires for data and one for clock. There are a number of interrupt delivery modes which govern how local APICs arbitrate amongst themselves to determine which CPU will get the interrupt. The local APIC has the responsibility to receive the interrupt and present it to the processor. Once a local APIC has accepted an interrupt it cannot be recalled, cleared or masked. An interrupt that is accepted by a local APIC will eventually interrupt the processor.

The SMP environment dictates that the interrupt distribution hardware be able to 1) detect the presence of a new interrupt and 2) present the interrupt to the designated processor set for acceptance. This two step process is somewhat different from the way the PIC works in the uni-processor environment and is the cause of some behavioral differences between APIC and PIC. These differences may result in the generation of spurious interrupts. Drivers that did not previously generate spurious interrupts in the native, non SMP, NetWare environment may cause spurious interrupts in NetWare SMP.


APIC Level Triggered Interrupts Are Latched:

As near as we can tell the behavior of edge triggered interrupts in the APIC is identical to the behavior of edge triggered interrupts routed through the PIC. However, the same is not true for level triggered interrupts. There is one subtle difference between PIC and APIC that may result in some devices or drivers generating spurious interrupts.

Level triggered interrupts are not latched in the PIC. If the interrupt line is asserted and deasserted while interrupts are disabled at the processor a new interrupt will not occur, even if the End Of Interrupt (EOI) command has already been written. The same is also true when 1) interrupts are enabled at the processor and 2) a higher priority interrupt is currently in service and the EOI for that interrupt has not been issued. So long as a device deasserts its interrupt before interrupts are enabled at the processor and there are no higher priority interrupts in service, a new interrupt will not occur.

The above is not true for level triggered interrupts delivered 1) through the IO APIC over the bus to the local APIC or 2) through the PIC, then to the IO APIC, and finally over the bus to the local APIC. The reason is that interrupts delivered through the IO APIC are latched. This means that if a device's interrupt line is asserted any time after EOI, a new interrupt may occur.

The state of the processor in a PIC based uniprocessor system may keep a level triggered interrupt from occurring if the interrupt disappears at the input of the PIC before the state of interrupts at the processor changes. However, after the EOI in an APIC system, the state of the processors has no effect upon interrupt delivery.

What this means is that devices that assert and deassert a level triggered interrupt line before being serviced will cause spurious interrupts in APIC MP systems. Driver call back functions that require a device to deassert the interrupt line after it has already been asserted may also cause spurious interrupts.

An interrupting device must not deassert its interrupt line until after interrupt acknowledge (INTA). Because the device does not know exactly when INTA occurs, it should not deassert the interrupt until it is told to do so in the interrupt service routine. However, the device must always deassert the interrupt before the EOI, or a new interrupt will occur.
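The difference between the unlatched PIC behavior and the latched IO APIC behavior described in this section can be illustrated with a toy model. The sketch below is only a simplification for reasoning about the problem. It does not model real hardware timing, and every name in it (ToyController, toy_set_line, toy_cpu_accepts) is hypothetical.

```c
#include <stdbool.h>

/* Toy model only: one interrupt input on a controller that either
 * latches level triggered requests (IO APIC style) or does not
 * (8259 PIC style). All names here are hypothetical. */
typedef struct {
    bool latches;  /* true: IO APIC style, false: PIC style */
    bool line;     /* current state of the device's interrupt line */
    bool pending;  /* request the controller will present to a CPU */
} ToyController;

/* The device drives its interrupt line high (true) or low (false). */
void toy_set_line(ToyController *c, bool asserted)
{
    c->line = asserted;
    if (asserted)
        c->pending = true;
    else if (!c->latches)
        c->pending = false;  /* unlatched: the request vanishes with the line */
}

/* A processor becomes able to take the interrupt.
 * Returns true if an interrupt fires. */
bool toy_cpu_accepts(ToyController *c)
{
    bool fires = c->pending;
    c->pending = c->line;    /* a line that is still asserted requests service again */
    return fires;
}
```

In this model a device that asserts and then deasserts its line before the processor can accept the interrupt produces no interrupt when latches is false, but still produces one when latches is true. With no device left to claim it, that interrupt would be reported as spurious.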


Posted Memory Write:

When controlling I/O devices, it is important that memory and I/O operations be carried out in the order programmed. Intel-compatible processors do not buffer I/O writes; thus, strict ordering among I/O operations is enforced by the processors.

To optimize memory performance, processors and chipsets often implement write buffers and writeback caches. Intel-compatible processors guarantee processor ordering on all internal cache and write buffer accesses. However, chipsets must also guarantee processor ordering on all external memory accesses.

For systems based on the integrated APIC, posting of memory writes may result in spurious interrupts for memory mapped I/O devices using level-triggered interrupts. I/O device drivers must serialize instructions to ensure that the device interrupt clear command reaches the device before the EOI command reaches the APIC so that spurious interrupts do not occur.

When allocating memory for memory mapped devices the memory should be marked as non-cacheable memory. This forces all read and write operations to go directly to the device. Because a posted write may not be immediately flushed to memory from the system's write buffers it is possible that a posted write to deassert a device interrupt line may be delayed until after the EOI has been issued.

This has actually happened with some memory mapped devices and is a cause for spurious interrupts.

The solution to this problem is to follow the non-cacheable write that deasserts the interrupt with a non-cacheable read from the same location. The read cannot complete until the preceding write has been forced out of the system write buffers to the device. This should get the interrupt line deasserted before EOI and prevent an additional interrupt from occurring.
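A minimal sketch of the write followed by read-back is shown here. The register layout (DeviceRegs, int_clear) and the function name are hypothetical and do not come from any driver specification. The sketch also assumes the device registers have been mapped as non-cacheable memory, as recommended above.

```c
#include <stdint.h>

/* Hypothetical register block of a memory mapped device. The
 * "volatile" qualifier keeps the compiler from removing or
 * reordering the two accesses below. */
typedef struct {
    volatile uint32_t int_clear;  /* hypothetical: write here to deassert the interrupt */
} DeviceRegs;

/* Ask the device to deassert its interrupt, then read the same
 * location back so the posted write is flushed to the device
 * before the caller goes on to request the EOI. */
uint32_t device_deassert_interrupt(DeviceRegs *regs, uint32_t bits)
{
    regs->int_clear = bits;   /* may sit in a write buffer (posted write) */
    return regs->int_clear;   /* read-back cannot complete until the write has */
}
```

The value returned by the read is device specific and is usually not interesting. The point of the read is its ordering effect, so the EOI request should come only after this function returns.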


Locks:

It is important to remember that in the MP environment there may be an attempt to access a hardware device or global data from multiple processors simultaneously. For example a driver call back routine may execute on one processor at the same time as the interrupt service routine for that driver is executing on another processor. Locks must be used to ensure the integrity of data and safe operation of the driver.

For LAN drivers, adapter lock operations are performed by the MSM and LSL. In NetWare 4.10 SMP and 4.11 SMP, disk drivers are not SMP aware and disk interrupt delivery is restricted to processor 0. Consequently, the use of MP locks is not necessary for disk drivers running on those versions of NetWare SMP.
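For readers unfamiliar with MP locking, the general idea can be sketched with a simple spin lock built on C11 atomics. This is not the NetWare, MSM or LSL lock interface. The AdapterState type and the adapter_* functions are hypothetical and exist only to show how a lock serializes access to data shared between a call back routine and an interrupt service routine.

```c
#include <stdatomic.h>

/* Hypothetical per-adapter state shared by an ISR and a call back
 * routine that may run on different processors at the same time. */
typedef struct {
    atomic_flag held;      /* set while some processor owns the lock */
    unsigned    rx_count;  /* example of shared data */
} AdapterState;

/* Spin until this processor owns the lock. */
void adapter_lock(AdapterState *a)
{
    while (atomic_flag_test_and_set_explicit(&a->held, memory_order_acquire)) {
        /* another processor holds the lock: keep trying */
    }
}

/* Release the lock so another processor may enter. */
void adapter_unlock(AdapterState *a)
{
    atomic_flag_clear_explicit(&a->held, memory_order_release);
}

/* Example critical section: update shared data under the lock. */
void adapter_note_receive(AdapterState *a)
{
    adapter_lock(a);
    a->rx_count++;
    adapter_unlock(a);
}
```

An AdapterState would be initialized as { ATOMIC_FLAG_INIT, 0 } before first use. A real driver should use the lock services named in its driver specification rather than a private lock like this one.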


End of Interrupt (EOI) in the SMP Environment:

The OS should be architected in such a way that the driver is allowed the freedom to decide when to issue the EOI. Allowing the driver to decide when to issue the EOI in a service routine makes it possible for high performance, MP safe, interrupt service routines to execute simultaneously on multiple processors.

There must be one and only one EOI per interrupt. This should be done early in the routine but not until it is safe to allow another interrupt at the same level to reoccur on any processor in the system.

Service routines that issue more than one EOI per interrupt can cause serious system problems. In a uni-processor system a reoccurrence of an interrupt could not occur until after EOI and also not until interrupts were reenabled at the processor using the STI instruction. However, in an MP system a new interrupt at the same interrupt level may fire on another processor immediately after EOI.

For example, if the service routine for a high priority interrupt runs while a lower priority interrupt is still in service and issues two EOIs, the second EOI ends the lower priority interrupt prematurely. It is then possible for the lower priority device to interrupt again and be serviced concurrently on another processor. This can cause problems for the driver or the OS.

In a uni-processor system it was safe to do an EOI before critical sections of the ISR were complete so long as interrupts were disabled (CLI) at the processor. However, in an MP system a new interrupt at the same level may fire on another processor immediately after EOI.

Depending on the type of driver and driver specification, the OS may or may not perform the EOI for the driver. The driver may be required to call an OS routine that performs the EOI. Please refer to your specification for details. However, in every case the driver must not write EOI directly to the system hardware.
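The ordering rules in this section can be summarized in a short sketch: deassert the device interrupt first, then request exactly one EOI through the OS, and never write EOI to the hardware. Everything named below (IsrContext, os_issue_eoi, example_isr) is a hypothetical stand-in. The real EOI routine, and whether the driver or the OS calls it, is defined by the applicable driver specification.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping used only to make the sketch testable. */
typedef struct {
    bool device_asserting;  /* stand-in for the device's interrupt line */
    int  eoi_count;         /* EOIs requested for the current interrupt */
} IsrContext;

/* Hypothetical stand-in for the OS supplied EOI service. A real
 * driver calls the routine named in its driver specification. */
void os_issue_eoi(IsrContext *ctx)
{
    ctx->eoi_count++;
}

/* Sketch of a service routine. Returns true if the interrupt was
 * claimed. In this sketch an unclaimed interrupt is left to the OS. */
bool example_isr(IsrContext *ctx)
{
    if (!ctx->device_asserting)
        return false;

    /* 1. Tell the device to deassert its interrupt line. */
    ctx->device_asserting = false;

    /* 2. Request one and only one EOI, through the OS. From here on
     *    the same interrupt may fire on another processor. */
    os_issue_eoi(ctx);

    /* 3. Any remaining work must be MP safe. */
    return true;
}
```

Step 2 is the point after which the routine can no longer rely on being the only instance running, so any state touched in step 3 needs lock protection.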


Masking and Unmasking the PIC:

We have seen some drivers mask the PIC IRQ line during their service routines to guarantee that no new interrupts at the same interrupt level will fire. When the service routine is complete they unmask the PIC. The big problem with this is that the PIC may no longer be the device through which the interrupt is routed and unmasking the PIC may enable a second interrupt route to the processor. If the IO APIC and PIC both have the same interrupt unmasked then for every assertion of the interrupt line there will be two interrupts at the processor.

Masking an interrupt request in the PIC has an immediate effect. However, because the interrupt request registers of the APIC are located in the local APIC, masking the interrupt at the IO APIC will not prevent an interrupt request that is already pending in the local APIC from interrupting the processor. This means that additional interrupts may occur at the processor even after the interrupt has been masked at the IO APIC. This is also true for PIC interrupts routed through the IO APIC. Masking of interrupts in the APIC environment will not guarantee that new interrupts at the same interrupt level will not occur. Only non issuance of the EOI can guarantee that a new interrupt will not occur.

Also, the number of interrupt sources in a PCI system may be very large, and a number of those devices may share the same interrupt line. If one driver masks the interrupt line it may well mask other devices, keeping their interrupts from being serviced on other processors after the EOI.

The masking of interrupts to prevent interrupt reoccurrence is not guaranteed and poses a potential performance problem in the MP environment. So the masking and unmasking of interrupts in NetWare SMP is a function to be used exclusively by the OS. The EOI is the correct tool to use to guarantee that new interrupts at the same level will not reoccur. Please engineer your interrupt service routines to take advantage of the correct use of EOI and do not mask.

In NetWare versions 4.0, 4.01, 4.02, 4.10 and 4.10 SMP the MSM masks PIC interrupts 7 and 15 while in the service routine to compensate for a problem where the lost hardware interrupt detection code was throwing away good interrupts. That problem has been corrected in NetWare 4.11 and 4.11 SMP. However, for the lost hardware interrupt detection algorithm to work correctly an interrupt service routine must call the specified EOI function in the OS and not write an EOI directly to the PIC. Drivers that write EOI directly to the PIC still pose a potential problem for interrupts 7 and 15. However, this is only true when the offending driver shares the interrupt, 7 or 15, with another device.

In the future most MP systems will not use the PIC for interrupt distribution and older drivers that access the PIC directly will break. Driver writers, please use the APIs documented in your driver specification and do not make assumptions about the underlying hardware configuration.


MP Configuration Table:

MPS compliant systems have a table in BIOS called the MP Configuration Table. The table describes the MP system configuration. It includes such information as the number of local APICs and processors in the system, the number of IO APICs, the number and type of buses, the routing and type of interrupts in the system, etc. If these tables do not accurately reflect the configuration of the system, or if the content of the table is not MPS compliant, it may be difficult or even impossible for the Platform Support Module (PSM) to set up the MP system correctly.

We have seen a number of MP configuration table problems in various systems and until this new aspect of the BIOS is fully understood by BIOS programmers there may be more systems with MP table problems. If a system is found to have an MP Configuration Table problem, the BIOS must be fixed. The PSM must adhere strictly to the MP specification and cannot make concessions for problems in the BIOS.
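One basic compliance check is the table checksum. The sketch below is written from our reading of the Intel MP specification, in which the base table begins with the signature "PCMP", carries its length as a 16-bit little endian value at byte offset 4, has a 44 byte header, and has bytes that must add up to zero. Please confirm these details against the specification itself. The function names are hypothetical and are not part of any PSM interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_CONFIG_HEADER_SIZE 44u  /* header size as we read the MP specification */

/* True if the bytes add up to zero modulo 256. */
bool mp_bytes_sum_to_zero(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Basic sanity check of an MP Configuration Table image.
 * "available" is the number of readable bytes at "table". */
bool mp_config_table_plausible(const uint8_t *table, size_t available)
{
    if (available < MP_CONFIG_HEADER_SIZE)
        return false;
    if (memcmp(table, "PCMP", 4) != 0)          /* signature */
        return false;

    /* Base table length: 16-bit little endian at offset 4. */
    size_t base_len = (size_t)table[4] | ((size_t)table[5] << 8);
    if (base_len < MP_CONFIG_HEADER_SIZE || base_len > available)
        return false;

    return mp_bytes_sum_to_zero(table, base_len);
}
```

A table that fails this check points to a BIOS problem. Passing it does not prove that the interrupt routing entries are correct. It only shows that the table is intact.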


Interrupt Polarity:

IO and local APICs have programmable interrupt polarity. The Platform Support Module (PSM) programs the polarity according to the MP table in the BIOS. If the BIOS indicates the wrong polarity or trigger mode there may be thousands of spurious interrupts per second.

A new SMP set parameter has been added to NetWare 4.11 SMP to detect spurious interrupts in the MP environment. It is:

SET SMP DISPLAY SPURIOUS INTERRUPT ALERTS = ON

The warning message, for example, is as follows:

"SMP WARNING: 100000 spurious hardware interrupt(s) detected on INT 47."

If a system has a problem with thousands of spurious interrupts per second it may need a BIOS upgrade to correct a problem in the MP configuration table.

PCI, EISA and MCA interrupts are by default level triggered active low. ISA is positive edge triggered. EISA interrupts may also be set up as edge triggered to accommodate ISA devices. An MP system may invert an interrupt signal to present the desired polarity to the IO APIC. The 8259 PIC and 82489DX discrete APIC are always active high level triggered or positive edge triggered. EISA level triggered interrupts are always inverted by the EISA Edge Level Control Registers (ELCR) before reaching either of these devices.
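As an illustration of how polarity and trigger mode are taken from the MP table, the sketch below decodes the flags field of an interrupt entry. The bit layout (polarity in bits 0-1, trigger mode in bits 2-3, with 0 meaning "conforms to the bus") is our reading of the MP specification and should be checked against it. The bus defaults are the simplified ones stated in this memo, and all type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { POL_BUS_DEFAULT = 0, POL_ACTIVE_HIGH = 1,
               POL_RESERVED = 2, POL_ACTIVE_LOW = 3 } MpPolarity;
typedef enum { TRIG_BUS_DEFAULT = 0, TRIG_EDGE = 1,
               TRIG_RESERVED = 2, TRIG_LEVEL = 3 } MpTrigger;
typedef enum { BUS_ISA, BUS_EISA, BUS_MCA, BUS_PCI } BusType;

/* Raw field extraction from the interrupt entry flags. */
MpPolarity mp_polarity(uint16_t flags) { return (MpPolarity)(flags & 0x3u); }
MpTrigger  mp_trigger(uint16_t flags)  { return (MpTrigger)((flags >> 2) & 0x3u); }

/* Resolve "conforms to bus" using the defaults given in this memo:
 * ISA is positive edge triggered, the others level triggered active low. */
MpPolarity mp_effective_polarity(uint16_t flags, BusType bus)
{
    MpPolarity p = mp_polarity(flags);
    if (p != POL_BUS_DEFAULT)
        return p;
    return (bus == BUS_ISA) ? POL_ACTIVE_HIGH : POL_ACTIVE_LOW;
}

MpTrigger mp_effective_trigger(uint16_t flags, BusType bus)
{
    MpTrigger t = mp_trigger(flags);
    if (t != TRIG_BUS_DEFAULT)
        return t;
    return (bus == BUS_ISA) ? TRIG_EDGE : TRIG_LEVEL;
}
```

If the BIOS writes a flags value that decodes to the wrong polarity or trigger mode for the actual signal, the IO APIC input will be programmed incorrectly, which is the situation that produces the flood of spurious interrupts described above.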


Interrupt Priority:

In NetWare, PIC interrupt priority is based on the IRQ number. The priority of interrupts, from highest to lowest and written in hexadecimal, is as follows: 0, 1, 8, 9, A, B, C, D, E, F, 3, 4, 5, 6, 7. IRQs 2 and 9 are the same interrupt.
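The PIC ordering listed above can be expressed as a small ranking function, shown here as an illustration only. The function name is hypothetical, and the sketch assumes the list runs from highest priority to lowest.

```c
/* Rank of an IRQ in the PIC priority order listed above:
 * 0 is first in the list, 14 is last. Returns -1 for an invalid
 * IRQ number. IRQ 2 is treated as IRQ 9, since they are the same
 * interrupt. */
int pic_priority_rank(unsigned irq)
{
    if (irq > 15)
        return -1;
    if (irq == 2)
        irq = 9;
    if (irq <= 1)
        return (int)irq;             /* 0, 1        -> ranks 0, 1   */
    if (irq >= 8)
        return (int)(irq - 8 + 2);   /* 8 through F -> ranks 2..9   */
    return (int)(irq - 3 + 10);      /* 3 through 7 -> ranks 10..14 */
}
```

The middle group reflects the cascade arrangement, in which the second controller's eight inputs are seen through input 2 of the first controller.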

In the APIC, priority is based on the vector associated with the interrupt, not the input line. The local APIC enforces the priority of pending interrupt requests. However, because the IO APIC scans its input lines for active interrupts in order starting from line 0, the overall priority scheme based on the vector is not strictly enforced.
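A short sketch of vector based priority follows. It relies on the documented local APIC convention that the upper four bits of the 8-bit vector form the priority class. The function names are hypothetical, and the sketch ignores the task priority register for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector: its upper four bits (0 through 15). */
unsigned apic_priority_class(uint8_t vector)
{
    return (unsigned)vector >> 4;
}

/* Among pending requests, the higher vector number is served first. */
bool apic_outranks(uint8_t a, uint8_t b)
{
    return a > b;
}

/* A pending request can interrupt a routine already in service only
 * if it belongs to a higher priority class (task priority ignored). */
bool apic_can_preempt(uint8_t pending, uint8_t in_service)
{
    return apic_priority_class(pending) > apic_priority_class(in_service);
}
```

Two vectors in the same class, for example 0x41 and 0x4F, cannot preempt each other even though 0x4F outranks 0x41 when both are pending.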


Virtual Interrupts:

NetWare 4.10 and 4.11 SMP support 64 virtual interrupts. Devices sharing the same virtual interrupt share the same OS service routine.

Currently there is no easy way for the console operator to find out what virtual interrupt a driver is using. In future releases of the product we plan to provide a means to obtain useful information regarding virtual interrupt assignments. The CONFIG console command only reports the uni-processor EISA equivalent IRQ, 0-15. This may be misleading because the IRQ number displayed by the CONFIG command may or may not be the same as the virtual interrupt number in use by the device. Do not rely on the CONFIG command for device/interrupt association in NetWare SMP v4.10 and v4.11.

Virtual interrupt assignments may change every time the MP configuration table is changed by the BIOS. This usually happens when a device is added, removed or reconfigured.


Useful "SET" Parameters:

There are a total of three console SET commands that can be used to detect interrupt problems in NetWare SMP.

SET SMP DISPLAY SPURIOUS INTERRUPT ALERTS = ON (SMP v4.11 and later.)

SET DISPLAY SPURIOUS INTERRUPT ALERTS = ON

SET DISPLAY LOST INTERRUPT ALERTS = ON

All three of these are OFF by default in NetWare v4.11. While testing and certifying drivers these three set parameters should be turned on!

A lost hardware interrupt alert occurs when at INTA time there is no IRR bit set in the interrupting PIC.

A spurious interrupt alert occurs when the interrupt is not claimed by any driver service routines.

Lost and spurious interrupts impact system performance. However, performance will not be severely degraded unless there are a lot of these interrupts occurring. In any event hardware vendors and device driver writers should work to eliminate the occurrence of unnecessary interrupts.

Other operating systems have spurious and lost hardware interrupts but do not report them.


APIC Spurious Interrupt:

In an APIC based MP system the APIC may be enabled to route PIC interrupts to the processor even if the OS is not an MP OS. Any time the APIC is enabled it is capable of generating an APIC spurious interrupt. An "APIC spurious interrupt" occurs when there are no interrupt request bits set in the local APIC's interrupt registers at interrupt acknowledge (INTA).

The APIC spurious interrupt is not to be confused with other spurious interrupts in NetWare. An APIC spurious interrupt is more akin to the "Lost hardware interrupt detected on [primary|secondary] controller" message generated by an interrupt glitch on the PIC.

Occurrence of the APIC spurious interrupt will be rare. At this time there are not any versions of NetWare SMP which report the occurrence of an APIC spurious interrupt.

Because the vector for this interrupt is not programmed to a known value in non SMP versions of NetWare prior to 4.11, an occurrence of this interrupt may cause an invalid interrupt exception which will cause the server to ABEND.

To work around this problem the APIC spurious vector register must be programmed with a valid interrupt vector for which there is a corresponding service routine. This will prevent the ABEND.

In NetWare 4.11 the default handler for unassigned interrupts has been changed so that it no longer ABENDs the server.
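For reference, the value written by the workaround described above can be sketched as follows. The register offset (0xF0 from the local APIC base), the enable bit (bit 8) and the hardwired low four bits of the vector on the Pentium and Pentium Pro integrated APICs are taken from our reading of the Intel manuals and should be verified there. The function name is hypothetical, and the sketch only computes the value. It does not touch hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_SVR_OFFSET 0xF0u        /* spurious vector register offset */
#define APIC_SVR_ENABLE (1u << 8)    /* APIC software enable bit */

/* Compose a spurious vector register value. On the integrated APICs
 * the low four bits of the vector are assumed to read as ones, so
 * the handler should be installed at a vector ending in 0xF. */
uint32_t apic_svr_value(uint8_t vector, bool enable)
{
    uint32_t value = (uint32_t)vector | 0x0Fu;
    if (enable)
        value |= APIC_SVR_ENABLE;
    return value;
}
```

For example, requesting vector 0x40 with the APIC enabled yields 0x14F, so the service routine that absorbs the spurious interrupt would belong at vector 0x4F.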


Lost Hardware Interrupts:

Lost hardware interrupt messages occur when there are no interrupt request bits set in the PIC at INTA. This usually happens when a level triggered interrupt is asserted then deasserted before INTA. When this happens the PIC will put the vector associated with interrupt 7 or 15 on the bus.
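On PC compatible systems, a conventional way to tell a real interrupt 7 (or 15) from this default vector is to read the controller's in-service register and test bit 7. The sketch below shows only that test. It is not the NetWare detection algorithm, which this memo describes in terms of the request bits, and it is something an OS does, not a driver. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* OCW3 command value that selects the in-service register for the
 * next read of the 8259 command port. Shown for reference only. */
#define PIC_OCW3_READ_ISR 0x0Bu

/* Given the in-service register of the controller that delivered
 * vector 7 (primary) or 15 (secondary), report whether the interrupt
 * is real. If bit 7 is clear, no interrupt 7 or 15 is in service
 * and the vector was the default one. */
bool pic_vector7_is_real(uint8_t in_service_register)
{
    return (in_service_register & 0x80u) != 0;
}
```

When the test fails on the secondary controller, the primary controller still has the cascade input in service, so an OS using this technique must still end that interrupt on the primary.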

Lost hardware interrupts were not reported in NetWare 4.10 regardless of the state of the "SET DISPLAY LOST INTERRUPT ALERTS = ON" console command. This has been corrected in NetWare 4.11.

Devices using interrupts 7 or 15 on versions of NetWare prior to 4.11 may have struggled to work correctly. In NetWare 4.11 the problems with lost hardware interrupt detection and reporting have been corrected and interrupts 7 and 15 should no longer pose a problem for drivers compliant with current specifications. However, a device driver that writes an EOI command directly to the PIC hardware may foul up the detection of lost hardware interrupts. Consequently, devices that write EOI directly to the PIC should not share interrupts 7 or 15 with other devices.

Because the IO APIC latches level triggered interrupts, interrupt glitches that would have shown up as lost hardware interrupts when routed through the PIC in the uni-processor environment may show up as spurious interrupts when routed through the IO APIC in NetWare SMP.


Disk Driver Interrupts:

In NetWare 4.10 and 4.11 SMP, disk interrupts must be serviced by processor 0. Interrupts for disk drivers with a .DSK extension are still routed through the PIC because some legacy drivers write the EOI command directly to the PIC. In the future drivers must not directly access the PIC, APIC or other symmetric interrupt distribution hardware.

Interrupts from Host Adapter Modules, drivers with a .HAM extension, have been moved in NetWare 4.11 SMP to the IO APIC but interrupt delivery is still restricted to processor 0.


Marshaling:

Currently there are two kernels in NetWare SMP. The NetWare kernel and the SMP kernel. Marshaling in NetWare occurs when a thread running in one kernel calls an API that must run in the other kernel.

An interrupt service routine must not call a routine that is marshaled. Doing so is a critical error which will result in an INT3 and will cause the server to drop into the debugger.

Please refer to your Software Developers Kit documentation to find out which routines are marshaled and which are not.


Timer Interrupts:

In NetWare SMP, processor 0 should receive about 18 timer tick interrupts per second. Secondary processors also receive a varied number of timer interrupts. The purpose of these interrupts is to support, among other things, preemptive scheduling. Timer interrupts occur independently of disk and LAN activity.

The number of interrupts occurring on each processor may be monitored using MONITOR.NLM and by viewing Multiprocessor Information, Information by Processor.


Interrupt Affinity:

NetWare 4.11 SMP employs a new concept referred to as "interrupt affinity." Enabling interrupt affinity will allow SMP to bind an interrupt to a particular processor or set of processors. To enable or disable interrupt affinity add the following command to your AUTOEXEC.NCF file after SMP is loaded but before you load any LAN drivers.

SET SMP INTERRUPT AFFINITY = [ON|OFF]

SMP Interrupt Affinity is off by default in NetWare 4.11 SMP.

Interrupt affinity affects how interrupts are distributed. If interrupt affinity is enabled when a driver is loaded, NetWare SMP will assign that interrupt to one secondary processor plus processor 0 as a backup processor. Secondary processors are chosen in a round robin fashion.
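The round robin assignment described above can be sketched as a function that returns a processor mask for each newly loaded driver. This is only an illustration of the policy as stated in this memo, not the actual NetWare SMP implementation. The AffinityState type and affinity_assign function are hypothetical, and the sketch assumes at most 32 processors.

```c
#include <stdint.h>

/* Hypothetical round robin state. next_secondary must start at 1. */
typedef struct {
    unsigned cpu_count;       /* processors in the system (1 to 32) */
    unsigned next_secondary;  /* next secondary processor to hand out */
} AffinityState;

/* Return a bit mask of target processors for a new interrupt:
 * one secondary processor chosen round robin, plus processor 0
 * as the backup. Bit n set means processor n is a target. */
uint32_t affinity_assign(AffinityState *s)
{
    uint32_t mask = 1u;                    /* processor 0 is always included */
    if (s->cpu_count > 1) {
        mask |= (uint32_t)1u << s->next_secondary;
        s->next_secondary++;
        if (s->next_secondary >= s->cpu_count)
            s->next_secondary = 1;         /* wrap, skipping processor 0 */
    }
    return mask;
}
```

On a four processor system, successive driver loads would receive the masks 0x3, 0x5, 0x9 and then 0x3 again.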

Currently there is no easy way to view the target processor assignment for virtual interrupts. We hope to provide this capability in the future.

Interrupt affinity does not currently apply to disk driver interrupts as they must be serviced on processor 0.

The PSM provides functionality to dynamically rearrange the interrupt distribution scheme among processors. In future versions of NetWare this feature may be employed to assist with load balancing by allowing interrupt distribution to be changed on the fly based on processor utilization or other relevant factors.


Conclusion:

The intent of this memo was to inform and educate developers and support people about interrupt issues in NetWare SMP. The move from PIC based PC/AT compatible systems to APIC based MP systems has revealed noticeable behavioral differences between the two architectures which may impact some devices and their drivers. The behavior of interrupts that we have grown accustomed to in the uni-processor environment is no longer something that can be relied upon in MP systems. The PC architecture has taken a step forward, exhibiting new features and a new personality. The operating systems, hardware and drivers must adjust accordingly and move forward as well.


Reference Material:

Intel Pentium Processor Family Developers Manual, Volume 3, Chapter 19

Intel Pentium Pro Family Developers Manual, Volume 3, Chapter 7

Intel Pentium Processor and Related Products: 82489DX

Intel Pentium Processor and Related Products: AP-388

Intel 8259 Programmable Interrupt Controller Reference Manual

Intel Multi-Processor Specification Version 1.4 July 1995
