MADMAN Working Group                 Gordon B. Jones [gbjones@mitre.org]
INTERNET-DRAFT                                                     MITRE
draft-ietf-madman-alarmmib-01.txt       Niraj Jain [njain@us.oracle.com]
                                                      Oracle Corporation
                                       Glenn Mansfield [glenn@aic.co.jp]
                                                  AIC Systems Laboratory
                                                           August    1996


                       Mail and Directory Alarms


Status of this Memo

   This document is an  Internet  Draft.  Internet  Drafts  are  working
   documents  of  the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups  may  also  distribute
   working documents as Internet Drafts.

   Internet Drafts are draft  documents  valid  for  a  maximum  of  six
   months.  Internet  Drafts  may  be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to  use  Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   To learn the current status of any Internet-Draft, please  check  the
   1id-abstracts.txt  listing  contained  in  the Internet-Drafts Shadow
   Directories on ds.internic.net, nic.nordu.net,  ftp.nisc.sri.com,  or
   munnari.oz.au.


Abstract

This document defines alarms for Mail and Directory usage. It is  to  be
used  in  conjunction  with  the  Mail and Directory Management (MADMAN)
RFCs.


Expires: January 31, 1997                                     [Page 1]

Internet Draft                                             August 1996


1.  The SNMPv2 Network Management Framework.

   The major components of the SNMPv2 Network Management framework  are
   described in the documents listed below.


        o RFC 1902 [1] defines the Structure of Management Information
           (SMI), the mechanisms used for describing and naming objects
           for the purpose of management.

        o STD 17, RFC 1213 [2] defines MIB-II, the core set of managed
           objects (MO) for the Internet suite of protocols.

        o RFC 1905 [3] defines the protocol used for network access to
           managed objects.


The framework is adaptable/extensible by defining new MIBs to  suit  the
requirements of specific applications/protocols/situations.

Managed objects are accessed via a virtual information store,  the  MIB.
Objects  in  the  MIB  are  defined  using the subset of Abstract Syntax
Notation One (ASN.1) defined in the SMI. In particular, each object type
is  named by an OBJECT IDENTIFIER, which is an administratively assigned
name. The object  type  together  with  an  object  instance  serves  to
uniquely  identify  a  specific  instantiation  of the object. For human
convenience, often a textual string, termed the descriptor, is  used  to
refer to the object type.


2. The Need for Alarms in Messaging

Alarms are notifications of abnormalities associated with an  MTA  or  a
message  processed  by  an  MTA.   Alarms  are generated by a Management
Console.  Two facilities aid the Management Console in the generation of
alarms.  The  first  facility is the trap, which is an unsolicited event
initiated by  the  Management  Agent  and  directed  to  the  Management
Console.   Traps  generated by an agent may optionally convey the values
of MIB variables inside them.  The  Management  Console  interprets  the
traps and generates alarms as it determines appropriate.

The second facility consists of variables that  can  be  polled  by  the
Management  Console.  These variables include the existing MIB variables
defined in the other  MADMAN  RFCs  (Network  Services  Monitoring  MIB,
Directory  Services  Monitoring  MIB,  Mail  Monitoring  MIB), plus more
variables defined herein  specifically  to  augment  support  for  alarm
generation.  If  the  Management  Console detects a variable value which
indicates that a threshold has been reached,  or  some  other  worrisome
trend  or  event  has  occurred,  it generates an alarm as it determines
appropriate. It is expected that when an abnormality occurs, a trap will
be  generated indicating the specific cause of the problem.  If the trap
is lost or discarded by the network, the console may  still  detect  the
abnormality  on its next regular polling cycle through inspection of the
MIB variables.  This combination of mechanisms provides a flexible alarm
functionality that is either event-driven, polling-driven, or both.

It is understood that traps are an unreliable mechanism.  However, traps
may enhance the effects of polling-based alarms, because traps can
provide a more immediate discovery of a problem than polling alone can,
which may be important within some operational environments.  For
example, when component availability is required to exceed 99%, a
polling cycle of fifteen-minute intervals to detect whether a component
is operational may fail this requirement, while a polling cycle more
frequent than fifteen minutes might saturate the network with SNMP
traffic.  When a fifteen-minute polling cycle with 99% reliability is
combined with an event-driven mechanism that is itself 99% reliable,
and the two mechanisms fail independently, the probability that a given
component failure goes undetected by both is 0.01 x 0.01 = 0.0001, or
one one-hundredth of one percent.  This scenario is also applicable to
the case of message throughput requirements, where the detection of
queue saturation may be both event-driven and polling-driven.
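
The arithmetic behind this example can be checked with a short sketch.
It assumes, purely for illustration, that polling and traps miss a
failure independently of one another; the function name is hypothetical
and is not part of this memo.

```python
def undetected_probability(poll_reliability: float,
                           trap_reliability: float) -> float:
    """Probability that a failure is missed by both polling and traps.

    Assumption: the two mechanisms miss failures independently.
    """
    return (1.0 - poll_reliability) * (1.0 - trap_reliability)


# Both mechanisms are 99% reliable, as in the example in the text.
p_missed = undetected_probability(0.99, 0.99)
print(f"{p_missed:.6f}")  # prints 0.000100
```

If the two mechanisms tend to fail together (for example, when the
network path to the console is down), the combined figure would be
worse than this product suggests.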

Alarms  denote  cases  where  outstanding  intervention   is   required.
Implementations that result in a bombardment of superfluous traps should
be avoided (some fault conditions may lend themselves to  this).   Traps
should  not be issued repetitively to signify one basic fault condition.
The  setting  of  threshold  conditions  and  the  evaluation  of  other
composite  information  is  the  responsibility  of the console, or is a
local implementation matter within the agent.  The selection of SNMP
trap destinations by the SNMP agents or applications is also a local
matter.
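
One possible way for an agent to avoid repeating a trap for the same
basic fault condition is sketched below.  This is an illustration
only: the class name, the shape of the fault key, and the idea of
explicitly clearing a fault are assumptions and are not defined by
this memo.

```python
class TrapSuppressor:
    """Allow at most one trap per distinct fault until the fault clears."""

    def __init__(self):
        self._active = set()  # fault keys that have already been reported

    def should_send(self, fault_key) -> bool:
        """Return True only the first time a fault condition is seen."""
        if fault_key in self._active:
            return False
        self._active.add(fault_key)
        return True

    def clear(self, fault_key) -> None:
        """Forget a resolved fault so that a recurrence is reported again."""
        self._active.discard(fault_key)


suppressor = TrapSuppressor()
# Hypothetical key: (trap name, MTA group name, failure kind).
fault = ("mADAlarm", "mta-group-a", "connect-failure")
sent = [suppressor.should_send(fault) for _ in range(3)]
print(sent)  # prints [True, False, False]
```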


3. MIB Data to Support Alarms

The following material is a definition of the traps  and  MIB  variables
defined   specifically  to  support  alarm  functionality.   The  MADMAN
variables used to support alarms are defined in  RFCs  19??,  19??,  and
19??.   The  usage  of these traps and MIB variables to fulfill specific
requirements is defined in a later section.


3.1 Traps to Support Alarms


Two forms of specific traps are defined to support alarms.  The first,
called mADAlarm, denotes an MTA- or DSA-related failure, and the second,
messageAlarm, denotes a message-related failure in an MTA.

mADAlarm
   This trap is generated by the agent in an unsolicited fashion to
   signify that a failure has occurred within the MTA or DSA.  Examples
   of such failures may include one MTA's inability to contact another
   MTA, or the detection of message queue saturation.  The mADAlarm
   trap may convey a number of values, including the name of the MTA or
   DSA reporting the problem, the name of the remote MTA or DSA
   purportedly causing the problem, and variables describing the
   problem itself.

messageAlarm
   This trap is generated by the agent in an unsolicited fashion to
   signify that a non-recoverable failure has occurred in processing a
   message due to some sort of structural flaw in the message itself or
   in its addressing.  Examples may include cases where a message can
   not be delivered, non-delivered, or redirected, or the case where a
   messaging loop was detected.  The messageAlarm trap may convey a
   number of values, including the name of the MTA that processed the
   message, and variables describing the problem itself.


3.2 MIB Variables to Support Alarms

A new table is defined in the MIB to supply supplementary fault-related
information to support alarm generation.  When a failure occurs, the
identities of the applications responsible are retained in the MIB,
along with the ID of the message most recently involved in a failure.
Through polling, any changes in the values of these variables can
signify a recent failure.  The following entries describe each variable
in the table.

lastMessageIdFailure
   This is the identifier of the most recent message that was the cause
   of a message-related failure.  A message-related failure is defined
   to be a non-recoverable error in the processing of a message.  In
   the event of multiple message failures, it is a clue to the
   administrator or application to inspect the message queues to
   determine which messages are defective.

numMessagesFailed
   This is the total number of messages that have failed processing
   since the messaging application was last initialized.  This variable
   may be used in conjunction with lastMessageIdFailure to detect
   multiple message failures within a single unit of time.

lastFailureMtaGroupName
   When an error involving a neighboring MTA occurs, this variable
   holds the mtaGroupName (from the MADMAN mtaGroupTable) of the MTA
   most recently involved in a failure.

lastFailureMtaApplName
   This variable holds the applName (from the MADMAN applTable) of the
   MTA that most recently reported a failure.
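
The polling-based detection described above can be sketched as a
comparison of two successive readings of the table.  The field names
follow the MIB module in this memo; the AlarmRow type, the
changed_fields helper, and the sample values are hypothetical and serve
only as an illustration of what a console might do.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AlarmRow:
    """One polled set of alarm variables (names follow the MIB module)."""
    lastMessageIdFailure: str
    numMessagesFailed: int
    lastFailureMtaGroupName: str
    lastFailureMtaApplName: str


def changed_fields(previous: AlarmRow, current: AlarmRow) -> list:
    """Return the names of variables whose values differ between polls."""
    return [f.name for f in fields(AlarmRow)
            if getattr(previous, f.name) != getattr(current, f.name)]


# Hypothetical readings from two consecutive polling cycles.
before = AlarmRow("", 0, "", "")
after = AlarmRow("msg-17@example.com", 1, "", "")
changed = changed_fields(before, after)
print(changed)  # prints ['lastMessageIdFailure', 'numMessagesFailed']
```

A non-empty result suggests that a failure occurred since the previous
poll; whether that warrants an alarm remains a console decision.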


4.  SNMP Format for Alarms

Alarms are supported under SNMP using traps and additional MIB
variables.  An additional table called mADAlarmTable is defined here.
Elements of the existing MADMAN tables and proposed extensions are also
utilized for alarm purposes.  It is expected that traps will be
implemented under SNMPv1, but that the grammatical constructs used to
define them will be taken from SNMPv2.  Page 31 of RFC 1157 shows how
trap Protocol Data Units (PDUs) are formed in SNMPv1.  We would add two
enterprise-specific traps (generic-trap type 6) whose specific-trap
values are set to either mADAlarm (specific-trap 0) or messageAlarm
(specific-trap 1).  The enterprise field of the trap would contain the
OID "experimental ??" designating the MADMAN alarm MIB (mADAlarmMIB).
The values of variables and their corresponding OBJECT IDENTIFIERs are
conveyed within the VarBindList.  These variables are obtained from
either the mADAlarmTable or tables found in the other MADMAN RFCs.
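
The SNMPv1 trap fields described above can be laid out as in the
following sketch.  It does not encode a PDU; it only shows which value
goes in which field.  The generic-trap and specific-trap values are
those given in the text.  The enterprise OID is an assumption taken
from the MODULE-IDENTITY value { experimental 73 } in the module below,
and the variable bindings are shown by descriptor rather than by OBJECT
IDENTIFIER for readability.  All Python names are hypothetical.

```python
from typing import NamedTuple

# generic-trap value for enterprise-specific traps (RFC 1157)
ENTERPRISE_SPECIFIC = 6

# specific-trap values as stated in the text of this section
SPECIFIC_TRAP = {"mADAlarm": 0, "messageAlarm": 1}

# Assumption: { experimental 73 }, where experimental is 1.3.6.1.3
ENTERPRISE_OID = "1.3.6.1.3.73"


class TrapFields(NamedTuple):
    enterprise: str
    generic_trap: int
    specific_trap: int
    varbinds: tuple  # (descriptor, value) pairs, for illustration only


def build_trap_fields(alarm: str, varbinds) -> TrapFields:
    """Collect the SNMPv1 trap fields for one of the two alarm traps."""
    return TrapFields(ENTERPRISE_OID, ENTERPRISE_SPECIFIC,
                      SPECIFIC_TRAP[alarm], tuple(varbinds))


trap = build_trap_fields(
    "messageAlarm",
    [("lastMessageIdFailure", "msg-17@example.com"),
     ("numMessagesFailed", 1)],
)
print(trap.generic_trap, trap.specific_trap)  # prints 6 1
```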


MADMAN-ALARM-MIB DEFINITIONS ::= BEGIN

      IMPORTS
        MODULE-IDENTITY,  OBJECT-TYPE,
        NOTIFICATION-TYPE, experimental, Counter32, Gauge32
              FROM SNMPv2-SMI
        DisplayString,
        TEXTUAL-CONVENTION
              FROM SNMPv2-TC
        MODULE-COMPLIANCE, OBJECT-GROUP, NOTIFICATION-GROUP
              FROM SNMPv2-CONF
        applOperStatus,   applName
              FROM APPLICATION-MIB
        mtaGroupName,     mtaGroupInboundRejectionReason,
        mtaGroupStoredVolume, mtaLoopsDetected, mtaGroupLoopsDetected,
        mtaGroupOutboundConnectFailureReason
              FROM MTA-MIB;

mADAlarmMIB MODULE-IDENTITY
   LAST-UPDATED "9608230000Z"
   ORGANIZATION "IETF Mail and Directory Management Working Group"
   CONTACT-INFO
      "        Glenn Mansfield
       Postal: AIC Systems Laboratory
               6-6-3, Minami Yoshinari
               Aoba-ku, Sendai, Japan 989-32.

       Tel:    +81-22-279-3310
       Fax:    +81-22-279-3640
       E-mail: glenn@aic.co.jp"
   DESCRIPTION
      "The MIB module describing alarms for MADMAN"
   ::= { experimental 73 }

mADAlarmTable OBJECT-TYPE
   SYNTAX SEQUENCE OF MADAlarmEntry
   MAX-ACCESS not-accessible
   STATUS current
   DESCRIPTION
      "The table holding alarm information for an individual MTA or DSA."
   ::= { mADAlarmMIB 1 }

mADAlarmEntry OBJECT-TYPE
   SYNTAX MADAlarmEntry
   MAX-ACCESS not-accessible
   STATUS current
   DESCRIPTION
      "The alarm entry associated with each MTA or DSA."
   ::= { mADAlarmTable 1 }

MADAlarmEntry ::= SEQUENCE {
   lastMessageIdFailure DisplayString,
   numMessagesFailed Counter32,
   lastFailureMtaGroupName DisplayString,
   lastFailureMtaApplName DisplayString
}

lastMessageIdFailure OBJECT-TYPE
       SYNTAX DisplayString
       MAX-ACCESS read-only
       STATUS current
       DESCRIPTION
        "This is the message ID of the last message to either loop or have
         an unrecoverable error while processing"
       ::= {mADAlarmEntry 1}

numMessagesFailed OBJECT-TYPE
       SYNTAX Counter32
       MAX-ACCESS read-only
       STATUS current
       DESCRIPTION
        "This is the number of messages that have had an unrecoverable error
         while processing since MTA initialization"
       ::= {mADAlarmEntry 2}

lastFailureMtaGroupName OBJECT-TYPE
       SYNTAX DisplayString
       MAX-ACCESS read-only
       STATUS current
       DESCRIPTION
        "This is the group name of the last MTA group to have a connectivity
         failure"
       ::= {mADAlarmEntry 3}

lastFailureMtaApplName OBJECT-TYPE
       SYNTAX DisplayString
       MAX-ACCESS read-only
       STATUS current
       DESCRIPTION
        "This is the application name of the last MTA to have a connectivity
         failure"
       ::= {mADAlarmEntry 4}


mADAlarmNotifications OBJECT IDENTIFIER ::= { mADAlarmMIB 2 }

mADAlarm NOTIFICATION-TYPE
        OBJECTS {applOperStatus, applName, mtaGroupName,
                 mtaGroupOutboundConnectFailureReason, mtaGroupStoredVolume}
                 -- these OBJECTS are the things that an mADAlarm may convey
        STATUS current
        DESCRIPTION
           "An unsolicited notification signifying that a failure has
            occurred within the MTA or DSA."
::= {mADAlarmNotifications 1}

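messageAlarm NOTIFICATION-TYPE
        OBJECTS {lastMessageIdFailure, numMessagesFailed }
        STATUS current
        DESCRIPTION
           "An unsolicited notification signifying that a non-recoverable
            failure has occurred in processing a message."
::= {mADAlarmNotifications 2}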

mADAlarmConformance OBJECT IDENTIFIER ::= {mADAlarmMIB 3}

mADAlarmGroup  OBJECT IDENTIFIER ::= {mADAlarmConformance 1}
mADAlarmCompliances  OBJECT IDENTIFIER ::= {mADAlarmConformance 2}

mADAlarmTrapCompliance  MODULE-COMPLIANCE
   STATUS current
   DESCRIPTION
      "The most basic level of compliance for MAD SNMPv2 entities that
       implement MAD alarms."
   MODULE
      MANDATORY-GROUPS {mADAlarmTrapGroup}
   ::= {mADAlarmCompliances 1}

mADAlarmVariableCompliance  MODULE-COMPLIANCE
   STATUS current
   DESCRIPTION
      "The compliance statement for MAD SNMPv2 entities that implement
       MIB variables to support alarms for MTAs."
   MODULE
      MANDATORY-GROUPS {mADAlarmVariableGroup}
   ::= {mADAlarmCompliances 2}

mADAlarmTrapGroup  NOTIFICATION-GROUP
   NOTIFICATIONS {mADAlarm, messageAlarm}
   STATUS current
   DESCRIPTION
      "Two Traps providing the basic level of support for alarms for
       MTAs."
   ::= {mADAlarmGroup 1}

mADAlarmVariableGroup OBJECT-GROUP
   OBJECTS {lastMessageIdFailure, numMessagesFailed,
      lastFailureMtaGroupName, lastFailureMtaApplName}
   STATUS current
   DESCRIPTION
      "A collection of objects providing support for alarms for MTAs
       that includes some other alarm-specific MIB variables"
   ::= {mADAlarmGroup 2}


END


5. Scenarios

The following scenarios provide examples of how the mADAlarm and messageAlarm
are used in various fault conditions.


5.1 Connectivity Failure

When an MTA or DSA detects that another MTA or DSA cannot be contacted, an
mADAlarm is sent.  The mADAlarm contains the applName of the MTA reporting the
problem, the mtaGroupName for the MTA that cannot be contacted, and the
mtaGroupOutboundConnectFailureReason.  In the case of a more general
connectivity failure, such as the general unavailability of the network
element, the mADAlarm contains only the variable
mtaGroupOutboundConnectFailureReason.
Care should be taken to report these conditions only in the case of permanent
failure, since intermittent failures are more frequent and might result in too
many traps being generated.  For example, when an MTA cannot connect to another
MTA in order to deliver a message, the MTA delivering the message usually
retries the delivery attempt for a specified duration or for a specified number
of tries.  If the retry limit is exceeded, a case that should not occur, the
message is returned.  In this case, a trap would be sent when the retry limit
is exceeded, but would not be sent for each individual retry.
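
The retry behaviour described in this scenario can be sketched as
follows.  The function and parameter names are hypothetical, and the
callables stand in for whatever connection and trap-sending facilities
an implementation actually provides.

```python
def deliver_with_retries(attempt_connect, max_tries: int, send_trap) -> bool:
    """Try to connect up to max_tries times; trap once if every try fails.

    attempt_connect: callable returning True on a successful connection.
    send_trap: callable invoked with the trap name when retries run out.
    """
    for _ in range(max_tries):
        if attempt_connect():
            return True
    # Sent once, only after the retry limit is exceeded,
    # never for an individual failed attempt.
    send_trap("mADAlarm")
    return False


traps = []
delivered = deliver_with_retries(lambda: False, 3, traps.append)
print(delivered, traps)  # prints False ['mADAlarm']
```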


5.2 MTA or DSA Down

This condition signifies that the MTA or DSA is not operational (but should be)
or has not recently registered with the management system.  This condition is
reported with an mADAlarm containing the values of applOperStatus and applName
from the MADMAN Application Monitoring MIB.  Support for this feature is
optional, since an MTA or DSA that has crashed cannot report that fact to an
agent, and since off-the-shelf agents cannot be expected to monitor the
aliveness of applications by themselves.


5.3 Messaging Loop Detection

This condition may signify that a particular message has been detected,
received, and sent multiple times, perhaps exceeding a locally established
threshold value.  The condition is reported with a messageAlarm trap, where the
trap contains the applName of the MTA reporting the problem, and optionally the
values of lastMessageIdFailure, mtaLoopsDetected, and mtaGroupLoopsDetected.


5.4 Message Processing Failure

When an MTA encounters certain non-recoverable errors processing a message
(e.g., a "dead" message that cannot be delivered, nondelivered, or redirected),
a messageAlarm is generated.  The messageAlarm contains the applName of the MTA
reporting the failure, and optionally the lastMessageIdFailure, which
identifies the most recent message that failed, and numMessagesFailed, which
aids in detecting multiple message failures.  If other messages had failed
processing prior to the immediate condition being reported and after the most
recent polling cycle, the identities of these messages may be detected
manually.
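
A console could estimate how many failures occurred without a recorded
message ID by comparing numMessagesFailed at the last poll with the
value conveyed in the messageAlarm.  The sketch below is one way to do
this and is not prescribed by this memo; the function names are
hypothetical, and the modulus accounts for a single wrap of the
Counter32 value.

```python
COUNTER32_MODULUS = 2 ** 32


def failures_since(previous: int, current: int) -> int:
    """Failures between two Counter32 readings, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def unidentified_failures(previous: int, current: int) -> int:
    """Failures other than the one named by lastMessageIdFailure."""
    return max(failures_since(previous, current) - 1, 0)


# Counter read 10 at the last poll; the messageAlarm now reports 13.
missing = unidentified_failures(10, 13)
print(missing)  # prints 2
```

A result greater than zero indicates that other messages failed since
the last poll and that their identities would have to be found by
inspecting the message queues.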


5.5 Queue Error

When an MTA or agent detects that a queue is full or is approaching saturation,
an mADAlarm is sent.  The applName of the MTA reporting the problem is conveyed
within the variable bindings list of the mADAlarm.  The mADAlarm also contains
the values of the MIB variables mtaGroupName and mtaGroupStoredVolume (both
from the mtaGroupTable).


5.6 Security Error

When an MTA or agent detects a security error such as an authentication failure
(e.g. when an MTA or DSA fails to authenticate itself to another), an mADAlarm
is sent.  The applName of the MTA reporting the problem is conveyed within the
variable bindings list of the mADAlarm.  The mADAlarm also contains the values
of the MIB variables mtaGroupInboundRejectionReason (stating an authentication
failure) and the mtaGroupName.

When an MTA or agent detects a security error such as a data integrity
violation (e.g. while processing a message), a messageAlarm is sent.
The applName of the MTA reporting the problem is conveyed within the variable
bindings list of the messageAlarm.  The messageAlarm also contains the values
of the MIB variables mtaGroupInboundRejectionReason (stating an integrity
violation) and the mtaGroupName.


6.  Acknowledgements

This draft is the product of discussions and deliberations carried out
in the following groups:
        ietf-madman-wg  ietf-madman@innosoft.com

This draft also incorporates the intellectual contributions of

            Bruce Greenblatt
            Sue Lebeck
            Roger Mizumori
            Edward Owens


7.  References


   [1] Case, J., McCloghrie, K., Rose, M., and S. Waldbusser, "Structure
       of Management Information for version 2 of the Simple Network
       Management Protocol (SNMPv2)", RFC 1902, SNMP Research, Inc.,
       Hughes LAN Systems, Dover Beach Consulting, Inc., Carnegie Mellon
       University, February 1996.

   [2] McCloghrie, K., and M. Rose, Editors, "Management Information
       Base for Network Management of TCP/IP-based internets: MIB-II",
       STD 17, RFC 1213, Hughes LAN Systems, Performance Systems
       International, March 1991.

   [3] Case, J., McCloghrie, K., Rose, M., and S. Waldbusser, "Protocol
       Operations for version 2 of the Simple Network Management
       Protocol (SNMPv2)", RFC 1905, SNMP Research, Inc., Hughes LAN
       Systems, Dover Beach Consulting, Inc., Carnegie Mellon
       University, February 1996.

   [4] Freed, N., Kille, S., "Network Services Monitoring MIB",
       RFC 1565, Innosoft, ISODE Consortium, January 1994.

   [5] Freed, N., Kille, S., "Mail Monitoring MIB", RFC 1566,
       Innosoft, ISODE Consortium, January 1994.

   [6] Mansfield, G., Kille, S., "X.500 Directory Monitoring MIB",
       RFC 1567, AIC Systems Lab, ISODE Consortium, January 1994.


Security Considerations

   Security issues are not discussed in this memo.


Authors' Addresses

   Glenn Mansfield
   AIC Systems Laboratory
   6-6-3 Minami Yoshinari
   Aoba-ku, Sendai 989-32
   Japan

   Phone: +81-22-279-3310
   E-Mail: glenn@aic.co.jp


  Gordon B. Jones
  MITRE  Corporation
  1820 Dolley Madison Blvd.
  McLean, VA 22102-3481

  Phone: (703) 883-76701
  E-Mail: gbjones@mitre.org


  Niraj Jain
  Oracle Corporation
  500 Oracle Parkway
  Redwood Shores
  California 94065

  Phone: (415) 506-2581
  E-Mail: njain@us.oracle.com

