Internet Draft                Barbara Fraser
Network Working Group         SEI/CMU
Expires in six months         July 1995

Site Security Handbook for System and Network Administrators

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

1. INTRODUCTION

This document provides guidance to system and network administrators on how to address security issues within the Internet community. It builds on the foundation provided in RFC 1244 and is the collective work of a number of contributing authors. Those authors include: Philip J. Nesser, Klaus-Peter Kossakowski, Erik Gutmann, Nevil Brownlee, Jules P. Aronson, Edward P. Lewis

2. SECURITY POLICIES

2.1 What Is a Security Policy and Why Have One?

The security-related decisions you make, or fail to make, as system administrator largely determine how secure or insecure your system is, how much functionality your system offers, and how easy your system is to use.
However, you cannot make good decisions about security without first determining what your security goals are. And until you determine what your security goals are, you cannot make effective use of any collection of security tools, because you simply won't know what to check for, or what restrictions to impose.

For instance, your goals will probably be very different from the goals of a hardware or software vendor who ships your system software set up in a default state that allows maximum access (e.g., all services turned on), but provides minimal security.

Your goals will be largely determined by the following key tradeoffs:

(1) services offered (functionality) vs. security
(2) ease of use vs. security
(3) cost of security vs. risk of loss

Your goals are communicated to all users, operations staff, and managers through a set of security rules, called a "computer security policy".

2.1.1 Definition of a Computer Security Policy

A computer security policy is a formal specification of the rules by which people are given access to an organization's technology and information assets.

2.1.2 Purposes of a Computer Security Policy

The main purpose of a computer security policy is to inform users, staff and managers of their obligatory requirements for protecting technology and information assets through a consistent approach to policy matters.

Another purpose is to provide a baseline from which to acquire, configure and audit computer systems for compliance with the policy. Therefore an attempt to use a set of security tools in the absence of at least an implied security policy is meaningless.

2.2 What Makes a Good Computer Security Policy?

The characteristics of a good security policy are:

(1) Able to be implemented (e.g., through system administration procedures, and by publishing of acceptable use guidelines, etc.).
(2) Can be enforced (e.g., via security tools where appropriate, but also from a human perspective, via sanctions).
(3) Clearly defines the areas of responsibility.

The components of a good security policy include:

(1) Computer Technology Purchasing Guidelines - Specifies required or preferred security features to supplement existing purchasing policy and guidelines.

(2) Privacy Policy - Defines reasonable expectations of privacy regarding such issues as monitoring of electronic mail and logging of keystrokes.

(3) Access Policy - Defines access rights and privileges, and protects assets from loss or disclosure by specifying acceptable use guidelines for users, operations staff, and management. Provides guidelines for external connections, data communications, connecting devices to a network, and adding new software to a system. Also provides notification guidelines (e.g., requires a login warning banner stating that the system is for authorized users only, instead of displaying a welcome banner).

(4) Accountability Policy - Defines the responsibilities of users, management, and operations staff. Specifies an audit capability, and provides incident handling guidelines.

(5) Authentication Policy - Establishes trust through an effective password policy, and by setting guidelines for remote location authentication, and the use of authentication devices.

(6) Availability - Defines availability expectations. Addresses the redundancy of security functions. Specifies recovery procedures and mechanisms.

(7) Violations Reporting - Denotes which types of violation must be reported, and provides a central reporting point. A non-threatening atmosphere and the possibility of anonymous reporting make it more likely that a violation, once detected, will be reported.

(8) Supporting Information - Provides users, staff, and management with contact information for each type of policy violation. Gives guidelines to your public relations office on how to handle outside queries about a security incident.
Provides cross-references to security procedures and related information, such as company policies and regulatory requirements (federal, state, and local).

Once your computer security policy has been established it should be clearly communicated to users, staff, and management. Having all personnel sign a statement indicating that they have read, understood, and agree to abide by the policy is an important part of the process. Finally, your policy should be reviewed on a regular basis to see if it is successfully supporting your security needs.

3. SECURITY PROCEDURES

3.1 Authentication

3.1.1 One-Time passwords

Given today's networked environments, it is recommended that sites concerned about the security and integrity of their systems and networks consider moving away from standard, reusable passwords. There have been many incidents involving Trojan network programs (e.g., telnet and rlogin) and network packet sniffing programs. These programs capture clear-text hostname, account name, password triplets. Intruders can use the captured information for subsequent access to those hosts and accounts. This is possible because 1) the password is used over and over (hence the term "reusable"), and 2) the password passes across the network in clear text.

Several authentication techniques have been developed that address this problem. Among these techniques are challenge-response technologies that provide passwords that are only used once (commonly called one-time passwords). This document provides a list of sources for products that provide this capability. The decision to use a product is the responsibility of each organization, and each organization should perform its own evaluation and selection.

3.1.2 Kerberos

3.1.3 Password Assurance

While the need to eliminate the use of standard, reusable passwords cannot be overstated, it is recognized that some organizations may have to transition to the use of this technology.
Given that situation, we have included the following sections to help with the selection and maintenance of traditional passwords. If you have to use them, here is some pertinent advice.

3.1.3.1 The importance of robust passwords

In almost all cases of system penetration, the intruder needs to gain access to an interactive user account on the system. One way that goal is typically accomplished is through guessing the password of a legitimate user. The intruder may attempt to guess a password by knowing something about the legitimate user, or (more commonly) by trying dictionary words and other simple guesses. This threat is increased by the easy availability of the password file, since an intruder may attempt to guess the encrypted passwords in that file by using another (perhaps faster and more powerful) system to run automated "cracking" tools on the stolen password file.

To guard against this threat, the simplest and most straightforward approach is to ensure that the passwords on all accounts are not easily guessed. Once the system is guarded against this threat, the intruder would have to devote an inordinately large amount of resources to try to crack your passwords. It is more likely that the intruder would try to find another method of penetration, or would give up and try another system.

3.1.3.2 Restricting access to the password file

The first step to password assurance is to limit access to the encrypted portion of each user's password. If the encrypted portion is unavailable to an attacker, guessing passwords cannot be done off-line at a remote (and possibly more powerful) location.

The typical method used to protect the encrypted password is to use a shadow password file scheme. This requires the use of two distinct password files. The first contains the standard password file information (username, full-name, login directory, and shell) but contains a dummy or false encrypted password field.
The real encrypted password is kept in a second password file that is available only to the root user or system. In this case, standard system utilities (such as FTP) and users without elevated privileges cannot gain access to this information. It is important to note that use of a shadow password file does not remove the need for robust password selection on the part of the user, but is one additional protection against penetration.

3.1.3.3 Initial password selection

One problem often found in systems that have been penetrated is that a user's initial password, set by the system administrator, is easily guessed by an intruder. While the setting of this password is usually accompanied by the request that the password be changed as soon as the account is used, many users lack the knowledge or motivation to change it. Also, by giving a user a simple password, the system administrator sets up an expectation on the part of the user that simple passwords are acceptable. Therefore, a naive user is unlikely to select a more robust password, since such passwords tend to be harder to remember.

Because of this threat, the system administrator should follow some sensible guidelines (such as the criteria shown below) for selecting "good" initial passwords for the users of the system, and for selecting a good root password as well. The system administrator should also provide users with guidance on choosing their own robust passwords.

3.1.3.4 Password selection criteria

Many users give little thought to what makes a "good" password, and would be surprised at just how easy some passwords are for intruders to guess. Unfortunately, the increasing power of computing is making password "cracking" far easier than ever before. It is now fairly trivial to launch a brute-force attack on any password of four characters or fewer. Also, the increasing availability of disk space is allowing intruders access to complete dictionaries of not only English, but many foreign languages as well.
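As a rough illustration of why very short passwords fall to brute force, the following Python sketch counts how many candidates an attacker must try for a given maximum length. The alphabet size (95 printable ASCII characters) and the guess rate used in the usage note are assumptions chosen for illustration only; they do not come from this document, and real guess rates vary widely with hardware and with the password hashing scheme.

```python
PRINTABLE_ASCII = 95  # assumption: passwords are drawn from printable ASCII


def keyspace(max_length, alphabet_size=PRINTABLE_ASCII):
    """Count every candidate password of length 1 through max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def worst_case_seconds(max_length, guesses_per_second,
                       alphabet_size=PRINTABLE_ASCII):
    """Time to exhaust the whole keyspace at a fixed (assumed) guess rate."""
    return keyspace(max_length, alphabet_size) / guesses_per_second


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical guesses per second
    for length in (4, 6, 8):
        print(length, keyspace(length), worst_case_seconds(length, rate))
```

Each added character multiplies the search space by the alphabet size, so under these assumptions the four-character keyspace (about 82 million candidates) is exhausted in under a minute and a half, while the eight-character keyspace is roughly 95^4 times larger. This is the reasoning behind a minimum-length rule in a password policy.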
Some password selection criteria that can reasonably be included in a password policy are:

o The minimum password length should be 7-8 characters.

o The password should not be a permutation of any user account or personal information such as name, office, telephone number, or social security number.

o The password should not be contained in any English or foreign dictionary in regular use. This includes real/imaginary character names or terms from folklore.

o The password should not consist of acronyms or project-specific information relating to the user's job or function.

While these criteria may seem overly restrictive, there are many schemes that will generate hard-to-guess passwords that are still easy to remember. One example is to take a well-known phrase and use the first or second letter of each word to form your password. Another is to use two short words separated by a control or punctuation character.

3.1.3.5 Password aging

When and how to expire passwords is still a subject of controversy among the security community. It is generally accepted that a password should not be maintained once an account is no longer in use, but it is hotly debated whether a user should be forced to change a good password that's in active use. The arguments for changing passwords relate to the prevention of the continued use of penetrated accounts. However, the opposition claims that frequent password changes lead to users writing down their passwords in visible areas (such as pasting them to a terminal), or to users selecting very simple passwords that are easy to guess.

While there is no definitive answer to this dilemma, a password policy should directly address the issue and provide guidelines for how often a user should change the password. It is recommended that passwords be changed whenever root is penetrated, when there is a critical change in personnel (especially if it is the system administrator!), or when an account has been compromised.
In particular, if the root password is compromised, all passwords on the system should be changed. In addition, an annual change in their password is usually not difficult for most users, and you should consider requiring it.

3.1.3.6 Machine generated passwords

An alternative to allowing users the opportunity to select easily guessed passwords is to require machine generated passwords. These have historically been a problem due to the difficulty of remembering such passwords. As a result, they were often written down, and therefore more easily stolen. Another alternative is to present a list of several machine generated passwords and allow the user to select the one that's easiest for him or her to remember.

It is recommended that if machine generated passwords are mandated by your policy, then users should not be required to change passwords very frequently. (For example, not more than once every six months.)

3.1.3.8 Multiple accounts

It is now commonplace for a user to have accounts on many different systems. One problem is that if the same password is used on all of these systems, the penetration of one system will easily spread to the others. It is therefore recommended that a user should have different passwords on each system. However, if a user plans to use the same password on more than one system, these systems should all be in the same administrative domain, and should all be running the same operating system.

3.1.3.9 Tools for password assurance

There are tools to aid a security administrator in assuring robust passwords. These tools fall generally into two categories: "password cracking" and "password filtering" programs.

Password cracking programs take the encrypted password field and attempt to perform the same task an intruder would try -- guessing the password from dictionaries and common words.
This is a lengthy process for any but the simplest of schemes, since each guess has to be encrypted before it can be compared against the encrypted password entry. However, the intruder may have access to better dictionaries and more powerful systems than the security administrator. Nonetheless, password cracking programs provide a mechanism for assuring that no user has a password that is contained in your own dictionaries.

Examples of tools that provide password cracking are "crack" and "cops". Crack is a tool that is completely devoted to password cracking and provides a mechanism for variations on dictionary words, and also provides the ability to distribute the processing over a network of computers to take advantage of parallelism. Cops is a more general security tool that has password cracking as one of its functions. It is not as complete as crack, but serves as a first level of protection for a system.

Password filtering programs are replacements for the system utility that is invoked to change a user's password. In this scheme, the user submits a plain-text password to the tool, which checks the password against some criteria before changing the encrypted password on the system. This scheme has the advantage that more tests can be run against the plain-text password (such as a check for recurring patterns or reverse words) that would be very expensive to check for in a password cracking program. If such a tool is installed, it should completely replace the password changing and setting mechanism so that no user can bypass the program and change the password to a less robust string. An example of a tool that provides password filtering is "npasswd".

3.2 Authorization

Authorization refers to the process of granting privileges to processes and ultimately users. This differs from authentication in that authentication is what occurs to identify a user.
Once identified (reliably), the privileges, rights, property, and permissible actions of the user are determined by authorization.

Explicitly listing the authorized activities of each user (and user process) with respect to all resources (objects) is impossible in a reasonable system. In a real system certain techniques are used to simplify the process of granting and checking authorization(s).

One approach, popularized in UNIX systems, is to assign to each object three classes of user - the super user, the owner, and the group. Super user, or root, is an entity that has access to all portions (and objects) of the computer. The owner of an object is the "user" who either created the object or was given it by the super user. A group is a collection of users that share privileges over a collection of objects. Groups ease authorization management by simplifying the process of changing the authorization of users and by changing the authority of a group to manage an object.

Another approach is to attach to an object a list which explicitly contains the identity of all permitted users (or groups). This is an Access Control List. The advantage of these lists is that they are easily maintained (one central list per object).

3.3 Access

3.4 Modems

3.4.1 Modem lines must be managed.

Although they provide convenient access to a site for its users, they can also provide an effective detour around the site's firewalls. For this reason it is essential to maintain proper control of modems.

Don't allow users to install a modem line without proper authorization. This includes temporary installations, e.g., plugging a modem into a facsimile or telephone line overnight.

Maintain a register of all your modem lines. Conduct regular site checks for unauthorized modems; keep your register up to date.

3.4.2 Dial-in users must be authenticated.

A username and password check should be completed before a user can access anything on your network.
Normal password security considerations (such as choosing passwords which don't appear in dictionaries and changing them from time to time) are particularly important.

Remember that telephone lines can be tapped, and that it is quite easy to intercept messages to cellular 'phones. Modern high-speed modems use more sophisticated modulation techniques, which makes them somewhat more difficult to monitor, but it is prudent to assume that hackers know how to eavesdrop on your lines. For this reason you should use one-shot passwords (e.g., skey) or hardware authentication devices (e.g., SecurID) if this is at all possible.

It is helpful to have a single dial-in point, e.g., a single large modem pool, so that all users are authenticated in the same way.

Users will occasionally mis-type a password. Set a short delay - say two seconds - after the first and second failed logins, and force a disconnect after the third. This will slow down automated password attacks. Don't tell the user whether the username, the password or both were incorrect.

3.4.3 All logins (successful and unsuccessful) should be logged.

Don't keep correct passwords in the log, but consider keeping incorrect passwords to aid in detecting password attacks. However, bear in mind that most incorrect passwords are correct passwords with one character mistyped, and may suggest the real password. If you can't keep this information secure, don't log it at all.

If Calling Line Identification is available, take advantage of it by recording the calling number for each login attempt. Be sensitive to the privacy issues raised by Calling Line Identification. Also be aware that Calling Line Identification is not to be trusted; use the data for informational purposes only, not for authentication.

3.4.5 Minimize the amount of information given in your opening banner.

In particular, don't announce the type of host hardware or operating system - this encourages specialist hackers.
Display a short banner, but don't offer an 'inviting' name (e.g., University of XYZ, Student Records System). Instead, give your site name, a short warning that all sessions are monitored, and a username/password prompt. Get your site's lawyers to check your banner to make sure it states your legal position correctly.

For high-security applications, consider using a 'blind' password, i.e., give no response to an incoming call until the user has typed in (without any echoing) a password. This effectively simulates a dead modem.

3.4.6 Call-back Capability

Some dial-in servers offer call-back facilities, i.e., the user dials in and is authenticated, then the system disconnects the call and calls back on a specified number. You will probably have to pay the charges for such calls.

This feature should be used with caution; it can easily be bypassed. As a minimum, make sure that the return call is never made from the same modem as the incoming one. Overall, although call-back can improve modem security, you should not depend on it alone.

3.4.7 Dial-out authentication

Dial-out users should also be authenticated, particularly since your site will have to pay their telephone charges.

Never allow dial-out from an unauthenticated dial-in call, and consider whether you will allow it from an authenticated one. The goal here is to prevent callers from using your modem pool as part of a chain of logins. This can be hard to detect, particularly if a hacker sets up a path through several hosts on your site.

As a minimum, don't allow the same modems and phone lines to be used for both dial-in and dial-out. This can be implemented easily if you run separate dial-in and dial-out modem pools.

3.4.8 Make your modem programming as 'bullet-proof' as possible.

Be sure modems can't be reprogrammed while they're in service. As a minimum, make sure that three plus signs won't put your dial-in modems into command mode!
Program your modems to reset to your standard configuration at the start of each new call. Failing this, make them reset at the end of each call. This precaution will protect you against accidental reprogramming of your modems.

Check that your modems terminate calls cleanly. When a user logs out from an access server, verify that the server hangs up the 'phone line properly. It is equally important that the server forces logouts from whatever sessions were active if the user hangs up unexpectedly.

3.5 Cryptography

3.6 Auditing

This section covers the procedures for collecting data generated by network activity that may be useful in analyzing the security of a network and/or useful in responding to a security incident. This section also covers the handling, preservation, and utilization of the data. (This will be reworked as I develop the remainder.)

3.6.1 What to collect

Audit data should include any attempt to achieve a different security level of any person, process, or other entity in the network. The most obvious example in this area is a log of attempts to login to a host computer. Audit data should also include data pertaining to any "public" or anonymous access and retrieval of data, at least to the granularity of the "remote" host. And on and on...

3.6.2 Collection Process

The collection process should be enacted by the host or resource being accessed. Depending on the importance of the data and the need to have it local in instances in which services are being denied, data could be kept local to the resource until needed or be transmitted to storage after each event.

Reporting data can be done by writing to a file, writing to a line printer, writing over a network, or writing over a non-network port, such as a console port. Each of these has importance.

File system logging is the least resource intensive of all four candidates (for a given audit log). It is also the least reliable. If a resource has been compromised, the file system is the first to go.
If the network in front of the resource has become unusable, the data is inaccessible, unless a direct console port is available.

Line printer logging is useful in systems where permanent and immediate logs are required. A real time system is an example of this, where the exact point of a failure or attack must be recorded. A laser printer, or other device which buffers data between the auditing system and storage device, may suffer from lost data if buffers contain the needed data at a critical instant.

Reporting over the network allows a remote host to store data in a more permanent and possibly more reliable manner. However, this consumes bandwidth (at the minimum), exposes the audit data in an easy package to an interloper, and could be lost during network denial.

Reporting over a console port ensures the delivery of the data follows the hardware design. The limitations are that the console port requires physical security and using console ports on machines more than a short distance away (e.g., across a campus) may require a phone line in addition to the network connection. In some instances, this is one more resource that may be constrained.

3.6.3 Collection Load

Collecting running data may result in a quick accumulation of bytes. Storage of this must be considered in advance. There are a few ways to limit the required storage space. Data may be compressed using one of many methods. Another approach is to only archive summaries of activity (possibly losing some detail in the process). Data may be archived for just a fixed period of time, after which it is permanently removed.

The issue of archiving security data differs from archiving network management and application data. Network management data (statistics) can be reduced by altering the reporting period. Security data does not have that option (for the most part). Security data also does not have the permanence of application data.
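Two of the storage-limiting measures described in 3.6.3, compression and fixed-period retention, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended implementation: the function name compress_and_expire, the one-day compression threshold, and the 90-day retention period are all hypothetical, and the appropriate retention period for a site is a policy (and possibly legal) decision.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def compress_and_expire(log_dir, compress_after_days=1, retain_days=90, now=None):
    """Gzip audit logs older than compress_after_days and delete files
    older than retain_days. Both thresholds are illustrative assumptions.
    Returns (compressed, removed) lists of file names."""
    now = time.time() if now is None else now
    compressed, removed = [], []
    # sorted() snapshots the listing, so archives created below are not revisited.
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / SECONDS_PER_DAY
        if age_days > retain_days:
            # Past the retention period: remove permanently.
            path.unlink()
            removed.append(path.name)
        elif path.suffix != ".gz" and age_days > compress_after_days:
            archive = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(archive, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamp so retention counts from the log's age.
            os.utime(archive, (stat.st_atime, stat.st_mtime))
            path.unlink()
            compressed.append(archive.name)
    return compressed, removed
```

A job like this would typically run on the host that stores the audit data. Because deleted data cannot be recovered, the retention period should be settled before any such job is enabled.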
3.6.4 Handling the Data/Preservation

Security data should be protected at least as much as any other data is protected because, from it, much can be inferred. Security data may give away enough secrets to allow a "masquerader" to impersonate an authorized administrator.

Data may also become key to the investigation, apprehension, or prosecution of the perpetrator of an incident. Because of this, the data needs to be protected and clearly documented. For this reason it is advisable to seek the advice of local legal counsel or law enforcement when deciding how security data is to be treated. This should happen before an incident occurs.

If a data handling plan is not cleared prior to an incident, this does not mean that it is useless. It means two things. You may not have recourse in the aftermath of an event. You may also be liable for penalties resulting from your treatment of the data.

3.6.5 Audit Data Precautions

In certain instances, audit data may contain personal information. Searching through the data, even for just a routine check of the network's security, could present an invasion of privacy or make the auditing entity privy to information it should not be allowed to have. Note that this is not automatically true - not all data is "sensitive" and "sensitive" differs by locale.

Another danger presented by auditing data is that it may reveal a pattern of incidents. If an organization knows about the incidents but permits them to continue and this results in damage to another organization ("downstream"), legal action could result. An organization may also be liable for not (making a "best effort" to) analyze this data for incidents.

3.7 Backups

4. ARCHITECTURE

4.1 Objectives

4.2 Service Configurations

4.3 Network Configurations

Topology
Infrastructure elements (DNS, mail hub, information servers, etc.)
Network management

4.4 Firewalls

5. INCIDENT HANDLING

This section of the document will supply guidance to be applied before, during and after a computer security incident is in progress on a machine, network, site, or multi-site environment. The operative philosophy in the event of a breach of computer security is to react according to a plan. This is true whether the breach is the result of an external intruder attack, unintentional damage, a student testing some new program to exploit a software vulnerability, or a disgruntled employee. Each of the possible types of events described above should be considered for an adequate contingency plan. Without a proactive approach to protecting the assets in case of an incident, the handling process cannot be as efficient as it is with well-prepared procedures, methods and policies in place.

Traditional computer security, while quite important in the overall site security plan, usually focuses heavily on protecting systems from attack, and perhaps monitoring systems to detect attacks. Little attention is usually paid to how to actually handle the attack when it occurs. The result is that when an attack is in progress, many decisions are made in haste and can be damaging to tracking down the source of the incident, collecting evidence to be used in prosecution efforts, preparing for the recovery of the system, and protecting the valuable data contained on the system.

One of the most important but often overlooked benefits of efficient incident handling is an economic one. Having both technical and managerial personnel respond to an incident requires considerable resources, resources which could be utilized more profitably if an incident did not require their services. If these personnel are trained to handle an incident efficiently, less of their time is required to deal with that incident.

Because of the worldwide network, most incidents are not restricted to a single site.
Operating system vulnerabilities apply (in some cases) to several million systems, and many vulnerabilities are exploited within the network itself. Therefore it is vital for all sites that all involved parties are informed as soon as possible.

Another benefit is related to public relations. News about computer security incidents tends to be damaging to an organization's stature among current or potential clients. Efficient incident handling minimizes the potential for negative exposure.

A final benefit of efficient incident handling is related to legal issues. It is possible that in the near future organizations may be sued because one of their nodes was used to launch a network attack. In a similar vein, people who develop patches or workarounds may be sued if the patches or workarounds are ineffective, resulting in damage to systems, or if the patches or workarounds themselves damage systems. Knowing about operating system vulnerabilities and patterns of attacks and then taking appropriate measures is critical to circumventing possible legal problems.

This chapter is arranged such that a list of relevant topics may be generated from the Table of Contents to provide a starting point for creating a policy for handling ongoing incidents. The main points to be included in a policy for handling incidents are:

o Preparing and planning (what are the goals and objectives in handling an incident).
o Notification (who should be contacted in case of an incident).
o Evaluation (how serious is the incident).
o Handling (what should be done when an incident occurs). This especially includes:
- Notification (who should be notified about the incident).
- Containment (how can the damage be limited).
- Eradication (eliminate the reasons for the incident).
- Recovery (how to reestablish service and systems).
- Follow Up (what actions should be taken after the incident).
- Legal/Investigative implications (what are the legal and prosecutorial implications of the incident).
      -  Documentation Logs (what records should be kept from before, during, and after the incident).
   o  Aftermath (overall implications of past incidents).
   o  Responsibilities (for planning and handling an incident).

Each of these points is important in an overall plan for handling incidents. The remainder of this chapter will detail the issues involved in each of the relevant topics, and provide some guidance as to what should be included in a site policy for handling incidents.

Guidelines for End User involvement in dealing with compromised accounts and vulnerabilities are covered in the corresponding "End User Security Handbook" [RFC xxx]. Of particular interest to site administrators who act as the Site Security Contact in assisting other users and administrators in dealing with incidents are the "Guidelines and Recommendations for Incident Processing" [RFC xxx].

5.1 Preparing and Planning for Incident Handling

Part of handling an incident is being prepared to respond before the incident occurs. This includes establishing a suitable level of protection, as explained in the preceding chapters. Not only does this protection help avoid incidents, but if an incident becomes severe, it also limits the damage that can occur. Protection includes preparing incident handling guidelines as part of a contingency plan for your organization or site. Having written plans eliminates much of the ambiguity which occurs during an incident, and will lead to a more appropriate and thorough set of responses. As explained for the site specific contingency plan in section xxx, it is vitally important to test the proposed plan through 'dry runs' before an incident actually occurs.

Once a site has recovered from an incident, site policy and procedures should be reviewed to encompass changes to prevent similar incidents. If an incident stems from poor policy and the policy is not changed, the site is doomed to repeat the past.
Even without an incident, it would be prudent to review policies and procedures on a regular basis. Reviews are imperative due to today's changing computing environments. To improve this process, a problem reporting procedure should be implemented to describe, in detail, the incident and the solutions to the incident. Each incident should be reviewed by the site security subgroup to allow understanding of the incident, with possible suggestions for changes to the site policy and procedures.

Learning to respond efficiently to an incident is important for numerous reasons:

   o  protect the assets that normal security measures are meant to protect, even in a worst-case event
   o  protect your resources, which could be utilized more profitably if an incident did not require their services
   o  ensure that (government) regulations are complied with
   o  prevent use of your systems against other systems (which could incur legal liability)
   o  minimize the potential for negative exposure

As in any set of pre-planned procedures, attention must be placed on a set of goals for handling an incident. These goals will be prioritized differently depending on the site. The set of goals will be closely related to the goals for security in general. Therefore, the same guidelines as in section xxx for security in general might be applied here. A specific set of objectives can be identified for dealing with incidents:

   o  Figure out how it happened.
   o  Find out how to avoid further exploitations.
   o  Avoid escalation and further incidents.
   o  Recover from the incident.
   o  Find out who did it.

Depending on the nature of the incident, there might be a conflict between analyzing the original source of a problem and restoring systems and services. In this case, overall goals (like assuring the integrity of (life) critical systems) might be the reason for not analyzing an incident. Of course this is an important management decision, but all involved parties must be aware that without an analysis the same incident may happen again.
It is important to prioritize actions to be taken during an incident well in advance of the time an incident occurs. Sometimes an incident may be so complex that it is impossible to do everything at once to respond to it; priorities are essential. Although priorities will vary from institution to institution, the following suggested priorities may serve as a starting point for defining an organization's response:

   o  Priority one -- protect human life and people's safety; human life always has precedence over all other considerations.
   o  Priority two -- protect classified and/or sensitive data. Prevent exploitation of classified and/or sensitive systems, networks or sites. Inform affected classified and/or sensitive systems, networks or sites about penetrations that have already occurred. (Be aware of regulations imposed by your site or by government.)
   o  Priority three -- protect other data, including proprietary, scientific, managerial and other data, because loss of data is costly in terms of resources. Prevent exploitation of other systems, networks or sites, and inform already affected systems, networks or sites about successful penetrations.
   o  Priority four -- prevent damage to systems (e.g., loss or alteration of system files, damage to disk drives, etc.); damage to systems can result in costly down time and recovery.
   o  Priority five -- minimize disruption of computing resources; it is better in many cases to shut a system down or disconnect from a network than to risk damage to data or systems.

An important implication for defining priorities is that once human life and national security considerations have been addressed, it is generally more important to save data than system software and hardware. Although it is undesirable to have any damage or loss during an incident, systems can be replaced; the loss or compromise of data (especially classified data), however, is usually not an acceptable outcome under any circumstances.
Another important concern is the effect on others, beyond the systems and networks where the incident occurs. In addition to any government regulations, it is always important to inform affected parties as soon as possible. Because of the legal implications, this topic should be addressed in the plans to avoid further delays and uncertainty for the administrators.

Any plan for responding to security incidents should be guided by local policies and regulations. Government and private sites that deal with classified material have specific rules that they must follow.

The policies your site adopts about how it reacts to incidents will shape your response. For example, it may make little sense to create mechanisms to monitor and trace intruders if your site does not plan to take action against the intruders if they are caught. Other organizations may have policies that affect your plans. Telephone companies often release information about telephone traces only to law enforcement agencies.

Note also that if any legal action is planned, there are specific guidelines that must be followed to make sure that any information collected can be used as evidence.

5.2 Notification and Points of Contact

It is important to establish contacts with various personnel before a real incident occurs. These contacts are either local, related to the network, or investigative agencies. Working with these contacts appropriately will help to make your incident handling process more efficient.

   o  Point of Contact (POC) people (Technical, Administrative, Response Teams, Investigative, Legal, Vendors, Service providers), and which POCs are visible to whom.
   o  Wider community (users).
   o  Other sites that might be affected.
   o  The public (press).

These issues are especially important for the local person responsible for handling the incident, since that is the person responsible for the actual notification of others.
A list of contacts in each of these categories is an important time saver for this person during an incident. It can be quite difficult to find an appropriate person during an incident when many urgent events are ongoing. Including relevant telephone numbers (as well as electronic mail addresses and fax numbers) in the site security policy is strongly recommended.

5.2.1 Local Managers and Personnel

When an incident is under way, a major issue is deciding who is in charge of coordinating the activity of the multitude of players. A major mistake that can be made is to have a number of "points of contact" (POC) that are not pulling their efforts together. This will only add to the confusion of the event, and will probably lead to wasted or ineffective effort.

The single point of contact may or may not be the person "in charge" of the incident. There are two distinct roles to fill when deciding who shall be the point of contact and who shall be the person in charge of the incident. The person in charge will make decisions as to the interpretation of policy applied to the event. The responsibility for the handling of the event falls onto this person. In contrast, the point of contact must coordinate the effort of all the parties involved with handling the event.

The point of contact must be a person with the technical expertise to successfully coordinate the effort of the system managers and users involved in monitoring and reacting to the attack. Often the management structure of a site is such that the administrator of a set of resources is not a technically competent person with regard to handling the details of the operations of the computers, but is ultimately responsible for the use of these resources.

Another important function of the POC is to maintain contact with law enforcement and other external agencies to assure that multi-agency involvement occurs. (In the U.S., the FBI, CIA, DoD, U.S. Army, or others might be concerned.)
Finally, if legal action in the form of prosecution is involved, the POC may be able to speak for the site in court. The alternative is to have multiple witnesses that will be hard to coordinate in a legal sense, and will weaken any case against the attackers. A single POC may also be the single person in charge of evidence collected, which will keep the number of people accounting for evidence to a minimum. As a rule of thumb, the more people that touch a potential piece of evidence, the greater the possibility that it will be inadmissible in court.

It is very important to prepare a method of notification, so that you will know whom to call and how to contact them. For example, every member of the Department of Energy's CIAC Team carries a card with every other team member's work and home phone numbers, as well as pager numbers.

   +  responsibilities are distributed over the whole site; this has strong implications for coordinating the incident process
   +  in multi-site incidents this gets even worse
   +  the POC might be changed during an incident because of conflicting priorities/responsibilities

5.2.2 Law Enforcement and Investigative Agencies

In the event of an incident it is important to establish contact with investigative agencies such as the FBI and Secret Service as soon as possible, for several reasons. Local law enforcement and local security offices or campus police organizations should also be informed when appropriate. A primary reason is that once a major attack is in progress, there is little time to call various personnel in these agencies to determine exactly who the correct point of contact is. Another reason is that it is important to cooperate with these agencies in a manner that will foster a good working relationship, and that will be in accordance with the working procedures of these agencies. Knowing the working procedures in advance and the expectations of your point of contact is a big step in this direction.
For example, it is important to gather evidence that will be admissible in a court of law. If you don't know in advance how to gather admissible evidence, your efforts to collect evidence during an incident are likely to be of no value to the investigative agency with which you deal. A final reason for establishing contacts as soon as possible is that it is impossible to know the particular agency that will assume jurisdiction in any given incident. Making contacts and finding the proper channels early will make responding to an incident go considerably more smoothly.

If your organization or site has a legal counsel, you need to notify this office soon after you learn that an incident is in progress. At a minimum, your legal counsel needs to be involved to protect the legal and financial interests of your site or organization. There are many legal and practical issues, a few of which are:

   (1) Whether your site or organization is willing to risk negative publicity or exposure to cooperate with legal prosecution efforts.
   (2) Downstream liability--if you leave a compromised system as is so it can be monitored and another computer is damaged because the attack originated from your system, your site or organization may be liable for damages incurred.
   (3) Distribution of information--if your site or organization distributes information about an attack in which another site or organization may be involved, or about a vulnerability in a product that may affect the ability to market that product, your site or organization may again be liable for any damages (including damage to reputation).
   (4) Liabilities due to monitoring--your site or organization may be sued if users at your site or elsewhere discover that your site is monitoring account activity without informing users.

Unfortunately, there are no clear precedents yet on the liabilities or responsibilities of organizations involved in a security incident or who might be involved in supporting an investigative effort.
Investigators will often encourage organizations to help trace and monitor intruders -- indeed, most investigators cannot pursue computer intrusions without extensive support from the organizations involved. However, investigators cannot provide protection from liability claims, and these kinds of efforts may drag out for months and may take lots of effort.

On the other side, an organization's legal counsel may advise extreme caution and suggest that tracing activities be halted and an intruder shut out of the system. This in itself may not provide protection from liability, and may prevent investigators from identifying anyone.

The balance between supporting investigative activity and limiting liability is tricky; you'll need to consider the advice of your counsel and the damage the intruder is causing (if any) in making your decision about what to do during any particular incident.

Your legal counsel should also be involved in any decision to contact investigative agencies when an incident occurs at your site. The decision to coordinate efforts with investigative agencies is most properly that of your site or organization. Involving your legal counsel will also foster the multi-level coordination between your site and the particular investigative agency involved, which in turn results in an efficient division of labor. Another result is that you are likely to obtain guidance that will help you avoid future legal mistakes.

Finally, your legal counsel should evaluate your site's written procedures for responding to incidents. It is essential to obtain a "clean bill of health" from a legal perspective before you actually carry out these procedures.

One of the most important considerations in dealing with investigative agencies is verifying that the person who calls asking for information is a legitimate representative from the agency in question.
Unfortunately, many well-intentioned people have unknowingly leaked sensitive information about incidents, allowed unauthorized people into their systems, etc., because a caller has masqueraded as a representative of a government agency (e.g., the FBI or Secret Service in the US).

A similar consideration is using a secure means of communication. Because many network attackers can easily reroute electronic mail, avoid using electronic mail to communicate with other agencies (as well as others dealing with the incident at hand). Non-secured phone lines (e.g., the phones normally used in the business world) are also frequent targets for tapping by network intruders, so be careful!

There is no established set of rules for responding to an incident when the local government becomes involved. Normally, except by court order, no agency can force you to monitor, to disconnect from the network, to avoid telephone contact with the suspected attackers, etc. As discussed before, you should discuss the matter with your legal counsel, especially before taking an action that your organization has never taken. The particular agency involved may ask you to leave an attacked machine on and to monitor activity on this machine, for example. Complying with this request will ensure continued cooperation of the agency--usually the best route towards finding the source of the network attacks and, ultimately, terminating these attacks. Additionally, you may need some information or a favor from the agency involved in the incident. You are likely to get what you need only if you have been cooperative. Of particular importance is avoiding unnecessary or unauthorized disclosure of information about the incident, including any information furnished by the agency involved. The trust between your site and the agency hinges upon your ability to avoid compromising the case the agency will build; keeping "tight lipped" is imperative.
Sometimes your needs and the needs of an investigative agency will differ. Your site may want to get back to normal business by closing an attack route, but the investigative agency may want you to keep this route open. Similarly, your site may want to close a compromised system down to avoid the possibility of negative publicity, but again the investigative agency may want you to continue monitoring. When there is such a conflict, there may be a complex set of tradeoffs (e.g., interests of your site's management, amount of resources you can devote to the problem, jurisdictional boundaries, etc.). An important guiding principle is related to what might be called "Internet citizenship" [22, IAB89, 23 (xxx old references)] and its responsibilities. Your site can shut a system down, and this will relieve you of the stress, resource demands, and danger of negative exposure. The attacker, however, is likely to simply move on to another system, temporarily leaving others blind to the attacker's intention and actions until another path of attack can be detected. Provided that there is no damage to your systems and others, the most responsible course of action is to cooperate with the participating agency by leaving your compromised system on. This will allow monitoring (and, ultimately, the possibility of terminating the source of the threat to systems just like yours). On the other hand, if there is damage to computers illegally accessed through your system, the choice is more complicated: shutting down the intruder may prevent further damage to systems, but might make it impossible to track down the intruder. If there has been damage, the decision about whether it is important to leave systems up to catch the intruder should involve all the organizations affected. Further complicating the issue of network responsibility is the consideration that if you do not cooperate with the agency involved, you will be less likely to receive help from that agency in the future.
5.2.3 Computer Security Incident Handling Teams

There now exist a number of Computer Security Incident Handling teams (CSIH teams), such as the CERT Coordination Center, the CIAC, and other teams around the globe. Teams exist for many major government agencies and large corporations. If such a team is available, notifying it should be of primary importance during the early stages of an incident. These teams are responsible for coordinating computer security incidents over a range of sites and larger entities. Even if the incident is believed to be contained to a single site, it is possible that the information available through a response team could help in closing out the incident.

If it is determined that the breach occurred due to a flaw in the system's hardware or software, the vendor (or supplier) and a Computer Security Incident Handling team should be notified as soon as possible. This is especially important because many other systems are likely to be vulnerable, too.

In setting up a site policy for incident handling, it may be desirable to create a subgroup, much like those teams that already exist, that will be responsible for handling computer security incidents for the site (or organization). If such a team is created, it is essential that communication lines be opened between this team and other teams. Once an incident is under way, it is difficult to open a trusted dialogue with other teams if none has existed before. (See [RFC xxx] for more information about the considerations for creating your own incident handling team.)
5.2.4 Affected and Involved Sites

   +  special care should be taken to inform other affected sites
   +  directly: but who is responsible for the contact, who is responsible for an incident at the other site
   +  indirectly: utilizing an incident response team
   +  responsibility/liability
   +  privacy reasons to protect specific data

5.2.5 Public Relations - Press Releases

One of the most important issues to consider is when, through whom, and how much to release to the general public through the press. There are many issues to consider when deciding this. First and foremost, if a public relations office exists for the site, it is important to use this office as liaison to the press. The public relations office is trained in the type and wording of information released, and will help to assure that the image of the site is protected during and after the incident (if possible). A public relations office has the advantage that you can communicate candidly with it, and it provides a buffer between the constant press attention and the need of the POC to maintain control over the incident.

If a public relations office is not available, the information released to the press must be carefully considered. If the information is sensitive, it may be advantageous to provide only minimal or overview information to the press. It is quite possible that any information provided to the press will be quickly reviewed by the perpetrator of the incident. In contrast to this consideration, note that misleading the press can often backfire and cause more damage than releasing sensitive information.

While it is difficult to determine in advance what level of detail to provide to the press, some guidelines to keep in mind are:

   o  Keep the technical level of detail low. Detailed information about the incident may provide enough information for copy-cat events or even damage the site's ability to prosecute once the event is over.
   o  Keep the speculation out of press statements.
      Speculation about who is causing the incident or about the motives is very likely to be in error and may cause an inflamed view of the incident.
   o  Work with law enforcement professionals to assure that evidence is protected. If prosecution is involved, assure that the evidence collected is not divulged to the press.
   o  Try not to be forced into a press interview before you are prepared. The popular press is famous for the "2am" interview, where the hope is to catch the interviewee off guard and obtain information otherwise not available.
   o  Do not allow the press attention to detract from the handling of the event. Always remember that the successful closure of an incident is of primary importance.

5.3 Identifying an Incident

5.3.1 Is it real?

This stage involves determining whether a problem really exists. Of course many, if not most, signs often associated with virus infections, system intrusions, malicious users, etc., are simply anomalies such as hardware failures or suspicious system/user behavior. To assist in identifying whether there really is an incident, it is usually helpful to obtain and use any detection software which may be available. For example, widely available software packages can greatly assist someone who thinks there may be a virus in a personal computer. Audit information is also extremely useful, especially in determining whether there is a network attack. It is extremely important to obtain a system snapshot as soon as one suspects that something is wrong. Many incidents cause a dynamic chain of events to occur, and an initial system snapshot may do more good in identifying the problem and any source of attack than most other actions which can be taken at this stage. Finally, it is important to start a log book. Recording system events, telephone conversations, time stamps, etc., can lead to a more rapid and systematic identification of the problem, and is the basis for subsequent stages of incident handling.
There are certain indications or "symptoms" of an incident which deserve special attention:

   o  System crashes.
   o  New user accounts (e.g., the account RUMPLESTILTSKIN has inexplicably been created), or high activity on an account that has had virtually no activity for months.
   o  New files (usually with novel or strange file names, such as data.xx or k).
   o  Accounting discrepancies (e.g., in a UNIX system you might notice that the accounting file called /usr/admin/lastlog has shrunk, something that should make you very suspicious that there may be an intruder).
   o  Changes in file lengths or dates (e.g., a user should be suspicious if he/she observes that the .EXE files in an MS DOS computer have inexplicably grown by over 1800 bytes).
   o  Attempts to write to system files (e.g., a system manager notices that a privileged user in a VMS system is attempting to alter RIGHTSLIST.DAT).
   o  Data modification or deletion (e.g., files start to disappear).
   o  Denial of service (e.g., a system manager and all other users become locked out of a UNIX system, which has been changed to single user mode).
   o  Unexplained, poor system performance (e.g., system response time becomes unusually slow).
   o  Anomalies (e.g., "GOTCHA" is displayed on a display terminal or there are frequent unexplained "beeps").
   o  Suspicious probes (e.g., there are numerous unsuccessful login attempts from another node).
   o  Suspicious browsing (e.g., someone becomes a root user on a UNIX system and accesses file after file in one user's account, then another's).

This list by no means covers all possible signs. None of these indications is absolute "proof" that an incident is occurring, nor are all of these indications normally observed when an incident occurs. If you observe any of these indications, however, it is important to suspect that an incident might be occurring, and act accordingly. There is no formula for determining with 100 percent accuracy that an incident is occurring.
It is best at this point to collaborate with other technical and computer security personnel to make a decision as a group about whether an incident is occurring.

5.3.2 Types and Scope of Incidents

Along with the identification of the incident is the evaluation of the scope and impact of the problem. It is important to correctly identify the boundaries of the incident in order to effectively deal with it. In addition, the impact of an incident will determine its priority in allocating resources to deal with the event. Without an indication of the scope and impact of the event, it is difficult to determine a correct response.

In order to identify the scope and impact, a set of criteria should be defined which is appropriate to the site and to the type of connections available. Some of the issues are:

   o  Is this a multi-site incident?
   o  Are many computers at your site affected by this incident?
   o  Is sensitive information involved?
   o  What is the entry point of the incident (network, phone line, local terminal, etc.)?
   o  Is the press involved?
   o  What is the potential damage of the incident?
   o  What is the estimated time to close out the incident?
   o  What resources could be required to handle the incident?

5.3.3 Assessing the Damage and Extent

The analysis of the damage and extent of the incident can be quite time consuming, but should yield some insight into the nature of the incident, and aid investigation and prosecution. As soon as the breach has occurred, the entire system and all its components should be considered suspect. System software is the most probable target. Preparation is key to being able to detect all changes on a possibly tainted system. This includes checksumming all tapes from the vendor using a checksum algorithm which (hopefully) is resistant to tampering [10]. (See sections xxx.)
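The checksum preparation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python and SHA-256, neither of which is named in this document, and the function names (file_digest, build_manifest, compare_manifests) are hypothetical. A baseline manifest would be built from known-good vendor media before any incident and kept where an intruder cannot alter it; after a suspected breach, a fresh manifest is compared against it.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symbolic links and special files; only file contents are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def compare_manifests(baseline, current):
    """Return (added, removed, changed) lists of relative paths, each sorted."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        path for path in baseline
        if path in current and baseline[path] != current[path]
    )
    return added, removed, changed
```

Any path reported as added, removed, or changed is only a starting point for investigation; a comparison of this kind is meaningful only if the baseline itself was taken from trusted media and stored out of reach of the suspect system.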
Assuming original vendor distribution tapes are available, an analysis of all system files should commence, and any irregularities should be noted and referred to all parties involved in handling the incident. It can be very difficult, in some cases, to decide which backup tapes are showing a correct system status; consider that the incident may have continued for months or years before discovery, and that the suspect may be an employee of the site, or otherwise have intimate knowledge of or access to the systems. In all cases, the pre-incident preparation will determine what recovery is possible.

If the system supports centralized logging (most do), go back over the logs and look for abnormalities. If process accounting and connect time accounting are enabled, look for patterns of system usage. To a lesser extent, disk usage may shed light on the incident. Accounting can provide much helpful information in an analysis of an incident and subsequent prosecution. Whether you can address all aspects of a specific incident depends strongly on the success of this analysis. This also affects the efficiency of the incident handling process.

Review the lessons learned from the analysis and always update the policy and procedures to reflect changes necessitated by the incident.

5.4 Handling an Incident

A major topic still untouched here is how to actually respond to an event. The response to an event will fall into the general categories:

   o  Containment.
   o  Eradication.
   o  Recovery.
   o  Follow-up.

Two other topics are vital for incident handling and relevant to all of the categories mentioned above. They are therefore addressed before these categories:

   o  Types of notification and the exchange of information
   o  Protection of evidence and activity logs

   +  what are the goals for this process.
   +  what is the best way to handle an incident.
   +  important: do not try to handle the incident alone
   +  get as much help as necessary
   +  part of the original RFC 1244, which is too brief to serve as a guideline, but nevertheless the topics should be covered here, too

5.4.1 Types of notification, Exchange of information

When you have confirmed that an incident is occurring, the appropriate personnel must be notified. How this notification is achieved is very important in keeping the event under control from both a technical and an emotional standpoint. To aid prompt acknowledgment and understanding of the problem, the circumstances should be described in as much detail as possible.

Great care should be taken in deciding which groups are given detailed technical information during the notification. For example, it is helpful to pass this kind of information to an incident handling team. They can assist you by providing helpful hints for eradicating the vulnerabilities involved in an incident. On the other hand, putting the critical knowledge into the public domain (e.g., netnews, mailing lists) may potentially put a great number of systems at risk of intrusion. It is wrong to assume that all administrators are reading a particular news group, have access to operating system source code, or can even understand the techniques well enough to take adequate steps.

First of all, any notification to either local or off-site personnel must be explicit. This requires that any statement (be it an electronic mail message, phone call, or fax) provides information about the incident that is clear, concise, and fully qualified. When you are notifying others who will help you to handle an event, a "smoke screen" will only divide the effort and create confusion. If a division of labor is suggested, it is helpful to provide information to each section about what is being accomplished in other efforts.
This will not only reduce duplication of effort, but also allow people working on parts of the problem to know where to obtain other information that would help them resolve a part of the incident.

Another important consideration when communicating about the incident is to be factual. Attempting to hide aspects of the incident by providing false or incomplete information may not only prevent a successful resolution to the incident, but may even worsen the situation. This is especially true when the press is involved. When an incident severe enough to gain press attention is ongoing, it is likely that any false information you provide will not be substantiated by other sources. This will reflect badly on the site and may create enough ill-will between the site and the press to damage the site's public relations.

The choice of language used when notifying people about the incident can have a profound effect on the way that information is received. When you use emotional or inflammatory terms, you raise the expectations of damage and negative outcomes of the incident. It is important to remain calm in both written and spoken notifications.

Another aspect of the choice of language is that not all people speak the same language. As a result, misunderstandings and delays may arise, especially in a multi-national incident.

+ more about international aspects

Another issue associated with the choice of language is the notification of non-technical or off-site personnel. It is important to describe the incident accurately, without undue alarm or confusing messages. While it is more difficult to describe the incident to a non-technical audience, it is often more important. A non-technical description may be required for upper-level management, the press, or law enforcement liaisons. The importance of these notifications should not be underestimated; they may make the difference between handling the incident properly and escalating to some higher level of damage.
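
The requirement that a notification be explicit, factual, and fully qualified can be illustrated with a small sketch. The following Python fragment is only one possible shape for such a notice, not a prescribed format: the class name IncidentNotice, its field names, and the example addresses and host names are all hypothetical. It assumes that times are normalized to GMT so that sites in different time zones can correlate events, and that evidence is attached as complete log entries and not as paraphrases.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class IncidentNotice:
    """Hypothetical minimum set of fields for an incident notification."""
    reporter: str          # who sends the notice and how to reach them
    affected_hosts: list   # fully qualified host names
    summary: str           # factual, calm, non-inflammatory description
    evidence: list = field(default_factory=list)  # complete log entries
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def render(self) -> str:
        # Refuse naive timestamps: without a zone they cannot be compared
        # across sites.
        if self.observed_at.tzinfo is None:
            raise ValueError("observed_at must carry a time zone")
        stamp = self.observed_at.astimezone(timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S GMT")
        lines = [
            f"Reporter:  {self.reporter}",
            f"Observed:  {stamp}",
            f"Hosts:     {', '.join(self.affected_hosts)}",
            f"Summary:   {self.summary}",
            "Evidence:",
        ]
        lines += [f"  {entry}" for entry in self.evidence] or ["  (none attached)"]
        return "\n".join(lines)


# Illustrative values only; the addresses and log placeholders are made up.
notice = IncidentNotice(
    reporter="admin@site.example",
    affected_hosts=["host1.site.example"],
    summary="Repeated failed logins followed by one successful login.",
    evidence=["<complete log line 1>", "<complete log line 2>"],
    observed_at=datetime(1995, 7, 7, 22, 0,
                         tzinfo=timezone(timedelta(hours=2))),
)
print(notice.render())
```

Keeping the summary separate from the evidence lets the same record serve both a technical audience, which needs the raw entries, and a non-technical one, which needs only the summary.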
+ Template for minimum information exchange
+ Advice for reporting, e.g., time zones in GMT, ...
+ give complete information (e.g., all tcpwrapper entries belonging to the incident, not merely: "found in the tcp wrapper logs, I assume ...")
+ international aspects

5.4.2 Protection of evidence and activity logs

When you respond to an incident, document all details related to the incident. This will provide valuable information to yourself and others as you try to unravel the course of events. Documenting all details will ultimately save you time. If you don't document every relevant phone call, for example, you are likely to forget a good portion of the information you obtain, requiring you to contact the source of information once again. This wastes your and others' time, something you can ill afford. At the same time, recording details will provide evidence for prosecution efforts, provided the case moves in this direction. Documenting an incident will also help you perform a final assessment of damage (something your management as well as law enforcement officers will want to know), and will provide the basis for a follow-up analysis in which you can engage in a valuable "lessons learned" exercise. Additionally, it will help during later phases of the handling process, especially during eradication and recovery.

During the initial stages of an incident, it is often infeasible to determine whether prosecution is viable, so you should document as if you are gathering evidence for a court case. At a minimum, you should record:

   o All system events (audit records).
   o All actions you take (time tagged).
   o All phone conversations (including the person with whom you talked, the date and time, and the content of the conversation).

The most straightforward way to maintain documentation is to keep a logbook.
This allows you to go to a centralized, chronological source of information when you need it, instead of requiring you to page through individual sheets of paper. Much of this information is potential evidence in a court of law. Thus, when you initially suspect that an incident will result in prosecution or when an investigative agency becomes involved, you need to regularly (e.g., every day) turn in photocopied, signed copies of your logbook (as well as media you use to record system events) to a document custodian who can store these copied pages in a secure place (e.g., a safe). When you submit information for storage, you should in return receive a signed, dated receipt from the document custodian. Failure to observe these procedures can result in invalidation of any evidence you obtain in a court of law.

5.4.3 Containment

The purpose of containment is to limit the extent of an attack. For example, it is important to limit the spread of a worm attack on a network as quickly as possible. An essential part of containment is decision making (i.e., determining whether to shut a system down, to disconnect from a network, to monitor system or network activity, to set traps, to disable functions such as remote file transfer on a UNIX system, etc.). Sometimes this decision is trivial; shut the system down if the system is life-critical, classified or sensitive, or if proprietary information is at risk! In other cases, it is worthwhile to risk some damage to the system if keeping the system up might enable you to identify an intruder.

This stage should involve carrying out predetermined procedures. Your organization or site should, for example, define acceptable risks in dealing with an incident, and should prescribe specific actions and strategies accordingly. This is especially important when a quick decision is necessary and it is not possible to contact all involved parties to discuss the decision.
In most cases the person in charge will not have the power to make a difficult management decision (such as losing the results of a costly experiment). Finally, notification of cognizant authorities should occur during this stage.

+ since I decided to make this its own section, should it contain more text?

In some cases, it is prudent to remove all access or functionality as soon as possible, and then restore normal operation in limited stages. Bear in mind that removing all access while an incident is in progress will obviously notify all users, including the alleged problem users, that the administrators are aware of a problem; this may have a deleterious effect on an investigation. However, allowing an incident to continue may also increase the likelihood of greater damage, loss, aggravation, or liability (civil or criminal). This is another reason why the relevant decisions belong in the site policy and should be determined before an incident occurs.

5.4.4 Eradication

Once an incident has been detected, it is important to first think about containing the incident. Once the incident has been contained, it is time to eradicate the cause. But before eradicating the cause, great care should be taken to collect all necessary information about the compromised system and the cause of the incident, because this information is likely to be lost during the eradication.

Software may be available to help you in the eradication process. For example, eradication software is available to eliminate most viruses which infect small systems. If any bogus files have been created, archive them now for possible later use in court, and then delete them from the system. In the case of virus infections, it is important to clean and reformat any disks containing infected files. Finally, ensure that all backups are clean. Many systems infected with viruses become periodically reinfected simply because people do not systematically eradicate the virus from backups.
After eradication, a new backup should also be taken.

Removing all vulnerabilities once an incident has occurred is difficult. The key to removing vulnerabilities is knowledge and understanding of the breach. It may be necessary to go back to the original distribution tapes and recustomize the system. To facilitate this worst case scenario, a record of the original system setup and each customization change should be kept current with each change to the system. In the case of a network-based attack, it is important to install patches for any operating system vulnerability which was exploited.

+ patch for vulnerabilities not available
+ uncertainty of cause

As discussed in section $$.4.2, a security log can be most valuable during this phase of removing vulnerabilities. There are two considerations here. The first is to keep logs of the procedures that have been used to make the system secure again. This should include command procedures (e.g., shell scripts) that can be run on a periodic basis to recheck the security. The second is to keep logs of important system events. These can be referenced when trying to determine the extent of the damage of a given incident.

5.4.5 Recovery

Once the cause of an incident has been eradicated, the recovery phase defines the next stage of action. The goal of recovery is to return the system to normal. In general, the best practice is to bring up services in order of demand, so that users are inconvenienced as little as possible. Understand that the proper recovery procedures for the system are extremely important and should be specific to the site.

+ more text needed?

5.4.6 Follow-Up

Once you believe that a system has been restored to a "safe" state, it is still possible that holes and even traps could be lurking in the system. One of the most important stages of responding to incidents is also the most often omitted: the follow-up stage.
In the follow-up stage, the system should be monitored for items that may have been missed during the cleanup stage. It would be prudent to utilize some of the tools mentioned in section xxx (e.g., xxx) as a start. Remember, these tools don't replace continual system monitoring and good systems administration procedures.

The follow-up stage is also important because it helps those involved in handling the incident develop a set of "lessons learned" (see section $$.5) to improve future performance in such situations. This stage also provides information which justifies an organization's computer security effort to management, and yields information which may be essential in legal proceedings.

The most important element of the follow-up stage is performing a postmortem analysis. Exactly what happened, and at what times? How well did the staff involved with the incident perform? What kind of information did the staff need quickly, and how could they have gotten that information as soon as possible? What would the staff do differently next time?

A follow-up report is valuable because it provides a reference to be used in case of other similar incidents. Creating a formal chronology of events (including time stamps) is also important for legal reasons. Similarly, it is important to obtain, as quickly as possible, a monetary estimate of the amount of damage the incident caused in terms of any loss of software and files, hardware damage, and manpower costs to restore altered files, reconfigure affected systems, and so forth. This estimate may become the basis for subsequent prosecution activity.

5.5 Aftermath of an Incident

In the wake of an incident, several actions should take place. These actions can be summarized as follows:

   (1) An inventory should be taken of the systems' assets, i.e., a careful examination should determine how the system was affected by the incident,

   (2) The lessons learned as a result of the incident should be included in a revised security plan to prevent the incident from re-occurring,

   (3) A new risk analysis should be developed in light of the incident,

   (4) An investigation and prosecution of the individuals who caused the incident should commence, if it is deemed desirable.

All four steps should provide feedback to the site security policy committee, leading to prompt re-evaluation and amendment of the current policy.

If an incident stems from poor policy and the policy is not changed, then one is doomed to repeat the past. Once a site has recovered from an incident, site policy and procedures should be reviewed to encompass changes to prevent similar incidents. Even without an incident, it would be prudent to review policies and procedures on a regular basis. Reviews are imperative due to today's changing computing environments.

After an incident, it is prudent to write a report describing the incident, method of discovery, correction procedure, monitoring procedure, and a summary of lessons learned. This will aid in the clear understanding of the problem. Remember, it is difficult to learn from an incident if you don't understand the source.

+ improving proactive methods
+ educate users, administrators and managers

5.6 Responsibilities

This is a totally new section, but it addresses some aspects of responsibility, because we sometimes experience a somewhat strange interpretation of it. One example: protecting a network is a good thing, but protecting someone else's network is difficult if you don't have the authority to do so. If you intend to try a new hacker tool on a network, it is only fair to contact a responsible person beforehand. Otherwise, we experience (false or unnecessary) alerts every now and then.
If you regard a successful break-in as a good educational lesson, you can still be sued for it like an ordinary cracker.

+ testing of known vulnerabilities
+ disclosure of information
+ testing of local sites
+ tiger teams (for remote sites)
+ announcing site security contact information
+ check advice before acting on behalf (hacker claims to be ROOT)
+ protect the communication
+ legal procedures was section 5.5.2 in the old RFC (most of this section is already included in 5.2.2)

6. MAINTENANCE and EVALUATION

6.1 Risk assessments

6.2 Notification of problems/events

Appendices

A1 Tools and Locations

This section provides a brief overview of publicly available security technology which can be downloaded from the Internet. Many of the items described below will undoubtedly be surpassed or made obsolete before this document is published.

This section is divided into two major subsections, applications and tools. The applications heading will include all end user programs (clients) and their supporting system infrastructure (servers). The tools heading will deal with the tools that a general user will never see or need to use, but which may be part of or used by applications, or used by system and network administrators to troubleshoot security problems or guard against intruders.

The emphasis will be on Unix applications and tools, but other platforms, particularly PCs and Macintoshes, will be mentioned where information is available.

Most of the tools and applications described below can be found at one of the following archive sites:

   1. ftp://info.cert.org:/pub/tools
      CERT Coordination Center

   2. coast.cs.purdue.edu:/pub/tools
      Computer Operations, Audit, and Security Tools (COAST)

   3. ftp://ftp.cert.dfn.de/pub/tools/
      DFN-CERT

Any references to CERT or COAST refer to the first two locations. These two sites act as repositories for most tools; exceptions will be noted in the text.
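
Software retrieved from an archive site can be compared against a checksum published by its distributor before it is installed. The following Python sketch shows one way to do this with the standard hashlib module. The function names file_digest and matches_published are hypothetical, and the sketch assumes that the published checksum was obtained over a channel you trust, separately from the file itself. MD5 is the default here only as an illustration; the algorithm argument accepts any digest name that hashlib supports (e.g., "sha256").

```python
import hashlib
import hmac


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """Compare a file's digest with a checksum taken from a trusted source."""
    actual = file_digest(path, algorithm)
    # Normalize whitespace and case, since published lists vary in format.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

A mismatch does not say what is wrong with the file, only that it is not the file the distributor published; in that case the software should not be installed.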
*** It is important to note that many sites, including CERT and COAST, are mirrored throughout the Internet. Be careful to use a "well known" mirror site to retrieve software, and use whatever verification tools are available (checksums, MD5 checksums, etc.) to validate that software. A clever cracker might advertise security software with designed flaws in order to gain access to data or machines. ***

Applications

The sad truth is that there are very few security conscious applications currently available. The real reason is that a security infrastructure must first be put into place before most applications can operate securely. Considerable effort is currently under way to put this infrastructure in place so that applications can take advantage of secure communications.

Unix based applications

   PGP
   MD5
   S/KEY
   TROJAN.PL
   PEM
   KERBEROS
   Drawbridge
   Tripwire
   logdaemon
   TCP-Wrapper
   rpcbind/portmapper replacement
   cops
   tiger
   ISS
   SATAN
   smrsh
   swatch
   identd (not really a security tool)
   DES (non-US versions)
   lsof
   sfingerd
   passwd-replacements (npasswd / ANLpasswd / passwd+ / ...)

A2 Mailing lists and other resources