by Theodore T. Allen and Enhao Liu (The Ohio State University)
As presented at the 2018 Winter Simulation Conference
This article proposes a discrete event simulation model of an organization that maintains computer hosts and incurs several million dollars in maintenance and incident response costs. The common maintenance policy is referred to as “out-of-sight is out-of-mind” (OSOM) because the majority of hosts are absent from scans and are ignored. Hosts are “dark” (absent) because they are not accessible (turned off or with restricted permissions). The proposed model is used to compare OSOM with alternatives, including improved analytics that make dark host vulnerabilities visible. Findings clarify that OSOM appears beneficial unless indirect costs for intrusions are counted or improved policies are applied. The benefits of using Windows operating systems and improved policies are also clarified, including millions in expected savings (vs. Linux).
Introduction
Cyber-security-related costs are important on multiple levels, from national and international politics, to electric grids connecting thousands of organizations, to expenditures within individual organizations. Discrete event models have explored political effects (Naugle et al. 2016). Models at the power grid level include those described by Nguyen et al. (2015). Also, attack simulation models include Shin et al. (2015) and Case (2016).
In our own research, we have explored Markov decision process models of organizational expenses focusing on the evolution of single hosts (Afful-Dadzie and Allen 2014; 2016). Computer hosts may be ordinary personal computers, laptops, servers, printers, or even exercise equipment. Here, we focus only on devices connected to the Internet that could be compromised and are scanned and maintained. These devices are used for student, research, and administrative tasks. These devices have so-called “vulnerabilities,” which are weaknesses that attackers can exploit. For example, a host might use a weak password, software with out-of-date encryption, or software without sufficient checks on the size of inputs or outputs. These vulnerabilities are rated by the U.S. National Institute of Standards and Technology (NIST) using the Common Vulnerability Scoring System (CVSS).
Here, we propose to extend the data and assumptions for maintenance policy development to discrete event simulations. This is similar to patch management in electric utilities addressed by Gauci et al. (2017) except that we consider a larger number of past incidents and a broader assortment of policies and host types. Benefits of discrete event simulation include relatively intuitive ways to include the inception and destruction of hosts and finite patching and incident response resources. We argue that host “end of life” issues are important to consider because, anecdotally, we are aware of hosts that were believed to be retired being used and causing incidents.
In our experience, a common policy is to require that staff attempts to patch or mitigate high or critical level vulnerabilities within one month of the time when the vulnerability is observed in the monthly scans. The policy ignores the medium- or low-level vulnerabilities which tend to accumulate. Also, typically 70% of the almost 50,000 distinct hosts that we studied were missing from the scans in any given month. This can occur because the host is turned off during the scan or permissions are lacking. Some methods to impute the vulnerabilities missing in the scan data are described by Afful-Dadzie and Allen (2014; 2016). Recently, we have methods that can predict with high accuracy (0.05% errors) the vulnerabilities on hosts which are not present (“dark”) in the monthly scans.
Here, we consider the implications of 21 months of observed month-to-month transitions of approximately 50,000 hosts. The resulting transition probability estimates are shown in Table 1. The probabilities reflect the combined effects of at least four factors. First, users of the hosts are constantly adding software, and the software they have already added is aging. Second, hackers are constantly searching for vulnerabilities, observing the acknowledgement of vulnerabilities that are publicly reported, and obtaining exploits (which are also often freely published). Third, vendors are constantly attempting to automatically patch their software remotely. Fourth, staff are attempting to patch vulnerabilities according to organization policy: they work from lists of vulnerabilities obtained from scans, search for available patches, test the patches obtained to confirm that they do not break functionality, and apply the patches found and tested (if any).
Here also, we consider only two types of hosts. These are Linux and Windows hosts for which the user has administrator privilege to install new software and the host is not controlled by administrators. (Controlled hosts are generally much safer.) Here, we refer to the common maintenance policy in which dark hosts are ignored as “out-of-sight is out-of-mind” (OSOM). A major objective of this article is to clarify issues with the OSOM policy and the possible benefits of more sophisticated policies.
Table 1: Estimated transition data from a major university (a) Linux hosts, (b) changed transitions reflecting improved informatics, (c) Windows hosts, and (d) changes from improved informatics.
(a) Linux hosts
From state | Low-Med. | Low-Med.-Dark | High-Crit. | High-Crit.-Dark | Comp. | Comp.-Dark |
---|---|---|---|---|---|---|
Low-Med. | 0.2820 | 0.6580 | 0.0177 | 0.0413 | 0.0005 | 0.0005 |
Low-Med.-Dark | 0.2820 | 0.6580 | 0.0177 | 0.0413 | 0.0005 | 0.0005 |
High-Crit. | 0.1290 | 0.3010 | 0.1560 | 0.3640 | 0.0250 | 0.0250 |
High-Crit.-Dark | 0.0000 | 0.0000 | 0.2250 | 0.7000 | 0.0250 | 0.0500 |
Comp. | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Comp.-Dark | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8000 | 0.2000 |
(b) Linux hosts, changed row reflecting improved informatics
From state | Low-Med. | Low-Med.-Dark | High-Crit. | High-Crit.-Dark | Comp. | Comp.-Dark |
---|---|---|---|---|---|---|
High-Crit.-Dark | 0.1290 | 0.3010 | 0.1560 | 0.3640 | 0.0250 | 0.0250 |
(c) Windows hosts
From state | Low-Med. | Low-Med.-Dark | High-Crit. | High-Crit.-Dark | Comp. | Comp.-Dark |
---|---|---|---|---|---|---|
Low-Med. | 0.2760 | 0.6440 | 0.0239 | 0.0559 | 0.0001 | 0.0001 |
Low-Med.-Dark | 0.2760 | 0.6440 | 0.0239 | 0.0559 | 0.0001 | 0.0001 |
High-Crit. | 0.1444 | 0.3369 | 0.1554 | 0.3627 | 0.0003 | 0.0003 |
High-Crit.-Dark | 0.0000 | 0.0000 | 0.2988 | 0.7000 | 0.0006 | 0.0006 |
Comp. | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Comp.-Dark | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8000 | 0.2000 |
(d) Windows hosts, changed row reflecting improved informatics
From state | Low-Med. | Low-Med.-Dark | High-Crit. | High-Crit.-Dark | Comp. | Comp.-Dark |
---|---|---|---|---|---|---|
High-Crit.-Dark | 0.1444 | 0.3369 | 0.1554 | 0.3627 | 0.0003 | 0.0003 |
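To make the structure of Table 1 concrete, the following is a minimal Python sketch that encodes the Linux matrix from panel (a), swaps in the panel (b) row for the improved-informatics case, and checks that each row is a valid probability distribution. The long-run share calculation is only an illustration added here and is not the article's discrete event model: it treats a single host as a plain Markov chain and ignores host creation, retirement, and limited staff resources. The function names are hypothetical.

```python
# Row and column order follows Table 1.
STATES = ["Low-Med.", "Low-Med.-Dark", "High-Crit.",
          "High-Crit.-Dark", "Comp.", "Comp.-Dark"]

# Table 1(a): Linux hosts, monthly transition probabilities (rows = from state).
LINUX = [
    [0.2820, 0.6580, 0.0177, 0.0413, 0.0005, 0.0005],
    [0.2820, 0.6580, 0.0177, 0.0413, 0.0005, 0.0005],
    [0.1290, 0.3010, 0.1560, 0.3640, 0.0250, 0.0250],
    [0.0000, 0.0000, 0.2250, 0.7000, 0.0250, 0.0500],
    [1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, 0.0000, 0.8000, 0.2000],
]

# Table 1(b): replacement High-Crit.-Dark row under improved informatics.
LINUX_IMPROVED_ROW = [0.1290, 0.3010, 0.1560, 0.3640, 0.0250, 0.0250]


def with_improved_informatics(matrix, new_row):
    """Return a copy of the matrix with the High-Crit.-Dark row replaced."""
    improved = [row[:] for row in matrix]
    improved[STATES.index("High-Crit.-Dark")] = list(new_row)
    return improved


def check_rows(matrix, tol=1e-9):
    """Raise ValueError if any row does not sum to 1 within tolerance."""
    for name, row in zip(STATES, matrix):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {name} sums to {sum(row)}")


def long_run_shares(matrix, steps=2000):
    """Illustrative assumption: iterate the chain from a Low-Med. start
    to approximate the long-run share of months spent in each state."""
    n = len(STATES)
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n))
                for j in range(n)]
    return dist


def compromised_share(matrix):
    """Long-run share of months in Comp. or Comp.-Dark."""
    shares = long_run_shares(matrix)
    return shares[STATES.index("Comp.")] + shares[STATES.index("Comp.-Dark")]


if __name__ == "__main__":
    improved = with_improved_informatics(LINUX, LINUX_IMPROVED_ROW)
    check_rows(LINUX)
    check_rows(improved)
    print("baseline compromised share:", round(compromised_share(LINUX), 4))
    print("improved compromised share:", round(compromised_share(improved), 4))
```

The Windows panels (c) and (d) can be substituted in the same way. Under these simplifying assumptions, replacing the High-Crit.-Dark row lowers the share of months spent compromised, because dark high-critical hosts no longer remain unpatched.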
The Proposed Model
Unit Size and Time Period
Our discrete event simulation model necessarily specifies the number of servers and the number of entities typically within the system (Allen 2011; Law and Kelton 2000). We observed that a large university is generally organized as multiple, largely independent departments, each with typically 100 hosts. Each department has an administrator principally responsible for repairing vulnerabilities and facilitating responses to known incidents. Therefore, the model includes somewhat more than 100 hosts (on average) over a period of more than 100 years to approximately capture maintenance and response costs for a university. As noted in Afful-Dadzie and Allen (2016), we assume that patching a vulnerability costs $150 on average and responding to a known incident costs $2,000 on average. Therefore, impacts of vulnerabilities are counted but only in relation to direct costs for legally addressing known incidents.
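The cost assumptions above reduce to simple arithmetic, sketched below in Python. The two unit costs come from the text; the monthly time step (matching the monthly scans), the function names, and the counts in the usage example are assumptions made here for illustration only, not results from the model.

```python
PATCH_COST = 150       # average cost of patching a vulnerability (from the text)
INCIDENT_COST = 2000   # average cost of responding to a known incident (from the text)


def direct_cost(n_patches, n_incidents):
    """Direct cost in dollars for a given number of patches and known incidents."""
    return PATCH_COST * n_patches + INCIDENT_COST * n_incidents


def host_months(n_hosts=100, years=100, periods_per_year=12):
    """Size of the simulated horizon, assuming monthly time steps."""
    return n_hosts * years * periods_per_year


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(direct_cost(n_patches=10, n_incidents=2))
    print(host_months())
```

Because only known incidents are costed, intrusions on dark hosts contribute nothing to this total, which is why a policy that ignores dark hosts can appear inexpensive on direct costs alone.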
States
Following Afful-Dadzie and Allen (2016), we categorize hosts by the highest-risk vulnerability, e.g., a host with any critical vulnerability is categorized as critical. In the common policy, low- and medium-risk hosts are generally ignored. Hosts can also be compromised, e.g., the host has malware that is attempting to contact the hacker or hacker team but is intercepted by the intrusion prevention system. Because some hosts are “dark” in the scan and some intrusions are unknown, we consider states in addition to the trashed or recycled host state. States include visible and dark combinations of low-medium, high-critical, and compromised. Low and medium are paired, as are high and critical, because they are often treated as equivalent in organizational policies.
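The categorization rule above can be sketched as a small Python function. The state labels follow Table 1; the function name, the lowercase severity strings, and the choice to place a host with no recorded vulnerabilities in the low-medium group are assumptions made for this illustration.

```python
# Severity ranking, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def host_state(severities, visible=True, compromised=False):
    """Map a host to one of the six non-retired states.

    severities  -- iterable of severity strings for the host's vulnerabilities
    visible     -- False if the host is "dark" (absent from the scan)
    compromised -- True if the host is compromised
    """
    if compromised:
        base = "Comp."
    else:
        # A host is classified by its single worst vulnerability.
        # Assumption: a host with no listed vulnerabilities counts as Low-Med.
        worst = max((SEVERITY_RANK[s] for s in severities), default=0)
        base = "High-Crit." if worst >= SEVERITY_RANK["high"] else "Low-Med."
    return base if visible else base + "-Dark"


if __name__ == "__main__":
    print(host_state(["low", "critical"]))
    print(host_state(["medium"], visible=False))
    print(host_state([], visible=False, compromised=True))
```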
Note that knowing about the vulnerabilities or the intrusions may not help the perceived goals of the organization. Yet, observability is clearly a desirable property of “resilient” systems (Allen et al. 2016). A major objective of this article is to clarify the possible benefits of improved observability.