Improving Historical Demand Data Through Simulation

by Oroselfia Sánchez and Idalia Flores (Universidad Nacional Autónoma de México)

As presented at the 2017 Winter Simulation Conference


Organizations collect data during different periods so that they can use them for management and business purposes. However, the data do not always come in the most suitable form for analysis and often need to be prepared, for which there are a variety of methods, including simulation. This paper presents a case where simulation is used as a tool to gain insights into demand based on historical data. Through simulation, we extract the most frequent demand events for two types of jobs, together with the worst events. The simulation model is based on historical data from a private oil company that operates in Mexico. In addition, we show how the simulation results improve the information in scorecard data recorded during a year of work.


Organizations generally record demand data to forecast possible future requirements, so as to avoid or mitigate risks such as penalties, delays, and insufficient capacity, among other things. However, these data normally need to be pretreated to fit the requirements of the analysis.

To facilitate this, many organizations have robust record systems in which employees record data in a timely fashion; however, not every step of an operation is always recorded. This is a persistent problem in many organizations, meaning that historical data have to be prepared before analysis. But what happens when the recorded information needs to be used but is not in the right form for the analysis? Nowadays, many companies prefer not to use this information, while others use a variety of methods to extract as many insights as possible for planning their future activities: methods such as networks (Zou et al. 2011), exponential smoothing (Mohammed et al. 2017), time series models (Qiu et al. 2016), causal and stochastic models (Ma et al. 2015), and simulation (Chen et al. 2010).

The veracity of the information can open up the possibility of more accurate planning of resources, budget and possible new locations, new job positions or scheduling of activities. In this paper, historical data from a scorecard is analyzed and prepared using the simulation method, as it enables us to generate a lot of different possible scenarios for the model entities. The purpose of this analysis is to identify risks by obtaining demand events that can be presented on a day-by-day basis for every month of the year.

2 Historical data available for demand

The demand data we analyzed is the historical data from an organization that cements oil wells in different states of Mexico. It offers two basic types of service: the first is the Cementing Job (i), which includes designing the cement slurry and building a circular wall inside the oil well, while the second is the Pumping Job (j), which consists of the leasing of resources. To meet the demand for both services, the organization uses the same resources for both types of job, i and j. The total number of each kind of job per month during a year is recorded on a scorecard, whose data are given in Table 1.

In practice, the organization uses the scorecard to forecast the future behavior of its work. It is not uncommon to make mistakes that leave certain locations with insufficient or excess resources. This happens because the total number of jobs recorded does not show the possible events that could happen day by day, or the frequency of each event. For example, we cannot see in Table 1 how many jobs were requested of the company on July 3rd, or on how many days in any month the resources were used at maximum capacity. These details are omitted from these kinds of records.
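The point that a monthly total hides day-by-day behavior can be illustrated with a minimal sketch. The numbers below (a July total of 31 jobs) and the choice of a Poisson daily demand are assumptions for illustration, not figures from the paper's Table 1:

```python
import math
import random

random.seed(42)

def sample_poisson(lam):
    """Knuth's Poisson sampler (pure stdlib, no external dependencies)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# Hypothetical July: the scorecard shows only the monthly total,
# but many different daily sequences are consistent with it.
monthly_total, days = 31, 31
daily = [sample_poisson(monthly_total / days) for _ in range(days)]

print("daily jobs:", daily)
print("days with 3+ jobs:", sum(1 for d in daily if d >= 3))
```

Two runs with different seeds give the same order of monthly total but very different peak days, which is exactly the information the scorecard cannot show.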

In this paper, starting from the scorecard data, a model is established to obtain event information that gives the organization more insight into how its demand behaves: in particular, the most frequent events and those that happen only sporadically but could cause operational risks for the organization, such as delays, penalties, and non-compliance.

3 Simulation model for demand

The purpose of the simulation model is to generate the most frequent events and to find those that could be critical for the company. The demand simulation model is executed for each month of the year, with demand simulated according to the data in Table 1.

3.1 Relevant aspects of the Simulation Model

We chose SIMIO as the simulation platform because its characteristics are suitable for developing the model and for using it with other models later on. The simulation model has servers that generate entities representing jobs i and j. Each server draws from distributions fitted to the scorecard data (Table 1). Both servers are connected to a Sink that counts the number of jobs of each kind per day. As a result, the model yields, for each month, the number of days on which i cementing and j pumping jobs occur.
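The SIMIO model itself is not reproduced in the paper, but its logic — two generators feeding a counting sink — can be sketched in plain Python. The daily means below and the Poisson choice are illustrative assumptions:

```python
import math
import random
from collections import Counter

random.seed(1)

def sample_poisson(lam):
    """Knuth's Poisson sampler."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def simulate_month(mean_i_per_day, mean_j_per_day, n_days=1000):
    """Generate daily (i, j) job counts and tally how often each joint
    event occurs, mirroring the two servers feeding a counting Sink."""
    events = Counter()
    for _ in range(n_days):
        i_jobs = sample_poisson(mean_i_per_day)
        j_jobs = sample_poisson(mean_j_per_day)
        events[(i_jobs, j_jobs)] += 1
    return events

# Hypothetical January means, e.g. scorecard totals divided by 31 days.
jan = simulate_month(mean_i_per_day=0.8, mean_j_per_day=0.5)
print(jan.most_common(5))
```

Each key of the counter is one demand event "i cementing jobs and j pumping jobs on the same day", and its value is the number of simulated days producing that event — the per-month output the section describes.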

For reporting purposes, the model records results starting from day 1200. The simulation results were determined over 1000 days of simulation time.

The model was validated by comparing Table 1 with Table 2, which gives the average number of jobs per month over the runs of the model.
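The validation logic — simulated monthly averages should reproduce the scorecard totals used to fit the input distributions — can be checked in a few lines. The scorecard total below is a hypothetical value, and the Poisson input is an assumption:

```python
import math
import random

random.seed(7)

def sample_poisson(lam):
    """Knuth's Poisson sampler."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# Over many simulated days, the implied monthly total should approach
# the scorecard figure that the daily distribution was fitted to.
scorecard_total, days_in_month = 25, 31   # hypothetical values
mean_per_day = scorecard_total / days_in_month

n_days = 100_000
sim_mean = sum(sample_poisson(mean_per_day) for _ in range(n_days)) / n_days
implied_monthly = sim_mean * days_in_month
print(f"scorecard: {scorecard_total}, simulated: {implied_monthly:.2f}")
```

Applying the same check month by month is the comparison the paper reports as Table 2 versus Table 1.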

Besides the simulation model, a mathematical model has been constructed, enabling us to compare the resulting events.

3.2 Simulation Model Results

The simulation model generates combinations of jobs i and j, with the corresponding frequency of each event, for every month of the year. Grouping these combinations lets us identify the most frequent events and understand the demand per month. At the same time, the event information is useful for resource management.

With this detailed information available, it is possible to identify events or scenarios that are infrequent but can still occur on some day of the year. Figure 1 also shows events close to the high-probability region but differing by one occurrence in the frequency obtained. Those events can be analyzed for the purpose of knowing how the organization would react if they happened, so that an action plan can be formulated.
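The split between frequent events and the rare-but-risky tail can be expressed as a simple post-processing step over the event tallies. The counts and the 2% threshold below are illustrative assumptions:

```python
from collections import Counter

# Hypothetical tallies for one simulated month: keys are (i, j) job-count
# pairs, values are how many simulated days produced that event.
events = Counter({(0, 0): 310, (1, 0): 250, (0, 1): 180, (1, 1): 150,
                  (2, 1): 70, (2, 2): 25, (3, 2): 10, (4, 3): 5})

def split_events(events, threshold=0.02):
    """Separate frequent events from the rare tail that still poses
    operational risk (e.g. days demanding near-maximum capacity)."""
    total_days = sum(events.values())
    frequent, rare = {}, {}
    for event, count in events.items():
        (frequent if count / total_days >= threshold else rare)[event] = count
    return frequent, rare

frequent, rare = split_events(events)
print("frequent:", sorted(frequent))
print("rare but risky:", sorted(rare))
```

The rare group — here the high-demand days (3, 2) and (4, 3) — is exactly the set of events the organization would feed into its contingency planning.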

Figure 1. Number of jobs i and j generated in the January simulation.

The importance of preparing the data using the methodology presented in this paper is that knowledge of these details can help avoid operational risks by reducing the surprise factor of the system.

4 Conclusion and Discussion

Given the changing environment and the resilience that systems need to have with respect to their own reality, more robust approaches and tools can offer better results. In this sense, risk identification using tools that provide more detailed information about a system offers great insight into the various risk situations the organization may face. Simulation is a tool that allows us to analyze a system and its information better and faster, emulating unexpected changes and the possible behavior of a system.


The system consists of a realistic 3D NIH Bethesda campus "virtual world" that can be used in a variety of applications to better understand and enhance planning and improve delivery of services. From a visual perspective, the model is designed to be viewed and interacted with via computer screen or VR hardware. In addition, quantitative output can be provided through experiments/scenarios.

The model also provides for exercising an emergency scenario or a particular campus operations process in greater detail. The emergency scenarios considered initially are a building/campus evacuation and an active shooter scenario. The system is designed so that the user can understand how such a scenario could play out in a particular campus building and how the scenario impacts and interacts with operations throughout the campus.

The initial campus operations systems included are staff and visitor access and the shuttle bus system. The system is designed to accurately represent the inputs and outputs of these processes and their interaction with the whole campus, and it can track an individual throughout the entire system (e.g. a visitor drives onto campus, proceeds through security inspection, parks, proceeds to a building, participates in an evacuation drill, returns to the building, returns to the vehicle, and exits campus).
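The individual-tracking capability described above amounts to keeping a timestamped trace of each entity as it moves through the campus stages. A minimal sketch, not Simio code, with stage names taken from the visitor example (the class and method names are assumptions):

```python
from dataclasses import dataclass, field

# Stages of the visitor journey described in the text.
STAGES = ["arrive", "security_inspection", "park", "enter_building",
          "evacuation_drill", "return_to_building", "return_to_vehicle",
          "exit_campus"]

@dataclass
class Visitor:
    visitor_id: str
    log: list = field(default_factory=list)

    def advance(self, stage, timestamp):
        """Record the visitor reaching a stage, preserving the full trace."""
        self.log.append((timestamp, stage))

v = Visitor("visitor-42")
for t, stage in enumerate(STAGES):
    v.advance(stage, t)
print(v.log)
```

Keeping the full trace per entity is what lets the model answer questions about a single visitor's path as well as aggregate questions about the whole campus.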

This model is used to experiment with changes in services to better match capacity with demand, contributing to the cost-effective use of limited budget resources and helping plan for emergent scenarios, such as a shutdown of campus roads, an evacuation of buildings or of part or all of the campus, or an active shooter scenario.


A base model is developed at a high level, so that particular processes or operations can be represented at a granular level and greater levels of detail can be added when necessary, from both a visual and an analytical perspective. The more detailed models help determine the processing-time parameters used in the main model. In this way, detailed models can be built where necessary.

This base model consists of a realistic 3D model of the campus road and pedestrian circulation network and of the campus buildings, with building entry/exit points depicted. It includes several campus sub-systems developed previously as independent models that are now integrated into a single model. These systems include campus access, a shuttle bus system, campus traffic (both typical flow and during an evacuation), and active shooter scenarios.

For example, in the base model, a building is represented as a "black box" with defined inputs and outputs. A building has a defined number of locations where occupants can enter and exit, and more detailed processes within the building can be developed in a separate sub-model that can be run independently or interface with the larger model (e.g. a building evacuation or an active shooter scenario). The current model has been developed using Simio simulation software in conjunction with Simio partner Mosimtec, LLC. 3D modeling was done with Trimble Sketchup and data from ORF and Open Street Maps. Prior approaches to modeling these subsystems were developed in Arena.
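The "black box" pattern can be sketched as a small interface: the campus-level model sees only a building's entry/exit behavior, while interior detail lives in a swappable sub-model. The class and method names below are illustrative assumptions, not Simio API:

```python
from dataclasses import dataclass, field

@dataclass
class Building:
    """Campus-level view of a building: defined entrances and a set of
    occupants, with interior processes deferred to detailed sub-models."""
    name: str
    entrances: int
    occupants: set = field(default_factory=set)

    def enter(self, person_id):
        self.occupants.add(person_id)

    def exit(self, person_id):
        self.occupants.discard(person_id)

    def evacuate(self):
        """The base model treats evacuation as emptying the box; a detailed
        sub-model would instead simulate corridors, stairwells, and exits."""
        evacuated = list(self.occupants)
        self.occupants.clear()
        return evacuated

b = Building("Building 10", entrances=4)
b.enter("visitor-1")
b.enter("staff-7")
print(len(b.evacuate()))
```

Because the interface (enter, exit, evacuate) stays fixed, a detailed evacuation sub-model can replace the simple body without changing the campus-level model — the scalability property the section describes.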


This simulation tool has contributed to significant cost-avoidance throughout the NIH and enhanced emergency planning efforts impacting life and well-being at the NIH. Projects performed have enabled research to continue uninterrupted and informed decisions impacting the security and safety of NIH and the surrounding community.

Developing and analyzing the output metrics provided by the simulation has enabled decision makers to better understand the impact of changes to the system on service delivery, as well as to better identify, understand, and mitigate risks identified through experimentation with the wide range of scenarios the simulation can provide.

Through use of this scalable and modular approach, there is significant potential for integrating other campus and research systems into this model, allowing decision makers to gain a greater understanding of how systems interact with each other and impact broader objectives. Though these models have been developed specifically for NIH, other agencies and organizations can leverage these approaches to solve related challenges.


The OQM team would like to acknowledge the NIH and ORS/ORF leadership and numerous staff that have supported this project.