Best Answers to Commonly Asked Risk Analysis Questions

Many managers don’t understand what exactly risk analysis is.  We put together some of the most common questions with responses for you.

What does the risk percentage mean?

The risk percentage approximates the on-time probability for an order with appropriate consideration of the number of replications or “experiments.”  It tells the user how confident they can be in meeting the due date given how many trials they have conducted. 

How does Simio calculate the on-time probability?

Simio adjusts from a base rate of 50% with each risk replication.  If an order is on time in an individual replication, Simio updates the probability, increasing it closer to 100%.  If the order is late, Simio decreases the probability closer to 0%.  Each replication is an experiment that provides new information about the likelihood of success or failure.  More experiments mean more confidence in the answer.

Why is the base rate 50%?

Before any plan is generated or any activity is simulated, there is no information about the order other than the possible outcomes.  Because there are only two outcomes that matter (on time or not), the base rate is set to 50%.

I have an overdue order in my system.  Why is it not always 0%?

Because the calculation is an adjustment of a base rate of 50%, Simio needs a lot of evidence before it will guarantee that an order will be late (or on time for that matter).  If the user runs 1000 replications, and the result is late in all of them, Simio will reflect a 0% on time probability. 

What formula does Simio use to calculate the probability?

For the statistics experts, Simio uses a binomial proportion confidence interval formula known as the Wilson score interval.  We report the midpoint of the confidence interval as the risk measure.
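The formula itself is not spelled out here.  For reference, the midpoint of the standard Wilson score interval can be written as below.  The symbols are notation introduced for this sketch rather than Simio's own: k is the number of on-time replications, n is the total number of replications, and z is the normal critical value for the chosen confidence level.

```latex
\[
\tilde{p} \;=\; \frac{\hat{p} + \dfrac{z^{2}}{2n}}{1 + \dfrac{z^{2}}{n}}
\;=\; \frac{k + z^{2}/2}{n + z^{2}},
\qquad \hat{p} = \frac{k}{n}, \quad z \approx 1.96 \text{ at the 95\% confidence level.}
\]
```

In the second form, n = 0 gives exactly 1/2, which is consistent with the 50% base rate, and a single on-time replication (k = n = 1) gives roughly 0.60.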

Why not just report the outcome of the replications as the probability (e.g., if 9 of 10 are on time, report 90% on time probability)?

This was the original implementation.  However, it gives a false sense of confidence and can be misleading.  A single replication would always yield either 100% on time or 0% on time.  We wanted the answer to also give decision makers a sense of how confident they could be in the answer.  Using the Wilson Score, a single replication will yield a result of 60% at best and 40% at worst (using 95% confidence level).  This helps the decision maker identify that they have a very small sample of data and would encourage them to run additional replications. 

Can you give me an example of how this works?

Risk analysis can be demonstrated using any scheduling example.  It is best viewed in the Entity Gantt.  In the screenshots below, we’ve included 2 orders from the Candy Manufacturing Scheduling example.  One of the orders is overdue (will be late always), and the other has plenty of time (will be on time always).

The base rate is 50%.  After 1 replication, Simio updates the probabilities.  Order 1 now has a 60% on time probability.  Order 2 has a 40% on time probability.

After 2 replications, 67% and 33%:

After 5 replications, 78% and 22%:

After 100 replications, 98% and 2%:

Finally, after 1000 replications, 100% and 0%:
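The percentages in this example are consistent with the midpoint of a standard Wilson score interval at the 95% confidence level.  The following is a minimal sketch for checking them, not Simio's implementation; the function name wilson_midpoint and its parameters are our own.

```python
from statistics import NormalDist


def wilson_midpoint(on_time: int, replications: int, confidence: float = 0.95) -> float:
    """Center of the Wilson score interval for on_time successes in replications trials."""
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Algebraically equal to (p + z^2/(2n)) / (1 + z^2/n), but also defined for n = 0.
    return (on_time + z * z / 2) / (replications + z * z)


# Order 1 is on time in every replication; Order 2 is late in every replication.
for n in (0, 1, 2, 5, 10, 100, 1000):
    always_on_time = wilson_midpoint(n, n)
    always_late = wilson_midpoint(0, n)
    print(f"{n:>4} replications: {always_on_time:.0%} and {always_late:.0%}")
```

With zero replications the function returns the 50% base rate, and the remaining rows match the rounded values quoted above (60% and 40%, 67% and 33%, and so on up to 100% and 0%).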

How many replications should I run?

By default, we suggest 10 replications (and 95% confidence level).  With these settings, a risk measure of 86% is a good sign, while 14% is a bad one.  Beyond the default settings, there are several additional factors which are dependent on the situation and use case.  One of these factors is slack time (the time between estimated completion and due date).  On the Gantt, slack time is the distance between the grey marker and the green marker.  If the slack time is large, a single replication may suffice.  If the slack time is small, additional replications will help identify if the order is in trouble or not. 
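One way to reason about the replication count is to ask how high the risk measure can possibly go for a given number of replications.  The sketch below assumes the risk measure is the standard Wilson midpoint and finds the smallest number of replications at which an order that is on time every time reaches a target value.  The function name is hypothetical, and the sketch says nothing about slack time, which still has to be judged from the Gantt.

```python
from statistics import NormalDist


def replications_needed(target: float, confidence: float = 0.95) -> int:
    """Smallest n for which n on-time results out of n reach the target measure."""
    if not 0.5 < target < 1.0:
        raise ValueError("target must be strictly between 0.5 and 1")
    z2 = NormalDist().inv_cdf(1 - (1 - confidence) / 2) ** 2
    n = 1
    # Wilson midpoint for an all-on-time run: (n + z^2/2) / (n + z^2).
    while (n + z2 / 2) / (n + z2) < target:
        n += 1
    return n


print(replications_needed(0.86))  # ceiling reached with the default 10 replications
print(replications_needed(0.95))
```

Under these assumptions, 86% is the best value 10 replications can report, and a user who wants to see 95% for a consistently on-time order would need 35 replications.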

Now that I know my risk, what can I do about it?

Depending on your position in the organization (and therefore your decision rights), you can change either the design or operation of the system.  Example design changes include things like adding another assembly line or buying another forklift.  These changes are long term and may require approvals for capital expenditure (which the model facilitates by quantifying the impact of the expenditure).  Example operational changes include things like adding overtime, expediting a material, or changing order priorities, quantities, due dates, etc.  Bridging the gap between design and operation are the dispatching rules, which relate to overall business objectives.  They are also flexible parameters which control how Simio chooses the next job from a queue (e.g., earliest due date, least setup, critical ratio, etc.).  All of these parameters influence risk and can be changed, provided that the user has the authority to change them.

Will Simio choose the best design and operation for me?

Decision rights and business processes have far reaching consequences.  A floor manager can probably authorize overtime if the schedule looks risky.  He probably cannot buy a piece of equipment.  To change a priority or a due date, he probably needs to consult with the commercial team and/or account managers.  To expedite a material, he probably needs to communicate with the procurement team.  To make a capital expenditure (i.e., change system design), he probably needs executive/financial approval.  Our solution respects those boundaries.  We treat priorities, due dates, etc. as inputs rather than outputs.  Any of these parameters can be changed by the appropriate decision maker.  They should not be changed by the tool without consent.  Simio assists the decision maker (at any level in the organization) by exposing the true consequences.

With so many choices, how can I quickly explore the consequences across multiple scenarios?

The experiment runner is used to explore consequences (which we call Responses) across multiple scenarios where a user can influence the parameters mentioned above (which we call Controls).  If the solution space is very large (i.e., there are many controls with a wide range of acceptable values), we recommend using OptQuest to automate the search of the solution space based on single or multiple objectives (e.g., low cost and high service level).  OptQuest uses a Tabu search which learns how the control values influence the objectives as it explores the solution space.

How often should I run these types of experiments?

Experiments are most relevant to design choices.  Operational decisions have many hard constraints which cannot be easily influenced.  For example, though Simio will allow you to adjust material receipt dates of critical materials and show you the impact on the schedule, many of those dates are inflexible and outside the control of the planner or even the business.  If you ask OptQuest how much inventory you would like to have, it will tell you, but this information adds no value because it is not actionable in the short term.  The planners need to work with what they have and make the best of it.  In practical application, we recommend running large experiments to explore design decisions on a monthly or quarterly basis.

Effective Factory Scheduling with a Simio Digital Twin

Introduction

In today’s world, companies compete not only on price and quality, but on their ability to reliably deliver product on time.   A good production schedule, therefore, influences a company’s throughput, sales and customer satisfaction.  Although companies have invested millions in information technology for Enterprise Resource Planning (ERP) and Manufacturing Execution Systems (MES), the investment has fallen short on detailed production scheduling, causing most companies to fall back on manual methods involving Excel and planning boards.  Meanwhile, industry trends towards reduced inventory, shorter lead times, increased product customization, SKU proliferation, and flexible manufacturing are making the task more complicated.  Creating a feasible plan requires simultaneous consideration of materials, labor, equipment, and demand.  This bar is simply too high for any manual planning method.  The challenge of creating a reliable plan requires a digital transformation which can support automated and reliable scheduling.

Central to the idea of effective factory scheduling is the concept of an actionable schedule.  An actionable schedule is one that fully accounts for the detailed constraints and operating rules in the system and can therefore be executed in the factory by the production staff.   An issue with many scheduling solutions is that they ignore one or more detailed constraints, and therefore cannot be executed as specified on the factory floor.  A non-actionable schedule requires the operators to step in and override the planned schedule to accommodate the actual constraints of the system.   At this point the schedule is no longer being followed, and local decisions are being made that impact the system KPIs in ways that are not visible to the operators.

A second central idea of effective scheduling is properly accounting for variability and unplanned events in the factory and the corresponding detrimental impact on throughput and on-time delivery.  Most scheduling approaches completely ignore this critical element of the system, and therefore produce optimistic schedules that cannot be met in practice.   What starts off looking like a feasible schedule degrades over time as machines break, workers call off sick, materials arrive late, rework is required, etc.  The optimistic promises that were made cannot be kept.

A third consideration is the effect of an infeasible schedule on the supply chain plan.  Factory scheduling is only the final step in the production planning process, which begins with supply chain planning based on actual and/or forecast demand.   The supply chain planning process generates production orders and typically establishes material requirements for each planning period across the entire production network.  The production orders that are generated for each factory in the network during this process are based on a rough-cut model of the production capacity.  The supply chain planning process has very limited visibility of the true constraints of the factory, and the resulting production requirements often overestimate the capacity of the factory.  Subsequently, the factory schedulers must develop a detailed plan to meet these production requirements given the actual constraints of the equipment, workforce, etc.  The factory adjustments to make the plan actionable will not be transparent to the supply chain planners.  This creates a disconnect in a core business planning function where enormous spending occurs. 

In this paper we will discuss the solution to these challenges, the Process Digital Twin, and the path to get there.  The Simio Digital Twin solution is built on the patented Simio Risk-based Planning and Scheduling (RPS) software.   We will begin by describing and comparing the three common approaches to factory scheduling.  We will then discuss in detail the advantages of a process Digital Twin for factory scheduling built on Simio RPS.  

Factory Scheduling Approaches

Let’s begin by discussing the three most common approaches to solving the scheduling problem in use today:  1) manual methods using planning boards or spreadsheets, 2) resource models, and 3) the process Digital Twin.

Manual Methods

The most common method in use today for factory scheduling is the manual method, typically augmented with spreadsheets or planning boards.   Manual scheduling is typically not a company’s first choice but is the result of a failure to succeed with automated systems.

Manually generating a schedule for a complex factory is a very challenging task, requiring a detailed understanding of all the equipment, workforce, and operational constraints.  Five of the most frustrating drawbacks include:

  • It is difficult for a scheduler to consider all the critical constraints.   While schedulers can typically focus on primary constraints, they are often unaware – or must ignore – secondary constraints, and these omissions lead to a non-actionable schedule.
  • Manual scheduling typically takes hours to complete, and the moment any change occurs the schedule becomes non-actionable. 
  • The quality of the schedule is entirely dependent on the knowledge and skill of the scheduler.  If the scheduler retires or is out for vacation or illness, the backup scheduler may be less skilled and the KPIs may degrade.
  • It is virtually impossible for the scheduler to account for the degrading effect of variation on the schedule and therefore provide confident completion times for orders. 
  • As critical jobs become late, manual schedulers resort to bumping other jobs to accommodate these “hot” jobs, disrupting the flow and creating more “hot” jobs.  The flow of work becomes erratic and scheduling dissolves into firefighting.

Resource Model

Companies that utilize an automated method for factory scheduling typically use an approach based on a resource model of the factory.   A resource model is comprised of a list of critical resources with time slots allocated to tasks that must be processed by the resource based on estimated task times.   The resource list includes machines, fixtures, workers, etc., that are required for production.   The following is a Gantt chart depicting a simple resource model with four resources (A, B, C, D) and two jobs (blue, red).  The blue job has task sequence A, D, and B, and the red job has task sequence A and B.

The resources in a resource model are defined by a state that can be busy, idle, or off-shift.  When a resource is busy with one task or off-shift, other tasks must wait to be allocated to the resource (e.g. red waits for blue on resource A).  The scheduling tools that are based on a resource model all share this same representation of the factory capacity and differ only in how tasks are assigned to the resources.

The problem that all these tools share is an overly simplistic constraint model.   Although this model may work in some simple applications, there are many constraints in factories that can’t be represented by a simple busy, idle, off-shift state for a resource.  Consider the following examples:

  • A system has two cranes (A and B) on a runway that are used to move aircraft components to workstations.   Although crane A is currently idle, it is blocked by crane B and therefore cannot be assigned the task.
  • A workstation on production line 1 is currently idle and ready to begin a new task.   However, this workstation has only limited availability when a complex operation is underway on adjacent line 2.
  • An assembly operator is required for completing assembly.   There are assembly operators currently idle, but the same operator that was assigned to the previous task must also be used on this task, and that operator is currently busy.
  • A setup operator is required for this task.  The operator is idle but is in the adjacent building and must travel to this location before setup can start.
  • The tasks involve the flow of fluid through pipes, valves, and storage/mixing tanks, and the flow is limited by complex rules.
  • A job requires treatment in an oven, the oven is idle but not currently at the required temperature.

These are just a few examples of typical constraints for which a simple busy, idle, off-shift resource model is inadequate.  Every factory has its own set of such constraints that limit the capacity of the facility.

The scheduling tools that utilize a simple resource model allocate tasks to the resources using one of three basic approaches: heuristics, optimization, and simulation.

One common heuristic is job-sequencing that begins with the highest priority job, and assigns all tasks for that job, and repeats this process for each job until all jobs are scheduled (in the previous example blue is sequenced, then red).  This simple approach to job sequencing can be done in either a forward direction starting with the release date, or a backward direction starting with the due date.   Note that backward sequencing (while useful in master planning) is typically problematic in detailed scheduling because the resulting schedule is fragile and any disruption in the flow of work will create a tardy job.  This simple one-job-at-a-time sequencing heuristic cannot accommodate complex operating rules such as minimizing changeovers or running production campaigns based on attributes such as size or color.  However, there have been many different heuristics developed over time to accommodate special application requirements.  Examples of scheduling tools that utilize heuristics include Preactor from Siemens and PP/DS from SAP.
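The forward version of this one-job-at-a-time heuristic can be sketched in a few lines.  This is an illustration only and does not represent how any named product works.  The task durations are invented, shifts are ignored, and each task is simply placed after the last task already on its resource (no filling of earlier gaps).

```python
from collections import defaultdict

# Jobs in priority order; each task is (resource, duration). Durations are invented.
jobs = {
    "blue": [("A", 2), ("D", 1), ("B", 2)],
    "red": [("A", 1), ("B", 2)],
}


def forward_sequence(jobs, release=0):
    """Schedule one job at a time, each task as early as its resource allows."""
    resource_free = defaultdict(lambda: release)  # when each resource next becomes idle
    schedule = []
    for job, tasks in jobs.items():  # dicts preserve insertion (priority) order
        ready = release  # when the job's previous task finished
        for resource, duration in tasks:
            start = max(ready, resource_free[resource])
            end = start + duration
            schedule.append((job, resource, start, end))
            resource_free[resource] = end
            ready = end
    return schedule


schedule = forward_sequence(jobs)
for job, resource, start, end in schedule:
    print(f"{job:<5} on {resource}: {start} to {end}")
```

Because blue is sequenced first, red waits for blue on resource A and again on resource B.  Nothing in this loop can express a rule such as grouping jobs by color to reduce changeovers, which is the limitation described above.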

The second approach to assigning tasks to resources in the resource model is optimization, in which the task assignment problem is formulated as a set of sequencing constraints that must be satisfied while meeting an objective such as minimizing tardiness or cost.   The mathematical formulation is then “solved” using a Constraint Programming (CP) solver.  The CP solver uses heuristic rules for searching for possible task assignments that meet the sequencing constraints and improve the objective.  Note that there is no known algorithm that can optimize the mathematical formulation of the task assignment for the resource model in a reasonable time (this problem is technically classified as NP-hard), and hence the available CP solvers rely on heuristics to find a “practical” but not optimal solution.   In practice, the optimization approach has limited application because long run times (hours) are often required to get to a good solution.   Although PP/DS incorporates the CP solver from ILOG to assign tasks to resources, most installations of PP/DS rely on the available heuristics for task assignments.

The third approach to assigning tasks in the simple resource model is a simulation approach.   In this case we simulate the flow of jobs through the resource model of the factory and assign tasks to available resources using dispatching rules such as smallest changeover or earliest completion.   This approach has several advantages over the optimization approach.   First, it executes much faster, producing a schedule in minutes instead of hours.  Another key advantage is that it can support custom decision logic for allocating tasks to resources.  An example of a tool that utilizes this approach is Preactor 400 from Siemens.
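To make the role of a dispatching rule concrete, here is a minimal sketch of a single resource simulated forward in time, where a pluggable rule chooses the next job each time the resource becomes free.  The job data, the one-hour changeover, and all names are assumptions for illustration; commercial tools are far richer than this.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    color: str
    due: int
    processing: int = 2


CHANGEOVER = 1  # assumed setup time when the color changes


def changeover(last_color, job):
    return 0 if last_color in (None, job.color) else CHANGEOVER


def simulate(jobs, rule):
    """Run one resource forward in time; the rule ranks the waiting jobs."""
    waiting, clock, last_color = list(jobs), 0, None
    order, total_changeover = [], 0
    while waiting:
        # Dispatching decision: pick the best waiting job according to the rule.
        job = min(waiting, key=lambda j: rule(last_color, j))
        waiting.remove(job)
        setup = changeover(last_color, job)
        clock += setup + job.processing
        total_changeover += setup
        order.append((job.name, clock))  # job name and its completion time
        last_color = job.color
    return order, total_changeover


def earliest_due_date(last_color, job):
    return job.due


def smallest_changeover(last_color, job):
    return (changeover(last_color, job), job.due)  # ties broken by due date


jobs = [Job("J1", "red", 10), Job("J2", "blue", 5), Job("J3", "red", 6), Job("J4", "blue", 7)]
edd_order, edd_setup = simulate(jobs, earliest_due_date)
sc_order, sc_setup = simulate(jobs, smallest_changeover)
```

With this data, earliest due date alternates colors and spends three hours on changeovers, while smallest changeover groups the blue jobs and spends one.  Swapping the rule is a one-line change, which is the kind of custom decision logic this approach makes easy.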

Regardless of which approach is used to assign tasks to resources, the resulting schedule assumes away all random events and variation in the system.  Hence the resulting schedules are optimistic and lead to overpromising of delivery times to customers.  These tools provide no mechanism for assessing the risk associated with the schedule.

Digital Twin

The third and latest approach to factory scheduling is a process Digital Twin of the factory.  A Digital Twin is a digital replica of the processes, equipment, people, and devices that make up the factory and can be used for both system design and operation.  The resources in the system not only have a busy, idle, and off-shift state, but they are objects that have behaviors and can move around the system and interact with the other objects in the model to replicate the behavior and detailed constraints of the real factory. The Digital Twin brings a new level of fidelity to scheduling that is not available in the existing resource-based modeling tools.

Simio Digital Twin

The Simio Digital Twin is an object-based, data driven, 3D animated model of the factory that is connected to real time data from the ERP, MES, and related data sources.   We will now summarize the key advantages of the Simio Digital Twin as a factory scheduling solution.

Dual Use: System Design and Operation

Although the focus here is on enhancing throughput and on-time delivery by better scheduling using the existing factory design, unlike traditional scheduling tools, the Simio Digital Twin can also be used to optimize the factory design.  The same Simio model that is used for factory scheduling can be used to test out changes to the facility such as adding new equipment, changing staffing levels, consolidating production steps, adding buffer inventory, etc.

Actionable Schedules

A basic requirement of any scheduling solution is that it provide actionable schedules that can be implemented in the real factory.   If a non-actionable production schedule is sent to the factory floor, the production staff have no choice but to ignore the schedule and make their own decisions based on local information.

For a schedule to be actionable, it must capture all the detailed constraints of the system.  Since the foundation of the Simio Digital Twin is an object-based modeling tool, the factory model can capture all these constraints in as much detail as necessary.  This includes complex constraints such as material handling devices, complex equipment, workers with different skill sets, and complex sequencing requirements.

In many systems there are operating rules that have been developed over time to control the production processes.  These operating rules are just as important to capture as the key system constraints; any schedule that ignores these operating rules is non-actionable.  The Simio modeling framework has flexible rule-based decision logic for implementing these operating rules.  The result is an actionable schedule that respects both the physical constraints of the system as well as the standard operating rules.    

Fast Execution

In most organizations, the useful life of a schedule is short because unplanned events and variation occur that make the current schedule invalid.   When this occurs, a new schedule must be generated and distributed as quickly as possible to keep production running smoothly.  A manual or optimization-based approach to schedule regeneration that takes hours to complete is not practical; in this case the shop floor operators will take over and implement their own local scheduling decisions that may not be aligned with the system-wide KPIs.  When random events occur, the Simio Digital Twin can quickly respond and generate and distribute a new actionable schedule.  Schedule regeneration can either be manually triggered by the scheduler, or automatically triggered by events in the system.

3D Animated Model and Schedule

In other scheduling systems the only graphical view of the model and schedule is the resource Gantt chart.  In contrast, the Simio Digital Twin provides a powerful communication and visualization of both the model structure and resulting schedule.  Ideally, anyone in the organization – from the shop floor to the top floor – should be able to view and understand the model well enough to validate its structure.  A good solution improves not only the ability to generate an actionable schedule, but to visualize it and explain it across all levels of the organization. 

The Simio Gantt chart has a direct link to the 3D animated facility; right click on a resource along the time scale in the Gantt view and you instantly jump to an animated view of that portion of the facility, showing the machines, workers, and work in process at that point in time in the schedule.  From that point you can simulate forward in time and watch the schedule unfold as it will in the real system.  The benefits of the Simio Digital Twin begin with its accurate and fast generation of an actionable schedule.  But the benefits culminate in the Digital Twin’s ability to communicate its structure, its model logic, and its resulting schedules to anyone that needs to know.

Risk Analysis

One of the key shortcomings of scheduling tools is their inability to deal with unplanned events and variation.   In contrast, the Simio Digital Twin can accurately model these unplanned events and variations to not only provide a detailed schedule, but also analyze the risk associated with the schedule.

When generating a schedule, the random events/variations are automatically disabled to generate a deterministic schedule.  Like other deterministic schedules it is optimistic in terms of on-time completions.  However, once this schedule is generated, the same model is executed multiple times with the events/variation enabled, to generate a random sampling of multiple schedules based on the uncertainty in the system.   The set of randomly generated schedules is then used to derive risk measures, such as the likelihood that each order will ship on time.  These risk measures are directly displayed on the Gantt chart and in related reports.   This lets the scheduler know in advance which orders are risky and take action to make sure important orders have a high likelihood of shipping on time.
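The two-pass idea (one deterministic run for the plan, then repeated runs with variation for the risk measure) can be sketched as follows.  Everything here is a simplifying assumption: the orders are independent with no shared resources, each task randomly takes 100% to 150% of its nominal time, and the risk measure is taken to be the Wilson midpoint at 95% confidence.  A real model would replay the full factory logic in every replication.

```python
import random
from statistics import NormalDist

# Hypothetical orders: nominal task durations (hours) and a due date (hours from now).
orders = {
    "Order 1": {"tasks": [4, 3, 3], "due": 100},  # plenty of slack
    "Order 2": {"tasks": [4, 3, 3], "due": 5},    # already overdue
}


def completion(tasks, rng=None):
    """Deterministic completion time, or a sampled one when rng is given."""
    if rng is None:
        return sum(tasks)  # variation disabled: the published schedule
    # Assumed variation: each task takes 100% to 150% of its nominal time.
    return sum(t * rng.uniform(1.0, 1.5) for t in tasks)


def wilson_midpoint(on_time, n, confidence=0.95):
    z2 = NormalDist().inv_cdf(1 - (1 - confidence) / 2) ** 2
    return (on_time + z2 / 2) / (n + z2)


def risk_measures(orders, replications=10, seed=1):
    rng = random.Random(seed)
    on_time = {name: 0 for name in orders}
    for _ in range(replications):  # each replication is one randomly sampled schedule
        for name, order in orders.items():
            if completion(order["tasks"], rng) <= order["due"]:
                on_time[name] += 1
    return {name: wilson_midpoint(k, replications) for name, k in on_time.items()}


plan = {name: completion(order["tasks"]) for name, order in orders.items()}
risk = risk_measures(orders)
```

The deterministic plan shows both orders completing after 10 hours, which by itself says nothing about confidence.  The replications add that information: the order with ample slack is on time in every sample and the overdue order in none, whatever the seed.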

Constraint Analysis

It’s not uncommon that the supply chain planning process, which is based on a rough-cut capacity model of the factory, sends more work to a production facility than can be easily produced given the true capacity and operational constraints of the facility.   When this occurs, the resulting detailed schedule will have one or more late jobs and/or jobs with high risk of being late.   The question then arises as to what actions can be taken by the scheduler to ensure that the important jobs are all delivered on schedule.

Although other scheduling approaches generate a schedule, the Simio Digital Twin goes one step further by also providing a constraint analysis detailing all the non-value added (NVA) time that is spent by each job in the system.  This includes time waiting for a machine, an operator, material, a material handling device, or any other constraint that is impeding the production of the item.   Hence if the schedule shows that an item is going to be late, the constraint analysis shows what actions might be taken to reduce the NVA time and ship the product on time.  For example, if the item spends a significant time waiting for a setup operation, scheduling overtime for that operator may be warranted. 
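As a rough illustration of what such an analysis summarizes, the sketch below totals the waiting time in one job's history by the constraint that caused it.  The trace format, the "wait:" prefix, and the numbers are invented for this example and do not reflect Simio's actual output format.

```python
from collections import Counter

# Hypothetical trace for one job: (activity, hours). "wait:" entries are NVA time.
trace = [
    ("wait:material", 1.5),
    ("process:cut", 2.0),
    ("wait:setup operator", 4.0),
    ("process:assemble", 3.0),
    ("wait:machine", 0.5),
    ("wait:setup operator", 2.0),
]


def nva_by_constraint(trace):
    """Total waiting hours per constraint, largest first."""
    totals = Counter()
    for activity, hours in trace:
        kind, _, cause = activity.partition(":")
        if kind == "wait":
            totals[cause] += hours
    return totals.most_common()


nva = nva_by_constraint(trace)
for cause, hours in nva:
    print(f"{cause}: {hours} h")
```

In this made-up trace the setup operator accounts for 6 of the 8 waiting hours, which points to the overtime decision mentioned above.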

Multi-Industry

Although scheduling within the four walls of a discrete production facility is an important application area, there are many scheduling applications beyond discrete manufacturing.   Many manufacturing applications involve fluid flows with storage/mixing tanks, batch processing, as well as discrete part production.  In contrast to other scheduling tools that are limited in scope to discrete manufacturing, the Simio Digital Twin has been applied across many different application areas including mixed-mode manufacturing, and areas outside of manufacturing such as logistics and healthcare.  These applications are made possible by the flexible modeling framework of Simio RPS.

Flexible Integration

A process Digital Twin is a detailed simulation model that is directly connected to real time system data. Traditional simulation modeling tools have limited ability to connect to real time data from ERP, MES, and other data sources.  In contrast, Simio RPS is designed from the ground up with data integration as a primary requirement.

Simio RPS supports a Digital Twin implementation by providing a flexible relational in-memory data set that can directly map to both model components and to external data sources.  This approach allows for direct integration with a wide range of data sources while enabling fast execution of the Simio RPS model.    

Data Generated Models

In global applications there are typically multiple production facilities located around the world that produce the same products.  Although each facility has its own unique layout there is typically significant overlap in terms of resources (equipment, workers, etc.) and processes.   In this case Simio RPS provides special features to allow the Digital Twin for each facility to be automatically generated from data tables that map to modeling components that describe the resources and processes.   This greatly simplifies the development of multiple Digital Twins across the enterprise and also supports the reconfiguring of each Digital Twin via data table edits to accommodate ongoing changes in resources and/or processes.

Forward Scheduling: What is it and how does it differ from backwards scheduling?

Simio is a forward scheduling simulation engine.  We do not support backwards scheduling.  We have found the backwards scheduling approach fails to represent reality, thus generating an infeasible plan that is unhelpful to planners.  Many of our customers have learned this lesson the hard way. 

The underlying principle of forward scheduling is feasibility first.  A schedule is built looking forwards considering all the constraints and conditions of the system (e.g., resource availability, inventory levels, work in progress, etc.).  The schedule is optimized in run time while only considering the set of feasible choices available at that time.  Decisions are made according to user specified dispatching rules (the same as backwards scheduling).  The output is a detailed schedule that reflects what is possible and tells the planner how to achieve it.  As in real life, a planner can only choose when to start an operation.  Completion date is an outcome, not a user specified input.

The most salient technical difference between the two approaches is material availability (both raw and intermediate manufactured materials).  A forward-looking schedule makes no assumptions.  If materials are available, a finished good can be produced.  Otherwise, it cannot.  If the materials must be ordered or manufactured, the system will order them or manufacture them before the finished good can start.  A backwards schedule plans the last operation first, assuming that materials will be available (we have yet to find an environment where future inventory can be accurately forecast).  If the materials must be produced or purchased, it will try to schedule or order them prior, hoping that the start date isn’t yesterday.  If the clock is wound backwards from due date all the way to present time, the resulting schedule shows the planner what their current stockpile and on-order inventory would have to be to execute the idealized plan.  It does not tell the planner what they could do with their actual stockpile and on-order inventory.

Next consider a situation where demand exceeds plant capacity (this is reality for most of our customers).  The plant cannot produce everything that the planner wants.  The planner must choose amongst the alternatives and face the tradeoffs.  Forward scheduling deals with this situation by continuing to schedule into the future, past the due date, showing the planner which orders will be late.  By adjusting the dispatching rules, priorities, and the release dates, the planner can improve the schedule until they reach a satisfactory alternative.  Every alternative is a valid choice and feasible for execution.  Backwards scheduling deals with this situation by continuing to schedule into the past, showing the planner which orders should have been produced yesterday.  The planner must tweak and adjust dispatching rules and due dates until finding a feasible alternative.  In our experience, the planner can make the best decision by comparing multiple feasible plans, rather than searching for a single one.

Any complete scheduling solution must also be capable of rescheduling.  Rescheduling can be triggered by any number of random events that occur daily.  In rescheduling, the output must respect work in progress.  Forward scheduling loads WIP first, making the resource unavailable until the WIP is complete.  Backwards scheduling loads WIP last, if at all.  Imagine building a weekly schedule backwards in time, hoping that the “ending” point exactly equals current plant WIP.  The result is often infeasible.

In terms of feasibility, the advantages of forward scheduling are clear.  But we also get questions about optimization, particularly around JIT delivery.  A quick Google search on forward scheduling reveals literature and blog posts that describe forward scheduling as “as early as possible” (meaning a forward schedule starts an operation as soon as a resource is available, regardless of when the order is due).  This is false.  Forward scheduling manages the inventory of finished goods the same way the plant does.  A planner specifies a release date as a function of due date (or in some cases specifies individual release dates for each order).  In forward scheduling, no order is started prior to its release date.  The power of this approach is experimentation.  Changing lead time is as easy as typing in a different integer and rescheduling.  As above, the result is a different feasible alternative which makes the tradeoff transparent.  Shorter lead times minimize inventory of finished goods but increase late deliveries and vice versa.  We have found many customers focus on short lead times based on financial goals rather than operational goals.  Inventory ties up cash.  Typically, the decision to focus on cash is made without quantifying the tradeoff.  We provide decision makers with clear cut differences between operational strategies so that they can choose based on complete information.
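The lead time tradeoff can be illustrated with a deliberately small sketch: a single resource, three invented orders, and a release date computed as due date minus lead time.  This is not the Simio engine, and the order data and function name are assumptions.  It only shows how changing one integer produces a different feasible schedule.

```python
# Hypothetical orders on a single resource: (name, due day, processing days).
ORDERS = [("O1", 10, 3), ("O2", 12, 3), ("O3", 14, 3)]


def forward_schedule(orders, lead_time):
    """Release each order lead_time days before its due date, then schedule forward."""
    released = sorted(orders, key=lambda o: o[1] - lead_time)
    resource_free = 0
    early_days = late_days = 0
    for name, due, processing in released:
        release = due - lead_time
        start = max(release, resource_free)  # never before the release date
        finish = start + processing          # completion is an outcome, not an input
        resource_free = finish
        early_days += max(0, due - finish)   # finished goods sitting in inventory
        late_days += max(0, finish - due)
    return early_days, late_days


short_lead = forward_schedule(ORDERS, lead_time=3)
long_lead = forward_schedule(ORDERS, lead_time=8)
print("lead time 3:", short_lead)
print("lead time 8:", long_lead)
```

With a lead time of 3 days the orders carry no finished goods inventory but accumulate 3 late days in total.  With a lead time of 8 days nothing is late, at the cost of 12 days of finished goods sitting in inventory.  Both schedules are feasible; the choice between them is a business decision.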

Forward scheduling is reality.  It properly represents material flows and constraints, plant capacity, and work in progress.  It manages the plant the same way a planner does.  Accordingly, it generates sets of feasible alternatives that quantify tradeoffs for planners and executive decision makers alike.  It answers the question “What should the plant do next?” as opposed to “What should the plant have done before?”  We’ve found the feasibility first approach is the most helpful to a planner and therefore the most valuable to a business.