How Much Data Do I Need?

I have discussed data issues in several previous articles. People are often confused about how much data they really need. In particular, I frequently hear the refrain “Simulation requires so much data, but I don’t have enough data to feed it.” So let’s examine a situation where you have, say, 40% of the data you would like to have in order to make a sound decision, and consider the choices.

1) You can possibly defer the decision. In many cases no decision is a decision in itself because the decision will get made by the situation or by others involved. But if you truly do have the opportunity to wait and collect more data before making the decision, then you must measure the cost of waiting against the potential better decision that you might make with better data. But either way, after waiting you still have all of the following options available.

2) Use “seat of the pants” judgment and just decide based on what you know. This approach compounds the lack of data by also ignoring problem complexity and ignoring any analytic approach. (Ironically enough this approach often ignores the data you do have.) You make a totally subjective call, often heavily biased by politics. There is no doubt that some highly experienced people can make judgment calls that are fairly good. But it is also true that many judgment calls turn out to be poor and could have benefited greatly from a more analytical and objective approach.

3) Use a spreadsheet or other analytical approach that doesn’t require so much data. On the surface this sounds like a good idea and in fact, there is a set of problems for which spreadsheets are certainly the best (or at least an appropriate) choice. But for the modeling problems we typically come across, spreadsheets have two very significant limitations: they cannot deal with system complexity and they cannot adequately deal with system variability. With this approach you are simply “wishing away” the need for the missing data. You are not only making the decision without that data, but you are pretending that the missing data is not important to your decision. An oversimplified model that doesn’t consider variability or system complexity and ignores the missing data … doesn’t sound like the makings of a good decision.

4) Simulate with the data you have. No model is ever perfect. Your intent is generally to build a model to meet your project objectives to the best of your ability given the time, resources, and data available. We can probably all agree that better and more complete data results in a more accurate, complete, and robust model. But model value is not true or false (valuable or worthless); rather, it is a graduated scale of increasing value. Referring back to that variability problem, it is much better to model with estimates of variability than to just use a constant. Likewise, a model based on 40% of the data won’t provide nearly the results of one with all of the desired data, but it will still outperform the analytical techniques that are not only missing that same data, but are also missing the system complexity and variability.

And unlike the other approaches, simulation not only avoids ignoring the missing data, but can also help you identify its impact and prioritize the opportunities to collect more. For example, some products have features that will help you assess the impact of guesses on your key outputs (KPIs). They also have features that can help assess where you should focus your data collection efforts, expanding small samples or data sets, to most improve your model accuracy. And all simulations provide what-if capability you can use to evaluate best and worst case possibilities.

Perfection is the enemy of success. You can’t stop making decisions while you wait for perfect data. But you can use tools that are resilient enough to provide value with limited data. Especially if those same tools will help you better understand the value of both the existing and the missing data.

Happy modeling!

Dave Sturrock
VP Operations – Simio LLC

Can Simulations Model Chaos?

Can chaotic systems be predicted? I guess we first need to agree on exactly what a chaotic system is.

BusinessDictionary.com defines it as a
“Complex system that shows sensitivity to initial conditions, such as an economy, a stockmarket, or weather. In such systems any uncertainty (no matter how small) in the beginning will produce rapidly escalating and compounding errors in the prediction of the system’s future behavior.”

It is hard to imagine a complex system that does not show sensitivity to initial conditions. If the follow-on statement is true, then there is little point to ever trying to model or predict the behavior of such a system because it is not predictable. But it is not hard to find counter-examples, even to the examples they provided. Meteorologists do a reasonable job predicting the weather; it depends on your standards of accuracy. Certainly they can predict fairly accurately the likelihood of a 90-degree day in January in Canada, or anticipate the path of a tropical storm for the next 12 hours.

A less technical but perhaps more useful definition comes from membrane.com:
“A chaotic system is one in which a tiny change can have a huge effect.”
That leads us toward a more practical definition for our purposes.

For the types of systems we normally model, I would propose yet another definition.
A chaotic system is one in which it is likely that seemingly trivial changes in the initial conditions would cause significant changes in the predicted results, over the time frame being considered.

This definition, while not technically rigorous, acknowledges that most of us rarely have the opportunity or the need to deal in absolutes. We live in a world where the majority of decisions are made subjectively (“Joe has 20 years experience and he says…”) or with gross simplification (“Of course I can model that in a spreadsheet…”). In this world, being able to base a decision on a simulation model with better accuracy and objectivity can help realize tremendous savings, even if it is still only an approximation and only useful within specified parameters.
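To make the idea of sensitivity "over the time frame being considered" concrete, here is a minimal Python sketch. It uses the logistic map, a standard textbook example of chaotic behavior that is my choice for illustration and is not one of the systems discussed above. The function names, the starting value 0.2, and the 1e-10 nudge are all hypothetical. The sketch runs the same system twice from almost identical starting points and reports how far apart the two runs are at several points in time.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value, starting with x0."""
    values = [x0]
    for _ in range(steps):
        current = values[-1]
        values.append(r * current * (1.0 - current))
    return values


def divergence(x0, delta, steps, r=4.0):
    """Absolute gap, step by step, between runs started at x0 and x0 + delta."""
    base = logistic_trajectory(x0, steps, r)
    nudged = logistic_trajectory(x0 + delta, steps, r)
    return [abs(a - b) for a, b in zip(base, nudged)]


# Hypothetical starting point and a seemingly trivial change to it.
gaps = divergence(x0=0.2, delta=1e-10, steps=60)

for step in (5, 10, 20, 40, 60):
    print(f"step {step:2d}: gap between the two runs = {gaps[step]:.3e}")
```

Over a short horizon the two runs are practically indistinguishable, so a prediction is still useful. Over a long horizon the gap grows to the same size as the values themselves. Whether a system counts as chaotic for your purposes depends on which of those horizons your decision needs.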

Can we accurately predict true chaotic systems? By strict definition clearly not. And even by my definition, there will be some systems that are just too chaotic to allow any predictions to be useful.

But can we provide useful predictions of most common systems, even those with some chaotic aspects? Absolutely yes. Every model is an approximation of a real or intended system. Part of our job as modelers is to ensure that the model is close enough to provide useful insight. A touch of chaos just makes that more interesting. 🙂

Dave Sturrock
VP Products – Simio LLC

Predicting Process Variability

Systems rarely perform exactly as predicted. A person doing a task may take six minutes one time and eight minutes the next. Sometimes variability is due to outside forces, like materials that behave differently based on ambient humidity. Some variability is fairly predictable, such as a tool that cuts slower as it gets dull with use. Other variability seems much more random, such as a machine that fails every now and then. Collectively we will refer to these as process variability.

How good are you at predicting the impact of process variability? Most people feel that they are fairly good at it.

For example, if someone asked you what is the probability of rolling a three in one roll of a common six-sided die, you could probably correctly answer one in six (17%). Likewise, you could probably answer the likelihood of flipping a coin twice and having it come up heads both times: one in four (25%).

But what about even slightly more complex systems? Say you have a single teller at a bank who always serves customers in exactly 55 seconds and customers come in exactly 60 seconds apart. Can you predict the average customer waiting time? I am always surprised at how many professionals get even this simple prediction wrong. (If you want to check your answer, look to the comment attached to this article.)

But let’s say that those times above are variable as they might be in a more typical system. Assume that they are average processing times (using exponential distributions for simplicity). Does that make a difference? Does that change your answer? Do you think the average customer would wait at all? Would he wait less than a minute? Less than 2 minutes? Less than 5 minutes? Less than 10 minutes? I have posed this problem many times to many groups and in an average group of 40 professionals, it is rare for even one person to answer these questions correctly.

This is not a tough problem. In fact this problem is trivial compared to even the smallest, simplest manufacturing system. And yet those same people will look at a work group or line containing five machines and feel confident that they can predict how a random downtime will impact overall system performance. Now extend that out to a typical system with all its variability in processing times, equipment failures, repair times, material arrivals, and all the other common variability. Can anyone predict its performance? Can anyone predict the impact of a change?

With the help of simulation, you can.

This simple problem can be easily solved with either queuing theory or a simple model in your favorite simulation program. More complex problems will require simulation. After using your intuition to guess the answer, I’d suggest that you determine the correct answer for yourself. If you want to check your answer, look at the comment attached to this article.
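If you would like to work out the answer yourself, the following Python sketch shows one way to do it. It is an illustration under stated assumptions, not the output of any particular simulation product. The function names are hypothetical. The teller is assumed to start idle with an empty line, and customers are served first-come, first-served. The first function applies the standard queuing theory formula for a single server with exponential arrival and service times (mean wait in queue = utilization × mean service time / (1 − utilization)). The second runs a small simulation that tracks each customer's wait from the previous customer's wait, service time, and the gap until the next arrival.

```python
import random


def mm1_mean_wait(mean_interarrival, mean_service):
    """Queuing theory: mean wait in line for one server, exponential times."""
    utilization = mean_service / mean_interarrival
    if utilization >= 1.0:
        raise ValueError("server is overloaded; the line grows without bound")
    return utilization * mean_service / (1.0 - utilization)


def simulated_mean_wait(mean_interarrival, mean_service, customers,
                        variable=True, seed=1):
    """Simulate one teller and return the average time customers wait in line."""
    rng = random.Random(seed)
    wait = 0.0          # assumption: the first customer finds the teller idle
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        if variable:
            # expovariate takes a rate, which is 1 / mean
            service = rng.expovariate(1.0 / mean_service)
            gap = rng.expovariate(1.0 / mean_interarrival)
        else:
            service, gap = mean_service, mean_interarrival
        # The next customer waits for whatever work is left when they arrive.
        wait = max(0.0, wait + service - gap)
    return total_wait / customers


constant = simulated_mean_wait(60, 55, 1_000, variable=False)
theory = mm1_mean_wait(60, 55)
simulated = simulated_mean_wait(60, 55, 200_000)

print(f"constant times, simulated: {constant / 60:.2f} minutes")
print(f"variable times, theory:    {theory / 60:.2f} minutes")
print(f"variable times, simulated: {simulated / 60:.2f} minutes")
```

Because the teller is busy most of the time, the simulated estimate settles slowly, so expect it to move by a noticeable amount if you change the seed or the number of customers. Try a few different seeds before trusting a single run.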

And the next time you or someone you know is tempted to predict system performance, I hope you will remember how well you did at predicting performance of a trivial system. Then use simulation for an accurate answer.

Dave Sturrock
VP Products – Simio LLC