After publishing my probabilistic pitfalls article, my thirteen-year-old son (who apparently follows me on LinkedIn; who knew?) came and told me I was the grumpy old man of risk and uncertainty. I had wittered on and on about what everyone was doing wrong, but made no effort to say what people should be doing instead. Fair comment, I thought. Here's a go at putting that right.
1) Lookback analysis
Always use uncertainty ranges in lookback analysis, including on probability plots and percentile plots. Not only will they help show whether deviations are the result of poor practice or just statistical noise, they can also be used to surface more subtle biases such as over-confidence (as opposed to optimism) and vagueness or thresholding. See my model lookback analysis for examples, as well as this article and this presentation for a taxonomy of the different kinds of bias we can find in a lookback.
Use plots with constant sequence lengths (like the sliding window plot). The commonly used cumulative sequence plots can be extremely misleading, as the sequence length varies across the plot (there I go again). See my model lookback analysis for examples.
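As a sketch of what such a constant-length comparison might look like, here is a sliding-window check where the realised success rate in each window is compared against the rate implied by the pre-drill probabilities, with an approximate 90% band from the sum-of-Bernoullis variance. The function name, window length, and normal approximation are my own choices, not a prescription:

```python
import numpy as np

def sliding_window_lookback(pred_probs, outcomes, window=20):
    """For each constant-length window, compare the realised success
    rate with the rate implied by the pre-drill probabilities, plus an
    approximate 90% band from the sum-of-Bernoullis variance
    (normal approximation; reasonable for windows of ~20 or more)."""
    p = np.asarray(pred_probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    rows = []
    for i in range(len(y) - window + 1):
        pw, yw = p[i:i + window], y[i:i + window]
        expected = pw.mean()
        sd = np.sqrt((pw * (1.0 - pw)).sum()) / window
        rows.append((yw.mean(), expected,
                     expected - 1.645 * sd, expected + 1.645 * sd))
    return rows

# toy usage: constant 40% predictions against simulated outcomes
rng = np.random.default_rng(1)
probs = np.full(40, 0.4)
wells = (rng.random(40) < 0.4).astype(int)
rows = sliding_window_lookback(probs, wells)
```

A realised rate wandering outside the band flags something worth investigating; one sitting inside it is just statistics.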
Use bias modelling and Bayesian inference to understand your company's systematic bias: are you optimistic or pessimistic, over-confident or unnecessarily vague? This analysis shows where you lie on these axes, as well as the uncertainty in that evaluation, which inevitably arises if there aren't many wells in your lookback. See slides 5 and 10 in my model lookback analysis.
Send me your pre-drill predictions and final outcomes, and I will send you a PDF with a full lookback analysis (like this one), as well as a zip file with all the plots.
2) P99 and P1
Try to parameterize distributions with expected value and variance, rather than percentiles. And if you must use percentiles, use P90 / P10, P85 / P15 or P80 / P20.
Expected values and - to a lesser degree - variances are much easier to infer from and check against historical data.
They are also, typically, what matters in an evaluation, in the sense that if evaluations are used for portfolio prioritization then all other information than the expected value and variance tends to get washed away in the portfolio aggregation.
The exception to this is log-normal distributions and their components. Here, it is the expected value and variance of the logarithm that matter. Practically, this means we should parameterize with median and P10/P90 ratio.
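Backing out the underlying parameters and the expected value from a median and a P10/P90 ratio is a couple of lines in Python. This sketch assumes the resource convention that P10 is the high case and P90 the low case; the numbers in the usage line are illustrative:

```python
import math

Z90 = 1.281551565545  # standard normal 90th percentile

def lognormal_from_median_ratio(median, p10_over_p90):
    """Recover (mu, sigma) of the underlying normal, plus the expected
    value, from the median and the P10/P90 ratio of a log-normal.
    Convention: P10 is the high case, P90 the low (P10 >= P90)."""
    mu = math.log(median)
    sigma = math.log(p10_over_p90) / (2 * Z90)
    mean = math.exp(mu + sigma ** 2 / 2)
    return mu, sigma, mean

# e.g. a resource with median 100 and a P10/P90 ratio of 4
mu, sigma, mean = lognormal_from_median_ratio(100.0, 4.0)
```

Note the expected value always sits above the median, and the gap grows with the P10/P90 ratio.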
3) Weakest link and unwarranted independence
Draw the probability tree for the risk elements you are assessing and recognize that as you go down the tree each event of which you are assessing the probability is conditioned on the success of all the prior events on the tree.
You’re still multiplying probabilities together, but each probability is conditioned on the preceding events, so you now have full opportunity to account for all dependencies.
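A minimal sketch of that chained multiplication; the element names and the probabilities are invented for illustration:

```python
# Chained conditional probabilities down the tree: each element is
# assessed conditional on everything above it having succeeded.
# Names and numbers are purely illustrative.
conditionals = {
    "trap":                    0.7,  # P(trap)
    "reservoir | trap":        0.8,  # P(reservoir | trap)
    "charge | reservoir, trap": 0.5,
    "seal | all above":        0.9,
}

pos = 1.0
for element, p in conditionals.items():
    pos *= p

print(f"Probability of success: {pos:.3f}")
```

Because every factor is explicitly conditional, a dependency (say, charge being more likely given a working reservoir) lives in the number you assess, not in a correlation bolted on afterwards.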
4) The use of 50% probability in the absence of any information
If you are in a known hydrocarbon province then historical rates are excellent prior information on which to base probabilities.
If you are boldly drilling where no man has drilled before then you will need to use analogues. Bear in mind, though, that the province you are now analyzing is an instance of the set of all potential provinces, both those we have drilled and those we passed over.
Ideally use this prior information as a starting point in a Bayesian analysis, where the specific data you have is used to refine those probabilities successively. Ask me how.
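A minimal sketch of that updating, using beta-binomial conjugacy; all the counts here are invented for illustration:

```python
# Historical province success rate as a Beta prior, updated with
# local drilling results (beta-binomial conjugacy).
# Prior: 12 successes in 40 historical wells in the province.
a0, b0 = 12, 40 - 12

# Local data: 1 success in 3 nearby wells.
s, n = 1, 3

# Conjugate update: add successes and failures to the prior counts.
a, b = a0 + s, b0 + (n - s)
posterior_mean = a / (a + b)

print(f"Prior mean {a0 / (a0 + b0):.3f} -> posterior mean {posterior_mean:.3f}")
```

With only three local wells the posterior barely moves off the historical rate, which is exactly the point: sparse local data should not overwhelm a well-founded prior.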
5) Naive use of historical statistics
This is tricky. Wise use of historical statistics is incredibly powerful, but not at all trivial.
The best approach is to use historical data to establish prior probabilities that are then updated using Bayesian analysis of the data pertaining to your specific prospects.
Under all circumstances, simple “sanity checks” - checking predictions against historical performance - can raise important flags without any sophisticated analysis.
6) Log-normal (hydrocarbon filled) GRV
This is really tricky - the subject of at least an article in itself. I teach a three-day course on just the basics of this and am currently collaborating with geologists to bring it to a week. But briefly:
Map an area - depth relationship for top and base surfaces.
Model the uncertainty in the top and base surfaces. You might, charitably, assume the uncertainty to be symmetric, in which case you can use a normal distribution.
Model the leak point. I recommend an exponential distribution defined over a lateral distance from the crest and parameterized using historical data from the basin or analogue data from similar basins. Plays with good top seal that are typically filled to spill will have correspondingly low probability of failing on the flank.
Model charge / migration (a straight-line cumulative relationship anchored on the probability of no fill at all). You can obviously leave this out if charge / migration is not an issue.
Model top seal capacity (same).
Model known discrete seal risks (downdip faults).
Convolve the above (Monte Carlo) and then repeat for all segments, modelling common risk factors.
Sanity check the resulting distribution against historical statistics, or account for missing variance with a hidden variable model or use Bayesian updating to incorporate the historical prior.
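A heavily simplified Monte Carlo sketch of steps like these - one toy segment, a linear area - depth relationship, base-surface uncertainty folded into the spill depth, and charge and top-seal capacity left out. Every number and the linear area model are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

# Toy area - depth relationship: mapped closure area (km^2) grows
# linearly with depth below the crest (purely illustrative).
AREA_PER_M = 0.5  # km^2 of closure per metre below crest

crest = 2000.0 + rng.normal(0.0, 15.0, N)  # top-surface depth (m), symmetric uncertainty
spill = 2120.0 + rng.normal(0.0, 25.0, N)  # structural spill depth (m)

# Leak point: exponential distance from the crest, here mapped
# directly onto depth below the crest for simplicity.
leak_below_crest = rng.exponential(scale=200.0, size=N)

# Effective contact is the shallowest of spill and leak; column follows.
contact = np.minimum(spill, crest + leak_below_crest)
column = np.maximum(0.0, contact - crest)

# For a linear area - depth relationship the GRV integral is closed-form:
# GRV = integral_0^h (AREA_PER_M * z) dz = 0.5 * AREA_PER_M * h^2.
volumes = 0.5 * AREA_PER_M * column ** 2 / 1000.0  # km^3

print(f"mean GRV {volumes.mean():.2f} km^3, "
      f"P(any column) {np.mean(column > 0):.2f}")
```

Even this toy version shows why simulated (hydrocarbon-filled) GRV comes out skewed rather than normal: the column enters the volume quadratically and the leak point truncates it asymmetrically.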
7) Risk and uncertainty “getting on the curve”
Draw the probability tree.
Whatever intermediary events you want to analyze, if you are going to count successes and compare them with predictions, then you have to count what you predict. This means your probability of success has to be the probability of success - however you define that - at the well you drill.
Likewise, a stochastic economic model will model the events as they occur, so the probability required for economics is the probability you find oil, not just the probability it exists.
Probability of a small drop of oil somewhere in the trap (sometimes, upsettingly, referred to as risking the P99 - see below) is a good place to start, but it can’t be the final probability.
The volume distribution needs to be the volume distribution on which the decision to proceed is based. This will be the volume distribution you have in the event your exploration well is successful. Thus the probability distribution is defined by the event you find oil. AND NOT THE OTHER WAY AROUND! (This is why it doesn’t make sense to “risk the P99”: the P99 (which you should never use anyway) is defined by the event you risk; the event you risk cannot be defined by the P99. The curve is defined by the event, so the event can’t be getting on the curve.)
So if you drill down-dip then you will have a low probability of success, but in the event you find something the volume distribution will have high probabilities for a large range of volumes. Conversely, drilling at the crest gives a high probability of success, but a volume distribution that falls off quickly with volume.
Similarly, if you drill into a small block with a DHI sitting on top of a big structure with no apparent amplitude then the probability of success will be fairly high, but the volume distribution in the event you find oil will still have low probabilities for all but the small volume in the block.
8) Data quality and ambiguity
The Bayesian approach mentioned in 4-6 above makes it very clear that data can be ambiguous even when it’s good quality data. Some features, however clearly you see them, are as likely to be seen in a success case as in a failure case and thus provide no evidence with which to move the probability.
It is true, though, that to move a probability substantially, you need to unambiguously observe something that is much more likely in the success case than in the failure case.
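In odds form this is just Bayes' rule with a likelihood ratio. A toy sketch, with illustrative numbers:

```python
def update(p_success, lr):
    """Posterior P(success) given a prior probability and a likelihood
    ratio LR = P(observation | success) / P(observation | failure)."""
    odds = p_success / (1.0 - p_success) * lr
    return odds / (1.0 + odds)

p = 0.30
print(f"{update(p, 1.0):.3f}")  # ambiguous data: LR = 1 leaves the probability at 0.300
print(f"{update(p, 5.0):.3f}")  # unambiguous evidence: LR = 5 lifts it to ~0.682
```

A clearly imaged feature with LR near 1 is good-quality data and still moves nothing; only a feature genuinely diagnostic of success carries evidential weight.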
9) Stretched beta and pert distributions
Use unstretched vanilla beta distributions for ratios and uncertainties that vary between 0 and 1 (porosity, permeability, water saturation, net to gross, etc.)
Use expected values, variances or sensible percentiles to parameterize distributions (not max / min or silly percentiles)
Use maximum entropy distributions with as few parameters as possible (exponential, log-normal).
For resource distributions (GRV, EUR, etc.), recognize that what matters in a distribution is expected value and variance (or expected value and variance of logarithm - if uncertainty is used multiplicatively). This is what we need to get right. Don’t worry about unrealizable tail values: they’re there to get the statistics right. If you don’t like them and have to truncate them, then fix the expected value and variance after truncation.
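For the first two bullets above, a method-of-moments sketch of a vanilla beta distribution parameterized by expected value and variance (the porosity numbers are illustrative):

```python
def beta_from_mean_var(mean, var):
    """Method-of-moments Beta(a, b) on [0, 1] from expected value and
    variance. Requires 0 < var < mean * (1 - mean)."""
    assert 0.0 < mean < 1.0 and 0.0 < var < mean * (1 - mean)
    k = mean * (1 - mean) / var - 1.0
    return mean * k, (1 - mean) * k

# e.g. porosity with expected value 0.22 and standard deviation 0.04
a, b = beta_from_mean_var(0.22, 0.04 ** 2)
```

No stretching, no min/max, no extreme percentiles: the two moments that matter pin the distribution down completely.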
10) Parameterization by mean or mode
Parameterization by mean / expected value is fine if the distribution is not too terribly skewed, long- or heavy-tailed, and you have data. Otherwise tail values pull the expected value around in a way that is difficult to assess, or that requires many data to pin down.
In the absence of many data, or if distributions are long- or heavy-tailed or very skewed, medians are much more robust and reliable - both from data and from expert elicitation.
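That robustness claim is easy to check by simulation: repeated small samples from a strongly skewed log-normal, comparing the relative scatter of sample means against sample medians. All parameters here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.0, 1.5  # a heavily skewed log-normal
true_median = np.exp(mu)
true_mean = np.exp(mu + sigma ** 2 / 2)

# Repeated small samples: how far does each estimator scatter?
n, reps = 15, 2000
samples = rng.lognormal(mu, sigma, size=(reps, n))
mean_err = np.std(samples.mean(axis=1)) / true_mean
median_err = np.std(np.median(samples, axis=1)) / true_median

print(f"relative scatter: mean {mean_err:.2f}, median {median_err:.2f}")
```

With fifteen data points the sample mean is dragged around by the occasional tail draw, while the sample median stays comparatively put - which is exactly why the median is the safer elicitation and estimation target for skewed distributions.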