Should I trust that model?

By Val Snow


How do those who build and use models decide whether a model should be trusted? My thinking has evolved through modelling to predict the impacts of land use on losses of nutrients to the environment – such models are central to land use policy development – but this under-discussed question applies to any model.

In principle, model development is a straightforward series of steps:

   • Specification: what will be included in the model is determined conceptually and/or quantitatively by peers, experts and/or stakeholders and the underlying equations are decided

   • Coding: the concepts and equations are translated into computer code and the code is tested using appropriate software development processes

   • Parameterisation: here the values that go into the equations are determined by a variety of methods

   • Testing: the model is compared against data using any of a wide range of metrics, the comparisons are examined and the fitness of the model for the intended purpose or scope is decided. Bennett and colleagues (2013) give an excellent overview of the variety of statistical approaches that can be used for this purpose.
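The testing step above can be sketched with a few of the metrics reviewed by Bennett and colleagues (2013). This is a minimal illustration only – the function name and data values are invented, and, as noted below, the choice of metrics and of acceptable thresholds remains a subjective, domain-specific judgement:

```python
import math

def performance_metrics(observed, predicted):
    """Compare model outputs against observations using a few common
    metrics (bias, RMSE, Nash-Sutcliffe efficiency)."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    bias = sum(residuals) / n                            # mean error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean square error
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean,
    # negative values perform worse than simply predicting the mean
    nse = 1 - ss_res / ss_tot
    return {"bias": bias, "rmse": rmse, "nse": nse}

# Illustrative (invented) data: measured vs modelled nutrient losses
obs = [12.0, 30.0, 18.0, 45.0, 25.0]
pred = [14.0, 27.0, 20.0, 41.0, 28.0]
print(performance_metrics(obs, pred))
```

Even with such numbers in hand, deciding whether an NSE of, say, 0.9 is "good enough" is exactly the subjective decision discussed below.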

In reality, of course, these steps do not take place in an orderly progression: there are many loops backward, some of the parameterisation and testing occurs in parallel with the coding, and the first step is often revisited many times.

It is mostly assumed that assessment of ‘trust’ or ‘confidence’ in a particular model should be based on the metrics or statistics resulting from the comparison of the model outputs against experimental datasets. Sometimes, however, the scope of the testing data and whether the model has been published in a good journal are also taken to imply confidence in the model. These criteria largely refer to that last testing step, and this focus is understandable: of the steps above, testing is the one most readily documented against accepted standards, with the results made available externally. However, even with a quantitative approach to testing, Bennett and colleagues note that the actual values of the statistics that are considered acceptable are a subjective decision.

While I agree with the approach and need for quantitative testing, the testing results themselves have very little to do with my confidence or trust in a model. My confidence will evolve over time as I become more familiar with the model. By the time I am prepared to make any statements about the specific reasons for my degree of trust, the reasons for that trust will largely have become tacit knowledge – and that makes it very difficult for me to explain to someone else why I have confidence (or not) in that model.

Here I have attempted to tease out the factors that influence my confidence in a model. I should note that my trust in the models I have been involved in developing, or that I use at an expert level, can fluctuate widely and wildly over time, so, for me, developing trust is not a linear process and is subject to continual revision. I assess four key areas concerning the model using a range of questions, as follows:

Area 1. The nature of the problem domain: Are the ‘correct’ outputs even measurable? How mature is the science community’s understanding of, and agreement on, the conceptual and quantitative processes that must be included in the model? What constraints and deliberate assumptions have been included? Will these assumptions likely constrain error or allow (or even encourage) it to blossom?

Area 2. Software development and parameterisation: Who did the work and do I have a favourable opinion of their other modelling activities? What documented software development processes did they use? Do they use a reliable version control system and can I compare older versions of the model to the current version? Is the documentation sufficient and sufficiently well-presented that I can, for the large part, understand the workings of the model and its implementation assumptions? If I need more detail can I (or can I get someone else to) dive into the code to understand more detail? How open/transparent does the process appear to be? Can it be readily reviewed by others?

Area 3. Developer’s testing: What have the developers done with respect to testing? Does it feel robust (e.g., basic things like not reusing data used for parameterisation, but also have they delved into and explained reasons for poor performance)? Have they relied mostly on reporting statistical values or are there extensive graphs that are appropriate for the domain of the model?

Area 4. User’s experience: Is the model user interface set up in such a way that I can investigate the model’s behaviour as inputs, settings and parameters are changed? When I do this investigation, how often does the model ‘surprise’ me? How many of those are “Wow!” surprises (meaning I thought the model would be unlikely to behave well but it did), how many are surprising surprises (the model outputs can be rationalised and even make sense once investigated) and how many are “Really!?!” surprises (the model outputs do not make sense in any way that I can explain and/or they seem to be in conflict with the developer’s testing or documentation)? When I get the last type of surprise: is the model constructed in such a way that I can understand the extent to which that surprise will flow through to outputs that matter, or are the effects of any such surprises likely to be minimised or cancelled out by the way the model is constructed?
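The data-reuse point in Area 3 can be illustrated with a simple hold-out split made before parameterisation, so that the testing data have not already been used to fit the model. The function name and the 30% hold-out fraction here are assumptions for illustration, not a recommendation:

```python
import random

def split_for_parameterisation(records, holdout_fraction=0.3, seed=42):
    """Set aside a portion of the observations before parameterisation,
    so that testing is not done against the same data used for fitting.
    Returns (parameterisation set, testing set)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_test:], shuffled[:n_test]

fit_set, test_set = split_for_parameterisation(list(range(10)))
print(len(fit_set), len(test_set))  # 7 3
```

A reproducible, documented split of this kind is one small, externally checkable signal of the robustness asked about in Area 3.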

These questions are how I develop trust in a model. Do my questions align with your criteria or have I missed critical points? Do you have a completely different process for developing trust in a model? My approach is probably strongly tuned by my experience with mechanistic or process-based models (where the model is intended to represent an expert’s opinion of how the system works rather than being driven by data). Given that, if you work with a different type of model, does your approach to developing trust work differently? Might you place more reliance on comparison to data? I’d value your thoughts.

Bennett, N. D., Croke, B. F. W., Guariso, G., Guillaume, J. H. A., Hamilton, S. H., Jakeman, A. J., Marsili-Libelli, S., Newham, L. T. H., Norton, J. P., Perrin, C., Pierce, S. A., Robson, B., Seppelt, R., Voinov, A. A., Fath, B. D., Andreassian, V., (2013). Characterising performance of environmental models. Environmental Modelling & Software, 40: 1–20. Online (DOI): 10.1016/j.envsoft.2012.09.011

Biography: Val Snow is a systems modeller at AgResearch in New Zealand and comes from a soil physics and agricultural science background. Her research focuses on the development and use of simulation models to support technological innovation in pastoral agricultural systems and assessment of the impacts of land use. Application areas include land use policy, future farming systems, greenhouse gas mitigation and climate change adaptation.

This blog post is one of a series resulting from the first meeting in March 2016 of the Core Modelling Practices Pursuit. This pursuit is part of the theme Building Resources for Complex, Action-Oriented Team Science funded by the National Socio-Environmental Synthesis Center (SESYNC).

18 thoughts on “Should I trust that model?”

  1. I only found this today, 3 years later, so I am not sure anybody will read it, but anyway: What Val writes resonates a lot with our TRACE initiative. TRACE is a standard format for documenting all aspects of model development, testing, analysis, and usage. It was developed in the context of using ecological models for regulatory risk assessment of pesticides. Regulators kept asking: When should we trust a model, and how much?

    EFSA, the European Food Safety Authority, has incorporated most elements of TRACE in their guidance for documenting models. Otherwise, in the scientific literature, there are so far only 30 or so articles that come with a TRACE document, but I have been told of cases, and have experienced my own, where skeptical reviewers were convinced by looking up the TRACE document.

    Of course, not all the points raised by Val are covered by a TRACE document, but I think it is a useful approach.

    Schmolke, A., Thorbek, P., DeAngelis, D. L., & Grimm, V. (2010). Ecological models supporting environmental decision making: a strategy for the future. Trends in ecology & evolution, 25(8), 479-486.

    Augusiak, J., Van den Brink, P. J., & Grimm, V. (2014). Merging validation and evaluation of ecological models to ‘evaludation’: a review of terminology and a practical approach. Ecological Modelling, 280, 117-128.

    Grimm, V., Augusiak, J., Focks, A., Frank, B. M., Gabsi, F., Johnston, A. S., … & Thorbek, P. (2014). Towards better modelling and decision support: documenting model development, testing, and analysis using TRACE. Ecological modelling, 280, 129-139.

    Nabe‐Nielsen, J., van Beest, F. M., Grimm, V., Sibly, R. M., Teilmann, J., & Thompson, P. M. (2018). Predicting the impacts of anthropogenic disturbances on marine populations. Conservation Letters, 11(5), e12563.
    TRACE document:

    Volker Grimm, Leipzig

    • Volker, thank you for your considered comments that have, I think, added to the discussion on this post. I am, of course, familiar with your work on transparency through documentation. This is a crucially important part of robust modelling for a variety of reasons. I always wonder how many modellers have written documentation after development and had an “oh!” moment when it revealed a worrisome assumption or gap? I think that agricultural systems modellers have a lot to learn from the ODD/ODD+ and TRACE standards developed by the agent-based community.

      As pointed out, structured and open documentation such as TRACE assists experts to understand the model and to learn or use the information as a starting point for new/adapted models. I found the example that you presented clear and comprehensive. I have considerable general modelling expertise but not in marine mammals, noise propagation, or behavioural implications when the two come together, so I am not fully in your “sceptical reviewer” category. However, the existence of the open and clear documentation means that I know that it could be reviewed by an expert that I might trust, and that confers a degree of trust from me to the modellers and the model. When it is necessary or desirable that non-experts have some trust in a model, this secondary or conferred trust is terribly important. While such open documentation will not itself resolve issues associated with cultures and beliefs that are not shared, it can provide a starting point to move beyond arguing about the model and, as Pierre Glynn suggests (see below), finding some commonality and growing trust from there.

  2. Val, I really enjoyed reading your blog on the Trust building process. I think it makes a lot of sense. In particular though, I liked your comment that “By the time I am prepared to make any statements about the specific reasons for my degree of trust, the reasons for that trust will largely have become tacit knowledge – and that makes it very difficult for me to explain to someone else why I have confidence (or not) in that model.”

    To me this is quite interesting. It means that Trust fundamentally is about building beliefs, or at least some type of inherent “knowledge” (and by the way, one classic definition of knowledge is “justified true belief”). What I did not see mentioned in your blog is that often trust in one thing, such as a model, is acquired simply because of other shared beliefs (or trust) in some other practices or values. All this means that an alignment of shared culture and beliefs also facilitates building trust in models created/applied through participatory processes. The challenge though is how to build trust when cultures and beliefs are not necessarily shared? One suggestion is to find the “shared seeds” and to “grow” understanding of different perspectives from there…

    • Thank you Pierre for your insightful comment. When I started drafting the blog I intended to write about stakeholders developing trust or confidence in a model. My own experience as a modeller was intended as a contrast to launch the discussion but I really didn’t get past the first part.

      Your point about the shared culture is a really good one. I have seen this operating in a group where a sub-group of stakeholders with that shared culture align themselves and reinforce their opinions about particular models. This can damage or derail a collaborative process with energies focussed on detailing why models are unfair/inadequate/wrong rather than figuring out how to appropriately use models to support the development of consensus. That shared culture effect can also work positively. For example a stakeholder that perceives an alignment with a modelling expert in the delivery team, develops a trust relationship with that expert and so bypasses the need to really understand the model – essentially the expert is trusted rather than the model in this case. That puts an additional responsibility on the modeller to be, and be perceived to be, scrupulously fair or independent in their work.

      It might be interesting to flesh out more of the factors influencing stakeholder trust and then follow on with how best to manage the collaboration or co-development processes to avoid as much as possible those negative effects and to take advantage of the positive reinforcements.

  3. I think there is an opportunity for conducting experimental research to explore the questions around trust in the model, and bring some empirical evidence into different modelling processes and products to influence trust. A good study in this direction is the work done by Monks (2010), who tried to explore how the involvement of stakeholders in the modelling process from the very beginning, versus the use of reusable modelling components, influences the way users perceive the model’s credibility and their confidence in results.

    • Thanks Sondoss and also to Josef below (that thread seems to have reached its limit) as these issues are all intertwined. The transition from model developers to stakeholders is where it gets even more interesting and doing something on this in the SESYNC pursuit on Core Modelling Practices (see link at the bottom of my blog post for those unfamiliar with it) would be great. I’d be very interested in exploring this further.

  4. In reply to Tim Gieseke below:

    Thanks for those further thoughts Tim – there are some great key points there that you have highlighted. While I mostly work in the science realm, your experiences with that transition from the ‘hard science’ (which tends to become less traditionally hard but more complex as the physical scale increases) to the communities reflects my more-limited experience.

    Where only unquantified directional change is needed, it is possible for those implementing policy to abandon models as part of that policy, but I suspect that fewer and fewer of these types of policies are being developed.

    Increasingly, land use policy has embedded demands for demonstration of meaningful progress, cost-benefit style analysis and reporting against targets or limits, and in these a quantitative model is probably unavoidable. However, that unavoidability places additional demands on the model developers:
    • to understand just how robust their models are (the topic of that blog post),
    • with respect to their ability to convey their understanding of the model and its robustness (or not) to stakeholders, and
    • in their skills in assisting policy developers/implementers to design policy that is sound given the strengths and weaknesses in the models that are used in setting targets and monitoring progress.

    Thanks for your final thoughts in the comment below. They address just the type of thing that I was trying to understand – how transferable the processes might be – and they align well with those from Joseph Guillaume as well.

    • Issues with earth science models have also been discussed by philosophers, which could potentially provide some normative guidance on how models should be used.
      Here are two examples:

      Oreskes, N, K Shrader-Frechette, and K Belitz. 1994. “Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences.” Science (New York, N.Y.) 263 (5147): 641–46. doi:10.1126/science.263.5147.641.

      Parker, Wendy S. 2009. “II—Wendy S. Parker: Confirmation and Adequacy-for-Purpose in Climate Modelling.” Aristotelian Society Supplementary Volume 83 (1).

      The latter approach is roughly consistent with this one:
      Haasnoot, M., W.P.A. van Deursen, J.H.A. Guillaume, J.H. Kwakkel, E. van Beek, and H. Middelkoop. 2014. “Fit for Purpose? Building and Evaluating a Fast, Integrated Model for Exploring Water Policy Pathways.” Environmental Modelling & Software 60 (October): 99–120. doi:10.1016/j.envsoft.2014.05.020.

      We try to make explicit the different requirements of a model, and check whether they are satisfied.

      • Another publication relevant here is:
        Lahsen, M., 2005. Seductive Simulations? Uncertainty Distribution Around Climate Models. Soc. Stud. Sci. 35, 895–922. doi:10.1177/0306312705053049

  5. Val Snow makes many great points in this blog post. My modeling work is in human biology. I’m an electrical engineer with a PhD in physiology. My trust in a model depends critically on the transparency of the processes and how their rate laws are formulated; I agree completely with Snow on this point. I want to see that the rate laws make physical sense. If a biochemical rate law is regulated by a molecule, I want to know if there is physical evidence for an interaction between that molecule and the enzyme that catalyzes the reaction. I want the model to be mechanistic. I am put off by black box transfer functions, especially if they imply mechanism based only on correlation/association.

    I want to know, as Snow does, if the model makes testable predictions. Can we actually do an experiment to compare model predictions with experimental data? This is equivalent to Karl Popper’s concept of falsifiability.

    In this regard I am not as influenced by parameterization methods. Nor do I worry if the data used to develop the model are re-used in testing it; it is all too common that data we considered during model design simply cannot be fitted by the first version of the model we built. Human biology is so complex that if any physically reasonable parameterization fits all the available data for many experimental perturbations, my trust in the model increases enormously.

    Ultimately, trust is built as a model accounts for more and more experimental data. But I’m always looking for data that the model cannot explain. This is when we learn something new.

    Modeling can be seen as a quantitative form of hypothesis testing.

    • Thank you for those interesting thoughts Robert. I too regard much of the modelling that I do as hypothesis testing and often refer to the activity as virtual experimentation (but am aware that many of my empirically-oriented colleagues would not agree with me). It is great to see that the issues facing the robust application of modelling have commonality across such diverse areas of application!

      I agree, in principle, that the model outputs should be comparable to and compared with experimental data and therefore the model should have the possibility of being falsified (an essential part of it being a science). One of the challenges for this in my research area (land use modelling) relates to scale. We do compare the model, or components of the model, against data at particular physical scales. Usually these are relatively small scales – patches in a paddock/field, say about 1-100 square metres. But the scale that really matters to farmers and those developing or implementing land use policy is much larger than that – 1-100 hectares (10,000 to 1,000,000 square metres), so at least 4-6 orders of magnitude larger – and it is just not physically possible to obtain data at that larger scale. We use additional model components to scale up by those orders of magnitude and those components are pretty much unverifiable against data, so other methods of testing and developing trust are needed.

  6. From a generalist’s perspective, I consider the models I rely on to ‘measure’ landscape sustainability trustworthy if they contribute to supporting behaviors that have the tendency to improve landscape sustainability.

    • Thanks for your comment from quite a different perspective here, Timothy. I am trying to unpack what you mean by “contribute to supporting behaviours” in this context. Does that mean that the land users have sufficiently accepted the model that they have acted on the information produced by the model? If I have got your intention correct, then perhaps I could describe your trust in a model as your opinion of the reliability or effectiveness of the model in contributing to changing behaviour? While it might be that those who changed their behaviour developed trust using different methods (or perhaps they didn’t need to trust the model?), it would seem that this is another aspect to add to the list of features that I constructed.

      • Val, thank you for wrestling with my comment and I apologize if it resides outside the intentions of your blog. I will note that I agree with your experience that “the reasons for that trust will largely have become tacit knowledge”. I also appreciated the perspective of Phair, that he, while dealing with a complex system, is able to use modeling as a quantitative form of hypothesis testing.

        In agricultural landscape sustainability, the more open system of ecology limits the reliability/reproducibility of model outputs. And then the influence of economics and social governance may cause stakeholders to discount the trustworthiness of a model because it does not meet their perspective of what should be what.

        So my loosely worded comment is weighted toward the end user of the model, and in my experience the end-user is often beyond the hard science community and into the fuzzy communities seeking on-going outputs within socially complex systems. These end-users – such as corporate sustainability supply chains, government watershed efforts, drinking water utilities and environmental insurers – desire directional change first, and may give up on models as a quantitative form of hypothesis testing.

        Your four assessment areas are very helpful to bring the end-users back to identifying the trustworthiness of the models and to determine the level of trustworthiness that is needed, desired or understood. In many of my cases, I have used models and desired outputs as “environmental market signals” and so, should I trust the model enough to provide the results to accomplish these goals within this context. And so, I guess I circle back on your experience that trust will largely become tacit knowledge. Thank you for the dialogue.

        • See reply from Val Snow above. There seems to be a limit to the length of the discussion thread – we (the administrators) are investigating.

          LATER: There is a limit, we’ve now expanded it, so that discussion threads can be longer in future.

  7. I really appreciate hearing your perspective. It’s great that you identify your trust in a model as tacit. In my opinion, this is one of the key challenges in both training new modellers and helping end-users make use of models. They need to learn the (tacit) skills to form their own opinion of models, or at the very least, the modeller needs to provide scaffolding that allows them to make sense of the models and model results to some extent.

    I find it fascinating that you identify such a broad range of factors, ranging from the evidence used to the reputation of the modeller, and including your assessment of both the constituent hypotheses and the model’s behaviour. It clearly shows the complexity of assessing a model – despite the advantages compared with assessing tacit knowledge, it is not necessarily easy just because it’s a concrete artefact.

    • Thank you Joseph – I think you’ve nailed it with respect to the difficulties of models, new modellers and stakeholders/end-users. Because I use such a large range of factors and many of them are qualitative and influenced by particular context, how can I assist others to develop their own informed opinions? What can model developers do to assist that learning and development of trust? Here I am thinking of things beyond just good software processes and documentation but perhaps into what attributes that user interfaces might incorporate to assist learning. This is something that I would like to explore further sometime.

      • Accounting for uncertainty when using models is an area I’m actively exploring. It would be interesting to see what we can work on there together, perhaps even within the frame of our SESYNC pursuit.

