By Wolfgang Beywl and Amy Gullickson

Efforts to develop evaluation in transdisciplinary research have mostly been conducted without reference to the evaluation literature, effectively re-inventing and re-discussing key concepts. What do transdisciplinary researchers need to know to build on the in-depth knowledge available in evaluation science?
Here we add to other key contributions about evaluation in i2Insights, especially:
- Belcher and colleagues, who provide a tool for evaluating transdisciplinary research
- Nagy and Schäfer, who describe how to systematically design transdisciplinary project evaluation
- Meagher and Edwards, who provide a framework to evaluate the impacts of research on policy and practice
- Louder and colleagues, who discuss ways of choosing a framework to assess research impact.
We focus specifically on the origins and the current state of the evaluation field.
Evaluation science origins
Evaluation science has evolved over five generations starting in the mid-19th Century (Stufflebeam and Coryn, 2014; Alkin, 2022).
The first generation of modern evaluation (“measurement”) favoured methodological (statistics, surveys) and technological innovations such as performance measurement in schools. Although also inspired by other disciplines (for example, agronomy with rigorous experiments in plant cultivation), evaluation was primarily undertaken and developed in the context of education.
The second generation (“analysis”) commenced about 1950. Goals of (curricular) programs were defined precisely, and the interventions assigned to them – along with, where possible, research-based assumptions about the mechanisms of impact – were critically analysed. Textual-visual models of program logic emerged and became a core component of evaluation (e.g., the Context, Input, Process and Product (CIPP) evaluation model: Stufflebeam, 1969; Stufflebeam and Zhang, 2017). Data collection, analysis, and interpretation became increasingly and systematically rule-based within the framework of a critical-rationalist social science methodology.
The third generation (“valuation”), from the end of the 1960s, elaborated criteria as a reference for the valuation processes constitutive of evaluation. Thus, the goals set by policy or by program directors were questioned by evaluators, alternative values were brought into play, and the dependence of the evaluation process on social or cultural (power) constellations was worked out. It was increasingly doubted that evaluation could be politically and socially neutral.
Since the end of the 1970s, the fourth generation (“negotiation”) has focused on brokering evaluation criteria among stakeholders who have interests in relation to the object to be evaluated (the evaluand) (Guba and Lincoln, 1989). The claim: evaluation should provide information to as many legitimate stakeholders as possible that is potentially useful to them and actually used by them. Evaluation itself became the object of systematic description and valuation within the framework of meta-evaluation. The central reference document for this was the “Program Evaluation Standards”, first published as a book in 1981 (now in its third revised edition: Joint Committee, 2011). In many countries, these evaluation standards, originally rooted in the field of education, came to be regarded as the criteria to use for evaluations across broad policy domains.
The fifth generation of evaluation (“engagement”) emerged in the 2000s. With globalization and the awareness promoted by natural science research, not only the values and interests of current stakeholders but also those of future generations came to the fore (Gullickson and Hannum, 2019; Roorda and Gullickson, 2019). The economically and technologically developed industrial nations were influencing living conditions all over the world, and an understanding of Gaia as a barely predictable complex system emerged. The key question for evaluation science now is what role it should take in view of this urgency and unpredictability (Better Evaluation, 2022; Patton, 2019; https://bluemarbleeval.org; Uitto, 2019).
Evaluation science now
As in any emerging scientific community, there is no authoritative definition of evaluation. Across innumerable textbooks, journal articles and debates, however, the field has accumulated the following definitional elements, with increasingly established technical terms:
- Evaluation can be defined as a scientific endeavour and professional service, which reasonably exhaustively describes and valuates evaluands (i.e., programmes, projects, measures, policies, etc.).
- It is guided by purposes (e.g., improvement, fundamental decisions, accountability*) and evaluation questions (e.g., how well did this programme increase equity for marginalised populations?), which are clarified collaboratively by clients and stakeholders.
- The achievement of the prioritised evaluation purposes, and answers to the evaluation questions, should be reflected in stakeholder utilisation – an important prerequisite for generating the intended influence on both the evaluand and wider social, economic, natural and other systems.
- To obtain reliable information for the descriptive task, evaluation uses a wide range of empirical, especially social science, methods: qualitative, quantitative, and mixed.
- For the valuation task – a feature unique to evaluation – i.e., determining context-independent merit, situationally bound worth and socially attributed significance, criteria (and threshold points, if applicable) are clarified in the evaluation process, and transparent valuation procedures are likewise carried out in a methodical fashion (for more, see Gullickson, 2020; Balzer et al., 2020).
Evaluation and transdisciplinary research in future
Given current trends in evaluation science, we expect that it could advance the efforts of transdisciplinary research to address the wicked questions and challenges of the Anthropocene by providing tools that help focus attention on values and stakeholders. What do you think? Are any of the resources we cited herein useful for your practice? Where and how do you think evaluation research could support transdisciplinary research? Are there other aspects of the history and current state of evaluation science that transdisciplinary researchers should be aware of? Do you have a favourite evaluation tool or resource to share?
* “accountability” replaced the incorrect word “enhancement” three days after the blog post was published.
References:
Alkin, M. C. (2022, in press). Evaluation roots. 3rd Edition, Sage: Los Angeles, United States of America.
Balzer, L., Laupper, E., Eicher, V. and Beywl, W. (2020). The key to evaluation. 10 steps – ‘evaluiert’ in brief. Swiss Federal Institute for Vocational Education and Training, SFUVET: Zollikofen, Switzerland. (Online – open access): https://www.sfuvet.swiss/evaluiert
Better Evaluation. (2022). Footprint Evaluation. (Online): https://www.betterevaluation.org/en/themes/footprint_evaluation
Guba, E. G. and Lincoln, Y. S. (1989). Fourth generation evaluation. Sage: Newbury Park, United States of America.
Gullickson, A. M. (2020). The whole elephant: Defining evaluation. Evaluation and Program Planning, 79. (Online) (DOI): https://doi.org/10.1016/j.evalprogplan.2020.101787
Gullickson, A. M. and Hannum, K. M. (2019). Making values explicit in evaluation practice. Evaluation Journal of Australasia, 19, 4: 162–178. (Online) (DOI): https://doi.org/10.1177/1035719X19893892
Joint Committee on Standards for Educational Evaluation. (2011). The program evaluation standards. A guide for evaluators and evaluation users. 3rd Edition, Sage: Thousand Oaks, United States of America.
Patton, M. Q. (2019). Blue Marble evaluation. Premises and principles. Guilford Press: New York, United States of America.
Roorda, M. and Gullickson, A. M. (2019). Developing evaluation criteria using an ethical lens. Evaluation Journal of Australasia, 19, 4: 179–194. (Online) (DOI): https://doi.org/10.1177/1035719X19891991
Stufflebeam, D. L. (1969). Evaluation as enlightenment for decision-making. In: W. H. Beatty (ed.), Improving educational assessment and an inventory of measures of affective behavior. Association for Supervision and Curriculum Development: Washington D.C., United States of America, pp. 41-73.
Stufflebeam, D. L. and Coryn, C. L. S. (2014). Evaluation theory, models, and applications. 2nd Edition, Jossey-Bass: San Francisco, United States of America.
Stufflebeam, D. L. and Zhang, G. (2017). The CIPP Evaluation Model: How to evaluate for improvement and accountability. Guilford Press: New York, United States of America.
Uitto, J. I. (2019). Evaluation for the Anthropocene: Global environmental perspectives. Evaluation Matters—He Take Tō Te Aromatawai, 5. (Online – open access) (DOI): https://doi.org/10.18296/em.0044
Biography: Wolfgang Beywl PhD is a professor at the Institute for Continuing Education at the School of Education at the University of Applied Sciences of Northwestern Switzerland in Windisch and science director of Univation, Institute of Evaluation, Cologne, Germany. He has published several textbooks, chaired the Evaluation Standards committee of the DeGEval-Evaluation Association (Austria and Germany), and developed the longstanding postgraduate evaluation training program at Berne University. His work helps to demonstrate the value and impact of transdisciplinary educational research and evaluation.
Biography: Amy Gullickson PhD is the Director of the Centre for Program Evaluation at the University of Melbourne in Australia, and the Chair of the International Society for Evaluation Education. She led development of the fully online Masters and Graduate Certificate programs in evaluation at the University of Melbourne, and has been deeply engaged with the Australian Evaluation Society in the development and refinement of their competencies for evaluators. She spends her time conducting evaluations, teaching people to conduct evaluations, teaching people to teach others how to conduct evaluations, and teaching organisations how to integrate evaluation into their day-to-day operations – and doing research on all of the above.
Thank you for this interesting and concise summary – and especially for the inspiring discussion that has evolved here in the comment section! As a researcher on the effects of transdisciplinary research, coming from a TD research background rather than from evaluation research, I found it a helpful overview of the broad debates and literature in evaluation studies.
Josefa, we are very pleased with your feedback. We worked on our short blog post for as long as it takes to carry a baby elephant into the world. And we got help with the text work from Gabriele Bammer as ‘midwife’. Your feedback is a great reward for us, because it shows that we are getting closer to our goal: to raise the mutual interest of the two communities – transdisciplinary research and evaluation science – in each other. Many conversations with representatives from both sides have shown us that there is hardly any in-depth knowledge about the other domain. We are convinced that TDR will become even more important in the near future, and it is important for the evaluation community to realize this.
what Wolfgang said! 🙂
Thank you for a thought-provoking article. A couple of points I’d add.
Evaluation phases have been influenced by phases in other, related fields like management. Scientific management (Taylorism) endures as the search for generalizable models of best practice. The Human Relations School (reacting against Taylorism) emphasized culture and context, and gave us the Hawthorne and Halo effects. Drucker’s MBO (Management by Objectives) gave us SMART goals and was a reaction against the human relations school’s priority on worker satisfaction. Strategic management and the Quality movement were reactions against MBO, which led to complexity framings from Snowden (Cynefin) and Jim Collins (“Great by Choice”), represented in evaluation by Developmental Evaluation. Evaluation has been influenced by these developments in parallel fields.
I’ve been looking at how our historical development has generated a set of 10 dimensions on which evaluation approaches vary: https://youtu.be/YReAFxv_31s
This came from an inventory of over 100 evaluation approaches: https://youtu.be/GEGtBnkDyBk
I appreciate and applaud the framing of “evaluation science.” I work increasingly in international transformation initiatives with people in sustainability science, systems science, complexity science, climate science, development science, and transformation science, and so representing “evaluation science” and positioning evaluation as science is useful. As I wrote in an article on evaluation science in the American Journal of Evaluation:
“Both science in general and evaluation in particular are evidence-based processes with conclusions derived from systematic inquiry to understand and explain how some aspect of the world works. The credibility of scientific evidence is under attack. Guilt by association, the credibility of evaluation evidence is diminished. To defend the value of scientific evidence, then, is to defend the value of evaluation evidence. It is in our interest as evaluators to make common cause with those who support science.”
“Science is systematic inquiry into how the world works. Evaluation science is systematic inquiry into how, and how well, interventions aimed at changing the world work. Evaluation science involves systematic inquiry into the merit, worth, utility, and significance of whatever is being evaluated by adhering to scientific norms that include employing logic, using transparent methods, subjecting findings to review, and providing evidence and explicit rationales to support reason-based interpretation, valuing, and judgment.” (2018, AJE)
Blue Marble Evaluation (global systems change) is positioned as evaluation science in the context of the Anthropocene and the climate emergency: https://youtu.be/7s4fbY5Ynvw
The latest (current stage) of evaluation will be determined by our response to and engagement with the polycrises that threaten the future of humanity: https://youtu.be/qRYH2TkcUGc
Thank you Michael for broadening perspectives on the history and many disciplinary foundations of evaluation science. It is – as you convincingly present – grounded in multiple disciplines. This is expressed, among other places, in the title of the Journal of MultiDisciplinary Evaluation https://journals.sfu.ca/jmde/. In my view, the natural sciences should be involved in the discourse and practice of evaluation even more than they have been up to now.
Another important point is that evaluation increasingly sees itself as transdisciplinary, in the sense that instrumental use is sought for the broad range of (also future) stakeholders involved in the evaluative process (see Transdisciplinary (general relevance) in the List of terms: https://i2insights.org/index/integration-and-implementation-sciences-vocabulary/).
We also see high potential and urgency in the intensified cooperation between transdisciplinary research and evaluation science.
Wolfgang
Hi Michael
Thanks for clarifying your perspective on evaluation science, and for the thoughtful work you have been doing to bring in complexity science to our evaluation practice. I have been particularly grateful to see your work on principles (criteria) as a way to navigate our framing of challenges and strengths, to help us see the interconnected nature of the disciplines and our interventions, and explore ways to gather information, learn, understand, and adapt so that we can survive the Anthropocene.
I can’t upload images here, but I have modelled evaluation as a cross-discipline with a hub, spoke and wheel model. The hub at the center is evaluation, which stands alone (as Wolfgang and I described in our post) as its own discipline with specific knowledge. The wheel is a wide circle made up of individual “discipline” bubbles (e.g., environment, public policy and administration, social sciences, international development, health, etc.). Each of these bubbles has an E inside of it, to show that evaluation is inside these disciplines already. The spokes are arrows going back and forth between evaluation and each discipline; they are a series of individual feedback loops that connect evaluation to these other disciplines. For evaluation and all the disciplines to fully realize their potential, we need to close the feedback loops so that evaluation is fully engaged with all disciplines in a reciprocal way – the work they do informs the development of evaluation as a discipline, and the work of evaluation as body of unique knowledge and practice is engaged with and embedded in all disciplines. In our current state those arrows are probably more of a dotted line both ways, so we have work to do!
Thank you Wolfgang and Amy, really useful summaries and links to related blog posts. I note your introductory comments about not needing to re-invent the wheel when developing and implementing evaluation in different settings or disciplines – so true.
Something that you did not mention explicitly, but is perhaps implied by your descriptions of the fifth generation in evaluation, is the emphasis in recent years in evaluation discussions (though perhaps not so much in practice!) of systems thinking and complexity science. I wonder if you see insights from those domains as fitting into the schemas that you presented?
Reference: Gates, EF, Walton, M & Vidueira, P (eds) 2021, Systems and complexity-informed evaluation: insights from practice, New Directions for Evaluation, no. 170.
Hi David
Yes, systems thinking and complexity science are deeply integrated into Generation 5. Michael Quinn Patton, who posted above, has cited several of his resources that explore this connection. Because causality is often non-linear in complex spaces, understanding and articulating values/principles/criteria (the words often get used interchangeably) becomes an important way to navigate. When we switch the focus to values, rather than aiming for specific outcomes, we can evaluate how we’re going based on them, and continually adjust along the journey as the circumstances change around us.
To help me navigate complexity better, I’ve started to engage with Aboriginal ways of thinking and learning – the Aboriginal view is inherently a systems view. I recommend Sand Talk by Tyson Yunkaporta and Braiding Sweetgrass by Robin Wall Kimmerer. I’m still working on what this means for evaluation. How might these ideas influence TDR? And evaluation and TDR together?
Thanks, Wolfgang and Amy, for your paper – and for clarifying that the different generations of evaluation live on in current practice.
I would caution, however, about relying so exclusively on analyses of educational evaluation and reviews of evaluation practice done by educational evaluators. When Dugan Fraser and I reviewed the history of development monitoring and evaluation, we identified 8 different approaches and underlying assumptions about the intended uses of monitoring and evaluation – all of which still live on in some form and are also relevant to other fields. We grouped these in terms of whether monitoring and evaluation was primarily understood to work by providing information for decision-makers or by changing behaviour through incentives. Some of these seem to be important types of evaluation to consider when surveying the field and learning from past experience.
PROVIDING INFORMATION FOR DECISION-MAKERS

| If development is mostly about… | And evaluation supports development by… | Then the type of evaluation needed will be… |
| --- | --- | --- |
| 1. Choosing the right programs to invest in | Helping to manage investment risk | Ex ante impact evaluation |
| 2. Effective planning and management of projects and programs | Helping to clarify what needs to be done and providing ongoing feedback on progress | Performance monitoring (logical framework analysis, results-based management); external review; rapid rural appraisal |
| 3. Scaling up effective projects and programs | Identifying “what works” and monitoring fidelity of implementation | Experimental or quasi-experimental evaluations that provide estimates of average net impact; compliance monitoring of activities |
| 4. Translation and adaptation of appropriate technology | Identifying what works in what circumstances and supporting implementers in translating findings to new situations | Explanations of how interventions work, and under what circumstances (“what works when,” “good practices,” realistic evaluation) |
| 5. Resilience in the face of uncertainty and rapid change | Supporting ongoing adaptation and responsiveness | Real-time evaluation to support front-line workers; dialogue between partners |
| 6. Supporting local people in becoming agents of their own development | Supporting beneficiaries and other stakeholders in developing and sharing solutions, including managing, conducting, and using evaluation; learning from success | Participatory approaches; strengths-based approaches; building national capacity for evaluation |

CHANGING BEHAVIOR THROUGH INCENTIVES

| If development is mostly about… | And evaluation supports development by… | Then the type of evaluation needed will be… |
| --- | --- | --- |
| 7. Donors and central government ensuring that national partners (government agencies and NGOs) do the right thing | (a) Identifying those not doing the right thing so they can be sanctioned, and motivating all to do the right thing; (b) identifying those doing the right thing so they can be rewarded, and motivating all to do the right thing | (a) Upwards accountability reporting; (b) reward systems, performance-based aid |
| 8. Civil society ensuring that government agencies and NGOs do the right thing | Identifying and sanctioning those not doing the right thing and/or supporting agencies to improve their performance | Community accountability |
Source: Rogers, P. and Fraser, D. (2014), ‘Development evaluation’, in International Development: Ideas, Experience and Prospects, Oxford University Press, United Kingdom. Open access version available at https://idl-bnc-idrc.dspacedirect.org/handle/10625/51551
Dear Patricia
With your typology you make an addition that is of great value for TDR as well as for evaluation: among other things, what does TDR/evaluation want to trigger in order to be useful or influential, and what mechanisms underlie this?
In our paper, we focus on how the status and process of identifying values and criteria have changed over the last 170 years (generations 1–5). This is currently a prominent topic again in the discussions of transdisciplinary researchers (see, e.g., the contribution by Belcher et al.). There are two levels: 1) How can TDR be systematically evaluated? (In evaluation this is meta-evaluation.) 2) How “good” are the measures (e.g., climate services – in evaluation we often call these evaluands “programmes”) with which TDR is linked? We therefore think that values, criteria and valuations are an outstandingly important common theme of TDR and evaluation.
Wolfgang
That makes sense. Metaphors are tricky things. Gerald Midgley talks about ‘waves’ of systems thinking, and I made the same point to him. Waves lap against the shore and then retreat leaving only a faint mark and only if the tide is going out! Generations is better, since we really do live in a multigenerational world.
It would be good to reference Tom’s earlier contribution: Reconstructing professional ethics and responsibility: Implications of critical systems thinking. Evaluation, 2015, Vol. 21(4), 462–466. I think it is open access.
Interesting comments about the generations of evaluation. I’m not sure that is quite the right word, since it can imply replacement, when in fact all the ‘generations’ are very much alive. That adds to the confused identity of the craft. I’m surprised that the more recent notion of ‘generating value’ (Schwandt, Gates and others) isn’t identified, although it may be accommodated in your fifth item in the ‘evaluation science now’ paragraph. Incidentally, why ‘evaluation science’? I’ve never been convinced that identifying evaluation as a ‘science’ is helpful – especially when the idea of what constitutes ‘science’ is itself so contested.
Bob Williams
Hi Bob – always a pleasure to hear from you and have you provoke us to greater clarity in our thinking and communication!
Generations, to me, refers to the DNA of evaluation – the history of how evaluation evolved and the foundational genetics that inform how it functions over time. Our ancestors are never far from us; they are the foundations of our bodies and our being. Therefore, thinking and understanding more about them can help us trace back where we come from both in our physical genetics and our evaluation DNA.
The Schwandt and Gates book (2021) is a great recent contribution that fits in our fifth generation. Thanks for bringing that into the conversation.
Best regards – Amy
Reference: Schwandt, T. A. & Gates, E.F. (2021). Evaluating and valuing in social research. The Guilford Press.
Thank you Bob for pointing out the possible misconception that only a few, or even just one, living generation of evaluation remains. Evaluation lives in a multi-generational house. Each generation holds deep convictions that its way is the best, but each also learns from its predecessors. For example, the members of the 1st generation continue to occupy a large section of the building: in universities and companies, they use standardised assessment scales to survey the satisfaction of students or clients. They assume that this provides important impulses for change. Another example: “logic models” (the prototype is Stufflebeam’s CIPP from the 2nd generation) are essential for later generations. In Generation 5, however, there are also doubts about their worth. An impulse to reflect, just like yours.