Data variety and why it matters

By Richard Berry.

What are the differing characteristics of data? Why are they important for systems to function effectively? What is the requisite variety of data?

There are nine characteristics of data variety which agitate systems: volume, velocity, variety, veracity, validity, vulnerability, viscosity, vectors and virtualisation. Together, these ‘9Vs’ constitute a data requisite variety framework and are described below.

1. Volume

Description: The amount of available data.

Example: Volume can vary widely, from the results of small-scale research to the tsunami of digital material accessible through the internet. The latter can overwhelm both people and organisations.

2. Velocity

Description: How quickly data move across a network.

Example: Mobile, fixed line and satellite networks can operate at vastly differing directional speeds. These determine the informational capabilities that people and organisations can develop, access and use.

3. Variety

Description: The different types and formats of data.

Example: There are numerous protocols for packaging and moving data. For example, Internet Protocol version four (IPv4) has a 32-bit address whereas version six (IPv6) has a 128-bit address and provides far more capacity.
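
As a rough arithmetic illustration of that capacity difference, the two address spaces can be compared directly:

    # Address space sizes implied by the address lengths quoted above
    ipv4_addresses = 2 ** 32     # about 4.3 billion possible addresses
    ipv6_addresses = 2 ** 128    # about 3.4 x 10^38 possible addresses
    print(f"IPv6 offers {ipv6_addresses // ipv4_addresses:,} times as many addresses as IPv4")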

4. Veracity

Description: The truthfulness of data, i.e., whether they are created and provided with honest intent.

Example: Data can be created and provided with the intention to deceive recipients. For example, specific spoofing apps can falsify telephone numbers and device locations. Social media ‘bots’ can create deliberately manipulative content.

5. Validity

Description: The extent to which data are accurate representations of reality, and the number of errors they contain. Validity necessitates tracing data to a source and being able to explain their processing journeys.

Example: Facial recognition systems vary in accuracy due to data capture factors such as position, angle and poor image quality. Artificial intelligence can produce predictive results; however, accounting for the accuracy of that processing can be technically problematic.

6. Vulnerability

Description: The extent to which data create vulnerabilities for organisations and people.

Example: Private and public data can increase risks. For example, commercially available location data can reveal personal information about routines and lifestyles.

7. Viscosity

Description: The solidity of data: they may run away, disappear or remain static for long periods.

Example: Some data can be overwritten easily, such as telematics that track the opening and closing of car doors. On the other hand, formal records can be retained by authorities for many years.

8. Vectors

Description: The routes along which data travel.

Example: A phone call can travel over a cellular network or through a Wi-Fi system.

9. Virtualisation

Description: The global location of data, i.e., where they are stored and curated in accordance with local legislation.

Example: Differing laws exist for privacy and access. Online services can therefore provide ‘data residency’ options so that users have choices about storage locations.
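
As a sketch of how the framework might be used in practice (an illustration only, with a hypothetical mobile-device example, not part of the framework itself), the 9Vs can be treated as a simple checklist recorded against a particular data source:

    # A minimal checklist sketch for assessing a data source against the 9Vs.
    from dataclasses import dataclass, field

    NINE_VS = (
        "volume", "velocity", "variety", "veracity", "validity",
        "vulnerability", "viscosity", "vectors", "virtualisation",
    )

    @dataclass
    class DataVarietyAssessment:
        source: str
        # Free-text notes against each characteristic; empty means not yet assessed.
        notes: dict = field(default_factory=lambda: {v: "" for v in NINE_VS})

        def unassessed(self):
            """Return the characteristics that still need attention."""
            return [v for v, note in self.notes.items() if not note]

    # Hypothetical example: a mobile device extraction queued for forensic examination
    assessment = DataVarietyAssessment(source="mobile device extraction")
    assessment.notes["volume"] = "tens of gigabytes per handset"
    assessment.notes["validity"] = "source traceable, but parsing errors are possible"
    print(assessment.unassessed())  # the seven characteristics still to be considered

Each of the nine characteristics described above becomes a prompt; the demands on a system can then be compared against the variety the system has available to absorb them.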

Ashby’s Law and the Importance of Requisite Variety

The cybernetic principle of requisite variety is often referred to as Ashby’s Law (Ashby, 2015). We can think of it in these terms:

To remain stable, a system needs enough variety to match the variety within the demands placed upon it.
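
One common information-theoretic reading of the law (a simplified interpretation, not Ashby’s own wording) expresses this as an inequality:

    H(O) ≥ H(D) − H(R)

where H(D) is the variety (entropy) of the disturbances or demands, H(R) is the variety available to the regulator, and H(O) is the residual variety left in the outcomes. Outcomes can only be held within acceptable bounds when the regulator can deploy at least as much variety as the demands it must absorb.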

The nature of these demands can be assessed using the 9Vs found in a data requisite variety framework. Such analysis can inform how a system might adapt and therefore stabilise in accordance with Ashby’s Law.

We can learn from events where requisite variety was not achieved:

  • In 2019, the UK police inspectorate reported that over 25,000 mobile devices were awaiting forensic examination. There was no plan or capability to deal with the volume and complexity of the work required. There was insufficient requisite variety to meet demand; the system became unstable and government intervention followed.
  • Concurrently, Denmark found that over 10,000 criminal cases were at risk because of flawed geolocation data from mobile networks. Soon after the software error was discovered, over thirty prisoners were released due to concerns about unsafe convictions and another forty cases were postponed. In terms of the data requisite variety framework, the validity, variety and vulnerabilities of these data could not be absorbed by the Danish justice system.
  • In 2024, the UK formally recognised its largest miscarriage of justice when false data from the Post Office Horizon accounting system were used to prosecute over 900 innocent postal staff for theft and fraud offences. In terms of the data requisite variety framework, these data lacked validity and were unreliably presented as evidence of true facts, i.e., veracity. Software glitches were blamed. The data vulnerabilities have resulted in criminal investigation of those involved in compiling the original cases. The Post Office and criminal justice system remain in oscillation as efforts are made to develop stability.

Requisite Variety and Future Demands

We can expect emergent technologies like autonomous cars and more sophisticated drones to propagate new data varieties. For example, data viscosity could reflect the potentially large volumes of sensor data which will be processed by such devices. Some of these data will disappear, some will be stored locally within the devices and other data will be stored in virtualised applications. The vectors of these data could be multiple, from global positioning satellites to the use of localised mobile networks. New data variety often requires systems to adapt. A data requisite variety framework can inform how to promote stability in accordance with the principles of Ashby’s Law.

Do you have other examples to share of different data variety characteristics? Are you aware of other systems which have been disrupted by a lack of data variety? How might the data requisite variety framework apply to your research area? How can our understanding of data variety be improved?

Reference:

Ashby, W. R. (2015). An Introduction to Cybernetics (4th ed.). Mansfield Centre, Connecticut: Martino Publishing.

Use of Generative Artificial Intelligence (AI) Statement: Generative artificial intelligence was not used in the development of this i2Insights contribution. (For i2Insights policy on generative artificial intelligence please see https://i2insights.org/contributing-to-i2insights/guidelines-for-authors/#artificial-intelligence.)

Biography: Richard Berry PhD is a Fellow of the Cybernetics Society and a former police officer now based at the Centre for Information Management, Loughborough University, UK. His research interests are security cybernetics, strategy and capability leadership within complex adaptive systems.

12 thoughts on “Data variety and why it matters”

  1. Having mulled over this thought-provoking piece several times, I’m drawn to the idea of suggesting a 10th ‘V’ – value. The collection of data has a purpose, typically either related to customer behaviour, to inform prediction of future behaviour, or with regard to the functioning of either a process or machine. Lawful acquisition of those data can lead to their assessment for a range of law enforcement purposes – this secondary use of the data can be ascribed a value in a variety of circumstances, each analogous with the original purpose for which the data were collected – whether in a proactive or reactive investigation; or as evidential material.

    In the criminal justice system, focus on the latter centres on the inference that counsel can invite jurors to draw, based on the presentation of the case for prosecution or defence. Absent a body of corroborative evidence, thoughtful jurors might ask whether there is any real-world evidence to support the contention of the party wishing them to draw the inference proposed. It would be inappropriate to comment on the Post Office ‘Horizon’ cases while the police investigation is ongoing, other than to say that long-established law enforcement and public prosecution practice throughout the United Kingdom is that no case is put before a prosecuting authority on the basis of data alone: there must always be a body of corroborating evidence, or evidence of which the data are corroborative.

    In DIRECTOR OF PUBLIC PROSECUTIONS v SELENA VARLACK [2008] UKPC 56, the predecessor of the Supreme Court, the Privy Council held that when ruling on a submission of no case to answer to a charge of murder, the Eastern Caribbean Court of Appeal (ECCA) had erred by failing to apply the test of determining what inferences a reasonable jury properly directed might draw, as distinct from those which the court itself thought could or could not be drawn.

    Varlack had been convicted of murder after the trial judge dismissed a defence submission of ‘no case to answer’, which was the basis of the initial appeal ruling; the strongest evidence against the defendant was one of a well-defined series of phone calls. The effect of the Privy Council hearing the prosecution appeal was to reinstate the conviction. At trial, the prosecutor had painted a picture, based on real-world evidence, around the context of each of the calls in the series (including a call made from a neighbour’s fixed line a few minutes before the murder to the co-defendant who fired the fatal shots, killing Varlack’s former boyfriend), inviting the jury to draw the appropriate inference.

    The value ascribed to the data in such cases is that of weight, which is directly related to context. In Varlack’s case, the data weighed most heavily; but they would have been valueless without the corroborating contextual evidence. [Judicial discretion permits evidence to be excluded where the prejudicial effect outweighs the probative value].

    So, if data can be so heavily weighted in a prosecution, does it follow that data which are flawed in some way could weigh too heavily in a prosecution? Addressing that risk is where human and machine need to act in concert. In The Queen v Howell and Martin, a case involving multiple murders, it became apparent to investigators that there was a flaw in relevant telecommunications data. Exhaustive testing took place with regard to the manner in which data fields relating to geographical locus were populated. Once this was properly understood and made explicable, it negated a defence assertion of alibi and ensured the remainder of the data were capable of supporting the prosecution case in relation to other matters at issue before the court – the key point being that the flaw in the data was not such as to negate their weight with regard to the other issues the court had to determine.

    These two example cases demonstrate Data Value as a subjective consideration, where the human must be equipped to identify and transparently address inherent limitations, as well as to contextualise the data with corroborating non-digital material. Data do not have to be perfectly formed; their effective value rests in context, understanding, explicability and relevance.

    • Wow…thanks David – What a great example of trying to apply Ashby’s ‘law’ in respect of having enough variety in a legal system to absorb the ‘entropic’ lack of validity in data.

      My take – Data are inherently probabilistic and the systems of interpretation and inference arguably require the randomised capacities to match such probabilism. We also might acknowledge that data are never isomorphic with the system they were extracted from; they are perhaps representative measures with varying levels of assurance as to what inferences may be drawn from them. We might see these challenges in geo-location, events and entities as examples.

      Networks are also dynamic; the idea of systems flowing and changing is interesting. Data are energy exchanges, i.e., they exist within dissipative systems. If I take data from a network on Monday, it may not, and arguably will not, be the same network on Tuesday; akin to the familiar words attributed to Heraclitus: ‘No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.’

      It is also interesting to consider the role of automated analytics and algorithms – the key consideration in terms of Ashby is whether the biases in algorithms amplify or attenuate characteristics such as data validity. It could be argued that data might be more safely handled as electronic opinion and the biased human and machine systems which process them treated as joint expert witnesses. There are interesting cybernetics bases for such considerations; the conversation theories of Gordon Pask spring to mind.

      We also see intellectual and commercial protections of various forms of AI – presenting the idea that the system which produces these data and the system which analyses these data are essentially recursive black boxes. Neither can be exactly reproduced, and both are too complex to map and model. Such concepts are well known in cybernetics and this theory can be attributed to various Ashby texts from the 1950s.

      Agreed that data, on their own, can signpost mistaken inference. In the cases you outlined things worked well, but does that variety attenuation exist consistently across criminal justice? It would be fascinating to explore these issues more comprehensively.

      Final point from me is the idea of value. It’s a very interesting proposal. Value can be monetary as well as utility. I think in terms of utility for evidence we are entering what might be considered second-order cybernetics, where the observer, i.e., the advocate or investigators seeking to draw inference from data, actually becomes a unique system. When I looked at data requisite variety it became clear that there were orders in the 9V characteristics, and I suspect value could thus become a higher-order consideration sitting above them.

      My question: how many legal practitioners are able to safely ascribe value and absorb data variety to provide consistent evidential assurance? There is no formalised training or education; knowledge is not tested nor licensed for safe data handling. I am not saying they should be, but I do raise the question in the light of the numerous and known systemic failings. Any thoughts please?

  2. We’ve needed an articulation of the variety of variety for far too long. Thanks. Not any random variety will interface with that of the controlled (or interacting) system. The ontological status of ‘data’ (whether it’s rubbish or not) has long been an issue beyond AI. (This month’s ‘Prospect’ magazine holds forth on just one of the aspects in your article. https://www.prospectmagazine.co.uk/ideas/business/statistics/data/70934/why-data-driving-government-could-be-wrong)

    Our former Webmanager at CybSoc, Nick Green, with his ‘Real time Study Group’ gave a presentation to the UK Treasury about designing ‘continuous data flow’ to manage the economy. For once, in my opinion, they got it right by rejecting it while source reliability was imperfect. As a data provider and assessor in education (head teacher and inspector) I encountered the warp of Goodhart’s Law. (Ain’t camouflage nature’s own example?).

    While I expatiated in Kybernetes 1991 about Requisite Variety being a strategy not a Law, it is certainly a crucial one, and generally a gentler, more subtle one than just maiming a feedback loop in the controlled system – which is what some minimalist interventions (J Wilk) or ‘nudges’ (Thaler) amount to. The notion of ‘control’ (rather than say ‘interaction’ or dance) has a range of connotations which is sometimes overextended in this literature.

    Additional information on the work referred to – added on 5/9/25:
    The Status of Requisite Variety by David Dewhurst in Kybernetes (1991) 20 (2): 61–64. (I can email it to interested individuals for whom it is behind an unacceptable paywall.)
    I don’t know that Jim Wilk has a key text on Minimalist Intervention. But his Substack ‘Changers’ (https://changers.substack.com/p/what-is-mi) or his 2010 talk to the Worshipful Company of Management Consultants (https://www.wcomc.org/sites/default/files/files/Kaleidoscopic%20Change%20by%20James%20Wilk%20(1)_0.pdf) may be the best Introductions.
    The relevant reference by Thaler is his book ‘Nudge’ with Sunstein (2008 & 2022; https://www.penguin.co.uk/books/56784/nudge-by-richard-h-thaler-cass-r-sunstein/9780141999937).

    • Thank you for making some very interesting points. Your observations about the ontological status of data ring loud, and thanks for a great article in Prospect Magazine.

      Where to start? I hope I have understood sufficiently: Turing said something like “quite small errors can have an overwhelming effect at a later time.” Data about the states of systems may tell us something, but drawing accurate meaning can be a challenge – public services in the UK seem to have in part ignored such realities and spent decades being lost whilst navigating the mists and myths of so-called performance management.

      There seems to be an absence of appreciating randomness and nonlinearity within dynamic systems. This leads to your insightful point about the reality of applying, where possible, suitable regulatory influences. My immediate thoughts are that the notion of the strong leader who ‘grips’ remains and can be observed in traditional hierarchical bureaucracies. Yet in the 1990s Stafford Beer highlighted that such beliefs had long passed their time. Evidence from UK policing exemplifies that data variety has no regard for such cultures. There is some interesting neuroscience and scientific philosophy which appears relevant (Friston and Peirce) but I don’t have the space to unpack them here. …sorry.

      RE Ashby’s requisite variety being a law – I am not sure when a law of systems can be viewed as being established. We agree about the brilliance of Ashby’s simple but fundamental postulation, which has stood many theoretical and practical challenges. My guess is that it is more than a refined inductive proposition which has been re-proven/accepted many times since 1950? We strongly agree about Ashby being a strategy – my work on security cybernetics strongly signposts contemporary leaders towards moulding their strategy around requisite variety.

      Thanks for your comments once again. Great cybernetics!

  2. We have a well-known problem of bias in data used by AI. This article suggests a more general issue of uncertainty of data validity and verification. Whilst this may not be of critical importance in many application areas, such as e-commerce recommendations, presumably there will be other applications where it is critical. Take law and the criminal justice system for one, as highlighted below in the comments. Starting with the simplest question first: if we don’t know the 9Vs of the data, how can we rely on the results? Recent court cases might, in their own way, testify to this. Any thoughts please?

    • It might, but from a legal perspective, we should not focus on courts as they are not structured to handle this complexity. I believe the problem is more foundational.
      We should start by examining the foundational goals of law and courts. The primary function of our justice system has never been reaching absolute truth, but rather convincing society, maintaining social legitimacy and order. From that perspective, expecting courts to show sufficient care for data variety problems is unrealistic. While there are certainly some interested parties and relevant cases, these aren’t enough to generalise a systematic approach to the situation.

      More critically, the feedback mechanisms in law cannot present adequate data for this kind of analysis. Without collective feedback mechanisms drawing from various sources beyond traditional legal channels, we cannot reach a point where we can effectively chart the path forward. Courts are severely limited as sources for this type of research, and they are largely neglected by lawmakers when it comes to systemic technological challenges.

      The temporal lag in court processes and decisions presents another fundamental problem – data from courts only reflects the past, not our current technological reality. Any outcomes derived from court-based research would only illuminate a limited area, not inform our future approach to these challenges.

      Suppose we’re serious about addressing the 9Vs problem in critical systems. In that case, we need to look beyond courts to the broader legal ecosystem and develop new governance mechanisms that can operate proactively rather than reactively.

      • Thank you. I agree that this is a foundational issue and extending the lens to the wider legal system is required. I would suggest that whilst the goals of the justice system may not be absolute truth, nevertheless a ‘truth test’ must exist as a basis for any society to regulate itself, even more so in this post-truth/fake-news era. Data variety seems to be perturbing the basis of belief. Decision makers might learn a great deal from the work of Charles Sanders Peirce in respect of how we fix belief. His approach to scientific reasoning and method has already been charted within cybernetics.

        We are dealing with the notion of systemic viability. I see the core challenge as Stafford Beer did, i.e., the information circuit starts in society. It seems to be increasingly accepted that all crimes have digital elements. Increased data generation ‘results in and is a result of’ (arguably) unregulatable megatrends – surely we should reasonably expect a justice system to requisitely reflect these emergences in society? I will be following how intelligent drones change our world in the foreseeable future. The weak signals are becoming stronger.

        You rightly highlight that latency periods for law (and other traditional institutions) are too long and the responses are too late. John Beckford’s book The Intelligent Nation explores such points well.

        Ashby’s work is arguably more than a postulation which has stood the test of time; it can also be seen as a basis of effective institutional and organisational strategy. We might then examine systems of strategic leadership and governance in the context of these cybernetic challenges. Such education is available but not yet mainstream. My thoughts keep pointing to some form of cybernetic/transdisciplinary research facility to begin unpicking, and advising leaders about, what we all appear to be seeing. What do you think?

    • Thank you – my sense is that there is an absence of understanding about the importance of this topic in higher risk settings like the rule of law. Data varieties within evidence are fundamental, and this is where complex disciplines like digital forensics are still finding their boundaries; the UK is a useful example. For instance, some data are not verifiable, repeatable and measurable; some are ephemeral; and all data have bias-driven varieties, by design or through algorithmic processing. This reality suggests we are dealing with differing levels of trust in data, especially those which might be used to forensically model systems, i.e., reconstruct (so-called) facts or events for courts.

      I therefore suggest the term ‘electronic opinion’, i.e., data are not the whole truth and are a product of a human questioning a machine (that replies). These can be opaque and complex conversations. Published research shows significant and unreliable story varieties can sometimes be created within digital forensic analyses. My question: Are advocates able to effectively test such scenarios? Ashby developed the idea of the black box system and this applies in such situations. I developed a basic Evidence Inference Confidence Framework (EICF) to help mitigate/attenuate such risks. I also suggest we might consider the ‘system of human and machine as a joint witness of opinion’.

      These are fledgling ideas but seem to be increasingly relevant in a world of bulk personal data sets (take automatic number plate or biometric recognition). Varying error rates (caused by data variety) matter. The collective skill is one of inference, the science being to create systems which adapt to technology and apply reasoning in a fair and justifiable manner. Ultimately data are probabilistic and not deterministic. There is much more to say, but the signs point to a need for more transdisciplinary research in the face of such complexity.

  4. Interestingly, the principle of requisite variety can also be applied to legal training and interpretation, as it offers a fresh perspective on legal thinking. Although most studies focus on criminology, I believe we can broaden the scope to encompass general legal reasoning. In particular, areas of law that interact heavily with technology highlight the importance of data variety, where Ashby’s Law may play a critical role in managing complexity. Intellectual property law (my field) could greatly benefit from this understanding. However, it requires a little bit of work to reach such a beneficial outcome.

    • Great points – we have seen events where several legal systems seem unable to absorb (quite frankly) basic data variety. The published codes I have seen (e.g., in the UK) around the use of varying forms of AI in investigation/evidence do not appear to reflect good cybernetic reasoning. This observation implies that further orders of complexity (variety) have already been brought into jurisprudence. Your point about the need to generate further understanding in specialist areas of law raises the flag for conducting enticing interdisciplinary research. I hope this can be progressed; the environmental signals suggest it is needed. Thank you.

      • I’m not making a specific point here, but I’m wondering whether the 9Vs framework is fully appropriate for analysing data variety in legal contexts. While it provides a useful foundation, law represents a different dimension where the core principles may need adaptation. To apply it effectively, we need to identify the similarities and differences between cybernetic understanding and legal reasoning and tailor the framework to suit legal practice. Legal systems could retain the main concepts but emphasise or modify specific elements to address the types of variety and complexity that arise in the affected areas.

        Today, the law often struggles to perform its primary functions in the face of increasing data variety. This lag can erode public confidence in both the law and the state, potentially undermining the social and institutional foundations we rely on. By adopting a cybernetic perspective, particularly guided by Ashby’s Law, legal systems could evolve to better manage this complexity. Such an approach would help restore balance, allowing the law to fulfil its fundamental goals while promoting societal balance, aligning with state principles, and strengthening citizens’ confidence in legal systems. Ultimately, this understanding could enable law to adapt to new realities, overcoming the challenges posed by rapidly expanding forms of data and complexity.

        • Yes, I have presented a basic Data Requisite Variety Framework (DRVF) in the blog; there is more depth to other versions in documented research. The DRVF was developed to explore data cybernetics or, as one colleague has just commented, ‘the variety of variety’. The intention was to explore impacts upon organisations as ‘systems embedded and relational to other systems’. The DRVF is not a tool for justice, but it provides an initial understanding. There is much work to be done; I mention in a different reply that the challenge is to draw safe inference from data, and there is early work completed in this respect. I concur with your observations, albeit I think of opportunities for legal systems to demonstrate adaptive qualities. Thank you.

