Generative AI will soon go mainstream, say 9 out of 10 IT leaders, according to the 2023 State of IT research by Salesforce.
McKinsey reports that 50% of organizations used AI in 2022. IDC is forecasting that global AI spending will increase by a staggering 26.9% in 2023 alone. A recent survey of customer service professionals found adoption of AI had risen by 88% between 2020 and 2022. Research shows 86% of IT leaders believe generative AI will have a prominent role in their organization in the near future.
Also: Two-thirds of professionals think AI will increase their skills’ value
And yet, 64% of IT leaders are concerned about the ethics of generative AI, and 62% are concerned about its impacts on their careers. In a recent survey of IT leaders, concerns around generative AI included security risks (79%), bias (73%), and carbon footprint (71%). Another report found that 23% of customers do not trust AI and 56% are neutral. This trust deficit can swing in either direction based on how companies use and deliver AI-powered services.
McKinsey’s latest research estimates that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually across the 63 use cases it analyzed (by comparison, the UK’s entire GDP in 2021 was $3.1 trillion). This would increase the impact of all artificial intelligence by 15% to 40%. Yet the biggest impediments to business adoption of generative AI are trust and the employee skills gap.
Also: Generative AI and the fourth why: Building trust with your customer
To gain a better perspective on how AI solution providers can gain trust from all stakeholders — employees, customers, partners, and communities that they serve — I had a fascinating discussion with Richie Etwaru, an expert in data privacy, advanced analytics, AI, and digital transformation. Etwaru, the co-founder and chief creative officer of Mobeus, has numerous patents, books, TEDx talks, and industry-first innovations credited to his name.
How can we build trust with all foundational models used for generative AI and beyond? Here is what Etwaru proposed.
Arthur C. Clarke’s statement, “Any sufficiently advanced technology is indistinguishable from magic,” penned in his 1962 book “Profiles of the Future: An Inquiry into the Limits of the Possible,” has stood the test of time.
Fast forward 60 years to November 30, 2022, when OpenAI unveiled ChatGPT, a technological marvel so advanced that it seemed to blur the lines between magic and sorcery. This wasn’t just magic; it was unsettling. Humanity’s relationship with magic is one of balance. We delight in magic when it’s captivating and awe-inspiring, but when it transcends our comprehension, crossing a boundary of understanding, it becomes intimidating. The moment magic seems too potent, it pulls us away from our comfort zones and into a perplexing dimension. The unease stems from confronting phenomena that evade logical or scientific explanation.
Also: The best AI chatbots: ChatGPT and other noteworthy alternatives
In the technological landscape, ChatGPT and its contemporaries, from large language models crafted for interaction to text-to-media generators like DALL-E, all offer something unexpected. They present advancements that Clarke might not have imagined and achieve a level of sophistication that goes beyond mere “magic,” evoking instinctual concern because they introduce concepts and capabilities we struggle to grasp.
The human psyche is delicate. When faced with inexplicable phenomena, our innate response, shaped by evolution, often leans toward fear. It’s why an unfamiliar, tiny creature can be more intimidating than a massive, well-known beast. ChatGPT and its peers have breached this “beyond mere magic” barrier, and their awe-inspiring capabilities are indeed causing a stir.
AI does not scare us because of its capabilities, but rather because we have little understanding of how it works and achieves what it does. Our lack of comprehension makes us imagine all the additional things AI could “potentially do.”
Also: How to write better ChatGPT prompts for the best generative AI results
In the article “Just Calm Down About GPT-4 Already,” Rodney Brooks argues we should “stop confusing performance with competence.” Brooks explains that if an AI model can do X, we should not presume it can also do Y just because humans who can do X can usually do Y. Our fear stems from envisioning limitless AI abilities when in truth we are overestimating the general competence of systems that show surprising effectiveness in narrow applications.
Demystifying the inner workings of AI could significantly reduce the fear surrounding it. If we can move AI from an opaque “black box” to a transparent “glass cube”, we can recalibrate how we as a species adopt the technology.
In his talk “Beyond ChatGPT: Inside the Minds of Generative AI,” Dr. Michael Wu explains how a simple prompt like “What is the color of the sky?” actually functions. Wu does a great job demystifying generative models, demonstrating that their startling responses are produced using “just math” and statistics — not conscious intelligence. Revealing the mathematical foundations behind AI responses confirms for audiences that the systems do not possess awareness like humans.
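To make the “just math” point concrete, here is a toy bigram model in Python. The tiny corpus and word counts are invented purely for illustration and bear no resemblance to how production-scale models are built; it simply shows that “answering” can be nothing more than picking the statistically likely continuation.

```python
from collections import Counter, defaultdict

# A tiny, invented corpus. Real models train on vastly larger text collections.
corpus = "the sky is blue . the sea is blue . the sun is bright .".split()

# Count bigrams: how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Relative frequencies of words observed after `word` -- 'just math'."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# The model "answers" by choosing the statistically most likely continuation.
probs = next_word_probs("is")
print(probs)                          # 'blue' follows 'is' 2/3 of the time
print(max(probs, key=probs.get))      # prints "blue"
```

A real large language model does the same kind of conditional-probability estimation, with billions of parameters over vast corpora, but the principle of predicting likely continuations from statistics is the same: no awareness is required.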
Also: This AI-generated crypto invoice scam almost got me, and I’m a security pro
Though we are starting to understand AI more today, the path ahead remains long. In June, AWS CEO Adam Selipsky told CNBC that we’re only “three steps in, and it’s a 10K race.” As AI progresses, models will advance beyond their current capabilities. Enhanced data mastery, improved model management, greater ecosystem integration, human upskilling, and ongoing mathematical/statistical innovation can significantly, if not exponentially, improve AI over time.
While we regulated away our fears of past technologies like electricity, flight, automobiles, and the internet, it’s unlikely that we can fully regulate away our fears of AI. This is because AI has compounding exponentiality, while everything that has come before it is linear at best. At its core, our anxiety stems from how AI may impact humanity’s ability to continue thriving as a species. In an extreme scenario, we catastrophize about AI leading to human extinction. But the outcome likely will be less binary than total victory or defeat. Instead of framing the future as humans winning or losing our existence, we must find ways to durably and sustainably coexist with artificial intelligence.
Also: The 5 biggest risks of generative AI, according to an expert
With coexistence as the guiding principle, we need a mechanism to evaluate how aligned an AI model is with that goal. When presented with an AI system, we should be able to judge immediately whether it’s “good AI” that supports human-AI coexistence and serves human needs, or AI that disregards coexistence and cannot be trusted. We need an easy-to-understand scoring system that signals an AI model’s trustworthiness and service to humanity.
Without such a mechanism, we may grow increasingly suspicious of all AI, which could breed mistrust toward any corporation employing it. A coherent framework for assessing AI’s alignment with cooperative human-AI coexistence is key to building public confidence in and extracting value from the technology.
The European Union’s AI Act has taken initial steps toward an AI scoring system by requiring a CE marking and unique model number for each AI model, tracing back to conformity assessment data. However, the information underlying this CE marking only reveals how a model was trained and created. It falls short of signaling whether the model can be trusted. A model may comply with applicable regulations but still fail to gain public trust — impacting perceptions of consumers, corporations, or countries that use the model for products and services. Simply meeting requirements does not equate to coexistence alignment. We need an AI scoring framework that goes beyond technical metrics to explicitly evaluate human benefit, transparency, and coexistence potential.
Also: Ethics of AI: Benefits and risks of artificial intelligence
Companies such as Google and OpenAI have started to use “model cards” to collate and present information about the design, data, training, performance, and limitations of their models. For example, Google’s MediaPipe BlazeFace AI model has a model card that follows the sections, data, and formatting prescribed in a paper written mostly by Google employees. OpenAI, on the other hand, has a “system card” for GPT-4 that follows the sections, data, and formatting prescribed in a paper from Stanford University.
While both model and system cards are a step in the right direction, the fact that each follows an independent format and information architecture is only one of many challenges. The primary challenge is that most consumers lack the time, patience, or aptitude to read and understand these cards. Hence, while the cards are publicly available, they are of little practical use to consumers: too long to read, and too hard to understand.
Also: 40% of workers will have to reskill in the next three years due to AI, says IBM study
As an exercise, let’s define a simple, easy-to-understand score that signals an AI model’s alignment with serving human needs and fostering human-AI coexistence: a “Human & AI Coexistence” (HAICO) score. How could this work? What data on each AI model would need to be collected, how often, and what formula would calculate the HAICO score? The framework would need to distill complex information into an easily understandable score that signals a model’s coexistence alignment to the general public.
While complex, bringing such a scoring framework to life is not impossible. Imagine our illustrative HAICO score comprising 50 attributes of an AI model grouped into the five layers of Maslow’s Hierarchy of Needs (Figure 1).
Each of the 50 attributes would measure something aligned with human-AI coexistence. The data for the attributes would be collected from the entire model pipeline: from the transparency of the routines baked into the chips’ silicon, through the consent and ownership of the training data, to model design, inference performance, retraining, and redistribution.
Example attributes include whether the model is robust, available, fair, respectful of human autonomy, consensus-oriented, continuously learning, and adding value to human lives. Each attribute would be scored from 0 to 5, and a formula would then combine them into an overall HAICO score of 0 to 100 for each model (Figure 2).
Also: Today’s AI boom will amplify social problems if we don’t act now, says AI ethicist
The final three-level HAICO scoring system:
- Non-Coexistent (0-59 points): Not trusted to serve human needs.
- Coexistent (60-79 points): Trusted to serve human needs.
- Very Coexistent (80+ points): Highly trusted to serve human needs.
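To make the arithmetic concrete, here is a minimal Python sketch of how such a score might be computed. The layer names, weights, attribute scores, and the weighted-average formula below are all hypothetical, invented for illustration; they are not part of any real HAICO specification.

```python
# Hypothetical per-layer weights (summing to 1.0), loosely echoing the five
# layers of Maslow's Hierarchy of Needs mentioned in the article.
LAYER_WEIGHTS = {
    "physiological": 0.30,       # e.g., robustness, availability
    "safety": 0.25,              # e.g., security, privacy
    "belonging": 0.20,           # e.g., fairness, consensus orientation
    "esteem": 0.15,              # e.g., respect for human autonomy
    "self_actualization": 0.10,  # e.g., adding value to human lives
}

def haico_score(attribute_scores):
    """Combine per-layer attribute scores (each 0-5) into a 0-100 score."""
    total = 0.0
    for layer, weight in LAYER_WEIGHTS.items():
        scores = attribute_scores[layer]
        layer_avg = sum(scores) / len(scores)    # average of the 0-5 scores
        total += weight * (layer_avg / 5) * 100  # scale layer to 0-100
    return round(total, 1)

def haico_tier(score):
    """Map a 0-100 score to the article's three-level system."""
    if score >= 80:
        return "Very Coexistent"
    if score >= 60:
        return "Coexistent"
    return "Non-Coexistent"

# Invented attribute scores for a hypothetical model.
example = {
    "physiological": [5, 4, 4],
    "safety": [4, 4],
    "belonging": [3, 4, 5],
    "esteem": [4],
    "self_actualization": [5],
}
score = haico_score(example)
print(score, haico_tier(score))  # prints: 84.0 Very Coexistent
```

The interesting design questions live outside the code: who sets the weights, who audits the attribute scores, and how often they are refreshed as the model is retrained.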
This illustrates how multilayered technical details could map to a simple, three-level coexistence and trustworthiness score. The illustrative HAICO framework provides a starting point; transforming it into an effective public-facing framework would require inclusive development and continuous refinement. But it demonstrates the feasibility of a nuanced human-AI coexistence scoring mechanism.
Our illustrative HAICO scoring mechanism is nowhere near complete; there is much work left to do. For example, the weighting of each layer, and the range that categorizes an AI model as non-coexistent, could change for different audiences. The formula for calculating the HAICO score could differ for AI models classified for PG-13 audiences versus those classified for R-rated audiences. But the illustration demonstrates that we can establish a scoring system for AI models that offers an easy and reliable way to determine whether a model can be trusted to align with human-AI coexistence (Figure 3).
Also: Ahead of AI, this other technology wave is sweeping in fast
We must move past the “who will win” debate and onto a path of coexistence. AI is here to stay, and so are we. The work ahead must be done as a collaborative community. Without a framework such as our illustrative HAICO score, the trustworthiness of the consumers, corporations, and countries that use AI models to build products or deliver services will increasingly be called into question. As a society, we would risk exponentially increasing distrust of AI and those who use it, eventually crossing a threshold where we cheat ourselves out of the opportunity to harness the technology to improve the human condition.
Here’s the good news: In addition to participants in the maturing AI ecosystem — such as hardware providers (NVIDIA, Intel, Apple, AMD, SambaNova), cloud providers (AWS, Google, Azure, Oracle, Alibaba, Salesforce), model marketplaces (Cohere, Hugging Face), apps (OpenAI, Anthropic, Stability AI), and strategy and services firms (Deloitte, Accenture, IBM, Cognizant, and others) — there is a maturing cohort of “model measurement” tools emerging.
For example, TensorFlow Data Validation can help to understand the characteristics of the dataset, detect anomalies, and compare the differences between datasets used to train a model. CleverHans or Adversarial Robustness Toolbox (ART) can be used to simulate adversarial attacks on the model when calculating robustness. Tools like Google’s Fairness Indicators, AI Fairness 360 from IBM, or Fairlearn can be used to measure, visualize, and mitigate biases in machine learning models. Tools like Google’s TFX, Seldon, or Fiddler can be used to monitor the performance of the model over time, alerting you when there is a significant drift or degradation.
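To give a flavor of what such measurement tools compute, here is a minimal, hand-rolled sketch of one common fairness metric, the demographic parity difference: the gap in positive-prediction rates between sensitive groups. Libraries such as Fairlearn report this metric out of the box; the predictions and group labels below are invented for illustration.

```python
# Hand-rolled demographic parity difference: the largest gap between groups
# in the rate of positive predictions. The data below is invented.

def selection_rate(preds):
    """Fraction of positive (1) predictions."""
    return sum(preds) / len(preds)

def demographic_parity_difference(preds, groups):
    """Max gap in selection rate across sensitive groups."""
    by_group = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    rates = [selection_rate(ps) for ps in by_group.values()]
    return max(rates) - min(rates)

# Hypothetical model predictions and sensitive-group labels.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, groups)
print(gap)  # group a is selected 3/4 of the time, group b 1/4: gap of 0.5
```

A score of 0 would mean both groups receive positive predictions at the same rate; a large gap is the kind of signal that could feed the “fair” attribute in a framework like the illustrative HAICO score.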
Also: Mass adoption of generative AI tools is derailing one very important factor, says MIT
The pieces are beginning to come together. The North Star is coexistence. We are now at the moment where we can collaboratively establish a trust score for every AI model that signals the model’s alignment with human-AI coexistence: an easy-to-understand score similar to the FICO score we use to signal a person’s financial trustworthiness. The illustrative HAICO score shared in this article is an appetizer to start the conversation. There is no better time than now.
This article was co-authored by Richie Etwaru, co-founder of Mobeus. Etwaru is a multi-disciplinary executive, serial entrepreneur, and global thought leader. In collaboration with C-suites and boards, he has designed and led global transformations in financial services and healthcare. Etwaru invented software-enabled spatial computing and pioneered the 31st Human Right. He has authored three books, delivered three TED talks, and spoken at more than 100 conferences.