Massaging AI language models for fun, profit and ethics

April 4, 2022

Do statistics amount to understanding? And does AI have a moral compass? On the face of it, both questions seem equally whimsical, with equally obvious answers. As the AI hype reverberates, however, those types of questions seem bound to be asked time and time again. State-of-the-art research helps probe them.

AI Language models and human curation

Decades ago, AI researchers largely abandoned their quest to build computers that mimic our wondrously flexible human intelligence and instead created algorithms that were useful (i.e. profitable). Some AI enthusiasts market their creations as genuinely intelligent despite this understandable detour, writes Gary N. Smith on Mind Matters.

Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, which often involves stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is also an award-winning author of a number of books on AI.

In his article, Smith sets out to explore the degree to which Large Language Models (LLMs) may be approximating real intelligence. The idea for LLMs is simple: using massive datasets of human-produced knowledge to train machine learning algorithms, with the goal of producing models that simulate how humans use language.

There are a few prominent LLMs, such as Google’s BERT, which was one of the first widely available and highly performing LLMs. Although BERT was introduced in 2018, it’s already iconic. The publication which introduced BERT is nearing 40K citations in 2022, and BERT has driven a number of downstream applications as well as follow-up research and development.

BERT is already way behind its successors in one aspect that is deemed central for LLMs: the number of parameters. This number represents the complexity each LLM embodies, and the current thinking among AI experts seems to be that the larger the model, i.e. the more parameters it has, the better it will perform.

Google’s latest Switch Transformer LLM scales up to 1.6 trillion parameters and improves training time up to 7x compared to its previous T5-XXL model of 11 billion parameters, with comparable accuracy.

OpenAI, makers of the GPT-2 and GPT-3 LLMs, which are being used as the basis for commercial applications such as copywriting via APIs and collaboration with Microsoft, have researched LLMs extensively. Findings show that the three key factors involved in the model scale are the number of model parameters (N), the size of the dataset (D), and the amount of compute power (C).
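As a rough sketch of what such findings look like, OpenAI's scaling-law research reports that test loss L falls off approximately as a power law in each factor, as long as the other two are not the bottleneck. The constants N_c, D_c, C_c and the exponents are empirically fitted values in that research's notation; they do not appear in this article and no specific values are assumed here.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

Read this way, "bigger is better" is a statement about steadily diminishing loss as N, D, and C grow together, not a claim that any one factor suffices on its own.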

There are benchmarks specifically designed to test LLM performance in natural language understanding, such as GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail. Google has published research in which T5-XXL is shown to match or outperform humans in those benchmarks. We are not aware of similar results for the Switch Transformer LLM.

However, we may reasonably hypothesize that Switch Transformer is powering LaMDA, Google’s “breakthrough conversation technology”, aka chatbot, which is not available to the public at this point. Blaise Aguera y Arcas, the head of Google’s AI group in Seattle, argued that “statistics do amount to understanding”, citing a few exchanges with LaMDA as evidence.

This was the starting point for Smith to embark on an exploration of whether that statement holds water. It’s not the first time Smith has done this. In the line of thinking of Gary Marcus and other deep learning critics, Smith claims that LLMs may appear to generate sensible-looking results under certain conditions but break when presented with input humans would easily comprehend.

This, Smith claims, is due to the fact that LLMs don’t really understand the questions or know what they’re talking about. In January 2022, Smith reported using GPT-3 to illustrate his point that statistics do not amount to understanding. In March 2022, Smith ran his experiment again, prompted by the fact that OpenAI admits to employing 40 contractors to curate GPT-3’s answers manually.

In January, Smith tried a number of questions, each of which produced a number of “confusing and contradictory” answers. In March, GPT-3 answered each of those questions coherently and sensibly, with the same answer given each time. However, when Smith tried new questions and variations on those, it became evident to him that OpenAI’s contractors were working behind the scenes to fix glitches as they appeared.

This prompted Smith to liken GPT-3 to the Mechanical Turk, the chess-playing automaton built in the 18th century, in which a chess master had been cleverly hidden inside the cabinet. Although some LLM proponents are of the opinion that, at some point, the sheer size of LLMs may give rise to true intelligence, Smith disagrees.

GPT-3 is very much like a performance by a good magician, Smith writes. We can suspend disbelief and think that it is real magic. Or, we can enjoy the show even though we know it is just an illusion.

Do AI language models have a moral compass?

Lack of common-sense understanding and the resulting confusing and contradictory outcomes constitute a well-known shortcoming of LLMs — but there’s more. LLMs raise an entire array of ethical questions, the most prominent of which revolve around the environmental impact of training and using them, as well as the bias and toxicity such models demonstrate.

Perhaps the most high-profile incident in this ongoing public conversation thus far was the termination/resignation of Google Ethical AI Team leads Timnit Gebru and Margaret Mitchell. In 2020, Gebru and Mitchell faced scrutiny at Google when they attempted to publish research that documented those issues and raised questions about them.

Notwithstanding the ethical implications, however, there are practical ones as well. LLMs created for commercial purposes are expected to be in line with the norms and moral standards of the audience they serve in order to be successful. Producing marketing copy that is considered unacceptable due to its language, for example, limits the applicability of LLMs.

MLOps: Keeping track of machine learning process and biases

If that situation sounds familiar, it’s because we’ve seen it all before: should search engines filter out results, or social media platforms censor certain content / deplatform certain people? If yes, then what are the criteria, and who gets to decide?

The question of whether LLMs should be massaged to produce certain outcomes seems like a direct descendant of those questions. Where people stand on such questions reflects their moral values, and the answers are not clear-cut. However, what emerges from both examples is that for all their progress, LLMs still have a long way to go in terms of real-life applications.

Whether LLMs are massaged for correctness by their creators or for fun, profit, ethics, or whatever other reason by third parties, a record of those customizations should be kept. That falls under the discipline called MLOps: just as DevOps in software development refers to the process of developing and releasing software systematically, MLOps is the equivalent for machine learning models.

Similar to how DevOps enables not just efficiency but also transparency and control over the software creation process, so does MLOps. The difference is that machine learning models have more moving parts, so MLOps is more complex. But it’s important to have a lineage of machine learning models, not just to be able to fix them when things go wrong but also to understand their biases.
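To make the idea of a model lineage concrete, here is a minimal sketch of what a record of customizations could look like. It is an illustration only, not how any particular MLOps tool works: the class names `LineageEntry` and `LineageLog`, their fields, and the model names in the usage example are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LineageEntry:
    """One recorded step in a model's history (hypothetical schema)."""
    model_id: str             # identifier of the resulting model
    parent_id: Optional[str]  # model it was derived from; None for a base model
    change: str               # what was done, e.g. "fine-tuned" or "answers curated"
    reason: str               # why it was done
    author: str               # who made the change

    def fingerprint(self) -> str:
        """SHA-256 digest of the entry, so later edits to the record are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class LineageLog:
    """Append-only log that links every model to the model it came from."""

    def __init__(self) -> None:
        self._entries: Dict[str, LineageEntry] = {}

    def record(self, entry: LineageEntry) -> None:
        if entry.model_id in self._entries:
            raise ValueError(f"model {entry.model_id!r} is already recorded")
        if entry.parent_id is not None and entry.parent_id not in self._entries:
            raise ValueError(f"unknown parent {entry.parent_id!r}")
        self._entries[entry.model_id] = entry

    def ancestry(self, model_id: str) -> List[LineageEntry]:
        """Return the chain of entries from the base model down to model_id."""
        chain: List[LineageEntry] = []
        current: Optional[str] = model_id
        while current is not None:
            entry = self._entries[current]
            chain.append(entry)
            current = entry.parent_id
        return list(reversed(chain))

    def summary(self, model_id: str) -> str:
        """Short human-readable audit trail, one line per recorded step."""
        return "\n".join(
            f"{e.model_id}: {e.change} ({e.reason}, by {e.author})"
            for e in self.ancestry(model_id)
        )


if __name__ == "__main__":
    # Hypothetical model names, used purely for illustration.
    log = LineageLog()
    log.record(LineageEntry("base-lm", None, "pretrained", "initial release", "creator"))
    log.record(LineageEntry("base-lm-curated", "base-lm", "answers curated",
                            "fix contradictory outputs", "creator"))
    log.record(LineageEntry("shop-copywriter", "base-lm-curated", "fine-tuned",
                            "match brand tone", "third party"))
    print(log.summary("shop-copywriter"))
```

The point of the sketch is that each customization names its parent, so anyone can walk back from a deployed model to the base model and see who changed what, and why.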

In software development, open source libraries are used as building blocks that people can use as-is or customize to their needs. We have a similar notion in machine learning, as some machine learning models are open source. While it’s not really possible to change machine learning models directly in the same way people change code in open source software, post-hoc changes of the type we’ve seen here are possible.

We have now reached a point where we have so-called foundation models for NLP: humongous models like GPT-3, trained on tons of data, that people can use to fine-tune for specific applications or domains. Some of them are open source too. BERT, for example, has given birth to a number of variations.

Against that backdrop, scenarios in which LLMs are fine-tuned according to the moral values of the specific communities they are meant to serve are not inconceivable. Both common sense and AI ethics dictate that people interacting with LLMs should be aware of the choices their creators have made. While not everyone will be willing or able to dive into the full audit trail, summaries or license variations could help towards that end.
