With reinforcement learning, Microsoft brings a new class of AI solutions to customers

December 8, 2020

Someone looking to book a vacation online today might have very different preferences than they did before the COVID-19 pandemic.

Instead of flying to an exotic beach, they might feel more comfortable driving locally. With limited options for dining out, having a full kitchen might be essential. Motel rooms or cabins might be more appealing than hotels with shared lobbies.

Countless companies use online recommendation engines to show customers products and experiences that match their interests. And yet, traditional machine learning models that predict what people might prefer are often based on data from past experience. That means they aren’t necessarily able to pick up on quickly changing consumer preferences unless they are retrained with new data.

Personalizer, which is part of Azure Cognitive Services within the Azure AI platform, uses a newer approach to machine learning called reinforcement learning, in which AI agents can interact with and learn from their environment in real time.

The technique was once confined largely to research labs. But now, it’s making its way into more Microsoft products and services, from Azure Cognitive Services that developers can plug into apps and websites to autonomous systems that engineers can use to refine manufacturing processes. Azure Machine Learning is also previewing cloud-based reinforcement learning offerings for data scientists and machine learning professionals.

“We’ve come a long way in the last two years when we had a lot of proof of concept projects within Microsoft and deployments with a couple of customers,” said Rafah Hosn, senior director at Microsoft Research’s New York lab. “Now we are really progressing nicely into things that can be packaged and shrink wrapped and pointed to a particular set of problems.”

Rafah Hosn, senior director at Microsoft Research Lab – New York City. Photo courtesy of Microsoft.

Z-Tech, the technology hub of Anheuser-Busch InBev, is using Personalizer to deliver tailored recommendations in an online marketplace to better serve small grocery stores across Mexico. Other Microsoft customers and partners are employing reinforcement learning to detect production anomalies and develop robots that can adjust to unpredictable real-world conditions — with models that can learn from environmental cues, expert feedback or customer behavior in real time.

Once Microsoft began using Personalizer on its homepage to contextually personalize the products displayed to each visitor, the company saw a 19-fold increase in engagement with the products that Personalizer chose. The company has also used Personalizer internally to select the right offers, products and content across Windows, the Edge browser and Xbox. These scenarios are seeing up to a 60% lift in engagement across billions of personalizations each month.

Teams has also used reinforcement learning to find the optimal jitter buffer for a video meeting, which trades off millisecond-scale information delays to provide better connection continuity, while Azure is exploring reinforcement learning-based optimization to help determine when to reboot or remediate virtual machines.

Because reinforcement learning models learn from instantaneous feedback, they can quickly adapt to changing or unpredictable circumstances. Once the COVID-19 pandemic hit, some companies had no idea what to expect as people’s purchasing and travel behaviors changed overnight, said Jeff Mendenhall, a Microsoft principal program manager for Personalizer.

“All of their historic modeling and expert knowledge went out the window,” Mendenhall said. “But with reinforcement learning, Personalizer can update the model every minute if needed to learn and respond to what actual user behaviors are right now.”
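As a rough illustration of that kind of adaptation, the minimal sketch below simulates an epsilon-greedy learner choosing between two offers whose appeal flips partway through the run. It is not Personalizer's algorithm or API; the offer names, booking probabilities, exploration rate and step size are all hypothetical values chosen for illustration.

```python
import random

def simulate(steps=4000, shift_at=2000, epsilon=0.1, step_size=0.1, seed=0):
    """Epsilon-greedy choice between two hypothetical offers whose appeal flips mid-run."""
    rng = random.Random(seed)
    offers = ["beach_flight", "local_cabin"]
    # Hypothetical booking probabilities before and after preferences shift.
    before = {"beach_flight": 0.30, "local_cabin": 0.10}
    after = {"beach_flight": 0.05, "local_cabin": 0.30}
    value = {o: 0.0 for o in offers}     # running reward estimate per offer
    picks_late = {o: 0 for o in offers}  # choices made in the final 500 steps

    for t in range(steps):
        probs = before if t < shift_at else after
        if rng.random() < epsilon:
            choice = rng.choice(offers)          # explore: try a random offer
        else:
            choice = max(offers, key=value.get)  # exploit: show the best-looking offer
        reward = 1.0 if rng.random() < probs[choice] else 0.0
        # A constant step size weights recent feedback more heavily than old feedback,
        # so the estimates can follow the shift without a separate retraining pass.
        value[choice] += step_size * (reward - value[choice])
        if t >= steps - 500:
            picks_late[choice] += 1
    return value, picks_late

value, picks_late = simulate()
print(picks_late)
```

Because the learner keeps exploring and keeps updating its estimates after every interaction, the offer it shows most often near the end of the run reflects the new preferences rather than the historical ones.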

In reinforcement learning, an AI agent learns largely by trial and error. It tests out different actions in either a real or simulated world and gets a reward when the actions achieve a desired result — whether that’s a customer hitting the button to book a vacation reservation or a robot successfully unloading an unwieldy bag of coins.

Training an AI agent through reinforcement learning is similar to teaching a puppy to do a trick, Hosn said. It gets a treat when it makes decisions that yield a desired result and learns to repeat the actions that get the most treats. But in complex real-world scenarios, exploring the vast universe of potential actions and finding an optimal sequence of decisions can be far more difficult.

At the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) this week, Microsoft researchers presented 17 research papers that mark significant progress in addressing some of the field’s biggest challenges. By investing in reinforcement learning teams across its network of Microsoft Research labs, the company says it is developing a portfolio of approaches to tackle different problems and exploring multiple paths to potential breakthroughs.

John Langford, partner research manager at Microsoft Research Lab – New York City. Photo by John Brecher.

Those teams have focused on developing a robust understanding of reinforcement learning’s foundational elements and creating practical solutions for customers — not just novelty demonstrations, researchers say.

They’ve spent a lot of time figuring out which problems reinforcement learning is well suited to solve, as well as probing the technical underpinnings to understand why something works and how to repeat it, said John Langford, a partner research manager at Microsoft Research Lab – New York City.

“Right now there’s a big gap between one-off applications where you can get PhDs to grind really hard and figure out a way to make it work as opposed to developing a routinely useful system that can be used over and over again,” Langford said.

“All of our reinforcement learning research at Microsoft really falls into two big buckets — how can we solve challenges that customers are bringing to us and what are the foundations we can use to build replicable, reliable solutions?” he said.

A different approach to machine learning

Reinforcement learning uses a fundamentally different approach than supervised learning, a more common machine learning technique in which models learn to make predictions from training examples they’ve been fed.

If a person is trying to learn French, exposing themselves to French text, grammar rules and vocabulary is closer to a supervised learning approach, said Raluca Georgescu, a research software engineer working on Project Paidia in the Microsoft Research Cambridge UK lab.

With a reinforcement learning approach, they would go to France and learn by talking to people. They’d be penalized with puzzled looks if they say the wrong thing and they’d get rewarded with a croissant if they order it correctly, she said.

A reinforcement learning agent learns from interacting with its environment, either in the real world or in a simulated environment that allows it to safely explore different options. It takes an action and waits to see if it results in a positive or negative outcome, based on a reward system that’s been established. Once that feedback is received, the model learns whether that decision was good or bad and updates itself accordingly.

It’s a really simple form of learning that’s endemic in the natural world, said Langford.

“Even worms can do reinforcement learning — they can learn to go towards things and avoid things based on some feedback,” Langford said. “That ability to learn at a very basic level from your environment is something that is super natural for us but in machine learning it’s a bit more tricky and delicate and requires more thought than supervised learning.”
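The act, observe, update cycle described above can be sketched with tabular Q-learning, one textbook reinforcement learning method. This is an illustrative sketch, not the approach used in any Microsoft product: the five-cell corridor environment, the reward of 1.0 at the right end and every parameter value are assumptions made for the example.

```python
import random

def train_corridor(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor; reward 1.0 at the right end."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 steps left, action 1 steps right
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action] value estimates
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Explore at random sometimes (and on ties); otherwise take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Feedback arrives: nudge the estimate toward reward plus discounted future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

q = train_corridor()
policy = ["right" if right > left else "left" for left, right in q[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It only sees a reward when it reaches the end of the corridor, and the update rule propagates that signal back to earlier cells over repeated episodes.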

The new papers presented at NeurIPS this week offer significant contributions in three key research areas: batch reinforcement learning, strategic exploration given rich observations and representation learning. Taken together, researchers say, these breakthroughs aim to boost the efficiency of models and expand the scope of problems that reinforcement learning can solve.
