Someone looking to book a vacation online today might have very different preferences than they did before the COVID-19 pandemic.
Instead of flying to an exotic beach, they might feel more comfortable driving locally. With limited options for dining out, having a full kitchen might be essential. Motel rooms or cabins might be more appealing than hotels with shared lobbies.
Countless companies use online recommendation engines to show customers products and experiences that match their interests. And yet, traditional machine learning models that predict what people might prefer are often based on data from past experience. That means they aren’t necessarily able to pick up on quickly changing consumer preferences unless they are retrained with new data.
Personalizer, which is part of Azure Cognitive Services within the Azure AI platform, uses a more cutting-edge approach to machine learning called reinforcement learning, in which AI agents can interact with and learn from their environment in real time.
The technique was once confined mainly to research labs. But now, it’s making its way into more Microsoft products and services, from Azure Cognitive Services that developers can plug into apps and websites to autonomous systems that engineers can use to refine manufacturing processes. Azure Machine Learning is also previewing cloud-based reinforcement learning offerings for data scientists and machine learning professionals.
“We’ve come a long way in the last two years when we had a lot of proof of concept projects within Microsoft and deployments with a couple of customers,” said Rafah Hosn, senior director at Microsoft Research’s New York lab. “Now we are really progressing nicely into things that can be packaged and shrink wrapped and pointed to a particular set of problems.”
Z-Tech, the technology hub of Anheuser-Busch InBev, is using Personalizer to deliver tailored recommendations in an online marketplace to better serve small grocery stores across Mexico. Other Microsoft customers and partners are employing reinforcement learning to detect production anomalies and develop robots that can adjust to unpredictable real-world conditions — with models that can learn from environmental cues, expert feedback or customer behavior in real time.
Once Microsoft began using Personalizer on its homepage to contextually personalize the products displayed to each visitor, the company saw a 19-fold increase in engagement with the products that Personalizer chose. The company has also used Personalizer internally to select the right offers, products and content across Windows, Edge browser and Xbox. These scenarios are giving up to a 60% lift in engagement across billions of personalizations each month.
Teams has also used reinforcement learning to find the optimal jitter buffer for a video meeting, which trades off millisecond-scale information delays to provide better connection continuity, while Azure is exploring reinforcement learning-based optimization to help determine when to reboot or remediate virtual machines.
Because reinforcement learning models learn from instantaneous feedback, they can quickly adapt to changing or unpredictable circumstances. Once the COVID-19 pandemic hit, some companies had no idea what to expect as people’s purchasing and travel behaviors changed overnight, said Jeff Mendenhall, a Microsoft principal program manager for Personalizer.
“All of their historic modeling and expert knowledge went out the window,” Mendenhall said. “But with reinforcement learning, Personalizer can update the model every minute if needed to learn and respond to what actual user behaviors are right now.”
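As a rough illustration of the contrast Mendenhall describes, the sketch below compares an estimate frozen on historical data with one that keeps updating from live feedback. It is not Personalizer's algorithm. The click data, the `running_update` helper and the step size of 0.3 are all hypothetical, chosen only to show the effect.

```python
# Illustrative only: a frozen estimate versus one updated from live feedback.
# The data, names and step size are assumptions, not Personalizer's method.

def running_update(estimate: float, reward: float, step_size: float) -> float:
    """Move the estimate a fraction of the way toward the latest reward."""
    return estimate + step_size * (reward - estimate)

# Hypothetical feedback (1 = booked, 0 = ignored) for a "beach flight" offer.
historical = [1, 1, 0, 1, 1, 1, 0, 1]   # before preferences changed
live = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # after preferences changed

# A model trained once on past data keeps its old belief.
frozen_estimate = sum(historical) / len(historical)

# An online learner starts from the same belief but adjusts after every event.
online_estimate = frozen_estimate
for reward in live:
    online_estimate = running_update(online_estimate, reward, step_size=0.3)

print(f"frozen: {frozen_estimate:.2f}, online: {online_estimate:.2f}")
```

The frozen estimate still reflects the old booking rate, while the online estimate drops toward the new one after only a handful of observations. A larger step size reacts faster but is noisier.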
In reinforcement learning, an AI agent learns largely by trial and error. It tests out different actions in either a real or simulated world and gets a reward when the actions achieve a desired result — whether that’s a customer hitting the button to book a vacation reservation or a robot successfully unloading an unwieldy bag of coins.
Training an AI agent through reinforcement learning is similar to teaching a puppy to do a trick, Hosn said. It gets a treat when it makes decisions that yield a desired result and learns to repeat the actions that get the most treats. But in complicated real-world scenarios, exploring the vast universe of potential actions and finding an optimal sequence of decisions can be far harder.
At the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) this week, Microsoft researchers presented 17 research papers that mark significant progress in addressing some of the field’s biggest challenges. By investing in reinforcement learning teams across its network of Microsoft Research labs, the company says it is developing a portfolio of approaches to tackle different problems and exploring multiple paths to potential breakthroughs.
Those teams have focused on developing a robust understanding of reinforcement learning’s foundational elements and creating practical solutions for customers — not just novelty demonstrations, researchers say.
They’ve spent a lot of time figuring out which problems reinforcement learning is well suited to solve, as well as probing the technical underpinnings to understand why something works and how to repeat it, said John Langford, a partner research manager at Microsoft Research Lab – New York.
“Right now there’s a big gap between one-off applications where you can get PhDs to grind really hard and figure out a way to make it work as opposed to developing a routinely useful system that can be used over and over again,” Langford said.
“All of our reinforcement learning research at Microsoft really falls into two big buckets — how can we solve challenges that customers are bringing to us and what are the foundations we can use to build replicable, reliable solutions?” he said.
A different approach to machine learning
Reinforcement learning uses a fundamentally different approach than supervised learning, a more common machine learning technique in which models learn to make predictions from training examples they’ve been fed.
If a person is trying to learn French, exposing themselves to French text, grammar rules and vocabulary is closer to a supervised learning approach, said Raluca Georgescu, a research software engineer working on Project Paidia in the Microsoft Research Cambridge UK lab.
With a reinforcement learning approach, they would go to France and learn by talking to people. They’d be penalized with puzzled looks if they say the wrong thing and they’d get rewarded with a croissant if they order it correctly, she said.
A reinforcement learning agent learns from interacting with its environment, either in the real world or in a simulated environment that allows it to safely explore different options. It takes an action and waits to see if it results in a positive or negative outcome, based on a reward system that’s been established. Once that feedback is received, the model learns whether that decision was good or bad and updates itself accordingly.
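The act, observe and update cycle described above can be sketched with a small simulated environment. This is a minimal example using an epsilon-greedy strategy, one common textbook approach, and not necessarily what any Microsoft product uses. The `run_agent` function, the reward probabilities and the exploration rate are assumptions made for illustration.

```python
import random

def run_agent(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Simulated agent that learns which action pays off most often.

    true_rates: hypothetical probability that each action earns a reward.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    estimates = [0.0] * n   # the agent's current belief about each action
    counts = [0] * n        # how many times each action has been tried

    for _ in range(steps):
        # 1. Take an action: usually the best-known one, sometimes a random one.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])

        # 2. Wait for the outcome: the simulated environment returns a reward.
        reward = 1.0 if rng.random() < true_rates[action] else 0.0

        # 3. Update the belief about that action (incremental average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_agent([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best, "tries per action:", counts)
```

The occasional random choice is what lets the agent discover actions it has not yet tried. Without it, the agent could settle on the first action that happens to pay off.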
It’s a really simple form of learning that’s endemic in the natural world, said Langford.
“Even worms can do reinforcement learning — they can learn to go towards things and avoid things based on some feedback,” Langford said. “That ability to learn at a very basic level from your environment is something that is super natural for us but in machine learning it’s a bit more tricky and delicate and requires more thought than supervised learning.”
The new papers presented at NeurIPS this week offer significant contributions in three key research areas: batch reinforcement learning, strategic exploration given rich observations and representation learning. Taken together, researchers say, these breakthroughs aim to boost the efficiency of models and expand the scope of problems that reinforcement learning can solve.