By breaking an intractable problem into smaller chunks, a deep-learning technique identifies the optimal areas for thinning out traffic in a warehouse.
Hundreds of robots zip back and forth across the floor of a colossal robotic warehouse, grabbing items and delivering them to human workers for packing and shipping. Such warehouses are increasingly becoming part of the supply chain in many industries, from e-commerce to automotive production.
However, getting 800 robots to and from their destinations efficiently while keeping them from crashing into each other is no easy task. It is such a complex problem that even the best path-finding algorithms struggle to keep up with the breakneck pace of e-commerce or manufacturing.
In a sense, these robots are like cars trying to navigate a crowded city center. So, a group of MIT researchers who use AI to mitigate traffic congestion applied ideas from that domain to tackle this problem.
They built a deep-learning model that encodes important information about the warehouse, including the robots, planned paths, tasks, and obstacles, and uses it to predict the best areas of the warehouse to decongest to improve overall efficiency.
Their technique divides the warehouse robots into groups, so each of these smaller groups can be decongested faster with traditional algorithms used to coordinate robots. In the end, their method decongests the robots nearly four times faster than a strong random search method.
In addition to streamlining warehouse operations, this deep learning approach could be used in other complex planning tasks, like computer chip design or pipe routing in large buildings.
“We devised a new neural network architecture that is actually suitable for real-time operations at the scale and complexity of these warehouses. It can encode hundreds of robots in terms of their trajectories, origins, destinations, and relationships with other robots, and it can do this in an efficient manner that reuses computation across groups of robots,” says Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering (CEE), and a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS).
Wu, senior author of a paper on this technique, is joined by lead author Zhongxia Yan, a graduate student in electrical engineering and computer science. The work will be presented at the International Conference on Learning Representations.
Robotic Tetris
From a bird’s eye view, the floor of a robotic e-commerce warehouse looks a bit like a fast-paced game of “Tetris.”
When a customer order comes in, a robot travels to an area of the warehouse, grabs the shelf that holds the requested item, and delivers it to a human operator who picks and packs the item. Hundreds of robots do this simultaneously, and if two robots’ paths conflict as they cross the massive warehouse, they might crash.
Traditional search-based algorithms avoid potential crashes by keeping one robot on its course and replanning a trajectory for the other. But with so many robots and potential collisions, the problem quickly grows exponentially.
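For illustration, here is a minimal sketch of the kind of conflict check such planners run on a pair of planned paths. It assumes grid cells and discrete timesteps, and it is a generic multi-agent path-finding check, not code from the paper:

```python
def first_conflict(path_a, path_b):
    """Return the first timestep at which two planned paths collide, or None.

    Each path is a list of grid cells, one cell per timestep. Checks the two
    standard conflict types used in multi-agent path finding.
    """
    horizon = min(len(path_a), len(path_b))
    for t in range(horizon):
        if path_a[t] == path_b[t]:
            return t  # vertex conflict: same cell at the same time
        if t + 1 < horizon and path_a[t + 1] == path_b[t] and path_b[t + 1] == path_a[t]:
            return t  # edge conflict: the two robots swap cells head-on
    return None
```

When a conflict is found, one robot keeps its path and the planner searches for a new, conflict-free path for the other.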
“Because the warehouse is operating online, the robots are replanned about every 100 milliseconds. That means that every second, a robot is replanned 10 times. So, these operations need to be very fast,” Wu says.
Because time is so critical during replanning, the MIT researchers use machine learning to focus the replanning on the most actionable areas of congestion, those where there is the most potential to reduce the robots’ total travel time.
Wu and Yan built a neural network architecture that considers smaller groups of robots at the same time. For instance, in a warehouse with 800 robots, the network might cut the warehouse floor into smaller groups that contain 40 robots each.
Then, it predicts which group has the most potential to improve the overall solution if a search-based solver were used to coordinate trajectories of robots in that group.
The overall algorithm is iterative: it picks the most promising robot group with the neural network, decongests that group with the search-based solver, then picks the next most promising group with the neural network, and so on.
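To make the loop concrete, here is a minimal, self-contained Python sketch of that iterative scheme. It is illustrative only: the conflict-count scorer stands in for the researchers’ neural network, and the one-step-delay replanner stands in for the search-based solver.

```python
# Sketch of the iterative decongestion loop. Paths are lists of grid
# cells, one cell per timestep; groups are disjoint sets of robot ids.

def count_conflicts(paths, group):
    """Placeholder scorer: pairwise same-cell, same-time collisions in a group."""
    ids = sorted(group)
    return sum(
        sum(1 for p, q in zip(paths[a], paths[b]) if p == q)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    )

def replan_group(paths, group):
    """Placeholder solver: one robot keeps its path; every other robot in the
    group gets an extra wait step at its start, a crude way to break collisions."""
    _keep, *rest = sorted(group)  # first robot stays on course
    for k, r in enumerate(rest, start=1):
        paths[r] = [paths[r][0]] * k + paths[r]
    return paths

def decongest(paths, groups, rounds=3):
    """Each round: score every group, then replan only the most promising one."""
    for _ in range(rounds):
        best = max(groups, key=lambda g: count_conflicts(paths, g))
        replan_group(paths, best)
    return paths

paths = {
    0: [(0, 0), (1, 0), (2, 0)],   # heading east through cell (1, 0)
    1: [(1, 1), (1, 0), (0, 0)],   # crosses (1, 0) at the same timestep
    2: [(5, 5), (6, 5)],
    3: [(7, 7), (8, 7)],
}
decongest(paths, groups=[{0, 1}, {2, 3}], rounds=1)
# Group {0, 1} scores highest (one collision at t=1), so only it is replanned.
```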
Considering relationships
The neural network can reason about groups of robots efficiently because it captures complicated relationships that exist between individual robots. For example, even though one robot may be far away from another initially, their paths could still cross during their trips.
The technique also streamlines computation by encoding constraints only once, rather than repeating the process for each subproblem. For instance, in a warehouse with 800 robots, decongesting a group of 40 robots requires holding the other 760 robots as constraints. Other approaches require reasoning about all 800 robots once per group in each iteration.
Instead, the researchers’ approach only requires reasoning about the 800 robots once across all groups in each iteration.
“The warehouse is one big setting, so a lot of these robot groups will have some shared aspects of the larger problem. We designed our architecture to make use of this common information,” she adds.
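The sketch below illustrates only this reuse pattern, with a stand-in linear encoder and scorer; the paper’s actual architecture, which combines convolution and attention, is far more sophisticated, and all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 8, 16                          # illustrative feature/embedding sizes
W_enc = rng.standard_normal((D_IN, D_EMB))   # stand-in encoder weights
w_score = rng.standard_normal(D_EMB)         # stand-in scoring head

def score_all_groups(robot_features, groups):
    """Encode every robot once, then score each group from the shared
    embeddings instead of re-encoding all robots for every group."""
    emb = np.tanh(robot_features @ W_enc)    # one pass over all 800 robots
    return [float(emb[g].mean(axis=0) @ w_score) for g in groups]

features = rng.standard_normal((800, D_IN))                    # 800 robots' raw features
groups = [list(range(i, i + 40)) for i in range(0, 800, 40)]   # 20 groups of 40
scores = score_all_groups(features, groups)
best = int(np.argmax(scores))                                  # group to decongest next
```

The point of this structure is that the expensive encoding pass runs once per iteration, and each group’s score is then cheap to read off the shared embeddings.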
They tested their technique in several simulated environments, including some set up like warehouses, some with random obstacles, and even maze-like settings that emulate building interiors.
By identifying more effective groups to decongest, their learning-based approach decongests the warehouse up to four times faster than strong, non-learning-based approaches. Even when they factored in the additional computational overhead of running the neural network, their approach still solved the problem 3.5 times faster.
In the future, the researchers want to derive simple, rule-based insights from their neural model, since the decisions of the neural network can be opaque and difficult to interpret. Simpler, rule-based methods could also be easier to implement and maintain in actual robotic warehouse settings.
“This approach is based on a novel architecture where convolution and attention mechanisms interact effectively and efficiently. Impressively, this leads to being able to take into account the spatiotemporal component of the constructed paths without the need of problem-specific feature engineering. The results are outstanding: Not only is it possible to improve on state-of-the-art large neighborhood search methods in terms of quality of the solution and speed, but the model generalizes to unseen cases wonderfully,” says Andrea Lodi, the Andrew H. and Ann R. Tisch Professor at Cornell Tech, who was not involved with this research.