Microsoft’s latest breakthrough, now in Azure AI, describes images as well as people do

October 15, 2020

Novel object captioning

Image captioning is a core challenge in the discipline of computer vision, one that requires an AI system to understand and describe the salient content, or action, in an image, explained Lijuan Wang, a principal research manager in Microsoft’s research lab in Redmond.

“You really need to understand what is going on, you need to know the relationship between objects and actions and you need to summarize and describe it in a natural language sentence,” she said.

Wang led the research team that achieved – and beat – human parity on the novel object captioning at scale, or nocaps, benchmark. The benchmark evaluates AI systems on how well they generate captions for objects in images that are not in the dataset used to train them.

Image captioning systems are typically trained on datasets of images paired with sentences that describe them, in other words, datasets of captioned images.

“The nocaps challenge is really how are you able to describe those novel objects that you haven’t seen in your training data?” Wang said.

To meet the challenge, the Microsoft team pre-trained a large AI model with a rich dataset of images paired with word tags, with each tag mapped to a specific object in an image.

Datasets of images with word tags instead of full captions are more efficient to create, which allowed Wang’s team to feed lots of data into their model. The approach imbued the model with what the team calls a visual vocabulary.

The visual vocabulary pre-training approach, explained Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services, is similar to prepping children to read by first using a picture book that associates individual words with images, such as a picture of an apple with the word “apple” beneath it and a picture of a cat with the word “cat” beneath it.

“This visual vocabulary pre-training essentially is the education needed to train the system; we are trying to educate this motor memory,” Huang said.

The pre-trained model is then fine-tuned for captioning on the dataset of captioned images. In this stage of training, the model learns how to compose a sentence. When presented with an image containing novel objects, the AI system leverages the visual vocabulary to generate an accurate caption.

“It combines what is learned in both the pre-training and the fine-tuning to handle novel objects in the testing,” Wang said.
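The two-stage idea can be illustrated with a deliberately tiny Python sketch. This is not Microsoft's model, which is a large neural network trained on real images. Here plain strings stand in for visual features, a dictionary stands in for the visual vocabulary, and every name (`pretrain_visual_vocabulary`, `finetune_on_captions`, `feat_zebra` and so on) is hypothetical. The sketch only shows how a word learned from tags alone can end up in a sentence pattern learned from captions.

```python
from collections import Counter


def pretrain_visual_vocabulary(tagged_regions):
    """Stage 1 (toy): associate each visual feature with a word tag."""
    return {feature: tag for feature, tag in tagged_regions}


def finetune_on_captions(captioned_images, vocabulary):
    """Stage 2 (toy): learn a sentence pattern from captioned images.

    Each known object word in a caption is replaced by an {obj} slot,
    and the most frequent resulting pattern is kept.
    """
    templates = Counter()
    for features, caption in captioned_images:
        words = caption.split()
        for feature in features:
            tag = vocabulary.get(feature)
            if tag in words:
                slotted = ["{obj}" if word == tag else word for word in words]
                templates[" ".join(slotted)] += 1
    return templates.most_common(1)[0][0]


def describe(features, vocabulary, template):
    """Caption an image by filling the learned pattern with a vocabulary word."""
    return template.format(obj=vocabulary[features[0]])


# Tag data is cheap to collect, so it covers more objects than the captions do.
tagged_regions = [
    ("feat_cat", "cat"),
    ("feat_dog", "dog"),
    ("feat_zebra", "zebra"),  # tagged, but never appears in any caption
]

# Caption data teaches sentence structure, but only mentions cats and dogs.
captioned_images = [
    (["feat_cat"], "a cat standing in a field"),
    (["feat_dog"], "a dog standing in a field"),
]

vocabulary = pretrain_visual_vocabulary(tagged_regions)
template = finetune_on_captions(captioned_images, vocabulary)

# "zebra" is a novel object as far as the caption data is concerned.
print(describe(["feat_zebra"], vocabulary, template))
```

The zebra never appears in a caption, yet the sketch can still describe it, because the word comes from the tag data and the sentence structure comes from the caption data.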

When evaluated on nocaps, the AI system created captions that were more descriptive and accurate than the captions for the same images that were written by people, according to results presented in a research paper.
