Computer vision is playing an increasingly pivotal role across industry sectors, from tracking progress on construction sites to deploying smart barcode scanning in warehouses. But training the underlying AI model to accurately identify what is in an image can be a slow, resource-intensive endeavor that isn’t guaranteed to produce results. Fledgling German startup Hasty wants to help with the promise of “next-gen” tools that expedite the entire process of annotating images and training models on them.
Hasty, which was founded out of Berlin in 2019, today announced it has raised $3.7 million in a seed round led by Shasta Ventures. The Silicon Valley VC firm has a number of notable exits to its name, including Nest (acquired by Google), Eero (acquired by Amazon), and Zuora (IPO). Other participants in the round include iRobot Ventures and Coparion.
The global computer vision market was pegged at $11.4 billion in 2020, a figure that’s projected to rise to more than $19 billion by 2027. Data preparation and processing are among the most time-consuming tasks in AI, accounting for around 80% of the time spent on related projects. In computer vision, annotation, or labeling, is a technique used to mark and categorize images to give machines the meaning and context behind the picture, enabling them to spot similar objects. Much of this annotation work falls to trusty old humans.
The problem Hasty is looking to fix is that the vast majority of data science projects never make it into production, with significant resources wasted in the process.
“Current approaches to data labeling are too slow,” Hasty cofounder and CEO Tristan Rouillard told VentureBeat. “Machine learning engineers often have to wait three to six months for first results to see if their annotation strategy and approach is working because of the delay between labeling and model training.”
Make haste
Hasty ships with 10 built-in automated AI assistants, each dedicated to reducing human spadework. Dextr, for instance, allows users to click just four extreme points on an object to highlight it and suggest annotations.
And Hasty’s AI “instance segmentation” assistant creates swifter annotations when it finds multiple instances of an object within an image.
The assistant observes while users annotate and can make suggestions for labels once it reaches a specific confidence score. The user can then correct these suggestions to improve the model while receiving feedback on how effective their annotation strategy is.
“This gives the neural network a learning curve — it learns on the project as you label,” Rouillard said.
There are already countless tools designed to simplify this process, including Amazon’s SageMaker, Google-backed Labelbox, V7, and Dataloop, which announced a fresh $11 million round of funding just last month.
But Hasty claims it can make the entire process significantly faster by combining automation, model training, and annotation.
As with similar platforms, Hasty uses an interface in which humans and machines collaborate. Hasty can make suggested annotations after having been exposed to just a few human-annotated images, with the user (e.g., a machine learning engineer) accepting, rejecting, or editing that suggestion. This real-time feedback is used to improve models, expediting training the more a model is used, in what is often referred to as “the data flywheel.”
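To make the loop concrete, here is a minimal Python sketch of a human-in-the-loop labeling cycle. It is an illustration only and not Hasty's implementation: the `CentroidAssistant` class, the `annotate` function, the toy two-number "features," and the 0.8 confidence threshold are all hypothetical. The assistant stays quiet until it is confident enough, and every human decision is fed straight back into the model.

```python
import math
from collections import defaultdict


class CentroidAssistant:
    """Toy label assistant (hypothetical): one running-mean centroid per label."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold      # assumed confidence cut-off
        self.sums = {}                  # label -> per-dimension feature sums
        self.counts = defaultdict(int)  # label -> number of examples seen

    def update(self, features, label):
        """Learn from one human-confirmed label."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def suggest(self, features):
        """Return (label, confidence), or None if not confident enough."""
        if len(self.sums) < 2:
            return None  # too little data to compare labels
        weights = {}
        for label, sums in self.sums.items():
            centroid = [s / self.counts[label] for s in sums]
            distance = math.sqrt(
                sum((f - c) ** 2 for f, c in zip(features, centroid))
            )
            weights[label] = math.exp(-distance)  # closer means heavier
        best = max(weights, key=weights.get)
        confidence = weights[best] / sum(weights.values())
        return (best, confidence) if confidence >= self.threshold else None


def annotate(stream, assistant):
    """Run the loop; `truth` stands in for the human annotator's decision."""
    stats = {"manual": 0, "accepted": 0, "corrected": 0}
    for features, truth in stream:
        suggestion = assistant.suggest(features)
        if suggestion is None:
            stats["manual"] += 1       # human labels from scratch
        elif suggestion[0] == truth:
            stats["accepted"] += 1     # human accepts the suggestion
        else:
            stats["corrected"] += 1    # human edits the suggestion
        assistant.update(features, truth)  # feedback improves the model
    return stats


assistant = CentroidAssistant(threshold=0.8)
stream = [
    ((0.0, 0.0), "cat"),
    ((10.0, 10.0), "dog"),
    ((0.5, 0.0), "cat"),
    ((10.0, 9.5), "dog"),
]
stats = annotate(stream, assistant)
```

In this toy run the first two items have to be labeled by hand, after which the assistant's suggestions for the next two are accepted. A real system would use a neural network over image pixels rather than distance to a centroid, but the accept, reject, or edit cycle has the same shape.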
“Everyone is looking to build a self-improving data flywheel. The problem with (computer) vision AI is getting that flywheel to turn at all in the first place, [as] it’s super expensive and only works 50% of the time — this is where we come in,” Rouillard said.
Rapid feedback
In effect, Hasty’s neural networks learn while the engineers are building out their data sets, so that the “build,” “deploy,” and “evaluate” facets of the process all happen more or less concurrently, rather than in sequence. Indeed, a typical linear approach may take months to arrive at a first testable AI model, only to discover that it is deeply flawed due to errors in the data or “blind assumptions” made at the project’s inception. Hasty brings agility to the mix.
That in itself isn’t entirely novel, but digging into the weeds, Rouillard said that his company views automated labeling in a similar light to autonomous driving, insofar as different technologies operate at different “levels” — in the self-driving vehicle sphere, some cars can brake or change lanes, while others are pretty much capable of near full autonomy. Translated to annotation, Rouillard said that Hasty goes further than many of its rivals regarding automation, in terms of minimizing the number of clicks required to label an image or entire batches of images.
“Everyone preaches automation, but it is not obvious what is being automated,” Rouillard explained. “Almost all tools have good implementations of level 1 automation, but only a few of us take the trouble of providing level 2 and 3 in a way that produces meaningful results.”
Data is essentially the fuel for machine learning, and so getting more (accurate) data into an AI model at scale is key.
In addition to a manual error-finding tool, Hasty offers an AI-powered error finder, which automatically identifies where the likely issues are in a project’s training data. It’s a quality control feature designed to find common mistakes in annotation, removing the need to search through the data manually for errors.
“This allows you to spend your time fixing errors instead of looking for them, and helps you to build confidence in your data quickly while you annotate,” Rouillard said.
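The article does not describe how Hasty's error finder works internally. One common heuristic for this kind of quality control, sketched below in Python under that caveat, is to flag any annotation where a trained model confidently disagrees with the human label and to review the most confident disagreements first. The function name `find_suspect_labels`, the data layout, and the 0.9 cut-off are assumptions made for illustration.

```python
def find_suspect_labels(annotations, predictions, min_confidence=0.9):
    """Flag items whose human label a model confidently disagrees with.

    annotations: {item_id: human_label}
    predictions: {item_id: (predicted_label, confidence)}
    Both structures and the threshold are hypothetical.
    """
    suspects = []
    for item_id, human_label in annotations.items():
        predicted, confidence = predictions[item_id]
        if predicted != human_label and confidence >= min_confidence:
            suspects.append((item_id, human_label, predicted, confidence))
    # Most confident disagreements first, so reviewers start with them.
    suspects.sort(key=lambda suspect: suspect[3], reverse=True)
    return suspects


annotations = {"img1": "parcel", "img2": "envelope", "img3": "parcel"}
predictions = {
    "img1": ("parcel", 0.99),    # model agrees with the human label
    "img2": ("parcel", 0.95),    # confident disagreement: flagged
    "img3": ("envelope", 0.60),  # disagreement, but below the cut-off
}
suspects = find_suspect_labels(annotations, predictions)
```

Here only `img2` is returned for review. The low-confidence disagreement on `img3` is left alone, which keeps the review queue short at the cost of missing some errors.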
Hasty claims around 4,000 users, constituting a fairly even mix of corporations, universities, startups, and app developers, spanning just about every industry. “We have 3 of the top 10 German companies in logistics, agriculture and retail using Hasty,” Rouillard added.
A typical use case in agriculture might involve an AgTech company training an AI model to identify crops, pests, or diseases, while in logistics it can be used to train machines to automatically sort parcels by type. Rouillard added that it’s also being used in the sports realm to provide real-time game analysis and stats for soccer coverage.
With $3.7 million in the bank, the company plans to accelerate product development and expand its customer base across Europe and North America.