Generative AI, a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music, has taken the world by storm. Like all AI, generative AI is powered by machine learning (ML) models: very large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). Consumer-facing applications like ChatGPT have demonstrated how powerful the latest ML models have become.
Organizations can apply generative AI across all lines of business including engineering, marketing, customer service, finance, and sales to transform nearly every aspect of how they work. They’re creating virtual assistants and new call center features to enhance the customer experience, boosting employee productivity with conversational search and code generation, and improving operations with document processing and data-enriched cybersecurity. These changes are happening across industries from healthcare to financial services as companies use generative AI to produce better results, faster.
This doesn’t mean that generative AI alone will transform your business, though. To fully realize the benefits of generative AI, you need to differentiate the applications you build with it, which requires going deeper. Generative AI relies on data not only to generate content, but also to learn and evolve. Every great generative AI application is supported by a solid data strategy that helps you customize your models and build competitive advantage.
If you want to build generative AI applications that are unique to your business needs, your organization’s data will be the differentiator.
Check your data foundation
Generative AI, with its ability to create content, relies on data. In a simple context, the better the quality and relevance of the input data, the more refined and applicable the outputs. Data doesn’t just feed AI; it shapes it, offering a foundation upon which the AI learns and evolves.
There are several ways that organizations might leverage data in their generative AI applications. While some companies will build and train their own large language models (LLMs) with vast amounts of data, many more will use their organizational data to fine-tune existing foundation models for their unique business needs or add context to prompts through Retrieval Augmented Generation (RAG), a framework for feeding LLMs accurate, up-to-date information from external sources to improve LLM responses.
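To make the RAG idea concrete, here is a minimal sketch in Python. It assumes a tiny in-memory list of documents and a simple word-overlap ranking in place of a real search index; the documents and the function names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical and chosen only for illustration. It shows the two steps RAG adds before the model is called: retrieving relevant passages, then adding them to the prompt as context.

```python
import re

# Hypothetical in-memory knowledge base; a real system would query a document store.
DOCUMENTS = [
    "Flights from Seattle to Lisbon depart daily at 6 pm.",
    "The Lisbon harbor hotel offers a 20 percent discount in March.",
    "Checked baggage allowance is 23 kg per passenger.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    """Prepend the retrieved passages to the question as context."""
    passages = retrieve(question, documents, top_k)
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Use only this context to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("Which hotel in Lisbon has a discount?", DOCUMENTS)
print(prompt)  # this string is what would be sent to the LLM
```

In practice the word-overlap ranking would usually be replaced by a search over embeddings, but the shape of the flow stays the same: retrieve first, then generate.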
For example, if you are an online travel agency that wants to generate personalized travel itineraries, you’d want to use customer profile data in your databases to tailor recommendations based on things like past trips, web history, and travel preference. You could then marry that data with other company data like flight and hotel inventory, promotions, and similar travel details.
The key to making all of these use cases work is quality data. In fact, according to the Amazon Web Services CDO survey, the number one challenge for organizations in realizing the potential of generative AI is data quality.
“The biggest disservice companies can do is to only develop a generative AI strategy,” says Archana Vemulapalli, head of product and global strategy for data and AI at AWS. “You need to have a data and AI strategy.” Vemulapalli suggests building a data strategy that begins with data collection and ends with data governance, with each step ensuring your data is accessible, reliable, and secure.
First, it’s essential to lay the foundation of your data strategy with a scalable infrastructure for data storage. Generative AI relies on vast amounts of data, including text, images, and videos, so your infrastructure will need to be capable of handling the volume, variety, and velocity of data you’ll collect. That includes breaking down data silos and bringing together data controlled by different groups across your organization. You’ll also want to choose data storage that’s optimized for your particular use cases. For instance, generative AI applications often use vector data, so you’ll need data stores that are capable of searching and storing that type of data.
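As a rough illustration of what searching vector data involves, the sketch below ranks stored items by cosine similarity to a query vector. The three-dimensional vectors and item names are made up for the example; real embeddings typically have hundreds or thousands of dimensions and come from an embedding model, and a purpose-built vector store would use an index rather than the full scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the values are invented for illustration only.
VECTOR_STORE = {
    "beach resort guide": [0.9, 0.1, 0.0],
    "ski trip checklist": [0.1, 0.9, 0.0],
    "airline baggage policy": [0.0, 0.2, 0.9],
}


def nearest(query_vector, store, top_k=1):
    """Return the names of the top_k items most similar to the query."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query_vector, store[name]),
        reverse=True,
    )
    return ranked[:top_k]


# A query vector that points mostly along the first dimension.
print(nearest([0.8, 0.2, 0.1], VECTOR_STORE))
```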
Next, the quality of data used to train generative AI models significantly impacts their performance, since models learn from the data they’re trained on. Along those lines, it’s important to ensure that your datasets are representative of the people and use cases your application will serve, and that you’ve taken steps to identify and mitigate bias in those datasets.
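One simple way to start checking representativeness is to compare how often each group appears in a dataset with the share you expect it to have. The sketch below is only an illustration of that idea: the `region` field, the expected shares, and the 10-point tolerance are all assumptions, and a real bias review would go well beyond counting.

```python
from collections import Counter


def representation_gaps(records, field, expected_shares, tolerance=0.10):
    """Return groups whose observed share differs from the expected share
    by more than the tolerance, mapped to (observed - expected)."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("records must not be empty")
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 2)
    return gaps


# Hypothetical sample: 7 EU records, 2 APAC records, 1 LATAM record.
sample = (
    [{"region": "EU"}] * 7
    + [{"region": "APAC"}] * 2
    + [{"region": "LATAM"}] * 1
)
# Assumed target mix for the customer base; these numbers are illustrative.
expected = {"EU": 0.45, "APAC": 0.40, "LATAM": 0.15}

gaps = representation_gaps(sample, "region", expected)
print(gaps)  # positive values are over-represented, negative under-represented
```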
It’s also important to have tools to easily connect your different data sources. These tools can include data integration platforms, APIs, and connectors to software-as-a-service (SaaS) applications, on-premises data stores, and other clouds.
Finally, you need to ensure that your builders have easy but governed access to data. Establishing data governance practices is crucial to promote the integrity, security, and compliance of your data. This encompasses defining data standards, access controls, data lineage, and data lifecycle management. It also involves implementing security measures to protect sensitive data and accounting for relevant data protection regulations. Data governance encourages data use that’s reliable, traceable, and compliant with privacy and legal requirements.
Once you have a solid end-to-end data foundation in place, you’re ready to innovate with generative AI.
Start with a small but mighty problem
In laying out a plan for using generative AI, start by focusing on business goals. “Think about what levers you want to exercise with generative AI,” Vemulapalli says. “Is the goal to drive customer experience, find new revenue streams, or build a new product out and see how it scales? Get your strategy aligned.”
The next step is to find a use case that can show meaningful impact quickly. “What time-consuming, difficult, or impossible problems could generative AI help solve? Where do you have data to help in this process?” Vemulapalli says. “Think big about the opportunities, but start small. Start with a problem that causes day-to-day irritations—one that your organization will see real value in fixing.”
“Just pick one known pain point and solve for that. Don’t wait around for a silver bullet use case. Your use cases will evolve,” she says. “Just experiment, just get going.” And once you have your use case identified, you can work backwards to identify the relevant data needed.
Choose and customize a foundation model
Generative AI is powered by foundation models (FMs), very large ML models that are pre-trained on vast amounts of data. FMs learn to apply their knowledge across a wide range of contexts through pre-training on internet-scale data in all its forms and patterns, and these “general FMs” can be used out of the box for some use cases. But many organizations are looking for FMs that can be customized to perform domain-specific functions unique to the organization. In this case, the FM must be fine-tuned on the organization’s proprietary data.
AWS developed Amazon Bedrock for exactly this purpose. With the comprehensive capabilities of Amazon Bedrock, you can easily experiment with a variety of top FMs, privately customize them with your data using techniques such as fine-tuning and retrieval augmented generation (RAG), and create managed agents that execute complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code.
Imagine a content marketing manager who works at a leading fashion retailer and needs to develop fresh, targeted ad and campaign copy for an upcoming line of handbags. To do this, they provide a few labeled examples of their best-performing taglines from past campaigns, along with the associated product descriptions. Bedrock makes a separate copy of the base foundation model that is accessible only to the customer and trains this private copy, which can then generate effective social media, display ad, and web copy for the new handbags. Now, the marketing manager has a new ad campaign informed by their historical data, without having to invest in a new model or incremental training, all while keeping the organization’s data private and secure.
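To picture what those labeled examples might look like, the sketch below turns product descriptions and taglines into JSON Lines, a format commonly used for fine-tuning data. The sample products, the prompt wording, and the `prompt` and `completion` field names are assumptions made for illustration; check the documentation for the model you are customizing for the schema it actually requires.

```python
import json


def to_training_records(examples):
    """Pair each product description with its tagline as one training record."""
    records = []
    for example in examples:
        records.append(
            {
                # Field names are an assumption, not a confirmed schema.
                "prompt": f"Write a tagline for this product: {example['description']}",
                "completion": example["tagline"],
            }
        )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical past campaign data.
examples = [
    {
        "description": "Compact leather crossbody bag with brass hardware.",
        "tagline": "Small bag. Big entrance.",
    },
    {
        "description": "Oversized canvas tote with an interior laptop sleeve.",
        "tagline": "Carry the whole day.",
    },
]

jsonl = to_jsonl(to_training_records(examples))
print(jsonl)
```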
Whichever FM an organization chooses, it’s critical to keep data private and secure and to retain control over who can access the models. “You want to ensure that the right guardrails are in place to protect your organization’s data and IP,” Vemulapalli says. “Your data is your differentiator and your ultimate competitive advantage.”
Train and develop responsibly
New challenges in handling data responsibly stem from the vast size of generative AI’s open-ended foundation models, which can contain billions of parameters. This scale raises new issues in defining, measuring, and mitigating responsible AI concerns across the development cycle. Accuracy, fairness, intellectual property considerations, toxicity, and privacy must all be considered on a new level.
“Consider your stance on responsible AI, transparency, data collection, security, and privacy with AI,” Vemulapalli says. “How can you ensure the technology is used accurately, fairly, and appropriately?” Organizations should train on these considerations, bake them into governance and compliance frameworks, and factor them into vendor selection processes to select partners who share the same values.
“Everyone is committed to being responsible,” Vemulapalli says. “What’s important is how it is executed and enforced.”
There’s also the matter of training and upskilling your people. Consider the technical skills required to use this new technology and how to infuse them into your organization. You might look at building technical skills alongside skills like critical thinking and problem-solving. We ultimately want people, assisted by AI, to solve real business challenges and critically assess and question inferences from ML models. This is particularly important with generative AI models that distill data rather than provide considered answers.
Be ready for the next thing
Establishing an end-to-end data foundation is imperative to a successful generative AI strategy, and treating your data as your greatest asset will guide your steps along that journey. That solid data foundation will, in turn, set you up to innovate faster. “Customers are being thoughtful and fast,” Vemulapalli says. “We are currently in a period of intense experimentation and quickly transitioning to at-scale implementations. Everyone recognizes the need to move quickly.”
Learn more about innovating with generative AI at AWS for Data.