Imagine binge-watching a TV series, but you can only remember one episode at a time. When you move on to the next episode, you instantly forget everything you just watched. Now, imagine you can remember every episode and every season of that show, letting you follow the story, the characters, and all the twists and turns.
Also: Google Glass vs. Project Astra: Sergey Brin on AI wearables and his top use case
When discussing artificial intelligence (AI) models, remembering only one episode at a time, and forgetting it the moment the next one starts, represents a short context window. Remembering every episode in the series represents an AI model with a large, or long, context window.
In a nutshell, a long context window means that the model can remember a lot of information at once.
Understanding what context means in AI is the first step toward understanding what a long context window is and how it affects a chatbot's or other system's performance.
AI systems like ChatGPT, the Gemini chatbot, and Microsoft Copilot are built on AI models, in this case, GPT-3.5, Gemini, and GPT-4, respectively. These models essentially work as the systems’ brains, holding the knowledge, remembering information within a conversation, and responding appropriately to users’ queries.
Also: 9 biggest announcements at Google I/O 2024: Gemini, Search, Project Astra, and more
Context in AI refers to the information that gives meaning and relevance to the data the AI is currently processing. It's the information the model takes into account when making decisions or generating responses.
Context is measured in tokens, and the context window is the maximum number of tokens the model can consider at once. A token represents a word or part of a word, depending on the language and the tokenizer. In English, a token averages about three-quarters of a word, so a model like GPT-3.5 Turbo with a 16,000-token (16k) context window can handle roughly 12,000 words.
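As a back-of-the-envelope check, here's a minimal Python sketch of that tokens-to-words arithmetic. The 0.75 words-per-token ratio is the common English rule of thumb mentioned above, and the function name is illustrative, not part of any real API:

```python
# Rough estimate: does a document fit in a model's context window?
# Assumes ~0.75 words per token for English text (a rule of thumb,
# not an exact figure; real counts depend on the tokenizer).
def fits_in_window(text: str, window_tokens: int = 16_000) -> bool:
    word_count = len(text.split())
    estimated_tokens = round(word_count / 0.75)  # words -> tokens
    return estimated_tokens <= window_tokens

print(fits_in_window("word " * 12_000))  # ~16,000 tokens: True, just fits
print(fits_in_window("word " * 13_000))  # ~17,333 tokens: False
```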
Also: What is Gemini Live? How Google’s real-time chatbot competes with GPT-4o
Tokenization methods — that is, how words are counted and translated into tokens — vary depending on the system. Here’s an example of what a tokenization method may look like:
| Example phrase | The quick brown fox jumps over the lazy dog. | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed. |
| --- | --- | --- |
| Token breakdown | “The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”, “.” | “Lorem”, “ipsum”, “dolor”, “sit”, “amet”, “,”, “consectetur”, “adipiscing”, “elit”, “,”, “sed”, “.” |
| Word count | 9 words | 9 words |
| Token count | 10 tokens | 12 tokens |
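Note that production tokenizers are usually byte-pair encoders that split rare words into sub-word pieces, so real counts can differ from the simplified word-plus-punctuation breakdown above. Here's a minimal sketch using OpenAI's open-source tiktoken library (the library choice is an assumption for illustration, not necessarily what the systems in this article use):

```python
# Count tokens with tiktoken (pip install tiktoken). cl100k_base is
# the encoding used by GPT-4; other models use other encodings.
# Counts may exceed the simplified table above because the BPE
# tokenizer splits rare words like "consectetur" into pieces.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for phrase in [
    "The quick brown fox jumps over the lazy dog.",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed.",
]:
    tokens = enc.encode(phrase)
    print(f"{len(phrase.split())} words -> {len(tokens)} tokens")
    print([enc.decode([t]) for t in tokens])  # show each token's text
```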
An AI chatbot that can handle about 12,000 words can summarize a 3,000-word article or a 5,000-word research paper and then answer follow-up questions without forgetting what was in the document the user shared. Tokens from earlier messages are carried through the conversation, giving the bot context for what's being discussed.
Also: 3 reasons to upgrade to Gemini Advanced, from Google I/O 2024
Hence, if a conversation stays within the token limit, the AI chatbot can maintain the full context. But once it exceeds the limit, the earliest tokens are typically dropped to keep the conversation within the window, so the bot can lose some of the context.
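Here's a minimal sketch of that sliding-window behavior, assuming a naive whitespace word count as a stand-in for a real tokenizer (actual products count with the model's own tokenizer and often use smarter strategies, such as summarizing older turns):

```python
# Keep a conversation within a token budget by dropping the oldest
# messages first. Token counting here is simplified to whitespace
# words; real systems count with the model's actual tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def trim_history(messages: list[str], window_tokens: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > window_tokens:  # earliest messages fall out
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = [
    "hello there",                      # 2 "tokens" -- dropped below
    "summarize this long article ...",  # 5 "tokens"
    "what about section two?",          # 4 "tokens"
]
print(trim_history(history, window_tokens=9))
```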
This is why Google proudly advertises Gemini 1.5 Pro’s large context window of one million tokens. According to Google CEO Sundar Pichai, a million tokens means its Gemini Advanced chatbot can process over 30,000 lines of code, PDFs up to 1,500 pages long, or 96 Cheesecake Factory menus.