How AI Will Shape The Future – A Look Ahead To 2024 And Beyond By Verysell Applied AI Lab


Verysell AI’s Chief AI Scientist, Dr. Dao Huu Hung, offers insights into AI’s exciting future and its impact on businesses and society.

“The agent concept is a new trend in Generative AI that has the potential to revolutionize the way we interact with computers.”

— Dr. Dao Huu Hung

MARLOW, ENGLAND, UNITED KINGDOM, January 10, 2024 /EINPresswire.com/ —

Foreword
2023 was one of the most exciting years for AI technology, and Generative AI in particular, with the surging popularity of ChatGPT (built on the Generative Pre-trained Transformer, or GPT) and Large Language Models (LLMs). This is thanks to their impressive ability to comprehend human language and make decisions that remarkably mimic human intelligence.

Generative AI and Large Language Models on the rise
ChatGPT reached an unprecedented milestone of 1 million users within five days. Since then, Big-Tech giants have quickly entered the race, releasing dozens of LLMs, both open source and proprietary, such as LaMDA (Google AI), Megatron-Turing NLG (NVIDIA), PaLM (Google AI), Llama-2 (Meta AI), Bloom (Hugging Face), Wu Dao 2.0 (Beijing Academy of Artificial Intelligence), Jurassic-1 Jumbo (AI21 Labs), and Bard (Google AI).

Alongside the Big-Tech race, business adoption of ChatGPT and LLMs is growing rapidly. According to the Master of Code Global report “Statistics of ChatGPT & Generative AI in business: 2023 Report”, 49% of companies presently use ChatGPT, while 30% intend to use it in the future. Another report by Forbes suggests that 70% of organizations are currently exploring generative AI, which includes LLMs. This suggests that LLMs are gaining traction in the enterprise world and that more and more companies see the technology’s potential to transform their businesses.

1. Multimodal Generative AI
Although ChatGPT and most other LLMs have demonstrated superior performance in understanding human language in text form, text is just one of the data modalities human beings perceive every day. Multimodal data is ubiquitous in the real world, as humans communicate and interact through many types of information, including images, audio, and video. Multimodal data also poses significant challenges for artificial intelligence (AI) systems, such as data heterogeneity, alignment, fusion, and representation, as well as model complexity, computational cost, and evaluation metrics. The AI community, therefore, often addresses unimodal data successfully before tackling the more challenging multimodal setting.

Inspired by the tremendous success of LLMs, the AI community has been creating Large Multimodal Models (LMMs) that aim to achieve similar levels of generality and expressiveness in the multimodal domain. LMMs can leverage massive amounts of multimodal data and perform diverse tasks with minimal supervision. Incorporating other modalities into LLMs yields LMMs that can solve many challenging tasks involving text, images, audio, and video, such as captioning images, answering visual questions, and editing images via natural language commands.

GPT-4V and LLaVA-1.5
OpenAI has been pioneering GPT-4V, the multimodal upgrade of the GPT-4 model that can understand information from both text and image inputs. GPT-4V can perform various tasks, such as answering questions about images, describing and reasoning about visual content, and extracting information from documents and charts.

LLaVA-1.5: This is an open-source model that can understand information from both text and images. It can perform tasks such as answering questions about images and generating image captions. Alpaca-LoRA: This is a lightweight instruction-following model that can perform various natural language tasks when given natural language instructions or prompts.

Adept, on the other hand, has been aiming at a bigger ambition: building an AI model that can interact with everything on your computer. “Adept is building an entirely new way to get things done. It takes your goals, in plain language, and turns them into actions on the software you use every day.” They believe that AI models that read and write text are still valuable, but ones that use computers like human beings are even more valuable to enterprise businesses.

This ambition is driving the race among Big-Tech companies to deliver Large Multimodal Models, though it will likely take a few years for LMMs to reach the maturity that LLMs enjoy today.

2. Generating vs. Leveraging Large Foundation Models
Producing AI applications for diverse tasks has never been easier or more efficient. Just a few years ago, building a sentiment analysis application, for example, might have taken a few months to implement a proof of concept (POC) with both in-house and public datasets, and a few more months to deploy the resulting models into a production system. Now, LLMs let developers build such applications in a few days, simply by formulating a prompt that asks the LLM to evaluate a text as positive, neutral, or negative.
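As a minimal sketch of this prompt-based approach (the prompt wording and label set are illustrative, and `query_llm` would be whichever LLM API call a team actually uses; it is left abstract here):

```python
# Sketch of prompt-based sentiment analysis with an LLM.
# A real application would send the prompt to an LLM API; that call
# is deliberately left out, since any provider's chat endpoint works.

LABELS = {"positive", "neutral", "negative"}

def build_sentiment_prompt(text: str) -> str:
    """Formulate the classification task as a plain-language instruction."""
    return (
        "Classify the sentiment of the following text as exactly one of: "
        "positive, neutral, or negative. Answer with the single word only.\n\n"
        f"Text: {text}"
    )

def parse_label(llm_output: str) -> str:
    """Normalize the model's reply to one of the expected labels."""
    label = llm_output.strip().lower().rstrip(".")
    return label if label in LABELS else "neutral"  # conservative fallback

# The prompt that would be sent to the model, and a parsed sample reply
prompt = build_sentiment_prompt("The new release fixed every bug I reported!")
print(parse_label("Positive."))  # -> positive
```

The months-long POC cycle collapses into prompt design plus a thin parsing layer; swapping label sets or task definitions becomes a one-line prompt change rather than a retraining run.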

Large Foundation Models in AI
In the field of computer vision, visual prompting techniques, introduced by Landing AI, likewise leverage the power of Large Vision Models (LVMs) to solve a variety of vision tasks, such as object detection, object recognition, and semantic segmentation. Visual prompting uses visual cues, such as images, icons, or patterns, to reprogram a pretrained Large Vision Model for a new downstream task. It can reduce the need for extensive data labeling and model training, enabling faster and easier deployment of computer vision applications.

Generating pre-trained Large Foundation Models (LFMs), including LLMs and LVMs, requires not only AI expertise but also huge investment in infrastructure, i.e., data lakes and computing servers. Hence, the race among Big-Tech companies to create pretrained LFMs will continue into 2024 and the years to come. Some models are proprietary, but many others are open source, giving enterprises diverse alternatives. Meanwhile, Small and Medium Enterprises (SMEs) and AI start-ups will be the main forces realizing the commercial potential of LFMs; they will primarily focus on building applications on top of them.

3. Agent Concept in Generative AI
The agent concept is a new trend in Generative AI that has the potential to revolutionize the way we interact with computers. Agents are software modules that can autonomously or semi-autonomously spin up sessions (in this case, language-model and other workflow-related sessions) as needed to pursue a goal. One key benefit of agents is that they can automate many tasks currently performed by humans, freeing people to focus on more strategic and creative work.
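The goal-pursuit loop described above can be sketched as follows. This is not any specific framework’s API: `call_llm` is a hard-coded stub standing in for a real language model that would choose the next action, and the calculator tool is purely illustrative.

```python
# Minimal goal-driven agent loop (illustrative sketch, not a framework).
# A real agent would prompt an LLM at each step to pick a tool and its
# input; here the planner is stubbed so the loop itself is visible.

def calculator(expression: str) -> str:
    """A sample tool the agent can invoke."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def call_llm(goal: str, history: list) -> dict:
    """Stub planner: hard-codes one tool call, then a final answer."""
    if not history:
        return {"action": "calculator", "input": "6 * 7"}
    return {"action": "finish", "input": history[-1][1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        step = call_llm(goal, history)
        if step["action"] == "finish":
            return step["input"]
        result = TOOLS[step["action"]](step["input"])
        history.append((step["action"], result))  # feed observation back
    return "gave up"

print(run_agent("What is 6 times 7?"))  # -> 42
```

The loop is the essence of the agent concept: the model proposes an action, the runtime executes it, and the observation is fed back until the goal is met, with no human driving each step.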

Steve Nguyen
Verysell Applied AI Lab
+84 94 571 62 86
tung.nguyenngoc@verysell.ai
Visit us on social media:
LinkedIn

