Welcome to AI Pulse!
Welcome to AI Pulse, your go-to newsletter for the latest AI innovations, practices, and behind-the-scenes stories from Baidu.
Why do you need AI Pulse? For more than a decade, we’ve dedicated ourselves to AI infrastructure, core AI capabilities, and open AI platforms, all with the aim of facilitating the wider application and use of AI. As the generative AI era unfolds, we’re scaling AI to address real-world problems and create substantial value for our consumers, enterprise clients, and society at large.
Today, AI is advancing faster than ever. Here at Baidu, there’s something new every week, even every day. That’s why we decided to launch AI Pulse: to bring you up-to-date insights that keep you informed on our progress spanning large language models (LLMs), cloud computing, autonomous driving, and much more.
Earlier this month in Shanghai, we wrapped up Baidu World 2024, our biggest tech event of the year, where we announced new AI technologies, tools, and many other products.
One of the highlights was the release of iRAG (Image-based Retrieval Augmented Generation), a new AI technology designed to tackle hallucinations in image generation. Since then, we’ve been flooded with questions: What exactly is iRAG? How does it work? Why did Baidu develop it? Let’s dive in and find the answers!
RAG for Text-to-Image Generation
In AI, particularly in LLMs, generated content can appear plausible but be factually incorrect or nonsensical. These occurrences are called “hallucinations”, and they stem from factors such as errors in the training data, model limitations, and overfitting. Hallucinations can undermine users’ trust in LLMs and AI more broadly, slowing down AI’s adoption and standing in the way of its greater utility.
Efforts to reduce hallucinations in text generation have seen progress through technologies like Retrieval Augmented Generation (RAG). RAG enhances LLMs by integrating external knowledge sources into their response generation process. Once the user submits a prompt, the system retrieves relevant information from external databases and combines it with the user’s query before the LLM generates its answer. This enriched context allows the model to generate responses grounded in factual information.
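To make the retrieve-then-generate flow concrete, here is a minimal Python sketch. It is an illustration only, not how Baidu's systems work: the tiny knowledge base, the word-overlap scoring, and the function names (`retrieve`, `build_augmented_prompt`) are all assumptions made for this example, and production systems typically use a search index or vector store instead. The final LLM call is deliberately left out; the sketch stops at the enriched prompt that would be sent to the model.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would query a search
# index or vector store instead of an in-memory list.
KNOWLEDGE_BASE = [
    "Baidu World 2024 was held in Shanghai.",
    "iRAG is designed to tackle hallucinations in image generation.",
    "Miaoda is a no-code tool for building applications with natural language.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=1):
    """Return the top_k documents sharing the most word tokens with the query."""
    query_tokens = Counter(tokenize(query))

    def overlap(document):
        # Counter intersection keeps the minimum count of each shared token.
        return sum((query_tokens & Counter(tokenize(document))).values())

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def build_augmented_prompt(query, documents, top_k=1):
    """Combine the retrieved passages with the user's query into one prompt."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_augmented_prompt("Where was Baidu World 2024 held?", KNOWLEDGE_BASE))
```

Because the model sees the retrieved passage alongside the question, it can ground its answer in that passage rather than relying only on what it memorized during training.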
However, hallucinations are not limited to the language domain. Image hallucination, where generated images fail to reflect factual information, remains a major hurdle and is still largely unaddressed. This is especially true in e-commerce and marketing, where sellers need to generate images that precisely match their text descriptions.
That’s why Baidu has developed iRAG (image-based RAG) technology, which integrates Baidu Search’s vast collection of hundreds of millions of images with the company’s foundation models, enabling text-to-image models to deliver more precise and demand-driven results while also significantly reducing the cost of image production.
Here’s how iRAG works, as explained by Baidu CTO Haifeng Wang at Baidu World 2024:
The system first analyzes the user’s needs to determine which parts of the image require augmentation.
It searches its database to select suitable reference images.
Using these references, Baidu’s text-to-image generation model can generate creative variations that retain key features while introducing stylistic changes, such as transforming a painting of Albert Einstein into a picture-book-style illustration. It can also produce precise outputs that preserve exact details, such as ensuring a car in the image matches the original.
Users can further customize results by uploading their own reference images.
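The steps above can be pictured as a small planning stage that runs before any image is generated. The Python sketch below is purely hypothetical and is not Baidu's implementation: the reference index, the file names, the `GenerationPlan` class, and the simple phrase matching are all placeholders invented for illustration. It covers only the first two steps plus user uploads, and leaves the image generation itself out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical reference index mapping an entity name to image identifiers.
# The file names are placeholders, not real assets.
REFERENCE_INDEX = {
    "albert einstein": ["einstein_portrait.jpg"],
    "car": ["car_front.jpg", "car_side.jpg"],
}


@dataclass
class GenerationPlan:
    """What a downstream text-to-image model would receive."""
    prompt: str
    references: dict = field(default_factory=dict)


def find_entities(prompt, index):
    """Step 1 (toy version): find indexed entities mentioned as whole words."""
    lowered = prompt.lower()
    return [
        name for name in index
        if re.search(rf"\b{re.escape(name)}\b", lowered)
    ]


def plan_generation(prompt, index, user_references=None):
    """Steps 2 and 4: look up reference images, then add any user uploads."""
    references = {
        entity: list(index[entity]) for entity in find_entities(prompt, index)
    }
    # User-supplied images are placed first so they take priority.
    for entity, images in (user_references or {}).items():
        references[entity] = list(images) + references.get(entity, [])
    return GenerationPlan(prompt=prompt, references=references)


if __name__ == "__main__":
    plan = plan_generation(
        "A picture-book illustration of Albert Einstein",
        REFERENCE_INDEX,
    )
    print(plan.references)
```

Keeping the plan separate from generation means the same references could feed either a creative variation or a detail-preserving output, which is the distinction the third step draws.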
iRAG will accelerate content creation and improve production efficiency across various visual media, including comics, storyboards, posters, and more. iRAG-assisted image generation is now available in Baidu’s AI products such as ERNIE Bot and Wenxiaoyan.
In addition to iRAG, this year’s Baidu World also featured many other highlights. Below are a few we’d like to share with you in particular.
No-Code App Builder
Gone are the days when you would have to master a programming language to write code or generate apps! Miaoda is a no-code tool introduced by Baidu CEO Robin Li that enables users to build complete applications simply by using natural language. Powered by ERNIE, Miaoda employs multiple agents that work together to achieve a common goal. Each agent has a specific role and specializes in particular tasks. These agents can also use tools, such as web search, maps APIs, and iRAG-assisted text-to-image generation, to efficiently complete their assigned tasks.
With Miaoda, creating an application will no longer be a challenge, even if you don’t know how to code. Within just three days of its unveiling, over 5,000 companies applied to test Miaoda, which will launch in Q1 2025.
All-in-One Creation Platform
Free Canvas is an AI-powered creation platform developed by Baidu Wenku (for document creation) and Baidu Drive (for personal cloud storage). It allows users to import various file types — documents, PDFs, images, videos, or audio — by simply dragging them into the platform, whether they’re local files, retrieved from Wenku, or stored on Drive.
By just drawing a circle, you can ask AI to summarize files, create outlines, or generate new content such as itineraries, research reports, and presentations. All work happens in one space, where you can combine text, images, charts, and videos into a single project. Sharing is equally simple — just send a link to let others view, edit, or save the content.
AI-Powered Smart Glasses
Xiaodu’s smart speakers have long been among the most popular smart devices, but this time, Xiaodu has something new and truly exciting! Xiaodu AI Glasses are the first native AI glasses based on a Chinese LLM. Powered by ERNIE and equipped with visual, audio, and location-based capabilities, these smart glasses serve as a versatile AI assistant for a wide range of everyday scenarios. They function as a personal tour guide, offer information via Baidu Maps and Search, and excel at instant translation and at summarizing content from photos. Ideal for both academics and casual readers, they also assist with intelligent note-taking and can personalize music to enhance the user’s surroundings.
We’ve received a lot of interest in these glasses, and luckily, it won’t be long before you can get your hands on a pair: Xiaodu AI Glasses will go on sale in the first half of 2025. Stay tuned!
For more on Baidu World 2024, check out our press release and exhibition tour video.
In this space, we’ll spotlight our latest quarterly earnings report and what it signals for Baidu’s big-picture AI strategy.
Baidu will launch a new version of ERNIE in early 2025.
In November, ERNIE handled 1.5 billion daily API calls, up from 600 million in August.
ERNIE generated 1.7 trillion tokens per day.
ERNIE Bot has amassed 430 million users.
Baidu’s latest-gen robotaxi RT6 operates on public roads in multiple cities across China.
Apollo Go provided 988,000 rides nationwide in Q3, up 20% YoY, while cumulative public rides reached 8 million in October.
The proportion of fully driverless operations nationwide surpassed 70% in Q3 and 80% in October.
Over 20% of all Baidu search result pages feature AI-generated content.
70% of Baidu App’s 704 million MAUs are engaging with generative content.
MAUs of AI-enabled features on Baidu Wenku exceeded 50 million in September.
Baidu AI Cloud launches AI Innovation Center in Hong Kong.
Announced alongside it is the Hong Kong AI Science Education and Training Base, a joint initiative with Hong Kong Qianfan Technology.
Both will serve as venues for teaching foundational knowledge and practical AI skills.
Baidu AI Cloud leads China’s LLM market in the first half of 2024, according to IDC.
Baidu AI Cloud holds a 34% share of China’s Model-as-a-Service (MaaS) market.
Baidu AI Cloud ranks as the No. 1 provider in China’s AI foundation model solutions market, with a 17% market share.
Baidu reveals its top 10 frontier technology inventions of 2024.
This year’s list highlights breakthroughs in foundation models, autonomous driving, and digital human technology.
These inventions are driving innovation, boosting industry productivity, and accelerating the AI-native transformation across sectors.
Have a question or something you’d love us to cover? Leave a comment or DM us!
Until our next roundup, keep up with our latest AI developments and innovations by following us on LinkedIn and X.