Rockchip Executes: "Turn Off the Airflow, Then Turn It Back On"
On-Device Vehicle Control and Local Inference from China's Edge AI
2026-05-14 / July issue, print edition / Sang Min Han, han@autoelectronics.co.kr


Real-time mcp_vhal-hvac API call logs scrolling on the terminal screen. A single command - "Turn off the airflow first, then turn it back on after 10 seconds" - executes in sequence: fan_speed level: 0 → nanobot response → fan_speed level: 2.

On a small board, voice commands are actually executing vehicle functions in real time. With Rockchip, AI has moved beyond competing on cloud-based large models - it is now reacting, reasoning, and controlling functions from inside the car. Real-world constraints of latency, bandwidth, and data sovereignty are amplifying the importance of on-device AI and local inference. Rockchip's booth at Auto China 2026 was a window into the execution layer that China's edge AI is building inside the automobile.

By Sang Min Han _ han@autoelectronics.co.kr



Logs were scrolling up a black terminal screen, one after another.
mcp_vhal-hvac_fan_speed → level: 0
mcp_vhal-hvac_air_circulation → inside
"Turn off the airflow first, then turn it back on after 10 seconds," came the voice command. The screen responded immediately.
fan_speed level: 2
"The AC fan has been turned off. It will automatically resume at level 2 in 10 seconds."
Nanobot replied.
Time to first response: under 100ms. And all of it was processed inside the vehicle - no internet required.
This was the most striking moment at the Rockchip booth. Not a massive autonomous driving demo car, not a flashy NOA visualization. At the center of the booth sat small AI boards, a display, and that black terminal log screen. Overhead, a sign read: "AI Exhibition Zone."
The deeper you walked into the booth, the more clearly these small moments explained the fundamental shift underway inside China's automotive industry. AI is no longer confined to the cloud. It is coming into the car. And it isn't just a voice assistant - it is actively controlling real vehicle functions.



Why Rockchip Came to Auto China

Rockchip is a Chinese fabless semiconductor design company founded in 2001. Starting with digital audio and video chips, it has since expanded significantly into AIoT, automotive electronics, and robotics. The Hurun Report placed it among the top 10 of its 2025 China AI Top 50 ranking.
On the automotive side, Rockchip leads with SoCs like the RK3588M, RK3576M, and RK3572M - chips optimized for smart cockpits, instrument clusters, entertainment screens, HUDs, DLP systems, and in-cabin vision. The RK3588M supports driving up to seven displays simultaneously from a single chip and can also handle AVM functionality.
A Rockchip representative pointed to one of the small boards in the booth and began his explanation.
"We are a chip company, and within China we are one of the leaders in this field. We do a lot of work on the AI chip side in particular.“



The Yonghu AI Assistant demo station. The RK3576M + RK1828 board is on display; the product card on the left lists key capabilities including colloquial command comprehension, state memory, and multi-step instruction execution.



RK3576: The Sweet Spot of Edge AI

The centerpiece of the Auto China 2026 demo was on-device AI powered by the RK3576. Inside a transparent acrylic case, the actual board, heatsink, and connection ports were all exposed. A display beside it showed an HVAC control UI in operation.
Within Rockchip's lineup, the RK3576 occupies the sweet spot between performance and power efficiency. It features an octa-core CPU with four Cortex-A72 and four Cortex-A53 cores, a proprietary NPU delivering 6 TOPS, and an ARM Mali-G52 MC3 GPU. It supports 8K video codecs and operates across an industrial temperature range of -40°C to 105°C. Since its launch in July 2024, it has seen rapid adoption in edge AI, smart HMI, and multimedia signal processing.
"This part here is the AI chip. It's our latest, and the TTFT - Time To First Token - and TPS - Tokens Per Second - performance are quite good. The response speed is fast. TTFT is around the 100ms level."
TTFT is an increasingly critical metric in generative AI and LLM systems. It measures the time from when a user inputs a command to when the model delivers its first output - ultimately determining the perceived responsiveness of an AI interaction. In real-time environments like vehicle cabins, sensitivity to this figure is even greater.
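For reference, the two metrics the representative cited are straightforward to instrument. The sketch below shows one common way to compute TTFT and TPS in Python; stream_tokens is a stand-in for whatever streaming interface a local LLM runtime exposes, not a Rockchip or Qwen API.

```python
# Illustrative only: computing TTFT and TPS from any streaming token source.
import time

def measure(stream_tokens):
    t_start = time.perf_counter()
    t_first = None
    count = 0
    for _ in stream_tokens:
        count += 1
        if t_first is None:
            t_first = time.perf_counter()  # first token has arrived
    t_end = time.perf_counter()
    ttft_ms = (t_first - t_start) * 1000 if t_first else float("nan")
    # TPS is measured over the decode phase, after the first token
    tps = (count - 1) / (t_end - t_first) if count > 1 else 0.0
    return ttft_ms, tps

# Dummy generator standing in for a real model: ~10 ms per token
def fake_stream(n=32, delay=0.01):
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft_ms, tps = measure(fake_stream())
print(f"TTFT: {ttft_ms:.1f} ms, TPS: {tps:.1f}")
```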



"Turn Off the Fan, Turn It Back On After 10 Seconds“

The most impressive moment was not the hardware specs. It was that terminal screen.
mcp_vhal-hvac_fan_speed, mcp_vhal-hvac_temperature, mcp_vhal-hvac_air_circulation - vehicle function calls scrolled in real time. On one side of the screen, the name "nanobot" appeared. This was not simple UI animation. An LLM-based command orchestration architecture was executing live.
"Let's try putting in one command. Something like: 'Turn off the fan first, then turn it back on after 10 seconds.'"
The representative ran the demo himself. The moment the command was given, the screen responded instantly. The terminal log unfolded in sequence: fan_speed level: 0 → nanobot response: "The AC fan has been turned off. It will automatically resume at level 2 in 10 seconds." → fan_speed level: 2.
"You see that? The response speed is under 100ms. Quite fast."
Natural language commands were connected directly to the vehicle function API. Looking closely at the terminal log, it became clear that nanobot was decomposing a complex instruction - set main temperature to 18°C, set secondary temperature to 25°C, set airflow direction to face, activate seat heating at level 2, enable internal air circulation - into sequential API calls and executing them one by one. AI was not simply holding a conversation. It was calling and executing real vehicle functions.
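For illustration, the orchestration pattern visible in the log can be sketched in a few lines. Here call_tool() stands in for an MCP client invocation, and the log format mimics what the booth terminal showed; none of this is nanobot's actual implementation.

```python
# Hypothetical sketch: one natural-language command decomposed into timed,
# sequential tool calls, mirroring the fan-off / wait / fan-on sequence.
import asyncio

async def call_tool(name: str, **args) -> None:
    # Placeholder for a real MCP client call (e.g. over stdio or SSE)
    print(f"mcp_vhal-hvac_{name} -> {args}")

async def fan_off_then_on(resume_level: int = 2, delay_s: float = 10.0) -> None:
    await call_tool("fan_speed", level=0)
    print(f"nanobot: AC fan off, resuming at level {resume_level} "
          f"in {delay_s:.0f} seconds")
    await asyncio.sleep(delay_s)  # the "wait 10 seconds" step
    await call_tool("fan_speed", level=resume_level)

asyncio.run(fan_off_then_on())
```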
Input methods were also flexible.
"Typing works, and voice ASR works too. Both are supported."
The representative then summed up the core of the demo in a single line.
"All of this is processed internally. No internet needed."

Yonghu AI Assistant: MCP-Based In-Vehicle AI Agent Architecture

The name of the booth demo was Yonghu AI Assistant. According to the product card, the demo runs on a combination of the RK3576M and RK1828 chips, with a model based on Qwen3-4B.
What made it interesting was that this went well beyond simple voice recognition. The product card listed capabilities including understanding colloquial commands, executing complex multi-step instructions, remembering user preferences, and invoking vehicle control tools. It was described as understanding everyday expressions like "I can't stretch my legs" or "I can't see clearly ahead."
Structurally, the system connects to the vehicle's VHAL (Vehicle Hardware Abstraction Layer) via MCP (Model Context Protocol). An agent uses LLM-based natural language reasoning to directly call vehicle function APIs. Where earlier automotive voice control centered on fixed keyword recognition, the architecture has now shifted to LLM-based contextual interaction.
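To make the shape of that architecture concrete: an MCP tool server fronting VHAL properties could look roughly like the sketch below, assuming the official MCP Python SDK. The vhal_set() helper and property names are invented for illustration and are not Rockchip's actual interface.

```python
# Minimal sketch of an MCP tool server bridging an LLM agent to VHAL HVAC
# properties. Assumes the official MCP Python SDK ("mcp" package).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("vhal-hvac")

def vhal_set(prop: str, value) -> None:
    # Placeholder: a real implementation would write to the vehicle HAL,
    # e.g. through an Android Automotive CarPropertyManager binding.
    print(f"VHAL <- {prop} = {value}")

@mcp.tool()
def fan_speed(level: int) -> str:
    """Set the HVAC fan speed (0 = off)."""
    vhal_set("HVAC_FAN_SPEED", level)
    return f"fan_speed level: {level}"

@mcp.tool()
def air_circulation(mode: str) -> str:
    """Switch between 'inside' (recirculation) and 'outside' air."""
    vhal_set("HVAC_RECIRC_ON", mode == "inside")
    return f"air_circulation: {mode}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; the agent connects as MCP client
```

The point of the protocol layer is that the LLM never touches the bus directly: each tool carries a typed signature and description, and the agent's job reduces to choosing which tools to call, with which arguments, in which order.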




The Omni Multimodal Perceptual Vehicle Control demo station. A camera is connected to the RK3576M + RK1828 board; the panel on the right lists voice-controllable commands covering everything from climate on/off to seat heating, defrost, and air circulation - all modeled on real vehicle control scenarios.



Omni Multimodal Perceptual Vehicle Control: Vision Enters the Picture

At a separate station within the same booth, another demo was running - this one with a camera connected. Named Omni Multimodal Perceptual Vehicle Control, it also ran on the RK3576M + RK1828 combination, with a multimodal model based on Qwen3-Omni-4B.
The key was that vehicle control and visual recognition were unified into a single system. Functions including climate on/off, temperature adjustment, fan speed, airflow direction, seat heating, defrost, and interior/exterior air circulation were all linked to voice commands - while simultaneously, the camera recognized individuals and generated real-time descriptive text. Image input, text generation, and vehicle interaction were all executing on a single edge device.
On a nearby screen, yet another AI Box demo was running. Qwen2.5-VL-7B and Qwen3-VL-4B models ran on the RK3576M + RK1828, recognizing movements of people in front of the camera and generating real-time descriptive outputs like: "Non-threatening event: a person approached the vehicle but did not make contact with the body."
"The camera recognizes the person, and the result is reflected here immediately. All on-device," the representative said, pointing to the camera feed.




The AI Box demo screen. The camera recognizes a person and generates the text "Event: a person approached the vehicle but did not make contact with the body" in real time. The result of a Qwen-based Vision-Language model running locally on the RK3576M + RK1828 edge AI board.



The Direction: The Car as Endpoint

While the demos ran in a controlled booth environment and questions about production timelines and target OEMs went unasked, one thing was clear after walking the floor: Rockchip is no longer simply an automotive MCU company. The booth displayed the slogan "Seven Product Lines, Upgraded Value for In-Vehicle Products" alongside AI, automotive vision, and intelligent cockpit zones laid out side by side. The boundary between automotive and consumer AI was visibly blurring.
Why run AI inside the vehicle at all? The first reason is latency. In-cabin interactions are sensitive even to delays of a few hundred milliseconds. A TTFT at the 100ms level meaningfully changes the actual interaction experience. The second is bandwidth. Continuously streaming all multimodal AI and vehicle sensor data to the cloud carries substantial real-world cost. The third is privacy and regulation. Demands for data sovereignty and local processing within the automotive industry are growing stronger. Ultimately, many functions are likely to execute directly inside the vehicle. That is why the edge AI competition is not simply a race of TOPS numbers - it is a problem that requires simultaneously solving latency, thermal management, power efficiency, local inference, and multimodal processing.



The Rockchip booth at Auto China 2026. In front of a white demo car, the AI exhibition, automotive vision, and intelligent cockpit zones are laid out side by side.



Where AI Comes Down

For the past few years, the generative AI industry has been largely a cloud AI competition - models trained in massive data centers on GPU clusters. But when the conversation shifts to real industrial deployment, the story can change.
What Rockchip and many Chinese technology companies demonstrated at Auto China 2026 was exactly that. Rockchip in particular showed AI operating at the lowest layer of this shift: executing at the actual device edge.
The automobile is one of the most important execution spaces for that shift. AI is no longer confined to data centers. It has already arrived - inside the vehicle's function calls and HVAC control logs.

AEM (Automotive Electronics Magazine)



<Copyright © AEM. Unauthorized reproduction and redistribution prohibited.>

