In classic science-fiction films, AI was often portrayed as towering computer systems or massive servers. Today, it’s an everyday technology — instantly accessible on the devices people hold in their hands. Samsung Electronics is expanding the use of on-device AI across products such as smartphones and home appliances, enabling AI to run locally without external servers or the cloud for faster, more secure experiences.
Unlike server-based systems, on-device environments operate under strict memory and computing constraints. As a result, reducing AI model size and maximizing runtime efficiency are essential. To meet this challenge, Samsung Research AI Center is leading work across core technologies — from model compression and runtime software optimization to new architecture development.
Samsung Newsroom sat down with Dr. MyungJoo Ham, Master at AI Center, Samsung Research, to discuss the future of on-device AI and the optimization technologies that make it possible.
The First Step Toward On-Device AI
At the heart of generative AI — which interprets user language and produces natural responses — are large language models (LLMs). The first step in enabling on-device AI is compressing and optimizing these massive models so they run smoothly on devices such as smartphones.
“Running a highly advanced model that performs billions of computations directly on a smartphone or laptop would quickly drain the battery, increase heat and slow response times — noticeably degrading the user experience,” said Dr. Ham. “Model compression technology emerged to address these issues.”
LLMs perform their calculations using high-precision floating-point numbers. Model compression converts these values into simpler, more compact integer formats through a process called quantization. “It’s like compressing a high-resolution photo so the file size shrinks but the visual quality remains nearly the same,” he explained. “For instance, converting 32-bit floating-point calculations to 8-bit or even 4-bit integers significantly reduces memory use and computational load, speeding up response times.”
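To make the analogy concrete, here is a minimal sketch of symmetric INT8 quantization in Python. The weight matrix is invented and the routine illustrates the general technique, not Samsung’s compression pipeline:

```python
import numpy as np

# A hypothetical FP32 weight matrix standing in for one LLM layer.
weights_fp32 = np.random.randn(4096, 4096).astype(np.float32)

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map FP32 values onto the int8 range."""
    scale = np.abs(w).max() / 127.0            # one scale factor for the tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values to check how much accuracy was lost."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights_fp32)
error = np.abs(weights_fp32 - dequantize(q, scale)).mean()

print(f"FP32 size: {weights_fp32.nbytes / 1e6:.1f} MB")   # ~67.1 MB
print(f"INT8 size: {q.nbytes / 1e6:.1f} MB")              # ~16.8 MB, 4x smaller
print(f"Mean absolute error: {error:.6f}")                # small relative to the weights
```

The stored model shrinks fourfold while the reconstructed values stay close to the originals, which is the trade quantization makes.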

A drop in numerical precision during quantization can reduce a model’s overall accuracy. To balance speed and model quality, Samsung Research is developing algorithms and tools that closely measure and calibrate performance after compression.
“The goal of model compression isn’t just to make the model smaller — it’s to keep it fast and accurate,” Dr. Ham said. “Using optimization algorithms, we analyze the model’s loss function during compression and retrain it until its outputs stay close to the original, smoothing out areas with large errors. Because each model weight has a different level of importance, we preserve critical weights with higher precision while compressing less important ones more aggressively. This approach maximizes efficiency without compromising accuracy.”
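As a toy illustration of that mixed-precision idea, the sketch below keeps a fraction of weight rows at 8-bit and pushes the rest to 4-bit. The magnitude-based importance score, the 20% keep ratio and the bit widths are assumptions made for the example, standing in for the loss-sensitivity analysis described above:

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Quantize a row to the given bit width, then dequantize, returning the
    approximation the model would actually see."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def mixed_precision(w: np.ndarray, keep_ratio: float = 0.2) -> np.ndarray:
    """Keep the most 'important' rows at 8-bit, compress the rest to 4-bit.
    Mean absolute magnitude is a stand-in importance score here; a real
    pipeline would estimate each weight's effect on the loss instead."""
    importance = np.abs(w).mean(axis=1)
    n_keep = max(1, int(len(importance) * keep_ratio))
    keep = set(np.argsort(importance)[-n_keep:].tolist())
    return np.stack([fake_quantize(row, 8 if i in keep else 4)
                     for i, row in enumerate(w)])

w = np.random.randn(512, 512).astype(np.float32)
uniform = np.stack([fake_quantize(row, 4) for row in w])
mixed = mixed_precision(w)
print(f"all 4-bit     : mean abs error = {np.abs(w - uniform).mean():.5f}")
print(f"mixed 4/8-bit : mean abs error = {np.abs(w - mixed).mean():.5f}")
```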
Beyond developing model compression technology at the prototype stage, Samsung Research adapts and commercializes it for real-world products such as smartphones and home appliances. “Because every device model has its own memory architecture and computing profile, a general approach can’t deliver cloud-level AI performance,” he said. “Through product-driven research, we’re designing our own compression algorithms to enhance AI experiences users can feel directly in their hands.”
The Hidden Engine That Drives AI Performance
Even with a well-compressed model, the user experience ultimately depends on how it runs on the device. Samsung Research is developing an AI runtime engine that optimizes how a device’s memory and computing resources are used during execution.
“The AI runtime is essentially the model’s engine control unit,” Dr. Ham said. “When a model runs across multiple processors — such as the central processing unit (CPU), graphics processing unit (GPU) and neural processing unit (NPU) — the runtime automatically assigns each operation to the optimal chip and minimizes memory access to boost overall AI performance.”
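Conceptually, the dispatch decision looks something like the following sketch. The operation names, cost table and devices are invented for illustration; a real runtime profiles actual kernels and also weighs data-transfer costs between processors:

```python
# Estimated per-operation latency in milliseconds on each processor (invented).
COST_MS = {
    "matmul":    {"CPU": 12.0, "GPU": 1.5, "NPU": 0.8},
    "layernorm": {"CPU": 0.4,  "GPU": 0.3, "NPU": 0.2},
    "softmax":   {"CPU": 0.5,  "GPU": 0.2},   # assume no NPU kernel exists
}

def assign_devices(ops):
    """Assign each operation to the processor with the lowest estimated cost."""
    plan = {}
    for op in ops:
        candidates = COST_MS[op]
        plan[op] = min(candidates, key=candidates.get)
    return plan

print(assign_devices(["matmul", "layernorm", "softmax"]))
# {'matmul': 'NPU', 'layernorm': 'NPU', 'softmax': 'GPU'}
```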
The AI runtime also enables larger and more sophisticated models to run at the same speed on the same device, or the same model to respond with lower latency. Either way, overall AI quality improves — delivering more accurate results, smoother conversations and more refined image processing.
“The biggest bottlenecks in on-device AI are memory bandwidth and storage access speed,” he said. “We’re developing optimization techniques that intelligently balance memory and computation.” For example, loading only the data needed at a given moment, rather than keeping everything in memory, improves efficiency. “Samsung Research now has the capability to run a 30-billion-parameter generative model — typically more than 16 GB in size — on less than 3 GB of memory,” he added.
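One common way to realize this, sketched below with NumPy’s memory mapping, is to leave the weights on storage and let the operating system page in only the slice each layer touches. The file layout, sizes and placeholder computation are invented for the example and do not describe Samsung’s runtime:

```python
import numpy as np

N_LAYERS, LAYER_SIZE = 4, 1_000_000          # toy stand-in for a multi-GB model

# Create a dummy on-disk checkpoint.
mm = np.lib.format.open_memmap("weights.npy", mode="w+",
                               dtype=np.float32, shape=(N_LAYERS, LAYER_SIZE))
mm.flush()

weights = np.load("weights.npy", mmap_mode="r")   # no weights read into RAM yet

def run_layer(i: int, x: np.ndarray) -> np.ndarray:
    """Touch only layer i's slice; earlier layers can be paged back out."""
    w = weights[i]                  # a lazy view into the file, not a copy
    return x + float(w[:8].sum())   # placeholder for the real computation

x = np.zeros(8, dtype=np.float32)
for i in range(N_LAYERS):
    x = run_layer(i, x)
print("ran", N_LAYERS, "layers with roughly one layer resident at a time")
```

Peak memory then tracks the working set of the current layer rather than the full checkpoint, which is the principle behind fitting a 16 GB-class model into a 3 GB budget.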

The Next Generation of AI Model Architectures
Research on AI model architectures — the fundamental blueprints of AI systems — is also well underway.
“Because on-device environments have limited memory and computing resources, we need to redesign model structures so they run efficiently on the hardware,” said Dr. Ham. “Our architecture research focuses on creating models that maximize hardware efficiency.” In short, the goal is to build device-friendly architectures from the ground up to ensure the model and the device’s hardware work in harmony from the start.
Training LLMs requires significant time and cost, and a poorly designed model structure can drive those costs even higher. To minimize inefficiencies, Samsung Research evaluates hardware performance in advance and designs optimized architectures before training begins. “In the era of on-device AI, the key competitive edge is how much efficiency you can extract from the same hardware resources,” he said. “Our goal is to achieve the highest level of intelligence within the smallest possible chip — that’s the technical direction we’re pursuing.”
Today, most LLMs rely on the transformer architecture. Transformers analyze an entire sentence at once to determine relationships between words, a method that excels at understanding context but has a key limitation — computational demands rise sharply as sentences get longer. “We’re exploring a wide range of approaches to overcome these constraints, evaluating each one based on how efficiently it can operate in real device environments,” Dr. Ham explained. “We’re focused not just on improving existing methods but on developing the next generation of architectures built on entirely new methodologies.”
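A rough calculation makes the limitation easy to see; the head dimension and sequence lengths below are illustrative numbers, not figures from the interview:

```python
# Attention compares every token with every other token, so the cost of the
# score matrix grows with the square of the sequence length.
d = 64                                    # per-head feature dimension (illustrative)
for n in (1_000, 2_000, 4_000):           # sequence lengths in tokens
    flops = 2 * n * n * d                 # multiply-adds for the n x n score matrix
    print(f"{n:>5} tokens -> {flops / 1e9:5.1f} GFLOPs just for attention scores")
# Doubling the input length roughly quadruples the cost.
```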

The Road Ahead for On-Device AI
What is the most critical challenge for the future of on-device AI? “Achieving cloud-level performance directly on the device,” Dr. Ham said. To make this possible, model optimization and hardware efficiency must work hand in hand to deliver fast, accurate AI — even without a network connection. “Improving speed, accuracy and power efficiency at the same time will become even more important,” he added.

Advancements in on-device AI are enabling users to enjoy fast, secure and highly personalized AI experiences — anytime, anywhere. “AI will become better at learning in real time on the device and adapting to each user’s environment,” said Dr. Ham. “The future lies in delivering natural, individualized services while safeguarding data privacy.”
Samsung is pushing the boundaries of optimized on-device AI, and through these efforts the company aims to deliver ever more seamless and remarkable user experiences.