Samsung just pulled off what seemed impossible six months ago - fitting enterprise-grade AI into smartphone memory. The company's breakthrough compression technology shrinks massive language models by over 80% while maintaining cloud-level performance, according to Dr. MyungJoo Ham, Master at the AI Center of Samsung Research, who detailed the work in a Samsung Newsroom interview.
The numbers tell the story. Samsung Research can now run a 30-billion-parameter generative model - typically requiring more than 16GB of memory - on less than 3GB through advanced quantization techniques. "We're developing optimization techniques that intelligently balance memory and computation," Ham told Samsung Newsroom. "Loading only the data needed at a given moment improves efficiency dramatically."
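Ham doesn't spell out how the runtime decides what to load, but the principle of keeping weights on storage and pulling in only what the current step needs can be sketched as follows. This is an illustrative assumption, not Samsung's implementation: the file layout, layer names, int8 storage format, and the run_layer helper are all invented for the example.

```python
import numpy as np

# Hypothetical layout: each layer's int8 weights concatenated into one file,
# with a (byte offset, shape) entry per layer. A real on-device runtime uses
# its own packed format; this only shows the load-only-what-you-need idea.
LAYER_INDEX = {
    "layer0": (0,           (4096, 4096)),
    "layer1": (4096 * 4096, (4096, 4096)),
}

def run_layer(name, x, scale, path="model_int8.bin"):
    offset, shape = LAYER_INDEX[name]
    # memmap keeps the file on disk; only the pages actually touched are
    # brought into RAM, and they can be dropped once the layer finishes.
    w_q = np.memmap(path, dtype=np.int8, mode="r", offset=offset, shape=shape)
    w = w_q.astype(np.float32) * scale   # dequantize just this layer, on the fly
    return x @ w.T
```

The point of the sketch is that peak memory tracks the largest single layer rather than the whole model, which is how a 16GB-class model can fit a 3GB budget when combined with aggressive quantization.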
This isn't just academic research. Samsung's already commercializing these algorithms across smartphones and home appliances, with each device getting custom compression profiles. "Because every device model has its own memory architecture and computing profile, a general approach can't deliver cloud-level AI performance," Ham explained. The company's product-driven research targets AI experiences "users can feel directly in their hands."
The secret lies in sophisticated quantization - converting the model's 32-bit floating-point weights into compact 8-bit or 4-bit integers. Ham compared it to photo compression: "The file size shrinks but visual quality remains nearly the same." Samsung's algorithms analyze each model weight's importance, preserving critical components at higher precision while aggressively compressing less important elements.
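In rough terms, that combination looks like the sketch below: symmetric integer quantization plus a magnitude-based split between 8-bit and 4-bit precision. The importance heuristic and the keep_ratio parameter are assumptions made for illustration, not Samsung's actual criteria.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Map float weights onto a symmetric signed-integer grid."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit
    scale = float(np.abs(w).max()) / qmax
    scale = scale if scale > 0 else 1.0        # guard against all-zero weights
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                            # reconstruct with q * scale

def mixed_precision(w, keep_ratio=0.1):
    """Keep the most 'important' rows at 8-bit, push the rest down to 4-bit.
    Importance here is plain weight magnitude - a stand-in heuristic."""
    importance = np.abs(w).sum(axis=1)
    cutoff = np.quantile(importance, 1.0 - keep_ratio)
    hi_rows = importance >= cutoff
    hi_q, hi_scale = quantize_symmetric(w[hi_rows], bits=8)
    lo_q, lo_scale = quantize_symmetric(w[~hi_rows], bits=4)
    return (hi_q, hi_scale, hi_rows), (lo_q, lo_scale)
```

Going from 32-bit floats to 4-bit integers alone cuts the weight footprint by a factor of eight, which is where most of the quoted 80%-plus reduction comes from.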
But compression is only half the battle. Samsung Research developed a custom AI runtime engine that acts as the "model's engine control unit," automatically distributing operations across the CPU, GPU, and NPU. This heterogeneous orchestration lets larger, more sophisticated models run at the same speed on the same hardware.
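The article doesn't describe how those placement decisions are made. A toy version of the idea, with invented op names, backends, and per-op cost numbers, might look like this greedy dispatcher:

```python
# Toy dispatcher in the spirit of Ham's "engine control unit" analogy.
# Backends, ops, and millisecond costs are invented for illustration;
# a real runtime would profile the actual silicon.
COST_MS = {
    "embed":   {"NPU": 3.0, "GPU": 1.2, "CPU": 1.0},
    "matmul":  {"NPU": 1.0, "GPU": 2.5, "CPU": 12.0},
    "softmax": {"NPU": 0.8, "GPU": 0.6, "CPU": 2.0},
}

def assign_ops(graph):
    """Pick the cheapest processor for each operation in the graph."""
    return {op: min(COST_MS[op], key=COST_MS[op].get) for op in graph}

if __name__ == "__main__":
    print(assign_ops(["embed", "matmul", "softmax"]))
    # e.g. {'embed': 'CPU', 'matmul': 'NPU', 'softmax': 'GPU'}
```

A production runtime would also weigh data-transfer cost between processors and their current load, which this sketch deliberately ignores.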
"The biggest bottlenecks in on-device AI are memory bandwidth and storage access speed," Ham noted. Samsung's runtime predicts when computations occur, pre-loading only necessary data while minimizing memory access patterns. The result: dramatically reduced response latency and improved overall AI quality through smoother conversations and refined image processing.