Samsung just pulled back the curtain on how it's cramming cloud-level AI into smartphones. The company's research division has developed compression technology that can run a 30-billion-parameter AI model - one that would typically require over 16GB of memory - in less than 3GB on device. Dr. MyungJoo Ham of Samsung Research's AI Center explained the techniques in an exclusive interview, laying out how the company plans to make your phone as smart as the cloud.
Samsung is rewriting the rules of mobile AI. The company's research team has worked out how to run massive generative models entirely on a phone - a feat that seemed out of reach only months ago.
Dr. MyungJoo Ham, Master at Samsung Research AI Center, revealed in an exclusive interview how his team compressed a 30-billion-parameter generative model from over 16GB down to less than 3GB of memory usage. That's the difference between needing a server rack and fitting in your pocket.
"Running a highly advanced model that performs billions of computations directly on a smartphone would quickly drain the battery, increase heat and slow response times," Ham told Samsung Newsroom. "Model compression technology emerged to address these issues."
The breakthrough centers on a sophisticated quantization process that Ham compares to photo compression - keeping the visual quality while dramatically shrinking file size. Samsung's algorithms convert 32-bit floating-point calculations down to 8-bit or even 4-bit integers, slashing memory usage and computational load.
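Samsung hasn't published the exact recipe, but the basic mechanics of that conversion are straightforward to sketch. Here's a minimal Python illustration of symmetric post-training quantization - mapping 32-bit floats to 8-bit integers with a single scale factor - not Samsung's actual implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8.

    A toy illustration of the FP32 -> INT8 conversion described
    above, not Samsung's production algorithm.
    """
    # Scale chosen so the largest-magnitude weight maps to 127.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

weights = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(weights)
print(f"fp32: {weights.nbytes / 2**20:.1f} MiB -> int8: {q.nbytes / 2**20:.1f} MiB")
# Roughly 4x smaller; packing weights into 4-bit integers would halve that again.
```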
But here's where it gets interesting: not all parts of an AI model are created equal. Samsung's compression identifies which neural network weights matter most and treats them differently. "Because each model weight has a different level of importance, we preserve critical weights with higher precision while compressing less important ones more aggressively," Ham explained.
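Ham doesn't say how his team measures importance. One common proxy, used purely for illustration here, is weight magnitude - the sketch below keeps the largest rows of a weight matrix at 8-bit and pushes the rest down to 4-bit:

```python
import numpy as np

def mixed_precision_quantize(weights: np.ndarray, keep_ratio: float = 0.1):
    """Toy mixed-precision scheme: the most 'important' rows (here,
    largest L2 norm - an assumed proxy, not Samsung's criterion)
    stay at 8-bit; the rest are squeezed to 4-bit.
    """
    norms = np.linalg.norm(weights, axis=1)
    cutoff = np.quantile(norms, 1.0 - keep_ratio)
    important = norms >= cutoff

    quantized = {}
    for label, mask, bits in [("int8", important, 8), ("int4", ~important, 4)]:
        block = weights[mask]
        qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
        scale = np.abs(block).max() / qmax
        # 4-bit values sit in an int8 container here; a real
        # implementation would pack two values per byte.
        quantized[label] = (np.round(block / scale).astype(np.int8), scale)
    return quantized, important

weights = np.random.randn(1024, 1024).astype(np.float32)
quantized, mask = mixed_precision_quantize(weights)
print(f"8-bit rows: {mask.sum()}, 4-bit rows: {(~mask).sum()}")
```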
The compression is just the beginning. Samsung has built what Ham calls an "AI runtime engine" - a smart traffic controller for your phone's processors. When an AI model runs, the runtime decides whether each operation should execute on the CPU, GPU or NPU, minimizing memory access to squeeze out maximum performance.
"The AI runtime is essentially the model's engine control unit," Ham said. "When a model runs across multiple processors, the runtime automatically assigns each operation to the optimal chip and minimizes memory access to boost overall AI performance."