Title:
JFS6.2 | Cost-effective LLM accelerator using processing in memory technology (Invited)
Description:
Authors:
Hyungdeok Lee1, Guhyun Kim1, Dayeon Yun1, Ilkon Kim1, Yongkee Kwon1, Euicheol Lim1; 1SK hynix
Large language model (LLM)-based services continue to improve in performance, which requires systems with both large memory capacity and high memory bandwidth. The GPT-3 175-billion-parameter model requires at least 800 GB of memory just to operate. In addition, frequent memory accesses and limited data reuse place heavy pressure on memory bandwidth. Meeting these stronger memory performance requirements, however, comes with a significant increase in cost: the expenses associated with operating the equipment and services needed to satisfy these capacity and bandwidth demands are considerable.
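As a rough sanity check on the 800 GB figure, the sketch below tallies FP16 weights plus a key-value cache for concurrent long-sequence requests. The model dimensions, batch size, and sequence length are illustrative assumptions, not figures from the abstract.

```python
# Back-of-the-envelope memory estimate for GPT-3 175B inference.
# Assumed (not stated in the abstract): FP16 (2-byte) weights, a
# 96-layer model with hidden dimension 12288, and a KV cache for
# a large batch of long sequences.

params = 175e9
bytes_per_param = 2                       # FP16
weight_bytes = params * bytes_per_param   # ~350 GB for weights alone

# KV cache per token: 2 tensors (K and V) * layers * hidden_dim * 2 bytes
layers, hidden = 96, 12288
kv_per_token = 2 * layers * hidden * bytes_per_param   # ~4.7 MB per token

batch, seq_len = 48, 2048                 # assumed serving workload
kv_bytes = kv_per_token * batch * seq_len # ~460 GB of runtime state

total_gb = (weight_bytes + kv_bytes) / 1e9
print(f"weights: {weight_bytes/1e9:.0f} GB, "
      f"KV cache: {kv_bytes/1e9:.0f} GB, "
      f"total: {total_gb:.0f} GB")        # lands in the ~800 GB range
```

Under these assumptions the weights alone exceed the memory of any single accelerator, and the runtime state grows with batch size and sequence length, which is why capacity and bandwidth must scale together.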
SK hynix aims to address this issue with a processing-in-memory (PIM) device and a PIM-based accelerator, called AiM and AiMX, respectively. By exploiting true bank-level parallelism, AiM and AiMX are expected to enhance the performance of LLM-based services as core components for disaggregated systems and multi-head attention acceleration. Additionally, AiM shows potential for on-device AI, improving both performance and energy consumption at low batch sizes by reducing off-chip data movement.
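To make the bank-level parallelism idea concrete, the following is a conceptual sketch, not SK hynix's actual AiM interface: a GEMV y = W @ x is partitioned across DRAM banks, each bank computing a partial result next to the rows it stores, so only the input vector and small partial outputs cross the chip boundary.

```python
import numpy as np

def pim_gemv(W: np.ndarray, x: np.ndarray, num_banks: int = 16) -> np.ndarray:
    """Emulate bank-level-parallel GEMV by splitting W's rows across banks."""
    bank_slices = np.array_split(W, num_banks, axis=0)
    # In hardware, multiply-accumulate units sit beside each bank and all
    # banks run concurrently; this Python loop only models the partitioning.
    partials = [bank @ x for bank in bank_slices]
    return np.concatenate(partials)

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096), dtype=np.float32)  # assumed weight slice
x = rng.standard_normal(4096, dtype=np.float32)          # activation vector
assert np.allclose(pim_gemv(W, x), W @ x, atol=1e-2)
```

Because GEMV touches every weight exactly once, its arithmetic intensity is low and performance is bound by how fast weights stream out of DRAM; computing inside the banks sidesteps that bottleneck, which is the core of the AiM approach described above.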
Type:
Joint Technical Session