MEMRAY

Data has become critical to every aspect of computing, which in turn creates immense demand for large memory solutions across many industries and businesses. However, DRAM may struggle to expand its capacity because of the scaling limits of memory technology. The power DRAM spends on refreshing data is another obstacle to meeting the requirements of large memory solutions.

Designing modern systems with persistence in mind is also essential in a wide spectrum of computing domains, ranging from handheld devices to high-performance computing (HPC). However, existing persistence mechanisms cannot make a system fully non-volatile and, unfortunately, introduce significant overhead.

MEMRAY leverages non-volatile memory (NVM) and builds a series of new memory controllers and hardware/software infrastructure that connect NVM to processors via a conventional memory bus. These new memory solutions can offer large memory capacity as well as data persistence in diverse computing domains. MEMRAY expects these solutions to change the conventional memory hierarchy and open new possibilities in computing by making target systems more power- and energy-efficient.

MEMRAY built a PRAM controller and a PRAM-only memory system. It realizes an innovative main memory system that provides byte-addressable access, DRAM-like performance, greater capacity, and low power consumption. MEMRAY’s PRAM controller, the key to MEMRAY’s persistent memory solution, employs PRAM as its NVM together with an internal cache. These low-level memory modules are connected to MEMRAY’s PRAM controller through a conventional DDR interface, the common communication method in general computing systems.

New Memory Controller Key Attributes

While using NVM as storage offers an alternative way to extend memory capacity, such a solution is not ideal because of its block interface, its incompatibility with the conventional memory bus, and the extremely long latency of NVM. For example, “NVM as Cache” adopts NVM as a cache for system storage. This strategy can provide a larger capacity than DRAM as well as data persistence. While it works as a storage cache, its performance is still far from that of DRAM. Moreover, it requires OS and file system support, which makes the system complicated and energy-inefficient. On the other hand, “NVDIMM-N” is a standard persistent memory defined by JEDEC. It integrates DRAM and NVM into a single DIMM module. In contrast to NVM as Cache, an NVDIMM-N module needs no OS or file system support because it can be accessed by conventional load/store instructions over the DRAM interface. However, its NVM is invisible to processors, and the capacity of NVDIMM-N is limited by its internal DRAM.
The persistent NVDIMM technology, one of MEMRAY’s new memory solutions, addresses the shortcomings of these two strategies. MEMRAY’s new memory controller exposes the entire NVM capacity to processors over the DRAM interface. This removes the file system and OS requirements while offering a large storage space. MEMRAY’s memory controller also hides the long latency of the underlying NVM and supports all persistent instructions, which turns a persistent NVDIMM into a persistent, large memory solution.
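The latency-hiding and persistent-instruction support described above can be illustrated with a minimal Python sketch. It assumes a small volatile write buffer in front of PRAM and a `persist_barrier()` method standing in for the processor’s flush-and-fence style persistent instructions; the class names, buffer size, and oldest-first eviction policy are hypothetical choices for illustration, not MEMRAY’s controller design.

```python
from collections import OrderedDict


class PramBackend:
    """Stands in for byte-addressable PRAM: contents survive power loss."""

    def __init__(self):
        self.cells = {}   # addr -> value
        self.writes = 0   # count of (slow) media writes actually performed

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1

    def read(self, addr):
        return self.cells.get(addr, 0)


class BufferedController:
    """Hypothetical controller: a small volatile write buffer in front of PRAM."""

    def __init__(self, backend, capacity=4):
        self.backend = backend
        self.capacity = capacity
        self.buffer = OrderedDict()  # addr -> pending value, oldest first

    def store(self, addr, value):
        if addr in self.buffer:
            self.buffer.move_to_end(addr)  # coalesce repeated stores to one address
        elif len(self.buffer) >= self.capacity:
            old_addr, old_value = self.buffer.popitem(last=False)  # evict oldest entry
            self.backend.write(old_addr, old_value)
        self.buffer[addr] = value

    def load(self, addr):
        # Check the buffer first so a load always sees the latest store.
        if addr in self.buffer:
            return self.buffer[addr]
        return self.backend.read(addr)

    def persist_barrier(self):
        """Plays the role of a flush-and-fence sequence: drain all pending stores."""
        while self.buffer:
            addr, value = self.buffer.popitem(last=False)
            self.backend.write(addr, value)

    def power_fail(self):
        """Anything still in the volatile buffer is lost; PRAM keeps its contents."""
        self.buffer.clear()
```

For example, ten stores to the same address followed by `persist_barrier()` reach PRAM as a single write, while a store issued after the last barrier is lost by `power_fail()`. This is why a controller that buffers writes to hide latency must also honor the processor’s persistence instructions.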

1. Firmwareless SSD

Solid state drives (SSDs) have become major storage media in diverse computing domains. In particular, enterprise, high-performance computing, and datacenter environments require much higher performance, and their demands keep increasing. While flash-based SSDs are faster than spinning disks, the major industry trend is to make flash denser rather than faster by stacking multiple flash layers and/or storing more bits per cell.

While new memory is a very promising storage backend for PCIe storage cards with microsecond-scale latency, such cards consume far more power than mid-range flash storage. This high power consumption is observed in other fast PCIe storage cards as well. The reason is that they require more computing resources, including DRAM and CPU. These resources are mainly used for firmware execution, which in practice manages the internal DRAM buffer, queues, address translation, and storage reliability.

Efficient firmware design plays a key role in modern SSDs, but we believe that firmware operations can become a burden for fast PCIe storage cards in the near future. Specifically, as the access latency of new memory is now around a few microseconds, firmware execution becomes a critical performance bottleneck.

Memray’s PRAM controller technology makes it possible to remove the internal CPU and DRAM resources from modern NVMe SSD architectures and realize energy-efficient block storage based on new memory: the “Firmwareless SSD”.

Firmwareless SSD will deliver significantly better bandwidth and latency in addition to lower energy consumption.

2. AI Accelerator

Deep neural networks (DNNs) have attracted significant attention because they are used for a wide spectrum of deep-learning-based predictions on massive data across many domains, such as visual understanding, speech perception, and automated reasoning. FPGA-based hardware acceleration is a promising way to meet the high computing and energy-efficiency demands of diverse DNN models.

One well-known issue with the FPGA-based hardware acceleration approach is that the accelerator’s on-chip memory has too little capacity to hold all input/output data. When the data do not fit in the on-chip memory, the result is host overflow, which requires frequent data transfers between the host and the FPGA board.

Memray’s PRAM controller technology realizes an FPGA-based domain-specific architecture for DNNs that exploits the large storage space of phase-change memory (PRAM). The PRAM controller enables an on-board PRAM backend complex that can hold many PRAM modules on the FPGA side as working memory. The PRAM controllers and backend complex address the limited on-chip memory capacity of modern accelerators.

3. Smart Phone

Smartphone shipments in 2019 reached 1.37 billion units, a 2% decline compared to 2018. However, the decline does not mean people are using smartphones less. In fact, people are more attached to their phones than ever and increasingly rely on them for daily life, work, and entertainment. Naturally, smartphone users expect longer battery life from their phones. Smartphone batteries keep getting larger; the latest flagships from Samsung and LG feature massive 5,000 mAh batteries.

Memray’s PRAM controller technology brings an entirely different dimension of power efficiency to the current battery competition. Memray’s mobile PRAM controllers will dramatically decrease power consumption and extend battery life without increasing battery size, which would also save space and lower the cost of the device.

Memray’s Persist CPU technology lets users instantly power the phone off and back on, instead of relying on the current “display lock” process, which leaves the phone in a power-consuming idle state. Users consume power only while they use the phone; only the modem stays powered for communication (calls, texting).

According to Apple’s official specification, the total battery endurance of a fully charged, idle iPhone 11plus is 40.5 hours. Memray’s mobile memory architecture will extend this by more than 10 times. Memray’s Mobile PRAM Memory redefines the competition for the longest-lasting smartphone.

4. Data Center

The overhead imposed by checkpoint-restart is a well-known problem in large-scale computing domains. For example, several HPC applications redundantly write 20 TB to 160 TB per hour for checkpoints, which not only consumes substantial I/O bandwidth and power but also shortens the lifetime of the underlying storage or memory.
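To put these checkpoint volumes in perspective, the short Python calculation below converts them into the sustained write bandwidth they imply. It assumes decimal units (1 TB = 10^12 bytes) and checkpoint writes spread evenly over the hour; real checkpoints are bursty, so peak bandwidth would be higher.

```python
SECONDS_PER_HOUR = 3600


def sustained_gb_per_s(tb_per_hour):
    """Average write bandwidth (GB/s) needed to write tb_per_hour terabytes each hour.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
    """
    return tb_per_hour * 1e12 / SECONDS_PER_HOUR / 1e9


for rate in (20, 160):
    print(f"{rate} TB/h -> {sustained_gb_per_s(rate):.1f} GB/s")
```

Under these assumptions, 20 TB/h corresponds to about 5.6 GB/s and 160 TB/h to about 44.4 GB/s of continuous write traffic spent only on checkpoints.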

Recovering in-memory data by moving it from backend storage to working memory can take from a few hours to tens of hours. The average cost of such recovery time (counted as downtime) can be as much as $540K per hour in high-end datacenters.

Memray provides a full-system persistent computing platform that preserves the execution states of all cores and all memory data even when the target system suffers a power failure or is completely powered off. When power returns, Memray’s Persist CPU technology can instantly resume the operating system and all user-level applications from the exact point where the system previously stopped, instead of starting over with processor and memory initialization.

The continuous execution environment that Persist CPU provides can remove unnecessary checkpoints and external power sources, such as uninterruptible power supplies and battery- or capacitor-backed non-volatile memory.

5. IoT

The number of Internet of Things (IoT) devices is expected to reach 20.4 billion by 2020. The fast-growing number of devices and the quantity of data they produce pose new challenges to current technology.

Data centers face growing difficulty processing the huge volume of data continuously transmitted from these devices, owing to snowballing real-time requirements, network latency, and privacy and security concerns. As a result, processing data on the local device, namely edge computing, is being widely adopted.

By providing a large memory system to IoT devices, Memray’s PRAM controller enables them to perform large-scale data processing at the edge. By also reducing the quantity and frequency of data transmissions between the device and the cloud, Memray’s PRAM controller will provide high performance, low latency, and security to IoT.

6. Autonomous Vehicle

Real-time autonomous car applications demand high reliability, real-time response, high security, and fast data reads and writes. In particular, the huge quantity of data transferred back and forth between the car and cloud servers must be processed and stored instantly, as any data loss at highway speeds could be catastrophic.
Thus, the computing system of an autonomous car requires nearly small-server-class performance. Unfortunately, current DRAM-based computing systems, with their limited capacity and excessive power consumption, are unlikely to handle the big data generated by autonomous cars. Memray’s Persistent Memory technology will deliver an autonomous-car memory system that supports storage-class memory capacity, DRAM-like performance, and lower power consumption through full persistence.

7. Desktop/Laptop/Handheld PC

Unfortunately, existing persistence mechanisms either cannot make the system fully persistent or introduce significant overhead. For example, system-image approaches, including hibernation and sleep, cannot offer orthogonal persistence when the system faces a power failure: to survive the failure, hibernation must serialize the entire system image, which cannot succeed without the assistance of external power sources.

MEMRAY’s PRAM controllers deliver a fully persistent computing system that provides on-the-fly data persistence and secures execution persistence. By realizing instant “Power off and Power on” of the computing system, MEMRAY’s PRAM controller keeps data safe even if power is cut off, and immediately resumes the system where it was suspended when power is restored.

Memray’s PRAM controller technology delivers an entirely different user experience. Users can instantly power off the device without losing any data and immediately resume where they stopped. By eliminating tedious and time-consuming boot, hibernation, and resume processes, this technology will prolong battery life and extend the product life cycle as well.

MEMRAY developed a non-volatile computing platform built with a 28 nm FPGA chip and multiple pure-PRAM DIMMs connected through a DDR DIMM interface.
Based on this non-volatile computing platform, MEMRAY built three application prototypes, with a fourth system currently under development, to show that the PRAM controller can solve problems that current computing technology has not solved.

1. Persist CPU

MEMRAY developed a non-volatile computing prototype that consists of open-licensed RISC-V cores, multiple PRAM controllers, and PRAM DIMMs. Persist CPU’s main memory consists only of PRAM DIMMs, and both code and data reside in PRAM. Persist CPU removes all DRAM-related hardware components and the associated software stack from the memory hierarchy and persistence control. By guaranteeing both data and execution persistence, Persist CPU eliminates the physical and logical boundary between non-persistent and persistent data structures. The data copies between memory and storage that hibernation and checkpoint-restart require for persistence are no longer needed. Existing applications require no source-level modifications to take advantage of Persist CPU. In addition, Persist CPU’s continuous execution environment can remove checkpoints and external power sources, such as uninterruptible power supplies and battery- or capacitor-backed non-volatile memory.

The Persist CPU prototype guarantees on-the-fly data persistence and secures execution persistence.

Persist CPU realizes instant Power off and Power on, a lightweight OS-level orthogonal persistence mechanism that quickly converts execution states into persistent information upon a power failure. Specifically, it provides a single execution persistence cut at which multiple cores make their states persistent when a power failure event occurs. When power is restored, it almost immediately resumes all the stopped user and OS kernel tasks from the previous persistence cut. This feature significantly reduces the power budget and overhead cost that current computing systems spend on persistence management.
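The idea of a single persistence cut shared by all cores can be sketched as follows. This minimal Python model is an illustration under stated assumptions, not Persist CPU’s implementation: `CoreContext`, `PersistentRegion`, and the stage-then-commit scheme are hypothetical, and deep copies stand in for the hardware steps (stopping cores, flushing caches) a real system would need. The property it shows is that a cut interrupted midway is never published, so recovery always resumes every core from the same complete cut.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CoreContext:
    """Architectural state of one core (hypothetical, heavily simplified)."""
    pc: int = 0
    regs: dict = field(default_factory=dict)


class PersistentRegion:
    """Stands in for PRAM: everything here survives a power failure."""

    def __init__(self):
        self.staged = {}       # contexts written during the cut in progress
        self.committed = {}    # the last complete persistence cut
        self.commit_epoch = 0  # which cut `committed` belongs to


def take_persistence_cut(cores, nvm, epoch, fail_after=None):
    """Save every core's context, then publish the cut with one commit step.

    `fail_after` simulates power dying after that many cores have saved.
    Returns True only if the whole cut was committed.
    """
    nvm.staged = {}  # fresh staging area; the previously committed cut is untouched
    for n, (core_id, ctx) in enumerate(cores.items()):
        if fail_after is not None and n == fail_after:
            return False  # a partial cut is never published
        nvm.staged[core_id] = copy.deepcopy(ctx)
    # Single commit point: all cores' states become visible together.
    nvm.committed = nvm.staged
    nvm.commit_epoch = epoch
    return True


def recover(nvm):
    """When power returns, every core resumes from the last committed cut."""
    return {core_id: copy.deepcopy(ctx) for core_id, ctx in nvm.committed.items()}
```

For example, if power fails after only one of two cores has saved its state, `recover()` still returns both cores’ contexts from the previous cut, so the cores never resume from mismatched points.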

No Need for Checkpoint-Restart

Checkpoint-restart is very widely adopted in current computing, especially in huge data centers and HPC. To prevent massive data loss from power failures or other power-related events, data centers repeatedly and redundantly back up gigabytes to terabytes of data. This redundant but unavoidable process consumes up to 60% of the total power budget, leaving only 35 to 40% for actual data processing workloads. By eliminating this redundant procedure and its related overhead, Persist CPU can save up to 60% of the power budget and related overhead costs, and also improves computing performance.

Performance of Persist CPU

Persist CPU demonstrates the performance of MEMRAY’s PRAM controller in a real computing system.

* The comparison system uses the same computing organization as Persist CPU, but employs DRAM and SSD rather than PRAM.

2. Firmwareless SSD

Removed firmware (CPU + DRAM) from current SSDs

Removed the most costly and most power-consuming operations in SSDs
Shows much better bandwidth and latency, and lower power consumption

Efficient firmware design plays a key role in modern SSDs, but we believe that firmware operations can become a burden for fast PCIe storage cards in the near future. Specifically, as the access latency of new memory is now around a few microseconds, firmware execution becomes a critical performance bottleneck.

MEMRAY proposes Firmwareless SSD, which removes the internal processor and DRAM buffer resources from state-of-the-art NVMe SSD architectures by fully automating all new-memory processing components in hardware. Specifically, Firmwareless SSD converts all storage management logic for new memory into pipelined hardware modules. It reads and writes host-side data directly to the underlying backend storage media without internal DRAM caches, employing the PRAM controller to serve I/O requests directly on bare PRAM packages. Firmwareless SSD exposes this PRAM backend complex to the host through PCIe links. MEMRAY prototypes Firmwareless SSD on our non-volatile computing platform, employing a large number of PRAM packages as a representative of new memory technologies.
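The firmwareless I/O path described above can be modeled as a chain of fixed pipeline stages. In this Python sketch, each generator stands in for a hardware module (fetch, address translation, PRAM access, completion). The stage names, the package count, the 512-byte block size, and the static striping map are hypothetical assumptions; the fixed arithmetic mapping relies on PRAM supporting in-place updates and deliberately ignores wear leveling and error handling, which a real device would still need.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512    # bytes per logical block (assumption)
NUM_PACKAGES = 4    # PRAM packages behind the controller (assumption)


@dataclass
class IoCommand:
    opcode: str     # "read" or "write"
    lba: int        # logical block address
    data: bytes = b""


class PramPackages:
    """Bare PRAM packages; each is modeled as offset -> block contents."""

    def __init__(self, n):
        self.banks = [dict() for _ in range(n)]


def translate(lba):
    """Static arithmetic mapping (striping): no mapping table held in DRAM."""
    return lba % NUM_PACKAGES, lba // NUM_PACKAGES


def pipeline(commands, media):
    """Each stage is a generator; commands stream through the stages in order."""

    def fetch(cmds):
        for cmd in cmds:
            yield cmd

    def map_addr(cmds):
        for cmd in cmds:
            yield cmd, translate(cmd.lba)

    def access(items):
        for cmd, (pkg, offset) in items:
            bank = media.banks[pkg]
            if cmd.opcode == "write":
                bank[offset] = cmd.data  # written straight to PRAM, no DRAM buffer
                yield cmd, None
            else:
                # Unwritten blocks read back as zeros.
                yield cmd, bank.get(offset, bytes(BLOCK_SIZE))

    def complete(items):
        for cmd, payload in items:
            yield {"lba": cmd.lba, "op": cmd.opcode, "data": payload}

    return list(complete(access(map_addr(fetch(commands)))))
```

Because no stage keeps a DRAM cache or a mapping table, a write lands directly in its target PRAM package, and a later read of the same block (for example, a write to LBA 5 followed by a read of LBA 5) is served from that package.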

Firmwareless SSD shows that SSD performance can be greatly improved and energy consumption reduced by removing computing resources such as the CPU and DRAM.

* Microbenchmark results for Firmwareless SSD are normalized to those of a Firmware SSD, a baseline implemented with conventional firmware execution.

3. AI Accelerator

Reduced data movement overhead by removing SSDs from the data processing path.

Data are stored in the accelerator’s large-capacity PRAM memory instead of the host’s storage (SSD).
The accelerator directly accesses the data in the PRAM and executes data processing.
AI Accelerator gives better computation performance and consumes less energy than conventional AI accelerators.

A well-known issue with the FPGA-based hardware acceleration approach is that the accelerator’s on-chip memory has too little capacity to hold all input/output data. When the data do not fit in the on-chip memory, the result is host overflow, which requires frequent data transfers between the host and the FPGA board.

MEMRAY proposes AI Accelerator, a scalable heterogeneous deep learning accelerator. AI Accelerator realizes an FPGA-based domain-specific architecture for DNNs that exploits the large storage space of phase-change memory (PRAM). Specifically, our AI Accelerator integrates a general-purpose processor, systolic-array hardware, and PRAM controllers into a single chip. AI Accelerator connects an on-board PRAM backend complex that can hold many PRAM modules on the FPGA side as working memory. The systolic-array hardware accelerates deep learning operations, in particular general matrix multiplication (GEMM) and convolution. In parallel, the PRAM controllers and backend complex address the limited on-chip memory capacity of modern accelerators. MEMRAY prototypes AI Accelerator on our non-volatile computing platform and exposes it to the host over PCIe.
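How systolic-array hardware computes GEMM can be illustrated with a small cycle-level simulation. This Python sketch models a generic output-stationary array, a textbook dataflow chosen here as an assumption (the text does not say which dataflow AI Accelerator uses): each processing element PE(i, j) keeps a running sum for C[i][j], rows of A stream in from the left and columns of B from the top, and each row or column is skewed by one cycle so that matching operands meet in the right PE.

```python
def systolic_matmul(A, B):
    """Cycle-level simulation of an output-stationary n x m systolic array.

    A is n x k and B is k x m (lists of lists). Row i of A enters from the left
    delayed by i cycles; column j of B enters from the top delayed by j cycles.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # per-PE accumulator: C[i][j]
    a_reg = [[0] * m for _ in range(n)]  # value each PE forwards to its right neighbour
    b_reg = [[0] * m for _ in range(n)]  # value each PE forwards to the PE below
    total_cycles = n + m + k - 2
    for t in range(total_cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge is fed skewed A; inner PEs take last cycle's value from the left.
                if j == 0:
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge is fed skewed B; inner PEs take last cycle's value from above.
                if i == 0:
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate in place
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b       # registers latch at the end of the cycle
    return acc
```

For instance, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields `[[19, 22], [43, 50]]`. The array needs n + m + k - 2 cycles because the last operand pair reaches PE(n - 1, m - 1) at cycle (n - 1) + (m - 1) + (k - 1).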

AI Accelerator minimizes host overflow by storing large machine learning datasets in the accelerator’s non-volatile PRAM memory, which provides larger capacity than conventional DRAM. Because host overflow is minimized, AI Accelerator delivers better computation performance and consumes less energy than conventional AI accelerators.

4. Disaggregated Persistent Pooled Memory System

Breaks the processor-memory dependency and provides a huge non-volatile memory space.

Computing nodes can use the persistent pooled memory as their main memory instead of local DRAM.
Integrates many PRAM modules for a huge non-volatile memory space and manages cache coherence.

Demand for large memory space is increasing significantly in modern computing systems. To increase memory space in a conventional system, the number of processors must be increased too, because each processor can host only a limited amount of memory. This raises the total cost of ownership (TCO) of larger memory. If memory is disaggregated from processors, the TCO of large memory can be reduced.

MEMRAY proposes a disaggregated persistent pooled memory system that integrates many PRAM modules for a huge non-volatile memory space and manages cache coherence between computing nodes and the pooled memory. Specifically, our persistent pooled memory system can be connected to multiple computing nodes through a PCIe switch and provides a shared memory space for them. The pooled memory can serve as main memory for the computing nodes, which removes the processor-memory dependency. Our system manages cache coherence between the computing nodes and the persistent pooled memory using our own cache coherence protocols, which are optimized for the key property of PRAM: its longer write latency compared to DRAM.
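The coherence role described above can be sketched with a generic directory. The following Python sketch implements a simplified MSI-style directory, a textbook technique swapped in for illustration; the text does not describe MEMRAY’s actual protocol. Its one PRAM-oriented choice, an assumption, is to defer writes to pooled PRAM until a dirty line is downgraded, so repeated stores from one node cost a single PRAM write. Each address is treated as one whole value, and the `Directory` class and its methods are hypothetical.

```python
class Directory:
    """Generic MSI-style directory for pooled memory lines (not MEMRAY's protocol)."""

    def __init__(self):
        self.state = {}      # addr -> "S" (shared) or "M" (modified); absent means "I"
        self.sharers = {}    # addr -> set of node ids holding a copy
        self.memory = {}     # addr -> value stored in the pooled PRAM
        self.caches = {}     # node id -> {addr: value}, models node-side caches
        self.pram_writes = 0

    def _cache(self, node):
        return self.caches.setdefault(node, {})

    def _downgrade_owner(self, addr):
        """If one node holds the line in M, write its dirty copy back to PRAM once."""
        if self.state.get(addr, "I") == "M":
            (owner,) = self.sharers[addr]  # M implies exactly one holder
            self.memory[addr] = self.caches[owner][addr]
            self.pram_writes += 1
            self.state[addr] = "S"

    def read(self, node, addr):
        cache = self._cache(node)
        if addr in cache:                  # hit: node already holds the line (S or M)
            return cache[addr]
        self._downgrade_owner(addr)
        cache[addr] = self.memory.get(addr, 0)
        self.sharers.setdefault(addr, set()).add(node)
        self.state[addr] = "S"
        return cache[addr]

    def write(self, node, addr, value):
        # Invalidate every other copy; the whole value is overwritten, so no
        # write-back of a previous owner's copy is needed in this model.
        for other in self.sharers.get(addr, set()) - {node}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] = {node}
        self.state[addr] = "M"             # dirty; PRAM write deferred until downgrade
        self._cache(node)[addr] = value
```

In this model, three consecutive stores by node 0 to one line cost a single PRAM write, paid only when node 1 later reads that line. A real protocol would also have to handle evictions, races between nodes, and persistence ordering.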

Our disaggregated persistent pooled memory system is currently under development. The first prototype is scheduled to be available by the end of December 2020 and will be evaluated with RISC-V computing nodes.

Memray has devoted itself to developing PRAM controllers for many years and has realized a new computing platform that uses PRAM controllers and PRAM packages as main memory, eliminating DRAM and the related persistence-control software stack from current computing systems. Memray’s new computing platform and technology show remarkable performance results and the possibility of creating new, game-changing applications for future computing environments.