At the 56th Annual IEEE/ACM International Symposium on Microarchitecture, researchers from the University of California, Riverside (UCR) demonstrated an approach in which all computing components of a platform truly run simultaneously. According to the researchers, this can roughly double computation speed and halve energy consumption. The technology can work on any processors and accelerators, from smartphones to data center servers, but requires further development.
“You don't need to add new processors [to speed up computing] because you already have them,” said Hung-Wei Tseng, an associate professor in the Department of Electrical and Computer Engineering at the University of California, Riverside and co-author of the study. The point is to manage the available hardware resources wisely instead of making the components wait in line for one another.
The researchers' platform, which they call simultaneous and heterogeneous multithreading (SHMT), breaks away from traditional programming models. Instead of feeding data to only one computing component at a time, whether the CPU, GPU, tensor processor or another accelerator, SHMT parallelizes code execution across all of them at once.
Test platform. Image source: Hsu and Tseng
SHMT uses a quality-aware work-stealing (QAWS) scheduling policy that needs few resources of its own yet helps maintain result quality and workload balance. The runtime system creates a set of virtual operations (VOPs) and divides them into one or more high-level operations (HLOPs) so that multiple hardware resources can be used simultaneously. It then distributes these HLOPs across task queues to run on the target hardware. Because HLOPs are hardware-independent, the runtime system can redirect tasks to one component of the computing platform or another as needed.
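The queueing and stealing idea described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the researchers' implementation: the `HLOP` and `Device` classes, the `speed` values, the `precision_critical` flag and the rule that a low-precision device may not take precision-critical work are all hypothetical stand-ins for whatever quality checks the real QAWS policy performs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HLOP:
    """A hardware-independent unit of work (hypothetical structure)."""
    op_id: int
    precision_critical: bool = False  # assumed quality hint


@dataclass
class Device:
    name: str
    speed: int                   # HLOPs completed per tick (assumed)
    low_precision: bool = False  # e.g. an accelerator with reduced precision
    queue: deque = field(default_factory=deque)
    done: list = field(default_factory=list)

    def can_run(self, hlop):
        # Assumed quality rule: low-precision devices must not take
        # precision-critical HLOPs.
        return not (self.low_precision and hlop.precision_critical)


def steal(thief, devices):
    """Take one eligible HLOP from the tail of the longest other queue."""
    victims = sorted((d for d in devices if d is not thief),
                     key=lambda d: len(d.queue), reverse=True)
    for victim in victims:
        for i in range(len(victim.queue) - 1, -1, -1):
            if thief.can_run(victim.queue[i]):
                hlop = victim.queue[i]
                del victim.queue[i]
                return hlop
    return None


def run(devices):
    """Simulate ticks until all queues are empty; return the tick count."""
    ticks = 0
    while any(d.queue for d in devices):
        ticks += 1
        for dev in devices:
            for _ in range(dev.speed):
                # Work on the local queue first, steal only when idle.
                hlop = dev.queue.popleft() if dev.queue else steal(dev, devices)
                if hlop is None:
                    break
                dev.done.append(hlop.op_id)
    return ticks


def build_demo():
    """Twelve HLOPs, all queued on the GPU; every third one is critical."""
    gpu = Device("gpu", speed=2)
    cpu = Device("cpu", speed=1)
    tpu = Device("tpu", speed=2, low_precision=True)
    gpu.queue.extend(HLOP(i, precision_critical=(i % 3 == 0))
                     for i in range(12))
    return [gpu, cpu, tpu]


if __name__ == "__main__":
    devices = build_demo()
    print("ticks with stealing:", run(devices))
    for d in devices:
        print(d.name, "ran", d.done)
    print("ticks on GPU alone:", run(build_demo()[:1]))
```

The tick counts it prints are properties of this toy setup only and say nothing about the speedups measured in the study. The sketch simply shows how idle components can pull hardware-independent tasks from a busy queue while a quality rule keeps sensitive work off a less precise device.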
Comparison of parallelization methods of conventional, modern heterogeneous and SHMT
What is especially valuable is that the researchers demonstrated the effectiveness of the new software libraries on a test platform they built themselves. The result is a kind of hybrid that can stand in for a smartphone, a PC and even a server. Based on a backplane with a PCIe connector, the “computer” combines an NVIDIA Jetson Nano module, which has a quad-core ARM Cortex-A57 processor (CPU) and 128 Maxwell-architecture graphics cores (GPU), with a Google Edge TPU accelerator connected through the M.2 Key E slot on the board.
Acceleration of SHMT calculations depending on the selected policy
The main memory of the system, where shared data is stored, is 4 GB of LPDDR4 running at 1600 MHz with a bandwidth of 25.6 GB/s. The Edge TPU module additionally contains 8 MB of memory. Ubuntu Linux 18.04 was used as the operating system.
Comparison of active and idle consumption between conventional computing and SHMT
Running the SHMT package on the improvised heterogeneous platform with standard benchmark applications showed that, with the most efficient policy, the QAWS framework increases computing speed by 1.95 times and reduces energy consumption by 51% compared to the baseline method of distributing computations. If this approach scales to data centers, the gain promises to be substantial, and all the hardware stays the same, so nothing has to be replaced. The proposed solution is not yet ready for deployment, but it is likely to attract considerable interest.
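To put the two headline figures side by side, here is a short back-of-the-envelope calculation. It assumes that both numbers refer to the same workload and that energy equals average power multiplied by run time. The function name `relative_figures` is purely illustrative.

```python
def relative_figures(speedup, energy_reduction):
    """Return (time, energy, average power) relative to the baseline.

    Assumes energy = average power * run time for the same workload.
    """
    time_ratio = 1.0 / speedup              # run time vs. baseline
    energy_ratio = 1.0 - energy_reduction   # energy vs. baseline
    power_ratio = energy_ratio / time_ratio # implied average power
    return time_ratio, energy_ratio, power_ratio


if __name__ == "__main__":
    # Figures quoted in the article: 1.95x speedup, 51% less energy.
    t, e, p = relative_figures(1.95, 0.51)
    print(f"time: {t:.3f}, energy: {e:.3f}, average power: {p:.3f}")
```

Under these assumptions the implied average power draw stays close to the baseline, which suggests that most of the energy saving comes from finishing the work sooner and not from drawing less power while running.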