Abstract: This brief proposes and evaluates several OpenCL-based implementations of the Secure Hash Algorithm-3 (SHA-3) co-processor. These implementations are developed based on OpenCL optimization ...
Perform these steps to build the runtime:
1. Clone the runtime repository and change into it.
2. On Linux only, use GCC as the compiler instead of the system default:
   CC=gcc CXX=g++ cmake -G Ninja ..
Run Stable Diffusion on RK3588's Mali GPU with MLC/TVM. Currently, generating a 512x512 image takes about 500 seconds (including model loading and GPU kernel compilation time. The actual inference time ...