Abstract: This paper describes an in-memory computing architecture that combines full-precision computation for the first and last layers of a neural network while employing binary weights and input ...