Rationale
Traditional layers based on matrix multiplication suffer from a few issues:
Matrix multiplications make them computationally expensive, which slows inference, particularly on machines without a GPU.
Storing full-precision weights requires a large amount of memory.
Running matrix multiplications consumes a lot of energy.
Matrix-multiplication-free layers address these pain points by removing the key source of these costs: the matrix multiplications themselves.
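
To make the idea concrete, here is a minimal sketch of one common way to avoid multiplications in a dense layer. It assumes the weights are restricted to the ternary values -1, 0 and 1, which this section does not state, so treat it as an illustration and not as this library's implementation. The function name `ternary_dense` is hypothetical.

```python
def ternary_dense(x, w):
    """Compute y[j] = sum_i x[i] * w[i][j] without any multiplication.

    Assumption (not from the text above): every w[i][j] is -1, 0 or 1,
    so each term is an addition, a subtraction, or a skip.
    """
    n_out = len(w[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        for j in range(n_out):
            wij = w[i][j]
            if wij == 1:
                y[j] += xi      # +1 weight: add the input
            elif wij == -1:
                y[j] -= xi      # -1 weight: subtract the input
            # 0 weight: the input contributes nothing
    return y


def reference_dense(x, w):
    """Ordinary multiply-accumulate version, used only for comparison."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


x = [0.5, -2.0, 3.0]                 # input vector with 3 features
w = [[1, 0], [-1, 1], [0, -1]]       # 3x2 ternary weight matrix
y = ternary_dense(x, w)
```

Under this assumption the result matches the ordinary multiply-accumulate version, and each weight needs only enough bits to distinguish three values instead of a full-precision float.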