Quantization can make your deep learning models smaller, faster, and more energy-efficient (I’ve written about this previously).
But done incorrectly, quantization can cause significant accuracy loss or fail to improve inference speed. So here, I’m sharing some practical tips for minimizing accuracy loss while maintaining good inference speed. These tips apply to both post-training quantization and quantization-aware training.
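To see where the accuracy loss comes from, here is a minimal sketch of 8-bit affine (asymmetric) quantization in pure Python. This is an illustrative example of mine, not code from the post: values are snapped to a 256-level integer grid via a scale and zero-point, so the round trip back to float is inherently lossy.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant inputs
    zero_point = round(qmin - lo / scale)
    # Round to the nearest grid point and clamp to the representable range.
    q = [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(x - zero_point) * scale for x in q]

# Toy "weights": the recovered values differ from the originals by up to
# about half a quantization step (scale / 2) -- that gap is the accuracy loss.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

The rounding error per value is bounded by roughly one quantization step, which is why narrowing the clipping range (and thus the scale) for each tensor is one of the main levers for reducing accuracy loss.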
Continue reading “Practical tips for better quantization results”