Empirical comparison of MSE and cross-entropy (CE) loss on MNIST. CE reaches 97% accuracy in 4 epochs versus 13 for MSE; gradient-norm analysis confirms gradient saturation as the mechanism.
deep-learning pytorch mnist classification mlp loss-functions cross-entropy mean-squared-error gradient-analysis gradient-saturation
Updated Jun 15, 2026 - Python
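
The saturation mechanism named in the description can be illustrated with a minimal sketch. It is not taken from the repository: it assumes a single sigmoid output unit rather than the project's MLP, and the function names are hypothetical. With a sigmoid activation a = sigmoid(z), the MSE gradient with respect to the logit carries an extra factor a(1 - a) that vanishes when the unit is confidently wrong, while the CE gradient stays proportional to the error.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_mse_wrt_logit(z, y):
    # Hypothetical helper. Loss: L = 0.5 * (a - y)^2 with a = sigmoid(z).
    # Chain rule: dL/dz = (a - y) * a * (1 - a).
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)


def grad_ce_wrt_logit(z, y):
    # Hypothetical helper. Loss: L = -[y*log(a) + (1 - y)*log(1 - a)].
    # The sigmoid derivative cancels, leaving dL/dz = a - y.
    a = sigmoid(z)
    return a - y


if __name__ == "__main__":
    y = 1.0  # true label
    # Increasingly negative logits mean an increasingly confident wrong answer.
    for z in (0.0, -2.0, -4.0, -8.0):
        print(f"z={z:5.1f}  "
              f"MSE grad={grad_mse_wrt_logit(z, y):+.6f}  "
              f"CE grad={grad_ce_wrt_logit(z, y):+.6f}")
```

In this toy setting the MSE gradient shrinks toward zero as the logit moves further from the correct answer, whereas the CE gradient approaches -1. That is consistent with the slower MSE convergence reported above, though the repository's own gradient-norm analysis may be set up differently.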