Logic error in SGD: Incorrect epoch definition and gradient scaling #34

@harshgupta-23

Description

The current stochastic_gradient_descent implementation contains two math and logic errors:
1. Epoch definition: The loop for i in range(epochs): performs one single-sample update per iteration, so it treats one update as a full epoch. For a dataset of size N, one epoch should consist of N updates.
2. Gradient scaling: The code multiplies the gradient by (2/total_samples). That factor belongs to batch gradient descent, where the gradient is averaged over all samples. In SGD the gradient is computed from a single point, so it should not be divided by the total number of samples.
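To make point 2 concrete, here is the derivation, assuming the model is linear regression with a mean-squared-error loss (the factor 2 and the residual term in the code suggest this, but it is an assumption). The symbols x_i, y_i, \hat{y}_i and N are notation introduced here and stand for sample_x, sample_y, y_predicted and total_samples.

```latex
\begin{align}
% Batch loss: mean of the squared residuals over all N samples
L(w, b) &= \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i\bigr)^2,
  \qquad \hat{y}_i = w^\top x_i + b \\
\frac{\partial L}{\partial w} &= -\frac{2}{N} \sum_{i=1}^{N} x_i \,\bigl(y_i - \hat{y}_i\bigr),
  \qquad
\frac{\partial L}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i\bigr) \\
% Per-sample loss used by SGD: one squared residual, no averaging
\ell_i(w, b) &= \bigl(y_i - \hat{y}_i\bigr)^2 \\
\frac{\partial \ell_i}{\partial w} &= -2\, x_i \,\bigl(y_i - \hat{y}_i\bigr),
  \qquad
\frac{\partial \ell_i}{\partial b} = -2\, \bigl(y_i - \hat{y}_i\bigr)
\end{align}
```

The 1/N factor comes from averaging the sum over all samples. With a single sample there is no sum to average, so keeping 1/N only shrinks each step, which has the same effect as dividing the learning rate by N.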

Suggested Fix

    import random

    # Run total_samples updates per epoch (one update per randomly drawn sample)
    for i in range(epochs * total_samples):
        random_index = random.randint(0, total_samples - 1)
        # ... code ...

        # Per-sample SGD gradient (no 1/total_samples factor)
        w_grad = -2 * (sample_x.T * (sample_y - y_predicted))
        b_grad = -2 * (sample_y - y_predicted)
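
For reference, here is a minimal self-contained sketch of both fixes together. It is not the repository's code: the function name sgd_linear_regression, its parameters, and the use of plain Python lists are assumptions made so the example runs without dependencies. It also swaps random.randint for a per-epoch shuffle, because drawing with replacement does not guarantee that every sample is visited in each epoch.

```python
import random


def sgd_linear_regression(X, y, epochs=50, learning_rate=0.01, seed=0):
    """Hypothetical helper: fit y ~ w.x + b with per-sample SGD on squared error."""
    rng = random.Random(seed)
    total_samples = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    indices = list(range(total_samples))

    for _ in range(epochs):
        # One epoch = one pass over all samples, in a fresh random order.
        rng.shuffle(indices)
        for idx in indices:
            sample_x, sample_y = X[idx], y[idx]
            y_predicted = sum(wj * xj for wj, xj in zip(w, sample_x)) + b
            error = sample_y - y_predicted

            # Per-sample gradient of (sample_y - y_predicted)^2:
            # no division by total_samples.
            w_grad = [-2 * xj * error for xj in sample_x]
            b_grad = -2 * error

            w = [wj - learning_rate * gj for wj, gj in zip(w, w_grad)]
            b -= learning_rate * b_grad

    return w, b


if __name__ == "__main__":
    # Noise-free toy data generated from y = 2x + 1.
    X = [[i / 19] for i in range(20)]
    y = [2 * row[0] + 1 for row in X]
    print(sgd_linear_regression(X, y, epochs=300, learning_rate=0.05))
```

If sampling with replacement is preferred, the loop in the suggested fix above gives the same number of updates per epoch; the shuffle is only one common way to define a pass over the data.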

Please reply :).
