Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
7dc2968
Add dataset utilities and tests
EtonT471 Mar 2, 2026
47fc954
Adding dataset_tests.jl
EtonT471 Mar 3, 2026
787a88d
Small changes to design.md
EtonT471 Mar 3, 2026
2dd2295
March 16 Updates
EtonT471 Mar 17, 2026
173cdd1
Ridge Regression file
EtonT471 Mar 17, 2026
042d5d6
dataset.jl small update
EtonT471 Mar 17, 2026
70cfa67
Updated Experimental Units and Treatments Sections
EtonT471 Mar 17, 2026
9b8c48c
Small changes
EtonT471 Mar 17, 2026
6c923ab
Bidiagonalization Stuff
EtonT471 Mar 17, 2026
6727a90
Adding Linear Algebra to Project.toml
EtonT471 Mar 17, 2026
358e881
Design Branch
EtonT471 Mar 17, 2026
2ad65a9
Updated design 3/19
EtonT471 Mar 19, 2026
fc2595f
3/19 edits
EtonT471 Mar 19, 2026
788665c
Update experimental design document
EtonT471 Mar 24, 2026
698aff7
Remove code files from design branch
EtonT471 Mar 24, 2026
f904351
Compiling issues
EtonT471 Mar 24, 2026
49c8a81
Attempt 1
EtonT471 Mar 24, 2026
8529e34
Minor adjustment
EtonT471 Mar 24, 2026
3734a46
restoring .jl files
EtonT471 Mar 24, 2026
48edb29
moving to src
EtonT471 Mar 24, 2026
8c2e5bc
Put source file back into the folder.
vp314 Mar 24, 2026
8192f16
Delete docs/src/RidgeRegression.jl
vp314 Mar 24, 2026
8a6525e
Source folder re-added
EtonT471 Mar 24, 2026
dc10590
Fixed math notation
vp314 Mar 24, 2026
6b893ef
recent changes
EtonT471 Apr 7, 2026
a605e40
Updated design.md
EtonT471 Apr 7, 2026
d9a63e7
Remove RidgeRegression.jl changes from Design branch
EtonT471 Apr 7, 2026
c29b983
fixing compiling
EtonT471 Apr 7, 2026
0ba580c
Fixing dollar signs
EtonT471 Apr 7, 2026
e080193
XtopX change
EtonT471 Apr 7, 2026
26a6198
Update 4/13
EtonT471 Apr 13, 2026
cf1e17e
4/14 Update
EtonT471 Apr 15, 2026
9ca4815
4/28 updates
EtonT471 Apr 28, 2026
2c641dd
Fixed compiling issue
EtonT471 Apr 28, 2026
06f1c06
More compiling fixed
EtonT471 Apr 28, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,15 @@
name = "RidgeRegression"
uuid = "739161c8-60e1-4c49-8f89-ff30998444b1"
authors = ["Vivak Patel <vp314@users.noreply.github.com>"]
version = "0.1.0"
authors = ["Eton Tackett <etont@icloud.com>", "Vivak Patel <vp314@users.noreply.github.com>"]

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[compat]
CSV = "0.10.15"
DataFrames = "1.8.1"
Downloads = "1.7.0"
julia = "1.12.4"
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Design" => "design.md",
],
)

Expand Down
198 changes: 198 additions & 0 deletions docs/src/design.md
Comment thread
vp314 marked this conversation as resolved.
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
# Motivation and Background
Many modern science problems involve regression problems with extremely large numbers of
predictors. Genome-wide association studies (GWAS), for example, try to identify genetic
variants associated with a disease phenotype using hundreds of thousands or millions of
genomic features. In such settings, traditional least squares methods fail. Penalized Least Squares (PLS) extends ordinary least squares (OLS)
regression by adding a penalty term to shrink parameter estimates. Ridge regression, an
approach within PLS, adds a regularization term, producing a regularized estimator.

Mathematically, ridge regression estimates the regression coefficients by solving the penalized least squares problem
```math
\hat{\boldsymbol{\beta}} =
\arg\min_{\boldsymbol{\beta}}
\left(
\| \mathbf{y} - X\boldsymbol{\beta} \|_2^2
+
\lambda \| \boldsymbol{\beta} \|_2^2
\right)
Comment on lines +11 to +17
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should indicate which norms you are using by using a subscript _2 for instance.

```
where $\lambda > 0$ is a regularization parameter that controls the strength of the penalty.

The purpose of ridge regression is to stabilize regression estimates when the predictors are
highly correlated or the design matrix $X$ is nearly singular. Ridge regression modifies the
least squares objective by adding a penalty on the squared $\ell_2$-norm of the coefficient
vector. The estimator is obtained by minimizing a penalized least squares objective in which
large coefficient values are discouraged through the penalty term $\lambda
\|\boldsymbol{\beta}\|_2^2$. This penalty shrinks the estimated coefficients toward the
origin, which reduces the variance of the estimator and mitigates the effects of
multicollinearity.

There are many numerical algorithms available to compute ridge regression estimates
including direct methods, Krylov subspace methods, gradient-based optimization, coordinate
descent, and stochastic gradient descent. These algorithms differ in their computational
costs and numerical stability.

The goal of this experiment is to investigate the performance of these algorithms when we
vary the structure and scale of the regression problem. To do this, we consider the linear
model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where the matrix ${X}$
may be constructed with varying dimensions, sparsity patterns, and conditioning properties.
# Questions
The primary goal of this experiment is to compare numerical algorithms for computing ridge
regression estimates under various conditions. In particular, we aim to address the
following questions:

1. How does the performance of ridge regression algorithms change as the structural and numerical properties of the regression problem vary?

2. Which ridge regression algorithm provides the best balance between numerical stability and computational cost across these problem regimes?

# Experimental Units
A block is defined as a collection of experimental units that share common characteristics
which may influence the response. Blocking is introduced to control for
variation and to enable meaningful comparisons of algorithm performance across
different problem regimes.

In this experiment, blocks are defined by combinations of three factors: dimensional
regime, matrix sparsity, and ridge penalty magnitude.

The experimental units are the datasets. Each dataset consists of a design matrix $X$ and
response vector $y$, with the regularization parameter $\lambda$ determined by the
blocking structure through the choice of $\epsilon$.

Datasets will be grouped according to their dimensional regime, characterized as $p \ll n$,
$p ≈ n$, and $p \gg n$. These regimes correspond to fundamentally different geometric
properties of the design matrix, including rank behavior, conditioning, and the stability of
the normal equations. All of which impact the performance of numerical algorithms.

In addition to dimensional block, the strength of the ridge penalty will be incorporated as
a secondary blocking factor. The ridge estimator is $\hat{\beta_R} = (X^\top X + \lambda
I)^{-1}X^\top y$. The matrix conditioning number is defined as $\kappa(A) =
\frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$. In the context of ridge regression, the
regularization parameter ${\lambda}$, can impact the conditioning number. Let $X = U\Sigma
V^\top$ be the SVD of $X$, with singular values $\sigma_1,\dots,\sigma_p$.

Then
```math
X^\top X = V \Sigma^\top \Sigma V^\top
= V \,\mathrm{diag}(\sigma_1^2,\dots,\sigma_p^2)\, V^\top .
```

Adding the ridge term gives

```math
X^\top X + \lambda I
=
V \,\mathrm{diag}(\sigma_1^2+\lambda,\dots,\sigma_p^2+\lambda)\, V^\top .
```

```math
\kappa(X^\top X+\lambda I)
=
\frac{\sigma_{\max}^2+\lambda}{\sigma_{\min}^2+\lambda}.
```

Because the performance of numerical algorithms is strongly influenced by the conditioning
of the system they solve, the ridge penalty effectively creates regression problems with
different numerical difficulty. This provides a way to assess how algorithm performance,
convergence behavior, and computational cost depend on the numerical stability of the
problem.
Recall that,
```math
\kappa(X^\top X + \lambda I)
=
\frac{\sigma_{\max}^2 + \lambda}{\sigma_{\min}^2 + \lambda},
```
Where $\sigma_{\min}$ and $\sigma_{\max}$ denote the smallest and largest
singular values of $X$. Given a target condition number of the form, $1+\epsilon$ with $\epsilon > 0$, we solve for $\lambda$ by setting
```math
\frac{\sigma_{\max}^2 + \lambda}{\sigma_{\min}^2 + \lambda} = 1+\epsilon,
```
Which implies
```math
\sigma_{\max}^2 + \lambda
=
(1+\epsilon)(\sigma_{\min}^2 + \lambda).
```
Rearranging gives
```math
\sigma_{\max}^2 - (1+\epsilon)\,\sigma_{\min}^2 = (\epsilon)\lambda,
```
and therefore
```math
\lambda = \frac{\sigma_{\max}^2 - (1+\epsilon)\,\sigma_{\min}^2}{\epsilon}.
```
In this experiment, the ridge penalty parameter is not chosen directly, but is instead
determined through a target level of conditioning. Because the ideal condition number
of 1 cannot be achieved in practice, we instead aim for values of the form $1 + \epsilon$,
where $\epsilon > 0$.

A strong regularization regime corresponds to small values of $\epsilon$, resulting
in a well-conditioned system with condition number close to 1. Moderate
regularization corresponds to intermediate values of $\epsilon$, where the condition
number is reduced but not minimal. Weak regularization corresponds to large values
of $\epsilon$, where the system remains relatively ill-conditioned and close to the
unregularized case.

If $X$ is rank deficient, then $\sigma _{min}=0$, then
```math
\lambda =\frac{\sigma _{max}^2}{\epsilon}
```
So our method is robust in the case where $X$ is rank deficient.

Another blocking factor that will be considered is how sparse or dense the matrix $X$ is.
Many algorithms behave differently depending on whether the matrix is sparse or dense. In
ridge regression, there are many operations involving $X$ including matrix-matrix products
and matrix-vector products. A dense matrix leads to high computational cost whereas a sparse
matrix we can significantly reduce the cost. As such, different algorithms may perform
better depending on the sparsity structure of X, making matrix sparsity a relevant blocking
factor when comparing algorithm behavior and computational efficiency.

The total number of block combinations is determined by the product of the number of levels
in each blocking factor, denoted b. For example, if the experiment includes three
dimensional regimes, two sparsity levels, and two regularization strengths, then there are
$3 * 2 * 2 = 12$ block combinations. We will also denote r to be the number of replicated
datasets in each block. Here, we mean the number datasets within a block. The total number
of experimental units is then ${b * r}$.


| Blocking System | Factor | Blocks |
|:----------------|:-------|:-------|
| Dataset | Dimensional regime | $(p \ll n)$, $(p \approx n)$, $(p \gg n)$|
| Ridge Penalty | Magnitude of ${\lambda}$ relative to the spectral scale of $X^\top X$ | Strong: $\kappa \approx 10$, Moderate: $\kappa \approx 10^3$, Weak: $\kappa \approx 10^6$, with $\lambda$ computed from the corresponding $\epsilon$. |
| Matrix Sparsity| Density of non-zero values in $X$ | Sparse (< 10% non-zero), Moderate (10%-50% non-zero), Dense (> 50% non-zero)|
# Treatments

The treatments are the ridge regression solution methods:

- Gradient-based optimization
- Stochastic gradient descent
- Direct Methods
- Golub Kahan Bidiagonalization

Since each experimental unit will recieves all t treatments, the total number of algorithm
runs in the experiment is ${t * b * r}$. For this experiment, ${t=3}$. To ensure fair
comparison between algorithms, each treatment will be applied under a fixed wall time
constraint. Each algorithm will be run for a maximum of two hours per experimental unit.
# Observational Units and Measurements

The observational units are each algorithm-dataset pair. For each combination we will observe the following

| Column Name | Data Type | Description |
|:---|:---|:---|
| `dataset_id` | Positive Integer | Identifier for the generated dataset (experimental unit). |
| `dimensional_regime` | String | Relationship between predictors and observations: `p << n`, `p ≈ n`, or `p >> n`. |
| `sparsity_level` | String | Density of the matrix `X`: `Sparse`, `Moderate`, or `Dense`. |
| `lambda_level` | String | Relative magnitude of the ridge penalty parameter `λ`: `Weak`, `Moderate`, or `Strong`. |
| `algorithm` | String | Ridge regression solution method used: `GradientDescent`, `SGD`, or `DirectMethod`. |
| `runtime_seconds` | Positive Floating-point | Time required for the algorithm to compute a solution. |
| `iterations` | Positive Integer | Number of iterations performed by the algorithm (`NA` for direct methods). |
| `step_size` | Positive Floating-point | Step size used in gradient descent or SGD (`NA` for direct methods). |
| `batch_size` | Positive Integer | Number of samples used per SGD update (`NA` for direct methods and gradient descent). |
| `number_of_epochs` | Positive Integer | Number epochs per observation (`NA` for direct methods). |



The collected measurements will be written to a CSV file. Each row in the file corresponds
to a single algorithm–dataset pair, which forms the observational unit of the experiment.
The columns represent the recorded measurements. After the experiment, the resulting CSV
file should contain ${Algorithms∗Datasets}$ number of rows and each row will contain exactly
10 columns.
3 changes: 3 additions & 0 deletions test/Project.toml
Original file line number Diff line number Diff line change
@@ -1,2 +1,5 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Loading