Gradient Descent
Illustrated using Julia

An example of estimating linear regression beta coefficients via gradient descent, using Julia.

using Random
using ForwardDiff
using Distributions
using Statistics
Generate Data
First, we generate some fake data: a 1000×4 design matrix (with an intercept column of ones) and an outcome built from known betas plus Gaussian noise.
Random.seed!(0408)
#x data
𝐗 = hcat(ones(1000), randn(1000, 3))
#ground truth betas
𝚩 = [.5, 1, 2, 3]
#multiply data by betas
f₁(X) = X*𝚩
#make some error
= rand(Normal(0, .5), size(𝐗)[1])
ϵ
#generate y
y = f₁(𝐗) + ϵ;
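Putting the pieces together, the simulated data follow the linear model

$$ y = X\beta + \epsilon, \qquad \epsilon_i \sim \mathcal{N}(0, 0.5^2), $$

so gradient descent should recover coefficients close to (0.5, 1, 2, 3).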
Define a Loss Function
Mean squared error is the most straightforward loss function for linear regression.
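In symbols, for n observations this is

$$ \mathrm{MSE}(b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i^\top b \right)^2, $$

which is exactly what the function below computes.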
function mse_loss(X, y, b)
    ŷ = X*b
    l = mean((y .- ŷ).^2)
    return l
end
mse_loss (generic function with 1 method)
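As a quick sanity check (not part of the original walkthrough), evaluating the loss at the ground-truth betas should return roughly the noise variance, since the only error left at the true coefficients is the simulated ϵ:

#loss at the true betas; expect ≈ 0.5^2 = 0.25
mse_loss(𝐗, y, 𝚩)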
Define a Training Function
This implements the gradient descent algorithm:
- initialize some random beta values
- initialize error as some very large number (the init value doesn’t really matter as long as it’s greater than the function’s tol parameter)
- initialize the number of iterations (iter) at 0
- define a function d() to get the gradient of the loss function at a given set of betas
- define a loop that updates the beta values by the learning rate * the gradients until convergence (the update rule is written out just below)
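Each pass through the loop applies the standard gradient descent update

$$ \beta \leftarrow \beta - \eta \, \nabla_\beta \, \mathrm{MSE}(\beta), $$

where η is the learning rate (the lr keyword argument below). One note: the in-sample MSE can’t fall much below the noise variance (≈ 0.25 here), so with tol = .01 the loop effectively runs for all max_iter iterations.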
function grad_descent(X, y; lr = .01, tol = .01, max_iter = 1_000, noisy = false)
    #randomly initialize betas
    β = rand(size(X)[2])
    #init error to something large
    err = 1e10
    #initialize iterations at 0
    iter = 0
    #define a function to get the gradient of the loss function at a given set of betas
    d(b) = ForwardDiff.gradient(params -> mse_loss(X, y, params), b)
    while err > tol && iter < max_iter
        β -= lr*d(β)
        err = mse_loss(X, y, β)
        if noisy
            println("Iteration $(iter): current error is $(err)")
        end
        iter += 1
    end
    return β
end
grad_descent (generic function with 1 method)
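To watch the optimizer converge, you can pass noisy = true; this sketch just reruns the fit with logging turned on (the variable name b_noisy is ours, not from the original):

#refit while printing the error at each iteration
b_noisy = grad_descent(𝐗, y; noisy = true)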
Estimate βs
To estimate the betas, we just run the function:

b = grad_descent(𝐗, y)
4-element Vector{Float64}:
0.5220524143318362
0.992503536801155
1.9951668587882012
2.997961983119764
Check Solution Against Base Julia Solver
Julia’s \ operator returns the least-squares solution for an overdetermined system, so we can compare it element-wise against our gradient descent estimates:

𝐗 \ y .≈ b
4-element BitVector:
1
1
1
1
Huzzah!