Gradient Descent and its Variants

  • Find the notes for the \(3^{rd}\) lecture linked: CS 419 Lecture Notes 3
  • Last class we covered evaluation functions and linear regression; this lecture covers efficient algorithms to reach the optimal parameters.
  • Gradient Descent: Uses the “slope” of the function to reach the optimal value. That is, we simply flow along the negative gradient until we reach a minimum. The scalar multiplying the gradient in each update is called the \(\textbf{Learning Rate}\), denoted by \(\eta\) (see the first sketch after this list).
  • Variants of GD set the learning rate as a function of the algorithm’s progress: start with a high learning rate, and lower it as the iterates get closer to the optimal point.
  • Stochastic GD: uses noise-based randomization of the iterates so that movement towards the optimal solution is not sluggish. The jittery movement allows escape from “plateau regions” of the function (see the second sketch after this list).
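
A minimal sketch of plain gradient descent with a fixed learning rate \(\eta\), assuming a simple one-dimensional quadratic loss; the function and parameter names here are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, n_iters=100):
    """Plain gradient descent: repeatedly step against the gradient.

    grad : function returning the gradient of the loss at a point
    w0   : initial parameter vector
    eta  : learning rate (the scalar multiplying the gradient)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w = w - eta * grad(w)   # move along the negative gradient
    return w

# Example: minimise f(w) = (w - 3)^2, whose gradient is 2(w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=[0.0])
print(w_star)   # converges close to 3
```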
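A second sketch combining the two ideas above: stochastic gradient descent on a toy linear-regression problem, where the gradient is computed on one randomly chosen example (the source of the “jitter”) and the learning rate is decayed as training progresses. The data, decay schedule, and constants are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data: y ≈ X @ w_true plus a little noise (illustrative).
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(3)
eta0 = 0.1                                   # initial (high) learning rate
for t in range(1, 2001):
    i = rng.integers(len(X))                 # pick one example at random -> noisy gradient
    grad_i = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of the squared error on that example
    eta = eta0 / (1 + 0.01 * t)              # lower the learning rate as we progress
    w -= eta * grad_i

print(w)   # should end up close to w_true
```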

MIDS Workshop Lecture slides available

  • Find the slides from the MIDS workshop conducted last weekend linked: MIDS lectures