Tikhonov regularization is a more general family than ridge regression. Here is my attempt to spell out exactly how they differ.
Suppose that for a known matrix $A$ and vector $\mathbf{b}$, we wish to find a vector $\mathbf{x}$ such that
$A\mathbf{x}=\mathbf{b}$.
The standard approach is ordinary least squares linear regression. However, if no $\mathbf{x}$ satisfies the equation, or if more than one does (that is, the solution is not unique), the problem is said to be ill-posed. Ordinary least squares seeks to minimize the sum of squared residuals, which can be compactly written as:
$\|A\mathbf{x}-\mathbf{b}\|^2 $
where $\left \| \cdot \right \|$ is the Euclidean norm. In matrix notation the solution, denoted by $\hat{x}$, is given by:
$\hat{x} = (A^{T}A)^{-1}A^{T}\mathbf{b}$
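For concreteness, here is a minimal Matlab sketch of the least squares solution (the data are made up, purely for illustration):

% Ordinary least squares sketch (toy data, purely illustrative)
A = [1 1; 1 2; 1 3; 1 4];  % known matrix, full column rank
b = [2.1; 3.9; 6.2; 7.8];  % known vector
xhat = (A'*A)\(A'*b);      % normal-equations form (A'A)^{-1} A'b
xls  = A\b;                % equivalent; backslash uses QR, better conditioned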
Tikhonov regularization minimizes
$\|A\mathbf{x}-\mathbf{b}\|^2+ \|\Gamma \mathbf{x}\|^2$
for some suitably chosen Tikhonov matrix, $\Gamma $. An explicit matrix form solution, denoted by $\hat{x}$, is given by:
$\hat{x} = (A^{T}A+ \Gamma^{T} \Gamma )^{-1}A^{T}\mathbf{b}$
The effect of regularization may be varied via the scale of the matrix $\Gamma$. For $\Gamma = 0$ this reduces to the unregularized least squares solution, provided that $(A^{T}A)^{-1}$ exists.
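As a sketch of two equivalent ways to compute this (toy data again; the particular $\Gamma$ here is arbitrary and only illustrative):

% Tikhonov sketch: minimize ||A*x - b||^2 + ||Gam*x||^2 (illustrative)
A = [1 1; 1 2; 1 3; 1 4]; b = [2.1; 3.9; 6.2; 7.8];
Gam  = 0.1*[1 -1; 0 1];                    % some non-identity Tikhonov matrix
xtik = (A'*A + Gam'*Gam)\(A'*b);           % explicit closed form above
xaug = [A; Gam]\[b; zeros(size(Gam,1),1)]; % same solution via stacked least squares

The stacked form gives the same minimizer while avoiding the explicit formation of $A^{T}A$.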
Typically for ridge regression, two departures from Tikhonov regularization are described. First, the Tikhonov matrix is replaced by a multiple of the identity matrix
$\Gamma= \alpha I $,
giving preference to solutions with smaller $L_2$ norm. Then $\Gamma^{T} \Gamma$ becomes $\alpha^2 I$, leading to
$\hat{x} = (A^{T}A+ \alpha^2 I )^{-1}A^{T}\mathbf{b}$
Finally, for ridge regression, it is typically assumed that the variables in $A$ (now written $X$) are scaled so that $X^{T}X$ has the form of a correlation matrix and $X^{T}\mathbf{b}$ is the vector of correlations between the $x$ variables and $\mathbf{b}$, leading to
$\hat{x} = (X^{T}X+ \alpha^2 I )^{-1}X^{T}\mathbf{b}$
Note that in this form the Lagrange multiplier $\alpha^2$ is usually replaced by $k$, $\lambda$, or some other symbol, but retains the property $\lambda\geq0$.
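A Matlab sketch of this ridge form on standardized toy data (the data, the true coefficients, and the value of $\lambda$ are all made up for illustration):

% Ridge sketch: Gamma = alpha*I on standardized data (toy data, illustrative)
rng(0); n = 50; p = 3;
X = randn(n,p);
b = X*[1;-2;0.5] + 0.1*randn(n,1);      % synthetic response
X = (X - mean(X))./std(X)/sqrt(n-1);    % scale so X'*X is the correlation matrix
b = (b - mean(b))/std(b)/sqrt(n-1);     % scale so X'*b holds correlations with b
lambda = 0.5;                           % ridge penalty (lambda = alpha^2 >= 0)
xridge = (X'*X + lambda*eye(p))\(X'*b); % ridge estimate; lambda=0 recovers OLS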
In formulating this answer, I acknowledge borrowing liberally from Wikipedia and from Ridge estimation of transfer function weights.
Carl has given a thorough answer that nicely explains the mathematical differences between Tikhonov regularization and ridge regression. Inspired by the historical discussion here, I thought it might be useful to add a short example demonstrating how the more general Tikhonov framework can be useful.
First, a brief note on context. Ridge regression arose in statistics, and while regularization is now widespread in statistics & machine learning, Tikhonov's approach was originally motivated by inverse problems arising in model-based data assimilation (particularly in geophysics). The simplified example below is in this category (more complex versions are used for paleoclimate reconstructions).
Imagine we want to reconstruct temperatures $u[x,t=0]$ in the past, based on present-day measurements $u[x,t=T]$. In our simplified model we will assume that temperature evolves according to the heat equation
$$ u_t = u_{xx} $$
in 1D with periodic boundary conditions
$$ u[x+L,t] = u[x,t] $$
A simple (explicit) finite difference approach leads to the discrete model
$$ \frac{\Delta\mathbf{u}}{\Delta{t}} = \frac{\mathbf{Lu}}{\Delta{x^2}} \implies \mathbf{u}_{t+1} = \mathbf{Au}_t $$
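To make the connection to the code at the end explicit: with the scaling $\Delta{t}/\Delta{x}^2 = 1/2$ absorbed into the operator, $\mathbf{L}$ is the circulant second-difference matrix
$$ \mathbf{L} = \frac{1}{2}\begin{pmatrix} -2 & 1 & & & 1 \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ 1 & & & 1 & -2 \end{pmatrix} $$
and a single time step is $\mathbf{u}_{t+1} = (\mathbf{I} + \mathbf{L})\mathbf{u}_t$; the code forms the $t$-step operator $\mathbf{A} = (\mathbf{I}+\mathbf{L})^{t}$ directly.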
Mathematically, the evolution matrix $\mathbf{A}$ is invertible, so we have
$$\mathbf{u}_t = \mathbf{A^{-1}u}_{t+1} $$
However, numerically, difficulties will arise if the time interval $T$ is too long.
Tikhonov regularization addresses this problem by instead solving the system
\begin{align} \mathbf{Au}_t &\approx \mathbf{u}_{t+1} \\
\omega\mathbf{Lu}_t &\approx \mathbf{0}
\end{align}
which adds a small penalty $\omega^2\ll{1}$ on roughness $u_{xx}$.
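Solving this stacked system in the least-squares sense is precisely Tikhonov regularization with $\Gamma = \omega\mathbf{L}$, so the closed form from the first answer applies:
$$ \hat{\mathbf{u}}_t = \arg\min_{\mathbf{u}} \left\{ \|\mathbf{Au}-\mathbf{u}_{t+1}\|^2 + \omega^2\|\mathbf{Lu}\|^2 \right\} = \left(\mathbf{A}^{T}\mathbf{A} + \omega^2\mathbf{L}^{T}\mathbf{L}\right)^{-1}\mathbf{A}^{T}\mathbf{u}_{t+1} $$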
Below is a comparison of the results (the plot is produced by the Matlab code at the end):
We can see that the original temperature $u_0$ has a smooth profile, which is smoothed still further by diffusion to give $u_\mathsf{fwd}$. Direct inversion fails to recover $u_0$, and the solution $u_\mathsf{inv}$ shows strong "checkerboarding" artifacts. However, the Tikhonov solution $u_\mathsf{reg}$ is able to recover $u_0$ with quite good accuracy.
Note that in this example, ridge regression would always push our solution towards an "ice age" (i.e. uniform zero temperatures). Tikhonov regularization allows us a more flexible physically-based prior constraint: here our penalty essentially says the reconstruction $\mathbf{u}$ should be only slowly evolving, i.e. $u_t\approx{0}$.
Matlab code for the example is below (can be run online here).
% Tikhonov Regularization Example: Inverse Heat Equation
n=15; t=2e1; w=1e-2; % grid size, # time steps, regularization
L=toeplitz(sparse([-2,1,zeros(1,n-3),1]/2)); % laplacian (periodic BCs)
A=(speye(n)+L)^t; % forward operator (diffusion)
x=(0:n-1)'; u0=sin(2*pi*x/n); % initial condition (periodic & smooth)
ufwd=A*u0; % forward model
uinv=A\ufwd; % inverse model
ureg=[A;w*L]\[ufwd;zeros(n,1)]; % regularized inverse
plot(x,u0,'k.-',x,ufwd,'k:',x,uinv,'r.:',x,ureg,'ro');
set(legend('u_0','u_{fwd}','u_{inv}','u_{reg}'),'box','off');