Tikhonov regularization is a more general family than ridge regression. Here is my attempt to spell out exactly how they differ.
Suppose that for a known matrix $A$ and vector $\mathbf{b}$, we wish to find a vector $\mathbf{x}$ such that
$A\mathbf{x}=\mathbf{b}$.
The standard approach is ordinary least squares linear regression. However, if no $\mathbf{x}$ satisfies the equation, or if more than one does (that is, the solution is not unique), the problem is said to be ill-posed. Ordinary least squares seeks to minimize the sum of squared residuals, which can be compactly written as:
$\|A\mathbf{x}-\mathbf{b}\|^2 $
where $\left \| \cdot \right \|$ is the Euclidean norm. In matrix notation the solution, denoted by $\hat{x}$, is given by:
$\hat{x} = (A^{T}A)^{-1}A^{T}\mathbf{b}$
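For concreteness, here is a minimal Matlab sketch of the least squares solution (the data are made up, purely for illustration):

% Ordinary least squares sketch (toy data, purely illustrative)
A = [1 1; 1 2; 1 3; 1 4];  % known matrix, full column rank
b = [2.1; 3.9; 6.2; 7.8];  % known vector
xhat = (A'*A)\(A'*b);      % normal-equations form (A'A)^{-1} A'b
xls  = A\b;                % equivalent; backslash uses QR, better conditioned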
Tikhonov regularization minimizes
$\|A\mathbf{x}-\mathbf{b}\|^2+ \|\Gamma \mathbf{x}\|^2$
for some suitably chosen Tikhonov matrix, $\Gamma $. An explicit matrix form solution, denoted by $\hat{x}$, is given by:
$\hat{x} = (A^{T}A+ \Gamma^{T} \Gamma )^{-1}A^{T}\mathbf{b}$
The effect of regularization may be varied via the scale of the matrix $\Gamma$. For $\Gamma = 0$ this reduces to the unregularized least squares solution, provided that $(A^{T}A)^{-1}$ exists.
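As a sketch of two equivalent ways to compute this (toy data again; the particular $\Gamma$ here is arbitrary and only illustrative):

% Tikhonov sketch: minimize ||A*x - b||^2 + ||Gam*x||^2 (illustrative)
A = [1 1; 1 2; 1 3; 1 4]; b = [2.1; 3.9; 6.2; 7.8];
Gam  = 0.1*[1 -1; 0 1];                    % some non-identity Tikhonov matrix
xtik = (A'*A + Gam'*Gam)\(A'*b);           % explicit closed form above
xaug = [A; Gam]\[b; zeros(size(Gam,1),1)]; % same solution via stacked least squares

The stacked form gives the same minimizer while avoiding the explicit formation of $A^{T}A$.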
Typically for ridge regression, two departures from Tikhonov regularization are described. First, the Tikhonov matrix is replaced by a multiple of the identity matrix
$\Gamma= \alpha I $,
giving preference to solutions with smaller $L_2$ norm. Then $\Gamma^{T} \Gamma$ becomes $\alpha^2 I$, leading to
$\hat{x} = (A^{T}A+ \alpha^2 I )^{-1}A^{T}\mathbf{b}$
Finally, for ridge regression, it is typically assumed that the variables in $A$ (now written $X$) are scaled so that $X^{T}X$ has the form of a correlation matrix and $X^{T}\mathbf{b}$ is the vector of correlations between the $x$ variables and $\mathbf{b}$, leading to
$\hat{x} = (X^{T}X+ \alpha^2 I )^{-1}X^{T}\mathbf{b}$
Note that in this form the Lagrange multiplier $\alpha^2$ is usually replaced by $k$, $\lambda$, or some other symbol, but retains the property $\lambda\geq0$.
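A Matlab sketch of this ridge form on standardized toy data (the data, the true coefficients, and the value of $\lambda$ are all made up for illustration):

% Ridge sketch: Gamma = alpha*I on standardized data (toy data, illustrative)
rng(0); n = 50; p = 3;
X = randn(n,p);
b = X*[1;-2;0.5] + 0.1*randn(n,1);      % synthetic response
X = (X - mean(X))./std(X)/sqrt(n-1);    % scale so X'*X is the correlation matrix
b = (b - mean(b))/std(b)/sqrt(n-1);     % scale so X'*b holds correlations with b
lambda = 0.5;                           % ridge penalty (lambda = alpha^2 >= 0)
xridge = (X'*X + lambda*eye(p))\(X'*b); % ridge estimate; lambda=0 recovers OLS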
In formulating this answer, I acknowledge borrowing liberally from Wikipedia and from Ridge estimation of transfer function weights.
Carl has given a thorough answer that nicely explains the mathematical differences between Tikhonov regularization and ridge regression. Inspired by the historical discussion here, I thought it might be useful to add a short example demonstrating how the more general Tikhonov framework can be useful.
First, a brief note on context. Ridge regression arose in statistics, and while regularization is now widespread in statistics & machine learning, Tikhonov's approach was originally motivated by inverse problems arising in model-based data assimilation (particularly in geophysics). The simplified example below is in this category (more complex versions are used for paleoclimate reconstructions).
Imagine we want to reconstruct temperatures $u[x,t=0]$ in the past, based on present-day measurements $u[x,t=T]$. In our simplified model we will assume that temperature evolves according to the heat equation
$$ u_t = u_{xx} $$
in 1D with periodic boundary conditions
$$ u[x+L,t] = u[x,t] $$
A simple (explicit) finite difference approach leads to the discrete model
$$ \frac{\Delta\mathbf{u}}{\Delta{t}} = \frac{\mathbf{Lu}}{\Delta{x^2}} \implies \mathbf{u}_{t+1} = \mathbf{Au}_t $$
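To make the connection to the code at the end explicit: with the scaling $\Delta{t}/\Delta{x}^2 = 1/2$ absorbed into the operator, $\mathbf{L}$ is the circulant second-difference matrix
$$ \mathbf{L} = \frac{1}{2}\begin{pmatrix} -2 & 1 & & & 1 \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ 1 & & & 1 & -2 \end{pmatrix} $$
and a single time step is $\mathbf{u}_{t+1} = (\mathbf{I} + \mathbf{L})\mathbf{u}_t$; the code forms the $t$-step operator $\mathbf{A} = (\mathbf{I}+\mathbf{L})^{t}$ directly.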
Mathematically, the evolution matrix $\mathbf{A}$ is invertible, so we have
$$\mathbf{u}_t = \mathbf{A^{-1}u}_{t+1} $$
However, numerically, difficulties will arise if the time interval $T$ is too long.
Tikhonov regularization addresses this problem by instead solving the system
\begin{align} \mathbf{Au}_t &\approx \mathbf{u}_{t+1} \\
\omega\mathbf{Lu}_t &\approx \mathbf{0}
\end{align}
which adds a small penalty $\omega^2\ll{1}$ on roughness $u_{xx}$.
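Solving this stacked system in the least-squares sense is precisely Tikhonov regularization with $\Gamma = \omega\mathbf{L}$, so the closed form from the first answer applies:
$$ \hat{\mathbf{u}}_t = \arg\min_{\mathbf{u}} \left\{ \|\mathbf{Au}-\mathbf{u}_{t+1}\|^2 + \omega^2\|\mathbf{Lu}\|^2 \right\} = \left(\mathbf{A}^{T}\mathbf{A} + \omega^2\mathbf{L}^{T}\mathbf{L}\right)^{-1}\mathbf{A}^{T}\mathbf{u}_{t+1} $$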
Below is a comparison of the results (the plot is produced by the Matlab code at the end):
We can see that the original temperature $u_0$ has a smooth profile, which is smoothed still further by diffusion to give $u_\mathsf{fwd}$. Direct inversion fails to recover $u_0$, and the solution $u_\mathsf{inv}$ shows strong "checkerboarding" artifacts. However, the Tikhonov solution $u_\mathsf{reg}$ is able to recover $u_0$ with quite good accuracy.
Note that in this example, ridge regression would always push our solution towards an "ice age" (i.e. uniform zero temperatures). Tikhonov regularization allows us a more flexible physically-based prior constraint: here our penalty essentially says the reconstruction $\mathbf{u}$ should be only slowly evolving, i.e. $u_t\approx{0}$.
Matlab code for the example is below (can be run online here).
% Tikhonov Regularization Example: Inverse Heat Equation
n=15; t=2e1; w=1e-2; % grid size, # time steps, regularization
L=toeplitz(sparse([-2,1,zeros(1,n-3),1]/2)); % laplacian (periodic BCs)
A=(speye(n)+L)^t; % forward operator (diffusion)
x=(0:n-1)'; u0=sin(2*pi*x/n); % initial condition (periodic & smooth)
ufwd=A*u0; % forward model
uinv=A\ufwd; % inverse model
ureg=[A;w*L]\[ufwd;zeros(n,1)]; % regularized inverse
plot(x,u0,'k.-',x,ufwd,'k:',x,uinv,'r.:',x,ureg,'ro');
set(legend('u_0','u_{fwd}','u_{inv}','u_{reg}'),'box','off');