const + a*x
const +a*10*x
const +a*100*x
const +a*1000*x
and so on. I would expect the fitted value of a and its error to be the same up to the rescaling I applied by hand.
For some rescalings the fit can give negative diagonal covariance matrix elements(!), and the results may vary a lot between rescalings. At the moment I am forced to find by hand which units give a meaningful covariance ... (not very nice)
So I wanted to ask how to avoid this issue with curve_fit.
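To illustrate the expectation, here is a minimal one-parameter sketch with made-up data (purely illustrative, not the data of this issue): rescaling x by s should shrink the fitted slope and its error by the same factor s.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

def model(x, const, a):
    return const + a * x

for s in (1.0, 10.0, 100.0):
    popt, pcov = curve_fit(model, s * x, y)
    # a scales as 1/s and so does its standard error, so s*a and s*err(a)
    # should stay (numerically) constant across the rescalings.
    print(s, s * popt[1], s * np.sqrt(pcov[1, 1]))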
If you want to reproduce the issue try this data and this fit below
y_dy_x1_x2=np.array([[3.595, 8740.28, 6391.76, 13958.5, 20597.7, 3.6539, 3.73653, 3.68226, 3.68414],\
[0.113684, 276.392, 202.125, 441.406, 651.356, 0.115547, 0.118159, \
0.116443, 0.116503],\
[0., -15.6627, 15.6627, -15.6627, 15.6627, -0.00156627, 0.00156627, \
-0.00156627, 0.00156627],\
[0., 9.4183, -9.4183, -9.4183, 9.4183, 0.00094183, -0.00094183, \
-0.00094183, 0.00094183]]).T
with this model fit function
def _fit(x, const, linB, linW, quadB, quadW, mix):
    cB, cW = x[:, 0], x[:, 1]
    return const \
        + np.multiply(cB, linB) + np.multiply(cW, linW) \
        + np.multiply(quadW, cW**2) + np.multiply(cB**2, quadB) \
        + np.multiply(mix, cB*cW)
define a wide range of rescaled input values
scales = [ 1e-6, 1e-4, 1e-2, 1, 1e2, 1e4, 1e6]
_x = { scale: scale*y_dy_x1_x2[:,2:4] for scale in scales } # the variables times some constant
and use them to fit
_y = y_dy_x1_x2[:,0] # the response
_weights = y_dy_x1_x2[:,1] # the error on the response
curve_fit( _fit , _x[1e6] , _y, sigma=_weights, absolute_sigma=True)
then compare with
curve_fit( _fit , _x[1e2] , _y, sigma=_weights, absolute_sigma=True)
curve_fit( _fit , _x[1e-4] , _y, sigma=_weights, absolute_sigma=True)
The covariance should simply be rescaled according to the choice of scale, but that does not happen at all! Negative diagonal elements even appear!
I'm not a scipy maintainer, but this mostly looks good on my end (especially considering the small data set and the fact that pcov is an estimate here). What are you running into exactly?
Using your data and
for scale in [1e0, 1e1, 1e2]:
    _x = scale * y_dy_x1_x2[:, 2:4]
    _y = y_dy_x1_x2[:, 0]  # the response
    _weights = y_dy_x1_x2[:, 1]  # the error on the response

    np.set_printoptions(formatter={'float': lambda x: f"{x:0.3E}"})
    print(f"=== scale={scale} ===")
    print(curve_fit(_fit, _x, _y, sigma=_weights, absolute_sigma=True)[1])
    print()
I get
=== scale=1.0 ===
[[2.693E-03 8.679E-04 -1.192E-03 2.438E+04 -6.743E+04 2.689E-05]
[8.679E-04 1.571E+02 1.659E+02 -8.106E+04 2.242E+05 5.975E+00]
[-1.192E-03 1.659E+02 4.344E+02 -1.573E+05 4.349E+05 6.870E+00]
[2.438E+04 -8.108E+04 -1.573E+05 2.355E+15 -6.514E+15 -1.678E+04]
[-6.743E+04 2.242E+05 4.351E+05 -6.514E+15 1.801E+16 4.640E+04]
[2.689E-05 5.975E+00 6.870E+00 -1.678E+04 4.641E+04 2.068E+00]]
=== scale=10.0 ===
[[2.693E-03 8.691E-05 -1.191E-04 -3.080E+01 8.519E+01 2.711E-07]
[8.691E-05 1.571E+00 1.659E+00 -4.913E+01 1.359E+02 5.975E-03]
[-1.191E-04 1.659E+00 4.344E+00 -6.922E+01 1.915E+02 6.871E-03]
[-3.080E+01 -4.914E+01 -6.924E+01 2.875E+11 -7.951E+11 -3.356E-01]
[8.519E+01 1.359E+02 1.915E+02 -7.951E+11 2.199E+12 9.283E-01]
[2.711E-07 5.975E-03 6.871E-03 -3.351E-01 9.270E-01 2.068E-04]]
=== scale=100.0 ===
[[2.693E-03 8.746E-06 -1.182E-05 4.216E-03 -1.166E-02 2.643E-09]
[8.746E-06 1.572E-02 1.660E-02 -6.901E-04 1.915E-03 5.976E-06]
[-1.182E-05 1.660E-02 4.343E-02 -1.094E-03 3.043E-03 6.870E-06]
[4.216E-03 -7.017E-04 -1.093E-03 -8.774E+06 2.427E+07 -5.295E-07]
[-1.166E-02 1.947E-03 3.040E-03 2.427E+07 -6.711E+07 1.488E-06]
[2.643E-09 5.976E-06 6.870E-06 -5.234E-07 1.471E-06 2.068E-08]]
The covariance should be exactly the same up to factors of scale in some elements, because each coefficient is rescaled by scale^0, scale^1 or scale^2. This is not happening.
Furthermore, there are negative diagonal covariance matrix elements, e.g. your [3,3] and [4,4] for scale 100. How does this happen? And how come this is not flagged by a warning?
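To make the expected transformation concrete, here is a minimal sketch (my own helper, using _fit and y_dy_x1_x2 as defined above): under x -> scale*x the parameters (const, linB, linW, quadB, quadW, mix) pick up factors scale^0, scale^1, scale^1, scale^2, scale^2, scale^2, so the covariance at any scale, mapped back with those factors, should match the scale=1 result.
import numpy as np
from scipy.optimize import curve_fit

# Powers of `scale` picked up by (const, linB, linW, quadB, quadW, mix)
# when the inputs are rescaled as x -> scale * x.
powers = np.array([0, 1, 1, 2, 2, 2])

def rescale_cov(pcov, scale):
    # Map a covariance obtained at `scale` back to the scale = 1 convention.
    factors = float(scale) ** powers
    return pcov * np.outer(factors, factors)

_y = y_dy_x1_x2[:, 0]
_weights = y_dy_x1_x2[:, 1]
pcov_1 = curve_fit(_fit, y_dy_x1_x2[:, 2:4], _y,
                   sigma=_weights, absolute_sigma=True)[1]
pcov_100 = curve_fit(_fit, 100.0 * y_dy_x1_x2[:, 2:4], _y,
                     sigma=_weights, absolute_sigma=True)[1]

# If the scaling behaved as expected this difference would be close to zero;
# for this (near-singular) problem it is not.
print(rescale_cov(pcov_100, 100.0) - pcov_1)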
The model you're fitting is linear, so we don't really need curve_fit. Let's look at the design matrix.
In [1]: import numpy as np
In [2]: from scipy import optimize
In [3]: np.set_printoptions(linewidth=150)
In [4]: y_dy_x1_x2=np.array([[3.595, 8740.28, 6391.76, 13958.5, 20597.7, 3.6539, 3.73653, 3.68226, 3.68414],\
...: [0.113684, 276.392, 202.125, 441.406, 651.356, 0.115547, 0.118159, \
...: 0.116443, 0.116503],\
...: [0., -15.6627, 15.6627, -15.6627, 15.6627, -0.00156627, 0.00156627, \
...: -0.00156627, 0.00156627],\
...: [0., 9.4183, -9.4183, -9.4183, 9.4183, 0.00094183, -0.00094183, \
...: -0.00094183, 0.00094183]]).T
In [5]: _x = y_dy_x1_x2[:,2:4]
In [6]: X = np.zeros((9,6))
In [7]: X[:,0] = 1
In [8]: X[:,1] = _x[:,0]
In [9]: X[:,2] = _x[:,1]
In [10]: X[:,3] = _x[:,0]**2
In [11]: X[:,4] = _x[:,1]**2
In [12]: X[:,5] = _x[:,0] * _x[:,1]
In [13]: X
Out[13]:
array([[ 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 1.00000000e+00, -1.56627000e+01, 9.41830000e+00, 2.45320171e+02, 8.87043749e+01, -1.47516007e+02],
[ 1.00000000e+00, 1.56627000e+01, -9.41830000e+00, 2.45320171e+02, 8.87043749e+01, -1.47516007e+02],
[ 1.00000000e+00, -1.56627000e+01, -9.41830000e+00, 2.45320171e+02, 8.87043749e+01, 1.47516007e+02],
[ 1.00000000e+00, 1.56627000e+01, 9.41830000e+00, 2.45320171e+02, 8.87043749e+01, 1.47516007e+02],
[ 1.00000000e+00, -1.56627000e-03, 9.41830000e-04, 2.45320171e-06, 8.87043749e-07, -1.47516007e-06],
[ 1.00000000e+00, 1.56627000e-03, -9.41830000e-04, 2.45320171e-06, 8.87043749e-07, -1.47516007e-06],
[ 1.00000000e+00, -1.56627000e-03, -9.41830000e-04, 2.45320171e-06, 8.87043749e-07, 1.47516007e-06],
[ 1.00000000e+00, 1.56627000e-03, 9.41830000e-04, 2.45320171e-06, 8.87043749e-07, 1.47516007e-06]])
In [14]: u,s,v = np.linalg.svd(X, full_matrices=False)
In [15]: s
Out[15]: array([5.21733467e+02, 2.95032015e+02, 3.13254002e+01, 1.88366001e+01, 2.23605153e+00, 1.41986916e-15])
From looking at the singular values, it's clear that your X matrix is singular or nearly singular. This is why you are seeing this behavior. It would be an issue when computing the covariance matrix even if you did it without curve_fit.
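For reference, a quick numerical check of that (a sketch using the X built above; the 1e12 threshold is just a rule of thumb, not anything scipy uses):
import numpy as np

# Condition number and numerical rank of the design matrix built above.
cond = np.linalg.cond(X)
rank = np.linalg.matrix_rank(X)
print(f"cond(X) = {cond:.3e}, rank(X) = {rank} of {X.shape[1]} columns")

# A condition number approaching 1/eps (~1e16 in double precision) means the
# least-squares problem, and hence the covariance estimate, is effectively ill-posed.
if cond > 1e12:  # rule-of-thumb threshold
    print("X is numerically rank deficient; the covariance is not trustworthy.")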
Let's run your scaling test on curve_fit with a full rank linear fit.
In [1]: import numpy as np
In [2]: from scipy import optimize as opt
In [3]: true_p = np.array([3.0, -4.0, 2.0, -6.])
In [4]: x = np.linspace(0, 1, 9)
In [5]: fit = lambda x, a, b, c, d: a*x**3 + b*x**2 + c*x + d
In [6]: y = fit(x, *true_p) + np.random.randn(9)/4
In [7]: a1 = opt.curve_fit(fit, x, y)
In [8]: a10 = opt.curve_fit(fit, 10*x, y)
In [9]: a100 = opt.curve_fit(fit, 100*x, y)
In [10]: a1000 = opt.curve_fit(fit, 1000*x, y)
In [11]: np.set_printoptions(linewidth=150)
In [12]: a1
Out[12]:
(array([ 5.1934766 , -7.62608245, 3.25786124, -5.80878906]),
array([[ 2.52589366, -3.78884053, 1.42870866, -0.08288089],
[-3.78884053, 5.86593714, -2.32573929, 0.15096163],
[ 1.42870866, -2.32573929, 1.0054418 , -0.08084588],
[-0.08288089, 0.15096163, -0.08084588, 0.01179388]]))
In [13]: a10
Out[13]:
(array([ 5.19347673e-03, -7.62608263e-02, 3.25786128e-01, -5.80878906e+00]),
array([[ 2.52589341e-06, -3.78884010e-05, 1.42870845e-04, -8.28808860e-05],
[-3.78884010e-05, 5.86593636e-04, -2.32573892e-03, 1.50961617e-03],
[ 1.42870845e-04, -2.32573892e-03, 1.00544163e-02, -8.08458684e-03],
[-8.28808860e-05, 1.50961617e-03, -8.08458684e-03, 1.17938775e-02]]))
In [14]: a100
Out[14]:
(array([ 5.19347719e-06, -7.62608335e-04, 3.25786158e-02, -5.80878908e+00]),
array([[ 2.52589439e-12, -3.78884155e-10, 1.42870899e-08, -8.28809112e-08],
[-3.78884155e-10, 5.86593851e-08, -2.32573972e-06, 1.50961653e-05],
[ 1.42870899e-08, -2.32573972e-06, 1.00544192e-04, -8.08458810e-04],
[-8.28809112e-08, 1.50961653e-05, -8.08458810e-04, 1.17938779e-02]]))
In [15]: a1000
Out[15]:
(array([ 5.19347670e-09, -7.62608261e-06, 3.25786130e-03, -5.80878906e+00]),
array([[ 2.52589400e-18, -3.78884100e-15, 1.42870881e-12, -8.28808920e-11],
[-3.78884100e-15, 5.86593776e-12, -2.32573947e-09, 1.50961625e-07],
[ 1.42870881e-12, -2.32573947e-09, 1.00544185e-06, -8.08458714e-05],
[-8.28808920e-11, 1.50961625e-07, -8.08458714e-05, 1.17938768e-02]]))
So as long as we are providing a non-singular fit, curve_fit produces the correct scaling behavior. This isn't really a deficiency in curve_fit.
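The scaling relationship can be checked explicitly; a small sketch using a1 and a10 from the session above (it assumes the parameter order (a, b, c, d), so the powers of 10 are (3, 2, 1, 0)):
import numpy as np

# Factors picked up by (a, b, c, d) when x -> 10 * x.
factors = 10.0 ** np.array([3, 2, 1, 0])

# Map the scale-10 result back to the scale-1 convention and compare.
popt_back = a10[0] * factors
pcov_back = a10[1] * np.outer(factors, factors)
print(np.allclose(popt_back, a1[0], rtol=1e-3))  # parameters agree
print(np.allclose(pcov_back, a1[1], rtol=1e-3))  # covariance agrees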
Thanks for looking at the issue. OK, agreed: for the data you provided and the function you made up for it, curve_fit is doing fine, but that was not the point. I am flagging a case in which it does not work and does not raise a warning. I know that curve_fit works in most cases; I am not questioning that. I am asking how to handle the case in which it does not work (maybe for good reasons) and does not raise a warning (if there is a good reason for that).
This warning does exist when the Jacobian does not have full rank. With your data and removing the last parameter of the fit:
def _fit(x, const, linB, linW, quadB, quadW):
    cB, cW = x[:, 0], x[:, 1]
    return const \
        + np.multiply(cB, linB) + np.multiply(cW, linW) \
        + np.multiply(quadW, cW ** 2) + np.multiply(cB ** 2, quadB)

scale = 100
_x = scale * y_dy_x1_x2[:, 2:4]
_y = y_dy_x1_x2[:, 0]  # the response
_weights = y_dy_x1_x2[:, 1]  # the error on the response

popt, pcov = curve_fit(_fit, _x, _y, sigma=_weights, absolute_sigma=True)
print(pcov)
I get
/scipy/optimize/minpack.py:829: OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
[[inf inf inf inf inf]
[inf inf inf inf inf]
[inf inf inf inf inf]
[inf inf inf inf inf]
[inf inf inf inf inf]]
Perhaps this check fails due to numerical imprecision in your example? In any case, I think your data is very low rank. I noticed that with scale = 100, your _x matrix is
[[ 0.00000e+00 0.00000e+00]
[-1.56627e+03 9.41830e+02]
[ 1.56627e+03 -9.41830e+02]
[-1.56627e+03 -9.41830e+02]
[ 1.56627e+03 9.41830e+02]
[-1.56627e-01 9.41830e-02]
[ 1.56627e-01 -9.41830e-02]
[-1.56627e-01 -9.41830e-02]
[ 1.56627e-01 9.41830e-02]]
so I feel like fitting your 6-parameter model here is pretty underdetermined. Do you run into this issue on larger datasets?
Maybe I should have posted my set of covariances for each scale value, but I thought it was too much information. Yes, I know it is possible to get inf. I am posting them below.
My point is that there is a clear failure of the procedure here and no warning is given. Still, for scale=10000.0 the covariance would give a rather good-looking best-fit estimate. Please also note that the relative residuals are on average around the same order as the responses I have given, so the minimization to find the best fit is satisfactory in my opinion; it is only in the covariance estimate that something goes wrong.
I am wondering if this kind of sickness can be detected and reported as a warning. I am willing to work on a pull request if we agree that the user needs to be warned and agree on what the test should be.
scale 1e-06
cov_{i,i} [inf inf inf inf inf inf]
refined pars ['3.6706083156506546(\\infty)', '2.5974685415472134(\\infty) \\times 10^{7}', '1.0(\\infty)', '1.0(\\infty)', '1.0(\\infty)', '1.0(\\infty)']
*******************
scale 0.0001
cov_{i,i} [inf inf inf inf inf inf]
refined pars ['3.6677561463762256(\\infty)', '-4.212643747226656(\\infty) \\times 10^{5}', '0.9545652064222612(\\infty) \\times 10^{6}', '3.7029844608402707(\\infty) \\times 10^{9}', '1.0(\\infty)', '1.0(\\infty)']
*******************
scale 0.01
cov_{i,i} [2.69334022e-03 1.57065432e+06 4.34379045e+06 2.71349153e+23
2.07541477e+24 2.06824248e+08]
refined pars ['3.67(5)', '5(1) \\times 10^{3}', '2.0(2) \\times 10^{4}', '0(5) \\times 10^{11}', '0(1) \\times 10^{12}', '3.2(1) \\times 10^{5}']
*******************
scale 1
cov_{i,i} [ 2.68191112e-03 1.57065445e+02 4.34378298e+02 -1.03995736e+17
-7.95411679e+17 2.06823768e+00]
refined pars ['3.67(5)', '5(1) \\times 10^{1}', '2.0(2) \\times 10^{2}', '8.58914118332196(\\mathrm{nan}) \\times 10^{2}', '-2.236805008351183(\\mathrm{nan}) \\times 10^{3}', '32(1)']
*******************
scale 100.0
cov_{i,i} [ 2.69305596e-03 1.57160514e-02 4.34321729e-02 -2.03248403e+07
-1.55454604e+08 2.06822708e-08]
refined pars ['3.67(5)', '0.5(1)', '2.0(2)', '2.5119565473935195(\\mathrm{nan}) \\times 10^{1}', '-6.945662666939283(\\mathrm{nan}) \\times 10^{1}', '0.0032(1)']
*******************
scale 10000.0
cov_{i,i} [2.69328915e-03 4.94204268e-08 3.83639539e-05 2.77955452e-01
2.12594317e+00 2.40954569e-16]
refined pars ['3.67(5)', '0.0013(2)', '-0.002(6)', '-7.2(5)', '20(1)', '3.0(2) \\times 10^{-7}']
*******************
scale 1000000.0
cov_{i,i} [2.69510865e-03 1.38808672e-09 2.57613279e-09 3.08928669e-09
2.36284188e-08 1.83976029e-24]
refined pars ['3.67(5)', '1(4) \\times 10^{-5}', '-2(5) \\times 10^{-5}', '-0.26619(6)', '0.7362(2)', '3.0(1) \\times 10^{-11}']
*******************
Extending the number of points helps improve the fit result. Still, I am quite dissatisfied with how curve_fit does not catch the problem and still returns a seemingly sensible result. Does no one else see the need for a fix?
As I said, I am willing to work on a pull request if we agree that the user needs to be warned and agree on what the test should be.
@ilayn gh-17247 fixed the diagonal covariance element part of this issue.
For detecting a bad covariance matrix, I think the condition number is a good check, but that can be left to the user. Perhaps the ultimate enhancement would be some sort of autoscaling, but that doesn't sound like an easy problem. Should we close this with some advice about checking the condition number? e.g. check np.linalg.cond(covariance); the larger the number, the less it should be trusted? (If that statement is accurate enough, I could add it to the notes, if you think that's appropriate.)
Apologies Matt, I'm really running behind due to work. The covariance check is indeed quite common lately, isn't it? The positive-definiteness guarantee allows for the usage of pocon, which is cheaper than any other conditioning check.
But as a generic documentation input indeed this would be a nice addition.
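For what it's worth, a rough sketch of what such a user-side check could look like, written with plain NumPy for clarity (the pocon route via scipy.linalg.lapack mentioned above would be the cheaper option given a Cholesky factor; the 1e12 threshold is only a rule of thumb):
import numpy as np

def covariance_health(pcov, cond_limit=1e12):
    """Crude, illustrative sanity check on a covariance estimate."""
    # Negative diagonal elements can never occur for a true covariance matrix.
    if np.any(np.diag(pcov) < 0):
        return "invalid: negative variance on the diagonal"
    # A failed Cholesky factorization means pcov is not positive definite.
    try:
        np.linalg.cholesky(pcov)
    except np.linalg.LinAlgError:
        return "not positive definite"
    # A huge condition number means some parameter combinations are essentially unconstrained.
    if np.linalg.cond(pcov) > cond_limit:
        return "ill-conditioned: treat the parameter errors with suspicion"
    return "looks reasonable"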
When I went to actually implement it, I saw that my suggestion is not quite right. I'd have to study this a little more. I agree that there is no bug, but the documentation of curve_fit could add suggestions for the user to check to avoid this sort of thing. Oops... never mind. It is fine; I'll submit a PR.
#12715 (comment) notes that "fitting your 6-parameter model here is pretty underdetermined" given the data. More specifically, the squared terms of the model are linearly dependent, that is:
x1_squared = y_dy_x1_x2[:, -2]**2
x2_squared = y_dy_x1_x2[:, -1]**2
np.allclose(0.36158614 * x1_squared , x2_squared )
Consequently, one of the quadB and quadW terms is redundant, and this leads to ambiguity about the best value of each of these parameters. This is reflected in the unpredictability of the elements in rows/columns 3 and 4 of the covariance matrix. (See #12715 (comment) above; note that only diagonal elements 3 and 4 change by something other than orders of magnitude.) When one of these terms is removed from the model, the problem disappears.
(The fact that the model cannot adequately predict the output may be another problem, but that is also not something curve_fit can draw conclusions about. It returns the best fit of the curve to the data; only the user can make judgments about whether that is good enough.)
The algorithm in curve_fit is not designed to detect these things. Like many algorithms, it chugs along assuming that the input was valid. After all, the data is not exactly degenerate:
np.equal(y_dy_x1_x2[:, -1]**2, 0.36158614 * y_dy_x1_x2[:, -2]**2).all() # False
It is difficult for it to determine that this was not intentional.
Some algorithms include tolerances to check for this sort of thing. curve_fit's algorithms do not; it only reports a problem when (in finite precision arithmetic) the Jacobian is exactly singular. Presumably, checking for near-singularity of the Jacobian matrix in every iteration was deemed too computationally expensive. If/when we add callback functionality to least-squares, this sort of check could be added by the user.
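A user who wants such a check today can do it after the fit rather than per iteration; here is a rough sketch using scipy.optimize.least_squares on the weighted residuals of the original six-parameter _fit and data from the top of this issue (the residual wrapper and the singular-value ratio are my own, not anything curve_fit exposes):
import numpy as np
from scipy.optimize import least_squares

def residuals(p, x, y, sigma):
    # Weighted residuals of the six-parameter model defined at the top of the issue.
    return (_fit(x, *p) - y) / sigma

x0 = np.ones(6)  # the same default starting point curve_fit uses
res = least_squares(residuals, x0,
                    args=(y_dy_x1_x2[:, 2:4], y_dy_x1_x2[:, 0], y_dy_x1_x2[:, 1]))

# Ratio of the largest to smallest singular value of the Jacobian at the solution;
# many orders of magnitude here signals a (near-)singular Jacobian.
sv = np.linalg.svd(res.jac, compute_uv=False)
print(sv.max() / sv.min())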
In the meantime, the user can detect this sort of problem by computing the condition number of the covariance matrix. Large values (many orders of magnitude greater than 1) suggest that it is worth checking the input more carefully. The diagonals of pcov may also give a hint at what's going on. For example, using _x[1e2], I see that the diagonal elements are:
array([1.28672136e-02, 1.59986104e-02, 4.41199439e-02, 4.40652339e+16, 3.37033077e+17, 2.08437258e-08])
The ones corresponding with parameters 3 and 4 are much larger than the others, hinting that the problem lies with those parameters.
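Concretely, assuming pcov comes from one of the curve_fit calls above (e.g. with _x[1e2]), something along these lines will point at the offending parameters (the factor 1e6 is an arbitrary heuristic):
import numpy as np

variances = np.diag(pcov)
print("cond(pcov) =", np.linalg.cond(pcov))

# Flag parameters whose variance is wildly out of line with the rest;
# here that singles out parameters 3 and 4 (quadB and quadW).
suspicious = np.flatnonzero(variances > 1e6 * np.median(variances))
print("suspicious parameter indices:", suspicious)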