```python
def sign(x):
    return (x > 0) - (x < 0)
```

Returns 1 for positive floats, -1 for negative floats, and 0 for 0. nan gets returned as 0, I think, so that would need to be special-cased to a ValueError, and complex input does raise a TypeError, but not the right one, so that would need to be cleaned up.
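The edge cases above are easy to check directly. This is a sketch exercising the one-liner from this post (not a stdlib function), showing the nan and complex behavior just described:

```python
def sign(x):
    return (x > 0) - (x < 0)

print(sign(2.5), sign(-2.5))   # 1 -1
print(sign(0.0), sign(-0.0))   # 0 0
print(sign(float("nan")))      # 0, because NaN compares False both ways

try:
    sign(1 + 2j)               # complex does not support ordering
except TypeError as exc:
    print(exc)
```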
Use cases:
I’m a bit flummoxed at the idea that there aren’t sufficient use cases to include sign in the standard library. I think the most convincing use case (which Oscar brought up earlier) is that we sometimes model mathematical functions in Python. In this case we might make use of functions like math.cos or math.exp. Regularly, the mathematical functions we’re modeling might include the sign/sgn function.
Mark Dickinson:
in almost all the use-cases I’ve encountered, the decision has been binary (e.g., negative vs. non-negative, or sign bit set versus sign bit clear, or more rarely positive vs. non-positive) rather than ternary.
I agree that there are probably more use cases for a binary comparison than ternary, and yes such a comparison can be built pretty easily using lesser/greater than comparisons. Nonetheless, for someone with a heavy math background they will think “I want the sign function here”. They won’t first think “I need to write a function that does comparison to zero”.
Justin Gerber:
How do you boil that discussion down to what you would like to see for a sign() function in the standard library? Is the upshot that you are advocating that it should be defined on complex input because it would conform to the array API better?
Firstly the broader ecosystem needs to be considered rather than just focusing on the stdlib in isolation. If the model is that different modules provide functions with the same names and consistent mathematical definitions but different types then that model should be followed consistently. It is too late for the stdlib to try to define what the expected domain, behaviour or name of a sign function should be because the rest of the Python ecosystem has already established these things and now the stdlib should be consistent with that.
Secondly, the idea that the math module is only for 64-bit real floating point functions is not really true any more. If someone wants to argue that some function does not belong in the math module unless it is only for 64-bit floats and/or is one of the functions from the C standard, then they are overlooking the other functions that already break this convention. Either there needs to be something separate from the math module where those functions should go, or we should just accept that the math module can have functions that do not meet those criteria. The inclusion of gcd, prod etc. already implies the latter choice, so the statement at the top of the math module docs that it “provides access to the mathematical functions defined by the C standard” is already out of date.
Thirdly I think that using polymorphic functions would be a much better design in general rather than having separate modules like math, cmath etc but that can be a topic for another thread.
Justin Gerber:
I would say this could be the distinction between sign and copysign. sign implements the mathematical signum operation while copysign provides access to the sign bit of the underlying float.
I think this is the crux of the proposal – nicely stated. I was on the fence, but when you put it this way, I’m +1 
Paul Moore:
it should be in a way that’s useful to people who do work with the technical details of floating point (for whom “edge cases” like signed zeroes, NaN, etc. are familiar and well-understood) because that’s the only situation where there’s any sort of complexity that makes writing your own implementation non-trivial.
doesn’t copysign satisfy the “people who do work with the technical details of floating point”?
I think the use case here is for the folks that aren’t that familiar with FP – and as the OP posited above, less experienced people are likely to write a quickie function that seems to work, but will break when handed an edge case (e.g. NaN) – and breaking is far worse than a reasonable default behavior.
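For concreteness, here is what copysign actually exposes: the raw sign bit, including for signed zeroes and NaN. This is a small illustrative sketch, not part of any proposal in the thread:

```python
import math

# copysign(x, y) returns x with the sign bit of y.
print(math.copysign(3.0, -1.5))   # -3.0
print(math.copysign(1.0, -0.0))   # -1.0: -0.0 carries a set sign bit

# A NaN's sign bit is also copied, even though NaN has no mathematical
# sign (which bit a given NaN carries can vary by platform):
print(math.copysign(1.0, float("nan")))
```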
Justin Gerber:
but it seems like the math module has specifically evolved as a tool (or out of a tool) to help users who do specifically care about the low-level finer points of floating point arithmetic.
I don’t think that’s the case – it started as a wrapper around libmath, and it has evolved to grow other useful things, mostly for floats. math.isclose() is an example of a function quite specifically for use by folks that are not FP experts.
Oscar Benjamin:
It would be better if all functions in the math module were polymorphic
Practicality beats purity – the way Python does polymorphism like that is through the dunder protocols, e.g. __abs__, __round__, etc. In fact Mark suggested maybe a __sign__ dunder. But we really can’t add a dunder for every math function!
The other option is for the polymath module functions to know about all the data types they might need to work with – that might be doable if you limit it to the stdlib ones: float, int, Decimal, Fraction, complex, but a bit painful to write (and maintain?).
Which leads to the array interface-like suggestion – seems fine to me; we kinda have that with math and cmath already.
And numpy was originally designed (when it was Numeric) to be a mostly drop-in replacement for math :-). Interesting that you brought up sqrt – isn’t np.sqrt the same as math.sqrt for scalars?
Honest question: what is this reasonable default behavior, and how can we know it for the most likely situations? Are people generally going to assume that sign only ever returns (-1, 1) (i.e. did they forget about 0)? Are they going to expect a ValueError? Are they going to expect nan?
If they don’t expect those, there is no reasonable default behavior, and it doesn’t matter whether our implementation breaks their code or their own implementation breaks.
IMO, we can’t provide something that covers the edge cases well enough, so we shouldn’t try; instead we should just provide a recipe in the math module that points out these edge cases.
If we really want to add a function, we should copy np.sign’s handling. Not because it’s the best or the most logical, but because it’s the most common. (I personally dislike that it can return nan – I would expect a ValueError – but that goes against numpy’s design in general.)
Chris Barker:
Practicality beats purity – the way Python does polymorphism like that is through the dunder protocols, e.g. __abs__, __round__, etc. In fact Mark suggested maybe a __sign__ dunder. But we really can’t add a dunder for every math function!
The other option is for the polymath module functions to know about all the data types they might need to work with – that might be doable if you limit it to the stdlib ones: float, int, Decimal, Fraction, complex, but a bit painful to write (and maintain?).
I agree with you that you can’t add a dunder for every math function.
I, personally, think the most versatile approach would be to register the function with singledispatch, and then register it for all the standard library types. Then users can still register their own types if they want.
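The singledispatch approach could be sketched roughly like this. The exact semantics chosen here (NaN propagation for floats, int results for exact types) are my own illustrative assumptions, not anything agreed in the thread:

```python
import math
from decimal import Decimal
from fractions import Fraction
from functools import singledispatch

@singledispatch
def sign(x):
    # Fallback: reject any type nobody has registered.
    raise TypeError(f"sign() not supported for {type(x).__name__}")

@sign.register
def _(x: float) -> float:
    if math.isnan(x):
        return math.nan
    return 0.0 if x == 0 else math.copysign(1.0, x)

@sign.register
def _(x: int) -> int:
    return (x > 0) - (x < 0)

@sign.register
def _(x: Fraction) -> int:
    return (x > 0) - (x < 0)

@sign.register
def _(x: Decimal) -> Decimal:
    if x.is_nan():
        return Decimal("nan")
    return Decimal(0) if x == 0 else Decimal(1).copy_sign(x)

# Third-party numeric types could then add themselves via sign.register.
```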
Chris Barker:
doesn’t copysign satisfy the “people who do work with the technical details of floating point”?
Yes. My point was that it’s also sufficient for anyone wanting a sign function who needs something more than the sort of one-liner they could write themselves. But see below.
Chris Barker:
I think the use case here is for the folks that aren’t that familiar with FP – and as the OP posited above, less experienced people are likely to write a quickie function that seems to work, but will break when handed an edge case (e.g. NaN) – and breaking is far worse than a reasonable default behavior.
That’s fair. But the sort of developers wanting a sign function are also unlikely to encounter special floating point values like NaN and signed zeroes. So the risk of breaking is, IMO, extremely low. The one exception is that data science packages like Pandas use NaN (rightly or wrongly) for missing values. In that context, sign(nan) needs to equal nan to preserve the “missing value” meaning.
Which leaves me with the view:
I no longer object to sign being added to the math library.
It should have the semantics sign(x) = math.nan if math.isnan(x) else 0.0 if x == 0 else -1.0 if x < 0 else 1.0 (note the x == 0 check should pick up signed zeroes).
It should return a float, to allow for the NaN return value.
It should definitely not handle complex numbers, or anything not convertible to float. If you want a sign function for complex numbers, it should go in cmath. This function is not the place to start a fight to merge math and cmath. That debate should be a separate question.
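The semantics proposed above fit in a few lines. A minimal sketch, assuming the float-returning, NaN-propagating behavior Paul describes (the function name and docstring are mine):

```python
import math

def sign(x: float) -> float:
    """Signum with the proposed semantics: NaN propagates, both signed
    zeroes map to 0.0, and the result is always a float."""
    x = float(x)        # rejects complex (TypeError); accepts int, Fraction, ...
    if math.isnan(x):
        return math.nan
    if x == 0:          # -0.0 == 0, so this catches both signed zeroes
        return 0.0
    return -1.0 if x < 0 else 1.0
```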
Cornelius Krupp:
If we really want to add a function, we should copy np.sign’s handling. Not because it’s the best or the most logical, but because it’s the most common.
Paul Moore:
It should have the semantics sign(x) = math.nan if math.isnan(x) else 0.0 if x == 0 else -1.0 if x < 0 else 1.0 (note the x == 0 check should pick up signed zeroes).
It should return a float, to allow for the NaN return value.
[I am a major numpy user, so like this in principle, but …]
numpy has two things that are distinct from pure Python:
1. vectorized computations
2. single-data-type arrays
Because of (1), “normal” operations should not raise due to the value of the inputs: if one item in an array is NaN, then sign(an_array) shouldn’t raise. So numpy propagates NaNs and other special values instead of raising.
Because of (2), sign() can’t return None or any other type that isn’t a float.
But does the stdlib have to follow those rules? I don’t think so – it already doesn’t with, e.g., divide by zero: Python raises, while numpy (by default) returns inf and issues a warning.
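The divide-by-zero divergence is easy to demonstrate in pure Python (no numpy needed for this side of the comparison):

```python
import math

# Pure Python already diverges from IEEE-754 defaults here: float
# division by zero raises instead of returning inf.
try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    print("Python raises:", exc)

# inf itself is perfectly representable; Python just won't create it
# from float division the way numpy (by default) does.
print(math.inf > 1e308)
```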
So I think it’s fine for math.sign() to return ints, or None, or raise, if that’s the better API for Python.
But there’s no need for gratuitous differences – it should have the same behavior around -0 and 0, for instance.
Paul Moore:
The one exception is that data science packages like Pandas use NaN (rightly or wrongly) for missing values. In that context, sign(nan) needs to equal nan to preserve the “missing value” meaning.
But pandas (and other data science packages) are built on numpy and, soon, on the “array API” brought up in this thread. The stdlib can (and should) do what best fits with pure Python, and not worry about what pandas et al. do.
Final process note: I haven’t looked in the archives, but I’ll take the OP’s word for it that this was blocked in the past due to a lack of consensus around the edge cases.
That means that consensus wasn’t reached, not that it couldn’t be reached. Consensus is hard – but it can be done. I’ve seen a number of PEPs get derailed by that challenge, and others (all the successful ones) meet the challenge – and it’s never easy.
In short – this is still worth discussion, and I think it can be successful if someone has the persistence and consensus building skills to do it.
Chris Barker:
Of course, any decision can be revisited, but this particular one is well suited to a C implementation anyway.
I don’t see a need to revisit that decision. C seems fine (I just don’t know much about C and the C implementation of Python so it’s more opaque to me personally).
Cornelius Krupp:
Honest question: What is this reasonable default behavior, and how can we know that for the most likely situations?
Here are the possible handlings of the edge cases that have been discussed and I can imagine. (*) indicates that I find that handling “reasonable”.
0 always returns the same thing as +0.0 (*) (All items below assume this one, but one could alternatively imagine e.g. sign(0) == 0 while sign(-0.0) == float("+0.0") or float("-0.0").)
±0.0 returns 0 (*)
±0.0 returns 1 (Not consistent with mathematical signum function)
±0.0 returns ±1 (Not consistent with mathematical signum function)
±0.0 returns 0.0 (*)
±0.0 returns ±0.0 (*)
±0.0 returns 1.0 (Not consistent with mathematical signum function)
±0.0 returns ±1.0 (Not consistent with mathematical signum function)
±0.0 returns None (This makes the output type of sign varied, and sign(0.0) is otherwise conventionally defined)
±nan raises an exception (*)
±nan returns 0 (not being a number, nan shouldn’t have a sign)
±nan returns 1 (not being a number, nan shouldn’t have a sign)
±nan returns ±1 (not being a number, nan shouldn’t have a sign)
±nan returns 0.0 (not being a number, nan shouldn’t have a sign)
±nan returns ±0.0 (not being a number, nan shouldn’t have a sign)
±nan returns 1.0 (not being a number, nan shouldn’t have a sign)
±nan returns ±1.0 (not being a number, nan shouldn’t have a sign)
±nan returns nan (*)
±nan returns None (this makes the output type of sign varied)
So that leaves
0 always returns the same thing as +0.0 (*)
±0.0 returns 0 (*)
±0.0 returns 0.0 (*)
±0.0 returns ±0.0 (*)
±nan raises an exception (*)
±nan returns nan (*)
as all being definitely reasonable. There is a question about which of these is optimal, that is, which will be most useful to the most users over the next, say, 10 years. That question is almost certainly impossible to answer, but as long as we go with any of these “reasonable” choices, I think that would be acceptable. We shouldn’t hold ourselves to the standard of the “optimal” choice, otherwise no choice will ever be made. This is the sort of thing where we make a choice and explain why we made it in, e.g., the rejected ideas section of a PEP or a changelog note.
FWIW, at this point in the discussion, I’m shifting towards designing the sign function as a “pure Python” thing, as if numpy etc. didn’t exist. I say this because this function will likely be used outside of the context of numpy, pandas etc., since in those cases users could just use numpy.sign(). To that end, I think simple is best and I’m on board with Mark’s suggestion: sign() accepts at least input castable to float, always returns an int (no ± information on a returned 0), and raises an exception for nan input.
copysign is the float-oriented sign function which remembers ALL the sign bit information about a float, even when you may not want it to (±0.0 or ±nan), while sign would be the float-naive sign function which will never surprisingly reveal to users that there is a technical difference between ±0.0 and ±nan.
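That float-naive behavior could be sketched as follows (a sketch of the semantics just described: float-castable input, plain-int result, an exception for nan; the name and error message are mine):

```python
import math

def sign(x) -> int:
    """Float-naive signum: 0.0 and -0.0 both map to a signless 0,
    complex input fails the float() cast, and nan is rejected."""
    x = float(x)                  # complex / non-castable input raises TypeError
    if math.isnan(x):
        raise ValueError("sign() is undefined for nan")
    return (x > 0) - (x < 0)

# Contrast with copysign, which reports the raw sign bit:
# math.copysign(1.0, -0.0) == -1.0, whereas sign(-0.0) == 0.
```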
Paul Moore:
It should definitely not handle complex numbers, or anything not convertible to float. If you want a sign function for complex numbers, it should go in cmath. This function is not the place to start a fight to merge math and cmath. That debate should be a separate question.