I saw a thread on this topic from a few months ago, and I was
wondering if there had been any progress or new thought on the topic.
Essentially, it appears that the serialization library can't reliably
deserialize the serialized version of NaN and infinity for doubles and
floats. This seems to be a result of relying on the, AFAIK, undefined
behavior of writing NaN/infinity to a stream; it will work correctly
with some standard library implementations, but not all.
I think the problem can be addressed with these changes:
basic_text_oprimitive::save(float or double): on NaN or Infinity,
write out some known, stable string (e.g. "nan" or "inf"); don't rely
on the standard library implementation.
basic_text_iprimitive::load(float or double): look for the "known
values" printed by save(), generating the correct values when they're
seen.
That's a lot of hand waving, to be sure, but something like this would
really help out. Of course, a better solution would be fine with me,
but there are definitely cases where it's necessary to serialize these
values. Currently, there doesn't seem to be a way to reliably do it.
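The save/load idea above could be sketched roughly as follows, outside the library. This is only an illustration with plain standard streams: save_double, load_double and round_trip_demo are hypothetical names, not part of basic_text_oprimitive or basic_text_iprimitive, and the tokens "nan", "inf" and "-inf" are just one possible choice. The sign and payload of a NaN are not preserved.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: write a double as text, using fixed tokens for the
// special values instead of whatever the standard library would print.
// Note: this leaves the stream's precision changed.
inline void save_double(std::ostream& os, double d) {
    if (std::isnan(d)) {
        os << "nan";
    } else if (std::isinf(d)) {
        os << (d < 0 ? "-inf" : "inf");
    } else {
        // max_digits10 digits are enough to round-trip a finite double.
        os << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    }
}

// Hypothetical helper: read one whitespace-delimited token and map the
// fixed tokens back to the special values; anything else is parsed as a number.
inline bool load_double(std::istream& is, double& d) {
    std::string token;
    if (!(is >> token)) return false;
    if (token == "nan")  { d = std::numeric_limits<double>::quiet_NaN(); return true; }
    if (token == "inf")  { d = std::numeric_limits<double>::infinity();  return true; }
    if (token == "-inf") { d = -std::numeric_limits<double>::infinity(); return true; }
    std::istringstream num(token);
    return static_cast<bool>(num >> d);
}

// Usage sketch: write three values, then read them back.
inline void round_trip_demo() {
    std::stringstream ss;
    save_double(ss, std::numeric_limits<double>::quiet_NaN());
    ss << ' ';
    save_double(ss, -std::numeric_limits<double>::infinity());
    ss << ' ';
    save_double(ss, 1.5);
    double a = 0, b = 0, c = 0;
    assert(load_double(ss, a) && std::isnan(a));
    assert(load_double(ss, b) && std::isinf(b) && b < 0);
    assert(load_double(ss, c) && c == 1.5);
}
```

Reading a whole token first keeps the loader independent of how a given standard library parses "nan" or "inf" as a number.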
Austin Bingham
The simple truth is I never considered this.
When it came up the last time I didn't really think about it very much, as I
was involved in other things and I hoped interested parties might come to a
consensus without my having to bend my over-stretched brain.
Post by Austin Bingham
I saw a thread on this topic from a few months ago, and I was
wondering if there had been any progress or new thought on the topic.
Essentially, it appears that the serialization library can't reliably
deserialize the serialized version of NaN and infinity for doubles and
floats. This seems to be a result of relying on the, AFAIK, undefined
behavior of writing NaN/infinity to a stream; it will work correctly
with some standard library implementations, but not all.
basic_text_oprimitive::save(float or double): on NaN or Infinity,
write out some known, stable string (e.g. "nan" or "inf"); don't rely
on the standard library implementation.
basic_text_iprimitive::load(float or double): look for the "known
values" printed by save(), generating the correct values when they're
seen.
I'm not convinced this would work very well. When loading a data value, you
have to know ahead of time what type it's going to be - float or string.
Here are a couple of random thoughts on this issue:
a) I believe that native binary archives will handle this without change,
as they just copy the bits to the archive and back. As long as you read the
archive on the same compiler/os/machine, there should be no issue.
b) Define a special type for Nan:
class NanType {};
Use variant serialization:
ar << boost::variant<NanType, float>(value);
boost::variant<NanType, float> x;
ar >> x;
Of course this presumes that one has implemented serialization for
boost::variant. I haven't done this but I did receive code from someone who
did. I wanted to upload it to the boost file section but the file section
was full. I noted this on this list but so far no one has responded.
c) a simpler approximation of the above could easily be made.
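class NanOrNot {
    friend class boost::serialization::access;
    bool Nan;
    float & value; // meaningful only if it's not a Nan
    template<class Archive>
    void save(Archive & ar, const unsigned int version) const {
        ar << Nan;
        if(! Nan)
            ar << value;
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int version){
        ar >> Nan;
        if(! Nan)
            ar >> value;
        else
            value = std::numeric_limits<float>::quiet_NaN(); // needs <limits>
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()
public:
    NanOrNot(float & t) :
        Nan(t != t), // initialize Nan: a Nan never compares equal to itself
        value(t)
    {}
};
...
float x;
...
ar << NanOrNot(x);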
etc.
which I believe is more or less what you have in mind. This would be a
serialization wrapper, which is explained in the documentation. An example
of a complete serialization wrapper is NameValuePair. Once you had a
wrapper you could just
This approach would have a couple of valuable features:
1) Its usage is optional. This would keep machine cycle misers happy.
2) It wouldn't require changing any archive class implementation - this would
keep me happy.
Just random ideas - I'm not going to start defending them.
Robert Ramey
Post by Robert Ramey
I'm not convinced this would work very well. When loading a data value, you
have to know ahead of time what type it's going to be - float or string.
As I understand things, istream::peek() is always going to work,
meaning that you could check to see if the next char is, for example,
an 'n'. If so, this would indicate that nan was written; otherwise, a
normal float could be read. At least in the toy code I've written,
this works.
This approach (assuming it works completely) has the benefit of not
requiring extra information pertaining to the nan-ness of the value.
It has the downside, as you point out, of taking some extra cycles.
I'll address this in a bit.
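For reference, the peek() check described above might look something like the sketch below. It assumes the writer emits exactly the token "nan" for a NaN and an ordinary decimal number otherwise; load_float_peek and peek_demo are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical reader: look at the next character without consuming it
// and decide whether a "nan" token or a normal number follows.
inline bool load_float_peek(std::istream& is, float& f) {
    is >> std::ws;  // skip leading whitespace so peek() sees the value itself
    if (is.peek() == 'n') {
        std::string token;
        is >> token;
        if (token != "nan") {
            is.setstate(std::ios::failbit);  // starts with 'n' but is not "nan"
            return false;
        }
        f = std::numeric_limits<float>::quiet_NaN();
        return true;
    }
    return static_cast<bool>(is >> f);  // ordinary float path
}

// Usage sketch.
inline void peek_demo() {
    std::istringstream in("nan 2.5");
    float a = 0, b = 0;
    assert(load_float_peek(in, a) && a != a);  // NaN is the only value unequal to itself
    assert(load_float_peek(in, b) && b == 2.5f);
}
```

No extra flag is stored per value; the cost is one peek() and a branch on every load.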
Post by Robert Ramey
c) a simpler approximation of the above could easily be made.
...
The problems I have with these approaches deal, essentially, with the
cognitive load on the programmer. Now a serialization lib user has to
remember to use the wrappers if dealing with NaN, or face the wrath of
a compiler that is not going to tell you what broke when you try to
read a NaN. Maybe this is not as big a deal as I suppose, but I can
envision scenarios where this would be a problem.
These approaches (although I haven't seen the variant serialization
solution) would incur extra storage for each float/double.
So, taking all of your comments into account (I hope), I have another
idea. Would it be possible to make the text-primitive functionality of
xml and text archives a programmer modifiable property? The most
obvious approach, I think, would be to give the archive_impls a
template parameter of TextPrimitive. Rather than having a hard-coded
inheritance from basic_text_i/oprimitive, this TextPrimitive would be
the base class. By default, of course, the basic_text_primitives would
be used, but alternatives could be supplied by anyone.
This has, I think, the great benefit of keeping the primitive
representation and overall file structure orthogonal. Again, I'm
waving my hands a lot here, but I don't see any reasons in the code
why this couldn't be done, but neither do I have intimate knowledge of
the code.
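To make the TextPrimitive idea a little more concrete, here is a toy sketch of an output archive whose primitive layer is a template parameter and base class. None of these names (default_text_oprimitive, stable_text_oprimitive, toy_text_oarchive, policy_demo) exist in the library; they are made up for illustration, and the real archive classes are considerably more involved.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <ostream>
#include <sstream>

// Stand-in for the default primitive: hands the value to the stream as-is.
struct default_text_oprimitive {
    void save(std::ostream& os, double d) { os << d; }
};

// Alternative primitive: fixed tokens for NaN and the infinities.
struct stable_text_oprimitive {
    void save(std::ostream& os, double d) {
        if (std::isnan(d))      os << "nan";
        else if (std::isinf(d)) os << (d < 0 ? "-inf" : "inf");
        else                    os << d;
    }
};

// Toy archive: the file layout (here just space-separated values) lives in
// the archive, while the representation of each primitive comes from the base.
template <class TextPrimitive = default_text_oprimitive>
class toy_text_oarchive : public TextPrimitive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}
    toy_text_oarchive& operator<<(double d) {
        this->save(os_, d);  // dispatch to whichever primitive was chosen
        os_ << ' ';
        return *this;
    }
private:
    std::ostream& os_;
};

// Usage sketch: opt in to the NaN-aware primitive for one archive.
inline void policy_demo() {
    const double not_a_number = std::numeric_limits<double>::quiet_NaN();
    std::ostringstream out;
    toy_text_oarchive<stable_text_oprimitive> ar(out);
    ar << 1.5 << not_a_number;
    assert(out.str() == "1.5 nan ");
}
```

Users who do not care about NaN would keep the default parameter and pay nothing extra.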
Austin Bingham
Post by Austin Bingham
Post by Robert Ramey
c) a simpler approximation of the above could easily be made.
...
The problems I have with these approaches deal, essentially, with the
cognitive load on the programmer. Now a serialization lib user has to
remember to use the wrappers if dealing with NaN, or face the wrath of
a compiler that is not going to tell you what broke when you try to
read a NaN. Maybe this is not as big a deal as I suppose, but I can
envision scenarios where this would be a problem.
The problem is that someone is going to say "I don't need this and I don't
want to slow down my application" or something like that. My method permits
one to choose whether or not Nan is going to get special attention on an
item by item basis.
Post by Austin Bingham
These approaches (although I haven't seen the variant serialization
solution) would incur extra storage for each float/double.
The variant serialization is mentioned as an incentive to get someone
interested in implementing this. This wouldn't be that hard, but could be a
little bit subject to controversy depending on the implementation.
Note that the serialization wrapper I proposed could be implemented
differently for native binary files - which don't need anything special.
This would give each platform what it needs.
Post by Austin Bingham
So, taking all of your comments into account (I hope), I have another
idea. Would it be possible to make the text-primitive functionality of
xml and text archives a programmer modifiable property?
This is pretty much what the wrapper functionality above does.
Post by Austin Bingham
The most obvious approach, I think, would be to give the archive_impls a
template parameter of TextPrimitive. Rather than having a hard-coded
inheritance from basic_text_i/oprimitive, this TextPrimitive would be
the base class. By default, of course, the basic_text_primitives would
be used, but alternatives could be supplied by anyone.
On the other hand, one could modify the code so the default is to flag Nan
on text primitives and require usage of the wrapper to override it.
Post by Austin Bingham
This has, I think, the great benefit of keeping the primitive
representation and overall file structure orthogonal. Again, I'm
waving my hands a lot here, but I don't see any reasons in the code
why this couldn't be done, but neither do I have intimate knowledge of
the code.
Well, everything is doable, the problem is coming to agreement on what to
do.
Actually, my main reluctance is really just inertia. If I had thought about
this point long ago, I probably would have included it in the text primitives.
If one is using a text archive, the extra overhead of using a Nan flag is
not going to be noticeable. If efficiency at this level is a concern, one
is going to be using a native binary archive anyway. Adding this to
the text primitives would require that I go investigate Nan and what it
means in different environments (e.g. IEEE 80-bit) and to what extent there
are portable functions for checking whether or not a float, double, (complex?)
is a Nan. I'm also a little concerned at this point about invalidating
portable text archives created by previous versions. So it really is just
inertia. (I'm also bogged down in other stuff now.)
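On the question of portable checks: with a C++11 or later standard library, <cmath> provides std::isnan, std::isinf and std::signbit for float, double and long double. The sketch below assumes such a library is available (it was not something one could count on when this thread was written); special_kind, classify_special and classify_demo are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical classification of the values a text archive would need to flag.
enum class special_kind { Finite, NaN, PosInf, NegInf };

// Works for float, double and long double via the <cmath> overloads.
template <class Float>
special_kind classify_special(Float x) {
    if (std::isnan(x)) return special_kind::NaN;
    if (std::isinf(x)) {
        return std::signbit(x) ? special_kind::NegInf : special_kind::PosInf;
    }
    return special_kind::Finite;
}

// Usage sketch covering each floating-point type.
inline void classify_demo() {
    assert(classify_special(std::numeric_limits<float>::quiet_NaN()) == special_kind::NaN);
    assert(classify_special(std::numeric_limits<double>::infinity()) == special_kind::PosInf);
    assert(classify_special(-std::numeric_limits<long double>::infinity()) == special_kind::NegInf);
    assert(classify_special(0.0) == special_kind::Finite);
}
```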
Robert Ramey
Post by Robert Ramey
a) I believe that native binary archives will handle this without change,
as they just copy the bits to the archive and back. As long as you read the
archive on the same compiler/os/machine, there should be no issue.
I think I've been misinterpreting this bit here. I originally took
this to mean that the binary archives are not portable across
platforms. Is this the case, or is it just that NaN specifically isn't
portable (to non-IEEE 754 machines, I guess)? In the end, we're hoping
to use the binary format anyway, and I can work around the text format
limitations since I'll only be using it for debugging.
Austin
Native binary archives are generally NOT portable across platforms. This
is highlighted in the documentation and is the motivating factor in preparing
the demo_portable_archive demo. Text-based archives are meant to be
portable, but of course at the cost of speed and archive size.
I did a tiny bit of googling and found that Nan isn't the only issue. There
is also +/- INF. So the whole subject would need a more thorough treatment.
Robert Ramey