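// params is a jstring passed from Java; params length = 7
const char* str = env->GetStringUTFChars(params, nullptr);
std::string tstr(str);
// release the buffer once the bytes have been copied into tstr
env->ReleaseStringUTFChars(params, str);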
At this point, tstr has length 8, and the character codes in both std::string tstr and char* str are:
116, 101, -64, -128, 116, 105, 110, 103
Can anyone explain why this happens, and how to get the exact string in C++?
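GetStringUTFChars returns modified UTF-8, in which the \0 character is encoded as the two bytes 0xC0, 0x80. In standard UTF-8 this overlong encoding is forbidden, as it is a security risk.
Your \0 character is therefore stored as the bytes 0xC0 and 0x80. Read as signed char they are -64 and -128, which sign-extend to 0xFFFFFFFFFFFFFFC0 and 0xFFFFFFFFFFFFFF80 when printed as 64-bit values.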
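To make the -64 and -128 less mysterious, here is a minimal sketch that dumps the bytes of a string both ways. The helper names bytes_as_signed and bytes_as_unsigned are made up for this illustration, and the literal "te\xC0\x80ting" is only my guess at what the 8 bytes in the question look like.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: each byte read through a signed 8-bit type,
// which is how the values in the question were printed.
std::vector<int> bytes_as_signed(const std::string& s) {
    std::vector<int> out;
    for (char c : s) out.push_back(static_cast<signed char>(c));
    return out;
}

// Hypothetical helper: the same bytes read as unsigned values (0..255).
std::vector<int> bytes_as_unsigned(const std::string& s) {
    std::vector<int> out;
    for (char c : s) out.push_back(static_cast<unsigned char>(c));
    return out;
}
```

With these, the byte pair 0xC0, 0x80 shows up as -64, -128 in the signed view and as 192, 128 in the unsigned view. The bytes are the same, only the interpretation differs.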
tieudaotu:
and how to get exact string on C++?
Why does your json string have a \0 character in the middle? C-style UTF-8 strings are \0 terminated, so they are not allowed to contain \0 characters in between.
Check why you have a \0 in the middle of your json string!
The C++ string would be the string from the beginning up to the \0 character, so truncate the json string at the \0 character.
Nevertheless, your const char* holds the (modified) UTF-8 representation of your string, and so does the std::string.
Just keep this in mind:
Note that this class handles bytes independently of the encoding used: If used to handle sequences of multi-byte or variable-length characters (such as UTF-8), all members of this class (such as length or size), as well as its iterators, will still operate in terms of bytes (not actual encoded characters).
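As a rough illustration of the difference between bytes and encoded characters, the sketch below counts only the bytes that start a character, skipping continuation bytes of the form 10xxxxxx. The function name count_encoded_chars is hypothetical, and the sketch assumes the input is well-formed (modified) UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the bytes that begin an encoded character.
// Continuation bytes have the bit pattern 10xxxxxx and are skipped.
std::size_t count_encoded_chars(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;  // not a continuation byte
    }
    return n;
}
```

For a string like the one in the question, size() reports 8 bytes while this count gives 7, which matches the length seen on the Java side.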
So what do you need? The UTF-8 representation or the single-byte representation up to the \0 character?
In fact, I have never had a json file with ‘\0’ in the middle, but I was tasked with preventing all such cases.
By using std::string, I can now hold the ‘\0’ character inside the string and convert it back to a json file.
Thank you for the reply.
I see. If you want to prevent such a case when working with Java strings in C++, truncate at the \0 character.
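One possible way to do the truncation, as a sketch only: look for the byte pair 0xC0, 0x80 that modified UTF-8 uses for \0 and cut the string there. The name truncate_at_encoded_nul is made up, and the input is assumed to be the bytes returned by GetStringUTFChars.

```cpp
#include <string>

// Hypothetical helper: keeps everything before the first encoded \0
// (bytes 0xC0, 0x80). Returns the input unchanged if there is none.
std::string truncate_at_encoded_nul(const std::string& mutf8) {
    const std::string encoded_nul = "\xC0\x80";
    const std::string::size_type pos = mutf8.find(encoded_nul);
    return pos == std::string::npos ? mutf8 : mutf8.substr(0, pos);
}
```

Searching for the raw byte pair should be safe here, since 0xC0 is not a valid continuation byte and so cannot appear in the middle of another encoded character.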
If you want to keep the \0 character, just rely on char* and convert it to a UTF-8 string by using some UTF library.
In C++ (without using a UTF library) it is never a good idea to keep the converted characters, as the string functions are not UTF-aware.
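If only the embedded \0 matters, a hand-rolled conversion might look like the sketch below. It is not a replacement for a real UTF library: it only turns the pair 0xC0, 0x80 back into a single \0 byte and copies everything else as-is, so supplementary characters (which modified UTF-8 stores as encoded surrogate pairs) are left untouched. The name decode_encoded_nuls is hypothetical.

```cpp
#include <string>

// Hypothetical helper: copies a \0-terminated modified UTF-8 buffer into a
// std::string, replacing each 0xC0 0x80 pair with a real '\0' byte.
std::string decode_encoded_nuls(const char* mutf8) {
    std::string out;
    for (const char* p = mutf8; *p != '\0'; ++p) {
        const unsigned char b0 = static_cast<unsigned char>(p[0]);
        // p[1] is readable: at worst it is the terminating '\0'.
        const unsigned char b1 = static_cast<unsigned char>(p[1]);
        if (b0 == 0xC0 && b1 == 0x80) {
            out.push_back('\0');
            ++p;  // skip the second byte of the pair
        } else {
            out.push_back(*p);
        }
    }
    return out;
}
```

The resulting std::string carries its own length, so the embedded \0 survives as long as you use size() and data() rather than C functions like strlen.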