urlencode - Encode/Decode URLs in C++ - Stack Overflow

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I faced the encoding half of this problem the other day. Unhappy with the available options, and after taking a look at this C sample code, I decided to roll my own C++ url-encode function:
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;
string url_encode(const string &value) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;
    for (string::const_iterator i = value.begin(), n = value.end(); i != n; ++i) {
        string::value_type c = (*i);
        // Keep alphanumeric and other accepted characters intact
        // (the cast keeps isalnum well-defined for bytes >= 0x80)
        if (isalnum((unsigned char) c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
            continue;
        }
        // Any other characters are percent-encoded
        escaped << uppercase;
        escaped << '%' << setw(2) << int((unsigned char) c);
        escaped << nouppercase;
    }
    return escaped.str();
}
The implementation of the decode function is left as an exercise to the reader. :P
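For readers who would rather not do the exercise, here is a minimal sketch of one possible decoder. It is not the answer author's code: the names url_decode and hex_value and the plus_as_space flag are my own assumptions. It checks bounds before reading the two hex digits and copies a malformed '%' through unchanged.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Sketch of a percent-decoder. plus_as_space is an assumed option for
// form-encoded data, where '+' stands for a space; it is off by default.
std::string url_decode(const std::string &value, bool plus_as_space = false) {
    std::string out;
    out.reserve(value.size());  // the decoded text is never longer than the input
    for (std::string::size_type i = 0; i < value.size(); ++i) {
        char c = value[i];
        // Only decode when both hex digits are inside the string
        if (c == '%' && i + 2 < value.size()) {
            int hi = hex_value(value[i + 1]);
            int lo = hex_value(value[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        if (c == '+' && plus_as_space) {
            out += ' ';
            continue;
        }
        out += c;  // ordinary character, or a '%' that starts no valid escape
    }
    return out;
}
```

With this sketch, url_decode("a%20b") gives "a b", while a truncated escape such as "%4" is returned as-is.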
                I believe it's more generic (more generally correct) to replace ' ' with "%20".  I've updated the code accordingly; feel free to roll back if you disagree.
– Josh Kelley
                Jul 15, 2014 at 17:48
                Nah, I agree. Also took the chance to remove that pointless setw(0) call (at the time I thought minimal width would remain set until I changed it back, but in fact it is reset after the next output).
– xperroni
                Jul 15, 2014 at 22:19
                I had to add std::uppercase to the line  "escaped << '%' << std::uppercase << std::setw(2) << int((unsigned char) c);" In case other people are wondering why this returns for example %3a instead of %3A
– gmm
                Sep 11, 2015 at 9:06
                It looks wrong because UTF-8 strings are not supported (w3schools.com/tags/ref_urlencode.asp). It seems to work only for Windows-1252
– Skywalker13
                Dec 1, 2016 at 16:32
                Related question:  why does curl's unescape not handle changing '+' to space?  Isn't that standard procedure when URL decoding?
– Stéphane
                May 27, 2019 at 6:59
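std::string urlDecode(const std::string &SRC) {
    std::string ret;
    char ch;
    unsigned int ii;  // sscanf's %x expects a pointer to unsigned int
    for (size_t i=0; i<SRC.length(); i++) {
        if (SRC[i]=='%') {
            sscanf(SRC.substr(i+1,2).c_str(), "%x", &ii);
            ch=static_cast<char>(ii);
            ret+=ch;
            i=i+2;
        } else {
            ret+=SRC[i];
        }
    }
    return (ret);
}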
not the best, but working fine ;-)
cpp-netlib declares these functions:
    namespace uri {
      inline std::string decoded(const std::string &input);
      inline std::string encoded(const std::string &input);
    }
They make it very easy to encode and decode URL strings.
                omg thank you.   the documentation on cpp-netlib is sparse.  Do you have any links to good cheat sheets?
– user249806
                May 13, 2017 at 13:12
Ordinarily, adding '%' to the int value of a char will not work when encoding; the value is supposed to be the hex equivalent, e.g. '/' is '%2F', not '%47'.
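To make the hex point concrete, here is a small sketch that formats one byte as a percent escape. The helper name percent_encode_byte is hypothetical; the idea is just the "%%%02X" format applied to the byte's unsigned value.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encode a single byte as "%XX",
// using two uppercase hex digits with zero padding.
std::string percent_encode_byte(unsigned char byte) {
    char buf[4];  // '%', two hex digits, terminating NUL
    std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned int>(byte));
    return std::string(buf);
}
```

Here percent_encode_byte('/') yields "%2F": the character code is 47 in decimal, which is 2F in hex.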
I think this is the best and most concise solution for both URL encoding and decoding (not many header dependencies).
string urlEncode(string str){
    string new_str = "";
    char c;
    int ic;
    const char* chars = str.c_str();
    char bufHex[10];
    int len = strlen(chars);
    for(int i=0;i<len;i++){
        c = chars[i];
        ic = c;
        // uncomment this if you want to encode spaces with +
        /*if (c==' ') new_str += '+';   
        else */if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') new_str += c;
        else {
            sprintf(bufHex,"%X",c);
            if(ic < 16) 
                new_str += "%0"; 
            else
                new_str += "%";
            new_str += bufHex;
        }
    }
    return new_str;
}
string urlDecode(string str){
    string ret;
    char ch;
    int i, ii, len = str.length();
    for (i=0; i < len; i++){
        if(str[i] != '%'){
            if(str[i] == '+')
                ret += ' ';
            else
                ret += str[i];
        }else{
            sscanf(str.substr(i + 1, 2).c_str(), "%x", &ii);
            ch = static_cast<char>(ii);
            ret += ch;
            i = i + 2;
        }
    }
    return ret;
}
                @Kriyen it is used to pad the encoded HEX with leading zero in case it results in a single letter; since 0 to 15 in HEX is 0 to F.
– tormuto
                Mar 1, 2017 at 23:45
                I like this approach the best. +1 for using standard libraries. There are two issues to fix, though. I am Czech and used the letter "ý"; the result was "%0FFFFFFC3%0FFFFFFBD". First, the check against 16 isn't necessary, since UTF-8 guarantees that all trailing bytes start with 10, and it seemed to break my multibyte input. Second, the FF prefix appears because not all computers have the same number of bits per int. The fix was to skip the check against 16 (not needed) and grab the last two chars from the buffer. (I used a stringstream, since I feel more comfortable with it, and a string buffer.) Still gave a point. Like the frame too.
– Volt
                Dec 25, 2017 at 21:58
                @Volt would you be able to post your updated code in a new answer? You mention the issues but it's not enough info for an obvious fix.
– gregn3
                May 30, 2018 at 23:49
                This answer has some problems because it uses strlen. First, that doesn't make sense: we already know the size of a string object, so it's a waste of time. Much worse, a string may contain 0-bytes, which would get lost because of the strlen. Also, the if(i< 16) is inefficient, because this can be covered by printf itself using "%%%02X". And finally, c should be an unsigned byte; otherwise you get the effect that @Volt was describing, with a leading '0xFFF...'.
– Devolus
                Jan 11, 2019 at 8:18
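Since an updated version was requested above, here is a sketch of how the points raised in these comments could be applied. It is my reading of the suggestions, not code from the answer or the commenters, and the name url_encode_bytes is made up. It iterates over str.size() instead of strlen (so embedded 0-bytes survive), treats each byte as unsigned char, and lets "%%%02X" handle the zero padding.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Sketch of the encoder with the fixes suggested in the comments.
// The name url_encode_bytes is hypothetical.
std::string url_encode_bytes(const std::string &str) {
    std::string out;
    out.reserve(str.size() * 3);  // worst case: every byte becomes "%XX"
    for (std::string::size_type i = 0; i < str.size(); ++i) {
        // unsigned char avoids sign extension for bytes >= 0x80
        unsigned char c = static_cast<unsigned char>(str[i]);
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%', two hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned int>(c));
            out += buf;
        }
    }
    return out;
}
```

With this version the UTF-8 bytes of "ý" (0xC3 0xBD) come out as "%C3%BD" rather than "%0FFFFFFC3%0FFFFFFBD".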
[Necromancer mode on]

Stumbled upon this question while looking for a fast, modern, platform-independent and elegant solution. Didn't like any of the above; cpp-netlib would be the winner, but it has a horrific memory vulnerability in its "decoded" function. So I came up with a Boost Spirit Qi/Karma solution.

namespace bsq = boost::spirit::qi;
namespace bk = boost::spirit::karma;
bsq::int_parser<unsigned char, 16, 2, 2> hex_byte;
template <typename InputIterator>
struct unescaped_string
    : bsq::grammar<InputIterator, std::string(char const *)> {
  unescaped_string() : unescaped_string::base_type(unesc_str) {
    unesc_char.add("+", ' ');
    unesc_str = *(unesc_char | "%" >> hex_byte | bsq::char_);
  }
  bsq::rule<InputIterator, std::string(char const *)> unesc_str;
  bsq::symbols<char const, char const> unesc_char;
};
template <typename OutputIterator>
struct escaped_string : bk::grammar<OutputIterator, std::string(char const *)> {
  escaped_string() : escaped_string::base_type(esc_str) {
    esc_str = *(bk::char_("a-zA-Z0-9_.~-") | "%" << bk::right_align(2,0)[bk::hex]);
  }
  bk::rule<OutputIterator, std::string(char const *)> esc_str;
};
Usage of the above is as follows:

std::string unescape(const std::string &input) {
  std::string retVal;
  retVal.reserve(input.size());
  typedef std::string::const_iterator iterator_type;
  char const *start = "";
  iterator_type beg = input.begin();
  iterator_type end = input.end();
  unescaped_string<iterator_type> p;
  if (!bsq::parse(beg, end, p(start), retVal))
    retVal = input;
  return retVal;
}
std::string escape(const std::string &input) {
  typedef std::back_insert_iterator<std::string> sink_type;
  std::string retVal;
  retVal.reserve(input.size() * 3);
  sink_type sink(retVal);
  char const *start = "";
  escaped_string<sink_type> g;
  if (!bk::generate(sink, g(start), input))
    retVal = input;
  return retVal;
}
[Necromancer mode off]
EDIT01: fixed the zero padding stuff - special thanks to Hartmut Kaiser
EDIT02: Live on CoLiRu
                What's the “horrific memory vulnerability” of cpp-netlib? Can you provide a brief explanation or a link?
– Craig M. Brandenburg
                Jul 7, 2015 at 21:26
                It (the problem) was already reported, so I didn't report it, and I don't actually remember the details... something like an access violation when trying to parse an invalid escape sequence.
– kreuzerkrieg
                Jul 8, 2015 at 14:47
                I suggest using uint_parser instead of int_parser. As it is, you would probably accept a - sign.
– sandwood
                Jun 14, 2022 at 9:43
                The "if (c == '%')" block needs more out-of-bound checking, i[1] and/or i[2] may be beyond text.end(). I would rename "escaped" to "unescaped", too. "escaped.fill('0');" is probably unneeded.
– roalz
                Mar 23, 2018 at 12:54
I ended up on this question when searching for an API to decode a URL in a Win32 C++ app. Since the question doesn't specify a platform, assuming Windows isn't a bad thing.
InternetCanonicalizeUrl is the API for Windows programs. More info here
        LPTSTR lpOutputBuffer = new TCHAR[1];
        DWORD dwSize = 1;
        BOOL fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
        DWORD dwError = ::GetLastError();
        if (!fRes && dwError == ERROR_INSUFFICIENT_BUFFER)
        {
            // allocated with new[], so it must be released with delete []
            delete [] lpOutputBuffer;
            lpOutputBuffer = new TCHAR[dwSize];
            fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
            if (fRes)
            {
                //lpOutputBuffer has decoded url
            }
            else
            {
                //failed to decode
            }
            if (lpOutputBuffer !=NULL)
            {
                delete [] lpOutputBuffer;
                lpOutputBuffer = NULL;
            }
        }
        else
        {
            //some other error OR the input string url is just 1 char and was successfully decoded
        }
InternetCrackUrl (here) also seems to have flags to specify whether to decode url
Adding a follow-up to Bill's recommendation for using libcurl: great suggestion, but it needs an update:

three years on, the curl_escape function is deprecated, so for future use it's better to use curl_easy_escape.
                I tried this but it did not work correctly for me. See: stackoverflow.com/q/75781057/2287576
– Andrew Truckle
                Mar 19 at 9:50
I couldn't find a URI decode/unescape here that also decodes 2- and 3-byte sequences. Contributing my own version, which converts the C string input to a wstring on the fly:
#include <string>
const char HEX2DEC[55] =
{
     0, 1, 2, 3,  4, 5, 6, 7,  8, 9,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15
};
// Parenthesized so the macros compose safely with & and | at the call site
#define __x2d__(s) HEX2DEC[*(s)-48]
#define __x2d2__(s) (__x2d__(s) << 4 | __x2d__((s)+1))
std::wstring decodeURI(const char * s) {
    unsigned char b;
    std::wstring ws;
    while (*s) {
        if (*s == '%') {
            if ((b = __x2d2__(s + 1)) >= 0x80) {
                if (b >= 0xE0) { // three byte codepoint
                    ws += ((b & 0b00001111) << 12) | ((__x2d2__(s + 4) & 0b00111111) << 6) | (__x2d2__(s + 7) & 0b00111111);
                    s += 9;
                }
                else { // two byte codepoint: the lead byte carries five payload bits
                    ws += (__x2d2__(s + 4) & 0b00111111) | (b & 0b00011111) << 6;
                    s += 6;
                }
            }
            else { // one byte codepoints
                ws += b;
                s += 3;
            }
        }
        else { // no %
            ws += *s;
            s++;
        }
    }
    return ws;
}
                Sorry but "high performance" while adding single chars to a wstring is unrealistic. At least reserve enough space, otherwise you will have massive reallocations all the time
– Felix Dombek
                Aug 17, 2017 at 21:59
This version is pure C and can optionally normalize the resource path. Using it with C++ is trivial:
#include <string>
#include <iostream>
int main(int argc, char** argv)
{
    const std::string src("/some.url/foo/../bar/%2e/");
    std::cout << "src=\"" << src << "\"" << std::endl;
    // either do it the C++ conformant way:
    char* dst_buf = new char[src.size() + 1];
    urldecode(dst_buf, src.c_str(), 1);
    std::string dst1(dst_buf);
    delete[] dst_buf;
    std::cout << "dst1=\"" << dst1 << "\"" << std::endl;
    // or in-place with the &[0] trick to skip the new/delete
    std::string dst2;
    dst2.resize(src.size() + 1);
    dst2.resize(urldecode(&dst2[0], src.c_str(), 1));
    std::cout << "dst2=\"" << dst2 << "\"" << std::endl;
}
Outputs:
src="/some.url/foo/../bar/%2e/"
dst1="/some.url/bar/"
dst2="/some.url/bar/"
And the actual function:
#include <stddef.h>
#include <ctype.h>
/**
 * decode a percent-encoded C string with optional path normalization
 *
 * The buffer pointed to by @dst must be at least strlen(@src) + 1 bytes,
 * since the terminating null is written as well.
 * Decoding stops at the first character from @src that decodes to null.
 * Path normalization will remove redundant slashes and slash+dot sequences,
 * as well as removing path components when slash+dot+dot is found. It will
 * keep the root slash (if one was present) and will stop normalization
 * at the first questionmark found (so query parameters won't be normalized).
 *
 * @param dst       destination buffer
 * @param src       source buffer
 * @param normalize perform path normalization if nonzero
 * @return          number of valid characters in @dst
 * @author          Johan Lindh <[email protected]>
 * @legalese        BSD licensed (http://opensource.org/licenses/BSD-2-Clause)
 */
ptrdiff_t urldecode(char* dst, const char* src, int normalize)
{
    char* org_dst = dst;
    int slash_dot_dot = 0;
    char ch, a, b;
    do {
        ch = *src++;
        if (ch == '%' && isxdigit(a = src[0]) && isxdigit(b = src[1])) {
            if (a < 'A') a -= '0';
            else if(a < 'a') a -= 'A' - 10;
            else a -= 'a' - 10;
            if (b < 'A') b -= '0';
            else if(b < 'a') b -= 'A' - 10;
            else b -= 'a' - 10;
            ch = 16 * a + b;
            src += 2;
        }
        if (normalize) {
            switch (ch) {
            case '/':
                if (slash_dot_dot < 3) {
                    /* compress consecutive slashes and remove slash-dot */
                    dst -= slash_dot_dot;
                    slash_dot_dot = 1;
                    break;
                }
                /* fall-through */
            case '?':
                /* at start of query, stop normalizing */
                if (ch == '?')
                    normalize = 0;
                /* fall-through */
            case '\0':
                if (slash_dot_dot > 1) {
                    /* remove trailing slash-dot-(dot) */
                    dst -= slash_dot_dot;
                    /* remove parent directory if it was two dots */
                    if (slash_dot_dot == 3)
                        while (dst > org_dst && *--dst != '/')
                            /* empty body */;
                    slash_dot_dot = (ch == '/') ? 1 : 0;
                    /* keep the root slash if any */
                    if (!slash_dot_dot && dst == org_dst && *dst == '/')
                        ++dst;
                }
                break;
            case '.':
                if (slash_dot_dot == 1 || slash_dot_dot == 2) {
                    ++slash_dot_dot;
                    break;
                }
                /* fall-through */
            default:
                slash_dot_dot = 0;
            }
        }
        *dst++ = ch;
    } while(ch);
    return (dst - org_dst) - 1;
}
                This does not follow any recommendation, and is completely wrong compared to what the author asks for ('+' is not replaced by space, for example). Path normalization has nothing to do with URL decoding. If you intend to normalize your path, you should first split your URL into parts (scheme, authority, path, query, fragment) and then apply whatever algorithm you like only on the path part.
– xryl669
                Feb 3, 2015 at 9:04
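The "split first" step mentioned in this comment could look roughly like the sketch below. It uses the regular expression given in RFC 3986, Appendix B; the UriParts struct and the split_uri name are hypothetical, and the code does no validation beyond what that expression implies.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the five generic URI components.
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Split a URI reference into its components using the regular
// expression from RFC 3986, Appendix B. Nothing is decoded here.
UriParts split_uri(const std::string &uri) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    UriParts parts;
    std::smatch m;
    if (std::regex_match(uri, m, re)) {
        parts.scheme    = m[2].str();  // text before the first ':'
        parts.authority = m[4].str();  // text after "//", up to '/', '?' or '#'
        parts.path      = m[5].str();
        parts.query     = m[7].str();  // text after '?', without the '?'
        parts.fragment  = m[9].str();  // text after '#', without the '#'
    }
    return parts;
}
```

Decoding or normalization can then be applied to parts.path alone, leaving the query string untouched.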
You can use the "g_uri_escape_string()" function provided by glib.h.
https://developer.gnome.org/glib/stable/glib-URI-Functions.html
#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
int main() {
    const char *uri = "http://www.example.com?hello world";
    char *encoded_uri = NULL;
    //as per wiki (https://en.wikipedia.org/wiki/Percent-encoding)
    const char *escape_char_str = "!*'();:@&=+$,/?#[]"; 
    encoded_uri = g_uri_escape_string(uri, escape_char_str, TRUE);
    printf("[%s]\n", encoded_uri);
    //strings allocated by GLib are released with g_free
    g_free(encoded_uri);
    return 0;
}
compile it with:
gcc encoding_URI.c `pkg-config --cflags --libs glib-2.0`
I know the question asks for a C++ method, but for those who might need it, I came up with a very short function in plain C to encode a string. It doesn't create a new string, rather it alters the existing one, meaning that it must have enough size to hold the new string. Very easy to keep up.
#include <string.h>
void urlEncode(char *string)
{
    unsigned char charToEncode;
    size_t posToEncode;
    /* strspn returns the length of the leading run of unreserved characters,
       so string[posToEncode] is the next character that needs encoding */
    while ((posToEncode=strspn(string,"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"))<strlen(string))
    {
        charToEncode=string[posToEncode];
        memmove(string+posToEncode+3,string+posToEncode+1,strlen(string+posToEncode));
        string[posToEncode]='%';
        string[posToEncode+1]="0123456789ABCDEF"[charToEncode>>4];
        string[posToEncode+2]="0123456789ABCDEF"[charToEncode&0xf];
        string+=posToEncode+3;
    }
}
Had to do it in a project without Boost. So, ended up writing my own. I will just put it on GitHub: https://github.com/corporateshark/LUrlParser
clParseURL URL = clParseURL::ParseURL( "https://name:[email protected]:80/path/res" );
if ( URL.IsValid() )
{
    cout << "Scheme    : " << URL.m_Scheme << endl;
    cout << "Host      : " << URL.m_Host << endl;
    cout << "Port      : " << URL.m_Port << endl;
    cout << "Path      : " << URL.m_Path << endl;
    cout << "Query     : " << URL.m_Query << endl;
    cout << "Fragment  : " << URL.m_Fragment << endl;
    cout << "User name : " << URL.m_UserName << endl;
    cout << "Password  : " << URL.m_Password << endl;
}
                Your link is to a library which parses a URL.  It does not %-encode a URL.  (Or at least, I couldn't see a % anywhere in the source.)  As such, I don't think this answers the question.
– Martin Bonner supports Monica
                Nov 20, 2015 at 13:23