When converting between
Buffer
s and strings, a character encoding may be
specified. If no character encoding is specified, UTF-8 will be used as the
default.
Node.js buffers accept all case variations of encoding strings that they
receive. For example, UTF-8 can be specified as
'utf8'
,
'UTF8'
, or
'uTf8'
.
The character encodings currently supported by Node.js are the following:
'utf8'
(alias:
'utf-8'
): Multi-byte encoded Unicode characters. Many web
pages and other document formats use
UTF-8
. This is the default character
encoding. When decoding a
Buffer
into a string that does not exclusively
contain valid UTF-8 data, the Unicode replacement character
U+FFFD
� will be
used to represent those errors.
'utf16le'
(alias:
'utf-16le'
): Multi-byte encoded Unicode characters.
Unlike
'utf8'
, each character in the string will be encoded using either 2
or 4 bytes. Node.js only supports the
little-endian
variant of
UTF-16
.
'latin1'
: Latin-1 stands for
ISO-8859-1
. This character encoding only
supports the Unicode characters from
U+0000
to
U+00FF
. Each character is
encoded using a single byte. Characters that do not fit into that range are
truncated and will be mapped to characters in that range.
Converting a
Buffer
into a string using one of the above is referred to as
decoding, and converting a string into a
Buffer
is referred to as encoding.
Node.js also supports the following binary-to-text encodings. For
binary-to-text encodings, the naming convention is reversed: Converting a
Buffer
into a string is typically referred to as encoding, and converting a
string into a
Buffer
as decoding.
'base64'
:
Base64
encoding. When creating a
Buffer
from a string,
this encoding will also correctly accept "URL and Filename Safe Alphabet" as
specified in
RFC 4648, Section 5
. Whitespace characters such as spaces,
tabs, and new lines contained within the base64-encoded string are ignored.
'base64url'
:
base64url
encoding as specified in
RFC 4648, Section 5
. When creating a
Buffer
from a string, this
encoding will also correctly accept regular base64-encoded strings. When
encoding a
Buffer
to a string, this encoding will omit padding.
'hex'
: Encode each byte as two hexadecimal characters. Data truncation
may occur when decoding strings that do not exclusively consist of an even
number of hexadecimal characters. See below for an example.
The following legacy character encodings are also supported:
'ascii'
: For 7-bit
ASCII
data only. When encoding a string into a
Buffer
, this is equivalent to using
'latin1'
. When decoding a
Buffer
into a string, using this encoding will additionally unset the highest bit of
each byte before decoding as
'latin1'
.
Generally, there should be no reason to use this encoding, as
'utf8'
(or, if the data is known to always be ASCII-only,
'latin1'
) will be a
better choice when encoding or decoding ASCII-only text. It is only provided
for legacy compatibility.
'binary'
: Alias for
'latin1'
. See
binary strings
for more background
on this topic. The name of this encoding can be very misleading, as all of the
encodings listed here convert between strings and binary data. For converting
between strings and
Buffer
s, typically
'utf8'
is the right choice.
'ucs2'
,
'ucs-2'
: Aliases of
'utf16le'
. UCS-2 used to refer to a variant
of UTF-16 that did not support characters that had code points larger than
U+FFFF. In Node.js, these code points are always supported.
import { Buffer } from'node:buffer';
Buffer.from('1ag123', 'hex');
// Prints <Buffer 1a>, data truncated when first non-hexadecimal value// ('g') encountered.Buffer.from('1a7', 'hex');
// Prints <Buffer 1a>, data truncated when data ends in single digit ('7').Buffer.from('1634', 'hex');
// Prints <Buffer 16 34>, all data represented.const { Buffer } = require('node:buffer');
Buffer.from('1ag123', 'hex');
// Prints <Buffer 1a>, data truncated when first non-hexadecimal value// ('g') encountered.Buffer.from('1a7', 'hex');
// Prints <Buffer 1a>, data truncated when data ends in single digit ('7').Buffer.from('1634', 'hex');
// Prints <Buffer 16 34>, all data represented.
Modern Web browsers follow the
WHATWG Encoding Standard
which aliases
both
'latin1'
and
'ISO-8859-1'
to
'win-1252'
. This means that while doing
something like
http.get()
, if the returned charset is one of those listed in
the WHATWG specification it is possible that the server actually returned
'win-1252'
-encoded data, and using
'latin1'
encoding may incorrectly decode
the characters.