PHP: converting latin1 to UTF-8, and cp1252/Windows-1252 to UTF-8


First, Windows-1252 is not a subset of UTF-8. You could say that ASCII is a subset of UTF-8, but that is usually more of an ideological debate.
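The difference can be checked directly. The following is a minimal sketch using mbstring's `mb_check_encoding`; the sample bytes and variable names are illustrative assumptions, not taken from the original question.

```php
<?php
// ASCII bytes (0x00-0x7F) are valid UTF-8 without any change.
$ascii = "COD";
$asciiIsUtf8 = mb_check_encoding($ascii, 'UTF-8');

// 0x80 is the euro sign in Windows-1252, but a lone 0x80 byte is a
// UTF-8 continuation byte and is not valid UTF-8 on its own.
$cp1252 = "\x80";
$cp1252IsUtf8 = mb_check_encoding($cp1252, 'UTF-8');

var_dump($asciiIsUtf8, $cp1252IsUtf8);
```

Any CP1252 text that uses bytes above 0x7F therefore has to be converted, not just relabelled.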

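Second, it is not possible to handle a string that mixes CP1252 and UTF-8 "characters" (strictly speaking, CP1252 is bytes and Unicode is code points). Either you read it as CP1252 and treat every byte of a multi-byte Unicode character as a separate character, or you read it as UTF-8 and drop any invalid byte sequences (or produce random characters where CP1252 bytes happen to form a valid Unicode sequence). With $c = mb_strcut($c,1); you were not removing your test character; you were removing the question mark that mb_convert_encoding created because it could not convert that Unicode character to a CP1252 character.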
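The question-mark effect can be reproduced in a few lines. This is a sketch with an assumed test string (the original question's exact string is not shown here); it sets the substitute character explicitly so the result does not depend on php.ini, although 0x3F ("?") is normally the default.

```php
<?php
// Make the substitute character explicit: 0x3F is "?".
mb_substitute_character(0x3F);

// U+03A9 (Greek capital omega) has no Windows-1252 equivalent.
$utf8 = "\u{03A9}COD\u{03A9}";

// Characters the target encoding cannot represent become "?".
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');

// Cutting from byte offset 1 removes that leading "?", not the omega,
// which no longer exists in the converted string.
$cut = mb_strcut($cp1252, 1, null, 'Windows-1252');

echo $cp1252, "\n", $cut, "\n";
```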

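Third, you should never convert a string and then try to determine its encoding. After your second test string has been converted, it is ?COD?. There is no reason to check whether it contains a Unicode character, because you have already converted it to CP1252, so it cannot contain one. As the programmer, you have to know what the output will be.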
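One way to respect that ordering is to decide on the input encoding first and convert exactly once. The sketch below is not the answer's function: `to_utf8_once` is a hypothetical helper, and it swaps in a UTF-8 validity check (`mb_check_encoding`) and assumes that anything which is not valid UTF-8 is Windows-1252.

```php
<?php
/**
 * Hypothetical helper: inspect the raw bytes first, then convert once.
 */
function to_utf8_once(string $bytes): string
{
    // Valid UTF-8 is returned untouched; converting it again would corrupt it.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Assumption for this sketch: everything else is Windows-1252.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

$fromCp1252  = to_utf8_once("caf\xE9");      // 0xE9 is e-acute in CP1252
$alreadyUtf8 = to_utf8_once("caf\u{00E9}");  // already UTF-8, left alone

var_dump($fromCp1252 === $alreadyUtf8);
```

Because the check runs on the unconverted input, the function always knows which encoding its output is in.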

The only solution is to check whether the string is CP1252, convert the offending characters to placeholders, and then convert that string to Unicode:

/**
 * Convert a CP1252 / ISO-8859-1 string to UTF-8.
 *
 * $default is returned for null or empty input.
 * $replace maps byte values (128-159) to HTML entities, see below.
 */
function convert_cp1252_to_utf8($input, $default = '', $replace = array()) {
    if ($input === null || $input == '') {
        return $default;
    }

    // https://en.wikipedia.org/wiki/UTF-8
    // https://en.wikipedia.org/wiki/ISO/IEC_8859-1
    // https://en.wikipedia.org/wiki/Windows-1252
    // http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
    $encoding = mb_detect_encoding($input, array('Windows-1252', 'ISO-8859-1'), true);
    if ($encoding == 'ISO-8859-1' || $encoding == 'Windows-1252') {
        /*
         * Use the search/replace arrays if a character needs to be replaced with
         * something other than its Unicode equivalent.
         */

        /*$replace = array(
            128 => "&#x20AC;", // http://www.fileformat.info/info/unicode/char/20AC/index.htm EURO SIGN
            129 => "",         // UNDEFINED
            130 => "&#x201A;", // http://www.fileformat.info/info/unicode/char/201A/index.htm SINGLE LOW-9 QUOTATION MARK
            131 => "&#x0192;", // http://www.fileformat.info/info/unicode/char/0192/index.htm LATIN SMALL LETTER F WITH HOOK
            132 => "&#x201E;", // http://www.fileformat.info/info/unicode/char/201e/index.htm DOUBLE LOW-9 QUOTATION MARK
            133 => "&#x2026;", // http://www.fileformat.info/info/unicode/char/2026/index.htm HORIZONTAL ELLIPSIS
            134 => "&#x2020;", // http://www.fileformat.info/info/unicode/char/2020/index.htm DAGGER
            135 => "&#x2021;", // http://www.fileformat.info/info/unicode/char/2021/index.htm DOUBLE DAGGER
            136 => "&#x02C6;", // http://www.fileformat.info/info/unicode/char/02c6/index.htm MODIFIER LETTER CIRCUMFLEX ACCENT
            137 => "&#x2030;", // http://www.fileformat.info/info/unicode/char/2030/index.htm PER MILLE SIGN
            138 => "&#x0160;", // http://www.fileformat.info/info/unicode/char/0160/index.htm LATIN CAPITAL LETTER S WITH CARON
            139 => "&#x2039;", // http://www.fileformat.info/info/unicode/char/2039/index.htm SINGLE LEFT-POINTING ANGLE QUOTATION MARK
            140 => "&#x0152;", // http://www.fileformat.info/info/unicode/char/0152/index.htm LATIN CAPITAL LIGATURE OE
            141 => "",         // UNDEFINED
            142 => "&#x017D;", // http://www.fileformat.info/info/unicode/char/017d/index.htm LATIN CAPITAL LETTER Z WITH CARON
            143 => "",         // UNDEFINED
            144 => "",         // UNDEFINED
            145 => "&#x2018;", // http://www.fileformat.info/info/unicode/char/2018/index.htm LEFT SINGLE QUOTATION MARK
            146 => "&#x2019;", // http://www.fileformat.info/info/unicode/char/2019/index.htm RIGHT SINGLE QUOTATION MARK
            147 => "&#x201C;", // http://www.fileformat.info/info/unicode/char/201c/index.htm LEFT DOUBLE QUOTATION MARK
            148 => "&#x201D;", // http://www.fileformat.info/info/unicode/char/201d/index.htm RIGHT DOUBLE QUOTATION MARK
            149 => "&#x2022;", // http://www.fileformat.info/info/unicode/char/2022/index.htm BULLET
            150 => "&#x2013;", // http://www.fileformat.info/info/unicode/char/2013/index.htm EN DASH
            151 => "&#x2014;", // http://www.fileformat.info/info/unicode/char/2014/index.htm EM DASH
            152 => "&#x02DC;", // http://www.fileformat.info/info/unicode/char/02DC/index.htm SMALL TILDE
            153 => "&#x2122;", // http://www.fileformat.info/info/unicode/char/2122/index.htm TRADE MARK SIGN
            154 => "&#x0161;", // http://www.fileformat.info/info/unicode/char/0161/index.htm LATIN SMALL LETTER S WITH CARON
            155 => "&#x203A;", // http://www.fileformat.info/info/unicode/char/203A/index.htm SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
            156 => "&#x0153;", // http://www.fileformat.info/info/unicode/char/0153/index.htm LATIN SMALL LIGATURE OE
            157 => "",         // UNDEFINED
            158 => "&#x017E;", // http://www.fileformat.info/info/unicode/char/017E/index.htm LATIN SMALL LETTER Z WITH CARON
            159 => "&#x0178;", // http://www.fileformat.info/info/unicode/char/0178/index.htm LATIN CAPITAL LETTER Y WITH DIAERESIS
        );*/

        if (count($replace) != 0) {
            // Build the list of raw bytes to search for from the numeric keys.
            $find = array();
            foreach (array_keys($replace) as $key) {
                $find[] = chr($key);
            }
            $input = str_replace($find, array_values($replace), $input);
        }

        /*
         * Because ISO-8859-1 and CP1252 are identical except for 0x80 through 0x9F
         * and control characters, always convert from Windows-1252 to UTF-8.
         */
        $input = iconv('Windows-1252', 'UTF-8//IGNORE', $input);

        if (count($replace) != 0) {
            // Turn the ASCII placeholders into real UTF-8 characters. Note that
            // this also decodes any entities that were already in the input.
            $input = html_entity_decode($input, ENT_QUOTES, 'UTF-8');
        }
    }

    return $input;
}

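The trick is that you have to check for both ISO-8859-1 and CP1252, because they are so similar. I only found this out after hours of playing with this function, and only another answer saved me. If you find this function useful, please upvote that answer.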
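How similar the two encodings are, and where they differ, can be seen by converting the same byte under each label. This is a small sketch using `mb_convert_encoding`; the variable names are illustrative.

```php
<?php
$byte = "\x80";

// Windows-1252 maps 0x80 to U+20AC EURO SIGN.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

// ISO-8859-1 maps 0x80 to U+0080, an invisible C1 control character.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// A byte outside 0x80-0x9F converts identically under both encodings.
$sameE9 = mb_convert_encoding("\xE9", 'UTF-8', 'Windows-1252')
      === mb_convert_encoding("\xE9", 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e282ac
echo bin2hex($asLatin1), "\n"; // c280
var_dump($sameE9);
```

This is why the function always converts from Windows-1252: for printable text it is the more useful reading of the 0x80-0x9F range.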

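Basically, this function replaces all those bad CP1252 bytes with HTML entities that represent the corresponding Unicode characters. We then convert the string from ISO-8859-1 / CP1252 to UTF-8, and none of our new Unicode characters get corrupted, because at that point they are plain ASCII. Finally, we decode the HTML entities and end up with a 100% Unicode string.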
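The three steps can be traced on a tiny input. This is a sketch only: the two-entry `$map` and the sample bytes are illustrative assumptions, not the full replacement table.

```php
<?php
// CP1252 input: left double quote (0x93), "caf", e-acute (0xE9),
// right double quote (0x94).
$input = "\x93caf\xE9\x94";

// Step 1: swap selected CP1252 bytes for ASCII-only HTML entities.
$map = array("\x93" => '&#x201C;', "\x94" => '&#x201D;');
$step1 = str_replace(array_keys($map), array_values($map), $input);

// Step 2: convert the remaining bytes; the entities are plain ASCII
// and pass through unchanged.
$step2 = iconv('Windows-1252', 'UTF-8//IGNORE', $step1);

// Step 3: decode the entities into real UTF-8 characters.
$result = html_entity_decode($step2, ENT_QUOTES, 'UTF-8');

echo $step1, "\n", bin2hex($result), "\n";
```

The placeholder step only matters when a byte should map to something other than its standard Unicode equivalent; otherwise step 2 alone already performs the CP1252 mapping.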
