PHP: Escape sequences (backslash) - Manual

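For example, if you want to match a "*" character, write "\*" in the pattern. This applies whenever a character would otherwise have a special meaning. It is, however, always safe to precede a non-alphanumeric character with a backslash to state that it stands for itself. To match a backslash, write "\\" in the pattern.

The backslash has a special meaning in both single-quoted and double-quoted PHP strings. So to match a single backslash \, the regular expression is \\, and in PHP code that must be written as "\\\\" or '\\\\'.

If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the pattern (other than in a character class) and everything from an unescaped # to the end of the line are ignored. To use whitespace or # as part of the pattern in that case, escape it with a backslash.

The second use of backslash provides a way of encoding non-printing characters in patterns in a visible form. Apart from the binary zero, which terminates a pattern, there is no strict restriction on non-printing characters appearing literally. When a pattern is prepared in a text editor, though, it is usually easier to use one of the escape sequences described below than the binary character it represents.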
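As a minimal sketch of the escaping rules above, the following shows a literal "*", a literal backslash written from PHP source, and escaped whitespace and "#" under the x modifier, which is how PHP exposes PCRE_EXTENDED. The sample strings and variable names are illustrative and not part of the manual.

```php
<?php
// Match a literal "*": the regex is \* . In a single-quoted PHP string a
// backslash before "*" is kept as-is, so '/\*/' reaches PCRE unchanged.
$star = preg_match('/\*/', 'a*b');

// Match a literal backslash: the regex is \\ , and each of those two
// backslashes must itself be escaped in the PHP string literal.
$path = 'C:\\temp';                          // the 7-character string C:\temp
$backslash = preg_match('/\\\\/', $path);

// With the x modifier, unescaped whitespace and "#" comments in the pattern
// are ignored, so a literal space or "#" has to be escaped.
$extended = '/
    \w+     # a word
    \       # an escaped, literal space
    \#      # an escaped, literal hash
    \d+     # a number
/x';
$issue = preg_match($extended, 'issue #42', $m);

var_dump($star, $backslash, $issue, $m[0]);
```

Single-quoted strings are used for the patterns so that PHP itself touches as few backslashes as possible; only `\\` and `\'` are rewritten inside single quotes.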

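The precise effect of \cx is as follows: if x is a lowercase letter, it is converted to upper case. Then bit 6 of the character (hex 40, counting the rightmost bit as bit 0) is inverted. Thus \cz becomes hex 1A, \c{ becomes hex 3B, and \c; becomes hex 7B.

After "\x", up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, "\x{…}" is allowed, where the contents of the braces are hexadecimal digits; the value is interpreted as a UTF-8 character code number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if its value is greater than 127.

After "\0", up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence "\0\x\07" specifies two binary zeros followed by a BEL character. Make sure you supply two digits after the initial zero if the character that follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is more complicated. Outside a character class, PCRE reads that digit and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses (subgroups) in the expression, the entire sequence is taken as a back reference. How back references work is described later, following the discussion of parenthesized subgroups.

Inside a character class, or if the decimal number is greater than 9 and there have not been that many capturing subgroups, PCRE re-reads up to three octal digits following the backslash and generates a single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves.

"Whitespace" characters are HT (9), LF (10), FF (12), CR (13), and space (32). However, if locale-specific matching is taking place, characters with code points in the range 128-255 may also be considered whitespace, for instance NBSP (A0).

A "word" character is any letter, digit, or the underscore character, that is, any character that can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

These character type sequences can appear both inside and outside character classes. Each matches one character of the appropriate type. If the current matching point is at the end of the subject string, all of them fail, because there is no character left to match.

The fourth use of backslash is for certain simple assertions. An assertion specifies a condition that has to be met at a particular point in a match, without consuming any characters from the subject string. The use of subgroups for more complicated assertions is described later. The backslash assertions include \A, \Z, \z, and \G, described next.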
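The control-character, hexadecimal, octal, and back-reference rules described earlier in this section can be checked with a short script. This is only a sketch: the subject strings and variable names are made up for illustration, and the u modifier is used here as PHP's way of switching on UTF-8 mode.

```php
<?php
// \cz: "z" is upper-cased to "Z" (hex 5A), then bit 6 (hex 40) is flipped,
// which gives hex 1A.
$ctrlZ = preg_match('/^\cz$/', "\x1A");

// \xhh reads two hexadecimal digits; "A" is hex 41.
$hex = preg_match('/^\x41$/', 'A');

// In UTF-8 mode, \x{...} names a code point. U+00E9 is encoded in UTF-8
// as the two bytes C3 A9.
$codePoint = preg_match('/^\x{e9}$/u', "\xC3\xA9");

// \0 followed by up to two more octal digits: \007 is BEL (hex 07).
$bel = preg_match('/^\007$/', "\x07");

// \1 is below 10, so it is always a back reference to subgroup 1.
$backref = preg_match('/(\w)\1/', 'hello', $m);   // finds the doubled "l"

// \11 with fewer than 11 capturing subgroups is re-read as octal 011, a tab.
$tab = preg_match('/^a\11b$/', "a\tb");

var_dump($ctrlZ, $hex, $codePoint, $bel, $backref, $m[0], $tab);
```

The subjects are written with PHP double-quoted escapes (`"\x1A"`, `"\t"`) so that the bytes being matched are unambiguous, while the patterns stay in single quotes so that PCRE, not PHP, interprets the escapes.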

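The \A, \Z, and \z assertions differ from the traditional circumflex (^) and dollar ($) (see anchors) in that they only ever match at the very start and end of the subject string, whatever pattern modifiers are set. They are not affected by the PCRE_MULTILINE or PCRE_DOLLAR_ENDONLY options.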

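The difference between \Z and \z is that \Z matches before a newline that is the last character of the string as well as at the end of the string, whereas \z matches only at the very end.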
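The difference between these anchors is easiest to see on a subject that ends with a newline. The sketch below uses a made-up two-line string and the m modifier (PCRE_MULTILINE); none of the variable names come from the manual.

```php
<?php
$subject = "first\nlast\n";

// With the m modifier, $ matches at the end of every line.
$dollar = preg_match_all('/\w+$/m', $subject, $m1);    // "first" and "last"

// \Z ignores the m modifier: it matches only at the very end of the
// subject, or just before a newline that is its last character.
$upperZ = preg_match_all('/\w+\Z/m', $subject, $m2);   // "last"

// \z matches only at the very end, so the trailing newline prevents a match.
$lowerZ = preg_match_all('/\w+\z/m', $subject, $m3);   // nothing

// \A matches only at the very start, even with the m modifier.
$upperA = preg_match_all('/\A\w+/m', $subject, $m4);   // "first"

var_dump($dollar, $upperZ, $lowerZ, $upperA);
```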

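The \G assertion is true only when the current matching position is at the start point of the match, as specified by the offset argument of preg_match(). It differs from \A when the value of offset is non-zero. Translator's note: another difference from \A shows up with preg_match_all(). For each match, \G only asserts that the current position is where that match attempt starts, whereas \A asserts that the match starts at the beginning of the subject string.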
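A small sketch of \G versus \A, assuming a made-up subject string and using the fifth (offset) argument of preg_match():

```php
<?php
$subject = 'abc123';

// With offset 3 the match attempt starts at "1". \G asserts exactly that
// starting position, so the match succeeds.
$withG = preg_match('/\G\d+/', $subject, $m, 0, 3);

// \A still means the true start of the subject (offset 0), so this fails.
$withA = preg_match('/\A\d+/', $subject, $n, 0, 3);

// In preg_match_all(), \G ties each match to the point where the previous
// one ended, so matching stops at the first character that breaks the chain.
$count = preg_match_all('/\G\d/', '12a34', $chain);

var_dump($withG, $m[0], $withA, $count, $chain[0]);
```

In the last call only "1" and "2" are found; "3" and "4" are digits too, but they do not start where the previous match ended.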

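\Q and \E can be used to ignore regular expression metacharacters in the pattern. For example, \w+\Q.$.\E$ matches one or more word characters, followed by the literal text .$., anchored at the end of the string. Note that this does not change the behavior of delimiters; for instance, the pattern #\Q#\E#$ is not valid, because the second # marks the end of the pattern, and \E# is then interpreted as invalid modifiers.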
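The following sketch exercises \Q...\E with illustrative subjects. The last line uses preg_quote(), a standard PHP function that the text above does not mention, as an alternative way of quoting the same literal text.

```php
<?php
// \Q...\E quotes ".$." so the dots and the dollar are literal; the final $
// is still an end-of-string anchor.
$quoted = preg_match('/^\w+\Q.$.\E$/', 'price.$.');
$other  = preg_match('/^\w+\Q.$.\E$/', 'priceX$X');

// The delimiter stays special inside \Q...\E, so choose a delimiter that
// does not occur in the quoted text ("~" here instead of "#").
$hash = preg_match('~\Q#\E$~', 'tag#');

// Alternative (not from the text above): preg_quote() escapes the
// metacharacters, and the delimiter passed as its second argument.
$viaQuote = preg_match('/^\w+' . preg_quote('.$.', '/') . '$/', 'price.$.');

var_dump($quoted, $other, $hash, $viaQuote);
```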

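\K can be used to reset the match start. For example, the pattern foo\Kbar matches "foobar", but reports that it has matched "bar". The use of \K does not interfere with the setting of captured subgroups: (foo)\Kbar matches "foobar", and the first subgroup is still set to "foo". Translator's note: \K has the same effect whether it is placed inside or outside a subgroup.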
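A brief sketch of \K in practice. The preg_replace() call is an added illustration of a typical use and is not taken from the text above.

```php
<?php
// \K resets the reported start of the match: "foo" must be present, but
// only "bar" is returned as the overall match.
preg_match('/foo\Kbar/', 'foobar', $m);

// Subgroups captured before \K keep their contents.
preg_match('/(foo)\Kbar/', 'foobar', $g);

// Because only the text after \K counts as the match, a replacement
// leaves the part before \K untouched.
$result = preg_replace('/foo\Kbar/', 'BAZ', 'foobar');

var_dump($m[0], $g[0], $g[1], $result);
```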


Wirek ¶

7 years ago


    
     Significantly updated version (with new $pat4 utilising \R properly, its results and comments):
     

     Note that there are (sometimes difficult to grasp at first glance) nuances of meaning and application of escape sequences like \r, \R and \v - none of them is perfect in all situations, but they are quite useful nevertheless. Some official PCRE control options and their changes come in handy too - unfortunately neither (*ANYCRLF), (*ANY) nor (*CRLF) is documented here on php.net at the moment (although they seem to be available for over 10 years and 5 months now), but they are described on Wikipedia ("Newline/linebreak options" at https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) and official PCRE library site ("Newline convention" at http://www.pcre.org/original/doc/html/pcresyntax.html#SEC17) pretty well. The functionality of \R appears somehow disappointing (with default configuration of compile time option) according to php.net as well as official description ("Newline sequences" at https://www.pcre.org/original/doc/html/pcrepattern.html#newlineseq) when used improperly.
     

     A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).
     

     
     <?php
     // Various OSes use different end-of-line (a.k.a. line break) characters:
     // - Windows uses CR+LF (\r\n);
     // - Linux uses LF (\n);
     // - classic Mac OS uses CR (\r).
     // And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
     $str = "ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
     //            C          3                   p          0                   _
     $pat1 = '/\w$/mi';           // This works excellent in JavaScript (Firefox 7.0.1+)
     $pat2 = '/\w\r?$/mi';        // Slightly better
     $pat3 = '/\w\R?$/mi';        // Somehow disappointing according to php.net and pcre.org when used improperly
     $pat4 = '/\w(?=\R)/i';       // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected
     $pat5 = '/\w\v?$/mi';
     $pat6 = '/(*ANYCRLF)\w$/mi'; // Excellent but undocumented on php.net at the moment (described on pcre.org and en.wikipedia.org)
     $n = preg_match_all($pat1, $str, $m1);
     $o = preg_match_all($pat2, $str, $m2);
     $p = preg_match_all($pat3, $str, $m3);
     $r = preg_match_all($pat4, $str, $m4);
     $s = preg_match_all($pat5, $str, $m5);
     $t = preg_match_all($pat6, $str, $m6);
     echo $str
         . "\n1 !!! $pat1 ($n): " . print_r($m1[0], true)
         . "\n2 !!! $pat2 ($o): " . print_r($m2[0], true)
         . "\n3 !!! $pat3 ($p): " . print_r($m3[0], true)
         . "\n4 !!! $pat4 ($r): " . print_r($m4[0], true)
         . "\n5 !!! $pat5 ($s): " . print_r($m5[0], true)
         . "\n6 !!! $pat6 ($t): " . print_r($m6[0], true);
     // Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 and $pat4 (\R), $pat5 (\v) and altered newline option in $pat6 ((*ANYCRLF)) - for some applications at least.

     /* The code above results in the following output:
     ABC ABC

     123 123
     def def
     nop nop
     890 890
     QRS QRS

     ~-_ ~-_
     1 !!! /\w$/mi (3): Array
     (
         [0] => C
         [1] => 0
         [2] => _
     )

     2 !!! /\w\r?$/mi (5): Array
     (
         [0] => C
         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     3 !!! /\w\R?$/mi (5): Array
     (
         [0] => C

         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     4 !!! /\w(?=\R)/i (6): Array
     (
         [0] => C
         [1] => 3
         [2] => f
         [3] => p
         [4] => 0
         [5] => S
     )

     5 !!! /\w\v?$/mi (5): Array
     (
         [0] => C

         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     6 !!! /(*ANYCRLF)\w$/mi (7): Array
     (
         [0] => C
         [1] => 3
         [2] => f
         [3] => p
         [4] => 0
         [5] => S
         [6] => _
     )
     */
     ?>
      

     
     Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

Wirek ¶

7 years ago


    
     Note that there are (sometimes difficult to grasp at first glance) nuances of meaning and application of escape sequences like \r, \R and \v - none of them is perfect in all situations, but they are quite useful nevertheless. Some official PCRE control options and their changes come in handy too - unfortunately neither (*ANYCRLF), (*ANY) nor (*CRLF) is documented here on php.net at the moment (although they seem to be available for over 10 years and 5 months now), but they are described on Wikipedia ("Newline/linebreak options" at https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) and official PCRE library site ("Newline convention" at http://www.pcre.org/original/doc/html/pcresyntax.html#SEC17) pretty well. The functionality of \R appears somehow disappointing (with default configuration of compile time option) according to php.net as well as official description ("Newline sequences" at https://www.pcre.org/original/doc/html/pcrepattern.html#newlineseq).
     

     

     A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).
     

     
     <?php
     // Various OSes use different end-of-line (a.k.a. line break) characters:
     // - Windows uses CR+LF (\r\n);
     // - Linux uses LF (\n);
     // - classic Mac OS uses CR (\r).
     // And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
     $str = "ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
     //            C          3                   p          0                   _
     $pat1 = '/\w$/mi';           // This works excellent in JavaScript (Firefox 7.0.1+)
     $pat2 = '/\w\r?$/mi';
     $pat3 = '/\w\R?$/mi';        // Somehow disappointing according to php.net and pcre.org
     $pat4 = '/\w\v?$/mi';
     $pat5 = '/(*ANYCRLF)\w$/mi'; // Excellent but undocumented on php.net at the moment
     $n = preg_match_all($pat1, $str, $m1);
     $o = preg_match_all($pat2, $str, $m2);
     $p = preg_match_all($pat3, $str, $m3);
     $r = preg_match_all($pat4, $str, $m4);
     $s = preg_match_all($pat5, $str, $m5);
     echo $str
         . "\n1 !!! $pat1 ($n): " . print_r($m1[0], true)
         . "\n2 !!! $pat2 ($o): " . print_r($m2[0], true)
         . "\n3 !!! $pat3 ($p): " . print_r($m3[0], true)
         . "\n4 !!! $pat4 ($r): " . print_r($m4[0], true)
         . "\n5 !!! $pat5 ($s): " . print_r($m5[0], true);
     // Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 (\R), $pat4 (\v) and altered newline option in $pat5 ((*ANYCRLF)) - for some applications at least.

     /* The code above results in the following output:
     ABC ABC

     123 123
     def def
     nop nop
     890 890
     QRS QRS

     ~-_ ~-_
     1 !!! /\w$/mi (3): Array
     (
         [0] => C
         [1] => 0
         [2] => _
     )

     2 !!! /\w\r?$/mi (5): Array
     (
         [0] => C
         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     3 !!! /\w\R?$/mi (5): Array
     (
         [0] => C

         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     4 !!! /\w\v?$/mi (5): Array
     (
         [0] => C

         [1] => 3
         [2] => p
         [3] => 0
         [4] => _
     )

     5 !!! /(*ANYCRLF)\w$/mi (7): Array
     (
         [0] => C
         [1] => 3
         [2] => f
         [3] => p
         [4] => 0
         [5] => S
         [6] => _
     )
     */
     ?>
      

     
     Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.
