Recursion in Perl Regular Expressions

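Today I came across a thread on chinaunix asking:

Taking the outermost parentheses as level 1, how can one pair of level-2 parentheses be removed while all other parentheses are kept?

For example: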

    (((1,2),3),4)   =>  ((1,3,4)
    ((1,(3,4))      =>  ((1,4)
                        or
                        (1,2,4))
    (1,(2,4)))      =>  (1,4))
   
Solution 1:



Solution 2:
    

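    $str =~ /
        (\()                  # group 1: the left parenthesis
        (?=                   # the whole thing is one lookahead, so the 1st successful match starts at the 1st left parenthesis, the 2nd at the 2nd left parenthesis, and so on
            (                 # group 2: the contents inside the parentheses plus group 3
                (?:           # non-capturing group
                    [^()]     # either a character that is not a parenthesis
                    |
                    (?1)(?2)  # or a recursion into group 1 followed by group 2
                )+
                (\))          # group 3: the right parenthesis
            )
        )
    /xg;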
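
To see what this pattern actually captures, here is a minimal sketch that runs it over one of the sample strings. The helper name `balanced_groups` is hypothetical (it is not from the original thread), and the sketch only lists the balanced groups the pattern finds; it does not perform the removal the thread asked about.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every balanced "(...)" substring that the
# pattern finds, in order of its opening parenthesis.
sub balanced_groups {
    my ($str) = @_;
    my @found;
    while ($str =~ /
        (\()                  # group 1: a left parenthesis
        (?=                   # lookahead: only the left parenthesis is consumed
            (                 # group 2: contents plus the closing parenthesis
                (?:
                    [^()]     # a single non-parenthesis character
                    |
                    (?1)(?2)  # or a nested group: group 1, then group 2
                )+
                (\))          # group 3: the closing parenthesis
            )
        )
    /xg) {
        push @found, $1 . $2;   # left parenthesis + rest of the balanced group
    }
    return @found;
}

print "$_\n" for balanced_groups('(((1,2),3),4)');
```

Because only the left parenthesis is consumed, each later match starts at the next left parenthesis, so the loop prints `(((1,2),3),4)`, then `((1,2),3)`, then `(1,2)`.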


----------------------------------------


http://perldoc.perl.org/perlre.html describes the regular expression features introduced in Perl 5.10 and later:

     (?PARNO) (?-PARNO) (?+PARNO) (?R) (?0)

Similar to (??{ code }) except that it does not involve compiling any code; instead, it treats the contents of a capture buffer as an independent pattern that must match at the current position. Capture buffers contained by the pattern will have the value as determined by the outermost recursion.

PARNO is a sequence of digits (not starting with 0) whose value reflects the paren-number of the capture buffer to recurse to. (?R) recurses to the beginning of the whole pattern. (?0) is an alternate syntax for (?R). If PARNO is preceded by a plus or minus sign, then it is assumed to be relative, with negative numbers indicating preceding capture buffers and positive ones following. Thus (?-1) refers to the most recently declared buffer, and (?+1) indicates the next buffer to be declared. Note that the counting for relative recursion differs from that of relative backreferences, in that with recursion unclosed buffers are included.
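
As a rough illustration of these forms (a sketch of my own, not taken from perlre), the three patterns below match the same balanced group using whole-pattern, absolute, and relative recursion, and a fourth uses (?+1) to reuse a group declared later. The helper `first_capture` and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return group 1 of the first match, or undef.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# Three ways to recurse into the same balanced-parentheses group.
my @forms = (
    [ '(?R)',  qr/(\((?:[^()]++|(?R))*\))/  ],  # the whole pattern
    [ '(?1)',  qr/(\((?:[^()]++|(?1))*\))/  ],  # absolute: group 1
    [ '(?-1)', qr/(\((?:[^()]++|(?-1))*\))/ ],  # relative: group 1 is still open, so it counts
);

for my $form (@forms) {
    my ($label, $re) = @$form;
    printf "%-6s %s\n", $label, first_capture($re, 'x(a(b)c)y');
}

# (?+1) reuses the next group to be declared: here the \d+ of group 1.
my $pair = qr/^(?+1)=(\d+)$/;
print first_capture($pair, '12=34'), "\n";
```

Each of the first three lines reports `(a(b)c)`; the last prints `34`, because the text matched inside the (?+1) recursion is not kept as a capture.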

The following pattern matches a function foo() which may contain balanced parentheses as the argument.

  
  
    $re = qr{ (                     # paren group 1 (full function)
                foo
                (                   # paren group 2 (parens)
                  \(
                  (                 # paren group 3 (contents of parens)
                    (?:
                      (?> [^()]+ )  # Non-parens without backtracking
                    |
                      (?2)          # Recurse to start of paren group 2
                    )*
                  )
                  \)
                )
              )
            }x;

If the pattern was used as follows

  
  
    'foo(bar(baz)+baz(bop))' =~ /$re/
        and print "\$1 = $1\n",
                  "\$2 = $2\n",
                  "\$3 = $3\n";

the output produced should be the following:

  
  
    $1 = foo(bar(baz)+baz(bop))
    $2 = (bar(baz)+baz(bop))
    $3 = bar(baz)+baz(bop)

If there is no corresponding capture buffer defined, then it is a fatal error. Recursing deeper than 50 times without consuming any input string will also result in a fatal error. The maximum depth is compiled into perl, so changing it requires a custom build.
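
The first of these errors can be observed safely by compiling the pattern at run time inside `eval`. This is a small sketch; the helper `compile_error` is a hypothetical name, and the exact wording of the error message is left to perl.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern string at run time and return the
# error text, or an empty string if it compiled cleanly.
sub compile_error {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? '' : $@;
}

# (?1) with a group 1 to recurse into compiles fine.
print "with group 1:    ", (compile_error('(a)(?1)') eq '' ? "ok" : "error"), "\n";

# (?1) with no capture group at all is rejected when the pattern is compiled.
print "without group 1: ", (compile_error('(?1)') eq '' ? "ok" : "error"), "\n";
```

Note that a literal `/(?1)/` in source code would fail when the script itself is compiled, which is why the sketch builds the pattern from a string.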

The following shows how using negative indexing can make it easier to embed recursive patterns inside of a qr// construct for later use:

  
  
    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
        # do something here...
    }
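
A runnable variant of the snippet above, as a sketch: the sub `split_operands` and the sample input are hypothetical. It shows that each embedded copy of `$parens` recurses into its own group, because (?-1) is resolved relative to where it appears in the combined pattern.

```perl
use strict;
use warnings;

# Balanced parentheses; (?-1) recurses into the enclosing capture group,
# whatever number that group ends up with after interpolation.
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

# Hypothetical helper: return the two parenthesised operands, or an empty list.
sub split_operands {
    my ($text) = @_;
    return $text =~ /foo $parens \s+ \+ \s+ bar $parens/x ? ($1, $2) : ();
}

my @operands = split_operands('foo(a(b)) + bar(c(d))');
print "$_\n" for @operands;
```

This prints `(a(b))` and `(c(d))`. Had the pattern used the absolute form (?1), the second copy would have recursed into the first copy's group instead of its own.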

Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group; in PCRE and Python the recursed-into group is treated as atomic. Also, modifiers are resolved at compile time, so constructs like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will be processed.
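
The point about modifiers can be checked with a short sketch. The patterns and the helper `is_match` are my own illustration, not from perlre: group 1 is compiled case-sensitively, so recursing into it from inside (?i:...) still matches case-sensitively.

```perl
use strict;
use warnings;

# Group 1 is compiled without "i"; wrapping the recursion in (?i:...) does
# not recompile it, so the recursed copy stays case-sensitive.
my $recursed = qr/^(abc)(?i:(?1))$/;

# For contrast: writing the sub-pattern out again inside (?i:...) does
# make that copy case-insensitive.
my $inlined = qr/^(abc)(?i:abc)$/;

# Hypothetical helper: 1 if the text matches, 0 otherwise.
sub is_match {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

print "recursed, abcabc: ", is_match($recursed, 'abcabc'), "\n";
print "recursed, abcABC: ", is_match($recursed, 'abcABC'), "\n";
print "inlined,  abcABC: ", is_match($inlined,  'abcABC'), "\n";
```

The recursed form accepts `abcabc` but rejects `abcABC`, while the inlined form accepts `abcABC`.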
