
Python: return the indices of substrings, searching from the start, when substrings repeat

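I have a long string and a list of substrings of that string.

Example 1:

String:

"This is paragraph one"

Substring list: ["This", "is paragraph", "one"]

I need to return the indices of each substring as [start, end] pairs, where end is exclusive.

Result: [[0, 4], [5, 17], [18, 21]]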

Example 2 (there may be extra whitespace, and substrings may repeat):

String:

"This is a book       a book."

Substring list: ["This", "is a", "book", "a", "book"]

Result: [[0, 4], [5, 9], [10, 14], [21, 22], [23, 27]]

Solution

You can try the following:

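s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]

bounds = []
end = 0  # each search starts where the previous match ended
for sub in subs:
    start = s.find(sub, end)  # -1 if sub is missing (not handled here)
    end = start + len(sub)
    bounds.append((start, end))
print(bounds)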

This gives:

[(0, 4), (5, 9), (10, 14), (21, 22), (23, 27)]
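
The loop above assumes every substring is present. `str.find` returns -1 when it is not, and the bounds after that point would silently be wrong. A minimal sketch of a reusable version follows; the function name `find_bounds` and the choice to raise `ValueError` are assumptions for illustration, not part of the original answer.

```python
def find_bounds(s, subs):
    """Return a (start, end) pair for each substring, searching left to right."""
    bounds = []
    end = 0  # the next search starts where the previous match ended
    for sub in subs:
        start = s.find(sub, end)
        if start == -1:
            # Assumed policy: fail loudly instead of recording a bogus span
            raise ValueError(f"{sub!r} not found at or after index {end}")
        end = start + len(sub)
        bounds.append((start, end))
    return bounds


print(find_bounds("This is paragraph one", ["This", "is paragraph", "one"]))
# [(0, 4), (5, 17), (18, 21)]
```

Because each search starts at the end of the previous match, the "is" inside "This" is never picked up by mistake.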

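For fun, the same using re:

import re

s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]

# One capture group per substring, joined by ".*"; regs[0] is the whole match
print(re.match(".*".join(f"({t})" for t in subs), s).regs[1:])

This gives:

((0, 4), (5, 9), (10, 14), (21, 22), (23, 27))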
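
The one-liner reads `.regs`, which as far as I know is not part of the documented `re` API, and it would misbehave if a substring contained regex metacharacters such as "." or "(". A hedged sketch of a more defensive variant is below; the helper name `regex_bounds` and the decision to return `None` on failure are assumptions, not something the original answer specifies.

```python
import re


def regex_bounds(s, subs):
    # re.escape keeps characters such as "." or "(" in a substring literal.
    # ".*?" is lazy, so each gap prefers to be as short as possible;
    # re.DOTALL lets a gap span line breaks.
    pattern = ".*?".join(f"({re.escape(t)})" for t in subs)
    m = re.search(pattern, s, flags=re.DOTALL)
    if m is None:
        return None  # assumed policy: no match for the whole sequence
    # Match.span(i) is the documented way to read group i's (start, end)
    return [m.span(i) for i in range(1, len(subs) + 1)]


s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]
print(regex_bounds(s, subs))
# [(0, 4), (5, 9), (10, 14), (21, 22), (23, 27)]
```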

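You can also use a generator function (the walrus operator requires Python 3.8+):

def get_matches(s, subs):
    end = 0  # end of the previous match; the next match must start at or after it
    for sub in subs:
        # all positions at or after `end` where sub occurs; the first one wins
        if (k := [j for j in range(end, len(s)) if s.startswith(sub, j)]):
            yield [k[0], k[0] + len(sub)]
            end = k[0] + len(sub)
        # a substring that is not found is skipped silently

s = 'This is a book       a book.'
subs = ['This', 'is a', 'book', 'a', 'book']
print(list(get_matches(s, subs)))

Output:

[[0, 4], [5, 9], [10, 14], [21, 22], [23, 27]]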
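
Whichever approach is used, it is cheap to sanity-check the result by slicing the string back with the returned bounds. The sketch below is only an illustration; `check_bounds` is a hypothetical helper name, not something from the answers above.

```python
def check_bounds(s, subs, bounds):
    """True if every span slices back to its substring and spans do not go backwards."""
    prev_end = 0
    for sub, (start, end) in zip(subs, bounds):
        if s[start:end] != sub or start < prev_end:
            return False
        prev_end = end
    # A length mismatch means some substring was skipped
    return len(subs) == len(bounds)


s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]
print(check_bounds(s, subs, [[0, 4], [5, 9], [10, 14], [21, 22], [23, 27]]))  # True
print(check_bounds(s, subs, [[0, 4], [5, 9], [10, 14], [8, 9], [23, 27]]))    # False
```

The second call fails because [8, 9] does slice to "a", but it lies before the end of the preceding "book" match.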
