Python: return substring indices from the beginning of a string, with duplicate substrings

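I have a long string and a list of substrings of that string, listed in the order they appear.

Example 1:

String:

"This is paragraph one"

Substring list: ["This", "is paragraph", "one"]

I need to return the [start, end] indices of each substring (end exclusive).

Result: [[0, 4], [5, 17], [18, 21]]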

Example 2: (there may be extra whitespace, and substrings may repeat)

String:

"This is a book       a book."

Substring list: ["This", "is a", "book", "a", "book"]

Result: [[0, 4], [5, 9], [10, 14], [21, 22], [23, 27]]

Solution

You can try the following:

s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]

bounds = []
end = 0  # the search resumes where the previous match ended
for sub in subs:
    start = s.find(sub, end)  # first occurrence at or after `end`
    end = start + len(sub)
    bounds.append((start, end))
print(bounds)

This gives:

[(0, 4), (5, 9), (10, 14), (21, 22), (23, 27)]
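
If a substring might be missing, `find` returns -1 and the loop above silently records a bogus index. Here is a minimal sketch of the same idea wrapped in a function that raises instead. The name `find_spans` and the choice of `ValueError` are my own assumptions, not part of the original answer:

```python
def find_spans(s, subs):
    """Return [start, end] (end exclusive) for each substring, searched left to right."""
    spans = []
    pos = 0  # index where the next search starts
    for sub in subs:
        start = s.find(sub, pos)  # str.find returns -1 when there is no match
        if start == -1:
            raise ValueError(f"{sub!r} not found at or after index {pos}")
        pos = start + len(sub)
        spans.append([start, pos])
    return spans


print(find_spans("This is paragraph one", ["This", "is paragraph", "one"]))
```

Because each search starts at the end of the previous match, a repeated substring such as the second "book" resolves to its later occurrence rather than the first one.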

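For fun, the same with re:

import re

s = "This is a book       a book."
subs = ["This", "is a", "book", "a", "book"]

# One capture group per substring, joined by ".*"; regs[0] is the whole match, so skip it.
print(re.match(".*".join(f"({t})" for t in subs), s).regs[1:])

This gives:

((0, 4), (5, 9), (10, 14), (21, 22), (23, 27))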
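
The regex version treats each substring as a pattern, so characters like "." or "+" would be interpreted as regex syntax, and a greedy ".*" can skip ahead to a later occurrence. Here is a hedged sketch that escapes each substring, uses the non-greedy ".*?", and reads positions through `Match.span`. The function name `regex_spans` and returning `None` on failure are my own choices:

```python
import re


def regex_spans(s, subs):
    """Return a (start, end) tuple per substring, or None if they do not all occur in order."""
    # re.escape makes each substring match literally; ".*?" prefers the earliest occurrence.
    pattern = ".*?".join(f"({re.escape(t)})" for t in subs)
    # re.DOTALL lets "." cross newlines, in case the text spans several lines.
    m = re.search(pattern, s, flags=re.DOTALL)
    if m is None:
        return None
    return [m.span(i) for i in range(1, len(subs) + 1)]


print(regex_spans("This is a book       a book.", ["This", "is a", "book", "a", "book"]))
```

For long lists of substrings, the plain `str.find` loop is probably easier to reason about. The regex form is mostly a curiosity.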

You can also use a generator function:

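def get_matches(s, subs):
    end = 0  # end of the previous match; the next match may not start before it
    for sub in subs:
        # All positions at or after `end` where `sub` occurs.
        k = [j for j in range(end, len(s)) if s.startswith(sub, j)]
        if k:
            yield [k[0], k[0] + len(sub)]
            end = k[0] + len(sub)

s = 'This is a book       a book.'
subs = ['This', 'is a', 'book', 'a', 'book']
print(list(get_matches(s, subs)))  # [[0, 4], [5, 9], [10, 14], [21, 22], [23, 27]]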