Capturing repeating subpatterns in Python regex

While matching an email address, after I match something like yasar@webmail, I want to capture one or more occurrences of (\.\w+) (what I am doing is a bit more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only contains .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?
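A minimal reproduction of the behaviour described above, assuming the intended pattern is `(\.\w+)+` placed after a simplified `name@host` prefix (the prefix pattern here is only an illustration, not the real, more complicated one):

```python
import re

address = "yasar@webmail.something.edu.tr"

# The capturing group sits inside the repeat, so each iteration
# overwrites the previous capture and only the last one survives.
m = re.match(r"[\w.]+@\w+(\.\w+)+", address)
last_only = m.group(1)
print(last_only)  # .tr
```

The whole address still matches; only the intermediate captures (.something and .edu) are not kept by the group.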

asked Mar 19 '12 at 4:09

yasar


Capturing repeated expressions was proposed in Python Issue 7132 but rejected. It is however supported by the third-party regex module.
– Todd Owen
Oct 15 '18 at 0:27

@ToddOwen But, isn't this now possible in 2.7? I don't know when it became possible. But, the answer from stackoverflow.com/a/9765037/3541976 seems to work just fine for me in 2.7 using the re module.
– Michael Ohlrogge
Nov 25 '18 at 0:22

@MichaelOhlrogge Issue 7132 is about what happens if the capturing parentheses are inside a repeat. The issue is not fixed, and will still only keep the last match. A possible workaround, as mentioned in the answer you linked to, is to put the capturing parentheses around a repeating pattern. (Note that (?: ...) are not capturing parentheses).
– Todd Owen
Nov 28 '18 at 21:36

@ToddOwen Got it, thank you, that is a helpful clarification!
– Michael Ohlrogge
Nov 29 '18 at 1:03

python regex

4 Answers

The re module doesn't support repeated captures, but the third-party regex module does:

>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']

In your case I'd go with splitting the repeated subpatterns later. It leads to simple and readable code; see, e.g., the code in @Li-aung Yip's answer.

edited May 23 '17 at 12:09

Community♦

answered Mar 19 '12 at 5:22

jfs


Out of curiosity, how do you write a replacement pattern when you match repeated captures? Does the meaning of \1, \2, \3 etc. change depending on how many times you matched (\.\w+)?
– Li-aung Yip
Mar 19 '12 at 7:55

@Li-aung Yip: \1 corresponds to m.group(1); the meaning hasn't changed. You could use a function as a replacement pattern and call m.captures() in it.
– jfs
Mar 19 '12 at 9:03

In your example, the meaning of \1, \2, and \3 is obvious because they only capture once. But what is the meaning of \4, corresponding to (\.\w+)+? \4 appears to be "the last substring matched by the 4th capture group", in this case .tr.
– Li-aung Yip
Mar 19 '12 at 9:12

@Li-aung Yip: m.groups() above explicitly shows what \4 is.
– jfs
Mar 19 '12 at 9:13

The meaning hasn't changed: \4 is m.group(4), whatever it is.
– jfs
Mar 19 '12 at 9:21
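The "function as a replacement" idea from these comments can also be sketched with the standard re module alone, which has no m.captures(): wrap the repeat in one outer group and split it inside the function. The pattern, the name reverse_labels, and the label-reversing behaviour are all hypothetical, chosen only to make the result visible.

```python
import re

# Group 1 is the local part, group 2 the first label,
# group 3 wraps the whole repeated run.
EMAIL = re.compile(r"([\w.]+)@(\w+)((?:\.\w+)+)")

def reverse_labels(m):
    """Hypothetical replacement: reverse the order of the trailing labels."""
    # ".something.edu.tr" -> ["something", "edu", "tr"]
    labels = m.group(3).split(".")[1:]
    return "{}@{}.{}".format(m.group(1), m.group(2), ".".join(reversed(labels)))

result = EMAIL.sub(reverse_labels, "mail yasar@webmail.something.edu.tr now")
print(result)  # mail yasar@webmail.tr.edu.something now
```

Inside the function, the numbered groups keep their usual meaning; the individual repeats are recovered by splitting group 3 rather than by a backreference.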


This will work:

>>> import re
>>> regexp = r"[\w.]+@(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
>>> email_address = "william.adama@galactica.caprica.fleet.mil"
>>> m = re.match(regexp, email_address)
>>> m.groups()
('galactica', '.caprica', '.fleet', '.mil', None, None)

But it's limited to a maximum of six subgroups. A better way to do this would be:

>>> m = re.match(r"[\w.]+@(.+)", email_address)
>>> m.groups()
('galactica.caprica.fleet.mil',)
>>> m.group(1).split('.')
['galactica', 'caprica', 'fleet', 'mil']

Note that regexps are fine so long as the email addresses are simple, but there are all kinds of addresses that this will break on. See this question for a detailed treatment of email address regexes.
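As a rough sketch, the match-then-split approach can be wrapped in a small helper that also handles the no-match case. The function name domain_labels and the slightly stricter pattern (at least one dot, anchored at the end) are assumptions for illustration, and it is still only suitable for simple addresses.

```python
import re

def domain_labels(address):
    """Return the dot-separated labels after '@', or None if there is no match."""
    m = re.match(r"[\w.]+@(\w+(?:\.\w+)+)$", address)
    if m is None:
        return None
    return m.group(1).split(".")

print(domain_labels("william.adama@galactica.caprica.fleet.mil"))
print(domain_labels("not an address"))
```

Returning None keeps the caller from calling .group() on a failed match, which would otherwise raise AttributeError.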

edited May 23 '17 at 11:46

Community♦

answered Mar 19 '12 at 4:50

Li-aung Yip


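You can fix the problem of (\.\w+)+ only capturing the last match by doing this instead: ((?:\.\w+)+). The outer group then captures the whole repeated run (for example .something.edu.tr), while the inner (?:...) group does not capture at all.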
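A short sketch of how this might be used, assuming a simplified `name@host` prefix in front of the group; the second step with re.findall is an addition for illustration, to get the individual pieces back out of the single capture:

```python
import re

address = "yasar@webmail.something.edu.tr"

# Group 1 wraps the whole repetition; the inner (?:...) group does not capture.
m = re.match(r"[\w.]+@\w+((?:\.\w+)+)", address)
whole = m.group(1)                    # '.something.edu.tr'
parts = re.findall(r"\.\w+", whole)   # ['.something', '.edu', '.tr']
print(whole, parts)
```

Because findall only runs on text that the full pattern has already accepted, it cannot pick up dotted words from elsewhere in the input.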

answered Mar 19 '12 at 4:28

Taymon


For abbreviations (if you've lower-cased): re.sub(r'((?:[a-z]\.){2,})', lambda m: m.group(1).replace('.', ''), text)
– scharfmn
Aug 15 '15 at 9:58


Thanks. Adding parentheses allowed me to match a repeated subpattern, but then there was a group in the match containing only the last repetition of the pattern. I hadn't seen that (?: ...) makes a non-capturing group. docs.python.org/2/library/re.html#regular-expression-syntax Adding that fixes the problem.
– Tim Swast
Jul 21 '16 at 22:22

Thank you @TimSwast, this was exactly the comment and reference I needed!
– Michael Ohlrogge
Nov 24 '18 at 18:00


This is what you are looking for:

>>> import re
>>> s = "yasar@webmail.something.edu.tr"
>>> r = re.compile(r"\.\w+")
>>> m = r.findall(s)
>>> m
['.something', '.edu', '.tr']

answered Oct 4 '17 at 18:22

Tushar Vazirani


This doesn't match the yasar@webmail part. As such, it could easily pick up false positives wherever something other than an email address contains multiple period-separated words.
– Michael Ohlrogge
Nov 24 '18 at 18:07

OP has clearly written that this is just an example and what he is trying to do is more complicated. Hence, my answer.
– Tushar Vazirani
Nov 24 '18 at 18:09

Yes, but the problem is that your solution won't work even on the simplified version of the problem OP gave. Your solution is trivially simple for anyone with even the most basic understanding of RegEx. All other answers are more complicated because this is a genuinely non-trivial problem to solve.
– Michael Ohlrogge
Nov 24 '18 at 18:31
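One possible way to address the false-positive concern while still collecting every repetition with the standard re module is to match the prefix first and then step through the repeats with anchored matches. This is only a sketch: the names PREFIX, PART and repeated_parts are hypothetical, and the prefix pattern is the simplified one from the question.

```python
import re

PREFIX = re.compile(r"[\w.]+@\w+")
PART = re.compile(r"\.\w+")

def repeated_parts(text):
    """Collect every '.word' that directly follows a 'name@host' prefix."""
    head = PREFIX.match(text)
    if head is None:
        return []          # no prefix, so stray dotted text is ignored
    parts, pos = [], head.end()
    while True:
        m = PART.match(text, pos)   # anchored at pos, unlike findall
        if m is None:
            break
        parts.append(m.group())
        pos = m.end()
    return parts

print(repeated_parts("yasar@webmail.something.edu.tr"))
print(repeated_parts("version 1.2.3 of foo.bar"))
```

Unlike a bare findall, each piece has to start exactly where the previous one ended, so the loop stops at the first character that does not continue the repeated pattern.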
