Shortest unique combination of at least three characters for each string in list
I want to find the shortest unique combination of characters for each element in a list of strings. Each combination should consist of the string's first character and its two rarest characters at least (more if necessary) and order matters. If a character appears more than once in one string, it should get more weight.
Consider the following example:
liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"
for apple
, both p
and e
appear 4 times overall, but since p
appears twice in apple
, it should be used in the combination.
What is the most efficient way to write this logic in python?
python combinatorics
add a comment |
I want to find the shortest unique combination of characters for each element in a list of strings. Each combination should consist of the string's first character and its two rarest characters at least (more if necessary) and order matters. If a character appears more than once in one string, it should get more weight.
Consider the following example:
liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"
for apple
, both p
and e
appear 4 times overall, but since p
appears twice in apple
, it should be used in the combination.
What is the most efficient way to write this logic in python?
python combinatorics
1
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
In all strings together, bothp
ande
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).
– iuvbio
Nov 25 '18 at 16:34
Why'l'
instead of'h'
in xylophone?
– Daniel Mesejo
Nov 25 '18 at 16:46
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48
add a comment |
I want to find the shortest unique combination of characters for each element in a list of strings. Each combination should consist of the string's first character and its two rarest characters at least (more if necessary) and order matters. If a character appears more than once in one string, it should get more weight.
Consider the following example:
liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"
for apple
, both p
and e
appear 4 times overall, but since p
appears twice in apple
, it should be used in the combination.
What is the most efficient way to write this logic in python?
python combinatorics
I want to find the shortest unique combination of characters for each element in a list of strings. Each combination should consist of the string's first character and its two rarest characters at least (more if necessary) and order matters. If a character appears more than once in one string, it should get more weight.
Consider the following example:
liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"
for apple
, both p
and e
appear 4 times overall, but since p
appears twice in apple
, it should be used in the combination.
What is the most efficient way to write this logic in python?
python combinatorics
python combinatorics
edited Nov 25 '18 at 16:48
iuvbio
asked Nov 25 '18 at 16:21
iuvbioiuvbio
359
359
1
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
In all strings together, bothp
ande
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).
– iuvbio
Nov 25 '18 at 16:34
Why'l'
instead of'h'
in xylophone?
– Daniel Mesejo
Nov 25 '18 at 16:46
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48
add a comment |
1
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
In all strings together, bothp
ande
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).
– iuvbio
Nov 25 '18 at 16:34
Why'l'
instead of'h'
in xylophone?
– Daniel Mesejo
Nov 25 '18 at 16:46
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48
1
1
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
In all strings together, both
p
and e
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).– iuvbio
Nov 25 '18 at 16:34
In all strings together, both
p
and e
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).– iuvbio
Nov 25 '18 at 16:34
Why
'l'
instead of 'h'
in xylophone?– Daniel Mesejo
Nov 25 '18 at 16:46
Why
'l'
instead of 'h'
in xylophone?– Daniel Mesejo
Nov 25 '18 at 16:46
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48
add a comment |
1 Answer
1
active
oldest
votes
You could do something like this:
import heapq
from collections import Counter
from operator import itemgetter
def combination(word, n, counts):
word_count = Counter(word)
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
top = heapq.nlargest(n, elements)
characters = map(itemgetter(3), top)
return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))
lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
counts = Counter(''.join(lst))
result = [combination(w, 2, counts) for w in lst]
print(result)
Output
['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']
The idea is to create a tuple of the priority criterias representing each unique letter. So elements
is a list containing tuples represeting:
counts[c]
: The overall count (as you want the rarest multiply by -1)
word_count[c]
: The specific count of the letter in the word
i
: represents the firs position of the letter
c
: the letter itself.
Once you create the list elements
with:
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
Note that as the characters must be unique we use a set (seen
) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n
elements according to the above criterias.
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit onelements.append((-1 * counts[c], word_count[c], i, c))
and whatheapq.nlargest(n, elements)
does?
– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469457%2fshortest-unique-combination-of-at-least-three-characters-for-each-string-in-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You could do something like this:
import heapq
from collections import Counter
from operator import itemgetter
def combination(word, n, counts):
word_count = Counter(word)
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
top = heapq.nlargest(n, elements)
characters = map(itemgetter(3), top)
return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))
lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
counts = Counter(''.join(lst))
result = [combination(w, 2, counts) for w in lst]
print(result)
Output
['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']
The idea is to create a tuple of the priority criterias representing each unique letter. So elements
is a list containing tuples represeting:
counts[c]
: The overall count (as you want the rarest multiply by -1)
word_count[c]
: The specific count of the letter in the word
i
: represents the firs position of the letter
c
: the letter itself.
Once you create the list elements
with:
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
Note that as the characters must be unique we use a set (seen
) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n
elements according to the above criterias.
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit onelements.append((-1 * counts[c], word_count[c], i, c))
and whatheapq.nlargest(n, elements)
does?
– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
add a comment |
You could do something like this:
import heapq
from collections import Counter
from operator import itemgetter
def combination(word, n, counts):
word_count = Counter(word)
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
top = heapq.nlargest(n, elements)
characters = map(itemgetter(3), top)
return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))
lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
counts = Counter(''.join(lst))
result = [combination(w, 2, counts) for w in lst]
print(result)
Output
['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']
The idea is to create a tuple of the priority criterias representing each unique letter. So elements
is a list containing tuples represeting:
counts[c]
: The overall count (as you want the rarest multiply by -1)
word_count[c]
: The specific count of the letter in the word
i
: represents the firs position of the letter
c
: the letter itself.
Once you create the list elements
with:
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
Note that as the characters must be unique we use a set (seen
) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n
elements according to the above criterias.
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit onelements.append((-1 * counts[c], word_count[c], i, c))
and whatheapq.nlargest(n, elements)
does?
– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
add a comment |
You could do something like this:
import heapq
from collections import Counter
from operator import itemgetter
def combination(word, n, counts):
word_count = Counter(word)
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
top = heapq.nlargest(n, elements)
characters = map(itemgetter(3), top)
return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))
lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
counts = Counter(''.join(lst))
result = [combination(w, 2, counts) for w in lst]
print(result)
Output
['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']
The idea is to create a tuple of the priority criterias representing each unique letter. So elements
is a list containing tuples represeting:
counts[c]
: The overall count (as you want the rarest multiply by -1)
word_count[c]
: The specific count of the letter in the word
i
: represents the firs position of the letter
c
: the letter itself.
Once you create the list elements
with:
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
Note that as the characters must be unique we use a set (seen
) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n
elements according to the above criterias.
You could do something like this:
import heapq
from collections import Counter
from operator import itemgetter
def combination(word, n, counts):
word_count = Counter(word)
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
top = heapq.nlargest(n, elements)
characters = map(itemgetter(3), top)
return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))
lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]
counts = Counter(''.join(lst))
result = [combination(w, 2, counts) for w in lst]
print(result)
Output
['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']
The idea is to create a tuple of the priority criterias representing each unique letter. So elements
is a list containing tuples represeting:
counts[c]
: The overall count (as you want the rarest multiply by -1)
word_count[c]
: The specific count of the letter in the word
i
: represents the firs position of the letter
c
: the letter itself.
Once you create the list elements
with:
elements =
seen = set()
for i, c in enumerate(word[1:]):
if c not in seen:
elements.append((-1 * counts[c], word_count[c], i, c))
seen.add(c)
Note that as the characters must be unique we use a set (seen
) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n
elements according to the above criterias.
edited Nov 25 '18 at 17:20
answered Nov 25 '18 at 16:49
Daniel MesejoDaniel Mesejo
16.8k21430
16.8k21430
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit onelements.append((-1 * counts[c], word_count[c], i, c))
and whatheapq.nlargest(n, elements)
does?
– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
add a comment |
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit onelements.append((-1 * counts[c], word_count[c], i, c))
and whatheapq.nlargest(n, elements)
does?
– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
Zero explanation on how this works, or what heapq is
– Moralous
Nov 25 '18 at 17:00
That seems like exactly what I was looking for! Could you elaborate a bit on
elements.append((-1 * counts[c], word_count[c], i, c))
and what heapq.nlargest(n, elements)
does?– iuvbio
Nov 25 '18 at 17:02
That seems like exactly what I was looking for! Could you elaborate a bit on
elements.append((-1 * counts[c], word_count[c], i, c))
and what heapq.nlargest(n, elements)
does?– iuvbio
Nov 25 '18 at 17:02
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
@iuvbio Updated the answer!
– Daniel Mesejo
Nov 25 '18 at 17:20
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469457%2fshortest-unique-combination-of-at-least-three-characters-for-each-string-in-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?
– Austin
Nov 25 '18 at 16:28
In all strings together, both
p
ande
occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).– iuvbio
Nov 25 '18 at 16:34
Why
'l'
instead of'h'
in xylophone?– Daniel Mesejo
Nov 25 '18 at 16:46
My mistake, thanks. Edited the question.
– iuvbio
Nov 25 '18 at 16:48