Shortest unique combination of at least three characters for each string in list

I want to find the shortest unique combination of characters for each element in a list of strings. Each combination should consist of the string's first character and its two rarest characters at least (more if necessary) and order matters. If a character appears more than once in one string, it should get more weight.

Consider the following example:

liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]

combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"

for apple, both p and e appear 4 times overall, but since p appears twice in apple, it should be used in the combination.

What is the most efficient way to write this logic in python?

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

1

"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?

– Austin
Nov 25 '18 at 16:28

In all strings together, both p and e occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).

– iuvbio
Nov 25 '18 at 16:34

Why 'l' instead of 'h' in xylophone?

– Daniel Mesejo
Nov 25 '18 at 16:46

My mistake, thanks. Edited the question.

– iuvbio
Nov 25 '18 at 16:48

add a comment |

Consider the following example:

liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]

combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"

for apple, both p and e appear 4 times overall, but since p appears twice in apple, it should be used in the combination.

What is the most efficient way to write this logic in python?

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

1

"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?

– Austin
Nov 25 '18 at 16:28

In all strings together, both p and e occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).

– iuvbio
Nov 25 '18 at 16:34

Why 'l' instead of 'h' in xylophone?

– Daniel Mesejo
Nov 25 '18 at 16:46

My mistake, thanks. Edited the question.

– iuvbio
Nov 25 '18 at 16:48

add a comment |

Consider the following example:

liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]

combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"

for apple, both p and e appear 4 times overall, but since p appears twice in apple, it should be used in the combination.

What is the most efficient way to write this logic in python?

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

Consider the following example:

liste = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]

combinations = ["apl", "per", "ban", "xyh", "ber", "bnu"

for apple, both p and e appear 4 times overall, but since p appears twice in apple, it should be used in the combination.

What is the most efficient way to write this logic in python?

python combinatorics

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

edited Nov 25 '18 at 16:48

asked Nov 25 '18 at 16:21

iuvbio

359

asked Nov 25 '18 at 16:21

iuvbio

359

asked Nov 25 '18 at 16:21

iuvbio

359

1

"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?

– Austin
Nov 25 '18 at 16:28

In all strings together, both p and e occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).

– iuvbio
Nov 25 '18 at 16:34

Why 'l' instead of 'h' in xylophone?

– Daniel Mesejo
Nov 25 '18 at 16:46

My mistake, thanks. Edited the question.

– iuvbio
Nov 25 '18 at 16:48

add a comment |

1

"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?

– Austin
Nov 25 '18 at 16:28

In all strings together, both p and e occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).

– iuvbio
Nov 25 '18 at 16:34

Why 'l' instead of 'h' in xylophone?

– Daniel Mesejo
Nov 25 '18 at 16:46

My mistake, thanks. Edited the question.

– iuvbio
Nov 25 '18 at 16:48

"for apple, both p and e appear 4 times overall,..." I didn't get it. How do they appear 4 times?

– Austin
Nov 25 '18 at 16:28

In all strings together, both p and e occur 4 times in total. The total number of occurrences of a character is what determines its rarity and whether it should be included in the combination. In case of a tie, like here, preference should be given to the character that appears more than once in the string (if any).

– iuvbio
Nov 25 '18 at 16:34

Why 'l' instead of 'h' in xylophone?

– Daniel Mesejo
Nov 25 '18 at 16:46

My mistake, thanks. Edited the question.

– iuvbio
Nov 25 '18 at 16:48

add a comment |

1 Answer
1

active

oldest

votes

You could do something like this:

import heapq



from collections import Counter

from operator import itemgetter





def combination(word, n, counts):

    word_count = Counter(word)

    elements = 

    seen = set()

    for i, c in enumerate(word[1:]):

        if c not in seen:

            elements.append((-1 * counts[c], word_count[c], i, c))

            seen.add(c)

    top = heapq.nlargest(n, elements)

    characters = map(itemgetter(3), top)



    return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))





lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]



counts = Counter(''.join(lst))



result = [combination(w, 2, counts) for w in lst]

print(result)

Output

['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']

The idea is to create a tuple of the priority criterias representing each unique letter. So elements is a list containing tuples represeting:

counts[c]: The overall count (as you want the rarest multiply by -1)

word_count[c]: The specific count of the letter in the word

i: represents the firs position of the letter

c: the letter itself.

Once you create the list elements with:

elements = 

seen = set()

for i, c in enumerate(word[1:]):

    if c not in seen:

        elements.append((-1 * counts[c], word_count[c], i, c))

        seen.add(c)

Note that as the characters must be unique we use a set (seen) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n elements according to the above criterias.

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469457%2fshortest-unique-combination-of-at-least-three-characters-for-each-string-in-list%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

You could do something like this:

import heapq



from collections import Counter

from operator import itemgetter





def combination(word, n, counts):

    word_count = Counter(word)

    elements = 

    seen = set()

    for i, c in enumerate(word[1:]):

        if c not in seen:

            elements.append((-1 * counts[c], word_count[c], i, c))

            seen.add(c)

    top = heapq.nlargest(n, elements)

    characters = map(itemgetter(3), top)



    return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))





lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]



counts = Counter(''.join(lst))



result = [combination(w, 2, counts) for w in lst]

print(result)

Output

['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']

The idea is to create a tuple of the priority criterias representing each unique letter. So elements is a list containing tuples represeting:

counts[c]: The overall count (as you want the rarest multiply by -1)

word_count[c]: The specific count of the letter in the word

i: represents the firs position of the letter

c: the letter itself.

Once you create the list elements with:

elements = 

seen = set()

for i, c in enumerate(word[1:]):

    if c not in seen:

        elements.append((-1 * counts[c], word_count[c], i, c))

        seen.add(c)

Note that as the characters must be unique we use a set (seen) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n elements according to the above criterias.

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

add a comment |

You could do something like this:

import heapq



from collections import Counter

from operator import itemgetter





def combination(word, n, counts):

    word_count = Counter(word)

    elements = 

    seen = set()

    for i, c in enumerate(word[1:]):

        if c not in seen:

            elements.append((-1 * counts[c], word_count[c], i, c))

            seen.add(c)

    top = heapq.nlargest(n, elements)

    characters = map(itemgetter(3), top)



    return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))





lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]



counts = Counter(''.join(lst))



result = [combination(w, 2, counts) for w in lst]

print(result)

Output

['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']

The idea is to create a tuple of the priority criterias representing each unique letter. So elements is a list containing tuples represeting:

counts[c]: The overall count (as you want the rarest multiply by -1)

word_count[c]: The specific count of the letter in the word

i: represents the firs position of the letter

c: the letter itself.

Once you create the list elements with:

elements = 

seen = set()

for i, c in enumerate(word[1:]):

    if c not in seen:

        elements.append((-1 * counts[c], word_count[c], i, c))

        seen.add(c)

Note that as the characters must be unique we use a set (seen) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n elements according to the above criterias.

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

add a comment |

You could do something like this:

import heapq



from collections import Counter

from operator import itemgetter





def combination(word, n, counts):

    word_count = Counter(word)

    elements = 

    seen = set()

    for i, c in enumerate(word[1:]):

        if c not in seen:

            elements.append((-1 * counts[c], word_count[c], i, c))

            seen.add(c)

    top = heapq.nlargest(n, elements)

    characters = map(itemgetter(3), top)



    return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))





lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]



counts = Counter(''.join(lst))



result = [combination(w, 2, counts) for w in lst]

print(result)

Output

['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']

The idea is to create a tuple of the priority criterias representing each unique letter. So elements is a list containing tuples represeting:

counts[c]: The overall count (as you want the rarest multiply by -1)

word_count[c]: The specific count of the letter in the word

i: represents the firs position of the letter

c: the letter itself.

Once you create the list elements with:

elements = 

seen = set()

for i, c in enumerate(word[1:]):

    if c not in seen:

        elements.append((-1 * counts[c], word_count[c], i, c))

        seen.add(c)

Note that as the characters must be unique we use a set (seen) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n elements according to the above criterias.

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

You could do something like this:

import heapq



from collections import Counter

from operator import itemgetter





def combination(word, n, counts):

    word_count = Counter(word)

    elements = 

    seen = set()

    for i, c in enumerate(word[1:]):

        if c not in seen:

            elements.append((-1 * counts[c], word_count[c], i, c))

            seen.add(c)

    top = heapq.nlargest(n, elements)

    characters = map(itemgetter(3), top)



    return word[0] + ''.join(sorted(characters, key=lambda x: word.index(x)))





lst = ["apple", "pear", "banana", "xylophone", "bear", "banunu"]



counts = Counter(''.join(lst))



result = [combination(w, 2, counts) for w in lst]

print(result)

Output

['apl', 'per', 'ban', 'xyh', 'ber', 'bnu']

The idea is to create a tuple of the priority criterias representing each unique letter. So elements is a list containing tuples represeting:

counts[c]: The overall count (as you want the rarest multiply by -1)

word_count[c]: The specific count of the letter in the word

i: represents the firs position of the letter

c: the letter itself.

Once you create the list elements with:

elements = 

seen = set()

for i, c in enumerate(word[1:]):

    if c not in seen:

        elements.append((-1 * counts[c], word_count[c], i, c))

        seen.add(c)

Note that as the characters must be unique we use a set (seen) to guarantee uniqueness. Finally you use heapq.nlargest to get the top n elements according to the above criterias.

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

edited Nov 25 '18 at 17:20

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

answered Nov 25 '18 at 16:49

Daniel Mesejo

16.8k21430

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

add a comment |

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

Zero explanation on how this works, or what heapq is

– Moralous
Nov 25 '18 at 17:00

That seems like exactly what I was looking for! Could you elaborate a bit on elements.append((-1 * counts[c], word_count[c], i, c)) and what heapq.nlargest(n, elements) does?

– iuvbio
Nov 25 '18 at 17:02

@iuvbio Updated the answer!

– Daniel Mesejo
Nov 25 '18 at 17:20

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

oat6kI,E4MzQeaes7w,gWUpNUkdnLL3gUJsecw qtPyiK026QQPHbBasq18,ezfXxco31I8qWtrksq,oxEh w96qiu0

搜尋此網誌

Btukfyl