How to remove spaces in between characters without removing ALL spaces in a dataframe?











Let's say I have a DataFrame like this:

ID  Name   Description
0   Manny  V e r y calm
1   Joey   Keen and a n a l y t i c a l
2   Lisa   R a s h and careless
3   Ash    Always joyful

I want to remove the spaces between individual letters in the Description column while keeping the spaces that separate words.

Is there a simple way to do this in pandas?










  • Are the spaced out words always followed or preceded by a word with no spaces between the letters?
    – duncster94
    Nov 21 at 20:38










  • No. It varies. Sometimes it may and sometimes it may not. @duncster94
    – The Dodo
    Nov 21 at 20:42












  • Do you have a vocabulary you can use? Or can these words be effectively anything?
    – duncster94
    Nov 21 at 20:48










  • They can be anything. No patterns at all. Each description is unique and independent from all the other descriptions.
    – The Dodo
    Nov 21 at 20:50










  • I don't see how this can be done. For example, the string 'v e r y c a l m' can't be distinguished as two words (not with Pandas anyway).
    – duncster94
    Nov 21 at 20:52















python pandas dataframe






asked Nov 21 at 20:22 by The Dodo

1 Answer
accepted










This is a tricky problem, but one approach that may get you most of the way there is to use negative and positive lookbehinds/lookaheads to encode a few basic rules.



The following example should work well enough given what you've described. It will incorrectly merge consecutive "real" words that have both been split into separate characters (for example, 'v e r y c a l m' becomes 'verycalm'), but if that's rare this will probably be fine. You could add more rules to cover further edge cases.



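import re
import pandas as pd

s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful'])

# Remove runs of spaces that sit between single, isolated letters.
regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]{1}) +(?=[a-zA-Z] |.$)')
# pandas 2.0 and later default to regex=False, which rejects a compiled pattern,
# so regex=True is passed explicitly.
s.str.replace(regex, '', regex=True)

0              Very calm
1    Keen and analytical
2      Rash and careless
3          Always joyful
dtype: object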
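Since the question is about a DataFrame column rather than a standalone Series, here is a minimal sketch of the same replacement applied in place to `Description`. The DataFrame is rebuilt by hand from the table in the question, and treating the `ID` values as the default integer index is an assumption.

```python
import pandas as pd

# Rebuild the DataFrame from the question; the ID column is assumed to be
# the default integer index.
df = pd.DataFrame({
    "Name": ["Manny", "Joey", "Lisa", "Ash"],
    "Description": [
        "V e r y calm",
        "Keen and a n a l y t i c a l",
        "R a s h and careless",
        "Always joyful",
    ],
})

# Same pattern as in the answer, written as a raw string.
pattern = r"(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)"

# regex=True is passed explicitly because the default differs between
# pandas versions.
df["Description"] = df["Description"].str.replace(pattern, "", regex=True)

print(df)
```

Only the `Description` column is touched, so spaces in other columns stay exactly as they were.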


This regex effectively says:



Match a run of one or more spaces, but only when it is preceded by a single letter, that is, a letter that is not itself preceded by another letter. If two letters come before the spaces, they end an ordinary word (even a 2-letter one) and nothing is replaced. In addition, the run must be followed either by a letter and then another space, or by the last character of the string.
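To make the individual rules easier to see, the pattern can be written out with `re.VERBOSE`, one rule per line. This is only a sketch using the standard `re` module; the names `SPACED_LETTERS` and `collapse` are hypothetical and do not appear in the answer.

```python
import re

# The answer's pattern, split up with re.VERBOSE. In verbose mode bare
# whitespace is ignored, so a literal space is written as "[ ]".
SPACED_LETTERS = re.compile(
    r"""
    (?<![a-zA-Z]{2})        # not preceded by two letters (end of a normal word)
    (?<=[a-zA-Z])           # but preceded by a letter
    [ ]+                    # the run of spaces to delete
    (?=[a-zA-Z][ ] | .$)    # followed by letter + space, or by the last character
    """,
    re.VERBOSE,
)


def collapse(text: str) -> str:
    """Remove the spaces inside letter-by-letter words (hypothetical helper)."""
    return SPACED_LETTERS.sub("", text)


print(collapse("Keen and a n a l y t i c a l"))  # Keen and analytical

# Known limitation from the comments: two spaced-out words in a row
# cannot be told apart and are merged into one.
print(collapse("v e r y c a l m"))  # verycalm
```

The second call shows the limitation raised in the comments: without a vocabulary, nothing in the text marks where one spaced-out word ends and the next begins.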






  • This was genius. Thank you
    – The Dodo
    Nov 21 at 22:21











answered Nov 21 at 21:40 by Nick Becker
