How to remove spaces in between characters without removing ALL spaces in a dataframe?
Let's say I have a dataframe like this:
    ID  Name   Description
    0   Manny  V e r y calm
    1   Joey   Keen and a n a l y t i c a l
    2   Lisa   R a s h and careless
    3   Ash    Always joyful
I want to remove all the spaces between each letter in the Description column without completely removing all the necessary spaces between words.
Is there a simple way to do this in Pandas?
python pandas dataframe
Are the spaced out words always followed or preceded by a word with no spaces between the letters?
– duncster94
Nov 21 at 20:38
No. It varies. Sometimes it may and sometimes it may not. @duncster94
– The Dodo
Nov 21 at 20:42
Do you have a vocabulary you can use? Or can these words be effectively anything?
– duncster94
Nov 21 at 20:48
They can be anything. No patterns at all. Each description is unique and independent from all the other descriptions.
– The Dodo
Nov 21 at 20:50
I don't see how this can be done. For example, the string 'v e r y c a l m' can't be distinguished as two words (not with Pandas anyway).
– duncster94
Nov 21 at 20:52
asked Nov 21 at 20:22
The Dodo
1 Answer (accepted)
This is a tricky problem, but one approach that may get you most of the way there is to use negative and positive lookbehinds/lookaheads to encode a few basic rules.
The following example would likely work well enough given what you've described. It will incorrectly combine characters from consecutive "real" words that have been exploded into separated characters, but if that's rare this will probably be fine. You could add additional rules to cover more edge cases.
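    import re
    import pandas as pd

    s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
                   'R a s h and careless', 'Always joyful'])

    # Remove a run of spaces only when it follows a lone letter and is
    # followed by another lone letter (or by the string's last character).
    regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)')
    s.str.replace(regex, '', regex=True)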
    0              Very calm
    1    Keen and analytical
    2      Rash and careless
    3          Always joyful
    dtype: object
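In the question the text sits in the Description column of a DataFrame rather than in a standalone Series. A minimal sketch of the same replacement applied to that column might look like the following. The frame `df` is rebuilt here from the question's sample rows, the output column name `Description_clean` is an assumption made purely for illustration, and `regex=True` is passed explicitly because recent pandas versions no longer treat the pattern as a regular expression by default.

```python
import re
import pandas as pd

# Sample frame rebuilt from the question's example rows.
df = pd.DataFrame({
    'ID': [0, 1, 2, 3],
    'Name': ['Manny', 'Joey', 'Lisa', 'Ash'],
    'Description': ['V e r y calm', 'Keen and a n a l y t i c a l',
                    'R a s h and careless', 'Always joyful'],
})

regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)')

# 'Description_clean' is a hypothetical column name; assign back to
# 'Description' instead to overwrite the original text.
df['Description_clean'] = df['Description'].str.replace(regex, '', regex=True)
print(df['Description_clean'].tolist())
```

Writing to a separate column keeps the original text available for spot-checking rows where the heuristic may have merged too much.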
This regex effectively says:
Match a run of one or more spaces and remove it, but only if exactly one letter comes directly before it. If two letters come before it, the space follows a normal word of two or more letters and is left alone. In addition, the run is removed only if it is followed by a single letter and another space, or by the final character of the string.
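To make those rules easier to follow, the same pattern can be written in verbose mode with one comment per rule. This is only a sketch using the standard `re` module: the names `SPACED_LETTERS` and `collapse` are hypothetical, chosen for this example. The last call also shows the caveat mentioned above, where two consecutive exploded words are merged into one.

```python
import re

# Same pattern as in the answer, written in verbose mode so each rule
# can carry a comment. In verbose mode a literal space is written as [ ].
SPACED_LETTERS = re.compile(r"""
    (?<![a-zA-Z]{2})     # not preceded by two letters (end of a normal word)
    (?<=[a-zA-Z])        # but preceded by a single letter
    [ ]+                 # the run of spaces to delete
    (?=[a-zA-Z][ ]|.$)   # followed by a lone letter, or by the last character
""", re.VERBOSE)


def collapse(text):
    """Remove spaces between single, space-separated letters (hypothetical helper)."""
    return SPACED_LETTERS.sub('', text)


print(collapse('R a s h and careless'))  # -> Rash and careless
print(collapse('a cat and a dog'))       # -> a cat and a dog (one-letter words survive)
print(collapse('v e r y c a l m'))       # -> verycalm (two exploded words get merged)
```

Splitting a result like `verycalm` back into words would need something beyond this regex, such as a vocabulary lookup, which the question says is not available.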
This was genius. Thank you
– The Dodo
Nov 21 at 22:21
answered Nov 21 at 21:40
Nick Becker