Pandas pd.read_csv does not work for simple sep=','
Good afternoon, everybody.
I know this is a fairly easy question, but I simply do not understand why it does not work the way I expected.
The task is as follows:
I have a file data.csv in this format:
id,"feature_1","feature_2","feature_3"
00100429,"PROTO","Proprietary","Phone"
00100429,"PROTO","Proprietary","Phone"
I want to import this data using pandas. I know that pandas read_csv uses a comma separator by default, so I imported it as follows:
data = pd.read_csv('data.csv')
The result looked exactly like the raw file shown above: a single column containing everything.
I tried several other separators, including regular expressions, and the only one that improved anything was:
data = pd.read_csv('data.csv', sep="\,", engine='python')
On the one hand, it finally separated the columns; on the other hand, the result is not convenient to work with. In particular:
"id ""feature_1"" ""feature_2"" ""feature_3"""
"00100429 ""PROTO"" ""Proprietary"" ""Phone"""
So I think there must be a mistake somewhere, because the data seems to be fine.
The question is: how can I import this CSV file with separate columns and without the triple quote characters?
Thank you.
python pandas csv
asked Nov 24 '18 at 7:01 by Kakalukia
I think the real file has a different format from the sample you posted, because your sample data works fine with sep=','. Can you create a better data sample that reproduces your bad output?
– jezrael
Nov 24 '18 at 7:16
Your problem is here: sep="\,". Simply use sep=","; don't put the `\`.
– pygo
Nov 24 '18 at 8:04
Using data = pd.read_csv("sample.csv", sep="\,", engine='python') gives me the same output as yours because of that `\`.
– pygo
Nov 24 '18 at 8:07
3 Answers
Here's my quick solution for your problem:
import numpy as np
import pandas as pd
### Read the file, treating the header as the first data row, then remove all double quotes
df = pd.read_csv('file.csv', sep=',', header=None).apply(lambda x: x.str.replace('"', ''))
df
          0          1            2          3
0        id  feature_1    feature_2  feature_3
1  00100429      PROTO  Proprietary      Phone
2  00100429      PROTO  Proprietary      Phone
### Put the column names back and drop the first row
df.columns = df.iloc[0]
df.drop(index=0, inplace=True)
df
## You can reset the index
         id  feature_1    feature_2  feature_3
1  00100429      PROTO  Proprietary      Phone
2  00100429      PROTO  Proprietary      Phone
### Convert the `id` column back to `int` (change according to your needs)
df['id'] = df['id'].astype(int)  # np.int was removed in NumPy 1.24; the builtin int works
np.result_type(df.id)
dtype('int64')
answered Nov 24 '18 at 8:09 by dataLeo
Thank you for your help. I tried this solution and it worked perfectly. In fact, I had opened this dataset in Excel, which did not show any problems with it (that's why I thought the problem was in the code). However, when I opened it using Python's open('file.csv','r'), I found that the lines looked like this: '"tac,""vendor"",""platform"",""type"""\n'. That clearly shows why I had such trouble reading it with pandas. Thanks again for the help.
– Kakalukia
Nov 25 '18 at 9:51
@kakalukia Good to hear that it helped. Also, if it's a small dataset that Excel can handle, you can simply split the one column into distinct columns there and then import the result in Python. That simplifies things a lot. Good going, and you can also upvote this answer :)
– dataLeo
Nov 25 '18 at 10:02
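The raw line reported in the comment above suggests that every line of the file was wrapped in an extra pair of quotes, with the inner quotes doubled. Below is a minimal sketch of how such a file could be inspected and parsed, assuming exactly that layout. The in-memory sample and the helper name `read_wrapped_csv` are illustrative and do not come from the thread.

```python
import csv
import io

import pandas as pd

# Assumed layout: each line is one quoted CSV field that itself holds a CSV row.
RAW = (
    '"id,""feature_1"",""feature_2"",""feature_3"""\n'
    '"00100429,""PROTO"",""Proprietary"",""Phone"""\n'
    '"00100429,""PROTO"",""Proprietary"",""Phone"""\n'
)


def read_wrapped_csv(handle):
    """Parse a CSV whose rows were each quoted a second time (hypothetical helper)."""
    # First pass: undo the outer quoting, giving one plain CSV line per row.
    inner_lines = [row[0] for row in csv.reader(handle) if row]
    # Second pass: let pandas parse the recovered text; dtype=str keeps leading zeros.
    return pd.read_csv(io.StringIO("\n".join(inner_lines)), dtype=str)


# repr() makes stray quotes visible that a spreadsheet would hide.
print(repr(RAW.splitlines()[0]))

df = read_wrapped_csv(io.StringIO(RAW))
print(df)
```

For a real file, the same helper could be called with `open('file.csv', newline='')` in place of the `io.StringIO` object. If the file turns out not to have this layout, the first pass simply returns the first field of each row, so it is worth checking the `repr` output before relying on it.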
Here's an alternative to dataLeo's answer:
import pandas as pd
import numpy as np
Read the file into a dataframe, then remove all double quotes from the row values:
df = pd.read_csv("file.csv", sep=",").apply(lambda x: x.str.replace('"', ''))
df
       "id"  "feature_1"  "feature_2"  "feature_3"
0  00100429        PROTO  Proprietary        Phone
1  00100429        PROTO  Proprietary        Phone
Remove all double quotes from the column names:
df.columns = df.columns.str.replace('"', '')
df
         id  feature_1    feature_2  feature_3
0  00100429      PROTO  Proprietary      Phone
1  00100429      PROTO  Proprietary      Phone
Convert the id column back to int (change according to your needs):
df.id = df.id.astype('int')
np.result_type(df.id)
dtype('int32')
answered Nov 24 '18 at 8:25 by Shadab Hussain, edited Nov 24 '18 at 8:36 by dataLeo
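Converting `id` to an integer drops the leading zeros (`00100429` becomes `100429`). If the identifier should keep them, one option is to ask `read_csv` for strings through its `dtype` parameter. Here is a small sketch, using an in-memory copy of the sample from the question rather than a real file:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv, copied from the sample in the question.
SAMPLE = (
    'id,"feature_1","feature_2","feature_3"\n'
    '00100429,"PROTO","Proprietary","Phone"\n'
    '00100429,"PROTO","Proprietary","Phone"\n'
)

# Default type inference parses the id column as an integer, so the zeros are lost.
as_int = pd.read_csv(io.StringIO(SAMPLE))

# dtype={"id": str} keeps the column as text and preserves the zeros.
as_str = pd.read_csv(io.StringIO(SAMPLE), dtype={"id": str})

print(as_int["id"].tolist())  # [100429, 100429]
print(as_str["id"].tolist())  # ['00100429', '00100429']
```

Whether the integer or the string form is right depends on how the `id` values are used later, for example whether they are joined against another table that stores them with the zeros.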
It should work without any issue with sep unless there is something really wrong with your CSV file. Simulating your data example, it works fine for me.
As per your data sample, you don't need an escape character for comma-delimited values.
>>> import pandas as pd
>>> data = pd.read_csv("sample.csv", sep=",")
>>> data
       id  feature_1    feature_2  feature_3
0  100429      PROTO  Proprietary      Phone
1  100429      PROTO  Proprietary      Phone
>>> pd.__version__
'0.23.3'
There is a problem here, as I noticed: sep="\,"
Alternatively, try the following. Here:
- skipinitialspace=True: this "deals with the spaces after the comma-delimiter".
- quotechar='"': string (length 1). The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
So, in that case, it is worth trying:
>>> data1 = pd.read_csv("sample.csv", skipinitialspace=True, quotechar='"')
>>> data1
       id  feature_1    feature_2  feature_3
0  100429      PROTO  Proprietary      Phone
1  100429      PROTO  Proprietary      Phone
Note from the pandas docs:
Separators longer than 1 character and different from '\s+' will be
interpreted as regular expressions, will force use of the python
parsing engine and will ignore quotes in the data.
answered Nov 24 '18 at 8:01 by pygo, edited Nov 24 '18 at 8:52
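The documentation note quoted above can be checked directly. The sketch below reads the same in-memory text twice, once with a single-character separator and once with a two-character regular-expression separator. The in-memory sample stands in for sample.csv and is an assumption made for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for sample.csv, copied from the sample in the question.
SAMPLE = (
    'id,"feature_1","feature_2","feature_3"\n'
    '00100429,"PROTO","Proprietary","Phone"\n'
    '00100429,"PROTO","Proprietary","Phone"\n'
)

# Single-character separator: quoting is handled, so the quotes are stripped.
plain = pd.read_csv(io.StringIO(SAMPLE), sep=",")

# Separator longer than one character: treated as a regular expression,
# parsed by the python engine, and the quotes stay in the data.
regex = pd.read_csv(io.StringIO(SAMPLE), sep=r"\,", engine="python")

print(list(plain.columns))
print(list(regex.columns))
```

Comparing the two column lists shows the quote characters surviving only in the regular-expression case, which matches the output described in the question.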