Python Pandas Error tokenizing data
I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the pandas docs, but found nothing.
My code is simple:
import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)
How can I resolve this? Should I use the csv module or another language?
The file is from Morningstar.
python csv pandas
If this error arises when reading a file written by pandas.to_csv(), it might be because there is a '\r' (carriage return) in a column name, in which case to_csv() will actually write the subsequent column names into the first column of the data frame, causing a difference between the number of columns in the first X rows. This difference is one cause of the C error.
– user0
Jan 23 '17 at 0:56
Sometimes just explicitly giving the "sep" parameter helps. Seems to be a parser issue.
– gilgamash
May 23 '18 at 12:30
This error may also arise when you're using a comma as a delimiter and you have more commas than expected (more fields in the error row than are defined in the header). So you need to either remove the additional field or remove the extra comma if it's there by mistake. You can fix this manually, and then you don't need to skip the error lines.
– tsveti_iko
Aug 22 '18 at 9:44
25 Answers
You could also try:
data = pd.read_csv('file1.csv', error_bad_lines=False)
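Note: in newer pandas versions, error_bad_lines is deprecated (1.3) and removed (2.0); the replacement is the on_bad_lines parameter. A minimal sketch, assuming pandas >= 1.3:
import pandas as pd

# skip malformed rows instead of raising, as error_bad_lines=False used to do
data = pd.read_csv('file1.csv', on_bad_lines='skip')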
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
Stumbled on this answer; is there a way to fill missing columns on lines that output something like expected 8 fields, saw 9?
– Petra Barus
Sep 24 '14 at 10:11
The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
– dbliss
Oct 6 '14 at 22:57
Yes, I just did that. It's much easier by adding columns. Opening the CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,
df = pandas.read_csv(fileName, sep='delimiter', header=None)
In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." However, I have not had good luck with this, including instances with obvious delimiters.
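If you do want to try that auto-detection, it has to be requested explicitly; a sketch (my addition, not part of this answer): passing sep=None with the python engine makes pandas sniff the delimiter from the first rows.
import pandas as pd

# sep=None asks the python engine to infer the delimiter via csv.Sniffer
df = pd.read_csv('file1.csv', sep=None, engine='python', header=None)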
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with data = pd.read_csv(path, skiprows=2)
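To see how many rows to skip for your own file, it can help to peek at the raw lines first; a quick sketch using the question's file name:
# print the first few raw lines to decide how many rows to skip
with open('GOOG Key Ratios.csv') as f:
    for _ in range(5):
        print(repr(f.readline()))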
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
Your CSV file might have a variable number of columns, and read_csv inferred the number of columns from the first few rows. Two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with the max number of columns (and specify header=[0]).
2) Or use names = list(range(0,N)) where N is the max number of columns.
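A minimal sketch of option 2, where the maximum of 12 columns is a made-up number for illustration:
import pandas as pd

# 12 is hypothetical: use the largest field count any row in your file can have
df = pd.read_csv('file1.csv', header=None, names=list(range(12)))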
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:
data = pd.read_csv('file1.csv', error_bad_lines=False)
If you want to keep the lines, an ugly kind of hack for handling the errors is to do something like the following:
import pandas as pd

line = []
expected = []
saw = []
cont = True

while cont == True:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        # error text looks like: "Expected 8 fields in line 52, saw 9"
        errortype = str(e).split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = str(e).split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')

if line != []:
    # Handle the errors however you want
    pass
I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
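For what it's worth, newer pandas (1.4+) can replace the retry loop above: on_bad_lines accepts a callable (python engine only) that receives each bad row as a list of split fields, so you can collect or repair bad lines in a single pass. A sketch, with the handler purely illustrative:
import pandas as pd

bad_rows = []

def handle_bad_line(fields):
    # record the offending row, then drop it (returning None skips the line)
    bad_rows.append(fields)
    return None

data = pd.read_csv('file1.csv', engine='python', on_bad_lines=handle_bad_line)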
This is definitely a delimiter issue, as many such CSV files are created with sep='\t', so try read_csv using the tab character ('\t') as the separator. Try opening it with the following line:
data = pd.read_csv("File_path", sep='\t')
@MichaelQueue: This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('\t'), a semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks, did not know that!
– Michael Queue
May 16 '16 at 3:25
In my case it was a separator issue. read_csv apparently defaults to commas, and I have text fields which include commas (and the data was stored with a different separator anyway).
– user108569
Jul 17 '18 at 16:41
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this, but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (more than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter, which could circumvent the user's current error but introduce others.
I usually get around this by reading the extra data into a file and then using the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases.
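If the extra text is a fixed-size header plus a footer, read_csv can often be pointed past both directly; a sketch, where the counts 4 and 3 are made up for illustration:
import pandas as pd

# skip 4 junk lines at the top and 3 at the bottom;
# skipfooter requires the python engine
df = pd.read_csv('file1.csv', skiprows=4, skipfooter=3, engine='python')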
I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:
1115794 4218 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""
import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine='c')
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory
This says it has something to do with the C parsing engine (which is the default one). Maybe changing to the python engine will change something:
counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')
Segmentation fault (core dumped)
Now that is a different error.
If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:
1115794 4218 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""
_csv.Error: ' ' expected after '"'
It became clear that pandas was having problems parsing our rows. To parse the table with the python engine, I needed to remove all spaces and quotes from the table beforehand. Meanwhile, the C engine kept crashing even with commas in the rows.
To avoid creating a new file with replacements I did this, as my tables are small:
from io import StringIO
with open(path_counts) as f:
    input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ','))
counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')
tl;dr
Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.
Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for the kwarg compression resolved my problem.
result = pandas.read_csv(data_source, compression='gzip')
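Note that the default is compression='infer', so pandas normally guesses from the file extension; being explicit mainly matters when the extension gives no hint. For example (file names hypothetical):
import pandas as pd

# inferred from the .gz extension, equivalent to compression='gzip'
result = pd.read_csv('data_source.csv.gz')

# explicit override when the name does not reveal the compression
result = pd.read_csv('data_source.bin', compression='gzip')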
The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):
df = pd.read_csv(filename,
usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']
The following does NOT work:
df = pd.read_csv(filename,
names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
usecols=range(0, 42))
CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
The following does NOT work:
df = pd.read_csv(filename,
header=None)
CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
Hence, in your problem, you would have to pass usecols=range(0, 2).
Sometimes the problem is not how to use Python, but the raw data itself.
I got this error message
Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.
It turned out that the description column sometimes contained commas. This means that the CSV file needs to be cleaned up or another separator used.
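If you control how the file is written, quoting every field makes embedded commas harmless. A sketch of writing cleaned data back out (the tiny DataFrame is made up for illustration):
import csv
import pandas as pd

df = pd.DataFrame({'id': [1], 'description': ['contains, a comma']})
# quote all fields so commas inside the description no longer break parsing
df.to_csv('cleaned.csv', index=False, quoting=csv.QUOTE_ALL)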
Use
pandas.read_csv('CSVFILENAME', header=None, sep=', ')
when trying to read CSV data from the link
http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
I copied the data from the site into my CSV file. It had extra spaces, so I used sep=', ' and it worked :)
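A related note: sep=', ' is more than one character, which forces the slower regex-based python engine. Keeping sep=',' and adding skipinitialspace=True handles the space after each comma instead; a sketch:
import pandas as pd

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
# drop the blank that follows each comma rather than folding it into the separator
data = pd.read_csv(url, header=None, skipinitialspace=True)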
An alternative that I have found to be useful in dealing with similar parsing errors uses the CSV module to re-route data into a pandas df. For example:
import csv
import pandas as pd

path = 'C:/FileLocation/'
file = 'filename.csv'

with open(path + file, 'rt') as f:
    reader = csv.reader(f)
    # once contents are available, put them in a list
    csv_list = [row for row in reader]

# now pandas has no problem getting it into a df
df = pd.DataFrame(csv_list)
I find the CSV module to be a bit more robust to poorly formatted comma-separated files and so have had success with this route to address issues like these.
I had a dataset with preexisting row numbers, so I used index_col:
pd.read_csv('train.csv', index_col=0)
This is what I did. sep='::' solved my issue:
data = pd.read_csv(r'C:\Users\HP\Downloads\NPL ASSINGMENT 2 imdb_labelled\imdb_labelled.txt',
                   engine='python', header=None, sep='::')
I had a similar case as this, and setting
train = pd.read_csv('input.csv', encoding='latin1', engine='python')
worked.
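If latin1 is only a guess, you can try to detect the encoding first. A sketch using the third-party chardet package (an assumption: it is not bundled with pandas and must be installed separately):
import chardet

# sample the first chunk of raw bytes and ask chardet for its best guess
with open('input.csv', 'rb') as f:
    guess = chardet.detect(f.read(100000))
print(guess)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}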
Use the delimiter parameter:
pd.read_csv(filename, delimiter=",", encoding='utf-8')
It will read.
I had the same problem with read_csv: ParserError: Error tokenizing data. I just saved the old CSV file as a new CSV file, and the problem was solved!
I had this problem, where I was trying to read in a CSV without passing in column names.
df = pd.read_csv(filename, header=None)
I specified the column names in a list beforehand and then passed them into names, and it solved it immediately. If you don't have column names set, you could just create as many placeholder names as the maximum number of columns that might be in your data.
col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)
I had a similar error, and the issue was that I had some escaped quotes in my CSV file and needed to set the escapechar parameter appropriately.
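For example, if quotes inside fields are escaped with backslashes rather than doubled, something like this can work (a sketch; the right escape character depends on how your file was written):
import pandas as pd

# treat backslash as the escape character inside quoted fields
df = pd.read_csv('input.csv', escapechar='\\')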
You can take this step to avoid the problem:
train = pd.read_csv('/home/Project/output.csv', header=None)
Just add header=None.
Hope this helps!
The issue could be with the file itself. In my case, the issue was solved after renaming the file; I have yet to figure out the reason.
I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.
Try: pandas.read_csv(path, sep=',', header=None)
protected by Community♦ Jan 8 at 13:26
Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?
25 Answers
25
active
oldest
votes
25 Answers
25
active
oldest
votes
active
oldest
votes
active
oldest
votes
you could also try;
data = pd.read_csv('file1.csv', error_bad_lines=False)
92
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
5
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something likeexpected 8 fields, saw 9
?
– Petra Barus
Sep 24 '14 at 10:11
17
The better solution is to investigate the offending file and to correct the bad lines so that they can be read byread_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
– dbliss
Oct 6 '14 at 22:57
3
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
1
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
|
show 2 more comments
you could also try;
data = pd.read_csv('file1.csv', error_bad_lines=False)
92
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
5
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something likeexpected 8 fields, saw 9
?
– Petra Barus
Sep 24 '14 at 10:11
17
The better solution is to investigate the offending file and to correct the bad lines so that they can be read byread_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
– dbliss
Oct 6 '14 at 22:57
3
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
1
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
|
show 2 more comments
you could also try;
data = pd.read_csv('file1.csv', error_bad_lines=False)
you could also try;
data = pd.read_csv('file1.csv', error_bad_lines=False)
answered Aug 8 '13 at 14:47
richierichie
5,02563355
5,02563355
92
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
5
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something likeexpected 8 fields, saw 9
?
– Petra Barus
Sep 24 '14 at 10:11
17
The better solution is to investigate the offending file and to correct the bad lines so that they can be read byread_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
– dbliss
Oct 6 '14 at 22:57
3
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
1
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
|
show 2 more comments
92
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
5
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something likeexpected 8 fields, saw 9
?
– Petra Barus
Sep 24 '14 at 10:11
17
The better solution is to investigate the offending file and to correct the bad lines so that they can be read byread_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?
– dbliss
Oct 6 '14 at 22:57
3
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
1
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
92
92
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
Do note that using error_bad_lines=False will cause the offending lines to be skipped.
– biobirdman
May 20 '14 at 7:27
5
5
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like
expected 8 fields, saw 9
?– Petra Barus
Sep 24 '14 at 10:11
Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like
expected 8 fields, saw 9
?– Petra Barus
Sep 24 '14 at 10:11
17
17
The better solution is to investigate the offending file and to correct the bad lines so that they can be read by
read_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?– dbliss
Oct 6 '14 at 22:57
The better solution is to investigate the offending file and to correct the bad lines so that they can be read by
read_csv
. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?– dbliss
Oct 6 '14 at 22:57
3
3
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.
– Petra Barus
Oct 7 '14 at 2:17
1
1
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
– MTT
May 15 '17 at 2:48
|
show 2 more comments
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep
and/or header
arguments when calling read_csv
. For instance,
df = pandas.read_csv(fileName, sep='delimiter', header=None)
In the code above, sep
defines your delimiter and header=None
tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
1
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
add a comment |
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep
and/or header
arguments when calling read_csv
. For instance,
df = pandas.read_csv(fileName, sep='delimiter', header=None)
In the code above, sep
defines your delimiter and header=None
tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
1
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
add a comment |
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep
and/or header
arguments when calling read_csv
. For instance,
df = pandas.read_csv(fileName, sep='delimiter', header=None)
In the code above, sep
defines your delimiter and header=None
tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
It might be an issue with
- the delimiters in your data
- the first row, as @TomAugspurger noted
To solve it, try specifying the sep
and/or header
arguments when calling read_csv
. For instance,
df = pandas.read_csv(fileName, sep='delimiter', header=None)
In the code above, sep
defines your delimiter and header=None
tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.
According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
edited Jun 5 '18 at 14:24
answered Oct 28 '14 at 2:18
grisaitisgrisaitis
1,11811121
1,11811121
1
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
add a comment |
1
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
1
1
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
this solved my issue
– Hemaa mathavan
Apr 8 '17 at 8:37
add a comment |
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with data = pd.read_csv(path, skiprows=2)
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
add a comment |
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with data = pd.read_csv(path, skiprows=2)
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
add a comment |
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with data = pd.read_csv(path, skiprows=2)
The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.
Try it with data = pd.read_csv(path, skiprows=2)
answered Aug 4 '13 at 2:24
TomAugspurgerTomAugspurger
15.2k35155
15.2k35155
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
add a comment |
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
Works like a charm. Thanks !
– abuteau
Aug 4 '13 at 2:43
add a comment |
Your CSV file might have variable number of columns and read_csv
inferred the number of columns from the first few rows. Two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0]
)
2) Or use names = list(range(0,N))
where N is the max number of columns.
1
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
add a comment |
Your CSV file might have variable number of columns and read_csv
inferred the number of columns from the first few rows. Two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0]
)
2) Or use names = list(range(0,N))
where N is the max number of columns.
1
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
add a comment |
Your CSV file might have variable number of columns and read_csv
inferred the number of columns from the first few rows. Two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0]
)
2) Or use names = list(range(0,N))
where N is the max number of columns.
Your CSV file might have variable number of columns and read_csv
inferred the number of columns from the first few rows. Two ways to solve it in this case:
1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0]
)
2) Or use names = list(range(0,N))
where N is the max number of columns.
edited Sep 20 '17 at 0:53
Ajean
3,976103250
3,976103250
answered Mar 31 '17 at 16:29
computeristcomputerist
33435
33435
1
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
add a comment |
1
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
1
1
This really helped!
– Archie
May 30 '17 at 16:18
This really helped!
– Archie
May 30 '17 at 16:18
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
This should be the accepted answer
– Vivek
Sep 8 '18 at 10:41
add a comment |
I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:
data = pd.read_csv('file1.csv', error_bad_lines=False)
If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:
line =
expected =
saw =
cont = True
while cont == True:
try:
data = pd.read_csv('file1.csv',skiprows=line)
cont = False
except Exception as e:
errortype = e.message.split('.')[0].strip()
if errortype == 'Error tokenizing data':
cerror = e.message.split(':')[1].strip().replace(',','')
nums = [n for n in cerror.split(' ') if str.isdigit(n)]
expected.append(int(nums[0]))
saw.append(int(nums[2]))
line.append(int(nums[1])-1)
else:
cerror = 'Unknown'
print 'Unknown Error - 222'
if line != :
# Handle the errors however you want
I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
add a comment |
I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:
data = pd.read_csv('file1.csv', error_bad_lines=False)
If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:
line =
expected =
saw =
cont = True
while cont == True:
try:
data = pd.read_csv('file1.csv',skiprows=line)
cont = False
except Exception as e:
errortype = e.message.split('.')[0].strip()
if errortype == 'Error tokenizing data':
cerror = e.message.split(':')[1].strip().replace(',','')
nums = [n for n in cerror.split(' ') if str.isdigit(n)]
expected.append(int(nums[0]))
saw.append(int(nums[2]))
line.append(int(nums[1])-1)
else:
cerror = 'Unknown'
print 'Unknown Error - 222'
if line != :
# Handle the errors however you want
I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
add a comment |
I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:
data = pd.read_csv('file1.csv', error_bad_lines=False)
If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:
line =
expected =
saw =
cont = True
while cont == True:
try:
data = pd.read_csv('file1.csv',skiprows=line)
cont = False
except Exception as e:
errortype = e.message.split('.')[0].strip()
if errortype == 'Error tokenizing data':
cerror = e.message.split(':')[1].strip().replace(',','')
nums = [n for n in cerror.split(' ') if str.isdigit(n)]
expected.append(int(nums[0]))
saw.append(int(nums[2]))
line.append(int(nums[1])-1)
else:
cerror = 'Unknown'
print 'Unknown Error - 222'
if line != :
# Handle the errors however you want
I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:
data = pd.read_csv('file1.csv', error_bad_lines=False)
If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:
line =
expected =
saw =
cont = True
while cont == True:
try:
data = pd.read_csv('file1.csv',skiprows=line)
cont = False
except Exception as e:
errortype = e.message.split('.')[0].strip()
if errortype == 'Error tokenizing data':
cerror = e.message.split(':')[1].strip().replace(',','')
nums = [n for n in cerror.split(' ') if str.isdigit(n)]
expected.append(int(nums[0]))
saw.append(int(nums[2]))
line.append(int(nums[1])-1)
else:
cerror = 'Unknown'
print 'Unknown Error - 222'
if line != :
# Handle the errors however you want
I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
answered Feb 4 '16 at 22:16
Robert GeigerRobert Geiger
14912
14912
add a comment |
add a comment |
This is definitely an issue of delimiter, as most of the csv CSV are got create using sep='/t'
so try to read_csv
using the tab character (t)
using separator /t
. so, try to open using following code line.
data=pd.read_csv("File_path", sep='t')
4
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
add a comment |
This is definitely an issue of delimiter, as most of the csv CSV are got create using sep='/t'
so try to read_csv
using the tab character (t)
using separator /t
. so, try to open using following code line.
data=pd.read_csv("File_path", sep='t')
4
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
add a comment |
This is definitely an issue of delimiter, as most of the csv CSV are got create using sep='/t'
so try to read_csv
using the tab character (t)
using separator /t
. so, try to open using following code line.
data=pd.read_csv("File_path", sep='t')
This is definitely an issue of delimiter, as most of the csv CSV are got create using sep='/t'
so try to read_csv
using the tab character (t)
using separator /t
. so, try to open using following code line.
data=pd.read_csv("File_path", sep='t')
edited Jun 1 '17 at 13:31
Lucas
2,35211128
2,35211128
answered Apr 1 '15 at 5:42
Piyush S. WanarePiyush S. Wanare
2,09011327
2,09011327
4
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
add a comment |
4
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
4
4
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('t'), semicolon, and possibly additional spaces. :)
– DJGrandpaJ
Apr 13 '16 at 19:54
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
@DJGrandpaJ Thanks did not know that!
– Michael Queue
May 16 '16 at 3:25
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)
– user108569
Jul 17 '18 at 16:41
add a comment |
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
6
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
add a comment |
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
6
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
add a comment |
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
Hope that helps.
answered Jul 7 '16 at 17:22
elPastorelPastor
2,73231938
2,73231938
6
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
add a comment |
6
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
6
6
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.
– elPastor
Jul 7 '16 at 19:31
add a comment |
I came across the same issue. Using pd.read_table()
on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.
I usually get around this by reading the extra data into a file then use the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases
add a comment |
I came across the same issue. Using pd.read_table()
on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.
I usually get around this by reading the extra data into a file then use the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases
add a comment |
I came across the same issue. Using pd.read_table()
on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.
I usually get around this by reading the extra data into a file then use the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases
I came across the same issue. Using pd.read_table()
on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.
I usually get around this by reading the extra data into a file then use the read_csv() method.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases
edited Jul 7 '17 at 9:32
answered Jun 30 '14 at 11:46
Legend_AriLegend_Ari
10615
10615
add a comment |
add a comment |
I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:
1115794 4218 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""
import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='t', index_col=2, header=None, engine = 'c')
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory
This says it has something to do with C parsing engine (which is the default one). Maybe changing to a python one will change anything
counts = pd.read_table(path_counts, sep='t', index_col=2, header=None, engine='python')
Segmentation fault (core dumped)
Now that is a different error.
If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:
1115794 4218 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""
_csv.Error: ' ' expected after '"'
And it gets clear that pandas was having problems parsing our rows. To parse a table with python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile C-engine kept crashing even with commas in rows.
To avoid creating a new file with replacements I did this, as my tables are small:
from io import StringIO
with open(path_counts) as f:
input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('',''))
counts = pd.read_table(input, sep='t', index_col=2, header=None, engine='python')
tl;dr
Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.
add a comment |
I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:
1115794 4218 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""
import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='t', index_col=2, header=None, engine = 'c')
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory
This says it has something to do with C parsing engine (which is the default one). Maybe changing to a python one will change anything
counts = pd.read_table(path_counts, sep='t', index_col=2, header=None, engine='python')
Segmentation fault (core dumped)
Now that is a different error.
If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:
1115794 4218 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""
_csv.Error: ' ' expected after '"'
And it gets clear that pandas was having problems parsing our rows. To parse a table with python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile C-engine kept crashing even with commas in rows.
To avoid creating a new file with replacements I did this, as my tables are small:
from io import StringIO
with open(path_counts) as f:
input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('',''))
counts = pd.read_table(input, sep='t', index_col=2, header=None, engine='python')
tl;dr
Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.
answered Apr 24 '17 at 11:28 – lotrus28 (edited Apr 25 '17 at 15:00)
Although not the case for this question, this error may also appear with compressed data. Explicitly setting a value for the compression kwarg resolved my problem:
result = pandas.read_csv(data_source, compression='gzip')
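For context (my note, not the answerer's): in recent pandas versions the default is compression='infer', which guesses the codec from the file extension (.gz, .bz2, .zip, .xz), so the explicit kwarg matters mostly when the extension does not match the actual compression.
import pandas as pd

# Sketch with a hypothetical path: a gzipped file saved without a .gz
# suffix defeats compression='infer', so the codec must be named.
result = pd.read_csv('data_source.dat', compression='gzip')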
answered Oct 3 '16 at 15:45 – RegularlyScheduledProgramming
The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):
df = pd.read_csv(filename,
usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']
The following does NOT work:
df = pd.read_csv(filename,
names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
usecols=range(0, 42))
CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
The following does NOT work:
df = pd.read_csv(filename,
header=None)
CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54
Hence, for your problem you would pass usecols=range(0, 2).
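The error says some row holds more fields than the header implies. A related workaround I can sketch (an assumption of mine, not something the answerer ran) is to hand read_csv enough column names to cover the widest row, then keep only the columns of interest:
import pandas as pd

max_fields = 54  # hypothetical upper bound, read off the error message
# Name every possible column so that no row overflows the header,
# then select just the 42 columns that are actually wanted.
df = pd.read_csv(filename, header=None, names=range(max_fields),
                 usecols=range(42))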
answered May 23 '18 at 11:45 – kepy97
Sometimes the problem is not how you use Python, but the raw data itself.
I got this error message:
Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.
It turned out that the description column sometimes contained commas. This means that the CSV file needs to be cleaned up or another separator used.
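A quick way to locate such rows before cleaning (my own sketch, not part of the answer): count delimiters per line; 18 comma-separated fields means 17 commas.
# Sketch with a placeholder filename: print every line whose comma count
# does not match the expected 18 fields (17 commas). Naive on purpose:
# it also flags commas inside quoted fields, which are worth inspecting too.
with open('data.csv') as f:
    for lineno, line in enumerate(f, start=1):
        if line.count(',') != 17:
            print(lineno, line.rstrip())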
answered Nov 15 '17 at 10:59 – Kims Sifers (edited Nov 15 '17 at 12:13 by Aks4125)
Use
pandas.read_csv('CSVFILENAME', header=None, sep=', ')
when trying to read csv data from the link
http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
I copied the data from the site into my csv file. It had extra spaces after the commas, so I used sep=', ' and it worked :)
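Worth knowing (my note, not the answerer's): a separator longer than one character is interpreted as a regular expression, which the C engine cannot handle, so pandas falls back to the python engine with a ParserWarning. Spelling the engine out keeps things explicit:
import pandas as pd

# Sketch: 'adult.data' as downloaded from the UCI link above.
# sep=', ' (comma plus space) is multi-character, hence treated as a
# regex, hence parsed by the python engine.
df = pd.read_csv('adult.data', header=None, sep=', ', engine='python')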
answered Jan 2 '18 at 9:56 – Abhishek
An alternative that I have found useful in dealing with similar parsing errors uses the csv module to re-route the data into a pandas df. For example:
import csv
import pandas as pd

path = 'C:/FileLocation/'
file = 'filename.csv'

# read the rows with the csv module first
with open(path + file, 'rt') as f:
    reader = csv.reader(f)
    # once the contents are available, put them in a list
    csv_list = [row for row in reader]

# now pandas has no problem turning the list into a df
df = pd.DataFrame(csv_list)
I find the csv module to be a bit more robust to poorly formatted comma-separated files, and so have had success with this route to address issues like these.
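Part of why this works (my observation, not bcoz's): DataFrame pads rows of unequal length instead of refusing them the way the C tokenizer does.
import pandas as pd

# Sketch: ragged rows are padded with NaN rather than raising an error.
print(pd.DataFrame([[1, 2, 3], [4, 5]]))
#    0  1    2
# 0  1  2  3.0
# 1  4  5  NaN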
answered Jan 26 '18 at 20:54 – bcoz
I had a dataset with preexisting row numbers, so I used index_col:
pd.read_csv('train.csv', index_col=0)
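Some context (my note, not the answerer's): this helps when every data row carries one more field than the header row, because the extra leading field is the row number. For the mirror case of a trailing delimiter at the end of each line, the pandas docs recommend index_col=False:
import pandas as pd

# Sketch: index_col=False forces pandas NOT to use the first column as
# the index, which the documentation suggests for malformed files with a
# delimiter at the end of every line.
df = pd.read_csv('train.csv', index_col=False)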
answered Jun 20 '17 at 5:28 – spicyramen
This is what I did. Passing sep='::' solved my issue:
data = pd.read_csv(r'C:\Users\HP\Downloads\NPL ASSINGMENT 2 imdb_labelled\imdb_labelled.txt',
                   engine='python', header=None, sep='::')
(Note the raw-string prefix r'...': without it, \U in the Windows path is an invalid escape in Python 3.)
answered Oct 21 '18 at 13:04 – Saurabh Tripathi (edited Oct 21 '18 at 15:54 by ssuperczynski)
I had a similar case; setting
train = pd.read_csv('input.csv', encoding='latin1', engine='python')
worked.
answered Nov 20 '18 at 2:08 – Adewole Adesola
Use the delimiter parameter:
pd.read_csv(filename, delimiter=",", encoding='utf-8')
It will then read the file.
answered Nov 21 '18 at 13:03 – Bhavesh Kumar
I had the same problem with read_csv: ParserError: Error tokenizing data.
I just saved the old csv file as a new csv file, and the problem was solved!
answered Nov 26 '18 at 13:32 – Simin Zuo
I had this problem, where I was trying to read in a CSV without passing in column names.
df = pd.read_csv(filename, header=None)
I specified the column names in a list beforehand and then passed them into names, which solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.
col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)
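If the maximum width is not known in advance, a quick pre-scan can find it (a hypothetical helper of my own, not the answerer's code):
import pandas as pd

# Sketch: find the widest row first, then generate that many placeholder
# names. The count is naive about commas inside quoted fields, but it is
# good enough for a first pass.
with open(filename) as f:
    max_cols = max(line.count(',') + 1 for line in f)
col_names = ['col{}'.format(i) for i in range(max_cols)]
df = pd.read_csv(filename, names=col_names)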
answered Jan 8 at 18:57 – Steven Rouk
I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.
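To make that concrete (my sketch with an assumed file layout, not jvvw's data):
import pandas as pd

# Sketch: a file where quotes inside fields are escaped with backslashes,
# e.g.   "a \"quoted\" word",42
# escapechar tells the tokenizer that \" does not close the field.
df = pd.read_csv('data.csv', escapechar='\\')  # 'data.csv' is a placeholder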
answered Dec 12 '17 at 11:43 – jvvw
You can do this step to avoid the problem:
train = pd.read_csv('/home/Project/output.csv', header=None)
Just add header=None.
Hope this helps!!
answered Aug 19 '18 at 6:59 – rahul ranjan (edited Aug 19 '18 at 7:27 by LuFFy)
The issue could be with the file itself. In my case, the issue was solved after renaming the file; I have yet to figure out the reason.
answered Oct 28 '18 at 12:46 – SQA_LEARN
I received a .csv from a coworker, and when I tried to read it using pd.read_csv() I got a similar error. Pandas was apparently attempting to use the first row to generate the columns for the dataframe, but many rows contained more columns than the first row would imply. I ended up fixing the problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.
answered Jul 13 '18 at 17:31 – Victor Burnett
Try: pandas.read_csv(path, sep=',', header=None)
answered Oct 10 '17 at 8:40 – THE2ndMOUSE