Python Pandas Error tokenizing data












199















I'm trying to use pandas to manipulate a .csv file but I get this error:




pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12




I have tried to read the pandas docs, but found nothing.



My code is simple:



path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)


How can I resolve this? Should I use the csv module or another language?



File is from Morningstar






























  • 4





    If this error arises when reading a file written by pandas.to_csv(), it MIGHT be because there is a '\r' (carriage return) in a column name, in which case to_csv() will actually write the subsequent column names into the first column of the data frame, causing a difference between the number of columns in the first X rows. This difference is one cause of the C error.

    – user0
    Jan 23 '17 at 0:56











  • Sometimes just explicitly giving the "sep" parameter helps. It seems to be a parser issue.

    – gilgamash
    May 23 '18 at 12:30






  • 2





    This error may also arise when you're using a comma as the delimiter and you have more commas than expected (more fields in the error row than defined in the header). So you need to either remove the additional field or remove the extra comma if it's there by mistake. You can fix this manually, and then you don't need to skip the error lines.

    – tsveti_iko
    Aug 22 '18 at 9:44








































python csv pandas






edited Dec 30 '17 at 15:00 by Ronak Shah

asked Aug 4 '13 at 1:54 by abuteau











25 Answers


















293














You could also try:



data = pd.read_csv('file1.csv', error_bad_lines=False)
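Note that in recent pandas releases (1.3 and later) error_bad_lines is deprecated; the equivalent option is on_bad_lines="skip". A minimal sketch using an in-memory file with an invented over-long row:

```python
import io

import pandas as pd

# A CSV whose third line has more fields than the two-column header,
# reproducing "Expected 2 fields ... saw 3".
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# pandas >= 1.3 spells the option on_bad_lines; older versions use
# error_bad_lines=False as in the answer above.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
```

The offending row is silently dropped, so df ends up with two rows.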
























  • 92





    Do note that using error_bad_lines=False will cause the offending lines to be skipped.

    – biobirdman
    May 20 '14 at 7:27






  • 5





    Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like expected 8 fields, saw 9?

    – Petra Barus
    Sep 24 '14 at 10:11






  • 17





    The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?

    – dbliss
    Oct 6 '14 at 22:57






  • 3





    Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.

    – Petra Barus
    Oct 7 '14 at 2:17






  • 1





    There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

    – MTT
    May 15 '17 at 2:48



















51














It might be an issue with




  • the delimiters in your data

  • the first row, as @TomAugspurger noted


To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,



df = pandas.read_csv(fileName, sep='delimiter', header=None)


In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.



According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
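In my experience the automatic detection works more reliably when you also force the python engine; a small sketch with made-up semicolon-separated data:

```python
import io

import pandas as pd

# sep=None asks pandas to sniff the delimiter; this requires engine="python".
raw = "x;y;z\n1;2;3\n4;5;6\n"
df = pd.read_csv(io.StringIO(raw), sep=None, engine="python")
```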



























  • 1





    this solved my issue

    – Hemaa mathavan
    Apr 8 '17 at 8:37



















29














The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.



Try it with data = pd.read_csv(path, skiprows=2)
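As a sketch of the idea (the preamble lines here are invented, not the actual Morningstar header):

```python
import io

import pandas as pd

# Two preamble lines before the real header row, as in the question's file.
raw = (
    "Growth Profitability and Financial Ratios\n"
    "Exported data\n"
    "Year,Revenue\n"
    "2012,50175\n"
    "2013,59825\n"
)
# skiprows=2 discards the preamble so the real header sets the field count.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
```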






























  • Works like a charm. Thanks!

    – abuteau
    Aug 4 '13 at 2:43



















21














Your CSV file might have a variable number of columns, and read_csv inferred the number of columns from the first few rows. Two ways to solve it in this case:



1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0])



2) Or use names = list(range(0,N)) where N is the max number of columns.
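A sketch of option 2, with a made-up file whose widest row has four fields (so N=4):

```python
import io

import pandas as pd

# The first row has 2 fields but a later row has 4; declaring the
# maximum width up front avoids the tokenizing error, and shorter
# rows are padded with NaN.
raw = "1,2\n3,4,5,6\n7,8\n"
df = pd.read_csv(io.StringIO(raw), header=None, names=range(4))
```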



























  • 1





    This really helped!

    – Archie
    May 30 '17 at 16:18











  • This should be the accepted answer

    – Vivek
    Sep 8 '18 at 10:41



















14














I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:



data = pd.read_csv('file1.csv', error_bad_lines=False)


If you want to keep the lines, an ugly kind of hack for handling the errors is to do something like the following (note the original used Python 2's e.message; str(e) is the Python 3 equivalent):



line = []
expected = []
saw = []
cont = True

while cont == True:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        errortype = str(e).split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = str(e).split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')

if line != []:
    # Handle the errors however you want
    pass


I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.





































    13














    This is definitely a delimiter issue, as most CSV files of this kind are created using sep='\t', so try read_csv with the tab character ('\t') as the separator. So, try to open it using the following line of code:



    data = pd.read_csv("File_path", sep='\t')


























    • 4





      @MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('\t'), a semicolon, and possibly additional spaces. :)

      – DJGrandpaJ
      Apr 13 '16 at 19:54











    • @DJGrandpaJ Thanks did not know that!

      – Michael Queue
      May 16 '16 at 3:25











    • In my case it was a separator issue. read_csv apparently defaults to commas, and I have text fields which include commas (and the data was stored with a different separator anyway)

      – user108569
      Jul 17 '18 at 16:41



















    8














    I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.



    Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.



    Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.



    Hope that helps.

























    • 6





      What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.

      – elPastor
      Jul 7 '16 at 19:31



















    7














    I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.



    Edit:
    I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter which could circumvent the users current error but introduce others.



    I usually get around this by reading the extra data into a file and then using the read_csv() method.



    The exact solution might differ depending on your actual file, but this approach has worked for me in several cases







































      4














      I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:



      1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
      1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
      368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



      import pandas as pd
      # Same error for read_table
      counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine='c')

      pandas.io.common.CParserError: Error tokenizing data. C error: out of memory


      This says it has something to do with the C parsing engine (which is the default one). Maybe changing to the python one will change something.



      counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')

      Segmentation fault (core dumped)


      Now that is a different error.

      If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:



      1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
      1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
      368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


      _csv.Error: ' ' expected after '"'


      And it gets clear that pandas was having problems parsing our rows. To parse a table with python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile C-engine kept crashing even with commas in rows.



      To avoid creating a new file with replacements I did this, as my tables are small:



      from io import StringIO
      with open(path_counts) as f:
          input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ','))
      counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')


      tl;dr


      Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.







































        3














        Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for kwarg compression resolved my problem.



        result = pandas.read_csv(data_source, compression='gzip')
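For example, reading from an in-memory buffer, where pandas cannot infer the compression from a filename suffix:

```python
import gzip
import io

import pandas as pd

# Compressed bytes with no filename: without compression="gzip",
# pandas would try to tokenize the raw gzip bytes and fail.
buf = io.BytesIO(gzip.compress(b"a,b\n1,2\n"))
df = pd.read_csv(buf, compression="gzip")
```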




































          3














          The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):



          df = pd.read_csv(filename,
                           usecols=range(0, 42))
          df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
                        'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
                        'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
                        'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
                        'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
                        'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']



          The following does NOT work:



          df = pd.read_csv(filename,
                           names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
                                  'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
                                  'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
                                  'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
                                  'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
                                  'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
                           usecols=range(0, 42))



          CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

          The following also does NOT work:



          df = pd.read_csv(filename,
                           header=None)



          CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54



          Hence, for your problem you have to pass usecols=range(0, 2).





































            2














            Sometimes the problem is not how you use Python, but the raw data itself.

            I got this error message



            Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.


            It turned out that in the column description there were sometimes commas. This means that the CSV file needs to be cleaned up or another separator used.







































              2














              Use
              pandas.read_csv('CSVFILENAME', header=None, sep=', ')



              when trying to read CSV data from the link



              http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data



              I copied the data from the site into my csv file. It had extra spaces, so I used sep=', ' and it worked :)





































                2














                An alternative that I have found to be useful in dealing with similar parsing errors uses the CSV module to re-route data into a pandas df. For example:



                import csv
                import pandas as pd
                path = 'C:/FileLocation/'
                file = 'filename.csv'
                f = open(path + file, 'rt')
                reader = csv.reader(f)

                # once contents are available, I then put them in a list
                csv_list = []
                for l in reader:
                    csv_list.append(l)
                f.close()
                # now pandas has no problem getting into a df
                df = pd.DataFrame(csv_list)


                I find the CSV module to be a bit more robust to poorly formatted comma separated files and so have had success with this route to address issues like these.





































                  1














                  I had a dataset with preexisting row numbers, so I used index_col:



                  pd.read_csv('train.csv', index_col=0)




































                    1














                    This is what I did.



                    sep='::' solved my issue:



                    data = pd.read_csv(r'C:\Users\HP\Downloads\NPL ASSINGMENT 2 imdb_labelled\imdb_labelled.txt', engine='python', header=None, sep='::')






































                      1














                      I had a similar case to this, and setting



                      train = pd.read_csv('input.csv', encoding='latin1', engine='python')


                      worked.





































                        1














                        Use the delimiter parameter:



                        pd.read_csv(filename, delimiter=",", encoding='utf-8')


                        Then it will read.





































                          1














                          I had the same problem with read_csv: ParserError: Error tokenizing data.
                          I just saved the old CSV file as a new CSV file, and the problem was solved!





































                            1














                            I had this problem, where I was trying to read in a CSV without passing in column names.



                            df = pd.read_csv(filename, header=None)


                            I specified the column names in a list beforehand and then passed them into names, and it solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.



                            col_names = ["col1", "col2", "col3", ...]
                            df = pd.read_csv(filename, names=col_names)




































                              0














                              I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.
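A minimal sketch with invented data, where a quote inside a quoted field is backslash-escaped rather than doubled (the form that trips the default parser):

```python
import io

import pandas as pd

# The inner quotes are escaped with a backslash instead of being doubled;
# escapechar tells the parser how to read them.
raw = 'id,text\n1,"he said \\"hi\\" to me"\n'
df = pd.read_csv(io.StringIO(raw), escapechar="\\")
```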





































                                0














                                You can do this step to avoid the problem:



                                train = pd.read_csv('/home/Project/output.csv', header=None)


                                Just add header=None.



                                Hope this helps!!







































                                  0














                                  The issue could be with the file itself. In my case, the problem was solved after renaming the file; I have yet to figure out the reason.





































                                    -1














                                    I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.





































                                      -2














                                      Try: pandas.read_csv(path, sep=',', header=None)




























                                        protected by Community Jan 8 at 13:26

















                                        25 Answers
                                        25






                                        active

                                        oldest

                                        votes








                                        25 Answers
                                        25






                                        active

                                        oldest

                                        votes









                                        active

                                        oldest

                                        votes






                                        active

                                        oldest

                                        votes









                                        293














                                        you could also try;



                                        data = pd.read_csv('file1.csv', error_bad_lines=False)





                                        share|improve this answer



















                                        • 92





                                          Do note that using error_bad_lines=False will cause the offending lines to be skipped.

                                          – biobirdman
                                          May 20 '14 at 7:27






                                        • 5





                                          Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like expected 8 fields, saw 9?

                                          – Petra Barus
                                          Sep 24 '14 at 10:11






                                        • 17





                                          The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?

                                          – dbliss
                                          Oct 6 '14 at 22:57






                                        • 3





                                          Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.

                                          – Petra Barus
                                          Oct 7 '14 at 2:17






                                        • 1





                                          There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

                                          – MTT
                                          May 15 '17 at 2:48
















                                        293














                                        you could also try;



                                        data = pd.read_csv('file1.csv', error_bad_lines=False)





                                        share|improve this answer



















                                        • 92





                                          Do note that using error_bad_lines=False will cause the offending lines to be skipped.

                                          – biobirdman
                                          May 20 '14 at 7:27






                                        • 5





                                          Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like expected 8 fields, saw 9?

                                          – Petra Barus
                                          Sep 24 '14 at 10:11






                                        • 17





                                          The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)?

                                          – dbliss
                                          Oct 6 '14 at 22:57






                                        • 3





                                          Yes, I just did that. It's much easier by adding columns. Opening CSV in a spreadsheet does this.

                                          – Petra Barus
                                          Oct 7 '14 at 2:17






                                        • 1





                                          There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

                                          – MTT
                                          May 15 '17 at 2:48














                                        293












                                        293








                                        293







                                        you could also try;



                                        data = pd.read_csv('file1.csv', error_bad_lines=False)





answered Aug 8 '13 at 14:47 by richierichie


• 92  Do note that using error_bad_lines=False will cause the offending lines to be skipped. – biobirdman, May 20 '14 at 7:27

• 5  Stumbled on this answer, is there a way to fill missing columns on lines that outputs something like expected 8 fields, saw 9? – Petra Barus, Sep 24 '14 at 10:11

• 17  The better solution is to investigate the offending file and to correct the bad lines so that they can be read by read_csv. @PetraBarus, why not just add columns to the CSV files that are missing them (with null values as needed)? – dbliss, Oct 6 '14 at 22:57

• 3  Yes, I just did that. It's much easier by adding columns. Opening the CSV in a spreadsheet does this. – Petra Barus, Oct 7 '14 at 2:17

• 1  There is a chance to get this error: CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file. – MTT, May 15 '17 at 2:48

                                        51














                                        It might be an issue with




                                        • the delimiters in your data

                                        • the first row, as @TomAugspurger noted


                                        To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,



                                        df = pandas.read_csv(fileName, sep='delimiter', header=None)


                                        In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}.



                                        According to the docs, the delimiter thing should not be an issue. The docs say that "if sep is None [not specified], will try to automatically determine this." I however have not had good luck with this, including instances with obvious delimiters.
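A runnable sketch of both arguments, using made-up semicolon-delimited data in place of a real file:

```python
import io
import pandas as pd

# Invented file contents: semicolon-delimited, no header row
raw = "1;alice;10\n2;bob;20\n"

df = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(list(df.columns))  # pandas numbers the columns itself: [0, 1, 2]
```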



























• 1  this solved my issue – Hemaa mathavan, Apr 8 '17 at 8:37
























edited Jun 5 '18 at 14:24; answered Oct 28 '14 at 2:18 by grisaitis















                                        29














                                        The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren't representative of the actual data in the file.



                                        Try it with data = pd.read_csv(path, skiprows=2)
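A small sketch of why this works, with an invented two-line banner standing in for the Morningstar preamble:

```python
import io
import pandas as pd

# Hypothetical file layout: two junk lines, then the real header and data
raw = (
    "Growth Profitability and Financial Ratios for Google Inc\n"
    "\n"
    "date,revenue\n"
    "2013,100\n"
)

# skiprows=2 drops the banner lines, so the parser sees the true header first
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df.columns.tolist())  # ['date', 'revenue']
```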






























• Works like a charm. Thanks! – abuteau, Aug 4 '13 at 2:43


























answered Aug 4 '13 at 2:24 by TomAugspurger
























                                        21














Your CSV file might have a variable number of columns, and read_csv inferred the number of columns from the first few rows. Two ways to solve it in this case:

1) Change the CSV file to have a dummy first line with the max number of columns (and specify header=[0])

2) Or use names = list(range(0,N)) where N is the max number of columns.
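A sketch of option 2 with hypothetical ragged data; supplying names widens the frame up front, so short rows are padded with NaN instead of tripping the tokenizer:

```python
import io
import pandas as pd

# Invented ragged data: rows carry 2 to 4 fields, and the first row is short,
# so pandas would otherwise infer too few columns from it
raw = "1,2\n3,4,5,6\n7,8,9\n"

# N = 4 is the max field count; name all 4 columns explicitly
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(4)))
print(df.shape)  # (3, 4); missing trailing fields become NaN
```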



























• 1  This really helped! – Archie, May 30 '17 at 16:18

• This should be the accepted answer – Vivek, Sep 8 '18 at 10:41
























edited Sep 20 '17 at 0:53 by Ajean; answered Mar 31 '17 at 16:29 by computerist









                                        14














                                        I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:



                                        data = pd.read_csv('file1.csv', error_bad_lines=False)


If you want to keep the lines, an ugly kind of hack for handling the errors is to do something like the following:

line     = []
expected = []
saw      = []
cont     = True

while cont:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        errortype = str(e).split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = str(e).split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')

if line != []:
    # Handle the errors however you want
    pass

                                        I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
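One way to avoid the retry loop entirely (a sketch with invented data; your real file path would replace the StringIO buffer) is to pre-scan the file with the csv module, collect the line numbers whose field count differs from the header, and hand those to skiprows in a single read_csv call:

```python
import csv
import io
import pandas as pd

raw = "a,b\n1,2\n3,4,5\n6,7\n"  # line 3 ("3,4,5") has one field too many

# Pre-scan: csv.reader tolerates ragged rows, unlike the pandas C parser
rows = list(csv.reader(io.StringIO(raw)))
expected = len(rows[0])
bad = [i for i, row in enumerate(rows) if len(row) != expected]

# skiprows accepts 0-indexed line numbers, matching enumerate() above
df = pd.read_csv(io.StringIO(raw), skiprows=bad)
print(bad, df.shape)  # [2] (2, 2)
```

The bad rows stay available in `rows` for whatever repair or re-insertion you want to do afterwards.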


































                                          14














                                          I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:



                                          data = pd.read_csv('file1.csv', error_bad_lines=False)


                                          If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:



                                          line     = 
                                          expected =
                                          saw =
                                          cont = True

                                          while cont == True:
                                          try:
                                          data = pd.read_csv('file1.csv',skiprows=line)
                                          cont = False
                                          except Exception as e:
                                          errortype = e.message.split('.')[0].strip()
                                          if errortype == 'Error tokenizing data':
                                          cerror = e.message.split(':')[1].strip().replace(',','')
                                          nums = [n for n in cerror.split(' ') if str.isdigit(n)]
                                          expected.append(int(nums[0]))
                                          saw.append(int(nums[2]))
                                          line.append(int(nums[1])-1)
                                          else:
                                          cerror = 'Unknown'
                                          print 'Unknown Error - 222'

                                          if line != :
                                          # Handle the errors however you want


                                          I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.






                                          share|improve this answer


























                                            14












                                            14








                                            14







                                            I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:



                                            data = pd.read_csv('file1.csv', error_bad_lines=False)


                                            If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:



                                            line     = 
                                            expected =
                                            saw =
                                            cont = True

                                            while cont == True:
                                            try:
                                            data = pd.read_csv('file1.csv',skiprows=line)
                                            cont = False
                                            except Exception as e:
                                            errortype = e.message.split('.')[0].strip()
                                            if errortype == 'Error tokenizing data':
                                            cerror = e.message.split(':')[1].strip().replace(',','')
                                            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
                                            expected.append(int(nums[0]))
                                            saw.append(int(nums[2]))
                                            line.append(int(nums[1])-1)
                                            else:
                                            cerror = 'Unknown'
                                            print 'Unknown Error - 222'

                                            if line != :
                                            # Handle the errors however you want


                                            I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.






                                            share|improve this answer













                                            I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:



                                            data = pd.read_csv('file1.csv', error_bad_lines=False)


                                            If you want to keep the lines an ugly kind of hack for handling the errors is to do something like the following:



line = []
expected = []
saw = []
cont = True

while cont == True:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        errortype = str(e).split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = str(e).split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')

if line != []:
    pass  # Handle the errors however you want


                                            I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable 'line' in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
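Note that in newer pandas releases (1.3+) the error_bad_lines flag was deprecated in favor of on_bad_lines. A minimal sketch of the modern equivalent, using a throwaway demo file:

```python
import pandas as pd

# Demo file with an extra field on the third line -- the situation that
# triggers "Expected 2 fields in line 3, saw 3".
with open('file1.csv', 'w') as f:
    f.write('a,b\n1,2\n3,4,5\n6,7\n')

# pandas >= 1.3: on_bad_lines='skip' drops malformed rows instead of raising.
data = pd.read_csv('file1.csv', on_bad_lines='skip')
print(len(data))  # 2
```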







                                            answered Feb 4 '16 at 22:16









Robert Geiger

This is definitely a delimiter issue, as many such files are actually created tab-separated. Try read_csv with the tab character ('\t') as the separator. So, try to open it using the following line:



data = pd.read_csv("File_path", sep='\t')
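If you are not sure which delimiter a file actually uses, the standard library's csv.Sniffer can detect it before the file is handed to pandas. A sketch, assuming a throwaway demo.tsv file with a consistent delimiter:

```python
import csv

import pandas as pd

# Hypothetical tab-separated demo file.
with open('demo.tsv', 'w') as f:
    f.write('a\tb\n1\t2\n3\t4\n')

# Sniff the dialect from a sample of the file, then reuse its delimiter.
with open('demo.tsv', newline='') as f:
    dialect = csv.Sniffer().sniff(f.read(1024))

data = pd.read_csv('demo.tsv', sep=dialect.delimiter)
print(repr(dialect.delimiter))  # '\t'
```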





                                                • 4





@MichaelQueue : This is incorrect. A CSV, although commonly delimited by a comma, may be delimited by other characters as well. See CSV specifications. It may be a comma, a tab ('\t'), a semicolon, and possibly additional spaces. :)

                                                  – DJGrandpaJ
                                                  Apr 13 '16 at 19:54











                                                • @DJGrandpaJ Thanks did not know that!

                                                  – Michael Queue
                                                  May 16 '16 at 3:25











                                                • in my case it was a separator issue. read_csv apparently defaults to commas, and i have text fields which include commas (and the data was stored with a different separator anyway)

                                                  – user108569
                                                  Jul 17 '18 at 16:41
















edited Jun 1 '17 at 13:31 by Lucas




                                                answered Apr 1 '15 at 5:42









Piyush S. Wanare








                                                I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.



                                                Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.



                                                Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.



                                                Hope that helps.
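One quick way to confirm this diagnosis is to count the fields in each row; a well-formed CSV yields a single count. A sketch using the standard csv module on a deliberately broken demo file:

```python
import csv
from collections import Counter

# Demo file whose third line has one field too many.
with open('broken.csv', 'w') as f:
    f.write('a,b\n1,2\n3,4,5\n')

# Count how many fields each row has; more than one distinct
# count means the rows are inconsistent.
with open('broken.csv', newline='') as f:
    field_counts = Counter(len(row) for row in csv.reader(f))

print(dict(field_counts))  # {2: 2, 3: 1}
```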






                                                • 6





                                                  What's up with the down vote? Speak up if you're going to do that. Not all solutions required fancy code, it could be simple methodology that needs changing.

                                                  – elPastor
                                                  Jul 7 '16 at 19:31
















                                                answered Jul 7 '16 at 17:22









elPastor








                                                I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.



                                                Edit:
I found that this error creeps up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (greater than one line, so skip_header doesn't work) which will not be separated by the same number of commas as your actual data (when using read_csv). Using read_table uses a tab as the delimiter, which could circumvent the user's current error but introduce others.



                                                I usually get around this by reading the extra data into a file then use the read_csv() method.



The exact solution might differ depending on your actual file, but this approach has worked for me in several cases.
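When the extra text is a fixed number of lines at the top or bottom of the file, read_csv can also skip it directly with skiprows and skipfooter (skipfooter requires the python engine). A sketch with a made-up report file:

```python
import pandas as pd

# Demo file: a two-line banner and a one-line footer surround the data.
with open('report.csv', 'w') as f:
    f.write('Quarterly report\nGenerated by hand\na,b\n1,2\n3,4\nEnd of report\n')

# Skip the banner before parsing and drop the footer row afterwards.
data = pd.read_csv('report.csv', skiprows=2, skipfooter=1, engine='python')
print(data.shape)  # (2, 2)
```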






                                                    edited Jul 7 '17 at 9:32

























                                                    answered Jun 30 '14 at 11:46









Legend_Ari























                                                        I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:



                                                        1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
                                                        1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
                                                        368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine='c')

pandas.io.common.CParserError: Error tokenizing data. C error: out of memory


This says it has something to do with the C parsing engine (which is the default one). Maybe changing to a python one will change something:



counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')

                                                        Segmentation fault (core dumped)


                                                        Now that is a different error.

                                                        If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:



                                                        1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
                                                        1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
                                                        368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


_csv.Error: '\t' expected after '"'


It becomes clear that pandas was having problems parsing our rows. To parse a table with the python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile the C engine kept crashing even with commas in rows.



                                                        To avoid creating a new file with replacements I did this, as my tables are small:



from io import StringIO
with open(path_counts) as f:
    input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('',''))
counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')


                                                        tl;dr


Change the parsing engine, and try to avoid any non-delimiting quotes/commas/spaces in your data.
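As an alternative to pre-cleaning the file, read_csv's own quoting controls can sometimes cope with stray quotes: csv.QUOTE_NONE makes the parser treat quote characters as ordinary data. A sketch on an inline sample; whether it helps depends on the file:

```python
import csv
from io import StringIO

import pandas as pd

# Tab-separated sample where quoted, comma-bearing text lives in one field.
raw = 'x\t"a", "b"\ny\t"c", "d"\n'

# QUOTE_NONE: quotes are kept as literal characters instead of being
# interpreted as field delimiters, so the commas inside stay put.
data = pd.read_csv(StringIO(raw), sep='\t', header=None, quoting=csv.QUOTE_NONE)
print(data.shape)  # (2, 2)
```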






                                                        share|improve this answer






























                                                          4














                                                          I've had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:



                                                          1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
                                                          1144102 3180 "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
                                                          368444 2328 "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



                                                          import pandas as pd
                                                          # Same error for read_table
                                                          counts = pd.read_csv(path_counts, sep='t', index_col=2, header=None, engine = 'c')

                                                          pandas.io.common.CParserError: Error tokenizing data. C error: out of memory


                                                          This says it has something to do with C parsing engine (which is the default one). Maybe changing to a python one will change anything



                                                          counts = pd.read_table(path_counts, sep='t', index_col=2, header=None, engine='python')

                                                          Segmentation fault (core dumped)


                                                          Now that is a different error.

                                                          If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:



                                                          1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
                                                          1144102 3180 "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
                                                          368444 2328 "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


                                                          _csv.Error: ' ' expected after '"'


                                                          And it gets clear that pandas was having problems parsing our rows. To parse a table with python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile C-engine kept crashing even with commas in rows.



                                                          To avoid creating a new file with replacements I did this, as my tables are small:



                                                          from io import StringIO
                                                          with open(path_counts) as f:
                                                          input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('',''))
                                                          counts = pd.read_table(input, sep='t', index_col=2, header=None, engine='python')


                                                          tl;dr


                                                          Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.






                                                          share|improve this answer




























                                                            4












                                                            4








                                                            4







                                                            edited Apr 25 '17 at 15:00

























                                                            answered Apr 24 '17 at 11:28









lotrus28























                                                                3














Although not the case for this question, this error may also appear with compressed data. Explicitly setting a value for the compression kwarg resolved my problem.



                                                                result = pandas.read_csv(data_source, compression='gzip')
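A self-contained sketch of the same idea, using a made-up CSV written into an in-memory gzip stream rather than a real file:

```python
import gzip
import io
import pandas as pd

# Write a small, made-up CSV into an in-memory gzip buffer.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"a,b\n1,2\n3,4\n")
buf.seek(0)

# Tokenizing the raw gzip bytes as text would fail; naming the
# compression explicitly lets pandas decompress before parsing.
df = pd.read_csv(buf, compression="gzip")
print(df)
```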





                                                                    answered Oct 3 '16 at 15:45









RegularlyScheduledProgramming























                                                                        3














The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):

df = pd.read_csv(filename,
                 usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
              'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
              'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
              'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
              'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
              'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']

The following does NOT work:

df = pd.read_csv(filename,
                 names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND',
                        'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS',
                        'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2',
                        'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6',
                        'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10',
                        'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'],
                 usecols=range(0, 42))

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Neither does the following:

df = pd.read_csv(filename,
                 header=None)

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Hence, in your problem you have to pass usecols=range(0, 2).
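A minimal, made-up reproduction of this field-count mismatch, along with a related fix: naming enough columns up front widens the schema so every ragged row fits (short rows are padded with NaN).

```python
import io
import pandas as pd

# Made-up ragged data: the second row has one field more than the first.
raw = "1,2,3\n4,5,6,7\n"

try:
    pd.read_csv(io.StringIO(raw), header=None)
except pd.errors.ParserError as exc:
    print(exc)  # Expected 3 fields in line 2, saw 4

# Naming four columns makes every row fit; the short first row is
# padded with NaN in the last column.
df = pd.read_csv(io.StringIO(raw), header=None, names=range(4))
print(df.shape)
```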






                                                                            answered May 23 '18 at 11:45









kepy97























                                                                                2














Sometimes the problem is not how you use Python, but the raw data itself.

I got this error message:

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

It turned out that the description column sometimes contained commas. This means that the CSV file needs to be cleaned up, or another separator used.
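A third option besides cleaning the file or changing the separator: if the free-text field is properly quoted, pandas does not treat its embedded commas as delimiters. A made-up example:

```python
import io
import pandas as pd

# The description field contains a comma, but it is quoted, so the
# row still parses as exactly two fields.
quoted = 'id,description\n1,"small, red"\n2,plain\n'
df = pd.read_csv(io.StringIO(quoted))
print(df.shape)                   # two rows, two columns
print(df.loc[0, "description"])   # small, red
```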






                                                                                    edited Nov 15 '17 at 12:13









Aks4125










                                                                                    answered Nov 15 '17 at 10:59









Kims Sifers























                                                                                        2














Use

pandas.read_csv('CSVFILENAME', header=None, sep=', ')

when trying to read CSV data from the link

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

I copied the data from the site into my CSV file. It had extra spaces, so I used sep=', ' and it worked :)
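A sketch of why this works, on made-up rows in the same comma-plus-space style as that file: a multi-character separator such as ', ' is only supported by the slower Python engine (pandas falls back to it with a warning if the engine is not given explicitly), and it consumes the stray spaces during splitting.

```python
import io
import pandas as pd

# Made-up rows in the comma-plus-space style of the UCI adult data.
raw = "39, State-gov, 77516\n50, Self-emp-not-inc, 83311\n"

# sep=', ' splits on the two-character delimiter, so the values come
# out without leading spaces; engine='python' avoids the fallback warning.
df = pd.read_csv(io.StringIO(raw), header=None, sep=", ", engine="python")
print(df.shape)
```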






                                                                                            answered Jan 2 '18 at 9:56









Abhishek

                                                                                            673



























                                                                                                2














An alternative that I have found useful in dealing with similar parsing errors uses the csv module to re-route the data into a pandas DataFrame. For example:

import csv
import pandas as pd

path = 'C:/FileLocation/'
file = 'filename.csv'
f = open(path + file, 'rt')
reader = csv.reader(f)

# once the contents are available, put them in a list
csv_list = []
for l in reader:
    csv_list.append(l)
f.close()

# now pandas has no problem turning the list into a DataFrame
df = pd.DataFrame(csv_list)

I find the csv module to be a bit more robust with poorly formatted comma-separated files, so I have had success with this route to address issues like these.
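A compact sketch of why this route sidesteps the tokenizing error, using hypothetical inline data instead of a file: pd.DataFrame pads ragged rows with missing values rather than raising.

```python
import csv
import io
import pandas as pd

# Hypothetical ragged input: row 2 has an extra field, which is exactly
# what makes pd.read_csv raise "Expected 2 fields ... saw 3".
raw = "a,b\n1,2,3\n4,5\n"

reader = csv.reader(io.StringIO(raw))
csv_list = [row for row in reader]

# pd.DataFrame pads short rows with missing values instead of raising.
df = pd.DataFrame(csv_list)
print(df.shape)  # (3, 3)
```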






                                                                                                share|improve this answer




























                                                                                                    answered Jan 26 '18 at 20:54









bcoz

                                                                                                    384



























                                                                                                        1














I had a dataset with pre-existing row numbers, so I used index_col:



                                                                                                        pd.read_csv('train.csv', index_col=0)
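A small illustration with made-up inline data (StringIO standing in for train.csv), showing the pre-existing row-number column being absorbed as the index:

```python
import io
import pandas as pd

# Hypothetical file whose first column already holds row numbers.
raw = "id,a,b\n0,1,2\n1,3,4\n"

# index_col=0 turns that first column into the index instead of data.
df = pd.read_csv(io.StringIO(raw), index_col=0)
print(df.columns.tolist())  # ['a', 'b']
```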





                                                                                                        share|improve this answer




























                                                                                                            answered Jun 20 '17 at 5:28









spicyramen

3,261



























                                                                                                                1














This is what I did: sep='::' solved my issue.

data = pd.read_csv(r'C:\Users\HP\Downloads\NPL ASSINGMENT 2 imdb_labelled\imdb_labelled.txt', engine='python', header=None, sep='::')

(Note the raw-string prefix r'' — without it, \U in the Windows path is treated as an escape sequence and the line fails before pandas is even called.)
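A self-contained sketch of the same idea with hypothetical '::'-separated data (StringIO in place of the imdb_labelled.txt path); multi-character separators require engine='python', since the C parser only handles single-character delimiters:

```python
import io
import pandas as pd

# Hypothetical '::'-separated file in the style of imdb_labelled.txt.
raw = "A great movie ::1\nTerrible acting ::0\n"

df = pd.read_csv(io.StringIO(raw), header=None, sep="::", engine="python")
print(df.shape)  # (2, 2)
```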





                                                                                                                share|improve this answer






























                                                                                                                    edited Oct 21 '18 at 15:54









                                                                                                                    ssuperczynski

1,531














                                                                                                                    answered Oct 21 '18 at 13:04









Saurabh Tripathi

                                                                                                                    112



























                                                                                                                        1














I had a similar case, and setting

train = pd.read_csv('input.csv', encoding='latin1', engine='python')

worked.
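A minimal sketch of the encoding side of this fix, with made-up bytes (BytesIO standing in for input.csv): data that is valid Latin-1 but not valid UTF-8 fails with the default encoding, while encoding='latin1' reads it cleanly (here with the default C engine; engine='python' is not required for the encoding itself).

```python
import io
import pandas as pd

# Hypothetical bytes that are valid Latin-1 but not valid UTF-8
# (0xE9 is 'é' in Latin-1).
raw = "name,score\ncaf\xe9,1\n".encode("latin1")

df = pd.read_csv(io.BytesIO(raw), encoding="latin1")
print(df.loc[0, "name"])  # café
```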






                                                                                                                        share|improve this answer




























                                                                                                                            answered Nov 20 '18 at 2:08









Adewole Adesola

                                                                                                                            6112



























                                                                                                                                1














Use the delimiter parameter:

pd.read_csv(filename, delimiter=",", encoding='utf-8')

It will read the data.






                                                                                                                                share|improve this answer




























                                                                                                                                    answered Nov 21 '18 at 13:03









Bhavesh Kumar

                                                                                                                                    569



























                                                                                                                                        1














I had the same problem with read_csv: ParserError: Error tokenizing data.
I just re-saved the old csv file as a new csv file, and the problem was solved!






                                                                                                                                        share|improve this answer




























                                                                                                                                            answered Nov 26 '18 at 13:32









Simin Zuo

                                                                                                                                            112




                                                                                                                                            112























                                                                                                                                                1














                                                                                                                                                I had this problem, where I was trying to read in a CSV without passing in column names.



                                                                                                                                                df = pd.read_csv(filename, header=None)


                                                                                                                                                I specified the column names in a list beforehand and then pass them into names, and it solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.



                                                                                                                                                col_names = ["col1", "col2", "col3", ...]
                                                                                                                                                df = pd.read_csv(filename, names=col_names)





                                                                                                                                                share|improve this answer




























                                                                                                                                                  1














                                                                                                                                                  I had this problem, where I was trying to read in a CSV without passing in column names.



                                                                                                                                                  df = pd.read_csv(filename, header=None)


                                                                                                                                                  I specified the column names in a list beforehand and then pass them into names, and it solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.



                                                                                                                                                  col_names = ["col1", "col2", "col3", ...]
                                                                                                                                                  df = pd.read_csv(filename, names=col_names)





                                                                                                                                                  share|improve this answer


























                                                                                                                                                    1












                                                                                                                                                    1








                                                                                                                                                    1







                                                                                                                                                    I had this problem, where I was trying to read in a CSV without passing in column names.



                                                                                                                                                    df = pd.read_csv(filename, header=None)


                                                                                                                                                    I specified the column names in a list beforehand and then pass them into names, and it solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.



                                                                                                                                                    col_names = ["col1", "col2", "col3", ...]
                                                                                                                                                    df = pd.read_csv(filename, names=col_names)





                                                                                                                                                    share|improve this answer













                                                                                                                                                    I had this problem, where I was trying to read in a CSV without passing in column names.



                                                                                                                                                    df = pd.read_csv(filename, header=None)


                                                                                                                                                    I specified the column names in a list beforehand and then pass them into names, and it solved it immediately. If you don't have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.



                                                                                                                                                    col_names = ["col1", "col2", "col3", ...]
                                                                                                                                                    df = pd.read_csv(filename, names=col_names)






                                                                                                                                                    share|improve this answer












                                                                                                                                                    share|improve this answer



                                                                                                                                                    share|improve this answer










                                                                                                                                                    answered Jan 8 at 18:57









                                                                                                                                                    Steven RoukSteven Rouk

                                                                                                                                                    636




                                                                                                                                                    636























                                                                                                                                                        0














                                                                                                                                                        I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.






                                                                                                                                                        share|improve this answer




























                                                                                                                                                          0














                                                                                                                                                          I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.






                                                                                                                                                          share|improve this answer


























                                                                                                                                                            0












                                                                                                                                                            0








                                                                                                                                                            0







                                                                                                                                                            I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.






                                                                                                                                                            share|improve this answer













                                                                                                                                                            I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.







                                                                                                                                                            share|improve this answer












                                                                                                                                                            share|improve this answer



                                                                                                                                                            share|improve this answer










                                                                                                                                                            answered Dec 12 '17 at 11:43









                                                                                                                                                            jvvwjvvw

                                                                                                                                                            50656




                                                                                                                                                            50656























                                                                                                                                                                0














                                                                                                                                                                You can do this step to avoid the problem -



                                                                                                                                                                train = pd.read_csv('/home/Project/output.csv' , header=None)


                                                                                                                                                                just add - header=None



                                                                                                                                                                Hope this helps!!






                                                                                                                                                                share|improve this answer






























                                                                                                                                                                  0














                                                                                                                                                                  You can do this step to avoid the problem -



                                                                                                                                                                  train = pd.read_csv('/home/Project/output.csv' , header=None)


                                                                                                                                                                  just add - header=None



                                                                                                                                                                  Hope this helps!!






                                                                                                                                                                  share|improve this answer




























                                                                                                                                                                    0












                                                                                                                                                                    0








                                                                                                                                                                    0







                                                                                                                                                                    You can do this step to avoid the problem -



                                                                                                                                                                    train = pd.read_csv('/home/Project/output.csv' , header=None)


                                                                                                                                                                    just add - header=None



                                                                                                                                                                    Hope this helps!!






                                                                                                                                                                    share|improve this answer















                                                                                                                                                                    You can do this step to avoid the problem -



                                                                                                                                                                    train = pd.read_csv('/home/Project/output.csv' , header=None)


                                                                                                                                                                    just add - header=None



                                                                                                                                                                    Hope this helps!!







                                                                                                                                                                    share|improve this answer














                                                                                                                                                                    share|improve this answer



                                                                                                                                                                    share|improve this answer








                                                                                                                                                                    edited Aug 19 '18 at 7:27









                                                                                                                                                                    LuFFy

                                                                                                                                                                    3,547102751




                                                                                                                                                                    3,547102751










                                                                                                                                                                    answered Aug 19 '18 at 6:59









                                                                                                                                                                    rahul ranjanrahul ranjan

                                                                                                                                                                    34




                                                                                                                                                                    34























                                                                                                                                                                        0














                                                                                                                                                                        Issue could be with file Issues, In my case, Issue was solved after renaming the file. yet to figure out the reason..






                                                                                                                                                                        share|improve this answer




























                                                                                                                                                                          0














                                                                                                                                                                          Issue could be with file Issues, In my case, Issue was solved after renaming the file. yet to figure out the reason..






                                                                                                                                                                          share|improve this answer


























                                                                                                                                                                            0












                                                                                                                                                                            0








                                                                                                                                                                            0







                                                                                                                                                                            Issue could be with file Issues, In my case, Issue was solved after renaming the file. yet to figure out the reason..






                                                                                                                                                                            share|improve this answer













                                                                                                                                                                            Issue could be with file Issues, In my case, Issue was solved after renaming the file. yet to figure out the reason..







                                                                                                                                                                            share|improve this answer












                                                                                                                                                                            share|improve this answer



                                                                                                                                                                            share|improve this answer










                                                                                                                                                                            answered Oct 28 '18 at 12:46









                                                                                                                                                                            SQA_LEARNSQA_LEARN

                                                                                                                                                                            317




                                                                                                                                                                            317























                                                                                                                                                                                -1














                                                                                                                                                                                I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.






                                                                                                                                                                                share|improve this answer




























                                                                                                                                                                                  -1














                                                                                                                                                                                  I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.






                                                                                                                                                                                  share|improve this answer


























                                                                                                                                                                                    -1












                                                                                                                                                                                    -1








                                                                                                                                                                                    -1







                                                                                                                                                                                    I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.






                                                                                                                                                                                    share|improve this answer













                                                                                                                                                                                    I had received a .csv from a coworker and when I tried to read the csv using pd.read_csv(), I received a similar error. It was apparently attempting to use the first row to generate the columns for the dataframe, but there were many rows which contained more columns than the first row would imply. I ended up fixing this problem by simply opening and re-saving the file as .csv and using pd.read_csv() again.







                                                                                                                                                                                    share|improve this answer












                                                                                                                                                                                    share|improve this answer



                                                                                                                                                                                    share|improve this answer










                                                                                                                                                                                    answered Jul 13 '18 at 17:31









                                                                                                                                                                                    Victor BurnettVictor Burnett

                                                                                                                                                                                    345




                                                                                                                                                                                    345























                                                                                                                                                                                        -2














                                                                                                                                                                                        try: pandas.read_csv(path, sep = ',' ,header=None)






                                                                                                                                                                                        share|improve this answer




























                                                                                                                                                                                          -2














                                                                                                                                                                                          try: pandas.read_csv(path, sep = ',' ,header=None)






                                                                                                                                                                                          share|improve this answer


























                                                                                                                                                                                            -2












                                                                                                                                                                                            -2








                                                                                                                                                                                            -2







                                                                                                                                                                                            try: pandas.read_csv(path, sep = ',' ,header=None)






                                                                                                                                                                                            share|improve this answer













                                                                                                                                                                                            try: pandas.read_csv(path, sep = ',' ,header=None)







                                                                                                                                                                                            share|improve this answer












                                                                                                                                                                                            share|improve this answer



                                                                                                                                                                                            share|improve this answer










                                                                                                                                                                                            answered Oct 10 '17 at 8:40









                                                                                                                                                                                            THE2ndMOUSETHE2ndMOUSE

                                                                                                                                                                                            247





















                                                                                                                                                                                                protected by Community Jan 8 at 13:26





