Count string occurrences in pandas raw data row

I have a CSV file as follows:

    name,age
    something
    tom,20

When I load it into a DataFrame, it looks like this:

    df = pd.read_csv('file', header=None)

               0    1
    0       name  age
    1  something  NaN
    2        tom   20

How would I get the number of commas in each raw row? For example, the result should look like:

    # pseudocode: raw_value stands for the unparsed text of each line
    df['_count_separators'] = df.raw_value.str.count(',')

               0    1  _count_separators
    0       name  age                  1
    1  something  NaN                  0
    2        tom   20                  1
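The setup above can be reproduced without a file on disk. The sketch below assumes the three sample lines are inlined with `io.StringIO` in place of `'file'`. It shows that `header=None` treats the first line as data and that the one-field line is padded with `NaN`:

```python
import io

import pandas as pd

# The question's sample file, inlined (an assumption standing in for 'file').
raw_text = "name,age\nsomething\ntom,20\n"

# header=None: the first line is data, and the columns are labelled 0 and 1.
df = pd.read_csv(io.StringIO(raw_text), header=None)

# The line "something" has only one field, so column 1 is NaN in that row.
print(df)
print(df[1].isna().tolist())
```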

python python-3.x pandas csv dataframe

edited 1 hour ago by coldspeed
asked 1 hour ago by Henry H

  • Do you also want to count the commas if they're inside a column value?
    – Omkar Sabade, 1 hour ago
  • @OmkarSabade Preferably just the number of separators that pandas inferred, but either way is acceptable.
    – David L, 1 hour ago
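One way to count only the separators pandas actually inferred is to split each line with Python's standard `csv` module and count fields minus one. This is a swapped-in technique, not one from the thread. With this method, a comma inside a quoted field such as `"smith, jr"` is ignored. The sketch assumes the default `csv` dialect (comma delimiter, `"` quoting) matches how pandas parsed the file. The quoted row is hypothetical test data:

```python
import csv
import io

import pandas as pd

# The question's sample plus a hypothetical row whose first field contains a quoted comma.
raw_text = 'name,age\nsomething\ntom,20\n"smith, jr",30\n'

# csv.reader honours the quotes, so '"smith, jr",30' splits into 2 fields, not 3.
# Empty rows are dropped to mirror pandas' default skip_blank_lines=True.
rows = [row for row in csv.reader(io.StringIO(raw_text)) if row]
sep_counts = [len(row) - 1 for row in rows]  # separators = fields - 1

df = pd.read_csv(io.StringIO(raw_text), header=None)
df["_count_separators"] = sep_counts
print(df)
```

On the quoted row, counting commas in the raw text would give 2 instead of 1. That difference is what the first comment asks about.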

4 Answers

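Do this: read the CSV a second time with a separator that does not occur in the data (here |), so each raw line lands in a single cell, then count the commas in that cell.

    df = pd.read_csv('file', header=None)
    # Read the file again with another separator so each whole row ends up in one cell.
    df2 = pd.read_csv('file', header=None, sep='|')

    # With header=None the column label is the integer 0, not the string '0'.
    df2[0].str.findall(',').str.len()
    0    1
    1    0
    2    1
    3    5
    Name: 0, dtype: int64

    df['_count_separators'] = df2[0].str.findall(',').str.len()

Data used for the output above:

    name,age
    something
    tom,20
    something,,,,,somethingelse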

answered 1 hour ago by W-B
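Below is a self-contained sketch of the re-read-with-another-separator idea. It assumes the four sample lines are inlined with `io.StringIO` instead of read from `'file'`. It also shows `str.count(',')`, which should give the same numbers as `str.findall(',').str.len()` for a literal comma. The plain `header=None` read is deliberately left out: line 4 has six fields while line 1 has two, so pandas' default C parser should raise a `ParserError` on this sample.

```python
import io

import pandas as pd

# The answer's four-line sample, inlined instead of read from disk.
raw_text = "name,age\nsomething\ntom,20\nsomething,,,,,somethingelse\n"

# '|' never occurs in the data, so every raw line becomes one cell in column 0.
lines = pd.read_csv(io.StringIO(raw_text), header=None, sep="|")[0]

via_findall = lines.str.findall(",").str.len()  # the answer's method
via_count = lines.str.count(",")                # the same count, more directly

print(via_findall.tolist())
print(via_count.tolist())
```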

Very simply: read your data as a single-column Series, then split on commas and concatenate the result with the separator count.

    # For testing: s = pd.read_csv(io.StringIO(text), sep='|', header=None).squeeze('columns')
    # (squeeze=True was removed from read_csv in pandas 2.0; .squeeze('columns') replaces it)
    s = pd.read_csv('/path/to/file.csv', sep='|', header=None).squeeze('columns')
    df = pd.concat([
        s.str.split(',', expand=True),          # one column per field
        s.str.count(',').rename('_count_sep'),  # commas per raw line
    ], axis=1)

    df
               0     1  _count_sep
    0       name   age           1
    1  something  None           0
    2        tom    20           1

answered 1 hour ago by coldspeed

  • We are on the same road :-) Cheers
    – W-B, 1 hour ago
  • @W-B Yup, did not see it until I posted... great minds, huh? ;)
    – coldspeed, 1 hour ago
  • I read your mind hahahaha :-)
    – W-B, 1 hour ago
  • But I learned about the new str.count :-) Thanks, man
    – W-B, 1 hour ago
  • Your answers stopped me from thinking otherwise
    – Dark, 1 hour ago
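Here is a runnable sketch of the split-and-count approach on current pandas. It assumes the question's three lines are inlined with `io.StringIO`. It uses `.squeeze('columns')` in place of the removed `squeeze=True` argument:

```python
import io

import pandas as pd

text = "name,age\nsomething\ntom,20\n"

# sep='|' keeps each raw line in a single column; squeeze turns it into a Series.
s = pd.read_csv(io.StringIO(text), sep="|", header=None).squeeze("columns")

df = pd.concat(
    [
        s.str.split(",", expand=True),          # fields as columns 0, 1, ...
        s.str.count(",").rename("_count_sep"),  # separators per raw line
    ],
    axis=1,
)
print(df)
```

Because `str.count` works on the raw text, this approach also counts a comma that sits inside a quoted field.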

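Try the code below:

    df = pd.read_csv('file', header=None)
    # count(axis='columns') counts the non-NaN cells in each row;
    # subtracting 1 turns the number of filled fields into the number of separators.
    df['_count_separators'] = df.count(axis='columns') - 1
    print(df)

Output:

               0    1  _count_separators
    0       name  age                  1
    1  something  NaN                  0
    2        tom   20                  1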

answered 1 hour ago by Anjaneyulu Batta
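The sketch below is a quick check of the row-count idea, assuming inlined sample data. `count(axis='columns')` counts non-missing cells, so it gives 2, 1, 2 on the question's rows and needs `- 1` to match the separator counts. The sketch also shows the method's limit. An empty field, such as the middle one in the hypothetical line `a,,b`, is parsed as NaN, so that row is undercounted compared with counting commas directly:

```python
import io

import pandas as pd

question_text = "name,age\nsomething\ntom,20\n"
df = pd.read_csv(io.StringIO(question_text), header=None)

non_missing = df.count(axis="columns")  # filled cells per row
separators = non_missing - 1            # separators, if no field is empty
print(separators.tolist())

# Hypothetical line with an empty middle field: it has 2 commas,
# but the empty field becomes NaN, so only 2 cells are counted as filled.
edge = pd.read_csv(io.StringIO("a,,b\n"), header=None)
edge_separators = edge.count(axis="columns") - 1
print(edge_separators.tolist())
```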

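One line of code:

    len(df) - df[1].isna().sum()

This returns a single number for the whole file. It counts the rows whose second column is filled, which equals the total number of separators only if every line has at most two fields.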

answered 1 hour ago by Quang Hoang

  • OK, what if the NaN itself is part of the dataset, like something,,,something?
    – Dark, 1 hour ago
  • I'm not sure in which case df = pd.read_csv('file.csv', header=None) would give a NaN in his sample.
    – Quang Hoang, 1 hour ago
  • This assumes there are only two columns...?
    – coldspeed, 1 hour ago
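For comparison, here is a sketch of what the one-liner returns on the question's sample (a single total), next to a per-row variant under the same two-column assumption. The `notna().astype(int)` form is an illustration, not part of the answer:

```python
import io

import pandas as pd

text = "name,age\nsomething\ntom,20\n"
df = pd.read_csv(io.StringIO(text), header=None)

# The answer's one-liner: rows with a filled second column, summed over the file.
total = len(df) - df[1].isna().sum()

# Per-row version under the same two-column assumption:
# 1 if the second field is present, else 0.
df["_count_separators"] = df[1].notna().astype(int)
print(total)
print(df)
```

As the comments point out, both forms break when there are more than two columns or an empty second field. For example, a line like `tom,` is read with NaN in column 1 but still contains a comma.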