Making a bar chart to represent the number of occurrences in a Pandas Series












I was wondering if anyone could help me make a bar chart that shows the frequency of each value in a pandas Series.



I start with a pandas DataFrame of shape (2000, 7) and extract its last column, which has shape (2000,).



The values in this Series range from 0 to 17, each with a different frequency. I tried to plot them as a bar chart but ran into problems. Here is my code:



import numpy as np
import matplotlib.pyplot as plt

# data_val is the last column of the DataFrame, shape (2000,).

# First, I counted the number of occurrences.
count = np.zeros(max(data_val))

for i in range(count.shape[0]):
    for j in range(data_val.shape[0]):
        if i == data_val[j]:
            count[i] = count[i] + 1

'''
This gives us
count = array([192., 105., ... 19.])
'''

temp = np.arange(0, 18, 1)  # Array for the x-axis.

plt.bar(temp, count)


The last line raises an error saying that the objects cannot be broadcast to a single shape.
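A likely cause, judging from the code above: `max(data_val)` is 17, so `np.zeros(max(data_val))` creates only 17 slots (indices 0 to 16), while `np.arange(0, 18, 1)` has 18 elements, and `plt.bar` cannot pair 18 x positions with 17 heights. The loop also never counts the value 17. A minimal sketch of the fix, assuming `data_val` is a 1-D array of non-negative integers; the random data below is a hypothetical stand-in for the real column:

```python
import numpy as np

# Hypothetical stand-in for the real column: 2000 integers in the range 0..17.
rng = np.random.default_rng(0)
data_val = rng.integers(0, 18, size=2000)

# Values run from 0 to max(data_val), so max(data_val) + 1 slots are needed.
count = np.zeros(max(data_val) + 1)

for i in range(count.shape[0]):
    for j in range(data_val.shape[0]):
        if i == data_val[j]:
            count[i] = count[i] + 1

temp = np.arange(0, 18, 1)  # x positions 0..17

# Both arrays now have 18 elements, so plt.bar(temp, count) can pair them up.
print(temp.shape, count.shape)
```

The only change to the counting logic is the `+ 1` in the array size.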



What I ultimately want is a bar chart with one bar for each integer value from 0 to 17, where the height of each bar (the y-axis) is that value's frequency.
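For that goal, NumPy's `np.bincount` could replace the nested loops entirely: it counts occurrences of each non-negative integer, and `minlength=18` guarantees a bar for every value 0 to 17 even if some never occur. A sketch, again with hypothetical random data standing in for `data_val`; the Agg backend is selected only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data_val = rng.integers(0, 18, size=2000)  # hypothetical stand-in data

# count[k] is the number of times k appears; minlength pads absent values with 0.
count = np.bincount(data_val, minlength=18)
values = np.arange(18)

fig, ax = plt.subplots()
bars = ax.bar(values, count)  # one bar per value 0..17
ax.set_xticks(values)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

If the data could contain values above 17, `np.bincount` would return a longer array, so the x positions should then be built from `len(count)` instead of a fixed 18.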



Thank you.






UPDATE



Here is my fixed code, based on the suggestions people kindly gave below, in case anyone facing a similar issue finds it useful in the future.



import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("./data/train.csv")  # Original data is a (2000, 7) DataFrame
# data contains 6 feature columns and 1 target column.

# Separate the design matrix from the target labels.
X = data.iloc[:, :-1]
y = data['target']


'''
The next line of code uses pandas.Series.value_counts() on y to count
the number of occurrences of each label, and then sorts the result by
index (i.e. by label).

You can use pandas.Series.sort_values() instead if you want to sort
by frequency rather than by label.
'''
ax = y.value_counts().sort_index().plot.bar()
# The x= and y= arguments of plot.bar select DataFrame columns; they do not
# set axis labels, so set the labels on the returned Axes instead.
ax.set_xlabel('Target Value')
ax.set_ylabel('Number of Occurrences')
plt.show()
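A small illustration of the ordering options in the comment above, using a made-up Series in place of `data['target']`: `value_counts()` on its own sorts by frequency (most frequent first), `sort_index()` re-sorts by label, and `sort_values()` sorts by count in ascending order.

```python
import pandas as pd

# Made-up labels standing in for data['target'].
y = pd.Series([3, 1, 3, 0, 3, 1, 2])

by_count = y.value_counts()                       # most frequent label first
by_label = y.value_counts().sort_index()          # labels in ascending order
ascending_count = y.value_counts().sort_values()  # least frequent first

print(by_label)
```

For a chart with one bar per label, `by_label` is the variant to plot.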


(image: resulting bar chart)



There is no need for for loops when you use the methods built into pandas.



The specific methods mentioned in the answers are pandas.Series.value_counts(), pandas.Series.sort_index(), and pandas.Series.plot.bar().










      python pandas bar-chart






asked Nov 25 '18 at 6:41 by Seankala (edited Nov 25 '18 at 9:17)
























          2 Answers
          I believe you need value_counts with Series.plot.bar:



import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a':[4,5,4,5,5,4],
    'b':[7,8,9,4,2,3],
    'c':[1,3,5,7,1,0],
    'd':[1,1,6,1,6,5],
})

print (df)
   a  b  c  d
0  4  7  1  1
1  5  8  3  1
2  4  9  5  6
3  5  4  7  1
4  5  2  1  6
5  4  3  0  5


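# sort_index() orders the bars by value; with sort=False the order is not guaranteed to be ascending.
df['d'].value_counts().sort_index().plot.bar()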


(image: bar chart of the counts in column d)



If some values may be missing and you need them shown as 0, add reindex:



          df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0).plot.bar()


(image: bar chart with every value from 0 to 17, missing values shown as 0)



          Detail:



print (df['d'].value_counts(sort=False))
1    3
5    1
6    2
Name: d, dtype: int64

print (df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0))
0     0
1     3
2     0
3     0
4     0
5     1
6     2
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
Name: d, dtype: int64





answered Nov 25 '18 at 6:44 by jezrael (edited Nov 25 '18 at 6:55)








Your answer helped me out a lot! I wasn't aware that pandas had these methods and was using loops unnecessarily. I also used pandas.Series.sort_index() to get the result I wanted.

– Seankala, Nov 25 '18 at 9:00














Here's an approach using Seaborn:



import numpy as np
import pandas as pd
import seaborn as sns

# Random sample of 10 integers in 0..17 (your values will differ).
s = pd.Series(np.random.choice(18, 10))
s
# 0    10
# 1    13
# 2    12
# 3     0
# 4     0
# 5     5
# 6    13
# 7     9
# 8    11
# 9     0
# dtype: int64

val, cnt = np.unique(s, return_counts=True)
val, cnt
# (array([ 0,  5,  9, 10, 11, 12, 13]), array([3, 1, 1, 1, 1, 1, 2]))

# Recent seaborn versions require x and y to be passed as keyword arguments.
sns.barplot(x=val, y=cnt)


(image: Seaborn bar plot of the counts)
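`sns.barplot` treats `val` as categories, so only values that actually occur get a bar (0, 5, 9 and so on in the sample above). If every value from 0 to 17 should appear, one option, sketched here with NumPy only, is to scatter the `np.unique` counts into a length-18 array first; the resulting arrays could then be passed as `sns.barplot(x=..., y=...)`. The fixed array below simply reuses the sample values shown above.

```python
import numpy as np

# The sample values printed above, fixed so the result is reproducible.
s = np.array([10, 13, 12, 0, 0, 5, 13, 9, 11, 0])

val, cnt = np.unique(s, return_counts=True)

# One slot per possible value 0..17; values that never occur stay at 0.
full = np.zeros(18, dtype=int)
full[val] = cnt

x = np.arange(18)
# e.g. sns.barplot(x=x, y=full) would then draw one bar per value 0..17
```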






answered Nov 25 '18 at 6:51 by dataLeo













Thanks for the answer! I've actually never heard of Seaborn before, but will take a look at it in the future. Thanks again.

– Seankala, Nov 25 '18 at 9:18


















