Advanced histogram usage in Python with numpy












I need to do this quickly across large amounts of data, so ideally I want a fast approach such as numpy. I would normally just write a loop, but Python is too slow for that. Here is the problem:



I would like to sum the values of one array according to the bins of another array. For example, I have two arrays:



weights = [100, 130, 112, 150]
ages = [1, 14, 15, 25]


I want to sum the weights according to ages, binned as 0-9, 10-19, 20-29, so I'd get [100, 130+112, 150] -> [100, 242, 150] as my end result.



My current understanding of numpy's histograms is that I can only sum the array that I am binning. That is, I could only get the sum of the ages if I bin ages.



I would also like to learn how to do this well, since in the future I will likely need operations other than sums (such as averaging rather than a pure sum). Thank you for your help.










      python numpy






      edited Nov 28 '18 at 9:42









      Saugat Bhattarai

      asked Nov 28 '18 at 9:35









      Plezos
























          1 Answer

          This can be done pretty simply with a list comprehension and some numpy logical functions, and it won't be limited only to summation.



          import numpy as np

          ages = np.array([1, 14, 15, 25])  # an array, so comparisons are elementwise
          weights = np.array([100, 130, 112, 150])  # easier indexing with a np.array
          bin_left_marks = np.arange(0, 40, 10)  # bin edges: 0, 10, 20, 30
          my_func = np.sum  # swap in np.mean, np.max, etc. for other aggregations
          my_binned_aggregation = [my_func(weights[np.logical_and(bin_left_marks[i] <= ages, ages < bin_left_marks[i+1])]) for i in range(len(bin_left_marks) - 1)]


          Basically, for each bin, find the indexes of the ages list that match that bin, and aggregate the weights list accordingly.
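For plain sums (and means derived from them), the per-bin loop can be avoided entirely. A minimal sketch, assuming the bins from the question and using the `weights` argument of `np.histogram`; the names `bin_edges`, `sums`, `counts`, and `means` are chosen here for illustration. Note that `np.histogram` treats the last bin as closed on the right, so an age of exactly 30 would land in the last bin.

```python
import numpy as np

ages = np.array([1, 14, 15, 25])
weights = np.array([100, 130, 112, 150])
bin_edges = np.array([0, 10, 20, 30])  # bins [0, 10), [10, 20), [20, 30]

# Sum of weights per age bin: each age contributes its weight instead of 1.
sums, _ = np.histogram(ages, bins=bin_edges, weights=weights)

# Plain counts per bin, used to turn the sums into means.
counts, _ = np.histogram(ages, bins=bin_edges)

# Mean weight per bin; empty bins are left as NaN instead of dividing by zero.
means = np.divide(sums, counts, out=np.full(len(counts), np.nan), where=counts > 0)
```

With the example data, `sums` holds 100, 242 and 150. This only covers aggregations that reduce to weighted counts; other functions still need a per-bin approach.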

          Good luck!





          Obviously this can be made "less ugly" by splitting the one-liner, using a straightforward loop, etc. This solution aims for conciseness.
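One possible way to split the one-liner, as a sketch rather than a definitive implementation: `np.digitize` assigns each age a bin number once, and a short loop then applies an arbitrary function per bin. The helper name `aggregate_by_bins` and its parameters are hypothetical.

```python
import numpy as np

def aggregate_by_bins(values, keys, bin_edges, func=np.sum):
    """Apply func to the values whose keys fall in each [left, right) bin."""
    values = np.asarray(values)
    keys = np.asarray(keys)
    # np.digitize returns i such that bin_edges[i-1] <= key < bin_edges[i];
    # subtracting 1 gives a zero-based bin number.
    bin_index = np.digitize(keys, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    # Keys outside the edges get an index of -1 or n_bins and are ignored.
    return [func(values[bin_index == i]) for i in range(n_bins)]

weights = [100, 130, 112, 150]
ages = [1, 14, 15, 25]
edges = np.arange(0, 40, 10)  # 0, 10, 20, 30

sums = aggregate_by_bins(weights, ages, edges)
means = aggregate_by_bins(weights, ages, edges, func=np.mean)
```

The loop runs once per bin rather than once per data point, so it stays cheap when there are few bins. Some functions, such as `np.mean`, return NaN with a warning for an empty bin.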







          • Thanks for the help. I was worried I wouldn't get an answer due to the negative score. I think it's maybe because my question is vague, but that's really the problem: I don't know which direction to take, and you've pointed me in the right one. I'll study this code now.

            – Plezos
            Nov 28 '18 at 10:09











          • I think it's actually because you didn't include any of your own attempts or show enough effort. For next time...

            – ShlomiF
            Nov 28 '18 at 10:10











          answered Nov 28 '18 at 9:48









          ShlomiF



















