Python pivot table / groupby: I need to know the top 3 groups

import pandas as pd

df = pd.DataFrame({
'customer': [1,2,1,3,1,2,3],
"group_code": ['111', '111', '222', '111', '111', '111', '333'],
"ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
"amount": [100, 200, 140, 400, 225, 125, 600],
"card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})


With the above DataFrame, I want the following output.

For each card number, one record containing:

card number, % of amount spent on group code 1, % of amount spent on group code 2, … and so on for every group code

% of amount spent on a group = (amount spent on that group / total amount spent on the card) * 100
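A minimal sketch of that wide layout (one row per card, one column per group code) using DataFrame.pivot_table, assuming "% of amount spent" means each group's share of the card's total, so every row sums to 100. The names spend and pct are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': [1, 2, 1, 3, 1, 2, 3],
    'group_code': ['111', '111', '222', '111', '111', '111', '333'],
    'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
    'amount': [100, 200, 140, 400, 225, 125, 600],
    'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX'],
})

# One row per card, one column per group code, summing the amounts;
# card/group combinations with no spending are filled with 0.
spend = df.pivot_table(index='card', columns='group_code',
                       values='amount', aggfunc='sum', fill_value=0)

# Divide each row by that card's total and scale to a percentage.
pct = spend.div(spend.sum(axis=1), axis=0) * 100
print(pct.round(2))
```

For card XXX, for example, group 111 gets 725 / 1325 * 100, roughly 54.72%.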



Also, in the bigger picture, I want to know the top 5 groups (by amount spent) for each card.

So it's basically two queries. It would be great if anyone could help.

Note: the code above is only meant to show what my DataFrame looks like.

      python pivot-table pandas-groupby

      edited Nov 22 at 7:28 by rdj7
      asked Nov 22 at 6:13 by Aysha


          1 Answer

          For the first query, start by getting the total amount spent on each card:



          # Sum only the numeric 'amount' column per card and convert to {card: total}
          card_totals_dict = df.groupby('card')['amount'].sum().to_dict()
          card_totals_dict


          Output:



          {'XXX': 1325, 'YYY': 465}


          Then we calculate the percentage for each group:



          # Total per (card, group_code), then divide by that card's total from card_totals_dict
          group_percentage = df.groupby(['card', 'group_code'])['amount'].sum().reset_index()
          group_percentage['percentage'] = group_percentage['amount'] * 100 / group_percentage['card'].map(card_totals_dict)
          group_percentage


          Output:



            card group_code  amount  percentage
          0  XXX        111     725     54.7170
          1  XXX        333     600     45.2830
          2  YYY        111     325     69.8925
          3  YYY        222     140     30.1075
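If you would rather skip the intermediate dictionary, one variant (a swapped-in technique, not the answer's original approach) is groupby(...).transform('sum'), which broadcasts each card's total back onto every row of that card. A minimal sketch on the question's sample data; card_total is an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': [1, 2, 1, 3, 1, 2, 3],
    'group_code': ['111', '111', '222', '111', '111', '111', '333'],
    'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
    'amount': [100, 200, 140, 400, 225, 125, 600],
    'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX'],
})

# Total amount per (card, group_code), keeping the keys as ordinary columns
group_percentage = df.groupby(['card', 'group_code'], as_index=False)['amount'].sum()

# transform('sum') returns each card's total aligned to every row of that card
card_total = group_percentage.groupby('card')['amount'].transform('sum')
group_percentage['percentage'] = group_percentage['amount'] * 100 / card_total
print(group_percentage)
```

This produces the same four rows as above, and the percentages within each card add up to 100.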


          The second query is a standard "top N per group" problem: sum the amounts per (card, group_code), then keep the largest N within each card:



          # level=0 groups by card; group_keys=False avoids repeating 'card' in the result index
          df.groupby(['card', 'group_code'])['amount'].sum().groupby(level=0, group_keys=False).nlargest(5)


          With nlargest(1) instead, the result is:



          card  group_code
          XXX   111           725
          YYY   111           325
          Name: amount, dtype: int64
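If you want the top groups as a plain list per card, with N as a parameter (the title mentions 3, the body 5), here is a sketch that uses sort_values plus groupby(...).head(n) instead of nlargest. TOP_N, per_group, top and top_groups are illustrative names.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': [1, 2, 1, 3, 1, 2, 3],
    'group_code': ['111', '111', '222', '111', '111', '111', '333'],
    'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
    'amount': [100, 200, 140, 400, 225, 125, 600],
    'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX'],
})

TOP_N = 5  # hypothetical parameter: the question asks for 5, the title for 3

# Total spend per (card, group_code), largest amount first within each card
per_group = (df.groupby(['card', 'group_code'], as_index=False)['amount'].sum()
               .sort_values(['card', 'amount'], ascending=[True, False]))

# head(TOP_N) keeps at most TOP_N rows per card (fewer if the card used fewer groups)
top = per_group.groupby('card').head(TOP_N)

# Collapse to {card: [group codes, biggest spend first]}
top_groups = top.groupby('card')['group_code'].agg(list).to_dict()
print(top_groups)
```

Unlike nlargest, this keeps the result as an ordinary DataFrame (top), which is easy to merge back with the percentage table if needed.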





          edited Nov 22 at 6:48
          answered Nov 22 at 6:41 by andersource