How to extract rows, from datatframe, based on column value, to multiple CSV files?












0















I have following dataframe:



data = {'participant_id': [1, 100, 125, 125, 1, 100], 
'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
'grade': [88, 92, 95, 70, 80, 30]}
df = pd.DataFrame(data, columns = ['participant_id', 'test_day', 'favorite_color', 'grade'])


It has 10000 rows and contains data for 400 test participants labelled with unique and completely random ID’s stored in 'participant_id' column. My task is to create dataframes for individuals (per ‘participant_id’) and then save them to the separate csv files (400 in total).



I’ve been trying to figure out how to do it for a couple of days now but with no luck.



Can you please help me?



I am still learning how to program and trying to apply knowledge from data science course. I am using Pandas and normally I access data about individual participant with df.loc, I have also created a list of all of the participant_id’s but I don’t know how to combine both to achieve the desired result automatically.










share|improve this question





























    0















    I have following dataframe:



    data = {'participant_id': [1, 100, 125, 125, 1, 100], 
    'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
    'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
    'grade': [88, 92, 95, 70, 80, 30]}
    df = pd.DataFrame(data, columns = ['participant_id', 'test_day', 'favorite_color', 'grade'])


    It has 10000 rows and contains data for 400 test participants labelled with unique and completely random ID’s stored in 'participant_id' column. My task is to create dataframes for individuals (per ‘participant_id’) and then save them to the separate csv files (400 in total).



    I’ve been trying to figure out how to do it for a couple of days now but with no luck.



    Can you please help me?



    I am still learning how to program and trying to apply knowledge from data science course. I am using Pandas and normally I access data about individual participant with df.loc, I have also created a list of all of the participant_id’s but I don’t know how to combine both to achieve the desired result automatically.










    share|improve this question



























      0












      0








      0








      I have following dataframe:



      data = {'participant_id': [1, 100, 125, 125, 1, 100], 
      'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
      'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
      'grade': [88, 92, 95, 70, 80, 30]}
      df = pd.DataFrame(data, columns = ['participant_id', 'test_day', 'favorite_color', 'grade'])


      It has 10000 rows and contains data for 400 test participants labelled with unique and completely random ID’s stored in 'participant_id' column. My task is to create dataframes for individuals (per ‘participant_id’) and then save them to the separate csv files (400 in total).



      I’ve been trying to figure out how to do it for a couple of days now but with no luck.



      Can you please help me?



      I am still learning how to program and trying to apply knowledge from data science course. I am using Pandas and normally I access data about individual participant with df.loc, I have also created a list of all of the participant_id’s but I don’t know how to combine both to achieve the desired result automatically.










      share|improve this question
















      I have following dataframe:



      data = {'participant_id': [1, 100, 125, 125, 1, 100], 
      'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
      'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
      'grade': [88, 92, 95, 70, 80, 30]}
      df = pd.DataFrame(data, columns = ['participant_id', 'test_day', 'favorite_color', 'grade'])


      It has 10000 rows and contains data for 400 test participants labelled with unique and completely random ID’s stored in 'participant_id' column. My task is to create dataframes for individuals (per ‘participant_id’) and then save them to the separate csv files (400 in total).



      I’ve been trying to figure out how to do it for a couple of days now but with no luck.



      Can you please help me?



      I am still learning how to program and trying to apply knowledge from data science course. I am using Pandas and normally I access data about individual participant with df.loc, I have also created a list of all of the participant_id’s but I don’t know how to combine both to achieve the desired result automatically.







      python pandas dataframe pandas-groupby






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 25 '18 at 20:49









      jpp

      99.4k2161110




      99.4k2161110










      asked Nov 25 '18 at 0:56









      MoonlitMoonlit

      207




      207
























          2 Answers
          2






          active

          oldest

          votes


















          2















          groupby + to_csv



          You can group by a particular field and iterate:



          for part_id, df_id in df.groupby('participant_id'):
          df_id.to_csv(f'{part_id}.csv')





          share|improve this answer
























          • Thank you @jpp for this elegant and simple answer. Files are extracted =D!

            – Moonlit
            Nov 25 '18 at 18:33











          • I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

            – Moonlit
            Dec 3 '18 at 1:44






          • 1





            How about df_id.to_csv(f'A{part_id}.csv')

            – jpp
            Dec 3 '18 at 9:22



















          1














          Solution by @jpp is great. My adaptation based on your solution is



          import pandas as pd
          import numpy as np

          data = {'participant_id': [1, 100, 125, 125, 1, 100],
          'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
          'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
          'grade': [88, 92, 95, 70, 80, 30]
          }

          col = list(data.keys())
          df = pd.DataFrame(data, columns = col)

          for part_id, df_id in df.groupby('participant_id'):
          df_id.to_csv(f'{part_id}.csv',index=False)





          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53463774%2fhow-to-extract-rows-from-datatframe-based-on-column-value-to-multiple-csv-fil%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2















            groupby + to_csv



            You can group by a particular field and iterate:



            for part_id, df_id in df.groupby('participant_id'):
            df_id.to_csv(f'{part_id}.csv')





            share|improve this answer
























            • Thank you @jpp for this elegant and simple answer. Files are extracted =D!

              – Moonlit
              Nov 25 '18 at 18:33











            • I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

              – Moonlit
              Dec 3 '18 at 1:44






            • 1





              How about df_id.to_csv(f'A{part_id}.csv')

              – jpp
              Dec 3 '18 at 9:22
















            2















            groupby + to_csv



            You can group by a particular field and iterate:



            for part_id, df_id in df.groupby('participant_id'):
            df_id.to_csv(f'{part_id}.csv')





            share|improve this answer
























            • Thank you @jpp for this elegant and simple answer. Files are extracted =D!

              – Moonlit
              Nov 25 '18 at 18:33











            • I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

              – Moonlit
              Dec 3 '18 at 1:44






            • 1





              How about df_id.to_csv(f'A{part_id}.csv')

              – jpp
              Dec 3 '18 at 9:22














            2












            2








            2








            groupby + to_csv



            You can group by a particular field and iterate:



            for part_id, df_id in df.groupby('participant_id'):
            df_id.to_csv(f'{part_id}.csv')





            share|improve this answer














            groupby + to_csv



            You can group by a particular field and iterate:



            for part_id, df_id in df.groupby('participant_id'):
            df_id.to_csv(f'{part_id}.csv')






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 25 '18 at 1:00









            jppjpp

            99.4k2161110




            99.4k2161110













            • Thank you @jpp for this elegant and simple answer. Files are extracted =D!

              – Moonlit
              Nov 25 '18 at 18:33











            • I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

              – Moonlit
              Dec 3 '18 at 1:44






            • 1





              How about df_id.to_csv(f'A{part_id}.csv')

              – jpp
              Dec 3 '18 at 9:22



















            • Thank you @jpp for this elegant and simple answer. Files are extracted =D!

              – Moonlit
              Nov 25 '18 at 18:33











            • I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

              – Moonlit
              Dec 3 '18 at 1:44






            • 1





              How about df_id.to_csv(f'A{part_id}.csv')

              – jpp
              Dec 3 '18 at 9:22

















            Thank you @jpp for this elegant and simple answer. Files are extracted =D!

            – Moonlit
            Nov 25 '18 at 18:33





            Thank you @jpp for this elegant and simple answer. Files are extracted =D!

            – Moonlit
            Nov 25 '18 at 18:33













            I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

            – Moonlit
            Dec 3 '18 at 1:44





            I have one more question:). If I would like to use an automatic prefix and instead of IDs 10011, 13652 have A10011 and A13652 what I suppose to do? Cheers!

            – Moonlit
            Dec 3 '18 at 1:44




            1




            1





            How about df_id.to_csv(f'A{part_id}.csv')

            – jpp
            Dec 3 '18 at 9:22





            How about df_id.to_csv(f'A{part_id}.csv')

            – jpp
            Dec 3 '18 at 9:22













            1














            Solution by @jpp is great. My adaptation based on your solution is



            import pandas as pd
            import numpy as np

            data = {'participant_id': [1, 100, 125, 125, 1, 100],
            'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
            'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
            'grade': [88, 92, 95, 70, 80, 30]
            }

            col = list(data.keys())
            df = pd.DataFrame(data, columns = col)

            for part_id, df_id in df.groupby('participant_id'):
            df_id.to_csv(f'{part_id}.csv',index=False)





            share|improve this answer




























              1














              Solution by @jpp is great. My adaptation based on your solution is



              import pandas as pd
              import numpy as np

              data = {'participant_id': [1, 100, 125, 125, 1, 100],
              'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
              'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
              'grade': [88, 92, 95, 70, 80, 30]
              }

              col = list(data.keys())
              df = pd.DataFrame(data, columns = col)

              for part_id, df_id in df.groupby('participant_id'):
              df_id.to_csv(f'{part_id}.csv',index=False)





              share|improve this answer


























                1












                1








                1







                Solution by @jpp is great. My adaptation based on your solution is



                import pandas as pd
                import numpy as np

                data = {'participant_id': [1, 100, 125, 125, 1, 100],
                'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
                'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
                'grade': [88, 92, 95, 70, 80, 30]
                }

                col = list(data.keys())
                df = pd.DataFrame(data, columns = col)

                for part_id, df_id in df.groupby('participant_id'):
                df_id.to_csv(f'{part_id}.csv',index=False)





                share|improve this answer













                Solution by @jpp is great. My adaptation based on your solution is



                import pandas as pd
                import numpy as np

                data = {'participant_id': [1, 100, 125, 125, 1, 100],
                'test_day':['Day_1', 'Day_1', 'Day_12', 'Day_14', 'Day_4', 'Day_4'],
                'favorite_color': ['blue', 'red', 'yellow', 'green', 'yellow', 'green'],
                'grade': [88, 92, 95, 70, 80, 30]
                }

                col = list(data.keys())
                df = pd.DataFrame(data, columns = col)

                for part_id, df_id in df.groupby('participant_id'):
                df_id.to_csv(f'{part_id}.csv',index=False)






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 25 '18 at 1:53









                yoonghmyoonghm

                1,086918




                1,086918






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53463774%2fhow-to-extract-rows-from-datatframe-based-on-column-value-to-multiple-csv-fil%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Lallio

                    Unable to find Lightning Node

                    Futebolista