pandas - read_excel efficiency on multiple large sheets



























I have an Excel workbook with multiple sheets. Some contain a lot of data (for example, 6,000,000 cells) and some do not. I'm trying to read one of the much smaller sheets, a simple 2-column, 500-row sheet, using the following line of code:



df = pd.read_excel('C:/Data.xlsx', sheetname='Contracts')


However, this read takes an incredibly long time, whereas the same sheet on its own in a separate Excel file does not. Is there a reason for this?










      python excel pandas














      asked Nov 27 '18 at 23:21 by Évariste Galois (edited Nov 28 '18 at 0:08 by Évariste Galois)
























          1 Answer














          I looked through the API documentation to see how the function processes the file, but didn't find anything conclusive. A few things of note:



          1) Assuming you are using pandas 0.21.0 or later, you should use sheet_name instead of sheetname.



          2) According to https://realpython.com/working-with-large-excel-files-in-pandas/, how quickly pandas processes a file depends heavily on your system RAM.



          3) The read_excel function opens the entire Excel file and then selects the requested sheet, so you end up loading those very long sheets as well. You can test this by saving the short sheet as a separate Excel file and running read_excel on the new file.
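
The test suggested in point 3 could be sketched roughly as follows. This is only an illustration under assumptions: the file names, sheet names other than 'Contracts', column names, and row counts are all made up, and an Excel writer engine such as openpyxl is assumed to be installed. Whether a gap actually shows up, and how large it is, will depend on the pandas version and the reader engine in use.

```python
import os
import tempfile
import time

import pandas as pd


def timed_read(path, sheet):
    """Read one sheet and return (DataFrame, elapsed seconds)."""
    start = time.perf_counter()
    # pandas 0.21.0+ spells the keyword sheet_name (not sheetname).
    df = pd.read_excel(path, sheet_name=sheet)
    return df, time.perf_counter() - start


def make_demo_files(folder, big_rows=5_000):
    """Write a workbook holding one large and one small sheet, plus a
    standalone copy of the small sheet. All names are hypothetical."""
    small = pd.DataFrame({"id": range(500), "value": range(500)})
    big = pd.DataFrame({f"c{i}": range(big_rows) for i in range(10)})

    combined = os.path.join(folder, "combined.xlsx")
    standalone = os.path.join(folder, "contracts_only.xlsx")

    # Workbook that mixes a large sheet with the small one.
    with pd.ExcelWriter(combined) as writer:
        big.to_excel(writer, sheet_name="BigSheet", index=False)
        small.to_excel(writer, sheet_name="Contracts", index=False)

    # The small sheet saved on its own, as the answer suggests.
    small.to_excel(standalone, sheet_name="Contracts", index=False)
    return combined, standalone


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        combined, standalone = make_demo_files(folder)
        _, t_combined = timed_read(combined, "Contracts")
        _, t_standalone = timed_read(standalone, "Contracts")
        print(f"small sheet inside large workbook: {t_combined:.3f}s")
        print(f"small sheet in its own workbook:   {t_standalone:.3f}s")
```

To try it on a real workbook, call timed_read on the original file and on a copy that contains only the small sheet, then compare the two timings.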



          Hope this helps






          answered Nov 28 '18 at 2:51 by Jay Patel































