Converting a list of pyodbc rows to a pandas DataFrame takes a very long time

Is there a faster way to convert a list of pyodbc.Row objects to a pandas DataFrame? It takes about 30-40 minutes to convert a list of 10 million+ rows to a DataFrame.



import pyodbc
import pandas

server = <server_ip>
database = <db_name>
username = <db_user>
password = <password>
port = '1443'  # note: the default port for SQL Server is 1433

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';PORT=' + port +
                      ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = conn.cursor()

# takes up to 12 minutes
rows = cursor.execute("select top 10000000 * from [LSLTGT].[MBR_DIM]").fetchall()

# Read the cursor data into a pandas DataFrame... takes forever!
df = pandas.DataFrame([tuple(t) for t in rows])
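
For reference, one construction that skips the intermediate list of tuples (a sketch, not part of the original question; DataFrame.from_records is standard pandas, but whether it is faster here is an assumption):

# pyodbc.Row objects are sequences, so from_records can consume them
# directly, avoiding the per-row tuple() copy above
columns = [col[0] for col in cursor.description]  # column names from the cursor
df = pandas.DataFrame.from_records(rows, columns=columns)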

python pandas pyodbc

asked Nov 26 '18 at 17:16 by Anjana Shivangi

  • If you are able to use sqlalchemy, you could look at pandas.read_sql (pandas.pydata.org/pandas-docs/stable/generated/…)

    – Owen
    Nov 26 '18 at 17:21

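A minimal sketch of the read_sql route suggested above (assuming SQLAlchemy is installed; the connection URL and chunksize are illustrative):

import pandas
from sqlalchemy import create_engine

# mssql+pyodbc connection URL; credentials and driver name are placeholders
engine = create_engine('mssql+pyodbc://user:password@server:1433/db?driver=SQL+Server')

# chunksize makes read_sql yield DataFrames incrementally instead of
# materializing all 10 million rows in one step
chunks = pandas.read_sql('select top 10000000 * from [LSLTGT].[MBR_DIM]',
                         engine, chunksize=1_000_000)
df = pandas.concat(chunks, ignore_index=True)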

  • @Owen That was my previous issue: I tried pandas.read_sql and it takes a very long time to read all the data (please see the link). I am trying to find a faster way to load the data from SQL Server into a pandas DataFrame just once; I then plan to store the DataFrame in Feather format for subsequent faster reads.

    – Anjana Shivangi
    Nov 26 '18 at 17:27
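
Since the plan above is to cache the frame in Feather for later reads, a minimal sketch (assuming pyarrow is installed; the file path is illustrative):

# Feather requires string column names, so name them before writing
df.columns = df.columns.map(str)

# one slow load from SQL Server, then fast local reads thereafter
df.to_feather('mbr_dim.feather')

# subsequent sessions skip the database entirely
df = pandas.read_feather('mbr_dim.feather')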

  • How long does that query take to execute in Management Studio? My guess is that pandas is not the problem here.

    – Owen
    Nov 26 '18 at 17:40

  • @Owen - In SSMS it takes 8:25 minutes to read 10 million records.

    – Anjana Shivangi
    Nov 26 '18 at 18:24

  • Is SSMS running on the same machine as your Python code?

    – Owen
    Nov 27 '18 at 9:19
1 Answer

You might get some improvement by using a generator expression rather than a list comprehension:



df = pandas.DataFrame((tuple(t) for t in rows)) 
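
If the generator alone does not help enough, a batched variant keeps memory bounded while building the frame incrementally (a sketch; cursor.fetchmany is standard pyodbc, but the batch size and any speedup are assumptions):

import pandas

def frame_chunks(cursor, batch_size=100_000):
    # stream rows from the cursor in fixed-size batches instead of
    # materializing everything at once with fetchall()
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield pandas.DataFrame.from_records(batch)

cursor.execute("select top 10000000 * from [LSLTGT].[MBR_DIM]")
df = pandas.concat(frame_chunks(cursor), ignore_index=True)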

answered Nov 28 '18 at 10:43 by Owen