Converting a list of pyodbc Rows to a pandas DataFrame takes a very long time
Is there a faster way to convert a list of pyodbc.Row objects to a pandas DataFrame? It takes about 30-40 minutes to convert a list of 10 million+ pyodbc.Row objects to a DataFrame.
import pyodbc
import pandas

server = <server_ip>
database = <db_name>
username = <db_user>
password = <password>
port = '1443'  # note: the SQL Server default port is 1433; verify this value
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';PORT=' + port +
                      ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = conn.cursor()  # the original snippet used `cursor` without creating it

# takes up to 12 minutes
rows = cursor.execute("select top 10000000 * from [LSLTGT].[MBR_DIM]").fetchall()

# Read the cursor data into a pandas DataFrame ... takes forever!
df = pandas.DataFrame([tuple(t) for t in rows])
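
For reference, a minimal sketch of the same construction that also keeps the column names, assuming `rows` and `cursor` from the snippet above. `DataFrame.from_records` accepts the pyodbc Row objects directly (they support the sequence protocol), so the per-row tuple() conversion can be dropped:

# Column names come from the cursor metadata; pyodbc Rows support the
# sequence protocol, so no tuple() conversion is needed.
columns = [col[0] for col in cursor.description]
df = pandas.DataFrame.from_records(rows, columns=columns)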
python pandas pyodbc
asked Nov 26 '18 at 17:16
Anjana Shivangi
If you are able to use sqlalchemy, you could look at pandas.read_sql (pandas.pydata.org/pandas-docs/stable/generated/…)
– Owen
Nov 26 '18 at 17:21
@Owen That was my previous issue. I tried using pandas.read_sql and it takes a very long time to read all the data. Please see link. I am trying to find a faster way to load data from SQL Server into a pandas DataFrame just once; I then plan to store the df in feather format for faster subsequent reads (see the sketch after these comments).
– Anjana Shivangi
Nov 26 '18 at 17:27
How long does that query take to execute in Management Studio? My guess is that pandas is not the problem here.
– Owen
Nov 26 '18 at 17:40
@Owen - On SSMS it takes 8:25 minutes to read 10 million records.
– Anjana Shivangi
Nov 26 '18 at 18:24
Is SSMS running on the same machine as your Python code?
– Owen
Nov 27 '18 at 9:19
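
A hedged sketch of the chunked read_sql-plus-feather approach discussed in the comments above. The connection URL placeholders mirror the question's, the chunk size is a guess to tune to available memory, and pyarrow is assumed to be installed for the feather cache:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL -- substitute real credentials; 1433 is the
# SQL Server default port.
engine = create_engine(
    "mssql+pyodbc://<db_user>:<password>@<server_ip>:1433/<db_name>"
    "?driver=SQL+Server"
)

# chunksize makes read_sql return an iterator of DataFrames instead of
# materializing all 10 million rows at once.
chunks = pd.read_sql(
    "SELECT TOP 10000000 * FROM [LSLTGT].[MBR_DIM]",
    engine,
    chunksize=500_000,  # rows per chunk; tune to available memory
)
df = pd.concat(chunks, ignore_index=True)

# Cache to feather so later runs skip the database entirely.
df.to_feather("mbr_dim.feather")
df = pd.read_feather("mbr_dim.feather")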
1 Answer
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
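
A follow-up note on this answer: pandas materializes the generator into a list internally, so the saving is limited to skipping one intermediate list. A sketch of batched fetching (the batch size is a guess, and it assumes the query has just been executed so fetchall has not yet consumed the cursor) avoids holding the full pyodbc row list and the DataFrame in memory at the same time:

frames = []
while True:
    batch = cursor.fetchmany(100_000)  # pyodbc returns [] when exhausted
    if not batch:
        break
    frames.append(pandas.DataFrame.from_records(batch))
df = pandas.concat(frames, ignore_index=True)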
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53486051%2fconvert-list-of-pyodbc-rows-to-pandas-dataframe-takes-very-long-time%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
add a comment |
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
add a comment |
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
answered Nov 28 '18 at 10:43
Owen