Converting a list of pyodbc Rows to a pandas DataFrame takes a very long time
Is there a faster way to convert a list of pyodbc.Row objects to a pandas DataFrame? It takes about 30-40 minutes to convert a list of 10 million+ pyodbc.Row objects to a DataFrame.
import pyodbc
import pandas

server = <server_ip>
database = <db_name>
username = <db_user>
password = <password>
port = '1443'  # note: the SQL Server default port is 1433; verify this value
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';PORT=' + port +
                      ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = conn.cursor()  # the original snippet used `cursor` without creating it

# takes up to 12 minutes
rows = cursor.execute("select top 10000000 * from [LSLTGT].[MBR_DIM]").fetchall()

# Read the cursor data into a pandas DataFrame ... takes forever!
df = pandas.DataFrame([tuple(t) for t in rows])
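
For reference, a minimal sketch of the same construction that also keeps the column names, assuming `rows` and `cursor` from the snippet above. `DataFrame.from_records` accepts the pyodbc Row objects directly (they support the sequence protocol), so the per-row tuple() conversion can be dropped:

# Column names come from the cursor metadata; pyodbc Rows support the
# sequence protocol, so no tuple() conversion is needed.
columns = [col[0] for col in cursor.description]
df = pandas.DataFrame.from_records(rows, columns=columns)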
python pandas pyodbc
asked Nov 26 '18 at 17:16
Anjana Shivangi
If you are able to use sqlalchemy, you could look at pandas.read_sql (pandas.pydata.org/pandas-docs/stable/generated/…)
– Owen
Nov 26 '18 at 17:21
@Owen That was my previous issue. I tried using pandas.read_sql and it takes a very long time to read all the data. Please see link. I am trying to find a faster way to load data from SQL Server into a pandas DataFrame just once; I then plan to store the df in feather format for faster subsequent reads (see the sketch after these comments).
– Anjana Shivangi
Nov 26 '18 at 17:27
How long does that query take to execute in Management Studio? My guess is that pandas is not the problem here.
– Owen
Nov 26 '18 at 17:40
@Owen - On SSMS it takes 8:25 minutes to read 10 million records.
– Anjana Shivangi
Nov 26 '18 at 18:24
Is SSMS running on the same machine as your Python code?
– Owen
Nov 27 '18 at 9:19
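
A hedged sketch of the chunked read_sql-plus-feather approach discussed in the comments above. The connection URL placeholders mirror the question's, the chunk size is a guess to tune to available memory, and pyarrow is assumed to be installed for the feather cache:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL -- substitute real credentials; 1433 is the
# SQL Server default port.
engine = create_engine(
    "mssql+pyodbc://<db_user>:<password>@<server_ip>:1433/<db_name>"
    "?driver=SQL+Server"
)

# chunksize makes read_sql return an iterator of DataFrames instead of
# materializing all 10 million rows at once.
chunks = pd.read_sql(
    "SELECT TOP 10000000 * FROM [LSLTGT].[MBR_DIM]",
    engine,
    chunksize=500_000,  # rows per chunk; tune to available memory
)
df = pd.concat(chunks, ignore_index=True)

# Cache to feather so later runs skip the database entirely.
df.to_feather("mbr_dim.feather")
df = pd.read_feather("mbr_dim.feather")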
1 Answer
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
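
A follow-up note on this answer: pandas materializes the generator into a list internally, so the saving is limited to skipping one intermediate list. A sketch of batched fetching (the batch size is a guess, and it assumes the query has just been executed so fetchall has not yet consumed the cursor) avoids holding the full pyodbc row list and the DataFrame in memory at the same time:

frames = []
while True:
    batch = cursor.fetchmany(100_000)  # pyodbc returns [] when exhausted
    if not batch:
        break
    frames.append(pandas.DataFrame.from_records(batch))
df = pandas.concat(frames, ignore_index=True)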
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53486051%2fconvert-list-of-pyodbc-rows-to-pandas-dataframe-takes-very-long-time%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
add a comment |
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
add a comment |
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
You might get some improvement by using a generator expression rather than a list comprehension:
df = pandas.DataFrame((tuple(t) for t in rows))
answered Nov 28 '18 at 10:43
Owen