How long should a query that returns 5 million records take?












12















I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.



For a start I'm using my local machine with a copy of sql server 2008 express. I have a dual-core processor, 2GB ram and a 64bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.



At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?



Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?



FYI I'm only using a standard SELECT * FROM to retrieve my results.



Thanks!



EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.



I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?










share|improve this question




















  • 5





    FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

    – JNK
    Apr 3 '12 at 13:00






  • 7





    To speed it up, try to avoid using '*' and just ask for the data you want

    – Matt Gibson
    Apr 3 '12 at 13:06











  • +1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

    – joelparkerhenderson
    Apr 3 '12 at 13:26
















12















I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.



For a start I'm using my local machine with a copy of sql server 2008 express. I have a dual-core processor, 2GB ram and a 64bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.



At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?



Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?



FYI I'm only using a standard SELECT * FROM to retrieve my results.



Thanks!



EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.



I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?










share|improve this question




















  • 5





    FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

    – JNK
    Apr 3 '12 at 13:00






  • 7





    To speed it up, try to avoid using '*' and just ask for the data you want

    – Matt Gibson
    Apr 3 '12 at 13:06











  • +1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

    – joelparkerhenderson
    Apr 3 '12 at 13:26














12












12








12


3






I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.



For a start I'm using my local machine with a copy of sql server 2008 express. I have a dual-core processor, 2GB ram and a 64bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.



At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?



Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?



FYI I'm only using a standard SELECT * FROM to retrieve my results.



Thanks!



EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.



I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?










share|improve this question
















I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.



For a start I'm using my local machine with a copy of sql server 2008 express. I have a dual-core processor, 2GB ram and a 64bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.



At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?



Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?



FYI I'm only using a standard SELECT * FROM to retrieve my results.



Thanks!



EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.



I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?







sql sql-server database sql-server-2008






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Apr 3 '12 at 16:45









VMAtm

23.7k166178




23.7k166178










asked Apr 3 '12 at 12:58









alimac83alimac83

1,22952444




1,22952444








  • 5





    FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

    – JNK
    Apr 3 '12 at 13:00






  • 7





    To speed it up, try to avoid using '*' and just ask for the data you want

    – Matt Gibson
    Apr 3 '12 at 13:06











  • +1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

    – joelparkerhenderson
    Apr 3 '12 at 13:26














  • 5





    FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

    – JNK
    Apr 3 '12 at 13:00






  • 7





    To speed it up, try to avoid using '*' and just ask for the data you want

    – Matt Gibson
    Apr 3 '12 at 13:06











  • +1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

    – joelparkerhenderson
    Apr 3 '12 at 13:26








5




5





FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

– JNK
Apr 3 '12 at 13:00





FYI you should probably test by inserting into a #TEMP table instead of just selecting it. You will definitely be paying for display overhead for SSMS to show all 5m rows, and it will eat into your memory and IO

– JNK
Apr 3 '12 at 13:00




7




7





To speed it up, try to avoid using '*' and just ask for the data you want

– Matt Gibson
Apr 3 '12 at 13:06





To speed it up, try to avoid using '*' and just ask for the data you want

– Matt Gibson
Apr 3 '12 at 13:06













+1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

– joelparkerhenderson
Apr 3 '12 at 13:26





+1 for Matt's suggestion of just ask for the data you want... for example, if all you want is the id and name of a row, you can do "select id, name from ..."

– joelparkerhenderson
Apr 3 '12 at 13:26












5 Answers
5






active

oldest

votes


















9














I think you are asking the wrong question.



First of all - why do you need so many articles at one time at the local machine? What do you want to do with them?



Why I'm asking? I think this amount of data will be transfered to somewhere. And only at this time you should measure the time of transfering the data.



And even in this situation I want to make advice to you:



Your applications should not select 5 million records at the time. Try to split your query, and get data partially.



UPDATE:



As you say are doing this for testing, I suggest you to:




  1. Remove * from your query - SQL server spends some time to resolve this.

  2. Try to put your data to some temp data storage. Try to use VIEW or temp table for this.

  3. Try to use some cache plan on your server


But I still don't understand - why do you need such tests if your application would not ever use such query? Testing only for testing is bad time spending.






share|improve this answer


























  • And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

    – Samuel Lindblom
    Sep 30 '13 at 9:03



















2














Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).






share|improve this answer































    1














    The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practise to query all the billion record in a single time frame. Will give you much bettered resuld if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008






    share|improve this answer































      0














      When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
      U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats






      share|improve this answer































        0














        I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.






        share|improve this answer























          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f9993761%2fhow-long-should-a-query-that-returns-5-million-records-take%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          9














          I think you are asking the wrong question.



          First of all - why do you need so many articles at one time at the local machine? What do you want to do with them?



          Why I'm asking? I think this amount of data will be transfered to somewhere. And only at this time you should measure the time of transfering the data.



          And even in this situation I want to make advice to you:



          Your applications should not select 5 million records at the time. Try to split your query, and get data partially.



          UPDATE:



          As you say are doing this for testing, I suggest you to:




          1. Remove * from your query - SQL server spends some time to resolve this.

          2. Try to put your data to some temp data storage. Try to use VIEW or temp table for this.

          3. Try to use some cache plan on your server


          But I still don't understand - why do you need such tests if your application would not ever use such query? Testing only for testing is bad time spending.






          share|improve this answer


























          • And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

            – Samuel Lindblom
            Sep 30 '13 at 9:03
















          9














          I think you are asking the wrong question.



          First of all - why do you need so many articles at one time at the local machine? What do you want to do with them?



          Why I'm asking? I think this amount of data will be transfered to somewhere. And only at this time you should measure the time of transfering the data.



          And even in this situation I want to make advice to you:



          Your applications should not select 5 million records at the time. Try to split your query, and get data partially.



          UPDATE:



          As you say are doing this for testing, I suggest you to:




          1. Remove * from your query - SQL server spends some time to resolve this.

          2. Try to put your data to some temp data storage. Try to use VIEW or temp table for this.

          3. Try to use some cache plan on your server


          But I still don't understand - why do you need such tests if your application would not ever use such query? Testing only for testing is bad time spending.






          share|improve this answer


























          • And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

            – Samuel Lindblom
            Sep 30 '13 at 9:03














          9












          9








          9







          I think you are asking the wrong question.



          First of all - why do you need so many articles at one time at the local machine? What do you want to do with them?



          Why I'm asking? I think this amount of data will be transfered to somewhere. And only at this time you should measure the time of transfering the data.



          And even in this situation I want to make advice to you:



          Your applications should not select 5 million records at the time. Try to split your query, and get data partially.



          UPDATE:



          As you say are doing this for testing, I suggest you to:




          1. Remove * from your query - SQL server spends some time to resolve this.

          2. Try to put your data to some temp data storage. Try to use VIEW or temp table for this.

          3. Try to use some cache plan on your server


          But I still don't understand - why do you need such tests if your application would not ever use such query? Testing only for testing is bad time spending.






          share|improve this answer















          I think you are asking the wrong question.



          First of all - why do you need so many articles at one time at the local machine? What do you want to do with them?



          Why I'm asking? I think this amount of data will be transfered to somewhere. And only at this time you should measure the time of transfering the data.



          And even in this situation I want to make advice to you:



          Your applications should not select 5 million records at the time. Try to split your query, and get data partially.



          UPDATE:



          As you say are doing this for testing, I suggest you to:




          1. Remove * from your query - SQL server spends some time to resolve this.

          2. Try to put your data to some temp data storage. Try to use VIEW or temp table for this.

          3. Try to use some cache plan on your server


          But I still don't understand - why do you need such tests if your application would not ever use such query? Testing only for testing is bad time spending.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 13 '12 at 7:19

























          answered Apr 3 '12 at 13:02









          VMAtmVMAtm

          23.7k166178




          23.7k166178













          • And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

            – Samuel Lindblom
            Sep 30 '13 at 9:03



















          • And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

            – Samuel Lindblom
            Sep 30 '13 at 9:03

















          And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

          – Samuel Lindblom
          Sep 30 '13 at 9:03





          And I would add that optimizing for issues found in those tests, which I assume is the reason for running them, would be an even bigger waste of time.

          – Samuel Lindblom
          Sep 30 '13 at 9:03













          2














          Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).






          share|improve this answer




























            2














            Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).






            share|improve this answer


























              2












              2








              2







              Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).






              share|improve this answer













              Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Apr 3 '12 at 13:39









              rvphxrvphx

              1,61862757




              1,61862757























                  1














                  The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practise to query all the billion record in a single time frame. Will give you much bettered resuld if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008






                  share|improve this answer




























                    1














                    The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practise to query all the billion record in a single time frame. Will give you much bettered resuld if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008






                    share|improve this answer


























                      1












                      1








                      1







                      The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practise to query all the billion record in a single time frame. Will give you much bettered resuld if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008






                      share|improve this answer













                      The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practise to query all the billion record in a single time frame. Will give you much bettered resuld if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Apr 12 '12 at 7:19









                      user824910user824910

                      47251332




                      47251332























                          0














                          When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
                          U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats






                          share|improve this answer




























                            0














                            When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
                            U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats






                            share|improve this answer


























                              0












                              0








                              0







                              When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
                              U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats






                              share|improve this answer













                              When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
                              U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Feb 7 '14 at 13:58









                              Jayanth KurupJayanth Kurup

                              1




                              1























                                  0














                                  I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.






                                  share|improve this answer




























                                    0














                                    I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.






                                    share|improve this answer


























                                      0












                                      0








                                      0







                                      I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.






                                      share|improve this answer













                                      I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.







                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered Nov 25 '18 at 19:16









                                      AndyAndy

                                      1




                                      1






























                                          draft saved

                                          draft discarded




















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid



                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function () {
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f9993761%2fhow-long-should-a-query-that-returns-5-million-records-take%23new-answer', 'question_page');
                                          }
                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Lallio

                                          Futebolista

                                          Jornalista