Puppeteer get information about page loaded - list of files loaded and their sizes












      I am wondering whether it's possible to list all the files loaded for a web page opened through Google's Puppeteer, e.g. scripts, styles (not including inline ones), images, videos and audio. I need the list of files plus their respective sizes. Is that possible, and if not, is there some kind of software (e.g. an npm package) that can do that?

      Searching Google and npm turned up nothing like what I need.







      javascript node.js npm puppeteer






      asked Oct 24 '18 at 10:54









      Petar Vasilev

          2 Answers

                  Page assets are not stored on disk as separate files; they are held in browser memory and sometimes cached, so you cannot simply read their sizes from the file system.

                  What you want to look at is web scraping, which can be done with modules like node-website-scraper or with Puppeteer:



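                  const path = require('path');
                  const fs_extra = require('fs-extra'); // npm package "fs-extra"

                  // Save every response body under ./output, mirroring the URL path.
                  page.on('response', async (response) => {
                    const url = new URL(response.url());
                    let filePath = path.resolve(`./output${url.pathname}`);
                    // Paths without a file extension (e.g. "/") are saved as index.html.
                    if (path.extname(url.pathname).trim() === '') {
                      filePath = `${filePath}/index.html`;
                    }
                    await fs_extra.outputFile(filePath, await response.buffer());
                  });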


                  The code above listens for each response and saves its body in an output folder, where you can check the file sizes. See the linked article for more details.
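If the files themselves are not needed, the same `response` event can be used to record each URL together with the byte length of its body. The following is only a minimal sketch: the helper name `trackResponseSizes` and the shape of the records are assumptions made for illustration, and only `page.on('response')`, `response.url()`, `response.status()` and `response.buffer()` are taken from Puppeteer. It measures the body as Puppeteer returns it, which may differ from the number of bytes transferred if the server compresses responses.

```javascript
// Hypothetical helper (not part of Puppeteer): records the URL, status and
// body size of every response seen on `page`.
// `page` is expected to be a Puppeteer Page, i.e. an object with .on('response', ...).
function trackResponseSizes(page) {
  const resources = [];
  const pending = [];

  page.on('response', (response) => {
    const task = (async () => {
      const entry = { url: response.url(), status: response.status(), bytes: null };
      try {
        const body = await response.buffer();
        entry.bytes = body.length;
      } catch (err) {
        // The body could not be read; keep the entry with bytes = null.
      }
      resources.push(entry);
    })();
    pending.push(task);
  });

  // Returns a function that waits for all bodies requested so far
  // and then resolves to the collected list.
  return async () => {
    await Promise.all(pending);
    return resources;
  };
}
```

A typical use would be to call `trackResponseSizes(page)` before `page.goto(...)` and to await the returned function once navigation has finished.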







                  answered Oct 24 '18 at 13:04









                  mihai

                          The code from @mihai works in most cases. However, when a response has a 206 (Partial Content) status, which is common for images, video and audio, an error is thrown. See
                          https://github.com/GoogleChrome/puppeteer/issues/1274
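One way to keep the listener from failing on such responses is to catch the error and fall back to the `Content-Length` header. This is only a sketch under assumptions: `bodySizeOrHeader` is a hypothetical helper name, and it relies on `response.headers()` returning lower-case header names. For a 206 response the header describes only the range that was returned, not the whole file, and it may be missing altogether.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the body size in bytes,
// or falls back to the Content-Length header when the body cannot be read.
// Returns null when neither source is available.
async function bodySizeOrHeader(response) {
  try {
    const body = await response.buffer();
    return body.length;
  } catch (err) {
    const header = response.headers()['content-length'];
    const parsed = Number.parseInt(header, 10);
    return Number.isNaN(parsed) ? null : parsed;
  }
}
```

Inside a `page.on('response', ...)` handler this could be called as `await bodySizeOrHeader(response)` in place of reading the buffer directly.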







                          edited Nov 23 '18 at 19:49









                          mihai

                          answered Oct 25 '18 at 7:54









                          lx1412
