Web scraping a table and can't locate the table



























I am scraping this website: https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc



I am trying to download all the zip files from the table. However, I cannot locate the table in the 'soup': soup.find() returns None.



from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

req = Request(
    'https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc',
    headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, "html.parser")
tables = soup.find('div', class_='table table-bordered docnav-metadata dataTable no-footer')
































  • The content is loaded dynamically. Try using Selenium, requests_html, or a similar tool to fetch it.

    – SIM
    Nov 26 '18 at 19:03











  • Thank you so much. Can you be more explicit please?

    – Chen. B
    Nov 26 '18 at 19:16











  • If you disable JavaScript in your browser and reload the page, you won't see that tabular content. BeautifulSoup can't see such content either, because it only parses the HTML it is given and does not run scripts.

    – SIM
    Nov 26 '18 at 19:22











  • Thank you so much!

    – Chen. B
    Nov 26 '18 at 19:29











  • Why not just use Firebug or Chrome developer tools to check the AJAX call and emulate it?

    – Carlos Alves Jorge
    Nov 26 '18 at 20:11
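A side note that supports the comments above: everything after the `#` in the question's URL is a fragment, which HTTP clients keep on the client side and do not send to the server. The page's JavaScript presumably reads it to decide which report list to load (an assumption; the site's script is not shown here). A minimal sketch using only the standard library, where `split_request_and_fragment` is a hypothetical helper name:

```python
from urllib.parse import urldefrag

PAGE_URL = (
    "https://www.misoenergy.org/markets-and-operations/market-reports/"
    "market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName"
    "%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"
)


def split_request_and_fragment(url):
    """Return (URL the server is asked for, fragment kept by the client)."""
    result = urldefrag(url)
    return result.url, result.fragment


requested, client_side = split_request_and_fragment(PAGE_URL)
print(requested)    # the part that is actually requested over HTTP
print(client_side)  # the part only client-side script can act on
```

So `urlopen` receives the same static HTML no matter what follows the `#`, which is consistent with the table being absent from the parsed soup.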
















web-scraping beautifulsoup
















asked Nov 26 '18 at 18:46

Chen. B

























2 Answers
































As stated in the comments, you need something like Selenium to load the page, because the table is built dynamically. You also need to wait for the table to load before you read the page source.



NOTE: I used time.sleep() for the wait, but I have read that it is not the best solution. The suggestion is to use WebDriverWait, but I'm still in the process of understanding how that would work, so I will update this once I play around. In the meantime, this should get you started.



import time

import bs4
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')

# Crude fixed wait so the page's JavaScript has time to build the table
time.sleep(5)

html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')

# Matches tables whose class attribute is exactly this string
tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
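For context on why an explicit wait is usually preferred over a fixed sleep: instead of always pausing for the full five seconds, an explicit wait re-checks a condition at short intervals and returns as soon as it holds, giving up only after a timeout. The sketch below illustrates that general polling idea in plain Python. It is not Selenium's implementation, and `wait_until` and `table_is_ready` are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "is the table in the page yet?": succeeds on the third check.
attempts = {"count": 0}


def table_is_ready():
    attempts["count"] += 1
    return "table found" if attempts["count"] >= 3 else None


print(wait_until(table_is_ready, timeout=5, poll_interval=0.01))  # prints: table found
```

With a fast page the wait ends almost immediately, and with a slow page it keeps checking up to the timeout, which a single time.sleep(5) cannot do.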


This worked for me with WebDriverWait:



import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')

# Wait up to 10 seconds for the table element to appear in the DOM
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "table.table-bordered.docnav-metadata.dataTable.no-footer")))

html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')

# Matches tables whose class attribute is exactly this string
tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
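Since the goal in the question is the zip files rather than the table itself, the rendered HTML (for example `driver.page_source` from the code above) can then be scanned for links. The sketch below assumes the table exposes each archive as an ordinary `<a href="...">` whose path ends in `.zip`, which the source does not confirm. It uses only the standard library so it runs without extra packages; `ZipLinkParser`, `extract_zip_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags whose href points at a .zip file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".zip"):
            self.links.append(urljoin(self.base_url, href))


def extract_zip_links(html, base_url):
    """Return every .zip link in `html`, resolved against `base_url`."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up markup standing in for the rendered page source.
sample_html = """
<table class="dataTable">
  <tr><td><a href="/files/example_2018.zip">example_2018.zip</a></td></tr>
  <tr><td><a href="/about">About</a></td></tr>
</table>
"""
zip_links = extract_zip_links(sample_html, "https://www.example.com/reports/")
print(zip_links)
```

Resolving with urljoin matters because table links are often relative to the site root, and a downloader needs absolute URLs.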




















































To fetch the tabular content from that webpage using the Requests-HTML library, you can try the following script:



import requests_html

link = "https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"

with requests_html.HTMLSession() as session:
    r = session.get(link)
    # Render the page in headless Chromium so the JavaScript-built table exists
    r.html.render(sleep=5, timeout=8)
    for items in r.html.find("table.dataTable tr.desktop-row"):
        data = [item.text for item in items.find("td")]
        print(data)
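Once the rows (and their links) are available, the remaining step from the question is saving each archive to disk. A minimal sketch follows, assuming each zip can be fetched with a plain GET and no login or cookies, which is not confirmed for this site. `download_file` is a hypothetical helper name, and the User-Agent header simply mirrors the one used in the question.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import Request, urlopen


def download_file(url, dest_dir):
    """Stream `url` into `dest_dir` and return the path of the saved file."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    filename = Path(unquote(urlparse(url).path)).name
    target = dest_dir / filename
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # copyfileobj copies in chunks, so a large archive is never held in memory.
    with urlopen(req) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

It would be called once per link, for example `download_file(zip_url, "archives")`, where `zip_url` is one of the absolute URLs taken from the table.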
























  • Finally someone using requests_html 😁

    – Kamikaze_goldfish
    Nov 26 '18 at 20:09












































        answered Nov 26 '18 at 19:43, edited Nov 26 '18 at 20:04

        chitown88



































            answered Nov 26 '18 at 19:52

            SIM














