Web scraping a table and can't locate the table

I am scraping this website: https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc

I am trying to download all the zip files from the table on that page. However, I cannot locate the table in the 'soup': the search returns nothing.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

req = Request(
    'https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc',
    headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, "html.parser")
tables = soup.find('div', class_='table table-bordered docnav-metadata dataTable no-footer')

asked Nov 26 '18 at 18:46

Chen. B

The content is loaded dynamically. Try using selenium, requests_html, or a similar tool to fetch it.

– SIM
Nov 26 '18 at 19:03

Thank you so much. Can you be more explicit please?

– Chen. B
Nov 26 '18 at 19:16

If you disable JavaScript in your browser and reload the page, you won't see that tabular content. BeautifulSoup only parses the HTML it is given, so it can't see content that a script adds later.

– SIM
Nov 26 '18 at 19:22
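
A related detail, sketched here with only the standard library: everything after the # in that URL is a fragment. Browsers keep the fragment on the client and do not send it to the server, so the server returns the same base page whatever filter is encoded there; presumably the page's JavaScript reads the fragment and loads the matching rows afterwards. The variable names below are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://www.misoenergy.org/markets-and-operations/market-reports/"
    "market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName"
    "%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"
)

parts = urlsplit(url)

# The path is what the server actually receives.
server_path = parts.path

# The fragment stays in the browser; decode it to see the filter settings.
fragment_params = parse_qs(parts.fragment)

print(server_path)
print(fragment_params)
```

Seen this way, the report type and paging options in the address bar never reach the server in the initial request, which is consistent with the raw HTML not containing the table.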

Thank you so much!

– Chen. B
Nov 26 '18 at 19:29

Why not just use Firebug or the Chrome developer tools to find the AJAX call and emulate it?

– Carlos Alves Jorge
Nov 26 '18 at 20:11
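
A rough sketch of what emulating the call could look like with the standard library. The endpoint URL and the parameter names are hypothetical placeholders, not the site's real API; the real values would have to be read from the Network tab of the developer tools while the table loads. It also assumes the endpoint answers with JSON, which is not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# HYPOTHETICAL endpoint and parameter names: replace them with whatever the
# Network tab of the browser's developer tools shows when the table loads.
ENDPOINT = "https://example.invalid/api/documents"
PARAMS = {"t": 10, "p": 0, "s": "FileName", "sd": "desc"}


def build_request(endpoint, params):
    """Build a GET request that mimics the call made by the page's JavaScript."""
    url = f"{endpoint}?{urlencode(params)}"
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_json(endpoint, params):
    """Send the request and decode the body, assuming the endpoint returns JSON."""
    with urlopen(build_request(endpoint, params)) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(ENDPOINT, PARAMS)
print(req.full_url)
```

If such an endpoint exists, calling fetch_json with the real values avoids starting a browser at all, which is usually faster than rendering the page.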

Tags: web-scraping, beautifulsoup


2 Answers

As stated in the comments, you need something like selenium to load the page, because its content is dynamic. You also need to wait for the table to load before you read the page source.

NOTE: I used time.sleep() for the wait, but I have read that it is not the best solution. The suggestion is to use WebDriverWait, but I'm still in the process of understanding how that works, so I will update this once I have played around with it. In the meantime, this should get you started.

import time

import bs4
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')

# Crude fixed wait to give the page's JavaScript time to build the table.
time.sleep(5)

# Parse the rendered page, not the raw server response.
html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
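
A fixed sleep either waits longer than necessary or not long enough. The usual alternative is to poll a condition until it holds or a timeout expires, which is roughly the idea behind WebDriverWait. A minimal standard-library sketch of that idea; wait_until and table_is_present are hypothetical names, and the condition here is a stand-in rather than a real browser check:

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "is the table in the page yet?": becomes true on the third check.
checks = {"count": 0}


def table_is_present():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(table_is_present, timeout=2.0, poll_interval=0.01)
print(checks["count"])
```

The script continues as soon as the condition is met, and a page that never loads produces an explicit error instead of an empty result.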

This worked for me with WebDriverWait:

import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc')

# Wait up to 10 seconds for the table element to appear in the DOM.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "table.table-bordered.docnav-metadata.dataTable.no-footer")))

# Parse the rendered page, not the raw server response.
html = driver.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table', {'class': 'table table-bordered docnav-metadata dataTable no-footer'})
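
To get from the rendered page to the archives themselves, the download links still have to be pulled out of the markup. A minimal sketch, assuming the table rows contain ordinary a-tags whose href ends in .zip (not verified against the live page). It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; ZipLinkParser, find_zip_links and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose path ends in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare the path only, so a query string after .zip does not matter.
        if href and urlsplit(href).path.lower().endswith(".zip"):
            self.links.append(urljoin(self.base_url, href))


def find_zip_links(html, base_url):
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up sample markup standing in for driver.page_source.
sample_html = """
<table class="dataTable">
  <tr><td><a href="/files/bids_a.zip">bids_a.zip</a></td></tr>
  <tr><td><a href="/files/notes.pdf">notes.pdf</a></td></tr>
  <tr><td><a href="https://cdn.example.invalid/bids_b.ZIP?v=1">bids_b</a></td></tr>
</table>
"""

zip_links = find_zip_links(sample_html, "https://www.misoenergy.org/")
print(zip_links)
```

In the real script, the html variable from driver.page_source would take the place of sample_html.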

edited Nov 26 '18 at 20:04

answered Nov 26 '18 at 19:43

chitown88


To fetch the tabular content from that webpage using the Requests-HTML library, you can try the following script:

import requests_html

link = "https://www.misoenergy.org/markets-and-operations/market-reports/market-report-archives/#nt=%2FMarketReportType%3ABids%2FMarketReportName%3AArchived%20Cleared%20Bids%20%20(zip)&t=10&p=0&s=FileName&sd=desc"

with requests_html.HTMLSession() as session:
    r = session.get(link)
    # Run the page's JavaScript in a headless browser, then wait 5 seconds.
    r.html.render(sleep=5, timeout=8)
    # Print the cell text of every row in the rendered table.
    for items in r.html.find("table.dataTable tr.desktop-row"):
        data = [item.text for item in items.find("td")]
        print(data)
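
Printing the rows does not save the archives yet. Once the file URLs have been collected (how to collect them depends on the table's markup), a download step could look like the following standard-library sketch. download_file, download_all and the downloads directory name are hypothetical, and the User-Agent header simply mirrors the one used in the question.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlsplit
from urllib.request import Request, urlopen


def download_file(url, dest_dir):
    """Save the resource at `url` into `dest_dir` and return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last segment of the URL path.
    filename = Path(unquote(urlsplit(url).path)).name
    target = dest_dir / filename

    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # stream in chunks
    return target


def download_all(urls, dest_dir="downloads"):
    """Download every URL in turn; an error on any file stops the loop."""
    return [download_file(url, dest_dir) for url in urls]
```

Streaming with shutil.copyfileobj avoids holding a whole archive in memory, which matters if the files are large.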

answered Nov 26 '18 at 19:52

SIM


Finally someone using requests_html 😁

– Kamikaze_goldfish
Nov 26 '18 at 20:09
