Scraping an 'onclick' table with Selenium in Python

I am attempting to scrape the following webpage, using Selenium in Python (with Chrome Web Driver).

https://www.betexplorer.com/soccer/argentina/superliga/argentinos-jrs-talleres-cordoba/ptSIK7kB/#ah1

I only wish to collect the rows of data in which the bookmaker is Bet365.

I have been able to obtain all the rows where this is the case. However, I am struggling to scrape the information within the 'onclick' table that appears when the values are clicked:

[Screenshot: the ARCHIVE ODDS table that appears after an odds value is clicked]

The image above shows the table ARCHIVE ODDS, which appears when the 5.90 is clicked.

The aim is to collect the information from each table in all the rows where Bet365 is the bookmaker.

My attempt so far has been to locate all the 'onclick' links using a CSS-selector:

table_links = browser.find_elements_by_css_selector("span[onclick*='16);']")

And then to loop through each of the table_links, click each one, and scrape the data which appears using the xpath:

bet365table = []
for i in table_links:
    i.click()
    xx = browser.find_element_by_xpath("//TBODY[@id='aodds-tbody']")
    bet365table.append(xx)

However, this fails each time with an error stating that the element is not clickable.

edited Nov 23 '18 at 11:40

asked Nov 23 '18 at 11:24

bobman

347

Update the question with the relevant HTML and your code trials
– DebanjanB
Nov 23 '18 at 11:26

@DebanjanB I've updated the post with my attempts so far. From my end the link in the post works, but please let me know if you're having difficulties reaching it!
– bobman
Nov 23 '18 at 11:41


python selenium web-scraping

1 Answer

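You could also mimic the XHR requests that the page makes and read the JSON responses directly. Bet365 has a bookmaker id of 16, so you can pick out the qualifying cells with a CSS selector that matches onclick attributes ending in ", 16);".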

import re

import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

d = webdriver.Chrome()
d.get("https://www.betexplorer.com/soccer/argentina/superliga/argentinos-jrs-talleres-cordoba/ptSIK7kB/#ah")
# Wait until the Bet365 logo links (class l16) are visible
WebDriverWait(d, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".in-bookmaker-logo-link.in-bookmaker-logo-link--primary.l16")))

base = 'https://www.betexplorer.com/archive-odds/'
# Elements whose onclick ends with ", 16);" belong to Bet365
links = d.find_elements(By.CSS_SELECTOR, "[onclick$=', 16);']")
# str.strip() removes a set of characters rather than a prefix or suffix,
# so take the quoted argument of load_odds_archive(...) with a regex instead
extracted_links = [re.search(r"'([^']+)'", link.get_attribute("onclick")).group(1) for link in links]
json_links = [base + link + '/16/?_=1' for link in extracted_links]

for link in json_links:
    res = requests.get(link)
    # pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas 1.0+)
    data = pd.json_normalize(res.json())
    print(data)

d.quit()
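The extraction step is easy to get subtly wrong, because str.strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix or suffix. The following is a minimal sketch of a regex-based alternative using only the standard library. The onclick strings and the IDs in them are made up for illustration (the real page's IDs will differ), and the helper names naive_extract and extract_ids are hypothetical.

```python
import re

# Hypothetical onclick values; the IDs are invented for this example.
ONCLICKS = [
    "load_odds_archive(this, 'aodds1', 16);",
    "load_odds_archive(this, 'sample2', 16);",
    "load_odds_archive(this, 'other3', 5);",
]

# Captures the quoted ID and the numeric bookmaker ID.
PATTERN = re.compile(r"load_odds_archive\(this, '([^']+)', (\d+)\);")


def naive_extract(text):
    """Chained strip() calls: each one strips a character set from both ends."""
    return text.strip("load_odds_archive(this, '").strip("', 16);")


def extract_ids(onclicks, bookmaker_id=16):
    """Return the quoted argument of every call made for the given bookmaker."""
    ids = []
    for text in onclicks:
        match = PATTERN.fullmatch(text.strip())
        if match and int(match.group(2)) == bookmaker_id:
            ids.append(match.group(1))
    return ids


# The ID 'aodds1' consists of characters that also occur in the stripped
# sets, so the chained strip() calls consume it entirely.
print(repr(naive_extract(ONCLICKS[0])))  # ''
print(extract_ids(ONCLICKS))             # ['aodds1', 'sample2']
```

Whether the chained strip() calls happen to work depends on which characters the ID starts and ends with, which is why a pattern match is the safer choice here.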

answered Nov 23 '18 at 12:21

QHarr

30.1k81941

Did not know this was a possible method! Can I please ask, how did you find out this is a possibility for this website?
– bobman
Nov 23 '18 at 12:53

I used dev tools (F12) to monitor the traffic when pressing to display the tables for odds. I noticed that 16 was associated with Bet365 and that a JSON response was returned. I also saw that the start of the request URL was constant, the middle part came from the existing 365 href and that the end was a constant plus an incrementing number which bore no real significance AFAICS, so I just plugged the number 1 onto the end of the request URL. I suspect they may even offer an API so worth looking into that.
– QHarr
Nov 23 '18 at 12:56
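The URL structure described in the comment above (a constant start, a middle part taken from the page, and a constant ending with a number) can be sketched as a small helper. This is only an illustration under assumptions: the function name archive_odds_url, its parameters, and the placeholder ID are hypothetical, and percent-encoding the ID is an extra precaution that the original answer does not use.

```python
from urllib.parse import quote

# Constant start of the request URL, as given in the answer.
BASE = "https://www.betexplorer.com/archive-odds/"


def archive_odds_url(event_id, bookmaker_id=16, suffix=1):
    """Join the constant base, the ID taken from the onclick attribute,
    the bookmaker ID, and the trailing `_` query value."""
    # quote() percent-encodes characters that are unsafe in a path segment.
    return f"{BASE}{quote(event_id, safe='')}/{bookmaker_id}/?_={suffix}"


# "EXAMPLE_ID" is a placeholder, not a real ID from the site.
print(archive_odds_url("EXAMPLE_ID"))
# https://www.betexplorer.com/archive-odds/EXAMPLE_ID/16/?_=1
```

The suffix defaults to 1 because, according to the comment, the incrementing number appeared to carry no significance.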



