How to extract text from divs in Selenium using Python when new divs are added every approx 1 second?
I am trying to extract the content from divs on a web page using Selenium.
The web page is dynamically generated, and every second or so a new div is inserted into the HTML.
So far I have the following code:
from selenium import webdriver

chrome_path = r"C:scrapechromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")

# Collect the text of every matching div currently on the page
messages = []
for message in driver.find_elements_by_class_name('div_i_am_targeting'):
    messages.append(message.text)

for x in messages:
    print(x)
This works, but it only prints the text of the divs that are on the page at the moment it runs. I want to continuously extract the text from the_div_i_am_targeting, because new divs appear on the page every second or so.
The closest related question I could find is "Handling dynamic div's in selenium", but it isn't a match for my question and it has no answers.
How can I update the code above so that it continuously prints the contents of my chosen divs (in this example div_i_am_targeting), including new divs that are added to the page after the program has started?
python selenium chrome-web-driver
asked Nov 24 '18 at 12:48
Gary
I guess you need to put this in an infinite loop, but does each div have a unique identifier? We need to exclude the divs that have already been processed.
– Samarth
Nov 24 '18 at 12:54
@Gary, can you share the webpage you're trying to scrape? I cannot test here without a specific link in order to ensure my solution works.
– Luan Naufal
Nov 24 '18 at 12:54
One solution would be to add a loop with a sleep at the end, so you can make sure you capture all generated divs:
if message.text not in messages:
    messages.append(message.text)
sleep(1)
– Luan Naufal
Nov 24 '18 at 12:57
Thanks both. I cannot share the webpage, but the content I want to extract is within the_div_i_am_targeting. There is no unique identifier on these divs; the structure of the content is: <div class="the_div_i_am_targeting"> <p> some text </p></div>. This pattern is repeated indefinitely on the page, so many of the same divs are generated. The code above works fine, but I need a way to keep the program running and continuously capture the new divs as they are created. Thanks for the suggestion about looping with message.text not in messages.
– Gary
Nov 24 '18 at 13:17
@Gary I understand your use case is to extract text from the newly added <div>s, but what is the exit criterion for your test?
– DebanjanB
Nov 24 '18 at 13:42
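The loop-with-sleep idea from the comments above can be sketched without a browser. This is only an illustration under stated assumptions: `poll_new_texts`, `fetch_texts`, `interval` and `max_polls` are hypothetical names, and `fetch_texts` stands in for something like `[e.text for e in driver.find_elements_by_class_name('div_i_am_targeting')]`. Note that deduplicating by text means two different divs with identical text are reported only once.

```python
import time
from typing import Callable, Iterator, List


def poll_new_texts(fetch_texts: Callable[[], List[str]],
                   interval: float = 1.0,
                   max_polls: int = 10) -> Iterator[str]:
    """Yield each text the first time it is seen, polling max_polls times."""
    seen = set()
    for _ in range(max_polls):
        for text in fetch_texts():
            if text not in seen:
                seen.add(text)
                yield text
        # Pause before looking at the page again
        time.sleep(interval)


# Stand-in for the live page: each call returns the texts "currently on the page".
snapshots = iter([["a"], ["a", "b"], ["a", "b", "c"]])
result = list(poll_new_texts(lambda: next(snapshots), interval=0, max_polls=3))
print(result)
```

A bounded `max_polls` is used here only so that the sketch terminates; a real scraper would need its own exit condition, as the last comment points out.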
1 Answer
You can use the code below to continuously print the content of the required divs:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait

chrome_path = r"C:scrapechromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")

# Get current divs
messages = driver.find_elements_by_class_name('div_i_am_targeting')
# Print all messages
for message in messages:
    print(message.text)

while True:
    try:
        # Wait up to a minute for the list of divs to change
        wait(driver, 60).until(lambda driver: driver.find_elements_by_class_name('div_i_am_targeting') != messages)
    except TimeoutException:
        # Break the loop if no new messages appeared within a minute
        print('No new messages')
        break
    # Take a single snapshot so divs added between lookups are not skipped
    current = driver.find_elements_by_class_name('div_i_am_targeting')
    # Print new messages
    for message in [m for m in current if m not in messages]:
        print(message.text)
    # Update list of messages
    messages = current
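The "which divs are new" step can also be expressed by remembering element ids instead of comparing whole lists. The sketch below is an illustration rather than part of the answer: `FakeElement` and `take_new` are hypothetical names, and `FakeElement` only mimics the `id` and `text` attributes that a Selenium `WebElement` exposes, so the logic can be run without a browser.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only `id` and `text`)."""
    id: str
    text: str


def take_new(current: Iterable[FakeElement], seen_ids: Set[str]) -> List[FakeElement]:
    """Return elements whose id has not been seen yet, and remember their ids."""
    fresh = [el for el in current if el.id not in seen_ids]
    seen_ids.update(el.id for el in fresh)
    return fresh


seen: Set[str] = set()
first = take_new([FakeElement("e1", "hello"), FakeElement("e2", "hello")], seen)
# The oldest div (e1) has been removed from the page and a new one (e3) added.
second = take_new([FakeElement("e2", "hello"), FakeElement("e3", "bye")], seen)
print([el.text for el in first])
print([el.text for el in second])
```

Because identity is tracked by id rather than by text, two divs with the same text are both reported, and divs that disappear from the page do not cause older ones to be printed again.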
edited Nov 24 '18 at 18:22
answered Nov 24 '18 at 13:58
Andersson
Andersson, thanks for this great solution. It seems to be partly working for me. However, after a variable number of additional divs have been added (about 10, but not always 10), it sometimes skips a new div and then continues, and it always fails after about 20 new divs have been added. I've checked the HTML and can't see anything different about the structure of the divs it breaks at. Can you think of any reason why this might be? Thanks
– Gary
Nov 24 '18 at 17:24
To help debug, I added print(count) after the "# Print new message" comment. It consistently stops at 48 to 49 divs in total (even though new divs are still being added within a few seconds of the last div that prints). Although it skips some of the divs, the output shows that it still sees them, because the count printed before each message jumps. For example, a run prints: 35... printed output, 36.... printed output .... 39.... printed output
– Gary
Nov 24 '18 at 17:49
@Gary, are the old divs still on the page, or are they removed after some number of new divs have been added? Also, is it possible that several new messages arrive at the same time, or is the time between messages almost constant?
– Andersson
Nov 24 '18 at 17:55
Great point. Yes, I checked: after a certain number of divs, the older divs are replaced, such that the first div is removed every time a new div is added (the display is a list-style box that only shows the last x messages, and each div contains a single message). And yes, messages arrive constantly: some come every second, and some arrive less than a second apart.
– Gary
Nov 24 '18 at 18:13
@Gary, try the updated answer and let me know if there are any new issues.
– Andersson
Nov 24 '18 at 18:23
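The comments above describe a box that shows only the last x messages, with the oldest div removed whenever a new one is added. If only the texts are available, one way to find the new messages is to align two consecutive snapshots on their overlap. The sketch below is a hedged illustration, not the answer's method: `new_items` is a hypothetical helper that takes the previous and current lists of texts and returns whatever follows the longest overlap. It can miss messages when the window content is ambiguous (for example, every visible message has the same text) or when the whole window is replaced between two polls by content that happens to overlap.

```python
from typing import List


def new_items(previous: List[str], current: List[str]) -> List[str]:
    """Return the items of `current` that follow its longest overlap with the end of `previous`."""
    max_k = min(len(previous), len(current))
    # Try the longest possible overlap first, then shorter ones
    for k in range(max_k, 0, -1):
        if previous[-k:] == current[:k]:
            return current[k:]
    # No overlap at all: treat everything as new
    return list(current)


# Window of three messages: "a" scrolled out and "d" was appended.
shifted = new_items(["a", "b", "c"], ["b", "c", "d"])
# Window not yet full: nothing removed, "c" appended.
grown = new_items(["a", "b"], ["a", "b", "c"])
print(shifted)
print(grown)
```

Polling more often than messages arrive keeps the overlap large, which makes the alignment more reliable.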