Programmatically identified cookie is not getting accepted
I am working on a web scraper in Python 2 that reads some contents of a website. To access the contents, I need to pass a cookie. Right now, I find the cookie by opening the website in Chrome and copying it from the site information panel. I hardcode this cookie into my scraper and it retrieves the contents of the website. However, the cookie gets invalidated after a few hours, and then no information can be extracted. To address this, I am trying to have the scraper refresh the cookie itself whenever a new one is needed.
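For reference, the working hardcoded request looks roughly like this (the URL and cookie value below are placeholders, not the real ones):

    import requests

    base_url = 'https://example.com/some-page'             # placeholder URL
    cookie = {'JSESSIONID': 'value copied from Chrome'}     # hardcoded; stops working after a few hours
    r = requests.get(base_url, cookies=cookie)
    print(r.status_code, len(r.text))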
To refresh the cookie, I have tried the following two approaches.
First approach:
    import requests
    import browsercookie

    try:
        # load the cookie jar from the local Chrome profile
        cj = browsercookie.chrome()
        session = requests.Session()
        r = session.get(base_url, cookies=cj)
        new_cookie = str(session.cookies.get_dict()['JSESSIONID'])
    except Exception as e:
        pass
Second approach:
    with requests.Session() as s:
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }
        headers['Connection'] = 'keep-alive'
        r = s.get(baseurl, headers=headers)
        new_cookie = s.cookies.get_dict()['JSESSIONID']
Both of these approaches return cookies that look perfectly fine. The problem is that these programmatically identified cookies do not let the scraper extract any results. When I hardcode the cookie found in the browser into the request, the scraper gets the DOM of the website. But when I send the cookie found programmatically, the scraper cannot access the DOM of the website.
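For completeness, the refreshed value is sent back roughly like this (base_url is again a placeholder):

    # follow-up request using the programmatically obtained JSESSIONID
    r = requests.get(base_url, cookies={'JSESSIONID': new_cookie})
    # this works when the value is copied from Chrome, but not with the value obtained above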
The cookie information in the browser says that the cookie gets invalidated "when the browsing session ends".
This is very puzzling. What am I missing in this whole process?
python cookies web-scraping request
edited Nov 22 at 7:52
asked Nov 22 at 7:42
harshvardhan
If you need Chrome to get a good session cookie, then you should use Selenium or headless Chrome.
– pguardiario
Nov 22 at 9:33
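A minimal sketch of what the comment suggests, assuming Selenium with headless Chrome is available (the URLs are placeholders and the exact flow depends on the site):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    import requests

    # let a real (headless) Chrome visit the site so the server issues a fresh session cookie
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    driver.get('https://example.com')            # placeholder: the page that sets JSESSIONID

    # copy the browser's cookies into a requests session for the actual scraping
    session = requests.Session()
    for c in driver.get_cookies():
        session.cookies.set(c['name'], c['value'])
    driver.quit()

    r = session.get('https://example.com/data')  # placeholder data URL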