How can I pull webpage data into my DataFrame by referencing a specific HTML class or id using pandas...
I'm trying to pull the data from the table at this site and save it in a CSV with the column 'ticker' included. Right now my code is this:
    import requests
    import pandas as pd

    url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[0]
    print(df)
    df.to_csv('my data.csv')
and it results in a file that looks like this.
I want to have the 'ticker' column in my CSV file with the corresponding ticker listed for each company. The ticker is in the HTML here (class="ticker--small"). The output should look like this.
I'm totally stuck on this. I've tried doing it in BeautifulSoup too but I can't get it working. Any help would be greatly appreciated.
python html pandas beautifulsoup datareader
You need to associate each ticker with the table you already have. pandas read_html can't handle every case. As you said, you need a library like Beautiful Soup to parse the HTML and then build a table from it.
– Vishnudev
Nov 29 '18 at 2:23
How would you suggest I do that? The post below doesn't solve it in BeautifulSoup.
– j_rothmans
Nov 29 '18 at 5:18
According to the site's terms of use: "scraping on data is not permitted as per the Terms of Use." Unfortunately, the site also seems to make simple web scraping intentionally difficult. You may be able to use BeautifulSoup to pull all elements matching <div class="filter-table__row js-tr"> one at a time.
– G. Anderson
Nov 29 '18 at 17:01
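To illustrate the row-by-row idea in the comment above, here is a minimal sketch that runs against a small inline sample rather than the live site. Only the class names `filter-table__row js-tr` and `ticker--small` come from this thread; the nesting, the `company` class, and the sample values are assumptions made up for the example, so the real page may need different selectors.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the two class names come from the thread;
# the nesting and the "company" class are assumptions for illustration.
SAMPLE_HTML = """
<div class="filter-table__row js-tr">
  <a class="ticker--small">ABCD</a>
  <span class="company">Example Pharma</span>
</div>
<div class="filter-table__row js-tr">
  <a class="ticker--small">WXYZ</a>
  <span class="company">Sample Bio</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

records = []
# "div.filter-table__row.js-tr" matches <div> elements carrying both classes.
for row in soup.select("div.filter-table__row.js-tr"):
    ticker = row.select_one(".ticker--small")
    company = row.select_one(".company")
    records.append({
        "ticker": ticker.get_text(strip=True) if ticker else None,
        "company": company.get_text(strip=True) if company else None,
    })

# One DataFrame row per page row, with the ticker as its own column.
df = pd.DataFrame(records, columns=["ticker", "company"])
print(df)
```

An element with an id would be selected the same way, using `#` instead of `.` in the CSS selector (for example `soup.select_one("#some-id")`, where `some-id` is a placeholder).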
I saw that class but wasn't sure how to use it. How would you incorporate it into ewwink's code in the answer below?
– j_rothmans
Nov 29 '18 at 19:06
asked Nov 28 '18 at 23:43 by j_rothmans (edited Nov 29 '18 at 0:02)
1 Answer
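The page contains multiple tables. Use BeautifulSoup to extract each one, then loop over them to write the CSV.

    from io import StringIO

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

    # Parse each <table> separately and append it to the same CSV file.
    with open('my_data.csv', 'a+') as f:
        for table in soup.find_all('table'):
            # Wrapping the markup in StringIO avoids the literal-HTML deprecation in newer pandas.
            df = pd.read_html(StringIO(str(table)))[0]
            df.to_csv(f)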
Thanks for your suggestion. Unfortunately, this yields the same result as I already have: there is no column in the CSV file for ticker or company name.
– j_rothmans
Nov 29 '18 at 5:17
Then you have to write the extraction yourself with BeautifulSoup.
– ewwink
Nov 29 '18 at 5:51
How would I do that?
– j_rothmans
Nov 29 '18 at 6:14
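One possible way to do that, sketched under assumptions and not taken from ewwink's answer: walk each row container, read its ticker, build a DataFrame from the table inside that row, and add the ticker as a column before combining everything. The class names `filter-table__row js-tr` and `ticker--small` are the ones mentioned in this thread; the sample markup, the column names, the helper `table_to_frame`, and the idea that each row wraps exactly one table are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the class names are from the thread, the layout is assumed.
SAMPLE_HTML = """
<div class="filter-table__row js-tr">
  <a class="ticker--small">ABCD</a>
  <table>
    <thead><tr><th>Drug</th><th>Stage</th></tr></thead>
    <tbody>
      <tr><td>Drug A</td><td>Approved</td></tr>
      <tr><td>Drug B</td><td>CRL</td></tr>
    </tbody>
  </table>
</div>
<div class="filter-table__row js-tr">
  <a class="ticker--small">WXYZ</a>
  <table>
    <thead><tr><th>Drug</th><th>Stage</th></tr></thead>
    <tbody>
      <tr><td>Drug C</td><td>Approved</td></tr>
    </tbody>
  </table>
</div>
"""


def table_to_frame(table):
    """Build a DataFrame from a <table> tag using its <th> and <td> cells."""
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
        if tr.find("td") is not None  # skip the header row, which has no <td>
    ]
    return pd.DataFrame(rows, columns=headers)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

frames = []
for row in soup.select("div.filter-table__row.js-tr"):
    ticker_tag = row.select_one(".ticker--small")
    table = row.find("table")
    if ticker_tag is None or table is None:
        continue  # skip rows that lack either piece
    frame = table_to_frame(table)
    # Put the ticker in the first column, repeated for every row of this table.
    frame.insert(0, "ticker", ticker_tag.get_text(strip=True))
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)
# With no path, to_csv returns the CSV text; pass a filename to write a file instead.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Combining the frames first and writing once keeps a single header row in the output, whereas appending each table to the file separately repeats the header for every table.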
answered Nov 29 '18 at 2:33 by ewwink