How can I pull webpage data into my DataFrame by referencing a specific HTML class or id using pandas...












I'm trying to pull the data from the table at this site and save it in a CSV with the column 'ticker' included. Right now my code is this:



import requests
import pandas as pd

url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
html = requests.get(url).content
df_list = pd.read_html(html)  # one DataFrame per <table> found in the page
df = df_list[0]  # keep only the first table
print(df)
df.to_csv('my data.csv')


and it results in a file that looks like this.



I want to have the 'ticker' column in my CSV file with the corresponding ticker listed for each company. The ticker is in the HTML here (class="ticker--small"). The output should look like this.



I'm totally stuck on this. I've tried doing it in BeautifulSoup too but I can't get it working. Any help would be greatly appreciated.










  • You need to associate each ticker with the table you already have. pandas read_html can't handle every case. As you said, you need a library like Beautiful Soup to parse the HTML and then build a table from it.

    – Vishnudev
    Nov 29 '18 at 2:23











  • How would you suggest I do that? The post below doesn't solve it in BeautifulSoup.

    – j_rothmans
    Nov 29 '18 at 5:18











  • According to the terms of use of the site: "scraping on data is not permitted as per the Terms of Use." And, unfortunately, it seems that the site intentionally makes simple web scraping difficult. You may be able to use BeautifulSoup to pull all elements with <div class="filter-table__row js-tr"> one at a time.

    – G. Anderson
    Nov 29 '18 at 17:01











  • I saw that class but wasn't sure how to use it. How would you incorporate it into ewwink's code in the answer below?

    – j_rothmans
    Nov 29 '18 at 19:06
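
As a rough illustration of the approach suggested in the comments above, the sketch below selects each row container by class with BeautifulSoup, reads the ticker from the element with class "ticker--small", and attaches it to every row of that container's table. The two class names come from the question and comments; the nesting of the ticker and the table inside the row div, the SAMPLE_HTML contents, and the function name rows_with_ticker are all assumptions made for illustration, so the real page may need different selectors. It parses an inline string rather than fetching the site, given the terms-of-use concern raised above.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: the real page's structure may differ.
SAMPLE_HTML = """
<div class="filter-table__row js-tr">
  <a class="ticker--small">ABCD</a>
  <table>
    <tr><th>Drug</th><th>Stage</th></tr>
    <tr><td>Drug A</td><td>Approved</td></tr>
    <tr><td>Drug B</td><td>CRL</td></tr>
  </table>
</div>
<div class="filter-table__row js-tr">
  <a class="ticker--small">WXYZ</a>
  <table>
    <tr><th>Drug</th><th>Stage</th></tr>
    <tr><td>Drug C</td><td>Approved</td></tr>
  </table>
</div>
"""


def rows_with_ticker(html):
    """Build one DataFrame whose rows carry the ticker of their container."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # class_ matches elements whose class list contains the given class.
    for block in soup.find_all("div", class_="filter-table__row"):
        ticker_tag = block.find(class_="ticker--small")
        ticker = ticker_tag.get_text(strip=True) if ticker_tag else None
        table = block.find("table")
        if table is None:
            continue
        header = [th.get_text(strip=True) for th in table.find_all("th")]
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # the header row has no <td> cells, so it is skipped
                records.append({"ticker": ticker, **dict(zip(header, cells))})
    return pd.DataFrame(records)


df = rows_with_ticker(SAMPLE_HTML)
print(df)
```

Calling df.to_csv('my_data.csv', index=False) on the result would then write the ticker as an ordinary column.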
















python html pandas beautifulsoup datareader






asked Nov 28 '18 at 23:43 by j_rothmans
edited Nov 29 '18 at 0:02 by j_rothmans













1 Answer
The page has multiple tables. Use BeautifulSoup to extract them, then loop over them to write the CSV.



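from io import StringIO

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
tables = soup.find_all('table')

# Append every table on the page to the same CSV file
with open('my_data.csv', 'a+') as f:
    for table in tables:
        # StringIO avoids the deprecation warning newer pandas versions emit for literal HTML strings
        df = pd.read_html(StringIO(str(table)))[0]
        df.to_csv(f)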
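
The loop in this answer writes a separate header row and index column for every table it appends. One possible alternative, sketched here under assumptions, is to tag each per-table DataFrame with its ticker, concatenate them, and write the result once. The frames and tickers lists are hypothetical stand-ins for the output of pd.read_html and for tickers extracted separately; the sketch assumes one ticker per table, in the same order.

```python
import io

import pandas as pd

# Stand-ins for the per-table frames that pd.read_html would return (hypothetical data)
frames = [
    pd.DataFrame({"Drug": ["Drug A", "Drug B"], "Stage": ["Approved", "CRL"]}),
    pd.DataFrame({"Drug": ["Drug C"], "Stage": ["Approved"]}),
]
tickers = ["ABCD", "WXYZ"]  # hypothetical; assumed to align one-to-one with frames

tagged = []
for ticker, frame in zip(tickers, frames):
    frame = frame.copy()               # leave the original frame untouched
    frame.insert(0, "ticker", ticker)  # new first column, same value on every row
    tagged.append(frame)

# A single frame means a single header row in the output
combined = pd.concat(tagged, ignore_index=True)

buffer = io.StringIO()
combined.to_csv(buffer, index=False)  # index=False drops the unnamed index column
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a filename such as 'my_data.csv' to to_csv instead of the in-memory buffer would write the same content to disk.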





  • Thanks for your suggestion. Unfortunately, this yields the same result I already have: there is no column in the CSV file for ticker or company name.

    – j_rothmans
    Nov 29 '18 at 5:17











  • Then you'll have to write the extraction with BeautifulSoup yourself.

    – ewwink
    Nov 29 '18 at 5:51











  • How would I do that?

    – j_rothmans
    Nov 29 '18 at 6:14












answered Nov 29 '18 at 2:33 by ewwink












