How can I pull webpage data into my DataFrame by referencing a specific HTML class or id using pandas...

I'm trying to pull the data from the table at this site and save it in a CSV with the column 'ticker' included. Right now my code is this:

import requests
import pandas as pd

url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[0]
print(df)
df.to_csv('my data.csv')

and it results in a file that looks like this.

I want to have the 'ticker' column in my CSV file with the corresponding ticker listed for each company. The ticker is in the HTML here (class="ticker--small"). The output should look like this.

I'm totally stuck on this. I've tried doing it in BeautifulSoup too but I can't get it working. Any help would be greatly appreciated.

edited Nov 29 '18 at 0:02

asked Nov 28 '18 at 23:43

j_rothmans

You need to associate each ticker with the table you already have. pandas' read_html can't handle every case. As you said, you need a library like Beautiful Soup to parse the HTML and then build a table from it.

– Vishnudev
Nov 29 '18 at 2:23

How would you suggest I do that? The post below doesn't solve it in BeautifulSoup.

– j_rothmans
Nov 29 '18 at 5:18

According to the terms of use of the site: "scraping on data is not permitted as per the Terms of Use." Unfortunately, it also seems that the site intentionally makes simple web scraping difficult. You may be able to use BeautifulSoup to pull all elements matching <div class="filter-table__row js-tr"> one at a time.

– G. Anderson
Nov 29 '18 at 17:01

I saw that class but wasn't sure how to use it. How would you incorporate it into ewwink's code in the answer below?

– j_rothmans
Nov 29 '18 at 19:06
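A minimal sketch of what the comment above suggests, selecting each row by its class and then reading the ticker inside it. Only the class names (`filter-table__row js-tr`, `ticker--small`) come from this thread; the sample markup, the `company` span, and the values are invented for illustration, and the real page may be structured differently or be rendered by JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the thread; the
# structure and the values are assumptions made for this example.
SAMPLE_HTML = """
<div class="filter-table__row js-tr">
  <a class="ticker--small">AAA</a>
  <span class="company">Alpha Pharma</span>
</div>
<div class="filter-table__row js-tr">
  <a class="ticker--small">BBB</a>
  <span class="company">Beta Bio</span>
</div>
"""

# "html.parser" ships with Python, so no extra parser is needed here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

tickers = []
# A CSS selector with two classes matches elements that carry both of them.
for row in soup.select("div.filter-table__row.js-tr"):
    ticker_tag = row.select_one(".ticker--small")
    # Guard against rows that have no ticker element.
    tickers.append(ticker_tag.get_text(strip=True) if ticker_tag else None)

print(tickers)
```

To look an element up by id instead of class, `soup.find(id="some-id")` or the selector `#some-id` works the same way.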

python html pandas beautifulsoup datareader


1 Answer

The page has multiple tables. Use BeautifulSoup to extract each one, then loop over them to write the CSV.

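from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.biopharmcatalyst.com/biotech-stocks/company-pipeline-database#marketCap=mid|stages=approved,crl'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')  # the lxml parser must be installed
tables = soup.find_all('table')

# Append every table found on the page to the same CSV file.
with open('my_data.csv', 'a', newline='') as f:
    for table in tables:
        # read_html returns a list of DataFrames; one <table> yields one entry.
        df = pd.read_html(StringIO(str(table)))[0]
        df.to_csv(f)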

answered Nov 29 '18 at 2:33

ewwink


Thanks for your suggestion. Unfortunately, this yields the same result I already have: there is no column in the CSV file for the ticker or company name.

– j_rothmans
Nov 29 '18 at 5:17

Then you have to write the extraction with BeautifulSoup yourself.

– ewwink
Nov 29 '18 at 5:51

How would I do that?

– j_rothmans
Nov 29 '18 at 6:14
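One possible way to do that, as a hedged sketch rather than a tested solution for this site: walk each row container, read its ticker, and attach the ticker to every record parsed from the table in the same container. The class names are the ones mentioned in the thread; the nesting of a table inside each row `div`, the `Drug` and `Stage` headers, the values, and the helper name `rows_with_ticker` are all assumptions made for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: class names are from the thread, everything else
# (nesting, headers, values) is invented for this example.
SAMPLE_HTML = """
<div class="filter-table__row js-tr">
  <a class="ticker--small">AAA</a>
  <table>
    <tr><th>Drug</th><th>Stage</th></tr>
    <tr><td>drug-1</td><td>Approved</td></tr>
    <tr><td>drug-2</td><td>CRL</td></tr>
  </table>
</div>
<div class="filter-table__row js-tr">
  <a class="ticker--small">BBB</a>
  <table>
    <tr><th>Drug</th><th>Stage</th></tr>
    <tr><td>drug-3</td><td>Approved</td></tr>
  </table>
</div>
"""


def rows_with_ticker(html):
    """Return one DataFrame whose rows carry the ticker of their container."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.select("div.filter-table__row"):
        ticker_tag = block.select_one(".ticker--small")
        ticker = ticker_tag.get_text(strip=True) if ticker_tag else None
        table = block.find("table")
        if table is None:
            continue  # nothing to parse in this container
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if not cells:
                continue  # header row, it has <th> cells only
            record = {"ticker": ticker}
            record.update(zip(headers, cells))
            records.append(record)
    return pd.DataFrame(records)


df = rows_with_ticker(SAMPLE_HTML)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a filename, for example `df.to_csv('my_data.csv', index=False)`, writes the same content to disk with the `ticker` column first.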
