urllib in Python 3 gives me no umlaut












I am trying to fetch some Google results with BeautifulSoup and urllib:



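from urllib.request import Request, urlopen
from urllib.parse import quote
from bs4 import BeautifulSoup

# Percent-encode the non-ASCII search term before putting it in the URL
url = "http://www.google.de/search?q=" + quote("ätzend")

# Send a browser-like User-Agent with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(urlopen(req), "html.parser")

# Print the text of each result link
for item in soup.select(".r a"):
    print(item.text)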


This is the result:



�tzende Stoffe � Wikipedia
�tzende Stoffe � Wikipedia
�tzend � Wikipedia


I tried using decode('utf-8'), but it doesn't help. What can I do?



Edit:
Also tried:



soup = BeautifulSoup(urlopen(req).read().decode('utf-8'), "html.parser")


Same problem. Using utf-16 doesn't help either. The Unicode code point for the letter 'Ä' is 196 (hex C4).
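The numbers quoted above can be checked directly in Python. The following is only a small sketch (the variable names are made up for illustration); it also shows that the single byte C4, which is how Latin-1 stores the letter, is not valid UTF-8 on its own and turns into the replacement character U+FFFD when decoded leniently:

```python
letter = "\u00c4"  # the letter 'Ä', written as an escape so the source file encoding does not matter

code_point = ord(letter)
print(code_point, hex(code_point))

# The same character is stored differently depending on the encoding
latin1_bytes = letter.encode("latin-1")  # one byte
utf8_bytes = letter.encode("utf-8")      # two bytes
print(latin1_bytes, utf8_bytes)

# A lone 0xC4 byte is an incomplete UTF-8 sequence, so a lenient decode
# substitutes U+FFFD (the replacement character) for it
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(replaced))
```

Whether this is what actually happens in the script above is not established by the question; it only illustrates one common way a replacement character can appear.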



Edit2:
Windows PowerShell shows the correct results.
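As an alternative to hard-coding 'utf-8' in the decode attempt above, one option is to use whatever charset the server declares in its Content-Type header. This is a sketch under assumptions: `decode_body` is a hypothetical helper name, and the demonstration uses a hand-built header instead of a live request so that it runs offline. The `get_content_charset()` method is available on the `headers` attribute of the object returned by `urlopen`.

```python
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw response bytes using the charset declared in the headers.

    `decode_body` is a hypothetical helper, not part of urllib.
    """
    charset = headers.get_content_charset() or fallback
    # errors="replace" keeps the script running if a byte does not fit the charset
    return raw.decode(charset, errors="replace")


# Offline demonstration with a hand-built header (assumed values, not a real response)
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
demo = decode_body(b"\xc4tzend", headers)
print(ascii(demo))
```

With the request from the question, this would presumably be used as `resp = urlopen(req)` followed by `decode_body(resp.read(), resp.headers)`, and the resulting string passed to BeautifulSoup.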










  • Python can tell you what Unicode code point that question mark is. What is it?
    – usr2564301
    Nov 23 '18 at 19:13

  • docs.python.org/3/library/functions.html#ord
    – usr2564301
    Nov 23 '18 at 19:17

  • That is Latin1 encoding.
    – usr2564301
    Nov 23 '18 at 19:30

  • Your code is fine. The problem is the encoding of your terminal. Try using cp1252 (the typical Windows encoding). Also: are you 100% sure that the terminal uses a font that can render those characters? Python can produce whatever result you want, but the terminal then has to display it, and some fonts simply do not have certain glyphs; they will show a box or some other odd symbol instead (although the question mark is usually used for invalid characters, not for missing glyphs).
    – Bakuriu
    Nov 23 '18 at 19:42

  • @DoubleVoid: Did you make any code change? Please share what you did to see the right output.
    – shahkalpesh
    Nov 23 '18 at 19:57
















python python-3.x beautifulsoup urllib






asked Nov 23 '18 at 19:10 by DoubleVoid (edited Nov 23 '18 at 19:37)











