urllib in Python 3 gives me no umlaut

I'm trying to fetch some Google results with BeautifulSoup and urllib:

from urllib.request import Request, urlopen
from urllib.parse import quote
from bs4 import BeautifulSoup

url = "http://www.google.de/search?q=" + quote("ätzend")

req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(urlopen(req), "html.parser")

for item in soup.select(".r a"):
    print(item.text)

This is the result:

�tzende Stoffe � Wikipedia

�tzende Stoffe � Wikipedia

�tzend � Wikipedia

I tried using decode('utf-8'), but it doesn't help. What can I do?
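Rather than hard-coding `'utf-8'`, one option is to decode the body with the charset the server declares in its `Content-Type` header and fall back to UTF-8 only when none is given. This is a minimal sketch that assumes the header is present and accurate; `decode_body` is a hypothetical helper name, and the network call is left out so the decoding step can be tried offline. The response returned by `urlopen()` exposes `.headers` as an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so the same `get_content_charset()` call applies to it.

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw response bytes with the charset declared in Content-Type.

    Hypothetical helper for illustration: falls back to `default`
    when the header carries no charset parameter.
    """
    # get_content_charset() returns the charset lowercased, or None.
    charset = headers.get_content_charset() or default
    return body.decode(charset)


# Offline demonstration: build the headers by hand instead of calling urlopen().
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
raw = "\u00c4tzende Stoffe".encode("iso-8859-1")  # "Ätzende Stoffe" as Latin-1 bytes
text = decode_body(raw, headers)

# ascii() escapes non-ASCII characters, so this line prints the same
# thing regardless of the terminal's encoding.
print(ascii(text))
```

With a live request this would become `resp = urlopen(req)` followed by `decode_body(resp.read(), resp.headers)`, and the resulting `str` could be handed to `BeautifulSoup(..., "html.parser")`. Printing with `ascii()` is a cheap way to check whether the decoded string itself is correct before blaming the display.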

Edit:
Also tried:

soup = BeautifulSoup(urlopen(req).read().decode('utf-8'), "html.parser")

Same problem. Using utf-16 doesn't help either. The Unicode code point of the letter 'Ä' is 196 (0xC4).
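The value 196 is the code point of 'Ä' (U+00C4), which is independent of how the character is later stored as bytes. A short sketch comparing the code point with the bytes a few common encodings produce; the selection of encodings is only a sample.

```python
# "Ä" is LATIN CAPITAL LETTER A WITH DIAERESIS, code point U+00C4.
LETTER = "\u00c4"

code_point = ord(LETTER)
print(code_point, hex(code_point))  # prints: 196 0xc4

# The bytes used to store or transmit that code point depend on the encoding.
encoded_forms = {
    encoding: LETTER.encode(encoding)
    for encoding in ("utf-8", "latin-1", "cp1252", "utf-16-le")
}
for encoding, data in encoded_forms.items():
    print(f"{encoding:<10} {data!r}")
```

Latin-1 and cp1252 store 'Ä' as the single byte 0xC4, matching the code point, while UTF-8 needs two bytes and UTF-16 uses two bytes per character here. Reading bytes written in one of these encodings as if they were another is what produces broken characters.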

Edit 2:
Windows PowerShell shows the correct results.

edited Nov 23 '18 at 19:37

asked Nov 23 '18 at 19:10

DoubleVoid

368628

Python can tell you the Unicode code point of that question-mark character. What is it?
– usr2564301
Nov 23 '18 at 19:13


docs.python.org/3/library/functions.html#ord
– usr2564301
Nov 23 '18 at 19:17
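Following that suggestion, one could print the code point and Unicode name of each character instead of the character itself, so the terminal cannot distort the result. In this sketch `describe` is a hypothetical helper and `title` is a hand-built stand-in for a scraped `item.text`. If the first character reports 65533 (U+FFFD, REPLACEMENT CHARACTER), the text was already damaged when it was decoded; if it reports 196, Python holds the correct string and the problem is on the output side.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return one line per character: decimal code point, U+ form and name."""
    lines = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"{ord(ch):6d}  U+{ord(ch):04X}  {name}")
    return lines


# Hypothetical stand-in for a scraped title such as item.text.
title = "\u00c4tzend"
for line in describe(title):
    print(line)
```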


That is Latin1 encoding.
– usr2564301
Nov 23 '18 at 19:30
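To illustrate why Latin-1 matters: in Latin-1 'Ä' is the single byte 0xC4, but in UTF-8 that byte starts a two-byte sequence. When Latin-1 bytes are read as UTF-8 with replacement enabled, such a byte becomes U+FFFD, which is what renders as �. A plain `.decode('utf-8')` without `errors="replace"` raises `UnicodeDecodeError` on the same bytes instead. The sample string below is made up for illustration.

```python
# Bytes as a Latin-1 (ISO-8859-1) source would send "Ätzende Stoffe".
latin1_bytes = "\u00c4tzende Stoffe".encode("latin-1")  # b'\xc4tzende Stoffe'

# Wrong: read them as UTF-8. 0xC4 is not followed by a continuation byte,
# so it is replaced by U+FFFD (usually drawn as a question mark in a diamond).
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Right: decode with the encoding the bytes were actually written in.
right = latin1_bytes.decode("latin-1")

print(ascii(wrong))  # prints: '\ufffdtzende Stoffe'
print(ascii(right))  # prints: '\xc4tzende Stoffe'

# Strict decoding (the default) fails loudly instead of substituting.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decoding failed:", exc.reason)
```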


Your code is fine. The problem is the encoding of your terminal. Try using cp1252 (the typical Windows encoding). Also, are you 100% sure that the terminal uses a font that can render those characters? Python can produce whatever output you want, but the terminal then has to display it, and some fonts simply lack certain glyphs and show a box or some other odd symbol instead (although the question mark is usually used for invalid characters, not for missing glyphs).
– Bakuriu
Nov 23 '18 at 19:42
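The terminal point can be made concrete: the terminal only ever receives bytes, produced with whatever encoding Python's stdout uses, and it interprets them with its own encoding. The sketch below simulates both sides with `encode`/`decode`; `as_seen_by_terminal` is a hypothetical helper, and the encoding pairs are examples rather than a claim about the asker's actual setup. On Python 3.7+ the stdout encoding can be checked via `sys.stdout.encoding` and changed with `sys.stdout.reconfigure(encoding="utf-8")`, or set for the whole process through the `PYTHONIOENCODING` environment variable.

```python
import sys

TEXT = "\u00c4tzende Stoffe"  # "Ätzende Stoffe"


def as_seen_by_terminal(text: str, python_encoding: str, terminal_encoding: str) -> str:
    """Simulate Python writing `text` with one encoding and a terminal
    decoding those bytes with another (undecodable bytes become U+FFFD)."""
    raw = text.encode(python_encoding, errors="replace")
    return raw.decode(terminal_encoding, errors="replace")


print("Python's stdout encoding here:", getattr(sys.stdout, "encoding", None))

for py_enc, term_enc in [("utf-8", "utf-8"), ("cp1252", "utf-8"), ("utf-8", "cp1252")]:
    shown = as_seen_by_terminal(TEXT, py_enc, term_enc)
    # ascii() keeps this diagnostic output independent of the real terminal.
    print(f"{py_enc:>6} -> {term_enc:<6} {ascii(shown)}")
```

In this simulation, the cp1252-to-UTF-8 pairing yields `'\ufffdtzende Stoffe'`, the same shape as the symptom in the question, while the reverse pairing produces classic mojibake ('Ã„tzende') instead of a replacement character.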


@DoubleVoid: Did you make any code change? Please share what you did to see the right output.
– shahkalpesh
Nov 23 '18 at 19:57


python python-3.x beautifulsoup urllib

