How to turn a binary string into a byte?























If I take the letter 'à' and encode it in UTF-8 I obtain the following result:



'à'.encode('utf-8')
>> b'\xc3\xa0'


Now from a bytearray I would like to convert 'à' into a binary string and turn it back into 'à'. To do so I execute the following code:



byte = bytearray('à', 'utf-8')
for x in byte:
    print(bin(x))


I get 0b11000011 and 0b10100000, which are 195 and 160. Then I join them together and strip the 0b prefixes. Now I execute this code:



s = '1100001110100000'
value1 = s[0:8].encode('utf-8')
value2 = s[9:16].encode('utf-8')
value = value1 + value2
print(chr(int(value, 2)))
>> 憠


No matter how I vary the last part, I get other symbols and never get my 'à' back. Why is that, and how can I get an 'à'?

















      python unicode utf-8 utf
















      asked Nov 21 at 23:44









      jatrp5

























          3 Answers











































































              >>> bytes(int(s[i:i+8], 2) for i in range(0, len(s), 8)).decode('utf-8')
              'à'


              There are multiple parts to this. The bytes constructor creates a byte string from a sequence of integers. The integers are formed from strings using int with a base of 2. The range combined with the slicing peels off 8 characters at a time. Finally decode converts those bytes back into Unicode characters.
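The same steps can be spelled out as a small function. This is only a sketch: the name `bits_to_text` and the length check are additions for illustration, not part of the answer above.

```python
def bits_to_text(bits: str, encoding: str = "utf-8") -> str:
    """Decode a string of '0'/'1' characters, 8 per byte, into text."""
    # Hypothetical guard: refuse input that cannot be split into whole bytes.
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Peel off 8 characters at a time and parse each chunk as a base-2 integer.
    byte_values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # bytes() accepts an iterable of integers in range(256).
    return bytes(byte_values).decode(encoding)


print(bits_to_text("1100001110100000"))  # à
```

Building the list first makes each intermediate value easy to inspect, for example `[195, 160]` for the string in the question.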

















              answered Nov 21 at 23:50









              Mark Ransom





                Note also that the OP can use ''.join('{:08b}'.format(i) for i in byte) on the original byte-array object. This is pretty similar: we take the byte-array apart, one byte at a time, and format each one using :08b to get an eight-bit zero-filled string representation, then join all the strings without whitespace.
                – torek
                Nov 22 at 0:03
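For the forward direction described in the comment, a minimal sketch might look like this. The function name `text_to_bits` is made up for illustration. It also shows why `bin()` alone is risky: `bin()` drops leading zeros, while `:08b` always yields eight digits.

```python
def text_to_bits(text: str, encoding: str = "utf-8") -> str:
    # Format each byte as exactly eight binary digits, zero-filled on the left.
    return "".join("{:08b}".format(b) for b in text.encode(encoding))


print(text_to_bits("à"))             # 1100001110100000
print(bin(10), "{:08b}".format(10))  # 0b1010 00001010
```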


























                  Your second slice needs to be s[8:16] (or just s[8:]); s[9:16] skips the bit at index 8, so you get only the seven bits 0100000.



                  You also need to convert your "bit string" back to an integer with int("0010101", 2) before treating it as a byte.



                  s = '1100001110100000'
                  value1 = bytearray([int(s[:8], 2),   # bits 0..7 (8 total)
                                      int(s[8:], 2)])  # bits 8..15 (8 total)
                  print(value1.decode("utf8"))
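To see where the unexpected character in the question comes from, here is a short walk-through. It is a sketch of the diagnosis, not the asker's exact code, and the variable names are illustrative. There are two separate issues: the slice drops a bit, and `chr()` expects a Unicode code point rather than a number made of UTF-8 bytes.

```python
s = '1100001110100000'

# s[9:16] starts one character too late, so only 7 of the 8 bits survive.
print(s[8:16], s[9:16])  # 10100000 0100000

# The original attempt therefore parsed a 15-bit number as one code point.
broken = int(s[0:8] + s[9:16], 2)
print(hex(broken))  # 0x61a0

# Even with correct slices, chr() would not help: the code point of 'à' is 224,
# whereas the 16 bits read as one number are the two UTF-8 bytes 0xC3 0xA0.
print(ord('à'), int(s, 2))  # 224 50080

# Treating each 8-bit group as its own byte and decoding recovers the text.
decoded = bytes([int(s[:8], 2), int(s[8:], 2)]).decode('utf-8')
print(decoded)  # à
```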
















                  answered Nov 21 at 23:51









                  Joran Beasley



































                          Convert the base-2 string to an integer with int(s, 2), convert that integer to bytes with int.to_bytes, using the original length divided by 8 as the byte count and big-endian order to keep the bytes in the right order, then .decode() the result (the default encoding in Python 3 is UTF-8):



                          >>> s = '1100001110100000'
                          >>> int(s,2)
                          50080
                          >>> int(s,2).to_bytes(len(s)//8,'big')
                          b'\xc3\xa0'
                          >>> int(s,2).to_bytes(len(s)//8,'big').decode()
                          'à'
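The two arguments to `int.to_bytes` both matter, as this small sketch tries to show; the variable names are only illustrative. The byte order decides whether the result is valid UTF-8, and taking the length from the bit string rather than from the integer preserves any leading zero bytes.

```python
s = '1100001110100000'
n = int(s, 2)

big = n.to_bytes(len(s) // 8, 'big')
little = n.to_bytes(len(s) // 8, 'little')
print(big.hex(), little.hex())  # c3a0 a0c3

print(big.decode())  # à
try:
    little.decode()
except UnicodeDecodeError as exc:
    # The reversed byte order is not a valid UTF-8 sequence for this input.
    print('little-endian bytes failed to decode:', exc.reason)

# A leading all-zero byte vanishes in the integer but is restored by the length.
padded = '00000000' + '01100001'
print(int(padded, 2).to_bytes(len(padded) // 8, 'big'))  # b'\x00a'
```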
















                          answered Nov 22 at 7:29









                          Mark Tolonen






























