Perl6 IO::Socket::Async truncates data
I'm rewriting my P5 socket server in P6 using IO::Socket::Async, but the data received is truncated by 1 character at the end, and that 1 character is received on the next connection. Someone from the Perl 6 Facebook group (Jonathan Worthington) pointed out that this might be because strings and bytes are handled very differently in P6. Quoted:
In Perl 6, strings and bytes are handled very differently. Of note, strings work at grapheme level. When receiving Unicode data, it's not only possible that a multi-byte sequence will be split over packets, but also a multi-codepoint sequence. For example, one packet might have the letter "a" at the end, and the next one would be a combining acute accent. Therefore, it can't safely pass on the "a" until it's seen how the next packet starts.
My P6 is running on MoarVM
https://pastebin.com/Vr8wqyVu
use Data::Dump;
use experimental :pack;

my $socket = IO::Socket::Async.listen('0.0.0.0', 7000);

react {
    whenever $socket -> $conn {
        my $line = '';
        whenever $conn {
            say "Received --> "~$_;
            $conn.print: &translate($_) if $_.chars ge 100;
            $conn.close;
        }
    }
    CATCH {
        default {
            say .^name, ': ', .Str;
            say "handled in $?LINE";
        }
    }
}

sub translate($raw) {
    my $rawdata = $raw;
    $raw ~~ s/^\s+|\s+$//; # remove heading/trailing whitespace
    my $minus_checksum = substr($raw, 0, *-2);
    my $our_checksum = generateChecksum($minus_checksum);
    my $data_checksum = ($raw, *-2);
    # say $our_checksum;
    return $our_checksum;
}

sub generateChecksum($minus_checksum) {
    # turn string into Blob
    my Blob $blob = $minus_checksum.encode('utf-8');
    # unpack Blob into ascii list
    my @array = $blob.unpack("C*");
    # perform bitwise operation for each ascii in the list
    my $dec +^= $_ for $blob.unpack("C*");
    # only take 2 digits
    $dec = sprintf("%02d", $dec) if $dec ~~ /^\d$/;
    $dec = '0'.$dec if $dec ~~ /^[a..fA..F]$/;
    $dec = uc $dec;
    # convert it to hex
    my $hex = sprintf '%02x', $dec;
    return uc $hex;
}
Result
Received --> $$0116AA861013034151986|10001000181123062657411200000000000010235444112500000000.600000000345.4335N10058.8249E00015
Received --> 0
Received --> $$0116AA861013037849727|1080100018112114435541120000000000000FBA00D5122500000000.600000000623.9080N10007.8627E00075
Received --> D
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
perl6 io-socket moarvm
asked Nov 23 '18 at 21:36 by Zarul Zakuan
"IO::Socket::Async truncates data". Actually P6 is helping you to avoid corrupting data. P6 keeps devs' choice clear: bytes or characters. EITHER you use :bin so data is a sequence of bytes. So the unit of transfer is a byte. OR data is text, a sequence of "what a user thinks of as a character". So the logical unit of transfer is one character at a time. Thus P6 buffers bytes to ensure it only delivers a whole character when it's known to be complete. This buffering is a consequence of Unicode's design.
– raiph
Nov 24 '18 at 9:57
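To make the bytes-versus-characters point concrete, here is a minimal sketch (the variable names are illustrative only). It shows that an "a" followed by a combining accent is still a single character, which is why a lone trailing "a" cannot be handed over as finished text until more data is seen:

```raku
# A base letter and a combining mark that could arrive in separate packets.
my $base = "a";
my $mark = "\x[301]";                        # U+0301 COMBINING ACUTE ACCENT

say $base.chars;                             # 1
say ($base ~ $mark).chars;                   # 1: the pair forms one grapheme
say ($base ~ $mark).encode('utf-8').bytes;   # 2: bytes are counted separately
say "\r\n".chars;                            # 1: CR LF is a single grapheme too
```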
1 Answer
First of all, TCP connections are streams, so there are no promises that the "messages" that are sent will be received as equivalent "messages" on the receiving end. Things that are sent can be split up or merged as part of normal TCP behavior, even before Perl 6 behavior is considered. Anything that wants a "messages" abstraction needs to build it on top of the TCP stream (for example, by sending data as lines, or by sending a size in bytes, followed by the data).
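As a rough sketch of the "size in bytes, followed by the data" idea, the class below reassembles messages from arbitrarily split chunks. The wire format (a single length byte before each payload) and the name LengthFramer are assumptions made for illustration; they are not part of the protocol in the question.

```raku
# Hypothetical framing: one length byte (0..255), then that many payload bytes.
class LengthFramer {
    has $!pending = Buf.new;           # bytes received but not yet consumed

    # Add one chunk as delivered by the network; return every complete payload.
    method feed(Blob $chunk) {
        $!pending ~= $chunk;
        my @messages;
        while $!pending.elems > 0 {
            my $length = $!pending[0];
            last if $!pending.elems < 1 + $length;   # payload still incomplete
            @messages.push: $!pending.subbuf(1, $length);
            $!pending = $!pending.subbuf(1 + $length);
        }
        return @messages;
    }
}

my $framer = LengthFramer.new;
# "abc" and "de", cut at an arbitrary point, as TCP is free to do.
my @first  = $framer.feed(Blob.new(3, 97, 98));
my @second = $framer.feed(Blob.new(99, 2, 100, 101));
say @first.elems;                              # 0
say @second.map(*.decode('ascii')).join(',');  # abc,de
```

The first call returns nothing because the payload is not yet complete. The second call completes "abc" and also yields "de", which arrived merged into the same chunk.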
In Perl 6, the data arriving over the socket is exposed as a Supply. A whenever $conn { } is short for whenever $conn.Supply { } (the whenever will coerce whatever it is given into a Supply). The default Supply is a character one, decoded as UTF-8 into a stream of Perl 6 Str. As noted in the answer you already received, strings in Perl 6 work at grapheme level, so it will keep back a character in case the next thing that arrives over the network is a combining character. This is the "truncation" that you are experiencing. (There are some things which can never be combined. For example, \n can never have a combining character placed on it. This means that line-oriented protocols won't encounter this kind of behavior, and can be implemented as simply whenever $conn.Supply.lines { }.)
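The line-oriented case can be tried without a network. In this sketch, Supply.from-list stands in for $conn.Supply (an assumption for the sake of a self-contained example), and the two chunks are cut in the middle of a line:

```raku
# Stand-in for $conn.Supply: two lines, split at an arbitrary point.
my $incoming = Supply.from-list("first li", "ne\nsecond line\n");

my @lines;
react {
    whenever $incoming.lines -> $line {
        @lines.push: $line;      # each $line is complete and already chomped
    }
}
say @lines[0];    # first line
say @lines[1];    # second line
```

Because .lines only emits once it has seen a line terminator, the handler does not have to care where the chunk boundaries fell.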
There are a couple of options available:
- Do whenever $conn.Supply(:bin) { }, which will deliver binary Blob objects, which will correspond to what the OS passed to the VM. That can then be .decode'd as wanted. This is probably your best bet.
- Specify an encoding that does not support combining characters, for example whenever $conn.Supply(:enc('latin-1')) { }. (However, note that since \r\n is 1 grapheme, then if the message were to end in \r then that would be held back in case the next packet came along with a \n.)
In both cases, it's still possible for messages to be split up during transmission, but these will (entirely and mostly, respectively) avoid the keep-one-back requirement that grapheme normalization entails.
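A minimal sketch of the first option follows. So that it runs without a socket, Supply.from-list again stands in for $conn.Supply(:bin), and chunk-to-text is a hypothetical helper name. The sample bytes are taken from the output in the question and cut at an arbitrary point.

```raku
# Decode one binary chunk explicitly. latin-1 maps every byte to exactly one
# codepoint, so a chunk can be decoded without waiting for the next one.
sub chunk-to-text(Blob $chunk --> Str) {
    $chunk.decode('latin-1')
}

# Stand-in for $conn.Supply(:bin): Blobs as the OS might hand them over.
my $incoming = Supply.from-list(
    '$$0108AA8638'.encode('latin-1'),
    '35028447675|'.encode('latin-1'),
);

my @seen;
react {
    whenever $incoming -> $blob {
        @seen.push: chunk-to-text($blob);
    }
}
say @seen.join;      # $$0108AA863835028447675|

# On a real connection the same handler would be wired up roughly as:
#   whenever $conn.Supply(:bin) -> $blob { say chunk-to-text($blob) }
```

Nothing is held back here, but the chunks are still arbitrary slices of the stream, so some form of message framing is needed on top.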
answered Nov 23 '18 at 23:14 by Jonathan Worthington
Thank you Jonathan! I was so close. Instead I was using :$bin, as per documentation.
– Zarul Zakuan
Nov 25 '18 at 2:38
@ZarulZakuan The doc shows a routine's parameters. When calling a routine you pass arguments. Arguments are "bound" to parameters as part of the calling process but they are not the same before that. :$bin as a parameter means the caller must pass a pair named bin as an argument. You can pass :$bin to mean bin => $bin but you must initialize $bin to True if you want :$bin as an argument to become bin => True. Alternatively you can write bin => True or just :bin.
– raiph
Nov 25 '18 at 12:42
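The difference between :$bin as a parameter and as an argument can be seen in a few lines. The routine supply-kind is made up for this illustration and is not part of IO::Socket::Async:

```raku
# Hypothetical routine with a named parameter, declared the way docs show it.
sub supply-kind(:$bin) {
    $bin ?? 'Blob' !! 'Str'
}

say supply-kind(:bin);           # Blob
say supply-kind(bin => True);    # Blob

my $bin;                         # declared but never set
say supply-kind(:$bin);          # Str: this passes bin => $bin, which is undefined
$bin = True;
say supply-kind(:$bin);          # Blob
```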
@raiph I see. Coming from Perl 5, there's a lot of new terms and concepts here. I can feel the same excitement when I was learning P5! Thanks!
– Zarul Zakuan
Nov 25 '18 at 13:35