Writing “latin1” encoded byte array with escape characters












0















I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.



In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).



Notepad++ comparison of output results



Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:



OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")


There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.



I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.





EDIT



I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.










share|improve this question

























  • It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

    – Jimi
    Nov 27 '18 at 17:21













  • MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

    – Tom Blodget
    Nov 27 '18 at 17:22











  • For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

    – Tom Blodget
    Nov 27 '18 at 17:26











  • Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

    – G_Hosa_Phat
    Nov 27 '18 at 17:31








  • 1





    You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

    – Jimi
    Nov 27 '18 at 18:14


















0















I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.



In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).



Notepad++ comparison of output results



Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:



OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")


There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.



I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.





EDIT



I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.










share|improve this question

























  • It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

    – Jimi
    Nov 27 '18 at 17:21













  • MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

    – Tom Blodget
    Nov 27 '18 at 17:22











  • For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

    – Tom Blodget
    Nov 27 '18 at 17:26











  • Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

    – G_Hosa_Phat
    Nov 27 '18 at 17:31








  • 1





    You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

    – Jimi
    Nov 27 '18 at 18:14
















0












0








0








I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.



In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).



Notepad++ comparison of output results



Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:



OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")


There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.



I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.





EDIT



I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.










share|improve this question
















I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.



In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).



Notepad++ comparison of output results



Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:



OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")


There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.



I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.





EDIT



I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.







mysql vb.net character-encoding






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 27 '18 at 17:01







G_Hosa_Phat

















asked Nov 27 '18 at 16:31









G_Hosa_PhatG_Hosa_Phat

287418




287418













  • It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

    – Jimi
    Nov 27 '18 at 17:21













  • MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

    – Tom Blodget
    Nov 27 '18 at 17:22











  • For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

    – Tom Blodget
    Nov 27 '18 at 17:26











  • Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

    – G_Hosa_Phat
    Nov 27 '18 at 17:31








  • 1





    You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

    – Jimi
    Nov 27 '18 at 18:14





















  • It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

    – Jimi
    Nov 27 '18 at 17:21













  • MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

    – Tom Blodget
    Nov 27 '18 at 17:22











  • For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

    – Tom Blodget
    Nov 27 '18 at 17:26











  • Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

    – G_Hosa_Phat
    Nov 27 '18 at 17:31








  • 1





    You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

    – Jimi
    Nov 27 '18 at 18:14



















It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

– Jimi
Nov 27 '18 at 17:21







It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or iso-8859-1 encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these?

– Jimi
Nov 27 '18 at 17:21















MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

– Tom Blodget
Nov 27 '18 at 17:22





MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?).

– Tom Blodget
Nov 27 '18 at 17:22













For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

– Tom Blodget
Nov 27 '18 at 17:26





For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string?

– Tom Blodget
Nov 27 '18 at 17:26













Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

– G_Hosa_Phat
Nov 27 '18 at 17:31







Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I may need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible.

– G_Hosa_Phat
Nov 27 '18 at 17:31






1




1





You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

– Jimi
Nov 27 '18 at 18:14







You are over-thinking it. For Text fields, read the Collation value and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, convert the array to a Base64 representation.

– Jimi
Nov 27 '18 at 18:14














0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53504121%2fwriting-latin1-encoded-byte-array-with-escape-characters%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53504121%2fwriting-latin1-encoded-byte-array-with-escape-characters%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

Calculate evaluation metrics using cross_val_predict sklearn

Insert data from modal to MySQL (multiple modal on website)