Sort not sorting lines with a pipe '|' in them correctly

I am trying to sort some simple pipe-delimited data, but sort isn't ordering the rows the way I expect. It moves my header row to the bottom, yet the two rows starting with 241 are split by a row starting with 24.

cat sort_fail.csv
column_a|column_b|column_c
241|212|20810378
24|121|2810172
241|213|20810376

sort sort_fail.csv
241|212|20810378
24|121|2810172
241|213|20810376
column_a|column_b|column_c


The column headers are moved to the bottom of the file, so sort is clearly processing it. But the actual values aren't sorted the way I'd expect.

In this case I worked around it with:

sort sort_fail.csv --field-separator='|' -k1,1


But I feel like that shouldn't be necessary. Why is sort not sorting?

edited 1 hour ago by muru
asked 6 hours ago by user10777668

Use LC_COLLATE=C sort. Depending on what you're expecting, you may also need LC_COLLATE=C sort -t'|' -n.
– mosvy, 6 hours ago

3 Answers

sort is locale-aware, so depending on your LC_COLLATE setting (which defaults to LANG when neither LC_COLLATE nor LC_ALL is set) you may get different results:

$ LANG=C sort sort_fail.csv
241|212|20810378
241|213|20810376
24|121|2810172
column_a|column_b|column_c

$ LANG=en_US sort sort_fail.csv
241|212|20810378
24|121|2810172
241|213|20810376
column_a|column_b|column_c
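
To see which setting actually decides the order, a script can walk the usual precedence: LC_ALL first, then LC_COLLATE, then LANG. The helper below is only a sketch of that lookup. The name effective_collate is made up for illustration, and the last branch assumes the implementation default is the POSIX locale.

```shell
# Sketch: report which locale setting decides the collation order.
# effective_collate is a hypothetical helper, not a standard command.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"        # LC_ALL overrides every category
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"    # category-specific setting
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"          # general fallback
    else
        printf 'POSIX\n'               # assumption: nothing set means POSIX/C
    fi
}

effective_collate
```

The standard locale command prints the resolved value for every category, so it can be used to cross-check the result.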


This can cause problems in scripts, because you may not know what locale the caller has set, and so the same script may produce different results.

It's not uncommon for scripts to force the setting they need, e.g.:

$ grep 'LC.*sort' /bin/precat
LC_COLLATE=C sort -u | prezip-bin -z "$cmd: $2"
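
A minimal sketch of the same idea, assuming a POSIX shell with mktemp available. It sets LC_ALL rather than LC_COLLATE because a caller's LC_ALL would otherwise take precedence over LC_COLLATE. The function name csort and the temporary file are illustrative only.

```shell
# Sketch: pin the collation locale inside a script so the caller's
# environment cannot change the order. "csort" is a hypothetical name.
csort() {
    LC_ALL=C sort "$@"
}

# Recreate the sample data from the question in a temporary file.
data=$(mktemp)
printf '%s\n' \
    'column_a|column_b|column_c' \
    '241|212|20810378' \
    '24|121|2810172' \
    '241|213|20810376' > "$data"

sorted=$(csort "$data")
printf '%s\n' "$sorted"
rm -f "$data"
```

With the locale pinned this way, the output should match the LANG=C run shown earlier no matter what the calling environment sets.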


What's interesting here is how the | character is handled, which looks odd at first.

That's because the default collation rules for en_US, which derive from ISO 14651, say:

$ grep 007C /usr/share/i18n/locales/iso14651_t1_common
<U007C> IGNORE;IGNORE;IGNORE;<j> # 142 |

This means the | character is ignored at the first three comparison levels, so the sort order is the same as if the character weren't there:

$ tr -d '|' < sort_fail.csv | LANG=C sort
24121220810378
241212810172
24121320810376
column_acolumn_bcolumn_c

And that matches the "unexpected" sorting you are seeing.

The workarounds are to use -n (to force a numeric sort), to use the field separator (as you did), or to use the C locale.
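
As a rough illustration of combining these workarounds, the sketch below sorts the data rows numerically on the first two fields and keeps the header line on top. Keeping the header first is an assumption about what is wanted (the posts here do not cover it), and sort_table is a hypothetical helper name.

```shell
# Sketch: sort pipe-delimited rows numerically by the first two fields,
# leaving the header line on top. "sort_table" is a hypothetical helper.
sort_table() {
    file=$1
    head -n 1 "$file"                                        # header stays first
    tail -n +2 "$file" | LC_ALL=C sort -t '|' -k1,1n -k2,2n  # data rows only
}

# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\n' \
    'column_a|column_b|column_c' \
    '241|212|20810378' \
    '24|121|2810172' \
    '241|213|20810376' > "$data"

result=$(sort_table "$data")
printf '%s\n' "$result"
rm -f "$data"
```

Naming the separator and the keys explicitly means the | never takes part in the comparison, so the locale's treatment of punctuation stops mattering for these rows.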








answered 5 hours ago by Stephen Harris

Fascinating. I did see some other hits about localization, but figured that would affect the relative ordering of 24 vs 241, not something like this.
– user10777668, 5 hours ago


Something extra useful in GNU sort is the --debug option, which underlines the part of each line used as the comparison key.
– Jeff Schaller, 1 hour ago

What puzzles me is that the 24 doesn't move from its place between the two 241 rows. Its second field starts with a 1. When I tried the sort with a leading 4 in that second field, the 24 row moved down, so I suspect sort just ignores the | unless told otherwise.
Try sort -n.
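
One way to check this suspicion without depending on which locales are installed is to emulate it: sort on a copy of each line with the | removed, in the C locale. This is only a sketch of the experiment described above. ignore_pipe_sort is a made-up helper, and the 24|421 row is the hypothetical variant with a leading 4.

```shell
# Sketch: emulate a collation that ignores "|" by sorting on a copy of
# each line with the pipes removed. "ignore_pipe_sort" is a hypothetical helper.
tab=$(printf '\t')
ignore_pipe_sort() {
    # Prefix each line with its pipe-free key, sort on that key, drop the key.
    awk '{ key = $0; gsub(/[|]/, "", key); print key "\t" $0 }' |
        LC_ALL=C sort -t "$tab" -k1,1 |
        cut -f2-
}

# Second field of the 24 row starts with 1, as in the question.
with_1=$(printf '%s\n' '241|212|20810378' '24|121|2810172' '241|213|20810376' | ignore_pipe_sort)

# Same row with a leading 4 in the second field.
with_4=$(printf '%s\n' '241|212|20810378' '24|421|2810172' '241|213|20810376' | ignore_pipe_sort)

printf '%s\n\n%s\n' "$with_1" "$with_4"
```

With the leading 1 the 24 row lands between the two 241 rows, and with the leading 4 it moves to the bottom, which is consistent with the | being skipped during comparison.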








answered 6 hours ago by RudiC

-n, --numeric-sort
    compare according to string numerical value

210
23

Without -n, 210 sorts ahead of 23 as text, because the comparison goes character by character.
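
A quick way to see the difference, as a sketch that pins the C locale so the text comparison is plain byte order (the variable names are illustrative):

```shell
# Sketch: the same two lines sorted as text and as numbers.
text_order=$(printf '%s\n' 23 210 | LC_ALL=C sort)        # character by character
numeric_order=$(printf '%s\n' 23 210 | LC_ALL=C sort -n)  # by numeric value

printf 'as text:\n%s\n\nas numbers:\n%s\n' "$text_order" "$numeric_order"
```

As text, 210 comes first because its second character, 1, sorts before the 3 in 23. With -n, 23 comes first because it is the smaller number.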








answered 4 hours ago by michaelkrieger