How to match the line begin?











I am writing the cat(1) utility with lex.
To implement the -n option (number every line), I ended up with something like this:

    ^. {
        printf("%8d ", ++lino);
        ECHO;
    }

I know the end of line (EOL) can be matched with the anchor $ or with \n, so I wonder whether there is something similar for matching the beginning of line (BOL), so that I don't have to use ECHO.






























  • Using something like lex to create a simple cat utility is overkill and probably not very efficient and definitely will make things more complicated. Why not simply read the file into a large buffer, and extract lines from the buffer and print them to stdout? Then it's very easy to handle line numbers, just keep a counter and print the number before each line.
    – Some programmer dude
    Dec 9 '15 at 11:47










  • Thank you for your advice. I'm not really going to use this cat. I'm just trying to figure out how lex works. I'm diving into compiler-craft stuff.
    – oxnz
    Dec 10 '15 at 2:49
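
The buffer-and-counter approach suggested in the first comment can be sketched in plain C as follows. This is only an illustrative sketch, not code from the thread: the function name number_lines and its signature are hypothetical, the input is assumed to be already loaded into memory, and the "%8d " format is borrowed from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes every line found in buf[0..len) to out,
   prefixed with its line number. Returns the number of lines written. */
int number_lines(const char *buf, size_t len, FILE *out)
{
    assert(buf != NULL && out != NULL);
    int lino = 0;
    size_t pos = 0;
    while (pos < len) {
        /* A line runs up to and including the next '\n',
           or to the end of the buffer if there is none. */
        const char *nl = memchr(buf + pos, '\n', len - pos);
        size_t line_len = nl ? (size_t)(nl - (buf + pos)) + 1 : len - pos;
        fprintf(out, "%8d ", ++lino);
        fwrite(buf + pos, 1, line_len, out);
        pos += line_len;
    }
    return lino;
}
```

For the four-byte buffer "a\n\nb" this writes three numbered lines (the second one empty) and returns 3. A last line without a trailing newline is still numbered.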















compiler-construction lex














edited Nov 22 at 0:11









Mateusz Piotrowski











asked Dec 9 '15 at 11:44









oxnz













1 Answer



accepted










(I agree with the comment by Joachim Pileborg that lex is not the tool for implementing cat. The rest of this answer is in the spirit of explaining a bit about lex.)





  1. The provided lex program will not work if there are empty lines in the input, because ^. does not match an empty line. (In lex, . does not match a newline character.) So a reasonably minimal (f)lex input file would be:



    %option noyywrap noinput nounput
    %%
        int lino = 0;   /* indented, so flex copies it into yylex() */
    ^(.|\n) { printf("%8d %c", ++lino, *yytext); }


    Here, I just print out the matched token in the printf, which is the equivalent to using ECHO. So it does not really "eliminate" the ECHO.



  2. (f)lex rules must match at least one character. So it wouldn't really be possible for a pattern to consist only of $, any more than it would be possible for a pattern to consist only of ^ (which is a BOL anchor). In that sense, the answer to your question is simply "no".



  3. A more easily-understood (and probably more efficient) solution is to actually match each line. This solution never uses ECHO, not even in the default rule, so I've told flex to not generate a default rule:



    %option noyywrap noinput nounput nodefault
    %%
        int lino = 0;   /* indented, so flex copies it into yylex() */
    .*\n?   { printf("%8d %s", ++lino, yytext); }


    That's not quite perfect, because it will truncate lines which contain a NUL character. (That is, the printf will effectively truncate the line; the line will be parsed correctly.) To fix it, it's necessary to use fwrite instead of printf:



    %option noyywrap noinput nounput nodefault
    %%
        int lino = 0;   /* indented, so flex copies it into yylex() */
    .*\n?   { fprintf(yyout, "%8d ", ++lino);
              fwrite(yytext, 1, yyleng, yyout); }


    The newline is made optional (\n?) in case the last line of the file is not terminated with a newline. Because (f)lex patterns never match zero characters, that rule is actually equivalent to the more precise but clunkier regular expression .*\n|.+.
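
The NUL-truncation difference between the last two versions can also be seen outside of flex. The following C sketch is an illustration under assumptions, not part of the original answer: the helper names put_line_printf and put_line_fwrite are hypothetical, and the text and len parameters stand in for yytext and yyleng.

```c
#include <assert.h>
#include <stdio.h>

/* Like printf("%8d %s", ...): "%s" stops at the first NUL byte in text.
   Returns the number of bytes written, or a negative value on error. */
int put_line_printf(FILE *out, int lino, const char *text)
{
    assert(out != NULL && text != NULL);
    return fprintf(out, "%8d %s", lino, text);
}

/* Writes the line number, then exactly len bytes of text, NULs included.
   Returns the number of bytes written, or -1 if the prefix failed. */
int put_line_fwrite(FILE *out, int lino, const char *text, size_t len)
{
    assert(out != NULL && text != NULL);
    int prefix = fprintf(out, "%8d ", lino);
    if (prefix < 0)
        return -1;
    return prefix + (int)fwrite(text, 1, len, out);
}
```

For the six-byte line a, b, NUL, c, d, newline, put_line_printf returns 11 (the 9-byte prefix plus "ab"), while put_line_fwrite returns 15 (the prefix plus all 6 bytes).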































  • Thank you for your answer, it's just awesome. But there is still one question in my head: what if a line gets too long? Imagine occasionally invoking this cat against a big binary file, which may contain a very long line.
    – oxnz
    Dec 10 '15 at 2:47










  • @oxnz: Unless you require yytext to be an array (which you shouldn't do; the feature only exists for legacy), then (f)lex will continue to reallocate the token buffer as necessary. If it runs out of memory trying to do that, it will throw a fatal error by invoking the macro YY_FATAL_ERROR, which by default causes an error message to be printed to stderr followed by a call to exit() with a non-zero argument. Although that's unlikely, as would be an attempt to print a big binary file with line numbers, it is yet another reason not to build cat with (f)lex.
    – rici
    Dec 10 '15 at 3:09
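
The behaviour described in the last comment (grow the buffer as needed, and give up with a message on stderr and a non-zero exit status if memory runs out) can be imitated in plain C. This sketch is only an analogy and does not reproduce flex's actual buffer management; read_long_line is a hypothetical helper, and the initial capacity of 16 is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from in into a heap
   buffer that is doubled whenever it fills up. Stores the byte count in
   *len and returns the buffer (not NUL-terminated; the caller frees it),
   or returns NULL if end of input is reached before any byte is read. */
char *read_long_line(FILE *in, size_t *len)
{
    assert(in != NULL && len != NULL);
    size_t cap = 16, n = 0;
    char *buf = malloc(cap);
    int c;
    if (buf == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    while ((c = getc(in)) != EOF) {
        if (n == cap) {
            /* Buffer full: double it, or fail loudly. */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fputs("out of memory\n", stderr);
                exit(EXIT_FAILURE);
            }
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    if (n == 0) {
        free(buf);
        return NULL;
    }
    *len = n;
    return buf;
}
```

The line length is limited only by available memory, which is why a huge binary file without newlines is a poor fit for a line-at-a-time approach.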











edited May 23 '17 at 12:30









Community











answered Dec 9 '15 at 20:24









rici
