How to match the line begin?
I was writing the `cat(1)` utility with lex. When considering how to implement the `-n` option, i.e. numbering every line, I found that I have to write something like this:

    ^.  {
            printf("%8d ", ++lino);
            ECHO;
        }

I know the end of line (EOL) can be matched with the anchor `$` and with `\n`, so I wonder whether there is something similar that matches only the beginning of line (BOL), so that I don't have to use the `ECHO;`
compiler-construction lex
Using something like `lex` to create a simple `cat` utility is overkill, probably not very efficient, and will definitely make things more complicated. Why not simply read the file into a large buffer, extract lines from the buffer, and print them to `stdout`? Then it's very easy to handle line numbers: just keep a counter and print the number before each line.
– Some programmer dude
Dec 9 '15 at 11:47
Thank you for your advice. I'm not really going to use this `cat`. I'm just trying to figure out how lex works; I'm diving into compiler-construction material.
– oxnz
Dec 10 '15 at 2:49
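For comparison, the buffer-and-counter approach suggested in the first comment could look roughly like the C sketch below. The function name `number_lines`, the 4096-byte chunk size, and the `at_bol` flag are assumptions made for illustration; none of them come from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy `in` to `out`,
 * prefixing every line with an 8-column line number.
 * Returns the number of lines that were numbered. */
int number_lines(FILE *in, FILE *out)
{
    char buf[4096];              /* chunk size is an arbitrary choice */
    size_t n;
    int lino = 0;
    int at_bol = 1;              /* nonzero: the next byte starts a line */

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        while (p < end) {
            if (at_bol) {
                fprintf(out, "%8d ", ++lino);
                at_bol = 0;
            }
            /* Find the end of the current line inside this chunk. */
            const char *nl = memchr(p, '\n', (size_t)(end - p));
            const char *stop = nl ? nl + 1 : end;
            fwrite(p, 1, (size_t)(stop - p), out);
            if (nl)
                at_bol = 1;      /* a newline ends the line */
            p = stop;
        }
    }
    return lino;
}
```

Calling `number_lines(stdin, stdout)` from `main` would give a filter that numbers its input. Because the line text is written with `fwrite`, embedded NUL bytes pass through unchanged, and a line that spans two chunks is numbered only once.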
edited Nov 22 at 0:11 by Mateusz Piotrowski
asked Dec 9 '15 at 11:44 by oxnz
1 Answer (accepted, score 2)
(I agree with the comment by Joachim Pileborg that `lex` is not the tool for implementing `cat`. The rest of this answer is in the spirit of explaining a bit about `lex`.)

The provided lex program will not work if there are empty lines in the input, because `^.` does not match an empty line. (In lex, `.` does not match a newline character.) So a reasonably minimal (f)lex input file would be:

    %option noyywrap noinput nounput
    %%
        int lino = 0;
    ^(.|\n)   { printf("%8d %c", ++lino, *yytext); }

Here, I just print out the matched character in the `printf`, which is equivalent to using `ECHO`. So it does not really "eliminate" the `ECHO`.

(f)lex rules must match at least one character. So it wouldn't really be possible for a pattern to consist only of `$`, any more than it would be possible for a pattern to consist only of `^` (which is a BOL anchor). In that sense, the answer to your question is simply "no".
A more easily understood (and probably more efficient) solution is to actually match each line. This solution never uses `ECHO`, not even in the default rule, so I've told flex not to generate a default rule:

    %option noyywrap noinput nounput nodefault
    %%
        int lino = 0;
    .*\n?     { printf("%8d %s", ++lino, yytext); }

That's not quite perfect, because it will truncate lines which contain a NUL character. (That is, the `printf` will effectively truncate the line; the line will be parsed correctly.) To fix it, it's necessary to use `fwrite` instead of `printf` for the line text:

    %option noyywrap noinput nounput nodefault
    %%
        int lino = 0;
    .*\n?     { printf("%8d ", ++lino);
                fwrite(yytext, 1, yyleng, yyout); }

The newline is made optional (`\n?`) in case the last line of the file is not terminated with a newline. Because (f)lex patterns never match zero characters, that rule is actually equivalent to the more precise but clunkier regular expression `.*\n|.+`.
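The difference between the `printf` and `fwrite` versions can be seen outside of flex with a small C sketch. The helper names `put_line_printf` and `put_line_fwrite` are hypothetical and only stand in for the two rule actions; `text` and `len` play the roles of `yytext` and `yyleng`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers (names are not from the answer). Both write the
 * "%8d " prefix followed by the token and return the bytes written. */

/* Token written with "%s": output stops at the first NUL byte. */
size_t put_line_printf(FILE *out, int lino, const char *text)
{
    assert(out != NULL && text != NULL);
    int n = fprintf(out, "%8d %s", lino, text);
    return n < 0 ? 0 : (size_t)n;
}

/* Token written with fwrite: all len bytes are copied, NULs included. */
size_t put_line_fwrite(FILE *out, int lino, const char *text, size_t len)
{
    assert(out != NULL && text != NULL);
    int n = fprintf(out, "%8d ", lino);
    if (n < 0)
        return 0;
    return (size_t)n + fwrite(text, 1, len, out);
}
```

For a six-byte token such as `"ab\0cd\n"`, the first helper emits the nine-byte prefix plus only `ab`, while the second emits the prefix plus all six bytes. This is why the length-based write is needed for input that may contain NUL characters.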
Thank you for your answer, it's just awesome. But there is still one question in my head: what if a line gets too long? Imagine occasionally invoking this cat against a big binary file, which may contain very long lines.
– oxnz
Dec 10 '15 at 2:47
@oxnz: Unless you require `yytext` to be an array (which you shouldn't do; the feature only exists for legacy reasons), (f)lex will continue to reallocate the token buffer as necessary. If it runs out of memory trying to do that, it will throw a fatal error by invoking the macro `YY_FATAL_ERROR`, which by default causes an error message to be printed to `stderr` followed by a call to `exit()` with a non-zero argument. Although that's unlikely, as would be an attempt to print a big binary file with line numbers, it is yet another reason not to build `cat` with (f)lex.
– rici
Dec 10 '15 at 3:09
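The grow-or-abort behaviour described in that comment can be pictured with the following C sketch. It is not flex's actual implementation: `struct token_buf`, `token_append`, `fatal_error`, the initial capacity of 16 and the doubling policy are all assumptions chosen for illustration, with `fatal_error` standing in for what the comment says `YY_FATAL_ERROR` does by default.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable token buffer; NOT flex's real data structure. */
struct token_buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Stand-in for the default fatal-error behaviour described above:
 * print a message to stderr, then exit with a non-zero status. */
void fatal_error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(2);
}

/* Append n bytes, doubling the capacity whenever the token outgrows it. */
void token_append(struct token_buf *tb, const char *src, size_t n)
{
    assert(tb != NULL && src != NULL);
    if (tb->len + n > tb->cap) {
        size_t new_cap = tb->cap ? tb->cap : 16;
        while (new_cap < tb->len + n)
            new_cap *= 2;
        char *p = realloc(tb->data, new_cap);
        if (p == NULL)
            fatal_error("out of memory expanding token buffer");
        tb->data = p;
        tb->cap = new_cap;
    }
    memcpy(tb->data + tb->len, src, n);
    tb->len += n;
}
```

A caller would start from a zeroed `struct token_buf` and release `data` with `free` when done. A very long "line" in a binary file simply keeps growing the buffer until either the newline arrives or the allocation fails.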
edited May 23 '17 at 12:30 by Community♦
answered Dec 9 '15 at 20:24 by rici