Re: a |=b, a | b and a || b (why not a ||= b?)


From: Nick Malik (nickmalik_at_hotmail.nospam.com)
Date: 12/14/04


Date: Tue, 14 Dec 2004 16:03:15 GMT

First off, I apologize if my post wasn't clear. I think that we may be
discussing the same thing using slightly different words.

The part of the parser that is made more complicated by a three-symbol
look-ahead is the lexical scanner; my statement wasn't clear. I also use
the term lexical scanner interchangeably with lexical analyzer (which is
probably not "correct" ... alas :-). So we are talking about the same
thing. I've always considered the lexical scanner to be PART of the parser
and not separate from it, and that may also have led to some of the
'language confusion' in my message.

"Frans Bouma [C# MVP]" <perseus.usenetNOSPAM@xs4all.nl> wrote in message
news:%23whXDRc4EHA.2568@TK2MSFTNGP10.phx.gbl...
> Nick Malik wrote:
> > because a three-symbol operator like ||= would require a three-symbol
> > lookahead in the parser, which makes the lexical scanner more complex.
>
> This is nonsense. You have a lexical analyzer, which converts text in a
> textstream into tokens in a tokenstream. That tokenstream goes into the
> parser. So the parser works with tokens. All it sees is: a OP b and OP
> is an operator, in this case '||=', but it also could have been
> '--------->'.
>
> A lexical analyzer works with an NFA, so it doesn't matter how long an
> operator is, or identifier for that matter, as long as there is an
> unambiguous way to identify a token. In this case there is: ||= can
> be seen as a separate token, because '=' is not allowed to be part of any
> identifier.
>
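
One common way a scanner resolves this kind of overlap is the longest-match ("maximal munch") rule: at each position it takes the longest character sequence that forms a valid token, so the parser only ever receives finished tokens. The following is a minimal sketch in Python, assuming a small hypothetical operator table (`OPERATORS`) and a hypothetical `tokenize` helper; it is not the actual C# lexer.

```python
# Hypothetical operator table for illustration only; a real C# lexer
# recognizes many more token kinds.
OPERATORS = {"|", "||", "|=", "||=", "&", "&&", "&=", "&&=", "=", "=="}
MAX_OP_LEN = max(len(op) for op in OPERATORS)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the longest-match rule."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isalpha() or ch == "_":
            # Identifiers: a letter or underscore, then letters, digits, underscores.
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
            continue
        # Operators: try the longest candidate first, so the first hit is the longest match.
        for length in range(min(MAX_OP_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in OPERATORS:
                tokens.append(("OP", candidate))
                i += length
                break
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("a ||= b"))  # [('IDENT', 'a'), ('OP', '||='), ('IDENT', 'b')]
```

In this sketch the look-ahead is bounded by the longest entry in the table (three characters here), and the parser receives `||=` as one OP token.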

It does matter how long a token is in this case, since the operator ||= can
be scanned in four ways:
1) It can be scanned as three valid tokens t('|') t('|') t('=')
2) It can be scanned as two valid tokens t('||') t('=')
3) It can be scanned as two valid tokens t('|') t('|=')
4) It can be scanned as a single token t('||=')

The logic needed to get to the third scan is more difficult to produce.
This is not a simple deterministic finite-state automaton. As you pointed
out, it is non-deterministic.
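
The four scans listed above can be reproduced mechanically. This is a minimal Python sketch, assuming a hypothetical token set containing only `|`, `||`, `|=`, `||=` and `=`; the `segmentations` function is illustrative and simply enumerates every split of the input into valid tokens.

```python
# Hypothetical token set: only the operators relevant to "||=".
VALID_TOKENS = {"|", "||", "|=", "||=", "="}


def segmentations(text):
    """Return every way to split text into a sequence of valid tokens."""
    if not text:
        return [[]]  # exactly one way to split the empty string: no tokens
    results = []
    for length in range(1, len(text) + 1):
        head = text[:length]
        if head in VALID_TOKENS:
            # Keep this token and recursively split whatever remains.
            for rest in segmentations(text[length:]):
                results.append([head] + rest)
    return results


for scan in segmentations("||="):
    print(scan)
# ['|', '|', '=']
# ['|', '|=']
# ['||', '=']
# ['||=']
```

Without a tie-breaking rule the input is ambiguous. A longest-match rule picks the fourth scan, `['||=']`, because it always takes the longest valid token at the current position.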

I have created a lexical analyzer for a production system (not a student
exercise) using symbol tables. I personally found it challenging to debug
any changes to the symbol table for two-symbol lookahead, and three-symbol
lookahead introduced so much concern that I withdrew three-symbol tokens
from the language. The trickiest part is distinguishing between items 2 and
3 above, because most lexical analyzers that aren't hand-coded would have a
terrible time telling them apart reliably.

It is as simple as this: the more complex the logic needed, the greater the
likelihood of bugs and the weaker the support from tools (like tools that
create two-symbol lookahead tables). Each bug costs money to find and fix.
Therefore, increasing complexity for the sake of elegance at the expense of
the project is simply poor judgement on the part of the project manager.

> > Given the fact that the optimized version of the two forms (a ||= b and
> > a=a || b) would very likely produce the same IL code, it doesn't make
> > sense to make the lexical scanner more complicated.
>
> Also, that's IMHO not a valid reason to disallow ||= (and &&=). Just
> because bit operators on bools have the same effect as logical operators
> on bools doesn't make this right.
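
In terms of behavior, the difference between `a |= b` and `a = a || b` on bools is whether the right-hand operand gets evaluated: both yield the same value, but `|` always evaluates both operands while `||` skips the right one when the left is already true. The sketch below shows this in Python, whose `|` on bools and `or` behave analogously to C#'s `|` and `||`; the `check` helper is hypothetical and only records each evaluation.

```python
calls = []


def check(value):
    """Record that the right-hand operand was evaluated, then return it."""
    calls.append(value)
    return value


# Non-short-circuit form, the analogue of  a |= b  on bools.
a = True
a |= check(False)
eager_calls = len(calls)

calls.clear()

# Short-circuit form, the analogue of  a = a || b  (what a ||= b would mean).
a = True
a = a or check(False)
lazy_calls = len(calls)

print(eager_calls, lazy_calls)  # prints "1 0"
```

Both forms leave `a` true here; they differ only in whether `check` runs, which matters when the right-hand side is expensive or has side effects.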

I wasn't discussing bit operators. I'm concerned that this discussion would
jump off the rails quickly if I were to judge the language itself, which I
am not. I am only discussing the difficulty of debugging the lexical
scanner.

> I for one would opt for ||= and &&= operators. I don't buy the 'a=a||b
> is more readable' argument, as by that logic a+=b should also be disallowed.

I am also not making a case for or against readability or clarity in the
language. As long as the functionality is there, I'm happy.

> For the people who think a||=b is not that common: write two nested search
> loops. To track whether you've found a match, you can set a bool foundOne to
> false and OR it with the current match result (which is false if the items
> are not equal, very common). That saves if statements.
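
The search-loop pattern described above might look like the following minimal Python sketch; `contains_common_element` and its parameters are hypothetical names chosen for illustration.

```python
def contains_common_element(first, second):
    """Return True if any element of first equals any element of second."""
    found_one = False
    for x in first:
        for y in second:
            # OR the current match result into the flag instead of using an if.
            found_one |= (x == y)
    return found_one


print(contains_common_element([1, 2, 3], [7, 3, 9]))  # True
print(contains_common_element([1, 2], [3, 4]))        # False
```

With `|=`, the comparison still runs on every iteration after a match. The `||=` form being asked for would correspond to `found_one = found_one or (x == y)`, which skips the comparison once `found_one` is true (the loops themselves still run to completion).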

I am also not making a case for how useful this particular three-token
construct may seem to be. I would state, however, that there are many
_useful_ things that currently require more than a single operator. We
could debate endlessly on the "comparative usefulness" of a myriad of
operators that are designed to make the language more useful, in our own
personal opinions. Remember that your opinion about "what is useful" may not
be universally shared, just as I'm sure mine wouldn't be either.

Even if it were, it would not change my concerns about the complexity of the
scanner.

>
> But then again, how many people write:
> if(a)
> {
> b=10;
> }
> else
> {
> b=12;
> }
>
> as:
> b=12;
> if(a)
> {
> b=10;
> }
>

An interesting question, but I don't see how this is salient to my point. I
hope you don't mind if I don't attempt to respond to it.

HTH,
--- Nick


