Almost every major programming language uses precedence tables to determine the order in which operators are evaluated. These tables are quite useful, but they contain a lot of information that is hard to learn. Many programmers can never remember whether && is tighter or looser than ||, or whether bitshift operators are tighter or looser than bitwise logic operators and arithmetic operators. And even if you can remember those rules, the next programmer to read your code won't necessarily remember them; and so style-conscious programmers generally use parentheses to denote the order of uncommon operations. Some language compilers even produce warnings if you rely on the precedence table for boolean logic operations.
In other words, there is a limit to the effective size of all precedence tables. Even if the table can distinguish between many different operators, the programmers cannot, and so they parenthesize. It would be nice to have another precedence rule that increases the effective range of operator precedences. This rule should not rely on memorization, and it should let the programmer understand the order of operations from just looking briefly at the code. In addition, it should probably work alongside precedence tables, rather than discarding them entirely like Smalltalk's rule of always evaluating infixes from left to right. The rule I have come up with is for these two expressions to mean different things:
a<<b + c
a << b+c
The difference in meaning should be obvious from looking at them: the first does the bitshift first and the second does the addition first. In other words, operators with space around them are considered to be of looser precedence than operators without space. This means the programming language is not free-form, because whitespace can do something besides just separating tokens.
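Here is a minimal sketch of how a parser might apply this rule. It is not a definitive implementation: the names TABLE, tokenize, binding, parse and group are made up for illustration, the small table follows C's ordering for a handful of operators, and I am assuming that an operator with space on either side counts as spaced. Spacing decides first, and the table only breaks ties between operators with the same spacing.

```python
import re

# Conventional table (C-like ordering); only used to break ties
# between operators that have the same spacing.
TABLE = {"<<": 1, ">>": 1, "+": 2, "-": 2, "*": 3}

def tokenize(src):
    """Return operands as strings and operators as (symbol, spaced) pairs."""
    tokens = []
    for m in re.finditer(r"(\w+)|(<<|>>|[-+*])", src):
        if m.group(1):
            tokens.append(m.group(1))
        else:
            before = src[m.start() - 1 : m.start()]
            after = src[m.end() : m.end() + 1]
            # Assumption: whitespace on either side marks the operator as spaced.
            spaced = before.isspace() or after.isspace()
            tokens.append((m.group(2), spaced))
    return tokens

def binding(op):
    symbol, spaced = op
    # Every unspaced operator binds tighter than every spaced one.
    return (0 if spaced else 1, TABLE[symbol])

def parse(tokens, pos=0, min_binding=(0, 0)):
    """Precedence climbing; returns (parenthesized string, next position)."""
    left = tokens[pos]
    pos += 1
    while pos < len(tokens) and binding(tokens[pos]) >= min_binding:
        symbol, _ = tokens[pos]
        level, rank = binding(tokens[pos])
        # Left-associative: the right operand must bind strictly tighter.
        right, pos = parse(tokens, pos + 1, (level, rank + 1))
        left = f"({left} {symbol} {right})"
    return left, pos

def group(src):
    """Show how an expression groups under the whitespace rule."""
    return parse(tokenize(src))[0]

print(group("a<<b + c"))   # ((a << b) + c)
print(group("a << b+c"))   # (a << (b + c))
```

When every operator in an expression is spaced the same way, the sketch falls back on the table alone, so consistently spaced code keeps its conventional meaning.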
This whitespace rule reduces the number of extra parentheses needed to write an expression, and it reduces the need to remember precedence tables. But more importantly, it just looks right. The reader's eyes prefer to scan chunks separated by whitespace, and process them as separate units. This rule makes the programming language read the same way the reader does, which is almost the definition of intuitive. The only cases in which this would lead to unexpected behavior would be if a programmer wrote something like
result = a+b * c
and expected the multiplication to be tighter than the addition. However, no style-conscious programmer would ever write code like that. Good programmers already pad looser operators more often than tighter operators.
It probably isn't accurate to say that even an amateur would never make this mistake; beginning programmers can be quite inconsistent with their spacing (more because they haven't developed an eye for code than because they don't know any better). At the same time, traditional precedence tables trip up beginners as well. Any programmer who bothers to read the precedence table should have enough spare neurons to learn the whitespace rule as well. The only situation in which I can see this being a real problem is the combination of addition and multiplication, because people are taught as early as elementary school that multiplication has tighter precedence than addition. In that case, the whitespace precedence rule would be another thing that must be learned. But I believe it is an easy enough rule that learning it wouldn't take much effort at all.
To be honest, there are some corner cases that need to be addressed. I don't expect them to be a major issue, but they are worth considering.
- How should we process infix operators with space on one side and not the other? I think the best choices are to either reserve that syntax exclusively for prefix and postfix operators, or to consider those equivalent to operators with space on both sides.
- What about commas and semicolons? It's conventional to put space after a comma but not before it, and even if there is no space on either side of it, people will still expect it to be of a very loose precedence. This expectation is learned from an early age due to the comma's usage in informal languages. So if the programming language considers commas and semicolons to be operators, they should be specially immune to whitespace rules.
- What about multi-part operators? The traditional example is the ternary operator a ? b : c. I think the easiest rule would be to consider the whitespace before the ? and after the : for purposes of the whitespace precedence rule. Inconsistent whitespace around the ? might be a syntax error, or interpreted as a postfix ? if such a postfix has been defined. Alternatively, if the programmer cannot define their own multi-part operators, it may be acceptable to just ignore the whitespace rule for the built-in ones.
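The first two corner cases can be sketched as a small classifier that labels each operator by the whitespace around it. This is only an illustration under my own assumptions: IMMUNE, spacing_class and the one_sided parameter are hypothetical names, the operator set is deliberately tiny, and multi-part operators like the ternary are left out.

```python
import re

# Commas and semicolons keep their very loose precedence whatever the spacing.
IMMUNE = {",", ";"}

def spacing_class(src, one_sided="loose"):
    """Label each operator in src as 'tight', 'loose' or 'immune'.

    one_sided="loose" treats space on one side like space on both sides;
    one_sided="reserved" rejects it, leaving that syntax free for
    prefix and postfix operators.
    """
    classes = []
    for m in re.finditer(r"<<|>>|[-+*/,;]", src):
        symbol = m.group()
        before = src[m.start() - 1 : m.start()].isspace()
        after = src[m.end() : m.end() + 1].isspace()
        if symbol in IMMUNE:
            kind = "immune"
        elif before and after:
            kind = "loose"
        elif not before and not after:
            kind = "tight"
        elif one_sided == "loose":
            kind = "loose"
        else:
            raise SyntaxError(f"one-sided space around infix {symbol!r}")
        classes.append((symbol, kind))
    return classes

print(spacing_class("f(a+b, c * d)"))
# [('+', 'tight'), (',', 'immune'), ('*', 'loose')]
```

Passing one_sided="reserved" makes an input such as "a+ b" raise SyntaxError instead of being read as a loose addition, which corresponds to the first option in the list above.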
The motivation behind this alternative precedence rule is that programming is easier when the programmer and the computer can think in the same terms. In general it's easier for a human to think like a computer than for a computer to think like a human—which is why we need programming languages at all—but in this particular case of operator parsing, I think we can do much better at human-like thinking than we currently do.