Just about every programming language has operators, right? An operator is a piece of syntax (or a function that looks like a piece of syntax) that operates on one or more terms. Those terms can themselves be composed out of terms and operators, of course. The most familiar operators are the arithmetical ones (`+`, `-`, `*`, etc.), but programming languages also need operators for a whole host of other things: function definitions, message passing, declarations, and so on.
So what is the tersest operator? It is the operator that is easiest to type, and the easiest operator to type is the one you don't have to type: the operator that isn't there. When you stick two terms next to one another with no operator in between, that is called concatenation. Some languages are even called concatenative languages because they treat concatenation as their most fundamental operation.
In a well-designed language, the things you type most often tend to be the shortest. So the invisible concatenation operator tends to be reserved for whatever you want to do most often, or at least for something that happens commonly. Hence:
- In Haskell and family, concatenation is used to apply a function (see the sketch just after this list).
- In C and family, it’s used to declare a name with a type (though it’s parsed weirdly).
- In Smalltalk and family, putting a word after something sends a message to it (calls a method on it).
- In algebra, putting two letters next to one another means multiplication.
- In regular expressions, putting two atoms together does sequential matching.
- In Perl, putting two terms in a row is a syntax error (also in Python).
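Here is a minimal Haskell sketch of the first item (my own illustration, not from the post): sticking terms together is function application, the very juxtaposition that Perl and Python reject.

```haskell
-- Concatenation as application: `hypot 3 4` is just two terms placed
-- after the function name, with no parentheses or commas required.
hypot :: Double -> Double -> Double
hypot a b = sqrt (a * a + b * b)

main :: IO ()
main = print (hypot 3 4)   -- prints 5.0
```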
That last one is great. Actually, Perl isn't that bad, because it has noun markers. Leaving the `&` sigil off of a function name makes it a prefix operator rather than a term, and subsequent terms are arguments to the function. Thus function calls are just as terse as in Haskell (but with the opposite precedence and associativity). Furthermore, as its inventor would like to point out, the alternation of terms and operators forms a sort of self-clocking mechanism, like many natural languages have.
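As a hedged aside on that precedence remark, here is the Haskell side of the comparison (Perl's prefix calls behave differently, as noted): application is left-associative and binds tighter than every infix operator.

```haskell
-- Application associates to the left and binds tighter than operators:
--   f x y     parses as  (f x) y
--   f x + 1   parses as  (f x) + 1, not f (x + 1)
add :: Int -> Int -> Int
add x y = x + y

main :: IO ()
main = do
  print (add 2 3)        -- (add 2) 3      ==> 5
  print (add 2 3 + 1)    -- (add 2 3) + 1  ==> 6
```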
The question of this operator is one of the things I am considering for my language (which I shall eventually have to name). I am mostly vacillating between C's usage and Haskell's usage. If I adopt concatenation for function calls, type annotations will require a `:` like in ML. If I adopt concatenation for type annotations, function calls will require parens. Let's contrive some sample code for both.
```
number_finder s:Str = {
    .match = find_number s
    .value : Int = match.parse
}

number_finder (Str s) = {
    .match = find_number(s)
    Int .value = match.parse
}
```
(Some requisite background info: this is a prototype-based object system. Members of an object are made public by putting a `.` before them when you declare them.)
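For comparison, here is a rough Haskell rendering of the first variant. This is my sketch under assumptions: `find_number` becomes a hypothetical `findNumber` helper, the prototype object becomes a record, and `match.parse` becomes `read`, so it only approximates the original semantics.

```haskell
import Data.Char (isDigit)

-- Hypothetical stand-in for find_number: grab the first run of digits.
findNumber :: String -> String
findNumber = takeWhile isDigit . dropWhile (not . isDigit)

-- The object from the sample, approximated as a record.
data NumberFinder = NumberFinder
  { match :: String   -- .match = find_number s
  , value :: Int      -- .value : Int = match.parse
  } deriving Show

numberFinder :: String -> NumberFinder
numberFinder s = NumberFinder { match = m, value = read m }
  where m = findNumber s   -- concatenation applies the function

main :: IO ()
main = print (numberFinder "abc42def")
-- NumberFinder {match = "42", value = 42}
```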
Both uses of concatenation have their advantages and disadvantages, but we must also consider the nature of the language when deciding. In Haskell, pretty much every operation is a function call. But in this language, many calculations are performed with method calls instead, and an infix `.` is almost as easy to type as a space. In addition, there will be less currying than in Haskell. So using concatenation for function calls won't gain as much as it does in Haskell. On the flip side, using it for type annotations won't gain as much as it does in C, because type annotations aren't as necessary as they are in C (notice we left one off of `.match`).
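To show what Haskell gains from the combination of currying and invisible application mentioned above, here is a small sketch with made-up names: leaving off an argument yields a new function, and applying it is just more juxtaposition.

```haskell
-- Partial application falls out of currying for free, so the terse
-- application syntax gets exercised constantly.
scale :: Double -> Double -> Double
scale factor x = factor * x

main :: IO ()
main = print (map (scale 2) [1, 2, 3])   -- [2.0,4.0,6.0]
```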
The third alternative, I suppose, is to use concatenation for method calls, as in Smalltalk and Self. However, this is a point where I believe other linguistic concerns start affecting the picture. Having a `.` between method calls creates a feeling of high-precedence cohesion that a space would break. In addition, it would nullify the sweet syntax of using a prefix `.` in a member declaration to make that member public. And again, `.` is really easy to type.
What are the non-terseness factors affecting the other choices? One of the principles of clarity is to make the more important parts more prominent. When you're declaring a member of an object, the name is usually the most important part, which suggests that `value : Int = (...)` is clearer than `Int value = (...)`, because it puts the variable's name first and all the names in the object line up. But object members aren't the only names you declare. When you declare a function parameter, the type is usually more important than the name (at least from an outsider's perspective), which suggests that `find (Str area, Int num)` is clearer than `find (area:Str, num:Int)`. And you'll probably leave types off of member declarations more often than off of function parameters.
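For what it's worth, Haskell's record syntax happens to pick the name-first ordering for fields (with `::` instead of `:`); this is just a sketch of the alignment effect, not a claim about my language's syntax.

```haskell
-- Name-first declarations: the field names form one column and the
-- types another, which is the alignment argued for above.
data Request = Request
  { area :: String
  , num  :: Int
  , tag  :: String
  }
```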
A language could conceivably use concatenation for both things, provided all the types are predeclared (so the parser knows whether the left side is a type or a function), but I hate predeclarations.
After some thought, I'm leaning toward the C usage, because I think it looks a little cleaner; perhaps I'm just more used to it because I learned C++ before Haskell, which reminds me that you shouldn't underestimate historical conventions either. Or maybe I should push the linguistic waterbed in a different direction, like Perl does.