operator fixity and precedence
2022-01-11 · 4 min read
You come across a Haskell/Idris/LEAN file littered with custom operators (read "delicious operator soup"):
parseJson = withObject "Foo" $ \v -> Foo <$> v .: "name" <*> v .: "owner" <*> v .: "deadline" <*> v .: "status" <*> v .:? "reco" <*> v .: "benefit" <*> v .: "cost" <*> v .: "residual"
How do you parse this without tearing your eyes out?
If setting fire to your computer and becoming a wheat farmer doesn't sound like a viable career path, then you should start by finding the list of operators for your language. Here are the ones defined in Idris' prelude:
module Prelude.Ops -- Numerical operators infix 6 ==, /=, <, <=, >, >= infixl 8 +, - infixl 9 *, / -- Boolean operators infixr 5 && infixr 4 || -- List and String operators infixr 7 ::, ++ infixl 7 :< -- Functor/Applicative/Monad/Algebra operators infixl 1 >>=, =<<, >>, >=>, <=<, <&> infixr 2 <|> infixl 3 <*>, *>, <* infixr 4 <$>, $>, <$ infixl 8 <+> -- Utility operators infixr 9 ., .: infixr 0 $ infixl 9 `div`, `mod`
and the documentation for
> :doc infix Fixity declarations: Operators can be assigned a priority level and associativity. During parsing operators with a higher priority will collect their arguments first and the declared associativity will inform how subterms are grouped. For instance the expression `a + b * c * d + e` is parsed as `(a + ((b * c) * d)) + e` because: `(+)` is at level 8 and associates to the left `(*)` is at level 9 and associates to the left
To understand how these expressions are parsed, we need to understand the properties of all operators.
- Arity (the number of inputs)
- Fixity (infix, prefix, postfix)
- Associativity (how parentheses are placed)
- Precedence (the order of operations between different operators)
Operator arity refers to the number of inputs of an operator. The most common arities with some examples:
- Nullary (zero inputs): less of an operator and more of a constant.
- Unary (one input): e.g., boolean negation
- Binary (two inputs): e.g., arithmetic
Operator fixity describes the operator's position in the list of arguments. There are three fixities:
- Prefix (operator comes before arguments): e.g., typical function application
plus 3 4or forced prefix form
(+) 3 4
- Infix (operator comes between arguments): e.g., all arithmetic
3 + 4
- Postfix (operator comes after arguments): e.g., RPN (Reverse Polish Notation) or concatinative languages
3 4 +
Operator associativity determines how the inner-most parentheses are placed when parsing an expression with the same operator (or same precedence).
For example, if you have the expression
a ◯ b ◯ c ◯ d with some operator
◯, there are 5 ways we can parse it:
((a ◯ b) ◯ c) ◯ d<-- left associative
a ◯ (b ◯ (c ◯ d))<-- right associative
(a ◯ b) ◯ (c ◯ d)
a ◯ ((b ◯ c) ◯ d)
(a ◯ (b ◯ c)) ◯ d
In other words, left associative means we place the inner-most parentheses around the left-most pair, while right associative prefers the right-most pair.
Operators can also be non-associative, which means
a ◯ b ◯ c should fail to parse. An example would be relational operators like
x <= y <= z, since
(x <= y) <= z would fail to type check, as you would be comparing a boolean to a
Operator precedence allows us to parse expressions containing multiple different operators without ambiguity. Precedence is a number; higher numbers mean higher priority when parsing.
I like to imagine parsing precedence for an expression in passes. The first pass we parse only operators of precedence 10. The next pass parses precedence 9, and so on. In each pass we ignore all operators below the current precedence level.
From school, we know that
* has a higher precedence than
+. In Idris2 you can see
* has precedence 9 while
+ has precedence 8. Consequently, the expression
100 - 5 * 9 + 111 is parsed like
base expr: 100 - 5 * 9 + 111 precedence 9: 100 - (5 * 9) + 111 precedence 8: (100 - (5 * 9)) + 111