Building a Math Parser from Scratch in Kotlin
How I built a recursive descent parser for Smart Calculator that handles nested expressions, operator precedence, and edge cases — without any external libraries.
When I started building Smart Calculator, I needed a way to evaluate math expressions like 3 + 5 * (2 - 1). My first instinct was to grab a library. Then I looked at the options — most were Java-based, bloated, and added 2-3 MB to the APK. For a calculator app targeting under 5 MB total, that wasn't going to work.
So I built my own. Here's how.
Why Not Just Use a Library?
The popular options for math evaluation on Android are exp4j, mXparser, and Javaluator. They work, but they come with baggage. mXparser alone adds over 1.5 MB to your APK. They also support hundreds of functions I'd never need — hyperbolic trig, combinatorics, custom operators. My calculator needed basic arithmetic, percentages, square roots, and trigonometry. Nothing more.
Building a custom parser meant I could keep the APK under 5 MB and have complete control over error messages when users type something invalid.
Step 1: Tokenization
Before you can evaluate 3 + 5 * 2, you need to break it into tokens: 3, +, 5, *, 2. This sounds simple until you hit edge cases. What about -5? Is that a negative number or a subtraction from nothing? What about .5 without a leading zero? What about 3(5) where the multiplication is implied?
My tokenizer scans the input string character by character. It groups digits and decimal points into number tokens. It recognizes operators, parentheses, and function names like sin, cos, sqrt. The key insight was handling implicit multiplication — when a number is directly followed by an opening parenthesis, I insert a multiplication token automatically.
The tokenizer produces an array of typed tokens: NumberToken, OperatorToken, FunctionToken, ParenToken. Each carries its value and position in the original string (useful for error highlighting later).
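A minimal sketch of a tokenizer along these lines might look like the following. The four token type names come from the description above; the field names, the error messages, and the choice to leave unary minus for the parser are assumptions made for illustration, not the actual Smart Calculator code.

```kotlin
// Hypothetical token hierarchy: type names follow the article, fields are assumed.
sealed class Token { abstract val pos: Int }
data class NumberToken(val value: Double, override val pos: Int) : Token()
data class OperatorToken(val symbol: Char, override val pos: Int) : Token()
data class FunctionToken(val name: String, override val pos: Int) : Token()
data class ParenToken(val isOpen: Boolean, override val pos: Int) : Token()

fun tokenize(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var i = 0
    while (i < input.length) {
        val c = input[i]
        when {
            c.isWhitespace() -> i++
            c.isDigit() || c == '.' -> {
                // Group digits and decimal points into one number token (".5" is accepted).
                val start = i
                while (i < input.length && (input[i].isDigit() || input[i] == '.')) i++
                val text = input.substring(start, i)
                val value = text.toDoubleOrNull()
                    ?: throw IllegalArgumentException("Invalid number '$text' at $start")
                tokens += NumberToken(value, start)
            }
            c.isLetter() -> {
                // Letters are grouped into a function name such as sin, cos, sqrt.
                val start = i
                while (i < input.length && input[i].isLetter()) i++
                tokens += FunctionToken(input.substring(start, i), start)
            }
            c == '(' -> {
                // Implicit multiplication: a number directly followed by '(' gets a '*'.
                if (tokens.lastOrNull() is NumberToken) tokens += OperatorToken('*', i)
                tokens += ParenToken(true, i)
                i++
            }
            c == ')' -> {
                tokens += ParenToken(false, i)
                i++
            }
            c in "+-*/%" -> {
                // A leading '-' is emitted as an operator; the parser decides if it is unary.
                tokens += OperatorToken(c, i)
                i++
            }
            else -> throw IllegalArgumentException("Unexpected '$c' at $i")
        }
    }
    return tokens
}
```

With this sketch, tokenize("3(.5 + sin(2))") yields ten tokens, the second of which is the inserted multiplication operator.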
Step 2: Recursive Descent Parsing
A recursive descent parser mirrors the grammar of mathematical expressions directly in code. The grammar has natural precedence: addition and subtraction bind loosely, multiplication and division bind tighter, and exponentiation binds tightest. Parentheses override everything.
The parser has three levels. The parseExpression function handles addition and subtraction. It calls parseTerm, which handles multiplication and division. parseTerm calls parseFactor, which handles numbers, parenthesized sub-expressions, unary negation, and function calls.
Each function consumes tokens from left to right, building a result as it goes. When parseTerm encounters a *, it evaluates the right side by calling parseFactor and multiplies the two values. This enforces operator precedence without any special logic: parseExpression only ever adds or subtracts complete terms, so in 3 + 5 * 2 the product 5 * 2 is reduced to 10 before the addition ever sees it.
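The three levels can be sketched roughly as below. The function names parseExpression, parseTerm, and parseFactor are the ones described above; everything else (the Parser class, the string-based tokens, the regex splitter in evaluate, the error messages) is a simplification assumed for the example, so treat it as an illustration of the technique rather than the app's real code.

```kotlin
import kotlin.math.cos
import kotlin.math.sin
import kotlin.math.sqrt

// Sketch only: tokens are plain strings here to keep the example short.
class Parser(private val tokens: List<String>) {
    private var pos = 0

    private fun peek(): String? = tokens.getOrNull(pos)

    private fun next(): String =
        tokens.getOrNull(pos++) ?: throw IllegalArgumentException("Unexpected end of input")

    private fun expect(expected: String) {
        val actual = next()
        require(actual == expected) { "Expected '$expected' but found '$actual'" }
    }

    fun parse(): Double {
        val result = parseExpression()
        require(pos == tokens.size) { "Unexpected token '${peek()}'" }
        return result
    }

    // expression := term (('+' | '-') term)*
    private fun parseExpression(): Double {
        var result = parseTerm()
        while (peek() == "+" || peek() == "-") {
            val op = next()
            val right = parseTerm()
            result = if (op == "+") result + right else result - right
        }
        return result
    }

    // term := factor (('*' | '/') factor)*
    private fun parseTerm(): Double {
        var result = parseFactor()
        while (peek() == "*" || peek() == "/") {
            val op = next()
            val right = parseFactor()
            result = if (op == "*") result * right else result / right
        }
        return result
    }

    // factor := NUMBER | '-' factor | '(' expression ')' | FUNCTION '(' expression ')'
    private fun parseFactor(): Double {
        val token = next()
        return when {
            token == "-" -> -parseFactor()
            token == "(" -> {
                val inner = parseExpression()
                expect(")")
                inner
            }
            token in functions -> {
                expect("(")
                val arg = parseExpression()
                expect(")")
                functions.getValue(token)(arg)
            }
            else -> token.toDoubleOrNull()
                ?: throw IllegalArgumentException("Unexpected token '$token'")
        }
    }

    private companion object {
        val functions: Map<String, (Double) -> Double> = mapOf(
            "sin" to { x: Double -> sin(x) },
            "cos" to { x: Double -> cos(x) },
            "sqrt" to { x: Double -> sqrt(x) }
        )
    }
}

// Hypothetical helper: splits the input with a regex, silently skipping unknown characters.
fun evaluate(input: String): Double =
    Parser(Regex("""\d*\.?\d+|[a-z]+|[-+*/()]""").findAll(input).map { it.value }.toList()).parse()
```

Here evaluate("3 + 5 * (2 - 1)") returns 8.0 and evaluate("10 - 4 - 3") returns 3.0: the while loops give left-to-right associativity, and the call chain gives precedence.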
Step 3: The Edge Cases That Broke Everything
The basic parser worked in a day. The edge cases took a week.
Division by zero was the obvious one. I returned Infinity for n / 0 (with n nonzero) and NaN for 0 / 0. But I also had to handle how these results are displayed: "Undefined" instead of "NaN", and the infinity symbol instead of "Infinity".
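The display mapping could be as small as the sketch below. The function name formatResult is hypothetical, and the handling of negative infinity is an assumption, since only "Undefined" and the infinity symbol are mentioned above.

```kotlin
// Hypothetical display helper: maps IEEE 754 special values to user-facing text.
fun formatResult(value: Double): String = when {
    value.isNaN() -> "Undefined"                      // e.g. 0 / 0
    value == Double.POSITIVE_INFINITY -> "∞"          // e.g. 5 / 0
    value == Double.NEGATIVE_INFINITY -> "-∞"         // assumed: e.g. -5 / 0
    else -> value.toString()
}
```

Because Kotlin's Double follows IEEE 754, the division itself never throws; only the formatting step needs to know about the special values.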
Nested parentheses like ((((5)))) should work fine, but unbalanced ones like ((5) needed clear error messages. I added a parenthesis depth counter during tokenization that rejects expressions before parsing even starts.
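A depth counter of this kind might look like the following sketch. The function name checkParentheses and the message wording are assumptions; the idea is simply to track how many parentheses are open and fail as soon as the count goes negative or ends above zero.

```kotlin
// Returns null when parentheses are balanced, otherwise an error message.
// Name and message text are hypothetical.
fun checkParentheses(input: String): String? {
    var depth = 0
    for ((index, c) in input.withIndex()) {
        when (c) {
            '(' -> depth++
            ')' -> {
                depth--
                // A ')' with nothing open can be reported at its exact position.
                if (depth < 0) return "Unexpected ')' at position $index"
            }
        }
    }
    return if (depth > 0) "Unbalanced parentheses: $depth unclosed '('" else null
}
```

Running the check before parsing keeps the parser's own error paths simpler, since it can assume every opening parenthesis has a partner.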
Implicit multiplication had subtle bugs. The expression 2(3)(4) should produce 24, but my first implementation treated (3)(4) as a function call instead of two separate multiplications. The fix was checking the token before an opening parenthesis — if it's a number or a closing paren, insert a multiplication.
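The corrected rule can be illustrated in isolation. This sketch works on plain string tokens and uses a hypothetical function name; it inserts a multiplication only when the token before an opening parenthesis is a number or a closing parenthesis, so function calls such as sin( are left alone.

```kotlin
// Sketch of the rule on string tokens (an assumed simplification of the typed tokens).
fun insertImplicitMultiplication(tokens: List<String>): List<String> {
    val result = ArrayList<String>(tokens.size)
    for (token in tokens) {
        val prev = result.lastOrNull()
        // The previous token ends an operand if it is ')' or parses as a number.
        val prevEndsOperand = prev != null && (prev == ")" || prev.toDoubleOrNull() != null)
        if (token == "(" && prevEndsOperand) result.add("*")
        result.add(token)
    }
    return result
}
```

For the tokens of 2(3)(4) this produces 2 * ( 3 ) * ( 4 ), which a precedence-aware parser then evaluates to 24.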
Percentage was the sneakiest. Users expect 100 + 10% to equal 110, not the 100.1 you get by reading 10% as 0.1. This means percentage isn't a simple division by 100: it's relative to the preceding value. I had to add special handling where % modifies the previous operand based on context.
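One way to express a context rule like this is sketched below. The function name and the exact rule are assumptions: after + or - the percentage is taken relative to the left operand, and in every other position it falls back to a plain fraction. The real app may resolve more cases than this.

```kotlin
// Hypothetical helper: turns "percent%" into the value the operator should receive.
// left and op describe what precedes the percentage, or are null if nothing does.
fun resolvePercent(left: Double?, op: Char?, percent: Double): Double = when {
    // 100 + 10%  ->  10% of 100, so the addition receives 10.0
    left != null && (op == '+' || op == '-') -> left * percent / 100.0
    // 50% on its own, or after * and /, is just a fraction: 0.5
    else -> percent / 100.0
}
```

With this rule, 100.0 + resolvePercent(100.0, '+', 10.0) gives 110.0, which matches what users expect from a pocket calculator.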
Step 4: Performance
A math parser for a calculator doesn't need to be fast — nobody types expressions with thousands of operations. But I still ran into a performance issue: my first version created a new String object for every character during tokenization. On older devices, evaluating a 50-character expression allocated over 200 objects and triggered garbage collection mid-calculation.
The fix was switching to CharArray operations and pre-allocating the token list based on input length. Evaluation went from 12 ms to under 1 ms. Overkill? Probably. But it felt good.
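The general shape of that optimization can be sketched as follows. This is not the app's tokenizer: scanNumbers is a hypothetical function that only extracts numbers, chosen to show the two ideas in a few lines. It indexes into a CharArray instead of creating a String per character, and it sizes the result list up front from the input length.

```kotlin
// Sketch: one String allocation per number rather than one per character.
fun scanNumbers(input: String): List<Double> {
    val chars = input.toCharArray()
    // Pre-allocate: there can never be more tokens than input characters.
    val numbers = ArrayList<Double>(chars.size)
    var i = 0
    while (i < chars.size) {
        if (chars[i].isDigit() || chars[i] == '.') {
            val start = i
            while (i < chars.size && (chars[i].isDigit() || chars[i] == '.')) i++
            // Build the text once for the whole run; toDouble() throws on input like "1.2.3".
            numbers.add(chars.concatToString(start, i).toDouble())
        } else {
            i++
        }
    }
    return numbers
}
```

The capacity is an upper bound, so the list never has to grow while scanning.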
What I'd Do Differently
If I built this again, I'd generate an Abstract Syntax Tree (AST) instead of evaluating directly during parsing. An AST lets you do things like simplify expressions, show step-by-step solutions, and cache sub-expressions. My direct-evaluation approach works perfectly for a calculator, but it's a dead end if you want to build anything more complex on top of it.
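To make the difference concrete, here is a small sketch of what an AST-based design could look like. All names (Expr, Num, Neg, BinOp, eval, simplify) are hypothetical. The parser would return an Expr instead of a Double, and evaluation and simplification become separate passes over the tree.

```kotlin
// Hypothetical AST node types.
sealed class Expr
data class Num(val value: Double) : Expr()
data class Neg(val operand: Expr) : Expr()
data class BinOp(val op: Char, val left: Expr, val right: Expr) : Expr()

// Evaluation is one walk over the tree.
fun eval(e: Expr): Double = when (e) {
    is Num -> e.value
    is Neg -> -eval(e.operand)
    is BinOp -> {
        val l = eval(e.left)
        val r = eval(e.right)
        when (e.op) {
            '+' -> l + r
            '-' -> l - r
            '*' -> l * r
            '/' -> l / r
            else -> throw IllegalArgumentException("Unknown operator '${e.op}'")
        }
    }
}

// A second, independent walk: drops "x * 1" and "x + 0" without evaluating anything.
fun simplify(e: Expr): Expr = when (e) {
    is Num -> e
    is Neg -> Neg(simplify(e.operand))
    is BinOp -> {
        val l = simplify(e.left)
        val r = simplify(e.right)
        when {
            e.op == '*' && r == Num(1.0) -> l
            e.op == '+' && r == Num(0.0) -> l
            else -> BinOp(e.op, l, r)
        }
    }
}
```

The tree for 3 + 5 * (2 - 1) is BinOp('+', Num(3.0), BinOp('*', Num(5.0), BinOp('-', Num(2.0), Num(1.0)))), and eval on it returns 8.0. Step-by-step solutions or caching would be further passes over the same structure.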
I'd also write more tests upfront. I ended up with over 80 test cases, but most were written after users found bugs. Test-driven development would have caught the percentage issue and the implicit multiplication bug before they ever shipped.
The takeaway: Don't reach for a library by default. Sometimes the problem is small enough that rolling your own gives you better performance, smaller binaries, and deeper understanding. Just budget extra time for edge cases — they'll take longer than the core implementation.