Subset of JSON parser in 150 lines of Haskell code

Intro

Recently I have started to learn Haskell at my spare time by solving codewars problems time-to-time. This post is about my solution for this kata, since I was impressed by power of applicative parsing and want to document things I have learned during solving.

NOTE: I am no way an expert in Haskell or FP languages, so my solution might be not the best/cleanest/etc.

The task

The task is simple: implement a parser for a subset of JSON. The only big difference from real json is no support for exponential numbers and unicode characters.

Language task suggest to implement has following (BNF)[https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form]:

string ::= "" | "chars"

chars  ::= char | char chars

char   ::= <ASCII character except for ">

number ::= int | int frac | '-' number

int    ::= digits
frac   ::= '.' digits

digits ::= digit | digit digits

digit  ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'

object  ::= "{}" | '{' members '}'

members ::= pair | pair ',' members

pair    ::= string ':' value

array   ::= "[]" | '[' elements ']'

elements ::= value | value ',' elements

value = string | number | object | array | "true" | "false" | "null"

White-space characters are ignored.

And it’s required to implement function f :: String -> Value, where Value defined as follows:

data Value
  = String String
  | Number Double
  | Object [(Value, Value)] -- an association list -- only a `String` is valid as the index `Value`
  | Array [Value] -- not limited to identical primitive datatypes
  | Boolean Bool -- either `True` or `False`
  | Null
  deriving (Show)

Applicative Parser

Parser by its nature is a function that takes an input string and produces some value, leaving unparsed string untouched. Based on this parser type is defined as following:

newtype Parser a = Parser {runParser :: String -> Maybe (a, String)}

runParser is a function that consumes part of the input string and maybe produces pair of value of type a and rest of the unparsed string. In our case generic type a will be a Value introduced in previous paragraph.

Basic Parsing

Using type introduced above it’s possible to start parsing right away. The easiest thing to implement is to parser for character by predicate

parseCond :: (Char -> Bool) -> Parser Char
parseCond fn = Parser helper
  where
    helper [] = Nothing
    helper (x : xs)
      | fn x = Just (x, xs)
      | otherwise = Nothing

Which can be used like

λ> runParser (parseCond (== 'a')) "abcd"
Just ('a',"bcd")

As you can see, parser consumed one char and returned rest of the string unchanged. If predicate return False then parsing fails:

λ> runParser (parseCond (== 'a')) "bda"
Nothing

Since it will be required to parse single characters a lot, let’s define simple wrapper for parsing one character:

parseChar :: Char -> Parser Char
parseChar x = parseCond (== x)

Next thing it’s required to implement is string parser. A function to parse a string has type f :: [Char] -> Parser [Char] and we already have Parser Char. There should be a way to convert one to another! Hoogle points out there a traverse function, which has following signature:

class (Functor t, Foldable t) => Traversable t where
  traverse :: Applicative f => (a -> f b) -> t a -> f (t b)

Since strings are just list of Char, and list implements Traversable, this function does exactly what we want. If we change t to [], f to Parser and a and b to Char:

  traverse :: Applicative Parser => (Char -> Parser Char) -> [Char] -> Parser ([Char])

Bingo! There is one small problem left – Parser should implement Applicative.

Making parser applicative

Let’s look what Applicative is. Ghci says:

λ> :info Applicative 
type Applicative :: (* -> *) -> Constraint
class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
  GHC.Base.liftA2 :: (a -> b -> c) -> f a -> f b -> f c
  (*>) :: f a -> f b -> f b
  (<*) :: f a -> f b -> f a
  {-# MINIMAL pure, ((<*>) | liftA2) #-}

To Parser type to be applicative it should be a Functor. Ok, but what is Functor?

type Functor :: (* -> *) -> Constraint
class Functor f where
  fmap :: (a -> b) -> f a -> f b
  (<$) :: a -> f b -> f a
  {-# MINIMAL fmap #-}

To be a Functor instance, Parser type should implement one function called fmap. Let’s do it!

Functor

To better understand what specific typeclasse does, I like to see what other types implement it and play a bit with them. Lucky me, Maybe implements Functor.

fmap has following signature fmap :: (a -> b) -> f a -> f b, which means it “injects” a function inside of a container, that contains a and produces new container that contains b. As an example for Maybe:

λ> fmap (+ 1) (Just 1)
Just 2
λ> fmap (+ 1) (Nothing)
Nothing
λ> fmap ord (Just 'a')
Just 97

If you are familiar with Rust, fmap is like and_then() for Options. For our Parser type it means, that it’s will be possible to change produced value to another one in case of successful parsing.

Functor instance looks like:

instance Functor Parser where
  fmap fn (Parser f) =
    Parser
      ( \x ->
          case f x of
            Just (val, other) -> Just (fn val, other)
            Nothing -> Nothing
      )

Also it can be rewritten more cleanly in do notation, since Maybe implement Monad

instance Functor Parser where
  fmap fn (Parser f) =
    Parser
      ( \x ->
          do
            (v, other) <- f x
            return (fn v, other)
      )

Now it’s possible to do following:

λ> runParser (fmap ord (parseChar 'a')) "abc"
Just (97,"bc")

ord was “injected” into Parser to produce ASCII number out of parsed ‘a’. Also note that fancy looking <$> operator is an alias for fmap. Haskell people like using esoteric symbols for operators for some reason.

Applicative

Since Parser now implements Functor, it’s possible to implement Applicative instance for it. Recap that it’s requied to implement 2 following functions

  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

Let’s start with pure. This function just “wraps” value into container. For our parser it means not parsing anything and just returning passed value.

pure val = Parser (Just . (val,))

Next thing is more complex. <*> accepts function wrapped into container and applies it to another container, that holds a value. To better understand what’s going on, that’s how it works with Maybe:

λ> Just (+ 1) <*> Just 2
Just 3
λ> Just (+ 1) <*> Nothing
Nothing

Not scary at all: it’s just fmap, but function is wrapped into container. Let’s implement using do notation right away, since case of spaghetti would be unreadable.

  (Parser fn) <*> (Parser f) =
    Parser
      ( \x -> do
          (fn', other) <- fn x
          (v, other') <- f other
          return (fn' v, other')
      )

This gives our parser very powerful property – chaining. Now it’s possible to parse strings:

λ> runParser  (fmap (\old a b -> old ++ [a] ++ [b]) (pure "" :: Parser String) <*> parseChar 'a' <*> parseChar 'b') "abc"
Just ("ab","c")

Using this kind of construction for parsing string would be really annoying, but this problem is already solved by traverse function, that was mentioned before! So it’s possible to define string parser as follows:

parseString :: String -> Parser String
parseString = traverse parseChar

And test it:

λ> runParser (parseString "hello") "hello, world!"
Just ("hello",", world!")

Note that applicative gives up another 2 important operators: <* and *>. I understand them as follows “chain 2 applicatives, but pick the result of the one the arrow is pointing to.

As an example:

λ> runParser (parseString "hello" <* parseString "world") "helloworld"
Just ("hello","")
λ> runParser (parseString "hello" *> parseString "world") "helloworld"
Just ("world","")

We will need it for skipping white-spaces, for instance.

Alternative

The only thing left is to implement Alternative instance. Imagine parsing bool value in json. It could be represented via “true” or “false”. Current implementation of parser cannot choose between different values, since it parses until it finds mismatch of EOF.

Alternative is a way to choose between 2 applicatives.

class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

Operator <|> takes two applicatives and chooses one of them. For our parser this means we can try one and if it fails we fall back to another one:

instance Alternative Parser where
  empty = Parser (const Nothing)
  (Parser fn) <|> (Parser fn') =
    Parser
      ( \x ->
          case fn x of
            Nothing -> fn' x
            Just e -> Just e
      )

Now it’s possible to do something like this:

λ> runParser (parseString "hello" <|> parseString "world") "world hello"
Just ("world"," hello")

Parsing Json

Skipping spaces

Since Json is not space-sensitive format, parser should not take them into account. To skip them we need a parser that just reads all spaces.

skipSpace :: Parser String
skipSpace = parseWhile (parseChar ' ') <|> pure ""

There is a function parseWhile which I did not mention, but it is a parser that runs another parser until it does not return an error. Just in case there is spaces at all, there is <|> pure "" fallback which just returns an empty string, indicating that there were no spaces in input string.

Null

The easiest one to parse is Null value. Null value is represented as simple “null” string.

parseNull :: Parser Value
parseNull = const Null <$> parseString "null"

Null parser just parses “null” string and fmaps result to Null Json value.

Boolean

Boolean is represented with either “true” or “false” strings. We can use Alternative property of our parser to choose between “true” and “false” values.

parseBool :: Parser Value
parseBool = (\x -> if x == "true" then Boolean True else Boolean False) <$> (parseString "true" <|> parseString "false")

Number

Based on BNF grammar from the beginning of the article, there are 2 types of number: integers and doubles. Let’s look at their parsers implementation:

Integer parsing is quite straightforward: just read characters until they are numbers.

    intParser :: Parser String
    intParser = parseWhile (parseCond isNumber)

Double parser is a bit tricky, but idea is simple. The idea is to look at the floating point number as two integers, separated by ‘.’ character:

    doubleParser :: Parser String
    doubleParser = fmap concat intParser <*> parseString "." <*> intParser

And to handle negative numbers we need another parser that just tries to read ‘-’ and then double or integer:

    negativeInt :: Parser String
    negativeInt = fmap (++) (parseString "-") <*> (doubleParser <|> intParser)

Equipped with these 3 parsers, it’s quite easy to write generic parser for any number: just chain them all with <|> a pass result to read function which converts string to a number.

The whole number parsing function is shown below:

parseInt :: Parser Value
parseInt =
  Number . read <$> (skipSpace *> (doubleParser <|> intParserNotFromZero <|> negativeInt))
  where
    intParser, doubleParser, negativeInt :: Parser String

    negativeInt = fmap (++) (parseString "-") <*> (doubleParser <|> intParser)
    intParserNotFromZero = test *> parseWhile (parseCond isNumber)
    intParser = parseWhile (parseCond isNumber)
    doubleParser = fmap concat intParser <*> parseString "." <*> intParser
    concat a b c = a ++ b ++ c

String

In Json strings are represented as string in quotation marks. So parser should parse quotation mark and read everthing until the next quotation mark:

parseStringValue :: Parser Value
parseStringValue = String <$> (parseChar '\"' *> (parseWhile (parseCond (/= '\"')) <|> pure "") <* parseChar '\"')

Here we use *> and <* just to ignore parsed quotation marks, since the are not part of the string itself.

Array

Arrays in Json are untyped, so they could contain any types (like tuples in statically typed languages). Array are represented as list of valid Json object separated by ‘,’ in ‘[]’ brackets.

parseArray :: Parser Value
parseArray =
  Array
    <$> ( parseChar '['
            *> ((parseSep (skipSpace *> parseJson <* skipSpace) (parseString ",")) <|> pure [])
            <* parseChar ']'
        )

parseSep is another helper parser (like parseUntil) that adapts 2 parsers: one for a value and one for the separator and returns a list of parsed objects.

Object

Json object is a pairs like collection of pair "name" : <Json>, wrapped with curly braces, where Json is a valid json object.

First of all, let’s define pair parser, which will consume string, ‘:’ and any valid json object.

parsePair :: Parser (Value, Value)
parsePair =
  Parser
    ( \x -> do
        (str, other) <- runParser parseStringValue x
        (val, other') <- runParser (skipSpace *> parseChar ':' *> skipSpace *> parseJson) other
        return ((str, val), other')
    )

I am using do notation to make code readable, but <*> could be used here as well. Then parsing whole Json object is trivial. We need to parse curly braces and parse pairs separated by ‘,’.

parseObject :: Parser Value
parseObject =
  Object
    <$> ( parseChar '{'
            *> ((parseSep (skipSpace *> parsePair <* skipSpace) (parseString ",")) <|> pure [])
            <* parseChar '}'
        )

Conclusion

I was very impressed that whole parser is only 150 LoC. I am new to FP world, so I guess, it could be done even with less code, but at the end I learned a lot about Haskell types and stuff.

The whole code could be found here