Moving on to Real World OCaml

After finishing the first 4 chapters of OCaml from the Very Beginning, I feel ready to move on to other learning resources that assume more programming background.  After all, I can write FizzBuzz now, right?  Real World OCaml and the OCaml tutorials also have the advantage of being more up to date.

Tools for OCaml

Base and Core are alternative libraries that replace or supplement the standard library shipped with the compiler.  Besides Core and Base, there are many more libraries and tools for OCaml, which can be installed using the package manager OPAM.  E.g., utop is a modern interactive command-line interface (toplevel) that supports command history, macro expansion, module completion, etc.  Merlin is an editor helper that provides a number of advanced IDE-like features.  ocamlformat is an auto-formatter.  One advantage of using it is that it standardizes formatting, which makes code reviews and collaboration easier.

Installing utop on NixOS

My plan is to set up my work environment with OPAM.  I’ll be writing programs in nvim with Merlin set up.  My OS is NixOS, and it has not been easy for me to get utop to run successfully.  I’m going to write a separate blog post about how to set up the OCaml work environment on NixOS as specified in the installation instructions of Real World OCaml.

(Update: I have not managed to set this up on NixOS, but Jethro informed me that he did and shared his experience here.  I decided to switch to Debian and have successfully set up an OCaml work environment there.)

More OCaml Basics from Real World OCaml

  • OCaml allows you to place underscores in the middle of numbers to improve readability. Note that underscores can be placed anywhere within a number, not just every three digits.
  • There are some constraints on what identifiers can be used for variable names. Punctuation is excluded, except for _ and ', and variables must start with a lowercase letter or an underscore.
  • Modules:
    • Module names always start with a capital letter.
    • Modules contain functions which can be called with ModuleName.function_name.  E.g. Float.of_int refers to the of_int function contained in the Float module. It converts an argument of type int to type float.
    • Modules can also be opened with the open ModuleName syntax, so that functions in the module can be called without qualifying them by the module name each time.  Opening a module makes all of its functions and operators available.  A module can also be opened locally, inside a single function, with let open ModuleName in.  E.g.
      let ratio x y =
        let open Float.O in
        of_int x / of_int y

      let open Float.O in opens the Float.O module (from Base) only within the body of ratio, which causes the standard int-only arithmetic operators to be shadowed locally by their float versions.  This is why ‘/’ can be used instead of ‘/.’

    • Another syntax for locally opening a module is ModuleName.(expression containing the module functions).  E.g. the following is equivalent to the above:
      let ratio x y =
        Float.O.(of_int x / of_int y)
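
The points in this list can be tried out without installing Base.  The following is only a sketch using the standard library: the module Float_ops is a made-up stand-in for Base's Float.O, and the names million, oddly_grouped and ratio' are mine, chosen for illustration.

```ocaml
(* Underscores in numeric literals are ignored by the compiler,
   wherever they are placed. *)
let million = 1_000_000
let oddly_grouped = 1_0_00_000 (* the same value as million *)

(* Hypothetical stand-in for Base's Float.O: inside this module,
   / means float division. *)
module Float_ops = struct
  let of_int = float_of_int
  let ( / ) = ( /. )
end

(* Local open with "let open ... in": / is float division only here. *)
let ratio x y =
  let open Float_ops in
  of_int x / of_int y

(* The same function written with the Module.(expression) form.
   The name ratio' shows that ' is allowed in identifiers. *)
let ratio' x y = Float_ops.(of_int x / of_int y)

let () = Printf.printf "%d %.2f %.2f\n" million (ratio 1 4) (ratio' 1 4)
```

Running it prints 1000000 0.25 0.25.  Outside of ratio and ratio', the operator / is still the ordinary integer division, because both opens are local.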

Type-Signature of a Function

  • The type-signature of a function separates each type with ->, and tells you the types of each argument in order, with the type of what the function returns after the last ->.  E.g.
    val ratio : int -> int -> float = <fun>

    This says that the value (val) named ratio is a function (= <fun>) whose first and second arguments are of type int, and which returns a value of type float.

  • A function can be an argument to another function.  The type-signature of the function passed as an argument is wrapped in parentheses () so that it reads as one single argument rather than as further arguments of the outer function.  E.g.
     val sum_if_true : (int -> bool) -> int -> int -> int = <fun>

    This says that the first argument is a function that takes an integer and returns a boolean, that the remaining two arguments are integers, and that the result is an integer.
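
To see what difference the parentheses make, here is a small sketch.  The functions add, double and apply_to_ten are made up for illustration; the signatures in the comments are what the toplevel reports for them.

```ocaml
(* val add : int -> int -> int
   Two int arguments, int result. *)
let add (x : int) (y : int) : int = x + y

(* val double : int -> int *)
let double (n : int) : int = n * 2

(* val apply_to_ten : (int -> int) -> int
   A single argument, which is itself a function from int to int.
   Without the parentheses the signature would read like that of add. *)
let apply_to_ten (f : int -> int) : int = f 10

let () =
  Printf.printf "%d\n" (add 3 4);
  Printf.printf "%d\n" (apply_to_ten double)
```

This prints 7 and then 20.  Passing an int where the function is expected, e.g. apply_to_ten 5, is rejected by the type checker.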

OCaml’s Type Inference

OCaml determines the type of an expression using a technique called type inference, by which the type of an expression is inferred from the available type information about the components of that expression.  The available type information includes some rules OCaml has on its operators.  E.g.:

  • (Obviously) operators must be applied to operands of the suitable type(s) and they must return a suitable type.  E.g. + must be applied to int and must return int.
  • OCaml requires that both branches of an if expression have the same type.
  • The condition in an if expression must be of type bool.
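
As a sketch of these rules at work, the functions below carry no annotations at all, and the comments show the types OCaml infers for them.  The function names are mine, picked for illustration.

```ocaml
(* + only works on int, so x and y must be int and so is the result.
   Inferred: val add : int -> int -> int *)
let add x y = x + y

(* +. only works on float.
   Inferred: val add_float : float -> float -> float *)
let add_float x y = x +. y

(* The condition x > y is a bool, and both branches are int because
   of the - operator.
   Inferred: val abs_diff : int -> int -> int *)
let abs_diff x y = if x > y then x - y else y - x

(* Each of the following would be rejected at compile time:
     1 + 2.0                    float given where + expects an int
     if true then 1 else "no"   branches have different types
     if 1 then 2 else 3         condition is an int, not a bool *)

let () =
  Printf.printf "%d %.1f %d\n" (add 2 3) (add_float 1.5 2.0) (abs_diff 3 10)
```

This prints 5 3.5 7.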

One can annotate a function with type information, e.g.:

let sum_if_true (test : int -> bool) (x : int) (y : int) : int =
  (if test x then x else 0)
  + (if test y then y else 0)

For the function sum_if_true, the first argument is test, of type int -> bool.  The second argument is x of type int.  The third argument is y of type int.  The function returns a value of type int.
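
Here is a short usage sketch for the annotated function.  The helper is_even is my own addition for illustration; the definition of sum_if_true is repeated only so that the snippet runs on its own.

```ocaml
(* Same definition as above, repeated so the snippet is self-contained. *)
let sum_if_true (test : int -> bool) (x : int) (y : int) : int =
  (if test x then x else 0)
  + (if test y then y else 0)

(* Hypothetical test function, annotated in the same style. *)
let is_even (n : int) : bool = n mod 2 = 0

let () =
  (* Only 4 passes the test. *)
  Printf.printf "%d\n" (sum_if_true is_even 3 4);
  (* Both 2 and 4 pass the test. *)
  Printf.printf "%d\n" (sum_if_true is_even 2 4)
```

This prints 4 and then 6.  A call such as sum_if_true is_even 3.0 4 does not compile, because 3.0 is a float and the annotation says x is an int.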

It seems to me that one should always annotate when defining a function.  Annotations make the types clear to the author and to anyone else who reads the code.  You may say that a comment elsewhere could achieve the same goal while leaving the code shorter.  However, a comment does not pass the type information on to OCaml, only to the humans who read the code.  Giving OCaml as much information as possible helps to avoid mistakes, because the compiler checks your annotations against what it infers.  Let OCaml check as much as possible for you!  Putting your type information in a .mli file may be more standard, though, and has the same effect.  I’ll talk about .mli files in another post.

Inferring Generic Types

Sometimes, there isn’t enough information to fully determine the concrete type of a given value.  In that case OCaml introduces a type variable such as 'a to express that the type is generic.  The leading single quote mark indicates that it’s a type variable.  That is, an argument of type 'a can be of any type!

Within a function type-signature, every occurrence of the same type variable stands for the same type.  So if OCaml introduces only one type variable, all the arguments of type 'a must have the same type in any given call.  If OCaml introduces more than one type variable (e.g. 'b as well) in that function, there is no requirement that the two generic types be the same.  This kind of genericity is called parametric polymorphism, because it works by parameterizing the type in question with a type variable.
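
A small sketch of both cases follows.  All the names in it are mine, chosen for illustration; the signatures in the comments are the ones OCaml infers.

```ocaml
(* Inferred: val first_if_true : ('a -> bool) -> 'a -> 'a -> 'a
   One type variable: test, x and y must all agree on one type. *)
let first_if_true test x y = if test x then x else y

(* Inferred: val keep_first : 'a -> 'b -> 'a
   Two type variables: the two arguments may have unrelated types. *)
let keep_first x _ = x

(* Two concrete test functions, one on strings and one on ints. *)
let long_string s = String.length s > 6
let big_number n = n > 3

let () =
  (* Here the type variable stands for string. *)
  print_endline (first_if_true long_string "short" "loooooong");
  (* Here the same function is used with int. *)
  Printf.printf "%d\n" (first_if_true big_number 4 3);
  (* Here the first argument is an int and the second a string. *)
  Printf.printf "%d\n" (keep_first 1 "ignored")
```

This prints loooooong, then 4, then 1.  Mixing types within one call, as in first_if_true big_number "short" 3, is rejected, because every 'a in the signature has to be the same type.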
