On the Economics of Languages

This article concerns languages, both natural (e.g. French, Latin and Arabic) and artificial (e.g. Esperanto, Java, JSON and a language for the first-order theory of arithmetic). The primary economic cost of any language is the cost of acquisition. Natural languages can take many years to master, and the primary cost of a programming language is training programmers. Natural languages tend to be far more difficult to acquire than programming languages; programming languages, however, are far more ephemeral. Although it may be deeply cynical to evaluate languages on a purely utilitarian basis, for most people in developed countries time is by far their scarcest resource. Mortality is at the core of the human condition; it is why ancient cultures revered their gods.

Languages that seek to be a universal means of communication, such as Esperanto and Ido, suffer from a "bootstrap" problem, which is closely related to the fallacy of composition. The fallacy of composition is the assumption that because something is true of each part, it is also true of the whole. For instance, the paradox of thrift in Keynesian economics is that saving is good for the individual, but bad if everybody were to save, since that would prevent spending in the economy. The case of Esperanto runs the other way: for most people, learning it is uneconomical unless everybody does it, in which case it would be an extremely profitable enterprise. Although it is unlikely that Esperanto will succeed in becoming the universal language, many languages are set to die out in the coming decades [1] as an inevitable consequence of globalisation. This will also lead to greater cultural homogenisation, which is no bad thing. Although political integration is possible in a geographical region whose populace speaks different languages, such as India or Switzerland, it is difficult where there is little sense of a shared cultural identity. This is, perhaps, a significant factor inhibiting a political union, and therefore a fiscal union, in the EU. In the United States, by contrast, most people will unabashedly say that they consider themselves to be first and foremost Americans.
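Returning to the bootstrap problem for languages such as Esperanto, the argument can be made concrete with a toy payoff calculation. The sketch below is purely illustrative: the functions `net_benefit` and `break_even_speakers`, the linear benefit model and every number used are assumptions made for the example, not estimates from any study.

```python
import math


def net_benefit(other_speakers: int,
                value_per_speaker: float,
                learning_cost: float) -> float:
    """Toy payoff to one person of learning a language.

    Assumption: the benefit grows linearly with the number of other
    people who already speak the language.
    """
    return other_speakers * value_per_speaker - learning_cost


def break_even_speakers(value_per_speaker: float,
                        learning_cost: float) -> int:
    """Smallest number of other speakers at which learning stops losing."""
    return math.ceil(learning_cost / value_per_speaker)


# Hypothetical units: cost and value are both measured in hours.
COST = 1000.0   # assumed time needed to learn the language
VALUE = 0.5     # assumed value gained per additional speaker

print(net_benefit(100, VALUE, COST))        # few speakers: a loss
print(net_benefit(1_000_000, VALUE, COST))  # many speakers: a gain
print(break_even_speakers(VALUE, COST))
```

Under these assumed numbers an early learner loses time, while the same decision pays off handsomely once enough other people have made it, which is the bootstrap problem in miniature.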

The difficulty of natural languages, as opposed to artificial languages, is that they are constantly evolving, both in grammar and in lexicon. Words such as "smartphone" did not exist 50 years ago, and 100 years ago the word "computer" would have referred to a human performing manual calculations. Similarly, it would be almost impossible to describe the entirety of English, with all its nuances, with a formal grammar; it would, however, be possible to describe its general structure with one. The idea of defining natural languages with formal grammars is not new. The ancient grammarian Pāṇini used rewriting rules akin to those in modern generative grammars to describe the Sanskrit language, and was a source of ideas for Noam Chomsky, who played a significant role in the development of the theory of formal languages.
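To illustrate what describing the general structure of a language with rewriting rules looks like, here is a minimal sketch. The grammar, its symbol names (`S`, `NP`, `VP`, `N`, `V`) and its five-word vocabulary are hypothetical and chosen only for the example; they cover a tiny fragment of English. Because this grammar has no recursive rules, the set of sentences it generates is finite and can be listed in full.

```python
from itertools import product

# Hypothetical toy grammar. Each nonterminal maps to a list of
# alternatives; each alternative is a sequence of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["programmer"], ["language"]],
    "V": [["learns"], ["evolves"]],
}


def derive(symbol):
    """Return every list of terminal words derivable from `symbol`."""
    if symbol not in GRAMMAR:
        # A symbol with no rewriting rule is a terminal word.
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        # Combine every derivation of each symbol on the right-hand side.
        parts = [derive(part) for part in production]
        for combination in product(*parts):
            results.append([word for words in combination for word in words])
    return results


sentences = [" ".join(words) for words in derive("S")]
for sentence in sentences:
    print(sentence)
```

The output includes grammatical but odd sentences such as "the language learns the programmer", which hints at why a grammar captures structure far more easily than nuance.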

Modern programming languages, as well as other languages used in computing, such as data-encoding languages like JSON and markup languages like HTML, are all examples of formal languages. Programmers benefit in that programming languages can be specified exactly as a set of rewriting rules. There are some exceptions, such as C++: due to template metaprogramming, the set of C++ programs that compile cannot be specified with a context-free grammar (deciding it runs into the infamous Halting Problem) [2]; the surface syntax of C++, however, can still be described with a formal grammar. Being formal languages makes them significantly easier to learn. Because the primary cost of programming languages is training, new programming languages tend to share the syntax of previous ones. C++, Java, JavaScript and C# are all examples of programming languages that use C-like syntax, sharing the same syntax for constructs such as selection and iteration blocks, as well as a semicolon after every statement (although in JavaScript the semicolon is usually not technically necessary, just conventional). The same is true of other language families, such as Haskell, Agda and Idris.

Generally speaking, switching costs have led firms to use languages such as Java and C, which are already widely used. They want languages with a proven track record. Being backed by a large company also helps, for instance: Java (Oracle), C# (Microsoft), Go (Google) and Swift (Apple). It has also meant that JavaScript, a language which many, myself included, do not hold in high regard, has become almost the lingua franca of web programming, and is also popular on the server side. There have been languages that compile down to JavaScript, such as CoffeeScript, but in general these have not been very successful. People do not want to have to learn another language to contribute to an open-source JavaScript library. The exception is TypeScript, which has made some headway. Part of this is that TypeScript is backed by Microsoft and introduces static typing to the language, the lack of which is why I dislike JavaScript. Comparisons of programming languages often turn into religious wars, which is not surprising given that learning a language is an investment, and you wouldn't want to think that your investment is wasted. New languages that gain popularity tend to have some form of compatibility with existing languages. For instance, Haskell has an FFI (foreign function interface) with the C language, and Clojure and Scala run on the JVM (Java Virtual Machine).

Of course, in programming, unlike in the real world, it would be virtually impossible to have one programming language that is universal. For one, if that were the case, it would have to be an assembly language, given the absence of high-level language computer architectures, which so far have not been successful in the free market [3]. This, however, would still not replace JavaScript unless, of course, you emulated such code in JavaScript. Even ignoring JavaScript, this still assumes we live in a world with a single processor architecture. The need for different programming languages is a result of different application domains. Higher-level programs typically rely on garbage collection, whereas a kernel needs very fine control over where data is stored in memory. Some programs are not performance-critical, but others, especially games, want to maximise efficiency.

The term domain-specific language (DSL) refers to a language with a well-defined, highly specific application domain. Often, when producing custom software for specific clients, software houses will create domain-specific languages, since the cost of learning the new language is offset by the increase in productivity for the end-users. Examples of widely used DSLs are MATLAB and Julia, used for scientific programming, and SQL, used to interact with databases. Some languages allow domain-specific languages to be implemented within the host language itself, although I tend not to do this, to avoid the cost of learning another domain-specific language.
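The idea of implementing a domain-specific language inside an existing language can be sketched as follows. The `Query` class and its `select`, `where` and `to_sql` methods are hypothetical names invented for this illustration and do not belong to any real library; the table and column names are likewise made up.

```python
class Query:
    """Hypothetical embedded DSL for building a simple SELECT statement."""

    def __init__(self, table):
        self._table = table
        self._columns = ["*"]      # default: select every column
        self._conditions = []

    def select(self, *columns):
        self._columns = list(columns)
        return self                # returning self allows method chaining

    def where(self, condition):
        self._conditions.append(condition)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        return sql


query = (
    Query("languages")
    .select("name", "year")
    .where("year > 1990")
    .where("typed = 1")
)
print(query.to_sql())
# SELECT name, year FROM languages WHERE year > 1990 AND typed = 1
```

The chained calls read almost like the SQL they produce, but anyone maintaining the code still has to learn the mini-language's vocabulary, which is the learning cost mentioned above. This toy version pastes conditions in as raw strings, so it is not safe for untrusted input.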

When it comes to choosing programming languages, I am strongly in favour of statically typed languages, for a number of reasons. Firstly, statically typed languages provide a guarantee that a program which passes the type checker will not crash at runtime due to type errors. Secondly, a strong type system provides abstractions such as polymorphism. For those new to programming, it is often said that it doesn't matter which language you learn; I think, however, that teaching a typed language as a first language is a good thing. Given the utilitarian desire to learn a well-used language, it may be difficult to find a well-known imperative typed language that is friendly to beginners. Personally, I would use Java or Haskell as first languages. However, for many who do not have a mathematical background, and who would not enjoy the boilerplate of Java, Python is the go-to beginner's language. I think the most important thing for those who seek to become proficient in computer science is to familiarise themselves with languages of different paradigms. More valuable than knowing specific programming languages is the ability to adapt to different contexts. One such curriculum that I would recommend is: low-level programming (C), assembly (x86), object-oriented programming (Java), functional programming (Haskell) and logic programming (Prolog).
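The first argument for static typing can be illustrated with a small sketch of the failure mode it prevents. Python is dynamically typed and does not enforce type annotations at runtime, so the mistake below only surfaces when the offending line executes. The function names `total_length` and `run_unchecked` are hypothetical. A separate static checker such as mypy would be expected to reject the bad call before the program runs, much as the compiler would in Java or Haskell.

```python
from typing import List


def total_length(words: List[str]) -> int:
    """Sum the lengths of all words."""
    return sum(len(word) for word in words)


def run_unchecked() -> str:
    """Call total_length with a bad argument and report what happens."""
    try:
        # 42 is not a string, but the annotation is not enforced at
        # runtime, so nothing stops this call until len(42) is evaluated.
        total_length(["smartphone", 42])
    except TypeError as error:
        return f"runtime failure: {type(error).__name__}"
    return "no failure"


print(total_length(["smartphone", "computer"]))  # well-typed call
print(run_unchecked())                           # ill-typed call
```

In a statically typed language the second call would be rejected at compile time, so the error-handling branch would never be needed.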

  1. Half of the world's languages will disappear by 2100

  2. Is C++ context-free or context-sensitive?

  3. Intel iAPX 432