<< Home

Dmitri Nosovicki

Why GOG?

GOG allows to write concise and powerful programs that are easy to understand. As you will see, GOG expressions can be packed with meaning. But, perhaps, more important is that GOG allows closer investingation of alternative programming styles, which is absolutely required for people-centric experiments with programming languages. GOG provides:

  • Arbitrary precision, rational, and complex numbers
  • JiT compilatioon & GC
  • First class functions with tacit definition syntax
  • Polynotational semantics
  • Syntactic macros
  • Lexemes
  • Objects
  • Threads
  • Hash tables
  • MySQL support
  • HTTP authorization and transaction server
  • Support for external libraries and system calls
  • etc., etc., etc.

Polynotational Semantics

The idea of polynotational semantics is based on an observation that consecutive nature of written media limits the ways in which we can combine semantic units: variables can either follow the operator, or precede it, or be on both sides of it. This reduces possible number of basic notations, of which in widespread use are 3:

NotationPatternNatural Domain
Prefixoperator var1 var2 var3 ...func(arg1, arg2, arg3)
Infixvar1 operator1 var2 operator2 var3 ...x = 2 + 2 * 3
Postfixvar operator1 operator2 operator3 ...ls | rev | tac | split
*NB. When we consider infix notation, we must bear in mind that its real-world variations vary. One is Infix Notation with Precedence Rules (IPR), the other is Polish / Reverse Polish Notation (PN). To reduce parentheses, IPR uses precedence rules, while PN uses argument accumulation. from those, GOG implements only IPR because of its much wider spread.

Each notation has its strengths, and GOG allows any of them. Importantly, GOG understands what you mean without additional syntactic clues. For example:

λ + 1 2 3
		6
		λ 1 + 2 + 3
		6
		λ 1234 string len sqrt
		2
		

This reduces language syntax to just alphanumeric keys and (), and you can write any program with just these. However on top of this minimalistic base GOG builds features that do utilize spared keyboard range, such as lexical rules. Lexical rules mean that, similarly to natural languages, GOG assigns meanings to words taking in account their lexical structure. For example, you create (lambda (uniqueVar) (map myFunc uniqueVar)) from myFunc just by writing @myFunc, where @ is 'map' lexeme.

Lexemes

Lexemes modify meaning of functions. Similar semantic units are called adverbs in tacit programming, Technically, Lexemes are just higher-order functions that are promoted to a part of the language. Because of that you can use lexemes with any expression that evaluates to a function, including lambdas and subexpressions that return functions, for example: &[X + 2], $(select-function function-list).

Let's look at how lexemes influence function behavior.

λ x = '((1 2 3) (4 5 6) (7 8 9))
		        ((1 2 3) (4 5 6) (7 8 9))

		λ $+ x
		(1 2 3 4 5 6 7 8 9)  ;; Flatten

		λ @$+ x
		(6 15 24) ;; Sum rows

		λ $@+ x
		(12 15 18) ;; Sum columns

		λ $@&+ x
		((1 4 7) (2 5 8) (3 6 9)) ;; Transpose

@ = map; $ = apply; & = collect

* Note that in the last example + works as identity function.

Tacit function definition

Instead of

(lambda (a b) (a + b))

you can write

[A + B]

To distinguish lambda arguments from other names you start them with capital letters. You can use any notation inside lambdas, as well as nest lambda functions infinitely:

Nested tacit definition

[map [*Elem / Num] List]

GOG understands as:

(lambda (List Num) (map (lambda (Elem) (Elem / Num)) List))

Asterisk at the start of Elem informs that it is a parameter of the inner lambda. The deeper inside resides your argument, the more asterisks you add. Note that List goes before Num in automatically created argument list. It is because auto-guessed lambda parameters are always assumed to follow alphabetic order.

Example Programs

Sum square difference

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of their sum, i. e. (1 + 2 + ... 100)^2 - (1^2 + 2^2 + ... 100^2)

Prefix Notation:

λ (= x (range 1 100))
		λ (- (** ($+ x) 2) ($+ (@**_2 x)))

Infix Notation:

λ x = 1 range 100
		λ ($+ x) ** 2 - ($+ (@**_2 x))

Postfix Notation:

λ 1 range_100 !x @**_2 $+ [x $+ **_2 -_X]

Explanation of the last line: Postfix notation passes single value from one function to another, so every token must evaluate to a function of one argument. Let's trace this expression: 1 is the initial value. range_100 is a modified function range, to which by syntax _ was attached argument 100. Function xx_yy equals to [xx X yy], i. e. (lambda (x) (xx x yy)). As you've guessed, range returns a list of increasing integers. ! is a lexeme that takes any name and makes a function that saves its argument to a variable with that name. So !x assigns obtained list to variable x, which we will use later. @**_2 looks complex, but it is just a modified exponentiation function **. **_2 means "square", while @ is the "mapping" lexeme. It makes **_2 map on our range list, squaring all its values. $ is a "reduce" lexeme. $+ reduces list by summation. Finally, the last subexpression is a lambda. [ and ] denote a lambda function. GOG automatically guesses which variables inside a lambda are parameters by a very simple rule: parameters start with a capital letter.

Let's take a closer look at the last function. It starts with [ and ends with ]. Inside, there are 3 functions: $+ (reduce), **_2 (square) and -_X (a function that subtracts variable X from its parameter. X is an argument of the lambda function, because it starts with a capital letter. x is the variable which was saved earlier. This lambda reduces x with summation, squares the result, and subtracts its argument from it.

Relation to ARC and LISP:

In contrast to plenty of languages that allow to define words, there are few languages that allow to define syntax. Lisp is one of those few. Lisp's macros are a mechanism to extend any program with more or less custom syntax. But macros are ineffective at capturing small repetitive patterns. To illustrate, a call to macro takes 3+ characters, while better syntax is the one that saves one or two characters. That is why great macros can't fully undo ineffective core syntax. Lisp core syntax is notoriously limited. It features prefix notation -- a choice to simplify processing, not reading or writing. It can easily be proven that prefix notation is suboptimal for a number of programming idioms. But same can be proven for any other notation as well. Of course, the problem is not a particular notation, but that it is impossible to switch notations at will. In case of Lisp, the arbitrary choice in favor of one simple notation versus all others taints this otherwise perfectly symmetric language. GOG is an attempt at fixing that disproportion.

GOG is based on ARC, which is a brilliantly clear, powerful and very practical functional language. As a result, GOG has all data types and features of Arc: macros, symbols, etc. You could safely say that most heavy lifting for GOG was made by its predecessors, and that GOG is just a continuation of Arc's ideas. Basically, GOG extends Arc with polynotational semantics, adds multivariable nested lamdas, and introduces a lexeme-based algebra for list operations. Furthermore, the initial idea was just to add polynotational semantics to arc as a patch. It worked, but soon appeared other features that were incompatible.

Backward compatibility

Gog is mostly backward compatible with the arc, in sense that traditional arc expressions are recognized as prefix notation, except the following cases:

  • You can not use a procedure or macro object as a data-call argument. It would classify as some unknown notation.
  • Same refers to specifying more than one argument during data call (arc tables support second optional argument).

Execution speed

I did no optimization whatsoever, as I wanted to keep this highly experimental code as straightforward as possible. That said, the overhead is generally insignificant. First attempt to determine notation is made at compile time. That effectively minimizes run-time overhead, especially for prefix notation. Thus, during arc loading, 10839 of 10318 examined expressions (~95%) recognise as prefix notation at compile time. The rest requires run-time notation discovery, which should result in slight overhead for traditional expressions, and yet somewhat more substantial one for infix and postfix notations. Nevertheless, I was not able to notice any overhead using simple benchmarks. This topic might require further investigation, but, as far as I can see, the execution overhead is low.

Precedence rules

It would be a shame to introduce infix notation to a Lisp without precedence rules. GOG uses same precedence as C language does. New functions get precedence by association. Precedence order may be added or changed dynamically for any primitive at any time. See ac.scm comments for details, including on how to adjust the precedence order of any function.