Thursday, October 17, 2013

A simple interpreter from scratch in Python

http://www.jayconrod.com/posts/37/a-simple-interpreter-from-scratch-in-python-part-1

Structure of the interpreter

The core of the interpreter is the intermediate representation (IR). This is how we represent IMP programs in memory. Since IMP is such a simple language, the intermediate representation will correspond directly to the syntax of the language; there will be a class for each kind of expression or statement. In a more complicated language, you would want not only a syntactic representation but also a semantic representation which is easier to analyze or execute.
The interpreter will execute in three stages:
  1. Split characters in the source code into tokens
  2. Organize the tokens into an abstract syntax tree (AST). The AST is our intermediate representation.
  3. Evaluate the AST and print the state at the end
The process of splitting characters into tokens is called lexing and is performed by a lexer. Tokens are short, easily digestible strings that contain the most basic parts of the program such as numbers, identifiers, keywords, and operators. The lexer will drop whitespace and comments, since they are ignored by the interpreter.

The process of organizing tokens into an abstract syntax tree (AST) is called parsing. The parser extracts the structure of the program into a form we can evaluate.

The process of actually executing the parsed AST is called evaluation. This is actually the simplest part of the interpreter.
This article will focus solely on the lexer. We will write a generic lexer library, then use it to create a lexer for IMP. The next articles will focus on the parser and the evaluator.

(How to Write a (Lisp) Interpreter (in Python))

http://norvig.com/lispy.html


What A Language Interpreter Does

A language interpreter has two parts:
  1. Parsing: The parsing component takes an input program in the form of a sequence of characters, verifies it according to thesyntactic rules of the language, and translates the program into an internal representation. In a simple interpreter the internal representation is a tree structure that closely mirrors the nested structure of statements or expressions in the program. In a language translator called a compiler the internal representation is a sequence of instructions that can be directly executed by the computer. As Steve Yegge said"If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 scenarios that can be solved with compilers (or equally with interpreters, or alternatively with Yegge's typical heavy dosage of cynicism.) The Lispy parser is implemented with the function parse.
  2. Execution: The internal representation is then processed according to the semantic rules of the language, thereby carrying out the computation. Execution is implemented with the function eval (note this shadows Python's builtin function).
Here is a picture of the interpretation process and an interactive session showing how parse and eval operate on a short program:

>> program = "(begin (define r 3) (* 3.141592653 (* r r)))"
>>> parse(program)
['begin', ['define', 'r', 3], ['*', 3.141592653, ['*', 'r', 'r']]]
>>> eval(parse(program))
28.274333877
We're using here the simplest possible internal representation, one where Scheme lists, numbers, and symbols are represented as Python lists, numbers, and strings, respectively.


Programming Community Index for October 2013

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Position
Oct 2013
Position
Oct 2012
Delta in PositionProgramming LanguageRatings
Oct 2013
Delta
Oct 2012
Status
11C17.246%-2.58%  A
22Java16.107%-1.09%  A
33Objective-C8.992%-0.49%  A
44C++8.664%-0.60%  A
56PHP6.094%+0.43%  A
65C#5.718%-0.81%  A
77(Visual) Basic4.819%-0.30%  A
88Python3.107%-0.79%  A
923Transact-SQL2.621%+2.13%  A
1011JavaScript2.038%+0.78%  A
1118Visual Basic .NET1.933%+1.33%  A
129Perl1.607%-0.52%  A
1310Ruby1.246%-0.56%  A
1414Pascal0.753%-0.09%  A
1517PL/SQL0.730%+0.10%  A
1613Lisp0.725%-0.22%  A
1712Delphi/Object Pascal0.701%-0.40%  A
1853Groovy0.658%+0.53%  B
1919MATLAB0.614%+0.02%  B
2026COBOL0.599%+0.15%  B