EC102, Lab 1, Beginning Python

[This is the first in a series of lecture notes for the lab component of the core ‘Macroeconomics I’ course that I teach in the M.A. Economics programme at Ambedkar University, Delhi.]

“Show, don’t tell” is advice often given to authors. In that spirit, rather than telling you a lot about what programming is and how it will be useful to you, I will try to quickly teach you enough so that you can begin to solve interesting economic problems on your own.

Still, I must say a little about two things. First, the scope of this course. A standard ‘introduction to programming’ course is a full-time course that runs for at least a semester. All we have, on the other hand, is about 12 hours of lab time. So I will not try to teach you programming in a general way. Instead, I will focus on the basic skills that will allow you to combine already existing software to solve numerical and data analysis problems from economics. You will have to study much more in case you find yourself working in other domains, particularly where there isn’t much existing software and more things have to be developed from first principles. I will also not try to teach you any techniques and tools that are needed for developing large software projects in large teams. If you ever find yourself writing a program larger than a hundred lines or in collaboration with other people, you will need more preparation than this course provides.

Secondly, I must explain why we are going to use Python rather than some other programming language. My choice was motivated by three considerations:

Python is free software, so you won’t have to buy a license to use it on your own machine or in a future workplace.
Python is currently extremely popular in scientific programming and data analysis. It is a skill that will be useful for many possible career paths, and there is a lot of existing software and educational material.
It is possible to become productive in Python quickly without getting bogged down in too many arcane technical issues.

On the other hand you should not become a technology fanatic. Other languages like C++, Fortran and Matlab have their own strong points and for each there are many economists using them productively. It is more important to understand the conceptual ideas in computing, instead of getting bogged down in the details of a particular language.

Background

Python language, versions 2 and 3

Python is a programming language: a notation for writing instructions for computers. Like algebraic notation or musical notation, it is a set of rules for combining symbols as well as a way of interpreting sequences of symbols written according to these rules.

Over the years the rules that make up the Python language have been changed from time to time. Different versions of the language are identified by a version number of the form x.y where x is the major version number while y is the minor version number. Significant changes in the language are denoted by a change in the major number while small changes are denoted by a change in the minor number.

Currently both the 2.y series and 3.y series of Python versions are in common use, but all further improvements to the language will now happen only in the 3.y series. We have therefore chosen to use the latter in our course. The differences between the two series are documented here, though this document will make sense only after you learn more about the language.

Fortunately, for most of the work that we do the difference between the two series will not be significant, and most code we write will work the same under both versions of the language.
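If you are ever unsure which version of the language you are running, you can ask Python itself. The following is a minimal sketch that uses the sys module from the standard library. It relies on an import statement, which is explained in the section on modules below, and on the print function, which simply displays its arguments. The names major and minor are my own choices for illustration.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major = sys.version_info.major   # the x in x.y
minor = sys.version_info.minor   # the y in x.y

print("Running Python %d.%d" % (major, minor))
```

For this course the first number printed should be 3.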

Python, IPython, Anaconda

One reason why people write instructions for computers is in the hopes of having a computer actually execute them. An interpreter is a piece of software which takes programs written in a particular programming language and executes them on a computer. There are a number of interpreters for the Python language, though the most popular is the CPython interpreter that you can download from the Python language site. This interpreter is free software: you are free to copy, distribute and modify the software.

One way to solve a programming problem is to encode the entire solution in a programming language in one go and then use an interpreter to execute the resulting program. You can do this with Python. But Python also supports an interactive way of developing programs in which you code up little bits of your solution, try them out and then combine them into the full solution.

IPython is a suite of software that supplements the Python interpreter in various ways to make it easy to develop programs interactively.

Programming in Python is productive because we get to use large packages of ready-made subprograms for different tasks such as linear algebra, graph plotting or statistics. Some of these subprograms are bundled with the Python interpreter itself in what is called the standard library, but many other packages are written by third-parties and the user is responsible for finding and installing what they need. This can become cumbersome very soon. Anaconda is a bundle of software that includes a Python interpreter, IPython as well as a large collection of third-party packages that are useful for scientific computing and data analysis. Installing Anaconda makes all of them available in a single step.

For this course I will assume that you have downloaded and installed Anaconda and are using Microsoft Windows. Please discuss with me if you are using (Mac) OS X or Linux — there will only be minor differences.

Please ensure that you have installed the version of Anaconda that uses Python 3.x. The download page by default offers the version with Python 2.x but a link on that page allows you to choose the 3.x series. (Anaconda and IPython also have version numbers of their own. Do not confuse them with the version of the underlying Python language.)

The IPython REPL

Select Anaconda and then IPython Qtconsole from the Windows start menu (the menu you get from clicking on the Windows icon at the bottom left of your screen). After a short wait a new window should open with text similar to this

Python 3.4.3+ (default, Jun  2 2015, 14:09:35) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
%guiref   -> A brief reference about the graphical user interface.

In [1]:

The first line shows the Python version. We confirm that it is 3.y. The last line In [1] is a prompt which shows that the system is waiting for some input from you. Let’s provide it with some input. Type 2+3 and press Enter. You should get

Out[1]: 5

In [2]:

The IPython Qtconsole operates what is known as a Read-Eval-Print Loop. It reads in some Python code that you provide, evaluates it and then prints out the result. Then it loops back to the read phase.

In this case the code 2+3 we provided was an arithmetic expression. In Python numbers are expressed in the usual ways (2, 2.5, -2) and the standard arithmetic operators are available: + and - for addition and subtraction, * and / for multiplication and division, ** for exponentiation and ( and ) for grouping. Evaluation for arithmetic expressions just amounts to carrying out the arithmetic. The line

Out[1]: 5

shows the result of the evaluation. The next line

In [2]:

shows that IPython is once again ready to read some more input. The numbers in brackets after In or Out have a significance which we will discuss in a little while.

Let’s try some more arithmetic:

In [2]: 1+3*2
Out[2]: 7

In [3]: (1+2**(1-1))*3
Out[3]: 6

In [4]: 2/3
Out[4]: 0.6666666666666666

Notice that the usual precedence rules of arithmetic hold: exponentiation comes before multiplication and division, which in turn come before addition and subtraction. For multiple levels of grouping we use multiple pairs of parentheses (). Braces {} and brackets [] have other uses in Python.

If the result you got for 2/3 was 0 then you are using Python version 2 rather than version 3. Internally Python differentiates between integers (numbers without a fractional part) and floating-point numbers (numbers which have a fractional part). In Python 2 the result of dividing one integer by another was the quotient from integer division with the remainder thrown away. In Python 3 it is the floating-point result that includes a fractional part. This is one of the major differences between Python 2 and 3 and a source of mysterious errors if you are using Python 2. It’s best if you switch to Python 3. If you really want integer division, Python 3 provides the operator //. So 2//3 evaluates as 0.
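The difference between the two division operators is easiest to see side by side. Here is a small sketch, assuming you are running Python 3. I use the print function so that every result is displayed when the lines are run together; at the IPython prompt you could equally type each expression on its own.

```python
# True division: the result is a floating-point number in Python 3.
print(7 / 2)     # 3.5

# Even when the division is exact, / still produces a floating-point number.
print(6 / 3)     # 2.0

# Floor division: the result is rounded down to an integer.
print(7 // 2)    # 3

# "Rounded down" means towards minus infinity, which matters for negatives.
print(-7 // 2)   # -4
```

The last line is worth remembering: for negative numbers // does not simply chop off the fractional part.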

Working with the REPL

The read-eval-print loop is endless. To stop it, press Ctrl+D, i.e. hold down one of the keys labelled Ctrl and then press D.

IPython stores a history of your inputs. You can use the up and down arrow keys to recall older inputs, edit them and then press enter to resubmit.

IPython has many more features. We will introduce them as we go along, but you can enter %quickref or %guiref at the IPython prompt to see a summary anytime. Commands beginning with the % sign are specific to the IPython software. They are not part of the Python language itself.

Python Basics

Objects and Names

Data in Python is organised in the form of objects. So when you type in 2 at the IPython prompt you are asking Python to build up an object which represents the integer 2. When you type 2+3, Python first creates two objects to represent 2 and 3 respectively and then applies the operation + to them to form a new object. Since + is the arithmetic addition operation the resulting object represents the integer 5. The print part of IPython REPL then prints out this object in a human-readable form.

Each object has a type. 2 and 3 are of the integer type, while 2.0 and -3.5 are of the floating-point type, which is used to represent numbers that may have a fractional part. Sometimes you will also see floating-point numbers in the form 3.5e12, which is the way Python writes out 3.5 × 10¹². If you want you can read e as ‘exponent’.
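You can ask Python for the type of any object using the built-in function type. It is not discussed elsewhere in these notes, and functions in general are covered a little later, so treat the following only as a quick illustration (again assuming Python 3).

```python
# Integers have the type int.
print(type(2))        # <class 'int'>

# Numbers written with a decimal point have the type float.
print(type(-3.5))     # <class 'float'>
print(type(2.0))      # <class 'float'>

# The e notation is just another way of writing a floating-point number.
print(3.5e12 == 3.5 * 10**12)   # True
```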

The Python language and standard library deal with objects of myriad types. There are complex numbers, text, files, graphs, sets, vectors and many more. Programmers can also define their own types, so there is actually an endless universe of types.

Types are important because they determine what operations can be applied to objects and how the operations behave. For example, division is an operation which makes sense for numbers but not for sets, whereas intersection is an operation which makes sense for sets but not for numbers. To take another example, Python provides the + operation for both numbers and text strings, but it means arithmetic addition for numbers and pasting together for text strings.
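The examples in the paragraph above can be tried out directly. The sketch below uses two pieces of notation that these notes have not introduced: text strings written between double quotes and sets written between braces, with & as the intersection operator. The particular values are my own and have no special significance.

```python
# For numbers, + means arithmetic addition.
print(2 + 3)                      # 5

# For text strings, + means pasting the strings together.
print("macro" + "economics")      # macroeconomics

# Intersection makes sense for sets: it keeps the common elements.
common = {1, 2, 3} & {2, 3, 4}
print(common == {2, 3})           # True
```

Trying to divide one set by another, on the other hand, makes Python complain with an error, because division is not defined for the set type.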

So far we have referred to objects by giving an explicit description such as 2. Any objects that Python built for us, such as the result of 2+3 have been printed and then lost to us. But as we go further we will need to keep track of intermediate results and build further on them. Python provides for this by allowing us to bind names to objects using assignment statements. Try out the following in IPython (from now on we will suppress the In and Out prompts):

x = 10+1

You will see no output. What has happened is that the name x has been bound to the object created by the expression 10+1. If at some later point you give the input

x * 5

you will get the output 55 just as if you had typed 11 instead of x.

The spaces in the assignment statements are optional, but putting them in makes your code more readable. Names (called identifiers in programmer’s jargon) can contain uppercase letters, lowercase letters, digits and underscores (_) but must begin with a letter or underscore. x123, TFP, capital_stock, _z are all valid identifiers; 123x or x^2 are not. It is good practice to use short identifiers like x, t or i for intermediate objects that have no overall significance and longer meaningful names like labour_supply for those whose values have meaning in the context of the problem.
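If you are unsure whether a name is acceptable, Python 3 can check it for you. Text strings have a method called isidentifier which reports whether the text follows the rules for identifiers. Methods are not covered in this lab, so read the following only as a convenient check. One caveat: it also answers True for reserved words of the language such as for, which nevertheless cannot be used as names.

```python
# These follow the rules: letters, digits and underscores,
# beginning with a letter or an underscore.
print("x123".isidentifier())            # True
print("capital_stock".isidentifier())   # True
print("_z".isidentifier())              # True

# These break the rules: one begins with a digit,
# the other contains the character ^.
print("123x".isidentifier())            # False
print("x^2".isidentifier())             # False
```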

If a name which has already been used in an assignment statement is used again, then the old binding of the name is broken and the name is bound to the object produced by the right-hand side of the later assignment. So for example

x = 1
x = 2
2 * x

produces the output 4. What happens if you use a name on both the left- and right-hand side of an assignment?

x = 1
x = x+1
x

The result is 2. The rule is that the right-hand side of an assignment is evaluated first, using whatever bindings exist at that point and then the resulting object is bound to the name on the left.

You must be careful not to confuse assignment statements with algebraic equations. Python will complain if you type

x+1 = 5

since x+1 is not an identifier. An assignment statement does not specify equality between two things. It just binds a name on the left to an object on the right.

Another confusion between assignment statements and algebraic equations may happen if you are used to spreadsheets. Suppose you type

y = 1
x = y + 2
y = 2
x

You will get the output 3 and not 4. In the second line, x = y + 2 evaluates the right-hand side using the bindings that exist at that time and hence binds x to an object representing the integer 3. When y is re-bound to 2 in the third line it does not cause automatic re-evaluation of any previous bindings in which y might have been used. The identifier x remains bound to the object representing 3.

You can type %who at the IPython prompt to see a list of bindings you have defined in that session.

Functions

We have already seen how to build new objects out of old by using arithmetic operators like +. Python has a handful of operators for standard arithmetic and logical operations. The more general way of specifying computations is the use of functions. For example, Python has a built-in function abs which computes the absolute value of a number. So to find out the absolute value of -2 we type

abs(-2)

which prints the result 2. An expression like abs(-2) is called a function call. We say that we are calling the function abs with the parameter or argument -2. Somewhere the function abs is defined in terms of code which computes the absolute value in terms of more elementary steps. When the interpreter encounters an expression like abs(-2) it starts executing that code using -2 as an input, and the result produced by this code is provided to us as the result of evaluating the function call.

Function calls can be used as a part of larger expressions and the parameters can themselves be expressions. In the latter case the parameters are evaluated first and the result is then passed on to the function. Try:

abs(-2) - abs(-3)
abs(7-10)
abs(abs(7-10))

If you have been trying out the examples above you will have found that as soon as you type in abs and the opening parenthesis, IPython displays a hint describing the function and the parameters it takes. You can get the same information by typing the name of the function preceded or followed by a ?, like ?abs or abs?. This is another feature provided by IPython which is not provided by the basic Python interpreter.

Modules

When programs are put together from independently developed components, name clashes emerge as a problem. A linear algebra library and a differential equation library may both provide a solve function. If we try to combine the two libraries in the same program, then which of these functions does solve refer to?

Python’s solution to this problem is modules. Modules are bundles of names, including names of functions and data objects. For example, the Python standard library has a math module which includes trigonometric functions like sin and cos and constants like pi, among other things. The names in this module are not accessible by default to a Python program. So in a new IPython session if you enter

pi

Python will complain that the name pi is not defined. To make names in a module accessible we use what is known as an import statement. This has the form import [module name]. So to import the math module we type in

import math

This loads up the definitions in the math module but makes only the name math accessible in the session. To access names defined in the module we need to type math followed by a dot (the period or full-stop) followed by the name, thus:

math.pi
math.sin(0)
math.sin(math.pi/2)

This solves the name clash problem. If linalg is a hypothetical linear algebra module and ode is a hypothetical differential equations module, both of which define a function solve, we can refer unambiguously to linalg.solve or ode.solve.
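A real instance of the same situation can be found in the standard library itself. Both the math module and the cmath module (which works with complex numbers) define a function called sqrt. The sketch below shows that the module prefix keeps the two apart. The names real_root and complex_root are my own.

```python
import math
import cmath

# math.sqrt works with ordinary floating-point numbers.
real_root = math.sqrt(4)
print(real_root)        # 2.0

# cmath.sqrt works with complex numbers, so it can handle -1.
# Python writes the imaginary unit as j.
complex_root = cmath.sqrt(-1)
print(complex_root)     # 1j
```

Because each call names its module, there is never any doubt about which sqrt is meant.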

A few variants of the import statement are provided for additional convenience. Often modules have long names. Using the keyword as we can give a module a more convenient name. So the statement

import math as m

imports the math module but binds it to the name m so that we can call the sin function with

m.sin(m.pi/2)

Using the from keyword we can import a name into the global namespace so we no longer need to use the module name to refer to it. So after

from math import sin, cos

we can refer to the sin and cos functions from the math module as sin and cos without having to prefix them with the module name. Finally

from math import *

imports all names from the math module into the global namespace so that all of them can be referred to without any prefix.

You should avoid using the from variant of the import statement since by adding everything to the common global namespace it defeats the whole purpose of preventing name clashes. Also a module prefix like math. is helpful to human readers of your programs, letting them see at a glance which module a name comes from.
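To see how the clash comes back, consider again the math and cmath modules from the standard library, both of which define a function called sqrt. In the following sketch the second import silently re-binds the name sqrt, exactly as a second assignment re-binds a variable. The names first and second are my own.

```python
from math import sqrt

# At this point sqrt is the function from the math module.
first = sqrt(4)
print(first)     # 2.0

from cmath import sqrt

# Now the same name refers to the function from the cmath module,
# and the same call gives a complex number instead.
second = sqrt(4)
print(second)    # (2+0j)
```

Nothing warns you that the meaning of sqrt has changed between the two calls. With import math and import cmath you would have written math.sqrt and cmath.sqrt, and the difference would have been visible on the page.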