Screencasts for Python programming for economists

This page contains the screencasts for the course Python programming for economists. Note that the screencasts were developed when the course was called "Applied Economic Analysis". The content of that course is the same as that of the renamed course.

Table of Contents

How are the screencasts supposed to help?

We use this notebook for the course. The notebook is fairly self-explanatory, but you may get stuck with some of the questions. The screencasts are meant to help you along if you do not know how to proceed.

For students without a programming background (like most economics students) it can be intimidating or frustrating to start typing code. Reasons include not knowing how to do it perfectly or getting scared when there is an error message. Obviously, this is neither necessary nor helpful when trying to learn a programming language. But the upshot is that some students do not try anything: they sit back and type in the correct answers when these are given to them. Needless to say, you do not learn how to program in Python by typing in the correct answers given in class.

Hence, the screencasts give you hints on how to finish the notebook but they do not give you step-by-step answers for each question. If you get stuck, ask us during the tutorials.

Suggested use of the screencasts

Just watching the screencasts is not going to be very useful for you. The best way to use them is as follows:

  • first read the relevant section in the notebook and try to solve the questions by yourself
  • if you manage to do so, you are done and there is no need to watch the screencast; if you do not manage to answer all the questions, you have the background for the screencast (which is rather short and does not motivate the underlying economic problem) and you know what issues to look for in the video
  • type along with what we do in the screencast. Pause the video if you need more time to type
  • experiment with the code that we give you in the video either during the video or after watching it; try different numerical values, use different syntax or a different library; if you get an error, google it to see what is going on etc.
  • in general, play around with the Python code; that is the best way to understand and learn the language

When you get an error message, the tips below can help you deal with it. Also note that the screencasts are typed "live", so there will be error messages in the videos as well and you can see how we deal with these as we go along.

Python error messages:

  • start reading the error message at the end; hence scroll down in your notebook to see the last line of the error message. This will usually give a clue of what is happening
  • if you do not know what the last line means, use copy/paste to google it or go to stackoverflow directly
  • reading "upwards" from the last line, you can trace how the error was generated; that is, which parts of your code caused the error.
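
To make the first tip concrete, here is a minimal sketch. The function mean_price and the empty list are made up for illustration. The code triggers an error on purpose and prints only the last line of the error message, which names the type of error and the reason for it.

```python
import traceback

def mean_price(prices):
    # hypothetical helper: fails when the list of prices is empty
    return sum(prices) / len(prices)

try:
    mean_price([])  # len([]) equals 0, so the division raises an error
except ZeroDivisionError:
    # format_exc() returns the full error message as text;
    # its last line states the error type and the reason
    last_line = traceback.format_exc().strip().splitlines()[-1]
    print(last_line)
```

In a notebook you would normally see the whole message; the lines above the last one show which function calls led to the error.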

Prerequisite knowledge

Economics: this introduction to Python is part of an MSc degree in Economics. Hence, we assume that you know what utility is and how utility (and producer surplus, if relevant) adds up to welfare. You know how demand and supply curves are derived, what market power is and how a Cournot model is solved. Concepts like asymmetric information, adverse selection and moral hazard ring a bell. You know what risk is and how it can be measured. You know what the World Bank is, as later on we will use their data. You have done introductions to statistics and econometrics. Generally speaking, you understand the economics parts of the notebooks, though not necessarily their Python implementation.

Python: before you start with the notebook and the screencasts you need some basic knowledge of Python. You can use Datacamp, Socratica, or introduction videos to Python, numpy etc. from one of the big python conferences, like SciPy 2019 or PyCon 2019 or more recent events.

You should be familiar with the following:

  • the syntax of Python (e.g. the role of spaces);
  • how to import libraries (the difference between import numpy as np, import numpy and from numpy import *);
  • how to define a function;
  • the use of datatypes like lists, tuples, dictionaries and numpy arrays (e.g. what is the difference between adding lists using "+" and adding numpy arrays?);
  • slicing notation for arrays and lists;
  • the use of masks to select data from an array;
  • plotting a function or data points with matplotlib;
  • reading data into a dataframe with pandas and then manipulating this data (defining new columns/variables).
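
As a quick self-check of this prerequisite knowledge, the following sketch (with made-up numbers) contrasts "+" for lists and for numpy arrays and shows slicing and a mask. If every line looks familiar, you are ready to start.

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [10, 20, 30]

list_sum = a_list + b_list                       # "+" concatenates two lists
array_sum = np.array(a_list) + np.array(b_list)  # "+" adds arrays element by element

x = np.arange(10)       # array with the numbers 0, 1, ..., 9
first_three = x[:3]     # slicing: the entries at positions 0, 1 and 2
large = x[x > 6]        # mask: keep the entries for which the condition is True

print(list_sum, array_sum, first_three, large)
```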

Libraries we use regularly

  • numpy: basic number crunching and vector manipulation
  • pymc: to generate random numbers and do Bayesian analysis
  • tensorflow: to generate random numbers; here used mainly to "have seen it"; we use it more in a later course on data science
  • scipy: scientific python
  • matplotlib: to make plots

new version of pymc

The new version of pymc is imported as import pymc as pm; no longer as import pymc3 as pm. This has a number of implications:

  • in the video and notebook we use import pymc3 as pm but this will give an error when you run it on the jupyterlab server; hence use pymc instead of pymc3
  • the syntax for drawing samples from a distribution in pymc has changed:
    • pm.Normal.dist(0,1).random(size=10) will give an error
    • now use the following two lines:
x = pm.Normal.dist(0,1)
pm.draw(x, draws=10)

See this blog for more information.

Alternatively, use the numpy library:

import numpy as np
np.random.normal(0,1,10)

Short introduction to JupyterLab

You can use JupyterLab on the Tilburg University server. But you can also install it locally on your computer using the Anaconda distribution. The basics of the following introduction are the same in both cases (and for google colab).

The video was recorded using an older version of the website; but it will look similar in your year.

The goal of this video is to give you an introduction to JupyterLab; not an introduction to Python. So, do not worry if you do not understand (yet) the Python code that is typed and evaluated.

Topics we cover in this video:

  • using git to "clone" the Python-programming-for-economists repository on JupyterLab;
  • difference between a code cell and a markdown cell in a jupyter notebook;
  • you can use the menu at the top to switch between Markdown and Code;
  • you evaluate both a code cell and a markdown cell by pressing the SHIFT and ENTER keys at the same time;
  • if you want to edit an evaluated markdown cell, go to the cell and press ENTER; or double click on the cell with your mouse;
  • how to create headings (using #) and bullet lists (using *) in markdown;
  • create a link in markdown;
  • how to type math in markdown using \(\LaTeX\) and the delimiters $ $
  • in a code cell, you can type Python code:
    • this can be useful if you want to make notes on your datacamp courses
    • type and evaluate the Python code (from datacamp) and explain what the code does in a markdown cell;
  • if you have long variable or function names, use the TAB key to complete the names;
  • this also works if you want to type functions associated with a library like numpy
    • e.g. type np.ara and then TAB to see the completions;
  • apply the numpy sum function to an array my_list: np.sum(my_list) or my_list.sum()
  • create a plot using matplotlib.pyplot and add labels to the axes, a title to the figure and a legend.
  • when you are finished with a notebook, you can close the file and do not forget to close the kernel as well (see the video on how to do this)
    • if you have too many kernel sessions running, you can run out of memory on the server meaning you cannot evaluate Python code anymore.
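
The last Python topics in this list can be sketched as follows. The numbers and the two plotted functions are arbitrary choices for illustration and are not the ones used in the video. Note that my_list.sum() works here because my_list is a numpy array; a plain Python list has no sum method.

```python
import numpy as np
import matplotlib.pyplot as plt

my_list = np.array([1.0, 2.0, 3.0, 4.0])
total_function = np.sum(my_list)   # numpy's sum function applied to the array
total_method = my_list.sum()       # the same sum, written as a method of the array

x = np.linspace(0, 2, 50)          # 50 points between 0 and 2
fig, ax = plt.subplots()
ax.plot(x, x**2, label=r"$x^2$")
ax.plot(x, np.sqrt(x), label=r"$\sqrt{x}$")
ax.set_xlabel("x")                 # label on the horizontal axis
ax.set_ylabel("f(x)")              # label on the vertical axis
ax.set_title("Two example functions")
ax.legend()                        # uses the label of each line
# in a notebook the figure appears below the cell
```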

Questions you can try before continuing:

  • in a markdown cell, we can create \(\alpha\) by typing the \(\LaTeX\) expression $\alpha$; now in a code cell type \alpha and then press the TAB key; this gives you the variable \(\alpha\);
  • typing in a code cell \(\alpha = 5\) gives this variable \(\alpha\) the value 5;
  • plot the function \(f(x)=x^3\) on the interval \([-1,1]\).

If you want to know more about the use of JupyterLab, there are a number of introductions to JupyterLab on the web. Here is one (start video at 9 minutes): https://www.youtube.com/watch?time_continue=152&v=Gzun8PpyBCo&feature=emb_logo

You can also google "jupyterlab introduction" and the documentation can be useful as well.

getting the repository in colab

If for some reason you would like to use the notebook on google colab (e.g. because the university server is temporarily down), the following video shows how to import the Python-programming-for-economists repository into colab.

Topics we cover in the video:

  • go to google colab; then from the menu: File => Upload notebook
  • in the pop-up window click on the GitHub tab and copy/paste the web address of the github repository that you would like to upload; in the video this is https://github.com/janboone/applied-economics
    • but this repository has been updated to https://github.com/janboone/Python-programming-for-economists
  • if you want to install a new library in colab, e.g. wbdata, type !pip install wbdata
  • install all libraries such that you can run the cell with import statements without errors

Compare jupyter notebook/lab and emacs

Why am I using emacs

As explained above, one of the worries is that students sit back and copy/paste whatever is done in the videos. To force them to make more of an effort, I do not use jupyter notebooks in the screencasts. Hence, a bit more "mental processing" is needed to follow along. This is also the reason that we do not publish the notebooks from the videos. Students need to type along with the video; not copy/paste from the final file.

Further, Emacs makes it easier to give a presentation in the screencast than jupyter notebooks, e.g. by folding sections that are finished and by giving completion on \(\LaTeX\) snippets.

Jupyter vs Emacs

When you see me use Emacs in the videos, you can spot some differences with JupyterLab:

  • evaluating a code cell with Python in jupyter is done by pressing Shift-Enter (that is, press the Shift and Enter keys at the same time); in Emacs press C-c C-c (that is, press Control and C simultaneously, twice)
  • to get help on a function, type e.g. np.arange? in a code cell and evaluate the cell
  • to type text in a jupyter notebook, turn a code-cell into a Markdown cell; in Emacs you can simply type text
  • to create a new code cell in jupyter: press "a" (new cell above) or "b" (new cell below) when you do not have a cursor in the current cell (if you do have a cursor, first press the ESC key); in Emacs (and org-mode version >= 9.2) type C-c C-, and a menu will appear of block types (python, ipython, elisp etc.)
  • use the TAB key to complete function, variable etc. names. Works both in jupyter and Emacs
  • to get greek letters in a code block of a jupyter notebook, use the \(\LaTeX\) expression for the greek letter (without the delimiters $ $) and type TAB; e.g. type \alpha and then press TAB
  • add %matplotlib inline after importing matplotlib.pyplot to get the figures in the notebook/file itself

The market

why do we love the market?

Topics we cover in this video:

  • optimal way to allocate a fixed number of products among a set of consumers
  • use of np.arange to generate a vector of numbers
  • np.random.normal and tf.random.normal to generate a vector of random numbers
  • sort a vector of numbers
  • use slicing to select a subset of entries in a vector \(x\), e.g. x[:5]
  • use format to format the output in a print statement
  • sum entries in a vector
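
A minimal sketch of how these pieces fit together, using numpy only. The numbers of consumers and products, the distribution parameters and the seed are assumptions for illustration; the video may use different values and also uses tensorflow for the random draws.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed chosen arbitrarily
n, m = 10, 4                          # hypothetical numbers of consumers and products

valuations = rng.normal(loc=50, scale=10, size=n)   # one random valuation per consumer

sorted_valuations = np.sort(valuations)[::-1]   # highest valuation first
served = sorted_valuations[:m]                  # slicing: the m highest valuations
welfare = served.sum()

# for comparison: simply give the goods to the first m consumers
welfare_first_m = valuations[:m].sum()

print("Welfare with the {} highest valuations: {:.2f}".format(m, welfare))
print("Welfare with the first {} consumers: {:.2f}".format(m, welfare_first_m))
```

The first number can never be smaller than the second, whatever the draws are.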

Questions you should be able to answer before continuing:

  • what is the welfare maximizing way to allocate \(m\) products among \(n>m\) consumers?
  • how can you calculate the Lagrange multiplier in the optimization problem at the end of the video using python?
  • why do we get an error if we use tf.random.normal(50,10,2)? hint: use tf.random.normal?

Note that due to a new version of pymc on the server, the correct import statement now is import pymc as pm; using pymc3 will give an error.

market outcome

Topics we cover:

  • define a (demand) function
  • booleans False/True are represented as 0/1 and can be summed
  • use of scipy's optimize.fsolve to find the zero of a function (if you want more information about this function, use sp.optimize.fsolve?)
    • another example of the use of fsolve
    • if you are interested, see this video on a comparison of fsolve and root to solve equations
  • use of lambda to create an anonymous function (i.e. function without a name)
  • plot the demand function with plt.plot; use plt.vlines to plot a vertical line
  • instead of slicing, we used a boolean mask to select valuations which exceed the equilibrium price: valuations[valuations>price]
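
The sketch below illustrates fsolve, lambda, summing booleans and a boolean mask in one small example. The linear demand and supply curves and the valuations are made up for illustration and differ from the model in the notebook. It imports optimize directly from scipy, which is equivalent to the sp.optimize.fsolve call used in the video.

```python
import numpy as np
from scipy import optimize

def demand(p):
    # hypothetical linear demand curve
    return 10 - 2 * p

def supply(p):
    # hypothetical linear supply curve
    return p

# fsolve looks for a zero of excess demand, starting the search at p = 1;
# the lambda defines excess demand without giving this function a name
price = optimize.fsolve(lambda p: demand(p) - supply(p), 1.0)[0]

valuations = np.array([1.0, 2.5, 3.0, 4.0, 6.5])   # made-up valuations
mask = valuations > price          # array of True/False values
number_of_buyers = np.sum(mask)    # True counts as 1, False as 0
buyers = valuations[mask]          # keep only the valuations above the price

print(price, number_of_buyers, buyers)
```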

elastic demand and supply

You should be able to do this section in the notebook yourself. If not, then check the videos above once more.

why do others not love the market?

income distribution

Topics we cover in this video:

  • multiply boolean masks (afford and wtp in the video) to generate the AND condition: demand consists of people who are willing to pay price \(p\) for the good AND who can afford to pay \(p\).
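
As an illustration of multiplying masks, consider the following sketch. The price, valuations and incomes are made-up numbers, and the exact conditions used for afford and wtp in the video may differ.

```python
import numpy as np

price = 5.0
valuations = np.array([8.0, 3.0, 6.0, 9.0])   # made-up willingness to pay
incomes = np.array([4.0, 7.0, 10.0, 5.0])     # made-up incomes

wtp = valuations >= price      # True if the consumer is willing to pay the price
afford = incomes >= price      # True if the consumer can afford the price

both = afford * wtp            # the product is True only if both masks are True
quantity_demanded = np.sum(both)

print(both, quantity_demanded)
```

Multiplying two boolean arrays gives the same result as combining them with the & operator.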

Questions you should be able to answer:

  • show, using Python, that welfare in the market (welfare_2) is below the maximum possible welfare
  • the assignment in the notebook: run the model with the income distribution two times and show that higher income inequality can lead to lower welfare in the market. That is, there is an efficiency argument for income redistribution in a market context.

market power

You should be able to do the market power section on your own. It shows a graph suggesting the monopoly price is lower than the perfect competition price. In this context this is simply wrong. The question is: what is wrong in the python code?

Hence, test parts of the code to understand where things are going wrong. To solve this problem, you may want to look at the function min. That is, evaluate min? and e.g. min(3,8).

merger simulation

We split the merger simulation section into different subsections/videos.

  • Cournot

    Topics we cover in the video:

    • define the reaction function in Python for a simple Cournot model
    • use sp.optimize.fminbound on "minus profits" because scipy's optimize module offers minimization routines only
    • use fsolve on the function fixed_point to find the equilibrium outcome (both firms have output equal to the optimal reaction to the other firm's output level)
    • this corresponds to the point where the reaction functions intersect in \((q_1,q_2)\) space

    Questions you should be able to answer:

    • which of the two lines drawn in the video is the reaction function of firm 1?
    • show that the equilibrium outcome for the case where \(c_1=0.1,c_2=0.2\) has \((q_1,q_2)= (0.33333333, 0.23333333)\)
  • Pandas

    Topics we cover in the video:

    • create a two-dimensional array with draws from a normal distribution; note that in the notebook you only need a one-dimensional draw (for the merged firm's cost level)
      • the rows are states of the world, the first column is firm 1's cost level, the second column firm 2's costs
    • create a Pandas dataframe with pd.DataFrame and a dictionary of the form: {'column name': vector with values}
    • define new columns in the dataframe
    • two ways you can refer to a column in a dataframe: e.g. df.Q and df['Q']; note that you cannot use the former if there are spaces in the column name

    Questions you should be able to answer:

    • instead of defining q1,q2 separately, define the vector q as follows and use this vector to create the dataframe df (hint: use q.shape)
    costs = tf.random.normal([50,2],0.2,0.05).numpy()
    q = np.array([sp.optimize.fsolve(lambda x: fixed_point(x,costs[i]),[0,0]) for i in range(50)])
    
  • OPTIONAL: Cournot with variable \(n\)

    This section is optional. If this is your first Python course, skip this section for now (and come back to it later).

    In our previous Cournot model (and in the notebook), we defined the function reaction in such a way that it is specific to the number of firms in the market. This video introduces a function reaction that is more general. It makes the code more readable but also a bit more complicated.

    If you want to take this a step further, look at numpy's vectorize.

    Topic we cover:

    • np.zeros_like(c) for an array c

    Questions you should be able to answer:

    • predict/explain what is printed if you evaluate the following code block:
    i = 3
    mask = np.arange(6) != i
    print(mask)
    print(np.arange(6)[mask])
    print(np.sum(np.arange(6)[mask]))
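
Returning to the Cournot and Pandas subsections above: the sketch below shows one way the steps could fit together. It assumes the inverse demand curve \(p = 1 - q_1 - q_2\) and constant marginal costs; the function profit, the cost levels and the starting values are choices made for this illustration and need not match the notebook.

```python
import pandas as pd
from scipy import optimize

def profit(q_own, q_other, c):
    # assumed inverse demand p = 1 - q_own - q_other and constant marginal cost c
    return (1 - q_own - q_other - c) * q_own

def reaction(q_other, c):
    # best response: minimize minus profits over own output in [0, 1]
    return optimize.fminbound(lambda q: -profit(q, q_other, c), 0, 1)

def fixed_point(q, c):
    # equals zero when each output is the best response to the other firm's output
    return [q[0] - reaction(q[1], c[0]), q[1] - reaction(q[0], c[1])]

cost_pairs = [(0.1, 0.1), (0.0, 0.3)]   # made-up cost levels (c1, c2)
equilibria = [optimize.fsolve(lambda x: fixed_point(x, c), [0.3, 0.3])
              for c in cost_pairs]
q1 = [q[0] for q in equilibria]
q2 = [q[1] for q in equilibria]

# dictionary of the form {'column name': values}
df = pd.DataFrame({'c1': [c[0] for c in cost_pairs],
                   'c2': [c[1] for c in cost_pairs],
                   'q1': q1,
                   'q2': q2})
df['Q'] = df.q1 + df['q2']   # new column, using both ways to refer to a column
print(df)
```

With this demand curve the equilibrium can also be derived by hand as \(q_i = (1 - 2c_i + c_j)/3\), which gives a check on the numerical outcome.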
    

external effects

You should be able to do this section on your own. If not, watch the videos above again.

It provides another reason why markets may not generate max. welfare in the real world.

Asymmetric information

Here we consider two standard forms of asymmetric information: adverse selection and moral hazard.

adverse selection

Topics we cover in the video:

  • drawing samples from a uniform distribution
  • in jupyter notebook/lab you can introduce a greek letter, say \(\rho\), by typing \rho and then the TAB key
  • selecting the last, say 3 elements from a vector \(x\) by slicing: x[-3:]
  • downward sloping supply curve in a perfectly competitive insurance market

Questions you should be able to answer:

  • the assignment in the adverse selection section in the notebook: the effect of income on insurance demand

moral hazard

Topics we cover in the video:

  • the government maximizes welfare over marginal tax rates \(\tau\) while each agent in the economy maximizes work effort for a given \(\tau\)
    • we have an optimization problem "over" optimization problems
  • we use pymc for random draws from a log-normal distribution
  • note that because pymc3 on the server is replaced by the newer version pymc, the new syntax to draw samples from a log-normal distribution is as follows:
γ = pm.draw(pm.Lognormal.dist(mu=0.0,sigma=0.5), draws=50)
γ
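
To see what an optimization problem "over" optimization problems looks like in code, consider the following stylized sketch. It is not the model from the notebook: a single agent chooses effort to maximize after-tax income minus quadratic effort costs, and the government, as an assumption made for this example, maximizes tax revenue rather than welfare.

```python
from scipy import optimize

def effort(tau):
    # inner problem: the agent chooses effort e in [0, 2] to maximize
    # after-tax income minus effort costs, (1 - tau) * e - e**2 / 2
    return optimize.fminbound(lambda e: -((1 - tau) * e - e**2 / 2), 0, 2)

def revenue(tau):
    # outer objective (an assumption for this sketch): tax revenue,
    # evaluated at the effort the agent chooses for this tax rate
    return tau * effort(tau)

# outer problem: the government chooses the tax rate tau in [0, 1]
tau_opt = optimize.fminbound(lambda tau: -revenue(tau), 0, 1)
print(tau_opt, effort(tau_opt))
```

Each time the outer routine tries a value of tau, the inner routine is solved again. In this example the agent's optimal effort is \(1-\tau\), so revenue \(\tau(1-\tau)\) is maximized at \(\tau = 0.5\) and the numerical outcome should be close to that value.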

Questions you should be able to answer:

  • what will be the optimal tax rate with \(\rho=1\) (you will verify this in the notebook)
  • the assignment in the moral hazard section in the notebook: use Rawls' criterion as welfare function for the government

Financial crisis

why is there a problem in financial markets?

Topics we cover in this video:

  • limited liability
  • relu activation function
  • first mention of "broadcasting" (but no need to understand it)
  • draw two dimensional array (that is a matrix) from a normal distribution
  • we calculate the mean of a two dimensional array along an axis: axis=0 averages over the rows (giving one mean per column); axis=1 averages over the columns (giving one mean per row)
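
A minimal numpy sketch of these topics, for the case without equity. The dimensions, distribution parameters and seed are arbitrary choices for illustration.

```python
import numpy as np

def relu(x):
    # relu(x) = max(0, x), applied element by element
    return np.maximum(0, x)

rng = np.random.default_rng(seed=3)   # seed chosen arbitrarily
# two dimensional array: 1000 rows (states of the world), 4 columns (projects)
returns = rng.normal(loc=0, scale=1, size=(1000, 4))

# limited liability with zero equity: owners keep the gains but do not bear losses
payoff = relu(returns)

mean_per_column = payoff.mean(axis=0)   # averages over the 1000 rows
mean_per_row = payoff.mean(axis=1)      # averages over the 4 columns

print(mean_per_column.shape, mean_per_row.shape)
```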

Question you should be able to answer:

  • rewrite the code with the relu function for the case where the firm has equity equal to 10; your plot should be the same as the one in the notebook with equity=10.

why these bonus contracts?

Topics we cover in this video:

  • for the derivation of some results we use Emacs calc; you do not need to know how this works, but you should be able to replicate the derivations
  • if you want to do symbolic math yourself on the computer, you can consider using SymPy in a jupyter notebook but we do not cover SymPy in this course

Questions you should be able to answer:

  • the video covers the second subsection of "why these bonus contracts?" in the notebook; you should be able to cover the first subsection "moral hazard"
  • the second subsection "moral hazard and adverse selection" defines the function profit in a different way; you should be able to follow what it does and plot the probability of the average outcome (\(q\)) against the top trader's outside option.
  • as the outside option for the top trader (high type) increases, why does the bank not increase \(w\) and \(b\) in such a way that \(R=w/b\) remains constant? Then risk taking by the top trader would be unaffected. Why would this be (too) expensive for the bank?

Using Python for empirical research

APIs to get data

Topics we cover in this video:

  • use of the wbdata library to access the World Bank's databases through its API
  • use of wb.search_indicators to find indicators on a certain topic
  • create a dictionary of indicators and column names and then download these data into a pandas' dataframe
  • we do this for two sets of indicators and then use pd.merge to merge these dataframes
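
The merge step can be tried out without downloading anything. In the sketch below the two small dataframes contain made-up numbers that stand in for two downloads from the World Bank; the column names are hypothetical as well.

```python
import pandas as pd

# made-up numbers standing in for two sets of indicators
df_gdp = pd.DataFrame({'country': ['A', 'A', 'B', 'B'],
                       'date': [2019, 2020, 2019, 2020],
                       'gdp': [1.0, 1.1, 2.0, 1.9]})
df_pop = pd.DataFrame({'country': ['A', 'A', 'B', 'B'],
                       'date': [2019, 2020, 2019, 2020],
                       'population': [10, 11, 20, 21]})

# rows are matched on the columns listed in the argument on
df = pd.merge(df_gdp, df_pop, on=['country', 'date'])
df['gdp_per_head'] = df['gdp'] / df['population']   # new column from two merged columns
print(df)
```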

Questions you should be able to answer:

  • use pd.merge? to find out what the arguments how and suffixes do in a pd.merge statement
  • use the wbdata documentation to find the different themes on which the World Bank has data

hacker statistics

high school puzzles

Topics we cover in this video:

  • program a statistical problem with coin throws in python
  • repeat this 10,000 times to see what the properties are of such an experiment
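
A minimal sketch of the idea, with an experiment chosen for illustration (it need not be the one in the video): throw a fair coin 10 times and check whether exactly 5 throws come up heads. Repeating this 10,000 times gives an estimate of the probability, which can be compared with the exact value \(\binom{10}{5}/2^{10} \approx 0.246\).

```python
import numpy as np

rng = np.random.default_rng(seed=4)   # seed chosen arbitrarily
number_of_experiments = 10_000
throws_per_experiment = 10

# each row is one experiment of 10 throws: 1 = heads, 0 = tails
throws = rng.integers(low=0, high=2,
                      size=(number_of_experiments, throws_per_experiment))
heads = throws.sum(axis=1)               # number of heads in each experiment

share_five_heads = np.mean(heads == 5)   # fraction of experiments with exactly 5 heads
print(share_five_heads)
```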

Questions you should be able to answer:

  • program the experiment with a dark cupboard containing 6 red socks and 14 blue socks. You randomly draw 2 socks (without replacement) from the cupboard. What is the probability that you draw two matching socks from the cupboard? Note that you can calculate this yourself, so you can check whether your code gives the right answer: \(\frac{6}{20} \frac{5}{19} + \frac{14}{20} \frac{13}{19}\)
  • solve this section in the notebook: which of the two experiments lasts longer on average? Why?

statistics

Topics we cover in this video:

  • distribution and standard deviation of a sample mean
  • simulating a hypothesis test

Note that in the notebook we use samples = pm.Normal.dist(mu,sd).random(size=(number_of_samples,n)) in a code block. Due to the new version of pymc this code should now be:

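import pymc as pm

mu = 1000
sd = 100
number_of_samples = 250

def moments(n):
    # draw number_of_samples samples of size n; each row is one sample
    samples = pm.draw(pm.Normal.dist(mu=mu, sigma=sd, shape=(number_of_samples, n)))
    mus = samples.mean(axis=1)  # mean of each sample
    std = mus.std()             # standard deviation of the sample means
    return [mus, std]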

Questions you should be able to answer:

  • what is the "statistical name" for np.std(mus) in the video?
  • what is the idea behind the hypothesis test in the video?
  • finish the statistics section in the notebook

Regulation in healthcare markets

In this lecture we analyze Dutch healthcare data from Vektis using pandas, matplotlib and pymc.

getting the data ready

Topics we cover in this video:

  • importing data directly from the web using urllib;
  • how to read in a csv file as a dataframe using pd.read_csv();
  • how to rename columns in a pandas' dataframe using a dictionary;
  • how to create a new column/variable which equals the sum of a list of columns;
  • using axis to specify whether an action is supposed to be applied across rows (axis=0) or columns (axis=1);
  • using df.replace() to replace values in a dataframe, again using a dictionary;
  • change the type of a column/variable using .astype();
    • depending on the versions of the packages you are using df['age'].astype(int) can give an error
    • if this happens, use df['age'].astype("Float64").astype(int) instead;
  • using numpy functions on columns;
  • using a groupby to aggregate data to the level specified by the list of columns in the groupby
    • in the video we aggregate to age/gender categories, aggregating over postal-code observations.
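
Most of these pandas steps can be practised on a tiny dataframe first. In the sketch below the data and the column names are made up; they are not the names used in the Vektis file.

```python
import pandas as pd

# made-up data with hypothetical column names
df = pd.DataFrame({'SEX': ['M', 'V', 'M', 'V'],
                   'AGE': ['30', '30', '31', '31'],
                   'COST_A': [100.0, 120.0, 90.0, 150.0],
                   'COST_B': [10.0, 20.0, 30.0, 40.0]})

# rename columns with a dictionary: old name -> new name
df = df.rename(columns={'SEX': 'sex', 'AGE': 'age'})
# new column: sum of a list of columns, calculated row by row
df['total'] = df[['COST_A', 'COST_B']].sum(axis=1)
# replace values with a dictionary: column -> {old value: new value}
df = df.replace({'sex': {'V': 'F'}})
# change the type of a column from text to integers
df['age'] = df['age'].astype(int)

# aggregate to the level of 'sex', averaging over the other observations
by_sex = df.groupby(['sex'])['total'].mean()
print(by_sex)
```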

Questions you should be able to answer:

  • what is the average expenditure (not log expenditure) across ages and postal codes for men and women?
  • what is the average expenditure per age category (across gender and postal codes)?

plot and model

This screencast is not meant as an introduction to Bayesian analysis and pymc. It just shows that this can be done in Python. The notebook gives a bit more detail and some references in case the video "whets your appetite".

Topics we cover in this video:

  • plot average log expenditure across age for both women and men;
  • create a simple Bayesian model of age-fixed effects using pymc;
  • sample from the posterior distribution of the model;
  • plot predicted and observed expenditures for women against age.

To avoid errors, the definition of the function get_data_into_shape should be changed. In particular, the line df['age'] = pd.to_numeric(df['age']).astype(int) is different from what is in the notebook.

def get_data_into_shape(df):
    df['health_expenditure_under_deductible'] = df[cost_categories_under_deductible].sum(axis=1)
    df = df.rename({
        'GESLACHT':'sex',
        'LEEFTIJDSKLASSE':'age',
        'AANTAL_BSN':'number_citizens',
        'KOSTEN_MEDISCH_SPECIALISTISCHE_ZORG':'hospital_care',
        'KOSTEN_FARMACIE':'pharmaceuticals',
        'KOSTEN_TWEEDELIJNS_GGZ':'mental_care',
        'KOSTEN_HUISARTS_INSCHRIJFTARIEF':'GP_capitation',
        'KOSTEN_HUISARTS_CONSULT':'GP_fee_for_service',
        'KOSTEN_HUISARTS_OVERIG':'GP_other',
        'KOSTEN_MONDZORG':'dental care',
        'KOSTEN_PARAMEDISCHE_ZORG_FYSIOTHERAPIE':'physiotherapy',
        'KOSTEN_KRAAMZORG':'maternity_care',
        'KOSTEN_VERLOSKUNDIGE_ZORG':'obstetrics'
    }, axis='columns')
    df.drop(['AANTAL_VERZEKERDEJAREN',
             'KOSTEN_HULPMIDDELEN',
             'KOSTEN_PARAMEDISCHE_ZORG_OVERIG',
             'KOSTEN_ZIEKENVERVOER_ZITTEND',
             'KOSTEN_ZIEKENVERVOER_LIGGEND',
             'KOSTEN_GRENSOVERSCHRIJDENDE_ZORG',
             'KOSTEN_OVERIG',
             'KOSTEN_EERSTELIJNS_ONDERSTEUNING'],inplace=True,axis=1)
    df.drop(df.index[[0]], inplace=True)
    df['sex'] = df['sex'].astype('category')
    df['age'] = df['age'].astype('category')
    df['costs_per_head']=df['health_expenditure_under_deductible']/df['number_citizens']
    df['log_costs_per_head']=np.log(1+df['health_expenditure_under_deductible']/df['number_citizens'])
    df = df[(df['age'] != '90+')]
    df['age'] = pd.to_numeric(df['age']).astype(int) # this line needs to be adjusted compared to the definition in the notebook
    return df

The use of pandas' query method then becomes:

df_2014.query('sex=="M" & age==30')['log_costs_per_head'].hist(bins=50)

Questions you should be able to answer:

  • plot average expenditure (averaged across gender) against age;
  • do the pymc model for men instead of women.

upload data on the university server

In the screencast above we downloaded the data directly using urllib.request.urlretrieve(). The screencast below shows how to do this "by hand" to get the data into the university server environment.

Topics we cover in this video:

  • download data from a website to a local folder on your computer;
  • upload from this folder to the university server;
  • create a new folder data on the university server.

Questions you should be able to answer:

  • upload the data set for 2011 on the university server.

Author: Jan Boone