Seminar Data Science for Economics
Introduction
this lecture
- in this lecture, we look at:
- motivation to learn data science
- first assignment
motivation
- in this course we teach you:
- what data science is
- how it differs from econometrics
- prediction vs causality
- how to give policy advice based on "unstructured data"
- how to use python, pymc and tensorflow
things to learn
- when you finish this course, you can
- obtain data
- clean data (in a reproducible way)
- do data-project management
- work with high dimensional data (tensors)
- simulate your own estimation techniques
- use a (simple) neural network
- use cross validation
finishing the course
- gives you a better understanding of:
- how statistics/econometrics work
- how to do a Bayesian analysis
- what data science and AI are
- what the advantages and limits of data science techniques are
What do you need?
Software
- you will be working with jupyter lab
- if you also want to install things on your own computer, use the anaconda distribution
- we also use github, but you do not need to install anything for this
Rules of the game
to learn python we use datacamp and lectures:
- you need to finish the datacamp courses on time
- see the Lecture schedule for the deadlines
- first try the notebooks yourself, then watch the video if you get stuck
- you do not learn data science by passively typing in code!
- participate actively by posting github issues for your questions
- these will be discussed in the tutorials
screencasts
- note that the screencasts were made with an older version of pymc (called pymc3)
- pymc syntax is different from pymc3 on a small number of points
- these differences are pointed out in the notebook (part 2) and on the screencast website
- the same is true for the eurostat API:
- columns now have different names than when the videos were made
- the data itself is different
to get a grade for this seminar:
- finish Assignment 1 before the deadline
- finish the Final Assignment
- check the deadlines for the final assignment