Getting Eurostat data
Table of Contents
1. Introduction
This file describes how we get the data for our project on Dynamic effects of health insurance reform.
In this file we explain how we get the data and from where and at the same time we generate the data. By generating the data in this way, a change in the data by Eurostat will lead to (slightly) different results.
2. Preamble code
Run this code first before running the code of particular data sets. It imports the relevant libraries.
import numpy as np import pandas as pd from country_codes import oecd_dictionary, eurostat_dictionary import eurostat import xarray as xr import matplotlib.pyplot as plt start_year = 2008 end_year = 2018 year_range = np.arange(start_year,end_year+1) EU_countries = ['Belgium', 'Bulgaria', 'Czechia', 'Denmark', 'Germany (until 1990 former territory of the FRG)', 'Germany','Estonia', 'Ireland', 'Greece', 'Spain', 'France', 'Croatia', 'Italy', 'Cyprus', 'Latvia', 'Lithuania', 'Luxembourg', 'Hungary', 'Malta', 'Netherlands', 'Austria', 'Poland', 'Portugal', 'Romania', 'Slovenia', 'Slovakia', 'Finland', 'Sweden', 'United Kingdom', 'Iceland', 'Liechtenstein', 'Norway', 'Switzerland', 'Bosnia and Herzegovina']
We first describe how we get the data for each variable. Each of these variables is saved in a separate dataset. The final section contains the code that merges the data for these variables into one (big) dataset.
3. Data
3.1. population
df = eurostat.get_data_df('demo_r_d2jan') df.rename({'geo\\TIME_PERIOD':'geo'},inplace=True,axis=1) age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15', 'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23', 'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31', 'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4', 'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48', 'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56', 'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64', 'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72', 'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80', 'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89', 'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97', 'Y98', 'Y99'] df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.sex.isin(["M","F"])) & (df.age.isin(age_range))] df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country','sex','age'], value_vars=[str(y) for y in year_range], var_name='year',value_name='population') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop('nuts',inplace=True, axis=1) df.rename({'geo':'nuts2', 'age':'age string'},\ axis=1,inplace=True) df['age'] = [float(x.replace('Y','')) for x in df['age string']] df.drop('age string',inplace=True, axis=1) #df.set_index(['nuts2','year','sex','age'],inplace=True) df.head()
nuts2 country sex year population age 2 AT11 Austria F 2008 1127.0 1.0 3 AT12 Austria F 2008 7280.0 1.0 4 AT13 Austria F 2008 8398.0 1.0 6 AT21 Austria F 2008 2397.0 1.0 7 AT22 Austria F 2008 5167.0 1.0
3.2. OOP per country
We want a measure of “how generous” a healthcare system is in terms of out-of-pocket (oop) payments. This feature of the healthcare system tends to be defined at the country level. We use household oop payment, HF3
, and PC_CHE
as unit of measurement.
abbreviation | name |
---|---|
HF2 | Voluntary health care payment schemes |
HF3 | Household out-of-pocket payment |
PC_CHE | Percentual share of total current health expenditure (CHE) |
PPS_HAB | Purchasing power standard (PPS) per inhabitant |
df_o = eurostat.get_data_df('hlth_sha11_hf') df_o.rename({'geo\\TIME_PERIOD':'geo'},inplace=True,axis=1) df_o['country_code'] = [x[:2] for x in df_o['geo']] df_o['country'] = df_o['country_code'].replace(eurostat_dictionary) df_o = df_o[df_o.country.isin(EU_countries) & \ (df_o.icha11_hf.isin(["HF2", "HF3"])) &\ (df_o.unit.isin([ 'PC_CHE','PPS_HAB']))] df_o.drop(["geo"],axis=1,inplace=True) df_o = pd.melt(df_o,id_vars=['country','country_code',\ 'unit','icha11_hf'], value_vars=[str(y) for y in year_range], var_name='year',\ value_name='health_expenditure') df_o.set_index(['country','country_code',\ 'year','unit','icha11_hf'],inplace=True) df_o = df_o.unstack() df_o.columns = df_o.columns.droplevel() df_o = df_o.unstack() df_o.columns = df_o.columns.droplevel() df_o.columns = ['HF2_PC_CHE','HF2_PPS_HAB',\ 'HF3_PC_CHE','HF3_PPS_HAB'] df_o.reset_index(inplace=True) df_o.set_index(['country','year'],inplace=True) df_o.tail()
country_code HF2_PC_CHE HF2_PPS_HAB HF3_PC_CHE \ country year United Kingdom 2014 UK 5.40 149.18 15.16 2015 UK 5.40 151.39 15.23 2016 UK 5.04 142.30 15.37 2017 UK 5.41 155.88 15.72 2018 UK 5.62 165.77 15.88 HF3_PPS_HAB country year United Kingdom 2014 419.03 2015 427.46 2016 434.36 2017 453.26 2018 468.46
# df = df.merge(df_c, how = 'left',\ # on = ['country','year','sex','age_category']) df = df.merge(df_o, how = 'left',\ on = ['country','year']) df.set_index(['nuts2','year','sex','age'],inplace=True) df.tail()
country population country_code HF2_PC_CHE \ nuts2 year sex age UKM6 2018 M 99.0 United Kingdom 11.0 UK 5.62 UKM7 2018 M 99.0 United Kingdom 40.0 UK 5.62 UKM8 2018 M 99.0 United Kingdom 21.0 UK 5.62 UKM9 2018 M 99.0 United Kingdom 17.0 UK 5.62 UKN0 2018 M 99.0 United Kingdom 33.0 UK 5.62 HF2_PPS_HAB HF3_PC_CHE HF3_PPS_HAB nuts2 year sex age UKM6 2018 M 99.0 165.77 15.88 468.46 UKM7 2018 M 99.0 165.77 15.88 468.46 UKM8 2018 M 99.0 165.77 15.88 468.46 UKM9 2018 M 99.0 165.77 15.88 468.46 UKN0 2018 M 99.0 165.77 15.88 468.46
ds = df.to_xarray() ds.to_netcdf("./data/population_nuts_2.nc") ds
<xarray.Dataset> Size: 38MB Dimensions: (nuts2: 313, year: 11, sex: 2, age: 99) Coordinates: * nuts2 (nuts2) object 3kB 'AT11' 'AT12' 'AT13' ... 'UKM9' 'UKN0' * year (year) object 88B '2008' '2009' '2010' ... '2017' '2018' * sex (sex) object 16B 'F' 'M' * age (age) float64 792B 1.0 2.0 3.0 4.0 5.0 ... 96.0 97.0 98.0 99.0 Data variables: country (nuts2, year, sex, age) object 5MB 'Austria' ... 'United Ki... population (nuts2, year, sex, age) float64 5MB 1.127e+03 ... 33.0 country_code (nuts2, year, sex, age) object 5MB 'AT' 'AT' ... 'UK' 'UK' HF2_PC_CHE (nuts2, year, sex, age) float64 5MB 6.8 6.8 6.8 ... 5.62 5.62 HF2_PPS_HAB (nuts2, year, sex, age) float64 5MB 212.5 212.5 ... 165.8 HF3_PC_CHE (nuts2, year, sex, age) float64 5MB 18.24 18.24 ... 15.88 HF3_PPS_HAB (nuts2, year, sex, age) float64 5MB 570.5 570.5 ... 468.5
3.3. deaths
df = eurostat.get_data_df('demo_r_magec') df.rename({'geo\\TIME_PERIOD':'geo'},inplace=True,axis=1) age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15', 'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23', 'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31', 'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4', 'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48', 'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56', 'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64', 'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72', 'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80', 'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89', 'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97', 'Y98', 'Y99'] df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.sex.isin(["M","F"])) & (df.age.isin(age_range))] df.drop(['country','country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','sex','age'], value_vars=[str(y) for y in year_range], var_name='year',value_name='deaths') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop('nuts',inplace=True, axis=1) df.rename({'geo':'nuts2', 'age':'age string'},\ axis=1,inplace=True) df['age'] = [float(x.replace('Y','')) for x in df['age string']] df.drop('age string',inplace=True, axis=1) df.set_index(['nuts2','year','sex','age'],inplace=True) df.head()
deaths nuts2 year sex age AT11 2008 F 1.0 1.0 AT12 2008 F 1.0 3.0 AT13 2008 F 1.0 3.0 AT21 2008 F 1.0 0.0 AT22 2008 F 1.0 2.0
ds = df.to_xarray() ds.to_netcdf("./data/deaths_nuts_2.nc") ds
<xarray.Dataset> Size: 5MB Dimensions: (nuts2: 313, year: 11, sex: 2, age: 99) Coordinates: * nuts2 (nuts2) object 3kB 'AT11' 'AT12' 'AT13' ... 'UKM8' 'UKM9' 'UKN0' * year (year) object 88B '2008' '2009' '2010' ... '2016' '2017' '2018' * sex (sex) object 16B 'F' 'M' * age (age) float64 792B 1.0 2.0 3.0 4.0 5.0 ... 95.0 96.0 97.0 98.0 99.0 Data variables: deaths (nuts2, year, sex, age) float64 5MB 1.0 0.0 0.0 ... 48.0 26.0 18.0
3.4. Poverty measures
Note that –when I accessed the data– People at risk of poverty or social exclusion by NUTS 2 region (tgs00107)
is only available from 2013 onwards:
- link to the data on at risk of poverty: https://ec.europa.eu/eurostat/databrowser/view/tgs00103/default/table?lang=en
- at risk of poverty or social exclusion: https://ec.europa.eu/eurostat/databrowser/view/tgs00107/default/table?lang=en&category=t_ilc.t_ilc_pe
Therefore we use the severe material deprivation measure.
Severe material deprivation rate by NUTS 2 regions [TGS00104]
- link to the data: https://ec.europa.eu/eurostat/databrowser/view/tgs00104/default/table?lang=en
- data only starts at 2009
df = eurostat.get_data_df('tgs00104') df.rename({'geo\\TIME_PERIOD':'nuts2'},inplace=True,axis=1) df = df[(df.unit == "PC")] df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=[str(y) for y in np.arange(start_year+1,end_year+1)], var_name='year',\ value_name='percentage_material_deprivation') df.set_index(['nuts2','year'],inplace=True) df.head()
percentage_material_deprivation nuts2 year AL01 2009 NaN AL02 2009 NaN AL03 2009 NaN AT11 2009 NaN AT12 2009 NaN
ds_material_deprivation = df.to_xarray() ds_material_deprivation.to_netcdf("./data/material_deprivation_nuts_2.nc") ds_material_deprivation
<xarray.Dataset> Size: 16kB Dimensions: (nuts2: 179, year: 10) Coordinates: * nuts2 (nuts2) object 1kB 'AL01' 'AL02' ... 'SK04' * year (year) object 80B '2009' '2010' ... '2018' Data variables: percentage_material_deprivation (nuts2, year) float64 14kB nan nan ... 10.0
3.5. unmet needs
Self-reported unmet needs for medical examination by main reason declared and NUTS 2 regions
- link to the data: https://ec.europa.eu/eurostat/databrowser/view/hlth_silc_08_r/default/table?lang=en
- https://ec.europa.eu/eurostat/documents/1012329/6070906/Personal+data+-+health+PH.pdf/7edfec40-462a-4cb2-a5fc-8d8845000ca4
value | |
---|---|
TOOEXP | Too expensive |
TOOFAR | Too far to travel |
NOTIME | No time |
NOKNOW | Didn’t know any good doctor or specialist |
WAITING | Waiting list |
FEAR | Fear of doctor, hospital, examination or treatment |
HOPING | Wanted to wait and see if problem got better on its own |
OTH | Other reason |
df = eurostat.get_data_df('hlth_silc_08_r') df.rename({'geo\\TIME_PERIOD':'nuts2'},inplace=True,axis=1) df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) columns = ['FEAR','TOOEXP','TOOFAR','WAITING',\ 'HOPING', 'NOTIME', 'NOKNOW','OTH'] df=df[df.reason.isin(columns)] df = pd.melt(df,id_vars=['nuts2','reason'], value_vars=[str(y) for y in year_range], var_name='year',\ value_name='unmet needs') df.set_index(['nuts2','year','reason'],inplace=True) df = df.unstack() df.columns = df.columns.droplevel() df['UNMET'] = df[columns].sum(axis=1) df.drop(['FEAR','TOOFAR','WAITING',\ 'HOPING', 'NOTIME', 'NOKNOW','OTH'],axis=1,inplace=True) df.tail()
reason TOOEXP UNMET nuts2 year SK04 2014 1.1 6.7 2015 0.4 6.6 2016 0.5 8.1 2017 0.2 5.9 2018 0.4 6.5
ds_unmet_needs = df.to_xarray() ds_unmet_needs.to_netcdf("./data/unmet_needs_nuts_2.nc") ds_unmet_needs
<xarray.Dataset> Size: 31kB Dimensions: (nuts2: 168, year: 11) Coordinates: * nuts2 (nuts2) object 1kB 'AL01' 'AL02' 'AL03' ... 'SK02' 'SK03' 'SK04' * year (year) object 88B '2008' '2009' '2010' ... '2016' '2017' '2018' Data variables: TOOEXP (nuts2, year) float64 15kB nan nan nan nan nan ... 0.4 0.5 0.2 0.4 UNMET (nuts2, year) float64 15kB 0.0 0.0 0.0 0.0 0.0 ... 6.6 8.1 5.9 6.5
3.6. infant mortality
- https://ec.europa.eu/eurostat/databrowser/view/DEMO_R_MINFIND/default/table?lang=en&category=demo.demomreg
- https://ec.europa.eu/health/indicators/docs/echi_11.pdf
- The number of deaths of infants (younger than one year of age at death) per 1000 live births (based on one year data).
df = eurostat.get_data_df('demo_r_minfind') df.rename({'geo\\TIME_PERIOD':'nuts2'},inplace=True,axis=1) df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=[str(y) for y in year_range], var_name='year',\ value_name='infant mortality') df.set_index(['nuts2','year'],inplace=True) df.head()
infant mortality nuts2 year AL01 2008 NaN AL02 2008 NaN AL03 2008 NaN AT11 2008 3.2 AT12 2008 4.7
ds_infant_mortality = df.to_xarray() ds_infant_mortality.to_netcdf("./data/infant_mortality_nuts_2.nc") ds_infant_mortality
<xarray.Dataset> Size: 33kB Dimensions: (nuts2: 347, year: 11) Coordinates: * nuts2 (nuts2) object 3kB 'AL01' 'AL02' 'AL03' ... 'UKM9' 'UKN0' * year (year) object 88B '2008' '2009' '2010' ... '2017' '2018' Data variables: infant mortality (nuts2, year) float64 31kB nan nan nan nan ... 4.5 3.8 4.2
3.7. Number of physicians
- link to the data: https://ec.europa.eu/eurostat/databrowser/view/hlth_rs_physreg/default/table?lang=en&category=hlth.hlth_care.hlth_res.hlth_staff
df = eurostat.get_data_df('hlth_rs_physreg') df.rename({'geo\\TIME_PERIOD':'nuts2'},inplace=True,axis=1) df = df[(df.unit == "P_HTHAB")] df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=[str(y) for y in year_range], var_name='year',\ value_name='number_physicians per inhabitant') df.set_index(['nuts2','year'],inplace=True) df.head()
number_physicians per inhabitant nuts2 year AT11 2008 328.16 AT12 2008 411.20 AT13 2008 643.68 AT21 2008 398.51 AT22 2008 441.60
ds_number_physicians = df.to_xarray() ds_number_physicians.to_netcdf("./data/number_physicians_nuts_2.nc") ds_number_physicians
<xarray.Dataset> Size: 24kB Dimensions: (nuts2: 247, year: 11) Coordinates: * nuts2 (nuts2) object 2kB 'AT11' ... 'TRC3' * year (year) object 88B '2008' '2009' ... '2018' Data variables: number_physicians per inhabitant (nuts2, year) float64 22kB 328.2 ... 109.8
4. Merging the data
ds_P = xr.open_dataset('./data/population_nuts_2.nc') ds_P['year'] = (ds_P['year'].values).astype('int') ds_M = xr.open_dataset('./data/deaths_nuts_2.nc') ds_M['year'] = (ds_M['year'].values).astype('int') ds_infant = xr.open_dataset("./data/infant_mortality_nuts_2.nc") ds_infant['year'] = (ds_infant['year'].values).astype('int') ds_physician = xr.open_dataset("./data/number_physicians_nuts_2.nc") ds_physician['year'] = (ds_physician['year'].values).astype('int') ds_deprivation = xr.open_dataset("./data/material_deprivation_nuts_2.nc") ds_deprivation['year'] = (ds_deprivation['year'].values).astype('int') ds_unmet = xr.open_dataset("./data/unmet_needs_nuts_2.nc") ds_unmet['year'] = (ds_unmet['year'].values).astype('int') ds = xr.merge([ds_P.drop_vars(["HF2_PC_CHE","HF2_PPS_HAB","HF3_PPS_HAB"]),ds_M,ds_deprivation,ds_unmet,ds_infant,ds_physician],join='left') ds
<xarray.Dataset> Size: 60MB Dimensions: (nuts2: 313, year: 11, sex: 2, age: 99) Coordinates: * nuts2 (nuts2) <U4 5kB 'AT11' 'AT12' ... 'UKN0' * year (year) int64 88B 2008 2009 ... 2017 2018 * sex (sex) <U1 8B 'F' 'M' * age (age) float64 792B 1.0 2.0 ... 98.0 99.0 Data variables: country (nuts2, year, sex, age) <U14 38MB ... population (nuts2, year, sex, age) float64 5MB ... country_code (nuts2, year, sex, age) <U2 5MB ... HF3_PC_CHE (nuts2, year, sex, age) float64 5MB ... deaths (nuts2, year, sex, age) float64 5MB ... percentage_material_deprivation (nuts2, year) float64 28kB nan nan ... nan TOOEXP (nuts2, year) float64 28kB nan nan ... nan UNMET (nuts2, year) float64 28kB nan nan ... nan infant mortality (nuts2, year) float64 28kB 3.2 0.9 ... 4.2 number_physicians per inhabitant (nuts2, year) float64 28kB 328.2 ... nan
ds["country"] = ds["country"].isel(year=0,sex=0,age=0) ds = ds.assign_coords(country=("nuts2", ds.country.values)) ds = ds.transpose("age","sex","year","nuts2") ds
<xarray.Dataset> Size: 22MB Dimensions: (nuts2: 313, year: 11, sex: 2, age: 99) Coordinates: * nuts2 (nuts2) <U4 5kB 'AT11' 'AT12' ... 'UKN0' * year (year) int64 88B 2008 2009 ... 2017 2018 * sex (sex) <U1 8B 'F' 'M' * age (age) float64 792B 1.0 2.0 ... 98.0 99.0 country (nuts2) <U14 18kB 'Austria' ... 'United... Data variables: population (age, sex, year, nuts2) float64 5MB ... country_code (age, sex, year, nuts2) <U2 5MB ... HF3_PC_CHE (age, sex, year, nuts2) float64 5MB ... deaths (age, sex, year, nuts2) float64 5MB ... percentage_material_deprivation (year, nuts2) float64 28kB nan nan ... nan TOOEXP (year, nuts2) float64 28kB nan nan ... nan UNMET (year, nuts2) float64 28kB nan nan ... nan infant mortality (year, nuts2) float64 28kB 3.2 4.7 ... 4.2 number_physicians per inhabitant (year, nuts2) float64 28kB 328.2 ... nan
ds.to_netcdf("./data/oop_health_data.nc")
5. Data extensions
5.1. transport accidents
import pandas as pd import eurostat df = eurostat.get_data_df("tran_r_acci") df = df.rename( columns={ "geo\\TIME_PERIOD": "geo", } ) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.victim.isin(["KIL"])) & (df.unit == 'P_MHAB')] # df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country','country_code'], value_vars=[str(y) for y in year_range], var_name='year',value_name='killed_per_million_inhabitants') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop(['nuts', 'country'],inplace=True, axis=1) df.rename({'geo':'nuts2'},inplace=True,axis=1) # df.set_index(['nuts2','year'],inplace=True) print(df.head()) # ds_traffic_mortality = df.to_xarray() # ds_traffic_mortality.to_netcdf("./data/extensions/traffic_mortality_nuts_2.nc") # ds_traffic_mortality
nuts2 country_code year killed_per_million_inhabitants 2 AT11 AT 2008 103.0 3 AT12 AT 2008 109.0 4 AT13 AT 2008 16.0 6 AT21 AT 2008 100.0 7 AT22 AT 2008 91.0
df[df.nuts2=="NO04"]
nuts2 country_code year killed_per_million_inhabitants 311 NO04 NO 2008 57.0 751 NO04 NO 2009 42.0 1191 NO04 NO 2010 38.0 1631 NO04 NO 2011 26.0 2071 NO04 NO 2012 36.0 2511 NO04 NO 2013 39.0 2951 NO04 NO 2014 24.0 3391 NO04 NO 2015 26.0 3831 NO04 NO 2016 34.0 4271 NO04 NO 2017 17.0 4711 NO04 NO 2018 24.0
5.2. country wide devices for medical imaging
- https://ec.europa.eu/eurostat/databrowser/view/hlth_rs_medim/default/table?lang=en
- at country level, not NUTS 2 level
- unit: per 100k inhabitants
import pandas as pd import eurostat df_2 = eurostat.get_data_df("hlth_rs_medim") df_2 = df_2.rename( columns={ "geo\\TIME_PERIOD": "geo", } ) df_2['country_code'] = [x[:2] for x in df_2['geo']] df_2['country'] = df_2['country_code'].replace(eurostat_dictionary) df_2 = df_2[df_2.country.isin(EU_countries) &\ (df_2.unit == 'P_HTHAB') & (df_2.icha_hp=='HP1_3')] df_2.drop(['geo'],axis=1,inplace=True) df_2 = pd.melt(df_2,id_vars=['country_code','country','facility'], value_vars=[str(y) for y in year_range], var_name='year',value_name='number_devices') # df_2.drop(['country'],inplace=True, axis=1) df_2.set_index(['country_code','country','year','facility'],inplace=True) df_2 = df_2.unstack() df_2.reset_index(inplace=True) df_2.columns = [' '.join(col).strip() for col in df_2.columns.values] col = ['number_devices CT_SCAN', 'number_devices GAMMA', 'number_devices MAMMO', 'number_devices MRI', 'number_devices PET_SCAN', 'number_devices RAD'] df_2['number_devices'] = df_2[col].sum(axis=1) df_2.drop(col,inplace=True,axis=1) # df_2.set_index(['country','year'],inplace=True) print(df_2.head()) # ds_devices = df.to_xarray() # ds_devices.to_netcdf("./data/extensions/devices_country.nc") # ds_devices
country_code country year number_devices 0 AT Austria 2008 6.68 1 AT Austria 2009 6.70 2 AT Austria 2010 6.78 3 AT Austria 2011 6.70 4 AT Austria 2012 6.79
df = df.merge(df_2, how='left',on=['country_code','year'])
df.drop(["country_code","country"],axis=1,inplace=True) df.set_index(['nuts2','year'],inplace=True) df.head()
killed_per_million_inhabitants number_devices nuts2 year AT11 2008 103.0 6.68 AT12 2008 109.0 6.68 AT13 2008 16.0 6.68 AT21 2008 100.0 6.68 AT22 2008 91.0 6.68
df.loc[["NO01","NO02","NO04","NO05"]]
killed_per_million_inhabitants number_devices nuts2 year NO01 2008 22.0 0.00 2009 26.0 0.00 2010 20.0 0.00 2011 11.0 0.00 2012 17.0 0.00 2013 14.0 0.00 2014 17.0 0.00 2015 11.0 0.00 2016 13.0 4.11 2017 10.0 4.53 2018 9.0 5.44 NO02 2008 91.0 0.00 2009 64.0 0.00 2010 80.0 0.00 2011 77.0 0.00 2012 53.0 0.00 2013 47.0 0.00 2014 42.0 0.00 2015 44.0 0.00 2016 42.0 4.11 2017 26.0 4.53 2018 31.0 5.44 NO04 2008 57.0 0.00 2009 42.0 0.00 2010 38.0 0.00 2011 26.0 0.00 2012 36.0 0.00 2013 39.0 0.00 2014 24.0 0.00 2015 26.0 0.00 2016 34.0 4.11 2017 17.0 4.53 2018 24.0 5.44 NO05 2008 69.0 0.00 2009 56.0 0.00 2010 39.0 0.00 2011 37.0 0.00 2012 29.0 0.00 2013 46.0 0.00 2014 22.0 0.00 2015 16.0 0.00 2016 18.0 4.11 2017 26.0 4.53 2018 23.0 5.44
print(df.head()) ds_accidents_devices = df.to_xarray() ds_accidents_devices.to_netcdf("./data/extensions/traffic_mortality_nuts_2.nc") # ds_accidents_devices
killed_per_million_inhabitants number_devices nuts2 year AT11 2008 103.0 6.68 AT12 2008 109.0 6.68 AT13 2008 16.0 6.68 AT21 2008 100.0 6.68 AT22 2008 91.0 6.68
5.3. GDP per capita
import pandas as pd import eurostat df = eurostat.get_data_df("nama_10r_2gdp") df = df.rename( columns={ "geo\\TIME_PERIOD": "geo", } ) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.unit == 'PPS_EU27_2020_HAB')] # Purchasing power standard (PPS, EU27 from 2020), per inhabitant df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country'], value_vars=[str(y) for y in year_range], var_name='year',value_name='GDP_per_capita_PPS') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop(['nuts', 'country'],inplace=True, axis=1) df.rename({'geo':'nuts2'},inplace=True,axis=1) df.set_index(['nuts2','year'],inplace=True) print(df.head()) ds_GDP_per_capita = df.to_xarray() ds_GDP_per_capita.to_netcdf("./data/extensions/GDP_per_capita_nuts_2.nc") ds_GDP_per_capita
GDP_per_capita_PPS nuts2 year AT11 2008 20100.0 AT12 2008 26100.0 AT13 2008 41000.0 AT21 2008 27700.0 AT22 2008 28600.0 <xarray.Dataset> Size: 25kB Dimensions: (nuts2: 257, year: 11) Coordinates: * nuts2 (nuts2) object 2kB 'AT11' 'AT12' ... 'SK03' 'SK04' * year (year) object 88B '2008' '2009' '2010' ... '2017' '2018' Data variables: GDP_per_capita_PPS (nuts2, year) float64 23kB 2.01e+04 ... 1.54e+04
5.4. unemployment
import pandas as pd import eurostat df = eurostat.get_data_df("lfst_r_lfur2gan") df = df.rename( columns={ "geo\\TIME_PERIOD": "geo", } ) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & (df.citizen=='TOTAL') &\ (df.unit == 'PC') & (df.sex=='T') & (df.age=='Y15-74')] df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country'], value_vars=[str(y) for y in year_range], var_name='year',value_name='unemployment') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop(['nuts', 'country'],inplace=True, axis=1) df.rename({'geo':'nuts2'},inplace=True,axis=1) df.set_index(['nuts2','year'],inplace=True) print(df.head()) ds_unemployment = df.to_xarray() ds_unemployment.to_netcdf("./data/extensions/unemployment_nuts_2.nc") ds_unemployment
unemployment nuts2 year AT11 2008 4.2 AT12 2008 3.7 AT13 2008 7.3 AT21 2008 3.6 AT22 2008 3.8 <xarray.Dataset> Size: 31kB Dimensions: (nuts2: 319, year: 11) Coordinates: * nuts2 (nuts2) object 3kB 'AT11' 'AT12' 'AT13' ... 'UKM9' 'UKN0' * year (year) object 88B '2008' '2009' '2010' ... '2017' '2018' Data variables: unemployment (nuts2, year) float64 28kB 4.2 5.0 4.0 3.8 ... 6.1 5.7 4.6 3.6
5.5. education
- https://ec.europa.eu/eurostat/databrowser/view/tgs00109/default/table?lang=en&category=t_reg.t_reg_educ
- data starts from 2013 onward (not 2008)
import pandas as pd import eurostat df = eurostat.get_data_df("tgs00109") df = df.rename( columns={ "geo\\TIME_PERIOD": "geo", } ) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) &\ (df.unit == 'PC') & (df.sex=='T')] df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country'], value_vars=[str(y) for y in np.arange(2013,2019,1)], var_name='year',value_name='tertiary_education') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop(['nuts', 'country'],inplace=True, axis=1) df.rename({'geo':'nuts2'},inplace=True,axis=1) df.set_index(['nuts2','year'],inplace=True) print(df.head()) ds_education = df.to_xarray() ds_education.to_netcdf("./data/extensions/education_nuts_2.nc") ds_education
tertiary_education nuts2 year AT11 2013 15.8 AT12 2013 17.6 AT13 2013 30.6 AT21 2013 17.3 AT22 2013 17.5 <xarray.Dataset> Size: 17kB Dimensions: (nuts2: 308, year: 6) Coordinates: * nuts2 (nuts2) object 2kB 'AT11' 'AT12' ... 'UKM9' 'UKN0' * year (year) object 48B '2013' '2014' '2015' ... '2017' '2018' Data variables: tertiary_education (nuts2, year) float64 15kB 15.8 25.7 28.5 ... 35.6 37.2
5.6. merging into extended dataset
ds_G = xr.open_dataset('./data/extensions/GDP_per_capita_nuts_2.nc') ds_E = xr.open_dataset('./data/extensions/education_nuts_2.nc') ds_T = xr.open_dataset('./data/extensions/traffic_mortality_nuts_2.nc') ds_U = xr.open_dataset('./data/extensions/unemployment_nuts_2.nc')
ds_extended = xr.merge([ds_T,ds_G,ds_E,ds_U],join='left') ds_extended['year'] = (ds_extended['year'].values).astype('int') ds_extended = ds_extended.transpose("year","nuts2") # ds_extended['number_devices'] = ds_extended['number_devices'].isel(country=0) # ds_extended['country_code'] = ds_extended['country_code'].isel(nuts2=0) # ds_extended['killed_per_million_inhabitants'] = ds_extended['killed_per_million_inhabitants'].isel(country=0) ds_extended
<xarray.Dataset> Size: 137kB Dimensions: (nuts2: 300, year: 11) Coordinates: * nuts2 (nuts2) <U4 5kB 'AT11' 'AT12' ... 'UKN0' * year (year) int64 88B 2008 2009 ... 2017 2018 Data variables: killed_per_million_inhabitants (year, nuts2) float64 26kB ... number_devices (year, nuts2) float64 26kB ... GDP_per_capita_PPS (year, nuts2) float64 26kB 2.01e+04 ... nan tertiary_education (year, nuts2) float64 26kB nan nan ... 37.2 unemployment (year, nuts2) float64 26kB 4.2 3.7 ... 3.6
ds_extended.to_netcdf("./data/extensions/extended_data.nc")