Getting Eurostat data
Table of Contents
1. Introduction
This file describes how we get the data for our project on the health effects of demand side cost sharing.
In this file we explain how we get the data and from where and at the same time we generate the data. By generating the data in this way, a change in the data by Eurostat will lead to (slightly) different results.
- Link to excel file with overview: http://ec.europa.eu/eurostat/documents/140934/4381644/Regional-statistics-overview.xlsx
The dataset itself is too big for github and can be found on DataverseNL.
2. Preamble code
Run this code first before running the code of particular data sets. It imports the relevant libraries.
import numpy as np import pandas as pd from country_codes import oecd_dictionary, eurostat_dictionary import eurostat import xarray as xr import matplotlib.pyplot as plt EU_countries = ['Belgium', 'Bulgaria', 'Czechia', 'Denmark', 'Germany (until 1990 former territory of the FRG)', 'Germany','Estonia', 'Ireland', 'Greece', 'Spain', 'France', 'Croatia', 'Italy', 'Cyprus', 'Latvia', 'Lithuania', 'Luxembourg', 'Hungary', 'Malta', 'Netherlands', 'Austria', 'Poland', 'Portugal', 'Romania', 'Slovenia', 'Slovakia', 'Finland', 'Sweden', 'United Kingdom', 'Iceland', 'Liechtenstein', 'Norway', 'Switzerland', 'Bosnia and Herzegovina']
We first describe how we get the data for each variable. Each of these variables is saved in a separate dataset. The final section contains the code that merges the data for these variables into one (big) dataset.
3. Data
First, we fetch the variables at the country level from the Eurostat website. Then we move to the NUTS 2 regional data.
3.1. country data
3.1.1. OOP per country
We want a measure of "how generous" a healthcare system is in terms of oop-payments. This feature of the healthcare system tends to be defined at the country level. We use household oop payment and expenditure on voluntary payment schemes (for things that are not covered by basic or mandatory insurance). The paper uses mainly PC_CHE
as unit of measurement.
abbreviation | name |
---|---|
HF2 | Voluntary health care payment schemes |
HF3 | Household out-of-pocket payment |
PC_CHE | Percentual share of total current health expenditure (CHE) |
PPS_HAB | Purchasing power standard (PPS) per inhabitant |
df = eurostat.get_data_df('hlth_sha11_hf') df.rename({'geo\\time':'geo'},inplace=True,axis=1) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.icha11_hf.isin(["HF2", "HF3"])) &\ (df.unit.isin([ 'PC_CHE','PPS_HAB']))] df.drop(["geo"],axis=1,inplace=True) df = pd.melt(df,id_vars=['country','country_code',\ 'unit','icha11_hf'], value_vars=np.arange(1992,2019), var_name='year',\ value_name='health_expenditure') df.set_index(['country','country_code',\ 'year','unit','icha11_hf'],inplace=True) df = df.unstack() df.columns = df.columns.droplevel() df = df.unstack() df.columns = df.columns.droplevel() df.columns = ['HF2_PC_CHE','HF2_PPS_HAB',\ 'HF3_PC_CHE','HF3_PPS_HAB'] df.reset_index(inplace=True) df.set_index(['country','country_code','year'],inplace=True) df.tail()
HF2_PC_CHE | HF2_PPS_HAB | HF3_PC_CHE | HF3_PPS_HAB | |||
---|---|---|---|---|---|---|
country | country_code | year | ||||
United Kingdom | UK | 2014 | 5.40 | 155.48 | 15.16 | 436.73 |
2015 | 5.40 | 161.07 | 15.23 | 454.78 | ||
2016 | 5.04 | 148.46 | 15.37 | 453.16 | ||
2017 | 5.41 | 160.78 | 15.72 | 467.53 | ||
2018 | 5.62 | 172.06 | 15.88 | 486.25 |
ds = df.to_xarray() ds.to_netcdf("./data/country/out_of_pocket.nc") ds
<xarray.Dataset> Dimensions: (country: 33, country_code: 33, year: 27) Coordinates: * country (country) object 'Austria' 'Belgium' ... 'United Kingdom' * country_code (country_code) object 'AT' 'BA' 'BE' 'BG' ... 'SI' 'SK' 'UK' * year (year) int64 1992 1993 1994 1995 1996 ... 2015 2016 2017 2018 Data variables: HF2_PC_CHE (country, country_code, year) float64 nan nan ... 5.41 5.62 HF2_PPS_HAB (country, country_code, year) float64 nan nan ... 160.8 172.1 HF3_PC_CHE (country, country_code, year) float64 nan nan ... 15.72 15.88 HF3_PPS_HAB (country, country_code, year) float64 nan nan ... 467.5 486.2
- country: 33
- country_code: 33
- year: 27
- country(country)object'Austria' ... 'United Kingdom'
array(['Austria', 'Belgium', 'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland', 'Italy', 'Latvia', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'United Kingdom'], dtype=object)
- country_code(country_code)object'AT' 'BA' 'BE' ... 'SI' 'SK' 'UK'
array(['AT', 'BA', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IS', 'IT', 'LI', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK'], dtype=object)
- year(year)int641992 1993 1994 ... 2016 2017 2018
array([1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018])
- HF2_PC_CHE(country, country_code, year)float64nan nan nan nan ... 5.04 5.41 5.62
array([[[ nan, nan, nan, ..., 6.8 , 6.83, 6.91], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 4.74, 4.86, 4.9 ], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 0.3 , 0.37, 0.4 ], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 5.04, 5.41, 5.62]]])
- HF2_PPS_HAB(country, country_code, year)float64nan nan nan ... 148.5 160.8 172.1
array([[[ nan, nan, nan, ..., 260.74, 263.61, 274.53], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 175.08, 180.84, 187.76], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 2.54, 3.08, 3.48], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 148.46, 160.78, 172.06]]])
- HF3_PC_CHE(country, country_code, year)float64nan nan nan ... 15.37 15.72 15.88
array([[[ nan, nan, nan, ..., 19.24, 19.15, 18.43], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 18.09, 17.91, 17.81], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 28.61, 29.12, 29.31], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 15.37, 15.72, 15.88]]])
- HF3_PPS_HAB(country, country_code, year)float64nan nan nan ... 453.2 467.5 486.2
array([[[ nan, nan, nan, ..., 738.36, 739.37, 732.45], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 667.95, 665.96, 682.09], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 242.78, 242.47, 256.89], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 453.16, 467.53, 486.25]]])
3.1.2. healthcare expenditure
abbreviation | variable name |
---|---|
TOT_HC | Current health care expenditure (CHE) |
EUR_HAB | Euro per inhabitant |
df = eurostat.get_data_df('hlth_sha11_hc') df.rename({'geo\\time':'geo'},inplace=True,axis=1) df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.icha11_hc.isin(["TOT_HC"])) & \ (df.unit.isin(['EUR_HAB']))] df.drop(["geo"],axis=1,inplace=True) df = pd.melt(df,id_vars=['country','country_code',\ 'unit','icha11_hc'], value_vars=np.arange(1992,2020), var_name='year',\ value_name='health_expenditure') df.set_index(['country','country_code','year',\ 'unit','icha11_hc'],inplace=True) df = df.unstack() df.columns = df.columns.droplevel() df = df.unstack() df.columns = df.columns.droplevel() df.reset_index(inplace=True) df.rename({'EUR_HAB':'health expenditure per capita'},\ axis=1,inplace=True) df.set_index(['country','country_code','year'],inplace=True) df.head()
unit | health expenditure per capita | ||
---|---|---|---|
country | country_code | year | |
Austria | AT | 1992 | NaN |
1993 | NaN | ||
1994 | NaN | ||
1995 | NaN | ||
1996 | NaN |
ds = df.to_xarray() ds.to_netcdf("./data/country/healthcare_expenditure.nc") ds
<xarray.Dataset> Dimensions: (country: 33, country_code: 33, year: 28) Coordinates: * country (country) object 'Austria' ... 'United Kin... * country_code (country_code) object 'AT' 'BA' ... 'SK' 'UK' * year (year) int64 1992 1993 1994 ... 2018 2019 Data variables: health expenditure per capita (country, country_code, year) float64 nan ...
- country: 33
- country_code: 33
- year: 28
- country(country)object'Austria' ... 'United Kingdom'
array(['Austria', 'Belgium', 'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland', 'Italy', 'Latvia', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'United Kingdom'], dtype=object)
- country_code(country_code)object'AT' 'BA' 'BE' ... 'SI' 'SK' 'UK'
array(['AT', 'BA', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IS', 'IT', 'LI', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK'], dtype=object)
- year(year)int641992 1993 1994 ... 2017 2018 2019
array([1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
- health expenditure per capita(country, country_code, year)float64nan nan nan ... 3.608e+03 3.839e+03
array([[[ nan, nan, nan, ..., 4359.69, 4510.08, 4689.27], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 4206.56, 4343.57, 4438.82], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 409.09, 434.88, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 3503.62, 3607.59, 3838.53]]])
3.1.3. self-perceived health
See section 3.2.1.1 below where we merge these data on age with mortality.
3.2. nuts 2
- overview of regional data: https://ec.europa.eu/eurostat/web/regions/data/database
3.2.1. crude number of deaths and population by age and sex and self-perceived health
df = eurostat.get_data_df('demo_r_d2jan') df.rename({'geo\\time':'geo'},inplace=True,axis=1) age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15', 'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23', 'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31', 'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4', 'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48', 'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56', 'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64', 'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72', 'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80', 'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89', 'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97', 'Y98', 'Y99'] df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.sex.isin(["M","F"])) & (df.age.isin(age_range))] df.drop(['country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','country','sex','age'], value_vars=np.arange(1990,2021), var_name='year',value_name='population') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop('nuts',inplace=True, axis=1) df.rename({'geo':'nuts2', 'age':'age string'},\ axis=1,inplace=True) df['age'] = [float(x.replace('Y','')) for x in df['age string']] df.drop('age string',inplace=True, axis=1) #df.set_index(['nuts2','year','sex','age'],inplace=True) df.head()
nuts2 | country | sex | year | population | age | |
---|---|---|---|---|---|---|
2 | AT11 | Austria | F | 1990 | 1345.0 | 1.0 |
3 | AT12 | Austria | F | 1990 | 8038.0 | 1.0 |
4 | AT13 | Austria | F | 1990 | 7240.0 | 1.0 |
6 | AT21 | Austria | F | 1990 | 3042.0 | 1.0 |
7 | AT22 | Austria | F | 1990 | 6491.0 | 1.0 |
@np.vectorize def age_to_age_category(age): if age < 16: cat = 'Y0-15' elif age < 25: cat = 'Y16-24' elif age < 35: cat = 'Y25-34' elif age < 45: cat = 'Y35-44' elif age < 55: cat = 'Y45-54' elif age < 65: cat = 'Y55-64' elif age < 75: cat = 'Y65-74' elif age < 85: cat = 'Y75-84' else: cat = 'Y_GE85' return cat df['age_category'] = age_to_age_category(df.age) df.head()
nuts2 country sex year population age age_category 2 AT11 Austria F 1990 1345.0 1.0 Y0-15 3 AT12 Austria F 1990 8038.0 1.0 Y0-15 4 AT13 Austria F 1990 7240.0 1.0 Y0-15 6 AT21 Austria F 1990 3042.0 1.0 Y0-15 7 AT22 Austria F 1990 6491.0 1.0 Y0-15
- self-perceived health
- https://ec.europa.eu/eurostat/databrowser/view/hlth_silc_02/default/table?lang=en
- we combine bad and very bad self perceived health
ages = ['Y16-24', 'Y25-34', 'Y35-44', 'Y45-54', 'Y55-64', 'Y65-74', 'Y75-84', 'Y_GE85'] df_c = eurostat.get_data_df('hlth_silc_02') df_c.rename({'geo\\time':'geo'},inplace=True,axis=1) df_c['country'] = df_c['geo'].replace(eurostat_dictionary) df_c = df_c[df_c.country.isin(EU_countries) & (df_c.sex.isin(["M","F"]) ) & (df_c.levels.isin(["B_VB"])) & (df_c.isced11 == "TOTAL") & df_c.age.isin(ages)] df_c.drop(["unit","isced11","geo","levels"],axis=1,inplace=True) df_c = pd.melt(df_c,id_vars=['country','sex','age'], value_vars=np.arange(2008,2023), var_name='year',value_name='bad_self_perceived_health') # df_c.set_index(['country','year','sex','mortalit'],inplace=True) # df_c = df_c.unstack() # df_c.reset_index(inplace=True) # df_c.columns = [' '.join(col).strip() for col in df_c.columns.values] df_c.rename({'age':'age_category'},inplace=True,axis=1) # df_c.to_csv('./data/preventable_mortality.csv') df_c.head()
country sex age_category year bad_self_perceived_health 0 Austria F Y16-24 2008 1.0 1 Belgium F Y16-24 2008 1.8 2 Bulgaria F Y16-24 2008 0.9 3 Switzerland F Y16-24 2008 0.9 4 Cyprus F Y16-24 2008 0.2
df = df.merge(df_c, how = 'left',\ on = ['country','year','sex','age_category']) df.set_index(['nuts2','year','sex','age'],inplace=True) df.tail()
country population age_category \ nuts2 year sex age UKM6 2020 M 99.0 United Kingdom NaN Y_GE85 UKM7 2020 M 99.0 United Kingdom NaN Y_GE85 UKM8 2020 M 99.0 United Kingdom NaN Y_GE85 UKM9 2020 M 99.0 United Kingdom NaN Y_GE85 UKN0 2020 M 99.0 United Kingdom NaN Y_GE85 bad_self_perceived_health nuts2 year sex age UKM6 2020 M 99.0 NaN UKM7 2020 M 99.0 NaN UKM8 2020 M 99.0 NaN UKM9 2020 M 99.0 NaN UKN0 2020 M 99.0 NaN
ds = df.to_xarray() ds.to_netcdf("./data/population_nuts_2.nc") ds
<xarray.Dataset> Dimensions: (age: 99, nuts2: 306, sex: 2, year: 31) Coordinates: * nuts2 (nuts2) object 'AT11' 'AT12' ... 'UKM9' 'UKN0' * year (year) int64 1990 1991 1992 ... 2018 2019 2020 * sex (sex) object 'F' 'M' * age (age) float64 1.0 2.0 3.0 4.0 ... 97.0 98.0 99.0 Data variables: country (nuts2, year, sex, age) object 'Austria' ... '... population (nuts2, year, sex, age) float64 1.345e+03 ... nan age_category (nuts2, year, sex, age) object 'Y0-15' ... 'Y_... bad_self_perceived_health (nuts2, year, sex, age) float64 nan nan ... nan
xarray.Dataset- age: 99
- nuts2: 306
- sex: 2
- year: 31
- nuts2(nuts2)object'AT11' 'AT12' ... 'UKM9' 'UKN0'
array(['AT11', 'AT12', 'AT13', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- year(year)int641990 1991 1992 ... 2018 2019 2020
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
- sex(sex)object'F' 'M'
array(['F', 'M'], dtype=object)
- age(age)float641.0 2.0 3.0 4.0 ... 97.0 98.0 99.0
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.])
- country(nuts2, year, sex, age)object'Austria' ... 'United Kingdom'
array([[[['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria'], ['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria']], [['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria'], ['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria']], [['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria'], ['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria']], ..., [['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', 'Austria'], ['Austria', 'Austria', 'Austria', ..., 'Austria', 'Austria', ... ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom']], ..., [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom'], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom']], [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom'], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom']], [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom'], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', 'United Kingdom']]]], dtype=object)
- population(nuts2, year, sex, age)float641.345e+03 1.349e+03 ... nan nan
array([[[[1.3450e+03, 1.3490e+03, 1.4070e+03, ..., nan, nan, nan], [1.4380e+03, 1.4500e+03, 1.4920e+03, ..., nan, nan, nan]], [[1.3710e+03, 1.3760e+03, 1.3810e+03, ..., nan, nan, nan], [1.4550e+03, 1.4660e+03, 1.4700e+03, ..., nan, nan, nan]], [[1.4360e+03, 1.4140e+03, 1.4260e+03, ..., nan, nan, nan], [1.4280e+03, 1.4940e+03, 1.5070e+03, ..., nan, nan, nan]], ..., [[1.1630e+03, 1.1810e+03, 1.1690e+03, ..., 7.4000e+01, 3.7000e+01, 1.9000e+01], [1.2910e+03, 1.2630e+03, 1.3150e+03, ..., 1.7000e+01, ... nan, nan], [1.3305e+04, 1.3199e+04, 1.3380e+04, ..., nan, nan, nan]], ..., [[1.1730e+04, 1.1970e+04, 1.2065e+04, ..., 3.6600e+02, 2.2800e+02, 1.3400e+02], [1.2481e+04, 1.2764e+04, 1.2681e+04, ..., 1.1300e+02, 4.2000e+01, 3.3000e+01]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]]])
- age_category(nuts2, year, sex, age)object'Y0-15' 'Y0-15' ... 'Y_GE85'
array([[[['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], ..., [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']]], ... [[['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], ..., [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85'], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', 'Y_GE85']]]], dtype=object)
- bad_self_perceived_health(nuts2, year, sex, age)float64nan nan nan nan ... nan nan nan nan
array([[[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], ..., [[ nan, nan, nan, ..., 38.5, 38.5, 38.5], [ nan, nan, nan, ..., 34.7, 34.7, 34.7]], [[ nan, nan, nan, ..., 42.6, 42.6, 42.6], [ nan, nan, nan, ..., 32.1, 32.1, 32.1]], [[ nan, nan, nan, ..., 38.9, 38.9, 38.9], [ nan, nan, nan, ..., 31. , 31. , 31. ]]], ... [[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], ..., [[ nan, nan, nan, ..., 17.6, 17.6, 17.6], [ nan, nan, nan, ..., 14.5, 14.5, 14.5]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]]])
df = eurostat.get_data_df('demo_r_magec') df.rename({'geo\\time':'geo'},inplace=True,axis=1) age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15', 'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23', 'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31', 'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4', 'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48', 'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56', 'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64', 'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72', 'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80', 'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89', 'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97', 'Y98', 'Y99'] df['country_code'] = [x[:2] for x in df['geo']] df['country'] = df['country_code'].replace(eurostat_dictionary) df = df[df.country.isin(EU_countries) & \ (df.sex.isin(["M","F"])) & (df.age.isin(age_range))] df.drop(['country','country_code'],axis=1,inplace=True) df = pd.melt(df,id_vars=['geo','sex','age'], value_vars=np.arange(1990,2020), var_name='year',value_name='deaths') df['nuts'] = [len(x)-2 for x in df.geo] df = df.query('nuts == 2') df.drop('nuts',inplace=True, axis=1) df.rename({'geo':'nuts2', 'age':'age string'},\ axis=1,inplace=True) df['age'] = [float(x.replace('Y','')) for x in df['age string']] df.drop('age string',inplace=True, axis=1) df.set_index(['nuts2','year','sex','age'],inplace=True) df.head()
deaths nuts2 year sex age AT11 1990 F 1.0 0.0 AT12 1990 F 1.0 7.0 AT13 1990 F 1.0 4.0 AT21 1990 F 1.0 1.0 AT22 1990 F 1.0 5.0 ds = df.to_xarray() ds.to_netcdf("./data/deaths_nuts_2.nc") ds
<xarray.Dataset> Dimensions: (age: 99, nuts2: 305, sex: 2, year: 30) Coordinates: * nuts2 (nuts2) object 'AT11' 'AT12' 'AT13' 'AT21' ... 'UKM8' 'UKM9' 'UKN0' * year (year) int64 1990 1991 1992 1993 1994 ... 2015 2016 2017 2018 2019 * sex (sex) object 'F' 'M' * age (age) float64 1.0 2.0 3.0 4.0 5.0 6.0 ... 95.0 96.0 97.0 98.0 99.0 Data variables: deaths (nuts2, year, sex, age) float64 0.0 0.0 1.0 1.0 ... nan nan nan nan
xarray.Dataset- age: 99
- nuts2: 305
- sex: 2
- year: 30
- nuts2(nuts2)object'AT11' 'AT12' ... 'UKM9' 'UKN0'
array(['AT11', 'AT12', 'AT13', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- year(year)int641990 1991 1992 ... 2017 2018 2019
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
- sex(sex)object'F' 'M'
array(['F', 'M'], dtype=object)
- age(age)float641.0 2.0 3.0 4.0 ... 97.0 98.0 99.0
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.])
- deaths(nuts2, year, sex, age)float640.0 0.0 1.0 1.0 ... nan nan nan nan
array([[[[ 0., 0., 1., ..., nan, nan, nan], [ 2., 0., 0., ..., nan, nan, nan]], [[ 0., 1., 0., ..., nan, nan, nan], [ 3., 0., 0., ..., nan, nan, nan]], [[ 1., 0., 1., ..., nan, nan, nan], [ 1., 0., 1., ..., nan, nan, nan]], ..., [[ 0., 0., 0., ..., 31., 12., 4.], [ 0., 0., 0., ..., 11., 5., 3.]], [[ 2., 0., 0., ..., 25., 17., 5.], [ 1., 0., 0., ..., 3., 4., 1.]], [[ 0., 0., 0., ..., 36., 21., 16.], [ 0., 1., 0., ..., 12., 5., 5.]]], ... [[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], ..., [[ 1., 3., 0., ..., 122., 84., 54.], [ 4., 2., 1., ..., 39., 28., 17.]], [[ 2., 2., 0., ..., 139., 69., 57.], [ 1., 1., 2., ..., 48., 26., 18.]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]]])
3.2.2. mortality lagged per age
- useful link to lag variables in a pandas dataframe: https://stackoverflow.com/questions/61234837/pandas-panel-data-shifting-values-by-two-taking-into-consideration-year-gaps
ds_population = xr.open_dataset('./data/population_nuts_2.nc') ds_deaths = xr.open_dataset('./data/deaths_nuts_2.nc') ds_mortality = xr.merge([ds_population, ds_deaths]) ds_mortality
<xarray.Dataset> Dimensions: (age: 99, nuts2: 305, sex: 2, year: 31) Coordinates: * year (year) int64 1990 1991 1992 1993 1994 ... 2017 2018 2019 2020 * nuts2 (nuts2) object 'AT11' 'AT12' 'AT13' ... 'UKM8' 'UKM9' 'UKN0' * sex (sex) object 'F' 'M' * age (age) float64 1.0 2.0 3.0 4.0 5.0 ... 95.0 96.0 97.0 98.0 99.0 Data variables: population (nuts2, year, sex, age) float64 ... deaths (nuts2, year, sex, age) float64 0.0 0.0 1.0 1.0 ... nan nan nan
- age: 99
- nuts2: 305
- sex: 2
- year: 31
- year(year)int641990 1991 1992 ... 2018 2019 2020
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
- nuts2(nuts2)object'AT11' 'AT12' ... 'UKM9' 'UKN0'
array(['AT11', 'AT12', 'AT13', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- sex(sex)object'F' 'M'
array(['F', 'M'], dtype=object)
- age(age)float641.0 2.0 3.0 4.0 ... 97.0 98.0 99.0
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.])
- population(nuts2, year, sex, age)float64...
[1872090 values with dtype=float64]
- deaths(nuts2, year, sex, age)float640.0 0.0 1.0 1.0 ... nan nan nan nan
array([[[[ 0., 0., 1., ..., nan, nan, nan], [ 2., 0., 0., ..., nan, nan, nan]], [[ 0., 1., 0., ..., nan, nan, nan], [ 3., 0., 0., ..., nan, nan, nan]], [[ 1., 0., 1., ..., nan, nan, nan], [ 1., 0., 1., ..., nan, nan, nan]], ..., [[ 2., 0., 0., ..., 25., 17., 5.], [ 1., 0., 0., ..., 3., 4., 1.]], [[ 0., 0., 0., ..., 36., 21., 16.], [ 0., 1., 0., ..., 12., 5., 5.]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]], ... [[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], ..., [[ 2., 2., 0., ..., 139., 69., 57.], [ 1., 1., 2., ..., 48., 26., 18.]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]]])
df_mortality = ds_mortality.to_dataframe() # drop rows where deaths > population at Jan. 1st # (e.g. because people moved in the year) df_mortality = df_mortality[(df_mortality.population >= \ df_mortality.deaths) &\ (df_mortality.population > 0)] df_mortality['mortality'] = df_mortality['deaths']/\ df_mortality['population'] df_mortality.reset_index(inplace=True) df_mortality.rename({'level_3':'year'},\ inplace=True,axis=1) df_mortality['year'] = pd.to_datetime(df_mortality['year'],\ format='%Y') df_mortality.head()
age | nuts2 | sex | year | population | deaths | mortality | |
---|---|---|---|---|---|---|---|
0 | 1.0 | AT11 | F | 1990-01-01 | 1345.0 | 0.0 | 0.000000 |
1 | 1.0 | AT11 | F | 1991-01-01 | 1371.0 | 0.0 | 0.000000 |
2 | 1.0 | AT11 | F | 1992-01-01 | 1436.0 | 1.0 | 0.000696 |
3 | 1.0 | AT11 | F | 1993-01-01 | 1372.0 | 0.0 | 0.000000 |
4 | 1.0 | AT11 | F | 1994-01-01 | 1349.0 | 1.0 | 0.000741 |
new_column = (df_mortality.set_index(['year','age','nuts2','sex']).\ groupby(['age','nuts2','sex'])['mortality']\ .shift(1)).to_frame() new_column.reset_index(inplace=True) new_column.age += 1 new_column['year'] = new_column['year'].dt.year new_column.rename({'mortality':'lagged_mortality'},\ inplace=True,axis=1) new_column.set_index(['year','age','nuts2','sex'],\ inplace=True) new_column.head()
lagged_mortality | ||||
---|---|---|---|---|
year | age | nuts2 | sex | |
1990 | 2.0 | AT11 | F | NaN |
1991 | 2.0 | AT11 | F | 0.000000 |
1992 | 2.0 | AT11 | F | 0.000000 |
1993 | 2.0 | AT11 | F | 0.000696 |
1994 | 2.0 | AT11 | F | 0.000000 |
ds = new_column.to_xarray() ds.to_netcdf("./data/lagged_mortality_nuts_2.nc") ds
<xarray.Dataset> Dimensions: (age: 99, nuts2: 297, sex: 2, year: 30) Coordinates: * year (year) int64 1990 1991 1992 1993 ... 2016 2017 2018 2019 * age (age) float64 2.0 3.0 4.0 5.0 6.0 ... 97.0 98.0 99.0 100.0 * nuts2 (nuts2) object 'AT11' 'AT12' 'AT13' ... 'UKM9' 'UKN0' * sex (sex) object 'F' 'M' Data variables: lagged_mortality (year, age, nuts2, sex) float64 nan nan nan ... nan nan
- age: 99
- nuts2: 297
- sex: 2
- year: 30
- year(year)int641990 1991 1992 ... 2017 2018 2019
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
- age(age)float642.0 3.0 4.0 5.0 ... 98.0 99.0 100.0
array([ 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99., 100.])
- nuts2(nuts2)object'AT11' 'AT12' ... 'UKM9' 'UKN0'
array(['AT11', 'AT12', 'AT13', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- sex(sex)object'F' 'M'
array(['F', 'M'], dtype=object)
- lagged_mortality(year, age, nuts2, sex)float64nan nan nan nan ... nan nan nan nan
array([[[[ nan, nan], [ nan, nan], [ nan, nan], ..., [ nan, nan], [ nan, nan], [ nan, nan]], [[ nan, nan], [ nan, nan], [ nan, nan], ..., [ nan, nan], [ nan, nan], [ nan, nan]], [[ nan, nan], [ nan, nan], [ nan, nan], ..., ... ..., [ nan, nan], [ nan, nan], [ nan, nan]], [[4.59459459e-01, 1.00000000e+00], [4.08256881e-01, 4.20000000e-01], [3.94636015e-01, 4.82758621e-01], ..., [ nan, nan], [ nan, nan], [ nan, nan]], [[2.63157895e-01, nan], [4.78723404e-01, 6.84210526e-01], [4.25925926e-01, 6.00000000e-01], ..., [ nan, nan], [ nan, nan], [ nan, nan]]]])
3.2.3. Poverty measures
- Severe material deprivation rate by NUTS 2 regions [TGS00104]
df = eurostat.get_data_df('tgs00104') df.rename({'geo\\time':'nuts2'},inplace=True,axis=1) df = df[(df.unit == "PC")] df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=np.arange(2009,2021), var_name='year',\ value_name='percentage_material_deprivation') df.set_index(['nuts2','year'],inplace=True) df.head()
percentage_material_deprivation nuts2 year AL01 2009 NaN AL02 2009 NaN AL03 2009 NaN AT11 2009 NaN AT12 2009 NaN ds_material_deprivation = df.to_xarray() ds_material_deprivation.to_netcdf("./data/material_deprivation_nuts_2.nc") ds_material_deprivation
<xarray.Dataset> Dimensions: (nuts2: 179, year: 12) Coordinates: * nuts2 (nuts2) object 'AL01' 'AL02' ... 'SK04' * year (year) int64 2009 2010 2011 ... 2019 2020 Data variables: percentage_material_deprivation (nuts2, year) float64 nan nan ... 9.6 8.3
xarray.Dataset- nuts2: 179
- year: 12
- nuts2(nuts2)object'AL01' 'AL02' ... 'SK03' 'SK04'
array(['AL01', 'AL02', 'AL03', 'AT11', 'AT12', 'AT13', 'AT21', 'AT22', 'AT31', 'AT32', 'AT33', 'AT34', 'BG31', 'BG32', 'BG33', 'BG34', 'BG41', 'BG42', 'CH01', 'CH02', 'CH03', 'CH04', 'CH05', 'CH06', 'CH07', 'CZ01', 'CZ02', 'CZ03', 'CZ04', 'CZ05', 'CZ06', 'CZ07', 'CZ08', 'DK01', 'DK02', 'DK03', 'DK04', 'DK05', 'EL30', 'EL41', 'EL42', 'EL43', 'EL51', 'EL52', 'EL53', 'EL54', 'EL61', 'EL62', 'EL63', 'EL64', 'EL65', 'ES11', 'ES12', 'ES13', 'ES21', 'ES22', 'ES23', 'ES24', 'ES30', 'ES41', 'ES42', 'ES43', 'ES51', 'ES52', 'ES53', 'ES61', 'ES62', 'ES63', 'ES64', 'ES70', 'FI1B', 'FI1C', 'FI1D', 'HR03', 'HR04', 'HU10', 'HU11', 'HU12', 'HU21', 'HU22', 'HU23', 'HU31', 'HU32', 'HU33', 'IE01', 'IE02', 'IE04', 'IE05', 'IE06', 'ITC1', 'ITC2', 'ITC3', 'ITC4', 'ITF1', 'ITF2', 'ITF3', 'ITF4', 'ITF5', 'ITF6', 'ITG1', 'ITG2', 'ITH1', 'ITH2', 'ITH3', 'ITH4', 'ITH5', 'ITI1', 'ITI2', 'ITI3', 'ITI4', 'LT01', 'LT02', 'NL11', 'NL12', 'NL13', 'NL21', 'NL22', 'NL23', 'NL31', 'NL32', 'NL33', 'NL34', 'NL41', 'NL42', 'NO01', 'NO02', 'NO03', 'NO04', 'NO05', 'NO06', 'NO07', 'PL21', 'PL22', 'PL41', 'PL42', 'PL43', 'PL51', 'PL52', 'PL61', 'PL62', 'PL63', 'PL71', 'PL72', 'PL81', 'PL82', 'PL84', 'PL91', 'PL92', 'PT11', 'PT15', 'PT16', 'PT17', 'PT18', 'PT20', 'PT30', 'RO11', 'RO12', 'RO21', 'RO22', 'RO31', 'RO32', 'RO41', 'RO42', 'SE11', 'SE12', 'SE21', 'SE22', 'SE23', 'SE31', 'SE32', 'SE33', 'SI01', 'SI02', 'SI03', 'SI04', 'SK01', 'SK02', 'SK03', 'SK04'], dtype=object)
- year(year)int642009 2010 2011 ... 2018 2019 2020
array([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
- percentage_material_deprivation(nuts2, year)float64nan nan nan nan ... 10.0 9.6 8.3
array([[ nan, nan, nan, ..., 38.6, 38.5, 36.6], [ nan, nan, nan, ..., 40.4, 37.7, 35.1], [ nan, nan, nan, ..., 35.3, 34.9, 32.2], ..., [10.3, 9.8, 8.6, ..., 4.2, 5.6, 3.6], [11.1, 10.6, 10.7, ..., 9.1, 10.6, 8.1], [13.5, 15. , 12.3, ..., 10. , 9.6, 8.3]])
- At-risk-of-poverty rate by NUTS 2 regions [TGS00103]
df = eurostat.get_data_df('tgs00103') df.rename({'geo\\time':'nuts2'},inplace=True,axis=1) df = df[(df.unit == "PC_POP")] df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=np.arange(2010,2022), var_name='year', value_name='at risk of poverty') df.set_index(['nuts2','year'],inplace=True) df.head()
at risk of poverty nuts2 year AL01 2010 NaN AL02 2010 NaN AL03 2010 NaN BG31 2010 30.5 BG32 2010 28.8 ds_at_risk_of_poverty = df.to_xarray() ds_at_risk_of_poverty.to_netcdf("./data/at_risk_of_poverty_nuts_2.nc") ds_at_risk_of_poverty
<xarray.Dataset> Dimensions: (nuts2: 173, year: 12) Coordinates: * nuts2 (nuts2) object 'AL01' 'AL02' 'AL03' ... 'SK03' 'SK04' * year (year) int64 2010 2011 2012 2013 ... 2018 2019 2020 2021 Data variables: at risk of poverty (nuts2, year) float64 nan nan nan nan ... 17.0 16.5 nan
xarray.Dataset- nuts2: 173
- year: 12
- nuts2(nuts2)object'AL01' 'AL02' ... 'SK03' 'SK04'
array(['AL01', 'AL02', 'AL03', 'BG31', 'BG32', 'BG33', 'BG34', 'BG41', 'BG42', 'CH01', 'CH02', 'CH03', 'CH04', 'CH05', 'CH06', 'CH07', 'CZ01', 'CZ02', 'CZ03', 'CZ04', 'CZ05', 'CZ06', 'CZ07', 'CZ08', 'DK01', 'DK02', 'DK03', 'DK04', 'DK05', 'EL30', 'EL41', 'EL42', 'EL43', 'EL51', 'EL52', 'EL53', 'EL54', 'EL61', 'EL62', 'EL63', 'EL64', 'EL65', 'ES11', 'ES12', 'ES13', 'ES21', 'ES22', 'ES23', 'ES24', 'ES30', 'ES41', 'ES42', 'ES43', 'ES51', 'ES52', 'ES53', 'ES61', 'ES62', 'ES63', 'ES64', 'ES70', 'FI1B', 'FI1C', 'FI1D', 'HR02', 'HR03', 'HR04', 'HR05', 'HR06', 'HU10', 'HU11', 'HU12', 'HU21', 'HU22', 'HU23', 'HU31', 'HU32', 'HU33', 'IE01', 'IE02', 'IE04', 'IE05', 'IE06', 'ITC1', 'ITC2', 'ITC3', 'ITC4', 'ITF1', 'ITF2', 'ITF3', 'ITF4', 'ITF5', 'ITF6', 'ITG1', 'ITG2', 'ITH1', 'ITH2', 'ITH3', 'ITH4', 'ITH5', 'ITI1', 'ITI2', 'ITI3', 'ITI4', 'LT01', 'LT02', 'NL11', 'NL12', 'NL13', 'NL21', 'NL22', 'NL23', 'NL31', 'NL32', 'NL33', 'NL34', 'NL41', 'NL42', 'NO01', 'NO02', 'NO03', 'NO04', 'NO05', 'NO06', 'NO07', 'PL21', 'PL22', 'PL41', 'PL42', 'PL43', 'PL51', 'PL52', 'PL61', 'PL62', 'PL63', 'PL71', 'PL72', 'PL81', 'PL82', 'PL84', 'PL91', 'PL92', 'PT11', 'PT15', 'PT16', 'PT17', 'PT18', 'PT20', 'PT30', 'RO11', 'RO12', 'RO21', 'RO22', 'RO31', 'RO32', 'RO41', 'RO42', 'SE11', 'SE12', 'SE21', 'SE22', 'SE23', 'SE31', 'SE32', 'SE33', 'SI01', 'SI02', 'SI03', 'SI04', 'SK01', 'SK02', 'SK03', 'SK04'], dtype=object)
- year(year)int642010 2011 2012 ... 2019 2020 2021
array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021])
- at risk of poverty(nuts2, year)float64nan nan nan nan ... 17.0 16.5 nan
array([[ nan, nan, nan, ..., 25.9, 25.1, nan], [ nan, nan, nan, ..., 22.7, 21.3, nan], [ nan, nan, nan, ..., 20.8, 19.5, nan], ..., [10.3, 11.4, 11.9, ..., 7.4, 7.1, nan], [13.1, 13.1, 14.1, ..., 15.6, 14.8, nan], [15.8, 16.9, 16.7, ..., 17. , 16.5, nan]])
3.2.4. unmet needs
Self-reported unmet needs for medical examination by main reason declared and NUTS 2 regions
- link to the data: https://ec.europa.eu/eurostat/databrowser/view/hlth_silc_08_r/default/table?lang=en
- https://ec.europa.eu/eurostat/documents/1012329/6070906/Personal+data+-+health+PH.pdf/7edfec40-462a-4cb2-a5fc-8d8845000ca4
value | |
---|---|
TOOEXP | Too expensive |
TOOFAR | Too far to travel |
TOOEFW | Too expensive or too far to travel or waiting list |
NOTIME | No time |
NO_UNMET | No unmet needs to declare |
NOKNOW | Didn't know any good doctor or specialist |
WAITING | Waiting list |
FEAR | Fear of doctor, hospital, examination or treatment |
HOPING | Wanted to wait and see if problem got better on its own |
OTH | Other |
We drop the values NO_UNMET
and OTH
. The former is dropped because we are interested in unmet medical needs. The latter because the reason is unclear.
df = eurostat.get_data_df('hlth_silc_08_r') df.rename({'geo\\time':'nuts2'},inplace=True,axis=1) df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) columns = ['FEAR','TOOEXP','TOOFAR','WAITING',\ 'HOPING', 'NOTIME', 'NOKNOW'] df=df[df.reason.isin(columns)] df = pd.melt(df,id_vars=['nuts2','reason'], value_vars=np.arange(2009,2021), var_name='year',\ value_name='unmet needs') df.set_index(['nuts2','year','reason'],inplace=True) df = df.unstack() df.columns = df.columns.droplevel() df['UNMET'] = df[columns].sum(axis=1) df.drop(['FEAR','TOOFAR','WAITING',\ 'HOPING', 'NOTIME', 'NOKNOW'],axis=1,inplace=True) df.tail()
reason | TOOEXP | UNMET | |
---|---|---|---|
nuts2 | year | ||
SK04 | 2016 | 0.5 | 7.4 |
2017 | 0.2 | 5.5 | |
2018 | 0.4 | 6.2 | |
2019 | 0.4 | 6.6 | |
2020 | 0.2 | 6.0 |
ds_unmet_needs = df.to_xarray() ds_unmet_needs.to_netcdf("./data/unmet_needs_nuts_2.nc") ds_unmet_needs
<xarray.Dataset> Dimensions: (nuts2: 106, year: 12) Coordinates: * nuts2 (nuts2) object 'AL01' 'AL02' 'AL03' 'BG31' ... 'SK02' 'SK03' 'SK04' * year (year) int64 2009 2010 2011 2012 2013 ... 2016 2017 2018 2019 2020 Data variables: TOOEXP (nuts2, year) float64 nan nan nan nan nan ... 0.5 0.2 0.4 0.4 0.2 UNMET (nuts2, year) float64 0.0 0.0 0.0 0.0 0.0 ... 7.4 5.5 6.2 6.6 6.0
- nuts2: 106
- year: 12
- nuts2(nuts2)object'AL01' 'AL02' ... 'SK03' 'SK04'
array(['AL01', 'AL02', 'AL03', 'BG31', 'BG32', 'BG33', 'BG34', 'BG41', 'BG42', 'CH01', 'CH02', 'CH03', 'CH04', 'CH05', 'CH06', 'CH07', 'DK01', 'DK02', 'DK03', 'DK04', 'DK05', 'EL30', 'EL41', 'EL42', 'EL43', 'EL51', 'EL52', 'EL53', 'EL54', 'EL61', 'EL62', 'EL63', 'EL64', 'EL65', 'FI1B', 'FI1C', 'FI1D', 'HR02', 'HR03', 'HR04', 'HR05', 'HR06', 'HU10', 'HU11', 'HU12', 'HU21', 'HU22', 'HU23', 'HU31', 'HU32', 'HU33', 'IE01', 'IE02', 'IE04', 'IE05', 'IE06', 'LT01', 'LT02', 'NO01', 'NO02', 'NO03', 'NO04', 'NO05', 'NO06', 'NO07', 'PL21', 'PL22', 'PL41', 'PL42', 'PL43', 'PL51', 'PL52', 'PL61', 'PL62', 'PL63', 'PL71', 'PL72', 'PL81', 'PL82', 'PL84', 'PL91', 'PL92', 'RO11', 'RO12', 'RO21', 'RO22', 'RO31', 'RO32', 'RO41', 'RO42', 'SE11', 'SE12', 'SE21', 'SE22', 'SE23', 'SE31', 'SE32', 'SE33', 'SI01', 'SI02', 'SI03', 'SI04', 'SK01', 'SK02', 'SK03', 'SK04'], dtype=object)
- year(year)int642009 2010 2011 ... 2018 2019 2020
array([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
- TOOEXP(nuts2, year)float64nan nan nan nan ... 0.2 0.4 0.4 0.2
array([[ nan, nan, nan, ..., 18.9, 21.8, 12. ], [ nan, nan, nan, ..., 12.3, 9.3, 7.6], [ nan, nan, nan, ..., 10.8, 11.6, 8.5], ..., [ 0.6, 0.8, 1. , ..., 0.4, 0.4, 0.7], [ 0.6, 0.6, 0.5, ..., 1.1, 1.1, 1.6], [ 0.7, 0.8, 0.8, ..., 0.4, 0.4, 0.2]])
- UNMET(nuts2, year)float640.0 0.0 0.0 0.0 ... 5.5 6.2 6.6 6.0
array([[ 0. , 0. , 0. , ..., 26.1, 26.9, 19.8], [ 0. , 0. , 0. , ..., 20.6, 15. , 11.1], [ 0. , 0. , 0. , ..., 17.5, 24.4, 13.6], ..., [ 5.2, 5.7, 5.6, ..., 4. , 3.7, 3. ], [ 3.6, 4.9, 4.6, ..., 9.9, 9.3, 13.5], [ 3.5, 4.4, 5.2, ..., 6.2, 6.6, 6. ]])
3.2.5. infant mortality
- https://ec.europa.eu/eurostat/databrowser/view/DEMO_R_MINFIND/default/table?lang=en&category=demo.demomreg
- https://ec.europa.eu/health/indicators/docs/echi_11.pdf
- The number of deaths of infants (younger than one year of age at death) per 1000 live births (based on one year data).
df = eurostat.get_data_df('demo_r_minfind') df.rename({'geo\\time':'nuts2'},inplace=True,axis=1) df.drop(["unit"],axis=1,inplace=True) df['nuts'] = [len(x)-2 for x in df.nuts2] df = df[df.nuts == 2] df.drop('nuts',axis=1,inplace=True) df = pd.melt(df,id_vars=['nuts2'], value_vars=np.arange(2009,2022), var_name='year',\ value_name='infant mortality') df.set_index(['nuts2','year'],inplace=True) # df = df.unstack() # df.columns = df.columns.droplevel() # df['UNMET'] = df[columns].sum(axis=1) # df.drop(['FEAR','TOOFAR','WAITING',\ # 'HOPING', 'NOTIME', 'NOKNOW'],axis=1,inplace=True) df.head()
infant mortality nuts2 year AL01 2009 NaN AL02 2009 NaN AL03 2009 NaN AT11 2009 0.9 AT12 2009 4.2
ds_infant_mortality = df.to_xarray() ds_infant_mortality.to_netcdf("./data/infant_mortality_nuts_2.nc") ds_infant_mortality
<xarray.Dataset> Dimensions: (nuts2: 340, year: 13) Coordinates: * nuts2 (nuts2) object 'AL01' 'AL02' 'AL03' ... 'UKM9' 'UKN0' * year (year) int64 2009 2010 2011 2012 ... 2018 2019 2020 2021 Data variables: infant mortality (nuts2, year) float64 nan nan nan nan ... 4.2 nan nan nan
- nuts2: 340
- year: 13
- nuts2(nuts2)object'AL01' 'AL02' ... 'UKM9' 'UKN0'
array(['AL01', 'AL02', 'AL03', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- year(year)int642009 2010 2011 ... 2019 2020 2021
array([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021])
- infant mortality(nuts2, year)float64nan nan nan nan ... 4.2 nan nan nan
array([[ nan, nan, nan, ..., 11.1, 9. , nan], [ nan, nan, nan, ..., 10.6, 11.3, nan], [ nan, nan, nan, ..., 8.2, 9.2, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ 4.8, 5.4, 3.9, ..., nan, nan, nan]])
4. Combining data
The sections below generate the datasets per variable. Once these datasets are generated, the code in this section merges all these variables into one .csv
file.
4.1. nuts2 data
In this subsection, we merge the data at the NUTS 2 level.
from glob import glob ds_2 = xr.merge([xr.open_dataset(f) for f in glob('./data/*.nc')]) ds_2
<xarray.Dataset> Dimensions: (age: 100, nuts2: 348, sex: 2, year: 32) Coordinates: * nuts2 (nuts2) object 'AL01' 'AL02' ... 'UKN0' * year (year) int64 1990 1991 1992 ... 2020 2021 * age (age) float64 1.0 2.0 3.0 ... 99.0 100.0 * sex (sex) object 'F' 'M' Data variables: infant mortality (nuts2, year) float64 nan nan ... nan nan country (nuts2, year, sex, age) object nan ... nan population (nuts2, year, sex, age) float64 nan ... nan age_category (nuts2, year, sex, age) object nan ... nan bad_self_perceived_health (nuts2, year, sex, age) float64 nan ... nan at risk of poverty (nuts2, year) float64 nan nan ... nan nan lagged_mortality (year, age, nuts2, sex) float64 nan ... nan percentage_material_deprivation (nuts2, year) float64 nan nan ... nan nan deaths (nuts2, year, sex, age) float64 nan ... nan TOOEXP (nuts2, year) float64 nan nan ... nan nan UNMET (nuts2, year) float64 nan nan ... nan nan
- age: 100
- nuts2: 348
- sex: 2
- year: 32
- nuts2(nuts2)object'AL01' 'AL02' ... 'UKM9' 'UKN0'
array(['AL01', 'AL02', 'AL03', ..., 'UKM8', 'UKM9', 'UKN0'], dtype=object)
- year(year)int641990 1991 1992 ... 2019 2020 2021
array([1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021])
- age(age)float641.0 2.0 3.0 4.0 ... 98.0 99.0 100.0
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99., 100.])
- sex(sex)object'F' 'M'
array(['F', 'M'], dtype=object)
- infant mortality(nuts2, year)float64nan nan nan nan ... 4.2 nan nan nan
array([[ nan, nan, nan, ..., 11.1, 9. , nan], [ nan, nan, nan, ..., 10.6, 11.3, nan], [ nan, nan, nan, ..., 8.2, 9.2, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]])
- country(nuts2, year, sex, age)objectnan nan nan nan ... nan nan nan nan
array([[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], ... [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan]], ..., [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan]], [['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan], ['United Kingdom', 'United Kingdom', 'United Kingdom', ..., 'United Kingdom', 'United Kingdom', nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]], dtype=object)
- population(nuts2, year, sex, age)float64nan nan nan nan ... nan nan nan nan
array([[[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], ..., [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]], ... [[[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[12594., 12486., 12690., ..., nan, nan, nan], [13305., 13199., 13380., ..., nan, nan, nan]], ..., [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]]]])
- age_category(nuts2, year, sex, age)objectnan nan nan nan ... nan nan nan nan
array([[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], ... [[['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan]], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan]], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan]], ..., [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan]], [['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan], ['Y0-15', 'Y0-15', 'Y0-15', ..., 'Y_GE85', 'Y_GE85', nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]], dtype=object)
- bad_self_perceived_health(nuts2, year, sex, age)float64nan nan nan nan ... nan nan nan nan
array([[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], ... [[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]])
- at risk of poverty(nuts2, year)float64nan nan nan nan ... nan nan nan nan
array([[ nan, nan, nan, ..., 25.9, 25.1, nan], [ nan, nan, nan, ..., 22.7, 21.3, nan], [ nan, nan, nan, ..., 20.8, 19.5, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]])
- lagged_mortality(year, age, nuts2, sex)float64nan nan nan nan ... nan nan nan nan
array([[[[nan, nan], [nan, nan], [nan, nan], ..., [nan, nan], [nan, nan], [nan, nan]], [[nan, nan], [nan, nan], [nan, nan], ..., [nan, nan], [nan, nan], [nan, nan]], [[nan, nan], [nan, nan], [nan, nan], ..., ... ..., [nan, nan], [nan, nan], [nan, nan]], [[nan, nan], [nan, nan], [nan, nan], ..., [nan, nan], [nan, nan], [nan, nan]], [[nan, nan], [nan, nan], [nan, nan], ..., [nan, nan], [nan, nan], [nan, nan]]]])
- percentage_material_deprivation(nuts2, year)float64nan nan nan nan ... nan nan nan nan
array([[ nan, nan, nan, ..., 38.5, 36.6, nan], [ nan, nan, nan, ..., 37.7, 35.1, nan], [ nan, nan, nan, ..., 34.9, 32.2, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]])
- deaths(nuts2, year, sex, age)float64nan nan nan nan ... nan nan nan nan
array([[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], ... [[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], [[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]])
- TOOEXP(nuts2, year)float64nan nan nan nan ... nan nan nan nan
array([[ nan, nan, nan, ..., 21.8, 12. , nan], [ nan, nan, nan, ..., 9.3, 7.6, nan], [ nan, nan, nan, ..., 11.6, 8.5, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]])
- UNMET(nuts2, year)float64nan nan nan nan ... nan nan nan nan
array([[ nan, nan, nan, ..., 26.9, 19.8, nan], [ nan, nan, nan, ..., 15. , 11.1, nan], [ nan, nan, nan, ..., 24.4, 13.6, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]])
df = ds_2.to_dataframe() df.reset_index(inplace=True) df.rename({'level_0':'age','level_1':'nuts2','level_3':'year'},\ axis=1,inplace=True) df['country_code'] = df['nuts2'].str.slice(0,2) df['nuts1'] = df['nuts2'].str.slice(0,3) df['country'] = df['country_code'].replace(eurostat_dictionary) df.head()
age nuts2 sex year infant mortality country population age_category \ 0 1.0 AL01 F 1990 NaN Albania NaN NaN 1 1.0 AL01 F 1991 NaN Albania NaN NaN 2 1.0 AL01 F 1992 NaN Albania NaN NaN 3 1.0 AL01 F 1993 NaN Albania NaN NaN 4 1.0 AL01 F 1994 NaN Albania NaN NaN bad_self_perceived_health at risk of poverty lagged_mortality \ 0 NaN NaN NaN 1 NaN NaN NaN 2 NaN NaN NaN 3 NaN NaN NaN 4 NaN NaN NaN percentage_material_deprivation deaths TOOEXP UNMET country_code nuts1 0 NaN NaN NaN NaN AL AL0 1 NaN NaN NaN NaN AL AL0 2 NaN NaN NaN NaN AL AL0 3 NaN NaN NaN NaN AL AL0 4 NaN NaN NaN NaN AL AL0
4.2. country data
In this section, we add country level variables to the dataframe.
from glob import glob ds_c = xr.merge([xr.open_dataset(f) for f in glob('./data/country/*.nc')]) ds_c
<xarray.Dataset> Dimensions: (country: 33, country_code: 33, year: 28) Coordinates: * year (year) int64 1992 1993 1994 ... 2018 2019 * country (country) object 'Austria' ... 'United Kin... * country_code (country_code) object 'AT' 'BA' ... 'SK' 'UK' Data variables: HF2_PC_CHE (country, country_code, year) float64 nan ... HF2_PPS_HAB (country, country_code, year) float64 nan ... HF3_PC_CHE (country, country_code, year) float64 nan ... HF3_PPS_HAB (country, country_code, year) float64 nan ... health expenditure per capita (country, country_code, year) float64 nan ...
- country: 33
- country_code: 33
- year: 28
- year(year)int641992 1993 1994 ... 2017 2018 2019
array([1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
- country(country)object'Austria' ... 'United Kingdom'
array(['Austria', 'Belgium', 'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland', 'Italy', 'Latvia', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'United Kingdom'], dtype=object)
- country_code(country_code)object'AT' 'BA' 'BE' ... 'SI' 'SK' 'UK'
array(['AT', 'BA', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IS', 'IT', 'LI', 'LT', 'LU', 'LV', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK'], dtype=object)
- HF2_PC_CHE(country, country_code, year)float64nan nan nan nan ... 5.41 5.62 nan
array([[[ nan, nan, nan, ..., 6.83, 6.91, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 4.86, 4.9 , nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 0.37, 0.4 , nan], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 5.41, 5.62, nan]]])
- HF2_PPS_HAB(country, country_code, year)float64nan nan nan nan ... 160.8 172.1 nan
array([[[ nan, nan, nan, ..., 263.61, 274.53, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 180.84, 187.76, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 3.08, 3.48, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 160.78, 172.06, nan]]])
- HF3_PC_CHE(country, country_code, year)float64nan nan nan nan ... 15.72 15.88 nan
array([[[ nan, nan, nan, ..., 19.15, 18.43, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 17.91, 17.81, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 29.12, 29.31, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 15.72, 15.88, nan]]])
- HF3_PPS_HAB(country, country_code, year)float64nan nan nan nan ... 467.5 486.2 nan
array([[[ nan, nan, nan, ..., 739.37, 732.45, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 665.96, 682.09, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 242.47, 256.89, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., ... ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan]], [[ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], ..., [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., nan, nan, nan], [ nan, nan, nan, ..., 467.53, 486.25, nan]]])
- health expenditure per capita(country, country_code, year)float64...
array([[[ nan, nan, ..., 4510.08, 4689.27], [ nan, nan, ..., nan, nan], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]], [[ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]], ..., [[ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]], [[ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., 3607.59, 3838.53]]])
df_country = ds_c.to_dataframe() df_country.reset_index(inplace=True) df_country.rename({'level_0':'country','level_1':'country_code',\ 'level_2':'year'},axis=1,inplace=True) df_country.head()
country country_code year HF2_PC_CHE HF2_PPS_HAB HF3_PC_CHE \ 0 Austria AT 1992 NaN NaN NaN 1 Austria AT 1993 NaN NaN NaN 2 Austria AT 1994 NaN NaN NaN 3 Austria AT 1995 NaN NaN NaN 4 Austria AT 1996 NaN NaN NaN HF3_PPS_HAB health expenditure per capita 0 NaN NaN 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 NaN NaN
df = df.merge(df_country, how = 'left',\ on = ['country','year','country_code']) # drop rows where deaths > population at Jan. 1st # (e.g. because people moved in the year) df = df[(df.population >= df.deaths) & (df.population > 0)] df['mortality'] = df.deaths/df.population df.head()
age nuts2 sex year infant mortality country population age_category \ 192 1.0 AT11 F 1990 NaN Austria 1345.0 Y0-15 193 1.0 AT11 F 1991 NaN Austria 1371.0 Y0-15 194 1.0 AT11 F 1992 NaN Austria 1436.0 Y0-15 195 1.0 AT11 F 1993 NaN Austria 1372.0 Y0-15 196 1.0 AT11 F 1994 NaN Austria 1349.0 Y0-15 bad_self_perceived_health at risk of poverty ... TOOEXP UNMET \ 192 NaN NaN ... NaN NaN 193 NaN NaN ... NaN NaN 194 NaN NaN ... NaN NaN 195 NaN NaN ... NaN NaN 196 NaN NaN ... NaN NaN country_code nuts1 HF2_PC_CHE HF2_PPS_HAB HF3_PC_CHE HF3_PPS_HAB \ 192 AT AT1 NaN NaN NaN NaN 193 AT AT1 NaN NaN NaN NaN 194 AT AT1 NaN NaN NaN NaN 195 AT AT1 NaN NaN NaN NaN 196 AT AT1 NaN NaN NaN NaN health expenditure per capita mortality 192 NaN 0.000000 193 NaN 0.000000 194 NaN 0.000696 195 NaN 0.000000 196 NaN 0.000741 [5 rows x 23 columns]
Finally, we save the merged NUTS 2 and country data into a .csv
file.
df.to_csv('./data/data_deaths_by_age_nuts_2.csv')