Getting Eurostat data

1. Introduction
2. Preamble code
3. Data
- 3.1. country data
- 3.2. nuts 2
4. Combining data
- 4.1. nuts2 data
- 4.2. country data

1. Introduction

This file describes how we get the data for our project on the health effects of demand side cost sharing.

In this file we explain how we get the data and from where and at the same time we generate the data. By generating the data in this way, a change in the data by Eurostat will lead to (slightly) different results.

Link to excel file with overview: http://ec.europa.eu/eurostat/documents/140934/4381644/Regional-statistics-overview.xlsx

The dataset itself is too big for github and can be found on DataverseNL.

2. Preamble code

Run this code first before running the code of particular data sets. It imports the relevant libraries.

import numpy as np
import pandas as pd
from country_codes import oecd_dictionary, eurostat_dictionary
import eurostat
import xarray as xr
import matplotlib.pyplot as plt

EU_countries = ['Belgium', 'Bulgaria', 'Czechia', 'Denmark',
'Germany (until 1990 former territory of the FRG)', 'Germany','Estonia',
'Ireland', 'Greece', 'Spain', 'France', 'Croatia', 'Italy',
'Cyprus', 'Latvia', 'Lithuania', 'Luxembourg', 'Hungary', 'Malta',
'Netherlands', 'Austria', 'Poland', 'Portugal', 'Romania',
'Slovenia', 'Slovakia', 'Finland', 'Sweden', 'United Kingdom',
'Iceland', 'Liechtenstein', 'Norway', 'Switzerland',
'Bosnia and Herzegovina']

We first describe how we get the data for each variable. Each of these variables is saved in a separate dataset. The final section contains the code that merges the data for these variables into one (big) dataset.

3. Data

First, we fetch the variables at the country level from the Eurostat website. Then we move to the NUTS 2 regional data.

3.1. country data

3.1.1. OOP per country

We want a measure of "how generous" a healthcare system is in terms of oop-payments. This feature of the healthcare system tends to be defined at the country level. We use household oop payment and expenditure on voluntary payment schemes (for things that are not covered by basic or mandatory insurance). The paper uses mainly PC_CHE as unit of measurement.

link to data: https://ec.europa.eu/eurostat/databrowser/view/hlth_sha11_hf/default/table?lang=en

abbreviation	name
HF2	Voluntary health care payment schemes
HF3	Household out-of-pocket payment
PC_CHE	Percentual share of total current health expenditure (CHE)
PPS_HAB	Purchasing power standard (PPS) per inhabitant

df = eurostat.get_data_df('hlth_sha11_hf')
df.rename({'geo\\time':'geo'},inplace=True,axis=1)
df['country_code'] = [x[:2] for x in df['geo']]
df['country'] = df['country_code'].replace(eurostat_dictionary)
df = df[df.country.isin(EU_countries)  & \
        (df.icha11_hf.isin(["HF2", "HF3"])) &\
        (df.unit.isin([ 'PC_CHE','PPS_HAB']))]
df.drop(["geo"],axis=1,inplace=True)

df = pd.melt(df,id_vars=['country','country_code',\
                         'unit','icha11_hf'],
                        value_vars=np.arange(1992,2019),
                        var_name='year',\
             value_name='health_expenditure')
df.set_index(['country','country_code',\
              'year','unit','icha11_hf'],inplace=True)
df = df.unstack()
df.columns = df.columns.droplevel()
df = df.unstack()
df.columns = df.columns.droplevel()
df.columns = ['HF2_PC_CHE','HF2_PPS_HAB',\
              'HF3_PC_CHE','HF3_PPS_HAB']
df.reset_index(inplace=True)
df.set_index(['country','country_code','year'],inplace=True)
df.tail()

			HF2_PC_CHE	HF2_PPS_HAB	HF3_PC_CHE	HF3_PPS_HAB
country	country_code	year
United Kingdom	UK	2014	5.40	155.48	15.16	436.73
		2015	5.40	161.07	15.23	454.78
		2016	5.04	148.46	15.37	453.16
		2017	5.41	160.78	15.72	467.53
		2018	5.62	172.06	15.88	486.25

ds = df.to_xarray()
ds.to_netcdf("./data/country/out_of_pocket.nc")
ds

<xarray.Dataset>
Dimensions:       (country: 33, country_code: 33, year: 27)
Coordinates:
  * country       (country) object 'Austria' 'Belgium' ... 'United Kingdom'
  * country_code  (country_code) object 'AT' 'BA' 'BE' 'BG' ... 'SI' 'SK' 'UK'
  * year          (year) int64 1992 1993 1994 1995 1996 ... 2015 2016 2017 2018
Data variables:
    HF2_PC_CHE    (country, country_code, year) float64 nan nan ... 5.41 5.62
    HF2_PPS_HAB   (country, country_code, year) float64 nan nan ... 160.8 172.1
    HF3_PC_CHE    (country, country_code, year) float64 nan nan ... 15.72 15.88
    HF3_PPS_HAB   (country, country_code, year) float64 nan nan ... 467.5 486.2

3.1.2. healthcare expenditure

link to data: https://ec.europa.eu/eurostat/databrowser/view/hlth_sha11_hc/default/table?lang=en

abbreviation	variable name
TOT_HC	Current health care expenditure (CHE)
EUR_HAB	Euro per inhabitant

df = eurostat.get_data_df('hlth_sha11_hc')
df.rename({'geo\\time':'geo'},inplace=True,axis=1)
df['country_code'] = [x[:2] for x in df['geo']]
df['country'] = df['country_code'].replace(eurostat_dictionary)
df = df[df.country.isin(EU_countries)  & \
        (df.icha11_hc.isin(["TOT_HC"])) & \
        (df.unit.isin(['EUR_HAB']))]
df.drop(["geo"],axis=1,inplace=True)

df = pd.melt(df,id_vars=['country','country_code',\
                         'unit','icha11_hc'],
                        value_vars=np.arange(1992,2020),
                        var_name='year',\
             value_name='health_expenditure')
df.set_index(['country','country_code','year',\
              'unit','icha11_hc'],inplace=True)
df = df.unstack()
df.columns = df.columns.droplevel()
df = df.unstack()
df.columns = df.columns.droplevel()
df.reset_index(inplace=True)
df.rename({'EUR_HAB':'health expenditure per capita'},\
          axis=1,inplace=True)
df.set_index(['country','country_code','year'],inplace=True)
df.head()

		unit	health expenditure per capita
country	country_code	year
Austria	AT	1992	NaN
		1993	NaN
		1994	NaN
		1995	NaN
		1996	NaN

ds = df.to_xarray()
ds.to_netcdf("./data/country/healthcare_expenditure.nc")
ds

<xarray.Dataset>
Dimensions:                        (country: 33, country_code: 33, year: 28)
Coordinates:
  * country                        (country) object 'Austria' ... 'United Kin...
  * country_code                   (country_code) object 'AT' 'BA' ... 'SK' 'UK'
  * year                           (year) int64 1992 1993 1994 ... 2018 2019
Data variables:
    health expenditure per capita  (country, country_code, year) float64 nan ...

3.1.3. self-perceived health

See section 3.2.1.1 below where we merge these data on age with mortality.

3.2. nuts 2

overview of regional data: https://ec.europa.eu/eurostat/web/regions/data/database

3.2.1. crude number of deaths and population by age and sex and self-perceived health

population: https://ec.europa.eu/eurostat/databrowser/view/demo_r_d2jan/default/table?lang=en

df = eurostat.get_data_df('demo_r_d2jan')
df.rename({'geo\\time':'geo'},inplace=True,axis=1)
age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15',
       'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23',
       'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31',
       'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4',
       'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48',
       'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56',
       'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64',
       'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72',
       'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80',
       'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89',
       'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97',
       'Y98', 'Y99']
df['country_code'] = [x[:2] for x in df['geo']]
df['country'] = df['country_code'].replace(eurostat_dictionary)
df = df[df.country.isin(EU_countries) & \
        (df.sex.isin(["M","F"])) & (df.age.isin(age_range))]
df.drop(['country_code'],axis=1,inplace=True)
df = pd.melt(df,id_vars=['geo','country','sex','age'],
                        value_vars=np.arange(1990,2021),
                        var_name='year',value_name='population')
df['nuts'] = [len(x)-2 for x in df.geo]
df = df.query('nuts == 2')
df.drop('nuts',inplace=True, axis=1)
df.rename({'geo':'nuts2', 'age':'age string'},\
          axis=1,inplace=True)
df['age'] = [float(x.replace('Y','')) for x in df['age string']]
df.drop('age string',inplace=True, axis=1)

#df.set_index(['nuts2','year','sex','age'],inplace=True)

df.head()

	nuts2	country	sex	year	population	age
2	AT11	Austria	F	1990	1345.0	1.0
3	AT12	Austria	F	1990	8038.0	1.0
4	AT13	Austria	F	1990	7240.0	1.0
6	AT21	Austria	F	1990	3042.0	1.0
7	AT22	Austria	F	1990	6491.0	1.0

@np.vectorize
def age_to_age_category(age):
    if age < 16:
        cat = 'Y0-15'
    elif age < 25:
        cat = 'Y16-24'
    elif age < 35:
        cat = 'Y25-34'
    elif age < 45:
        cat = 'Y35-44'
    elif age < 55:
        cat = 'Y45-54'
    elif age < 65:
        cat = 'Y55-64'
    elif age < 75:
        cat = 'Y65-74'
    elif age < 85:
        cat = 'Y75-84'
    else:
        cat = 'Y_GE85'
    return cat
df['age_category'] = age_to_age_category(df.age)
df.head()

  nuts2  country sex  year  population  age age_category
2  AT11  Austria   F  1990      1345.0  1.0        Y0-15
3  AT12  Austria   F  1990      8038.0  1.0        Y0-15
4  AT13  Austria   F  1990      7240.0  1.0        Y0-15
6  AT21  Austria   F  1990      3042.0  1.0        Y0-15
7  AT22  Austria   F  1990      6491.0  1.0        Y0-15

self-perceived health

https://ec.europa.eu/eurostat/databrowser/view/hlth_silc_02/default/table?lang=en
we combine bad and very bad self perceived health

ages = ['Y16-24', 'Y25-34', 'Y35-44', 'Y45-54', 'Y55-64',
        'Y65-74', 'Y75-84', 'Y_GE85']
df_c = eurostat.get_data_df('hlth_silc_02')
df_c.rename({'geo\\time':'geo'},inplace=True,axis=1)
df_c['country'] = df_c['geo'].replace(eurostat_dictionary)
df_c = df_c[df_c.country.isin(EU_countries) & (df_c.sex.isin(["M","F"]) ) & (df_c.levels.isin(["B_VB"])) & (df_c.isced11 == "TOTAL") & df_c.age.isin(ages)]
df_c.drop(["unit","isced11","geo","levels"],axis=1,inplace=True)
df_c = pd.melt(df_c,id_vars=['country','sex','age'],
                        value_vars=np.arange(2008,2023),
                        var_name='year',value_name='bad_self_perceived_health')
# df_c.set_index(['country','year','sex','mortalit'],inplace=True)
# df_c = df_c.unstack()
# df_c.reset_index(inplace=True)
# df_c.columns = [' '.join(col).strip() for col in df_c.columns.values]
df_c.rename({'age':'age_category'},inplace=True,axis=1)
# df_c.to_csv('./data/preventable_mortality.csv')
df_c.head()

       country sex age_category  year  bad_self_perceived_health
0      Austria   F       Y16-24  2008                        1.0
1      Belgium   F       Y16-24  2008                        1.8
2     Bulgaria   F       Y16-24  2008                        0.9
3  Switzerland   F       Y16-24  2008                        0.9
4       Cyprus   F       Y16-24  2008                        0.2

df = df.merge(df_c, how = 'left',\
              on = ['country','year','sex','age_category'])
df.set_index(['nuts2','year','sex','age'],inplace=True)
df.tail()

                            country  population age_category  \
nuts2 year sex age                                             
UKM6  2020 M   99.0  United Kingdom         NaN       Y_GE85   
UKM7  2020 M   99.0  United Kingdom         NaN       Y_GE85   
UKM8  2020 M   99.0  United Kingdom         NaN       Y_GE85   
UKM9  2020 M   99.0  United Kingdom         NaN       Y_GE85   
UKN0  2020 M   99.0  United Kingdom         NaN       Y_GE85   

                     bad_self_perceived_health  
nuts2 year sex age                              
UKM6  2020 M   99.0                        NaN  
UKM7  2020 M   99.0                        NaN  
UKM8  2020 M   99.0                        NaN  
UKM9  2020 M   99.0                        NaN  
UKN0  2020 M   99.0                        NaN

ds = df.to_xarray()
ds.to_netcdf("./data/population_nuts_2.nc")
ds

<xarray.Dataset>
Dimensions:                    (age: 99, nuts2: 306, sex: 2, year: 31)
Coordinates:
  * nuts2                      (nuts2) object 'AT11' 'AT12' ... 'UKM9' 'UKN0'
  * year                       (year) int64 1990 1991 1992 ... 2018 2019 2020
  * sex                        (sex) object 'F' 'M'
  * age                        (age) float64 1.0 2.0 3.0 4.0 ... 97.0 98.0 99.0
Data variables:
    country                    (nuts2, year, sex, age) object 'Austria' ... '...
    population                 (nuts2, year, sex, age) float64 1.345e+03 ... nan
    age_category               (nuts2, year, sex, age) object 'Y0-15' ... 'Y_...
    bad_self_perceived_health  (nuts2, year, sex, age) float64 nan nan ... nan

deaths: https://ec.europa.eu/eurostat/databrowser/view/demo_r_magec/default/table?lang=en

df = eurostat.get_data_df('demo_r_magec')
df.rename({'geo\\time':'geo'},inplace=True,axis=1)
age_range = ['Y1', 'Y10', 'Y11', 'Y12', 'Y13', 'Y14', 'Y15',
       'Y16', 'Y17', 'Y18', 'Y19', 'Y2', 'Y20', 'Y21', 'Y22', 'Y23',
       'Y24', 'Y25', 'Y26', 'Y27', 'Y28', 'Y29', 'Y3', 'Y30', 'Y31',
       'Y32', 'Y33', 'Y34', 'Y35', 'Y36', 'Y37', 'Y38', 'Y39', 'Y4',
       'Y40', 'Y41', 'Y42', 'Y43', 'Y44', 'Y45', 'Y46', 'Y47', 'Y48',
       'Y49', 'Y5', 'Y50', 'Y51', 'Y52', 'Y53', 'Y54', 'Y55', 'Y56',
       'Y57', 'Y58', 'Y59', 'Y6', 'Y60', 'Y61', 'Y62', 'Y63', 'Y64',
       'Y65', 'Y66', 'Y67', 'Y68', 'Y69', 'Y7', 'Y70', 'Y71', 'Y72',
       'Y73', 'Y74', 'Y75', 'Y76', 'Y77', 'Y78', 'Y79', 'Y8', 'Y80',
       'Y81', 'Y82', 'Y83', 'Y84', 'Y85', 'Y86', 'Y87', 'Y88', 'Y89',
       'Y9', 'Y90', 'Y91', 'Y92', 'Y93', 'Y94', 'Y95', 'Y96', 'Y97',
       'Y98', 'Y99']
df['country_code'] = [x[:2] for x in df['geo']]
df['country'] = df['country_code'].replace(eurostat_dictionary)
df = df[df.country.isin(EU_countries) & \
        (df.sex.isin(["M","F"])) & (df.age.isin(age_range))]
df.drop(['country','country_code'],axis=1,inplace=True)
df = pd.melt(df,id_vars=['geo','sex','age'],
                        value_vars=np.arange(1990,2020),
                        var_name='year',value_name='deaths')
df['nuts'] = [len(x)-2 for x in df.geo]
df = df.query('nuts == 2')
df.drop('nuts',inplace=True, axis=1)
df.rename({'geo':'nuts2', 'age':'age string'},\
          axis=1,inplace=True)
df['age'] = [float(x.replace('Y','')) for x in df['age string']]
df.drop('age string',inplace=True, axis=1)

df.set_index(['nuts2','year','sex','age'],inplace=True)

df.head()

				deaths
nuts2	year	sex	age
AT11	1990	F	1.0	0.0
AT12	1990	F	1.0	7.0
AT13	1990	F	1.0	4.0
AT21	1990	F	1.0	1.0
AT22	1990	F	1.0	5.0

ds = df.to_xarray()
ds.to_netcdf("./data/deaths_nuts_2.nc")
ds

<xarray.Dataset>
Dimensions:  (age: 99, nuts2: 305, sex: 2, year: 30)
Coordinates:
  * nuts2    (nuts2) object 'AT11' 'AT12' 'AT13' 'AT21' ... 'UKM8' 'UKM9' 'UKN0'
  * year     (year) int64 1990 1991 1992 1993 1994 ... 2015 2016 2017 2018 2019
  * sex      (sex) object 'F' 'M'
  * age      (age) float64 1.0 2.0 3.0 4.0 5.0 6.0 ... 95.0 96.0 97.0 98.0 99.0
Data variables:
    deaths   (nuts2, year, sex, age) float64 0.0 0.0 1.0 1.0 ... nan nan nan nan

3.2.2. mortality lagged per age

useful link to lag variables in a pandas dataframe: https://stackoverflow.com/questions/61234837/pandas-panel-data-shifting-values-by-two-taking-into-consideration-year-gaps

ds_population = xr.open_dataset('./data/population_nuts_2.nc')
ds_deaths =  xr.open_dataset('./data/deaths_nuts_2.nc')

ds_mortality = xr.merge([ds_population, ds_deaths])
ds_mortality

<xarray.Dataset>
Dimensions:     (age: 99, nuts2: 305, sex: 2, year: 31)
Coordinates:
  * year        (year) int64 1990 1991 1992 1993 1994 ... 2017 2018 2019 2020
  * nuts2       (nuts2) object 'AT11' 'AT12' 'AT13' ... 'UKM8' 'UKM9' 'UKN0'
  * sex         (sex) object 'F' 'M'
  * age         (age) float64 1.0 2.0 3.0 4.0 5.0 ... 95.0 96.0 97.0 98.0 99.0
Data variables:
    population  (nuts2, year, sex, age) float64 ...
    deaths      (nuts2, year, sex, age) float64 0.0 0.0 1.0 1.0 ... nan nan nan

df_mortality = ds_mortality.to_dataframe()
# drop rows where deaths > population at Jan. 1st
# (e.g. because people moved in the year)
df_mortality = df_mortality[(df_mortality.population >= \
                             df_mortality.deaths) &\
                            (df_mortality.population > 0)]
df_mortality['mortality'] = df_mortality['deaths']/\
    df_mortality['population']
df_mortality.reset_index(inplace=True)
df_mortality.rename({'level_3':'year'},\
                    inplace=True,axis=1)
df_mortality['year'] = pd.to_datetime(df_mortality['year'],\
                                      format='%Y')
df_mortality.head()

	age	nuts2	sex	year	population	deaths	mortality
0	1.0	AT11	F	1990-01-01	1345.0	0.0	0.000000
1	1.0	AT11	F	1991-01-01	1371.0	0.0	0.000000
2	1.0	AT11	F	1992-01-01	1436.0	1.0	0.000696
3	1.0	AT11	F	1993-01-01	1372.0	0.0	0.000000
4	1.0	AT11	F	1994-01-01	1349.0	1.0	0.000741

new_column = (df_mortality.set_index(['year','age','nuts2','sex']).\
              groupby(['age','nuts2','sex'])['mortality']\
              .shift(1)).to_frame()
new_column.reset_index(inplace=True)
new_column.age += 1
new_column['year'] = new_column['year'].dt.year
new_column.rename({'mortality':'lagged_mortality'},\
                  inplace=True,axis=1)
new_column.set_index(['year','age','nuts2','sex'],\
                     inplace=True)
new_column.head()

				lagged_mortality
year	age	nuts2	sex
1990	2.0	AT11	F	NaN
1991	2.0	AT11	F	0.000000
1992	2.0	AT11	F	0.000000
1993	2.0	AT11	F	0.000696
1994	2.0	AT11	F	0.000000

ds = new_column.to_xarray()
ds.to_netcdf("./data/lagged_mortality_nuts_2.nc")
ds

<xarray.Dataset>
Dimensions:           (age: 99, nuts2: 297, sex: 2, year: 30)
Coordinates:
  * year              (year) int64 1990 1991 1992 1993 ... 2016 2017 2018 2019
  * age               (age) float64 2.0 3.0 4.0 5.0 6.0 ... 97.0 98.0 99.0 100.0
  * nuts2             (nuts2) object 'AT11' 'AT12' 'AT13' ... 'UKM9' 'UKN0'
  * sex               (sex) object 'F' 'M'
Data variables:
    lagged_mortality  (year, age, nuts2, sex) float64 nan nan nan ... nan nan

3.2.3. Poverty measures

Severe material deprivation rate by NUTS 2 regions [TGS00104]

link to the data: https://ec.europa.eu/eurostat/databrowser/view/tgs00104/default/table?lang=en

df = eurostat.get_data_df('tgs00104')
df.rename({'geo\\time':'nuts2'},inplace=True,axis=1)
df = df[(df.unit == "PC")]
df.drop(["unit"],axis=1,inplace=True)
df['nuts'] = [len(x)-2 for x in df.nuts2]
df = df[df.nuts == 2]
df.drop('nuts',axis=1,inplace=True)

df = pd.melt(df,id_vars=['nuts2'],
                        value_vars=np.arange(2009,2021),
                        var_name='year',\
             value_name='percentage_material_deprivation')
df.set_index(['nuts2','year'],inplace=True)
df.head()

		percentage_material_deprivation
nuts2	year
AL01	2009	NaN
AL02	2009	NaN
AL03	2009	NaN
AT11	2009	NaN
AT12	2009	NaN

ds_material_deprivation = df.to_xarray()
ds_material_deprivation.to_netcdf("./data/material_deprivation_nuts_2.nc")
ds_material_deprivation

<xarray.Dataset>
Dimensions:                          (nuts2: 179, year: 12)
Coordinates:
  * nuts2                            (nuts2) object 'AL01' 'AL02' ... 'SK04'
  * year                             (year) int64 2009 2010 2011 ... 2019 2020
Data variables:
    percentage_material_deprivation  (nuts2, year) float64 nan nan ... 9.6 8.3

At-risk-of-poverty rate by NUTS 2 regions [TGS00103]

link to the data: https://ec.europa.eu/eurostat/databrowser/view/tgs00103/default/table?lang=en

df = eurostat.get_data_df('tgs00103')
df.rename({'geo\\time':'nuts2'},inplace=True,axis=1)
df = df[(df.unit == "PC_POP")]
df.drop(["unit"],axis=1,inplace=True)
df['nuts'] = [len(x)-2 for x in df.nuts2]
df = df[df.nuts == 2]
df.drop('nuts',axis=1,inplace=True)

df = pd.melt(df,id_vars=['nuts2'],
                        value_vars=np.arange(2010,2022),
                        var_name='year',
             value_name='at risk of poverty')
df.set_index(['nuts2','year'],inplace=True)
df.head()

		at risk of poverty
nuts2	year
AL01	2010	NaN
AL02	2010	NaN
AL03	2010	NaN
BG31	2010	30.5
BG32	2010	28.8

ds_at_risk_of_poverty = df.to_xarray()
ds_at_risk_of_poverty.to_netcdf("./data/at_risk_of_poverty_nuts_2.nc")
ds_at_risk_of_poverty

<xarray.Dataset>
Dimensions:             (nuts2: 173, year: 12)
Coordinates:
  * nuts2               (nuts2) object 'AL01' 'AL02' 'AL03' ... 'SK03' 'SK04'
  * year                (year) int64 2010 2011 2012 2013 ... 2018 2019 2020 2021
Data variables:
    at risk of poverty  (nuts2, year) float64 nan nan nan nan ... 17.0 16.5 nan

3.2.4. unmet needs

Self-reported unmet needs for medical examination by main reason declared and NUTS 2 regions

value
TOOEXP	Too expensive
TOOFAR	Too far to travel
TOOEFW	Too expensive or too far to travel or waiting list
NOTIME	No time
NO_UNMET	No unmet needs to declare
NOKNOW	Didn't know any good doctor or specialist
WAITING	Waiting list
FEAR	Fear of doctor, hospital, examination or treatment
HOPING	Wanted to wait and see if problem got better on its own
OTH	Other

We drop the values NO_UNMET and OTH. The former is dropped because we are interested in unmet medical needs. The latter because the reason is unclear.

df = eurostat.get_data_df('hlth_silc_08_r')
df.rename({'geo\\time':'nuts2'},inplace=True,axis=1)
df.drop(["unit"],axis=1,inplace=True)
df['nuts'] = [len(x)-2 for x in df.nuts2]
df = df[df.nuts == 2]
df.drop('nuts',axis=1,inplace=True)
columns = ['FEAR','TOOEXP','TOOFAR','WAITING',\
           'HOPING', 'NOTIME', 'NOKNOW']
df=df[df.reason.isin(columns)]
df = pd.melt(df,id_vars=['nuts2','reason'],
                        value_vars=np.arange(2009,2021),
                        var_name='year',\
             value_name='unmet needs')
df.set_index(['nuts2','year','reason'],inplace=True)

df = df.unstack()
df.columns = df.columns.droplevel()
df['UNMET'] = df[columns].sum(axis=1)
df.drop(['FEAR','TOOFAR','WAITING',\
           'HOPING', 'NOTIME', 'NOKNOW'],axis=1,inplace=True)
df.tail()

	reason	TOOEXP	UNMET
nuts2	year
SK04	2016	0.5	7.4
	2017	0.2	5.5
	2018	0.4	6.2
	2019	0.4	6.6
	2020	0.2	6.0

ds_unmet_needs = df.to_xarray()
ds_unmet_needs.to_netcdf("./data/unmet_needs_nuts_2.nc")
ds_unmet_needs

<xarray.Dataset>
Dimensions:  (nuts2: 106, year: 12)
Coordinates:
  * nuts2    (nuts2) object 'AL01' 'AL02' 'AL03' 'BG31' ... 'SK02' 'SK03' 'SK04'
  * year     (year) int64 2009 2010 2011 2012 2013 ... 2016 2017 2018 2019 2020
Data variables:
    TOOEXP   (nuts2, year) float64 nan nan nan nan nan ... 0.5 0.2 0.4 0.4 0.2
    UNMET    (nuts2, year) float64 0.0 0.0 0.0 0.0 0.0 ... 7.4 5.5 6.2 6.6 6.0