Introduction to the covid19swiss Dataset

The covid19swiss R package provides a tidy-format dataset of the 2019 Novel Coronavirus (COVID-19, 2019-nCoV) pandemic outbreak in the cantons of Switzerland and the Principality of Liechtenstein (FL).

Data structure

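The covid19swiss dataset includes the following fields: date, location, location_type, location_code, location_code_type, data_type, and value.

The data_type field takes the following values: tested_total, cases_total, hosp_new, hosp_current, icu_current, vent_current, recovered_total, and deaths_total.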

The data is organized in a long format:

library(covid19swiss)

head(covid19swiss)
#>         date location         location_type location_code location_code_type
#> 1 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 2 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 3 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 4 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 5 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 6 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#>      data_type value
#> 1 tested_total     4
#> 2  cases_total    NA
#> 3     hosp_new    NA
#> 4 hosp_current    NA
#> 5  icu_current    NA
#> 6 vent_current    NA

It is straightforward to transform the data into a wide format with the pivot_wider function from the tidyr package:


library(tidyr)
covid19swiss_wide <- covid19swiss %>%
  pivot_wider(names_from = data_type, values_from = value)
head(covid19swiss_wide)

#> # A tibble: 6 × 13
#>   date       location location_type         location_code location_code_type
#>   <date>     <chr>    <chr>                 <chr>         <chr>             
#> 1 2020-01-24 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> 2 2020-01-25 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> 3 2020-01-26 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> 4 2020-01-27 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> 5 2020-01-28 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> 6 2020-01-29 GE       Canton of Switzerland CH.GE         gn_a1_code        
#> # ℹ 8 more variables: tested_total <int>, cases_total <int>, hosp_new <int>,
#> #   hosp_current <int>, icu_current <int>, vent_current <int>,
#> #   recovered_total <int>, deaths_total <int>
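To make the reshaping concrete, here is a minimal sketch on a three-row stand-in for the long table. The data frame `long` is hand-built from the Geneva rows printed earlier (an assumption for illustration, not the package data), and the `pivot_longer` round trip is an addition that is not part of the workflow above:

```r
library(tidyr)

# Hypothetical stand-in: three rows copied from the printed Geneva output,
# not loaded from the package.
long <- data.frame(
  date      = as.Date("2020-01-24"),
  location  = "GE",
  data_type = c("tested_total", "cases_total", "hosp_new"),
  value     = c(4L, NA, NA)
)

# Long to wide: one column per data_type, one row per date and location
wide <- pivot_wider(long, names_from = data_type, values_from = value)

# Wide back to long: gather the metric columns into data_type/value pairs
back <- pivot_longer(wide,
                     cols = c(tested_total, cases_total, hosp_new),
                     names_to = "data_type",
                     values_to = "value")
```

Here `wide` holds a single row with one column per metric, and `back` has the same three rows as `long`. Missing values survive the round trip as NA.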

Query and summarise the data

The following examples demonstrate simple ways to query and summarise the data with the dplyr and tidyr packages.

Cases summary by canton

The first example demonstrates how to query the total confirmed, recovered, and death cases by canton as of September 8th, 2020:


library(dplyr)

covid19swiss %>%
  filter(date == as.Date("2020-09-08"),
         data_type %in% c("cases_total", "recovered_total", "deaths_total")) %>%
  select(location, value, data_type) %>%
  pivot_wider(names_from = data_type, values_from = value) %>%
  arrange(desc(cases_total))  # largest case counts first
#> # A tibble: 26 × 3
#>    location cases_total recovered_total
#>    <chr>          <int>           <int>
#>  1 VD              8109              NA
#>  2 GE              7409              NA
#>  3 ZH              6643              NA
#>  4 TI              3565             929
#>  5 BE              2698              NA
#>  6 VS              2416             320
#>  7 AG              2260            1495
#>  8 FR              1925             164
#>  9 SG              1353              NA
#> 10 BS              1254            1154
#> # ℹ 16 more rows

Note: some fields, such as recovered_total or tested_total, are not available for some cantons and are marked as missing values (i.e., NA).
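One way to work with these gaps is to check which cantons actually report a field before using it. The following is a minimal sketch, assuming a hand-built data frame `toy` that copies four rows of the printed summary above (it is a hypothetical stand-in, not the package data):

```r
library(dplyr)

# Hypothetical stand-in: first four cantons from the printed summary above,
# not loaded from the package.
toy <- data.frame(
  location        = c("VD", "GE", "ZH", "TI"),
  cases_total     = c(8109L, 7409L, 6643L, 3565L),
  recovered_total = c(NA, NA, NA, 929L)
)

# Keep only the cantons that report recovered_total
reported <- toy %>% filter(!is.na(recovered_total))

# Count the cantons where the field is missing
n_missing <- sum(is.na(toy$recovered_total))
```

In this sample only TI reports recovered_total, so `reported` has one row and `n_missing` is 3.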

Calculating rates for Canton of Geneva

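In the next example, we will filter the dataset for the Canton of Geneva on April 10th, 2020, and calculate the following metrics: the share of tests that came back positive (positive_tested), the recovery rate (recovery_rate), and the death rate (death_rate).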

covid19swiss %>%
  dplyr::filter(location == "GE",
                date == as.Date("2020-04-10")) %>%
  dplyr::select(data_type, value) %>%
  tidyr::pivot_wider(names_from = data_type, values_from = value) %>%
  dplyr::mutate(positive_tested = round(100 * cases_total / tested_total, 2),
                death_rate = round(100 * deaths_total / cases_total, 2),
                recovery_rate = round(100 * recovered_total / cases_total, 2)) %>%
  dplyr::select(positive_tested, recovery_rate, death_rate) 
#> # A tibble: 1 × 3
#>   positive_tested recovery_rate death_rate
#>             <dbl>         <dbl>      <dbl>
#> 1            24.5          9.79       3.58

All values are percentages.
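The same rate formula can be applied to several cantons at once. The sketch below is an illustration under assumptions: `toy` is a hand-built data frame holding the five cantons whose recovered_total is reported in the canton summary printed earlier, not data loaded from the package.

```r
library(dplyr)

# Hypothetical stand-in: cantons with a reported recovered_total in the
# printed canton summary; not loaded from the package.
toy <- data.frame(
  location        = c("TI", "VS", "AG", "FR", "BS"),
  cases_total     = c(3565, 2416, 2260, 1925, 1254),
  recovered_total = c(929, 320, 1495, 164, 1154)
)

# Recovery rate in percent, rounded to two decimals, highest first
rates <- toy %>%
  mutate(recovery_rate = round(100 * recovered_total / cases_total, 2)) %>%
  arrange(desc(recovery_rate))
```

Because `mutate()` is vectorised, no grouping step is needed as long as each canton occupies a single row.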

Separating Switzerland and the Principality of Liechtenstein

The raw data include both Switzerland and the Principality of Liechtenstein. Separating the data by country can be done by using the location field:

switzerland <- covid19swiss %>% filter(location != "FL")

liechtenstein <- covid19swiss %>% filter(location == "FL")

head(liechtenstein)

#>         date location                 location_type location_code
#> 1 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 2 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 3 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 4 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 5 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 6 2020-02-27       FL Principality of Liechtenstein          <NA>
#>   location_code_type    data_type value
#> 1         gn_a1_code tested_total     3
#> 2         gn_a1_code  cases_total    NA
#> 3         gn_a1_code     hosp_new    NA
#> 4         gn_a1_code hosp_current    NA
#> 5         gn_a1_code  icu_current    NA
#> 6         gn_a1_code vent_current    NA
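A quick way to confirm that such a split loses no rows is to compare row counts. The sketch below assumes a hand-built data frame `toy` made of four rows copied from the printed Geneva and Liechtenstein output (a hypothetical stand-in, not the package data). It also shows location_type as a possible alternative split key, which is a suggestion rather than something the package documentation prescribes:

```r
library(dplyr)

# Hypothetical stand-in: four rows copied from the printed output in this
# document, not loaded from the package.
toy <- data.frame(
  date          = as.Date(c("2020-01-24", "2020-01-24",
                            "2020-02-27", "2020-02-27")),
  location      = c("GE", "GE", "FL", "FL"),
  location_type = c("Canton of Switzerland", "Canton of Switzerland",
                    "Principality of Liechtenstein",
                    "Principality of Liechtenstein"),
  data_type     = c("tested_total", "cases_total",
                    "tested_total", "cases_total"),
  value         = c(4L, NA, 3L, NA)
)

switzerland   <- toy %>% filter(location != "FL")
liechtenstein <- toy %>% filter(location == "FL")

# Sanity check: the two subsets together cover every row exactly once
split_is_complete <- nrow(switzerland) + nrow(liechtenstein) == nrow(toy)

# Alternative view: row counts per location_type
by_type <- toy %>% count(location_type)
```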