--- title: "Download Centre" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Download Centre} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- > Download the data as a SQLite database or as CSV files. ## Latest data The latest data are updated on an hourly basis. You can dowload them in several ways: - all-in-one (SQLite) - by level (CSV) - by country (CSV) - by location (CSV) ### Download all-in-one Download all the data at once as a compressed [SQLite](https://sqlite.org/) database file. The database contains two tables: - table `location`: contains information about the geographical entities - table `timeseries`: contains the time-series of epidemiological variables and policy measures for each location The two tables can be joined on the column `id`. Read more in the [documentation](/articles/docs.html). | URL | Description | Format | Downloads | |-------------------------------------------------------|----------------------------------------|--------------------------|------------------| | https://storage.covid19datahub.io/latest.db.gz | Full database. Contains all the data. | [GZIP](https://storage.covid19datahub.io/latest.db.gz) | ![](https://storage.covid19datahub.io/downloads/latest.db.gz.svg) | ### Download by level Download worldwide data at 3 different levels of granularity: country-level data (level `1`), state-level data (level `2`), and city-level data (level `3`). | URL | Description | Format | Downloads | |-------------------------------------------------------|----------------------------------------|--------------------------|------------------| | https://storage.covid19datahub.io/level/1.csv.zip | Worldwide country-level data. | [CSV](https://storage.covid19datahub.io/level/1.csv) -- [ZIP](https://storage.covid19datahub.io/level/1.csv.zip) -- [GZIP](https://storage.covid19datahub.io/level/1.csv.gz) | ![](https://storage.covid19datahub.io/downloads/level/1.svg) | | https://storage.covid19datahub.io/level/2.csv.zip | Worldwide state-level data. | [CSV](https://storage.covid19datahub.io/level/2.csv) -- [ZIP](https://storage.covid19datahub.io/level/2.csv.zip) -- [GZIP](https://storage.covid19datahub.io/level/2.csv.gz) | ![](https://storage.covid19datahub.io/downloads/level/2.svg) | | https://storage.covid19datahub.io/level/3.csv.zip | Worldwide city-level data. | [CSV](https://storage.covid19datahub.io/level/3.csv) -- [ZIP](https://storage.covid19datahub.io/level/3.csv.zip) -- [GZIP](https://storage.covid19datahub.io/level/3.csv.gz) | ![](https://storage.covid19datahub.io/downloads/level/3.svg) | **For developers.** The endpoint to download data by level is: - `https://storage.covid19datahub.io/level/.` where: - `` is equal to `1`, `2`, or `3` - `` is one of: `csv` for the uncompressed file; `csv.zip` for the ZIP file; `csv.gz` for the GZIP file ### Download by country ```{r, include=FALSE} select_country <- function(){ country <- read.csv("https://storage.covid19datahub.io/country/index.csv", fileEncoding = "UTF-8") options <- paste(sprintf('', country$iso_alpha_3, country$name), collapse = "\n") sprintf('', options) } ``` ```{r, include=FALSE} select_location <- function(){ location <- read.csv("https://storage.covid19datahub.io/location/index.csv", fileEncoding = "UTF-8") location$name <- gsub("^(, )*", "", paste(sep = ", ", location$administrative_area_level_3, location$administrative_area_level_2, location$administrative_area_level_1)) options <- paste(sprintf('', location$id, location$name), collapse = "\n") sprintf('', options) } ``` Download the data for all the administrative divisions within one country. `r select_country()`
**For developers.** The endpoint to download data by country is: - `https://storage.covid19datahub.io/country/.` where: - `iso` is the 3-letter ISO code of the country - `` is one of: `csv` for the uncompressed file; `csv.zip` for the ZIP file; `csv.gz` for the GZIP file The lookup table mapping countries to `iso` codes is available at `https://storage.covid19datahub.io/country/index.` ### Download by location Download the data by single location. `r select_location()`
**For developers.** The endpoint to download data by location is: - `https://storage.covid19datahub.io/location/.` where: - `id` is the hash code identifying a location - `` is one of: `csv` for the uncompressed file; `csv.zip` for the ZIP file; `csv.gz` for the GZIP file The lookup table mapping locations to `id`s is available at `https://storage.covid19datahub.io/location/index.` ## Vintage data Vintage data are immutable snapshots of the data taken each day. The vintage file on date `YYYY-MM-DD` contains the data that were available on that day. Typically, the data available on day $T$ include the counts up to day $T-1$ due to natural delays in reporting the data to the local authorities and different time zones worldwide. ```{r, include=FALSE} tab <- function(start, end, version){ if(end |", dates, dates, dates) } else{ tab <- sprintf("| https://storage.covid19datahub.io/%s.db.gz | [Download PDF](https://storage.covid19datahub.io/%s.pdf)             | %s | |", dates, dates, dates, dates, dates, dates, dates, dates) } paste(head, paste0(tab, collapse = "\n"), sep = '\n') } ``` `r tab(start = "2023-03-01", end = Sys.Date()-1, version = 3)` Tests for Switzerland are now retrieved from the file [COVID19Test_geoRegion_w](https://opendata.swiss/en/dataset/covid-19-schweiz/resource/d0222de9-00c3-4266-a53a-c00e459aeef6). Before this update, `tests` for Switzerland were significantly lower, because only tests performed in hospitals were counted. `r tab(start = "2021-11-15", end = "2023-02-28", version = 3)` Version 3 is now live. This is a major update. The vintage data are now shipped in SQLite databases and the vintage data sources are reported in the corresponding PDF files. The vintage data are now generated on the same date of the data snapshot. Before 14 November 2021, the vintage data were generated with a delay of 48 hours to make sure all the observations are complete, and we don't take snapshots of yet-not-complete data. Infact, there is a natural delay in reporting the data to the local authorities (+24h) and different time zones worldwide (+24h). This means that e.g., a vintage file for 1st November was actually generated on 3rd November, and the data between the 1st and 3rd November were filtered out. In other words, the vintage datasets before 14 November 2021 are affected by a look-ahead bias of 2 days. This is no longer the case after 14 November 2021. [See the changelog](/news/index.html) for further information on this update. `r tab(start = "2021-04-11", end = "2021-11-14", version = 2)` Due to the incresing size of the data files, we stopped providing the pre-processed data on 10 April 2021, so to improve the update and storage of the raw data. **Please switch to the raw data if you are still using the pre-processed files.** Pre-processed data fill missing dates in the raw data with `NA` values. This ensures that all locations share the same grid of dates and no single day is skipped. Then, `NA` values are replaced with the previous non-`NA` value or `0`. `r tab(start = "2020-12-30", end = "2021-04-10", version = 2)` Since 2021-12-30, the datasets include the columns `vaccines` as described [here](/articles/docs.html#epidemiological-variables). `r tab(start = "2020-12-12", end = "2020-12-29", version = 2)` Since 2020-12-12, policies for admin areas level 3 are inherited from the policies available for admin areas level 2 as described [here](/articles/docs.html#policy-measures). `r tab(start = "2020-04-14", end = "2020-12-11", version = 2)` `r gsub("^# ", "## ", readr::read_file('../LICENSE.md'))`