---
title: "Python"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Python}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
[![](https://img.shields.io/pypi/v/covid19dh.svg?color=brightgreen)](https://pypi.org/pypi/covid19dh/) [![](https://img.shields.io/pypi/dm/covid19dh.svg?color=blue)](https://pypi.org/pypi/covid19dh/) [![](https://img.shields.io/github/stars/covid19datahub/python?style=social)](https://github.com/covid19datahub/python)
## Setup and usage
Install from [PyPI](https://pypi.org/project/covid19dh/) with
```bash
pip install covid19dh
```
Import the main function `covid19()` and call it:
```py
from covid19dh import covid19
x, src = covid19()
```
The package is regularly updated. Update with
```bash
pip install --upgrade covid19dh
```
## Return values
The function `covid19()` returns two pandas DataFrames:
* the data and
* references to the data sources.
## Parametrization
### Country
List of country names (case-insensitive) or ISO codes (alpha-2, alpha-3 or numeric). The list of ISO codes can be found [here](https://github.com/covid19datahub/COVID19/blob/master/inst/extdata/src.csv).
Fetching data from a particular country:
```py
x, src = covid19("USA") # United States
```
Specify multiple countries at the same time:
```py
x, src = covid19(["ESP","PT","andorra",250])
```
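The call above mixes an alpha-3 code, an alpha-2 code, a lower-case name and a numeric code. The following is a minimal sketch of how such mixed identifiers can be reduced to one common key. It is not the package's implementation: the helper `resolve_alpha3`, the four-row `COUNTRIES` table and the choice of alpha-3 as the common key are assumptions made for illustration only.

```python
# Toy lookup table (illustrative); the package resolves identifiers
# against its own, much larger list of ISO codes.
COUNTRIES = [
    {"name": "Spain", "alpha2": "ES", "alpha3": "ESP", "numeric": 724},
    {"name": "Portugal", "alpha2": "PT", "alpha3": "PRT", "numeric": 620},
    {"name": "Andorra", "alpha2": "AD", "alpha3": "AND", "numeric": 20},
    {"name": "France", "alpha2": "FR", "alpha3": "FRA", "numeric": 250},
]


def resolve_alpha3(identifier):
    """Hypothetical helper: map a country name or ISO code to its alpha-3 code."""
    if isinstance(identifier, int):
        # Numeric ISO code, e.g. 250 for France.
        for row in COUNTRIES:
            if row["numeric"] == identifier:
                return row["alpha3"]
        raise KeyError(identifier)
    # Names and alphabetic codes are compared case-insensitively.
    key = str(identifier).strip().upper()
    for row in COUNTRIES:
        if key in (row["name"].upper(), row["alpha2"], row["alpha3"]):
            return row["alpha3"]
    raise KeyError(identifier)


resolved = [resolve_alpha3(c) for c in ["ESP", "PT", "andorra", 250]]
print(resolved)
```

All four identifiers resolve to distinct countries (Spain, Portugal, Andorra and France), which is why they can be passed together in one list.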
If `country` is omitted, the whole dataset is returned:
```py
x, src = covid19()
```
### Raw data
Logical. Skip data cleaning? Default `True`. If `raw=False`, the raw data are cleaned by filling missing dates with `NaN` values. This ensures that all locations share the same grid of dates and no single day is skipped. Then, `NaN` values are replaced with the previous non-`NaN` value or `0`.
```py
x, src = covid19(raw = False)
```
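The cleaning described above can be pictured on a toy series. This is a sketch of the described effect using plain pandas, not the package's own code; the column name `confirmed` and the values are made up for illustration.

```python
import pandas as pd

# Toy raw data for one hypothetical location:
# 2020-03-02 and 2020-03-04 are missing, and the first value is unknown.
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05"]),
        "confirmed": [float("nan"), 5.0, 9.0],
    }
).set_index("date")

# Step 1: put the data on a complete daily grid; skipped days become NaN rows.
grid = pd.date_range(raw.index.min(), raw.index.max(), freq="D")
on_grid = raw.reindex(grid)

# Step 2: carry the previous non-NaN value forward,
# then fall back to 0 where no previous value exists.
cleaned = on_grid.ffill().fillna(0)

print(cleaned["confirmed"].tolist())
```

With several locations in one frame, the same two steps would have to be applied per location so that values are not carried over from one location to another.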
### Date filter
Dates can be specified as `datetime.datetime`, `datetime.date` or as a `str` in the format `YYYY-mm-dd`.
```py
from datetime import datetime
x, src = covid19("SWE", start = datetime(2020,4,1), end = "2020-05-01")
```
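The three accepted date types are interchangeable as long as they describe the same day. The sketch below shows one way to normalise them to a single `datetime.date`; the helper `to_date` is hypothetical and is not part of `covid19dh`.

```python
from datetime import date, datetime


def to_date(value):
    """Hypothetical helper: normalise datetime, date or 'YYYY-mm-dd' str to a date."""
    if isinstance(value, datetime):
        # Checked first because datetime is a subclass of date.
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise TypeError(f"unsupported date type: {type(value).__name__}")


# The same day, written in the three accepted forms.
same_day = [to_date(datetime(2020, 4, 1)), to_date(date(2020, 4, 1)), to_date("2020-04-01")]
print(same_day)
```

A string that does not match `YYYY-mm-dd`, such as `"01/04/2020"`, makes `datetime.strptime` raise a `ValueError` in this sketch.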
### Level
Integer. Granularity level of the data:
1. Country level
2. State, region or canton level
3. City or municipality level
```py
from datetime import date
x, src = covid19("USA", level = 2, start = date(2020,5,1))
```
### Cache
Logical. Memory caching? Significantly improves performance on successive calls. Caching is enabled by default.
Caching can be disabled (e.g. for long-running programs) by:
```py
x, src = covid19("FRA", cache = False)
```
### Vintage
Logical. Retrieve the snapshot of the dataset that was generated at the `end` date instead of using the latest version. Default `False`.
To fetch, for example, the US data that were accessible on *22nd April 2020*, type
```py
x, src = covid19("US", end = "2020-04-22", vintage = True)
```
The vintage data are collected at the end of the day, but published with a delay of approximately 48 hours, once the day is completed in all the timezones.
Hence, if `vintage = True` but `end` is not set, a warning is raised and `None` is returned.
```py
x, src = covid19("USA", vintage = True) # too early to get today's vintage
```
```
UserWarning: vintage data not available yet
```
### Citations
The data sources are returned as the second value.
```py
from covid19dh import covid19
x, src = covid19("USA")
print(src)
```
`r gsub("^# ", "## ", readr::read_file('../LICENSE.md'))`