To provide convenient access to epidemiological data on the coronavirus outbreak, we developed an R package, nCov2019 (https://github.com/yulab-smu/nCov2019). Besides detailed basis statistics, it also includes information about vaccine development and therapeutics candidates. We redesigned the function plot() for geographic maps visualization and provided a interactive shiny app. These analytics tools could be useful in informing the public and studying how this and similar viruses spread in populous countries.
Our R package is designed for both command line and dashboard
interaction analysis, As show in diagram, while dashboard()
is the main entry for the GUI explore part, the query()
is
the main function used in CLI explore part, 5 types of data were contain
in its return result. Result type were explain in Statistic
query part.
To start off, users could utilize the ‘remotes’ package to install it directly from GitHub by running the following in R:
remotes::install_github("yulab-smu/nCov2019", dependencies = TRUE)
Data query is simple as one command:
library("nCov2019")
res <- query()
## Querying the latest data...
## last update: 2023-03-01
## Querying the global data...
## Gloabl total 679972187 cases; and 6800189 deaths
## Gloabl total affect country or areas: 231
## Gloabl total recovered cases: 26867
## last update: 2023-03-01
## Querying the historical data...
## Query finish, each time you can launch query() to reflash the data
This may take seconds to few minutes, which depend on the users’ network connection, if the user connection is broken, a local stored version data will be used for demo.
The result returned by query()
function will contains 5
types of statistic:
names(res)
## [1] "latest" "global" "historical"
global
The global overall summary statisticlatest
The global latest statistic for all
countrieshistorical
The historical statistic for all
countriesvaccine
The current vaccine development
progresstherapeutics
The current therapeutics development
progressThe query()
only need to be performed once in a session,
print each of statistic objects, users could get their update time. And
for the vaccine
and therapeutics
query
results, print them will return the candidates number.
The query result of global status will contain a data frame with 21
types of statistic, which have detail explanation on the bottom of this
documents. And summary(x)
will return overview of global
status.
x = res$global
x$affectedCountries # total affected countries
## [1] 231
summary(x)
## Gloabl total 679972187 cases; and 6800189 deaths
## Gloabl total affect country or areas: 231
## Gloabl total recovered cases: 26867
## last update: 2023-03-01
Here is the example for operating latest data. once again, all data
have queried and store in res
.
x = res$latest
And then print(x)
will return the update time for the
latest data
print(x) # check update time
## last update: 2023-03-01
To subset latest data could be easily done by using [
.
x["Global"]
or x["global"]
will return the
data frame for all countries but users could determine a specific
country, such as:
head(x["Global"]) # return all global countries.
x[c("USA","India")] # return only for USA and India
The data is order by “todayCases” column, users could sort them by other order.
df = x["Global"]
head(df[order(df$cases, decreasing = T),])
As for the latest data, it provides 11 types of main information by default, but 12 more statistic type are provided in the “latest$detail”, they also have corresponding explanation on the bottom.
x = res$latest
head(x$detail) # more detail data
Historical data is useful in retrospective analysis or to establish
predictive models, the operation is similar as latest data, user could
get the data frame for all countries or some specific countries within
c()
vector, such as
head(Z[c(country1,country2,country3)])
Z = res$historical
print(Z) # update time
## last update: 2023-02-27
head(Z["Global"])
head(Z[c("China","UK","USA")])
For the following countries, we provide detail province data, which
can be obtained in a similar way but within [
operation:
head(Z[country,province])
Australia
Canada
China
Denmark
France
Netherlands
head(Z['China','hubei'])
For users’ own historical data, we provide a convert()
function, users could convert other data into class of nCov2019History
data, and then explore in nCov2019:
userowndata <- read.csv("path_to_user_data.csv")
# userowndata, it should contain these 6 column:
# "country","province","date","cases","deaths","recovered"
Z = convert(data=userowndata)
head(Z["Global"])
We provide a visualization function as a redesign “plot”.
plot(
x,
region = "Global",
continuous_scale = FALSE,
palette = "Reds",
date = NULL,
from = NULL,
to = NULL,
title = "COVID-19",
type = "cases",
...
)
Here, type could be one of “cases”,“deaths”,“recovered”,“active”,“todayCases”,“todayDeaths”,“todayRecovered”,“population” and “tests”. By default, color palette is “Reds”, more color palettes can be found here: palette.
To get the overview for the latest status, the mini code required is as below:
X <- res$latest
plot(X)
## Warning in geom_map(aes_string("long", "lat", map_id = "region", group =
## "group", : Ignoring unknown aesthetics: x and y
Or To get the overview for the detection testing status,
plot(X, type="tests",palette="Green")
It could be also intuitively compare the number of new confirmed cases per day among different countries.
library(ggplot2)
library(dplyr)
##
## 载入程辑包:'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
X <- res$historical
tmp <- X["global"] %>%
group_by(country) %>%
arrange(country,date) %>%
mutate(diff = cases - lag(cases, default = first(cases))) %>%
filter(country %in% c("Australia", "Japan", "Italy", "Germany", "China"))
ggplot(tmp,aes(date, log(diff+1), color=country)) + geom_line() +
labs(y="Log2(daily increase cases)") +
theme(axis.text = element_text(angle = 15, hjust = 1)) +
scale_x_date(date_labels = "%Y-%m-%d") +
theme_minimal()
user could also plot the outbreak map on the past time with historical data by specify a date in function plot().
Y <- res$historical
plot(Y, region="Global" ,date = "2020-08-01", type="cases")
Animated world-wide epidemic maps could be generated in the similar way. This is the example to draw a spread animation from 2020-03-01 to 2020-08-01, with little code.
library(nCov2019)
res = query()
from = "2020-03-01"
to = "2020-08-01"
y = res$historical
plot(y, from = from, to=to)
If you wanted to visualize the cumulative summary data, an example plot could be the following:
library(ggplot2)
x <- res$historical
d = x['Japan' ] # you can replace Anhui with any province
d = d[order(d$cases), ]
ggplot(d,
aes(date, cases)) +
geom_col(fill = 'firebrick') +
theme_minimal(base_size = 14) +
xlab(NULL) + ylab(NULL) +
scale_x_date(date_labels = "%Y/%m/%d") +
labs(caption = paste("accessed date:", max(d$date)))
Plot the trend for for the Top 10 increase cases countries on last day
library("dplyr")
library("ggrepel")
x <- res$latest
y <- res$historical
country_list = x["global"]$country[1:10]
y[country_list] %>%
subset( date > as.Date("2020-10-01") ) %>%
group_by(country) %>%
arrange(country,date) %>%
mutate(increase = cases - lag(cases, default = first(cases))) -> df
ggplot(df, aes(x=date, y=increase, color=country ))+
geom_smooth() +
geom_label_repel(aes(label = paste(country,increase)),
data = df[df$date == max(df$date), ], hjust = 1) +
labs(x=NULL,y=NULL)+
theme_bw() + theme(legend.position = 'none')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Plot the curve of cases, recovered and deaths for specify country
library('tidyr')
library('ggrepel')
library('ggplot2')
y <- res$historical
country = "India"
y[country] -> d
d <- gather(d, curve, count, -date, -country)
ggplot(d, aes(date, count, color = curve)) + geom_point() + geom_line() +
labs(x=NULL,y=NULL,title=paste("Trend of cases, recovered and deaths in", country)) +
scale_color_manual(values=c("#f39c12", "#dd4b39", "#00a65a")) +
theme_bw() +
geom_label_repel(aes(label = paste(curve,count)),
data = d[d$date == max(d$date), ], hjust = 1) +
theme(legend.position = "none",
axis.text = element_text(angle = 15, hjust = 1)) +
scale_x_date(date_labels = "%Y-%m-%d")
Here is the example code for draw a heatmap for the historical data range in nCov2019.
library('tidyr')
library('ggrepel')
library('ggplot2')
y <- res$historical
d <- y["global"]
d <- d[d$cases > 0,]
length(unique(d$country))
## [1] 201
d <- subset(d,date <= as.Date("2020-3-19"))
max_time <- max(d$date)
min_time <- max_time - 7
d <- d[d$date >= min_time,]
dd <- d[d$date == max(d$date,na.rm = TRUE),]
d$country <- factor(d$country,
levels=unique(dd$country[order(dd$cases)]))
breaks = c(0,1000, 10000, 100000, 10000000)
ggplot(d, aes(date, country)) +
geom_tile(aes(fill = cases), color = 'black') +
scale_fill_viridis_c(trans = 'log', breaks = breaks,
labels = breaks) +
xlab(NULL) + ylab(NULL) +
scale_x_date(date_labels = "%Y-%m-%d") + theme_minimal()
Plot the global trend in a novel way.
require(dplyr)
y <- res$historical
d <- y["global"]
time = as.Date("2020-03-19")
dd <- filter(d, date == time) %>%
arrange(desc(cases))
dd = dd[1:40, ]
dd$country = factor(dd$country, levels=dd$country)
dd$angle = 1:40 * 360/40
require(ggplot2)
p <- ggplot(dd, aes(country, cases, fill=cases)) +
geom_col(width=1, color='grey90') +
geom_col(aes(y=I(5)), width=1, fill='grey90', alpha = .2) +
geom_col(aes(y=I(3)), width=1, fill='grey90', alpha = .2) +
geom_col(aes(y=I(2)), width=1, fill = "white") +
scale_y_log10() +
scale_fill_gradientn(colors=c("darkgreen", "green", "orange", "firebrick","red"), trans="log") +
geom_text(aes(label=paste(country, cases, sep="\n"),
y = cases *.8, angle=angle),
data=function(d) d[d$cases > 700,],
size=3, color = "white", fontface="bold", vjust=1) +
geom_text(aes(label=paste0(cases, " cases ", country),
y = max(cases) * 2, angle=angle+90),
data=function(d) d[d$cases < 700,],
size=3, vjust=0) +
coord_polar(direction=-1) +
theme_void() +
theme(legend.position="none") +
ggtitle("COVID19 global trend", time)
p
Number of days since 1 million cases per country
require(dplyr)
require(ggplot2)
require(shadowtext)
## 载入需要的程辑包:shadowtext
y <- res$historical
d <- y["global"]
dd <- d %>%
as_tibble %>%
filter(cases > 1000000) %>%
group_by(country) %>%
mutate(days_since_1m = as.numeric(date - min(date))) %>%
ungroup
breaks=c(1000, 10000, 20000, 50000, 500000,500000,5000000,20000000)
p <- ggplot(dd, aes(days_since_1m, cases, color = country)) +
geom_smooth(method='lm', aes(group=1),
data = dd,
color='grey10', linetype='dashed') +
geom_line(size = 0.8) +
geom_point(pch = 21, size = 1) +
scale_y_log10(expand = expansion(add = c(0,0.1)),
breaks = breaks, labels = breaks) +
scale_x_continuous(expand = expansion(add = c(0,1))) +
theme_minimal(base_size = 14) +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
plot.margin = margin(3,15,3,3,"mm")
) +
coord_cartesian(clip = "off") +
geom_shadowtext(aes(label = paste0(" ",country)), hjust=0, vjust = 0,
data = . %>% group_by(country) %>% top_n(1,days_since_1m),
bg.color = "white") +
labs(x = "Number of days since 1,000,000th case", y = "",
subtitle = "Total number of cases")
print(p)
## `geom_smooth()` using formula = 'y ~ x'
dashboard could launch as below:
dashboard()
statistic | explain |
---|---|
active | active number = comfirmed cases - deaths - recoveredd |
activePerOneMillion | active number / million population |
cases | comfirmed cases |
casesPerOneMillion | comfirmed cases / million population |
continent | continent |
country | country |
critical | Critical patients |
criticalPerOneMillion | Critical patients / million population |
date | date |
deaths | deaths |
deathsPerOneMillion | deaths patients / million population |
oneCasePerPeople | oneCasePerPeople |
oneDeathPerPeople | oneDeathPerPeople |
oneTestPerPeople | oneTestPerPeople |
population | population |
recovered | recovered |
recoveredPerOneMillion | recoveredPerOneMillion |
tests | COVID-19 test |
testsPerOneMillion | COVID-19 test / million population |
todayCases | comfirm cases in today |
todayDeaths | comfirm cases in today |
todayRecovered | comfirm cases in today |
updated | the latest update time |
If you use nCov2019
, please cite the following
preprint:
Wu T, Hu E, Ge X*, Yu G*. 2021. nCov2019: an R package for studying the COVID-19 coronavirus pandemic. PeerJ 9:e11421 https://doi.org/10.7717/peerj.11421
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin21.3.0 (64-bit)
## Running under: macOS Monterey 12.1
##
## Matrix products: default
## BLAS: /usr/local/Cellar/openblas/0.3.20/lib/libopenblasp-r0.3.20.dylib
## LAPACK: /usr/local/Cellar/r/4.2.0/lib/R/lib/libRlapack.dylib
##
## locale:
## [1] C/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] shadowtext_0.1.2 tidyr_1.2.0 ggrepel_0.9.1 dplyr_1.0.9
## [5] ggplot2_3.4.0 nCov2019_0.4.6
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.2 xfun_0.31 bslib_0.3.1 purrr_0.3.4
## [5] reshape2_1.4.4 lattice_0.20-45 splines_4.2.0 colorspace_2.0-3
## [9] vctrs_0.5.1 generics_0.1.2 viridisLite_0.4.0 htmltools_0.5.2
## [13] yaml_2.3.5 mgcv_1.8-40 utf8_1.2.2 rlang_1.0.6
## [17] jquerylib_0.1.4 pillar_1.7.0 glue_1.6.2 withr_2.5.0
## [21] DBI_1.1.2 RColorBrewer_1.1-3 lifecycle_1.0.3 plyr_1.8.7
## [25] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 evaluate_0.15
## [29] labeling_0.4.2 knitr_1.39 fastmap_1.1.0 fansi_1.0.3
## [33] highr_0.9 Rcpp_1.0.8.3 scales_1.2.0 jsonlite_1.8.0
## [37] farver_2.1.0 digest_0.6.29 stringi_1.7.6 grid_4.2.0
## [41] cli_3.4.1 tools_4.2.0 magrittr_2.0.3 sass_0.4.1
## [45] maps_3.4.0 tibble_3.1.7 crayon_1.5.1 pkgconfig_2.0.3
## [49] Matrix_1.5-1 ellipsis_0.3.2 downloader_0.4 assertthat_0.2.1
## [53] rmarkdown_2.14 R6_2.5.1 nlme_3.1-157 compiler_4.2.0