Title: | Mutation Analysis and Assay Validation Toolkit for COVID-19 (Coronavirus Disease 2019) |
---|---|
Description: | A feasible framework for mutation analysis and reverse transcription polymerase chain reaction (RT-PCR) assay evaluation of COVID-19, including mutation profile visualization, statistics and the mutation ratio of each assay. The mutation ratio helps evaluate the coverage of RT-PCR assays in large sample sets <doi:10.20944/preprints202004.0529.v1>. |
Authors: | Shaoqian Ma [aut, cre] , Yongyou Zhang [aut] |
Maintainer: | Shaoqian Ma <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.1.3 |
Built: | 2024-11-09 04:56:52 UTC |
Source: | https://github.com/MSQ-123/CovidMutations |
This function uses well-established assay information to detect mutations at different SARS-CoV-2 genomic sites. The output is a series of figures presenting the mutation profile for each assay, plus a figure comparing the mutation detection rate across the primer binding regions.
AssayMutRatio(nucmerr = nucmerr, assays = assays, totalsample = totalsample, plotType = "barplot", outdir = ".")
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
assays | Assays dataframe including the detection ranges of mutations. |
totalsample | Total sample number (total cleared GISAID fasta data). |
plotType | Figure type, either "barplot" or "logtrans". |
outdir | The output directory. |
Plot the selected figure type as output.
data("nucmerr")
data("assays")
Total <- 1000  # total cleared GISAID fasta data (seqkit)
outdir <- tempdir()  # output directory for the results
AssayMutRatio(nucmerr = nucmerr, assays = assays, totalsample = Total, plotType = "logtrans", outdir = outdir)
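The idea of a per-assay mutation ratio can be illustrated with a small base R sketch. This is only one plausible reading, not the package's implementation: the column names `sample` and `rpos`, the region coordinates and the sample total are all hypothetical, and the ratio is taken as the percentage of samples carrying at least one mutation inside a primer binding region.

```r
# Toy mutation table; column names `sample` and `rpos` are hypothetical.
muts <- data.frame(
  sample = c("s1", "s1", "s2", "s3", "s4"),
  rpos   = c(28290, 28900, 28310, 15000, 28335),
  stringsAsFactors = FALSE
)

# Hypothetical primer binding region (genomic start and end, inclusive).
region_start  <- 28287
region_end    <- 28358
total_samples <- 1000

# Samples carrying at least one mutation inside the region.
hit <- muts$rpos >= region_start & muts$rpos <= region_end
mutated_samples <- unique(muts$sample[hit])

# Mutation ratio as a percentage of all samples.
mutation_ratio <- length(mutated_samples) / total_samples * 100
print(mutation_ratio)
```

Counting distinct samples rather than rows avoids inflating the ratio when one sample carries several mutations in the same region.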
These assays include the primer detection ranges in which mutations may occur.
data(assays)
A dataframe with 10 rows and 7 columns.
Kilic T, Weissleder R, Lee H (2020) iScience 23, 101406. (PubMed)
data(assays)
The list is used for replacing some original city names with "China" in order to make the downstream analysis easier.
data(chinalist)
A dataframe with 31 rows and 1 column.
These data were created by Zhanglab at Xiamen University.
data(chinalist)
A dataframe that can be used for downstream analyses such as descriptive mutation statistics.
data(covid_annot)
A dataframe with 49821 rows and 10 columns.
data(covid_annot)
Detecting SARS-CoV-2 is important for outbreak prevention and patient management. Real-time reverse-transcription polymerase chain reaction (RT-PCR) is one of the most effective molecular diagnostic strategies for detecting the virus in clinical laboratories. Using two assays together can be more accurate and practical for samples that carry co-occurring mutations.
doubleAssay(nucmerr = nucmerr, assay1 = assay1, assay2 = assay2, outdir = ".")
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
assay1 | Information on the first assay, containing the primer locations and the probe location. See the format of the assays provided as example data, e.g. data(assays); assay1 <- assays[1,]. |
assay2 | Information on the second assay, in the same format as the first assay. |
outdir | The output directory. If NULL, the plot is printed in RStudio. |
Plots three figures in a single panel: the results for each of the two assays and a "venn" plot of the samples with co-occurring mutations.
data("nucmerr")
data("assays")
assay1 <- assays[1, ]
assay2 <- assays[2, ]
#outdir <- tempdir()
doubleAssay(nucmerr = nucmerr, assay1 = assay1, assay2 = assay2, outdir = NULL)
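The co-occurrence summarised by the "venn" plot boils down to set operations on sample identifiers. The sketch below uses base R only and hypothetical sample IDs; it illustrates the idea and is not the package's internal code.

```r
# Hypothetical IDs of samples with a mutation inside each assay's
# primer/probe region.
assay1_samples <- c("s1", "s2", "s5", "s8")
assay2_samples <- c("s2", "s3", "s8", "s9")

# Samples mutated in both regions, and in only one of them.
both        <- intersect(assay1_samples, assay2_samples)
only_assay1 <- setdiff(assay1_samples, assay2_samples)
only_assay2 <- setdiff(assay2_samples, assay1_samples)

# The three counts a two-set Venn diagram would display.
venn_counts <- c(
  assay1_only = length(only_assay1),
  both        = length(both),
  assay2_only = length(only_assay2)
)
print(venn_counts)
```

Samples in `both` are the ones that could escape detection by either assay, which is why the overlap is the quantity of interest when pairing assays.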
This "GFF3" data is used for counting the mutations in each gene of the virus samples.
data(gene_position)
A dataframe with 26 rows and 10 columns.
data(gene_position)
This "GFF3" data is used for annotating the effects of mutations in the virus samples.
data(gff3)
A dataframe with 26 rows and 10 columns.
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2697049
data(gff3)
This function visualizes the global protein mutational pattern in the SARS-CoV-2 genome.
globalProteinMut(covid_annot = covid_annot, outdir = ".", figure_Type = "heatmap", top = 10, country = "global")
Argument | Description |
---|---|
covid_annot | The mutation effects provided by the "indelSNP" function. |
outdir | The output directory. |
figure_Type | Figure type, either "heatmap" or "count". |
top | The number of variants to plot. |
country | Choose a country to plot the mutational pattern, or choose "global" to profile mutations across all countries. The default is "global". |
Plot the selected figure type as output.
data("covid_annot")
outdir <- tempdir()
# make sure the covid_annot is a dataframe
covid_annot <- as.data.frame(covid_annot)
globalProteinMut(covid_annot = covid_annot, outdir = outdir, figure_Type = "heatmap", top = 10, country = "USA")
This function visualizes the global SNP pattern in the SARS-CoV-2 genome.
globalSNPprofile(nucmerr = nucmerr, outdir = ".", figure_Type = "heatmap", country = "global", top = 5)
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
outdir | The output directory. |
figure_Type | Figure type, either "heatmap" or "count". |
country | Choose a country to plot the mutational pattern, or choose "global" to profile mutations across all countries. The default is "global". |
top | The number of mutational classes to plot. |
Plot the selected figure type as output.
data("nucmerr")
outdir <- tempdir()
globalSNPprofile(nucmerr = nucmerr, outdir = outdir, figure_Type = "heatmap", country = "global", top = 5)
This function annotates the mutational events and indicates their potential effects on the proteins. Mutational events include SNPs, insertions and deletions.
indelSNP(nucmer = nucmer, saveRda = FALSE, refseq = refseq, gff3 = gff3, annot = annot, outdir = ".")
Argument | Description |
---|---|
nucmer | An object called "nucmer": mutation information derived from the "nucmer.snp" variant file by the "seqkit" software and "nucmer SNP-calling" scripts. Before it is processed by the "indelSNP" function, the nucmer object should first be transformed by the "mergeEvents" function. |
saveRda | Whether to save the results as an ".rda" file. |
refseq | SARS-CoV-2 genomic reference sequence. |
gff3 | "GFF3" format annotation data for SARS-CoV-2. |
annot | Annotation list of genes (corresponding proteins) built from the "GFF3" file by "setNames(gff3[,10],gff3[,9])". |
outdir | The output directory. |
Writes the result as a ".csv" file to the specified directory.
data("nucmer")
# Fix IUPAC codes
nucmer <- nucmer[!nucmer$qvar %in% c("B","D","H","K","M","N","R","S","V","W","Y"), ]
nucmer <- mergeEvents(nucmer = nucmer)  # this will update the nucmer object
data("refseq")
data("gff3")
annot <- setNames(gff3[, 10], gff3[, 9])
outdir <- tempdir()
indelSNP(nucmer = nucmer, saveRda = FALSE, refseq = refseq, gff3 = gff3, annot = annot, outdir = outdir)
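The `setNames(gff3[,10], gff3[,9])` call builds a named character vector that works as a lookup table. The sketch below shows the mechanism on a toy dataframe. The contents are hypothetical, and it assumes that column 9 holds a gene identifier and column 10 the corresponding protein label, which the bundled "gff3" data may or may not match exactly.

```r
# Toy stand-in for the "gff3" dataframe: 3 rows, 10 columns.
gff3_toy <- data.frame(
  matrix(NA_character_, nrow = 3, ncol = 10),
  stringsAsFactors = FALSE
)
gff3_toy[, 9]  <- c("gene-A", "gene-B", "gene-C")  # hypothetical gene IDs
gff3_toy[, 10] <- c("NSP1", "S", "N")              # hypothetical protein labels

# Values come from column 10, names from column 9.
annot <- setNames(gff3_toy[, 10], gff3_toy[, 9])

# Look up the protein label for one gene identifier.
print(annot[["gene-B"]])
```

A named vector keeps the lookup to a single subsetting operation, so no join is needed when annotating each mutation.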
Counts and types of mutations in the last five nucleotides of any reverse transcription polymerase chain reaction (RT-PCR) primer.
LastfiveNrMutation(nucmerr = nucmerr, assays = assays, totalsample = totalsample, figurelist = FALSE, outdir = ".")
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
assays | Assays dataframe including the detection ranges of mutations. |
totalsample | Total sample number (total cleared GISAID fasta data). |
figurelist | Whether to output the integrated plot list for each assay. |
outdir | The output directory. If figurelist = TRUE, the figure is output in the R session. |
Plots the mutation counts (last five nucleotides of each primer) for each assay as output.
data("nucmerr")
data("assays")
totalsample <- 1000
outdir <- tempdir()
LastfiveNrMutation(nucmerr = nucmerr, assays = assays, totalsample = totalsample, figurelist = FALSE, outdir = outdir)
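To make "last five nucleotides" concrete, the sketch below derives the 3'-terminal five genomic positions of a primer in base R. The helper `last_five()` and all coordinates are hypothetical, and the package may define the window differently. The sketch assumes primer coordinates are given on the reference strand, so a forward primer's 3' end is its highest coordinate and a reverse primer's 3' end is its lowest.

```r
# Hypothetical helper: genomic positions covered by the 3'-terminal
# five nucleotides of a primer given in reference coordinates.
last_five <- function(start, end, strand = c("+", "-")) {
  strand <- match.arg(strand)
  stopifnot(end - start + 1 >= 5)
  if (strand == "+") seq(end - 4, end) else seq(start, start + 4)
}

fwd3 <- last_five(28287, 28306, "+")  # forward primer, toy coordinates
rev3 <- last_five(28335, 28358, "-")  # reverse primer, toy coordinates

# Count toy mutation positions falling in each 3' window.
mut_pos <- c(28303, 28290, 28336, 28350)
n_fwd <- sum(mut_pos %in% fwd3)
n_rev <- sum(mut_pos %in% rev3)
print(c(forward = n_fwd, reverse = n_rev))
```

Mismatches near the 3' end matter most because the polymerase extends from that end, which is the usual motivation for inspecting this window separately.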
The first step in handling the nucmer object. Afterwards, the effects of mutations can be analysed using the "indelSNP" function.
mergeEvents(nucmer = nucmer)
Argument | Description |
---|---|
nucmer | An object called "nucmer": mutation information derived from the "nucmer.snp" variant file by the "seqkit" software and "nucmer SNP-calling" scripts. |
An updated "nucmer" object.
# The example data:
data("nucmer")
# options(stringsAsFactors = FALSE)
# The input nucmer object can be made by the commented code below:
# nucmer <- read.delim("nucmer.snps", as.is = TRUE, skip = 4, header = FALSE)
# colnames(nucmer) <- c("rpos","rvar","qvar","qpos","","","","",
#                       "rlength","qlength","","","rname","qname")
# rownames(nucmer) <- paste0("var", 1:nrow(nucmer))
# Fix IUPAC codes
nucmer <- nucmer[!nucmer$qvar %in% c("B","D","H","K","M","N","R","S","V","W","Y"), ]
nucmer <- mergeEvents(nucmer = nucmer)  # this will update the nucmer object
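The "Fix IUPAC codes" step in the example drops every row whose query variant is an ambiguous IUPAC base code. The sketch below runs the same filter on a toy dataframe. The rows are made up for illustration and only three of the 14 "nucmer" columns are reproduced.

```r
# Toy subset of a nucmer-style table; values are illustrative only.
nucmer_toy <- data.frame(
  rpos = c(241, 3037, 14408, 23403),
  rvar = c("C", "C", "C", "A"),
  qvar = c("T", "N", "Y", "G"),
  stringsAsFactors = FALSE
)

# Ambiguous IUPAC nucleotide codes, as listed in the example above.
ambiguous <- c("B", "D", "H", "K", "M", "N", "R", "S", "V", "W", "Y")

# Keep only rows whose query variant is not an ambiguous code.
clean <- nucmer_toy[!nucmer_toy$qvar %in% ambiguous, ]
print(clean)
```

Filtering before "mergeEvents" keeps ambiguous base calls from being merged into neighbouring events.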
After the mutations have been annotated, this function plots the counts of mutational events for each gene in the SARS-CoV-2 genome.
MutByGene(nucmerr = nucmerr, gff3 = gff3, figurelist = FALSE, outdir = ".")
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
gff3 | "GFF3" format gene position data for SARS-CoV-2 (the "GFF3" file should include columns named "Gene", "Start" and "Stop"). |
figurelist | Whether to output the integrated plot list for each gene. |
outdir | The output directory. If figurelist = TRUE, the figure is output in the R session. |
Plot the mutation counts figure for each gene as output.
data("nucmerr")
data("gene_position")
outdir <- tempdir()
MutByGene(nucmerr = nucmerr, gff3 = gene_position, figurelist = FALSE, outdir = outdir)
# if figurelist = TRUE, the recommended figure display size (in pixels) is: width = 1650, height = 1300
Visualization of the top mutated samples, average mutational counts, top mutated positions in the genome, mutational density across the genome and distribution of mutations across countries.
mutStat(nucmerr = nucmerr, outdir = ".", figure_Type = "TopMuSample", type_top = 10, country = FALSE, mutpos = NULL)
Argument | Description |
---|---|
nucmerr | Mutation information containing a group list (derived from the "nucmer" object using the "nucmerRMD" function). |
outdir | The output directory. |
figure_Type | Figure type, one of "TopMuSample", "AverageMu", "TopMuPos", "MutDens", "CountryMutCount", "TopCountryMut". |
type_top | For figures involving a "top n" ("TopMuSample", "TopMuPos", "TopCountryMut"), "type_top" specifies the number of objects to display. |
country | For figures that group by country ("CountryMutCount" and "TopCountryMut"), "country" should be TRUE. |
mutpos | If the figure type is "TopCountryMut", "mutpos" can specify a range of genomic positions (e.g. 28831:28931) to plot. |
Plot the selected figure type as output.
data("nucmerr")
outdir <- tempdir()
mutStat(nucmerr = nucmerr, outdir = outdir, figure_Type = "TopCountryMut", type_top = 10, country = FALSE, mutpos = NULL)
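The effect of restricting a plot to a "mutpos" range can be pictured as a simple filter followed by a per-country count. The base R sketch below uses a toy table with hypothetical column names `country` and `rpos`. It illustrates the idea only and does not reproduce what "mutStat" does internally.

```r
# Toy mutation table; column names are hypothetical.
muts <- data.frame(
  country = c("USA", "USA", "China", "Italy", "USA"),
  rpos    = c(28881, 241, 28882, 28883, 28900),
  stringsAsFactors = FALSE
)

# Genomic range of interest, written the same way as the "mutpos" example.
mutpos <- 28831:28931

# Keep mutations inside the range, then count them per country.
in_range <- muts[muts$rpos %in% mutpos, ]
counts <- sort(table(in_range$country), decreasing = TRUE)
print(counts)
```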
The "nucmer.snps" variant file is obtained by processing SARS-CoV-2 sequences from the GISAID website (complete, high coverage only, low coverage exclusion, Host=human, Virus name = hCoV-19) with the "seqkit" software and "nucmer" scripts. The example data is downsampled from the complete data as of 2020-06-14.
data(nucmer)
A dataframe with 5000 rows(mutation sites) and 14 columns.
data(nucmer)
A dataset containing group information extracted from the "nucmer" object by the "nucmerRMD" function, in order to better describe the results.
data(nucmerr)
A dataframe with 4982 rows (downsampled mutation sites) and 10 columns.
data(nucmerr)
Manipulate the "nucmer" object to make the analysis easier.
nucmerRMD(nucmer = nucmer, outdir = ".", chinalist = chinalist)
Argument | Description |
---|---|
nucmer | An object called "nucmer": mutation information derived from the "nucmer.snp" variant file by the "seqkit" software and "nucmer SNP-calling" scripts. |
outdir | The output directory. |
chinalist | A list of places in China, used for replacing some original city names with "China" in order to make the downstream analysis easier. |
Saving the updated "nucmer" object.
data("nucmer")
data("chinalist")
# outdir <- tempdir()  # specify your output directory
nucmerr <- nucmerRMD(nucmer = nucmer, outdir = NULL, chinalist = chinalist)
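The role of "chinalist" can be sketched as a vectorised replacement: any place name found in the list is collapsed to "China". The example below is base R with made-up place vectors. The real "chinalist" is a dataframe with 31 rows and 1 column, and how "nucmerRMD" extracts place names from the sample labels is not shown here.

```r
# Hypothetical place labels extracted from sample names.
places <- c("Wuhan", "USA", "Guangdong", "Italy", "Beijing")

# Hypothetical subset of Chinese place names.
china_places <- c("Wuhan", "Guangdong", "Beijing", "Shanghai")

# Collapse every listed place to "China"; leave the rest untouched.
country <- ifelse(places %in% china_places, "China", places)
print(table(country))
```

Collapsing the labels first means later per-country summaries see a single "China" group instead of many small city-level groups.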
Basic descriptions for the mutational events.
plotMutAnno(results = results, figureType = "MostMut", outdir = ".")
Argument | Description |
---|---|
results | The mutation effects provided by the "indelSNP" function. |
figureType | Figure type, one of "MostMut", "MutPerSample", "VarClasses", "VarType", "NucleoEvents", "ProEvents". |
outdir | The output directory. |
Plot the selected figure type as output.
data("covid_annot")
# make sure the covid_annot is a dataframe
covid_annot <- as.data.frame(covid_annot)
# outdir <- tempdir()  # specify your output directory
plotMutAnno(results = covid_annot, figureType = "MostMut", outdir = NULL)
Plot the most frequent mutational events for the selected protein. The protein name should be specified correctly (only for SARS-CoV-2).
plotMutProteins(results = results, proteinName = "NSP2", top = 20, outdir = ".")
Argument | Description |
---|---|
results | The mutation effects provided by the "indelSNP" function. |
proteinName | Proteins in the SARS-CoV-2 genome, available choices: 5'UTR, NSP1~NSP10, NSP12a, NSP12b, NSP13, NSP14, NSP15, NSP16, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, ORF10. |
top | The number of objects to display. |
outdir | The output directory. |
Plot the mutational events for selected proteins as output.
data("covid_annot")
# make sure the covid_annot is a dataframe
covid_annot <- as.data.frame(covid_annot)
# outdir <- tempdir()  # specify your output directory
plotMutProteins(results = covid_annot, proteinName = "NSP2", top = 20, outdir = NULL)
This reference sequence is derived from a "fasta" file and preprocessed with the "read.fasta" function (refseq <- read.fasta("NC_045512.2.fa", forceDNAtolower = FALSE)[[1]]). It is used for annotating mutations in virus samples.
data(refseq)
"SeqFastadna" characters.
https://pubmed.ncbi.nlm.nih.gov/32015508/
data(refseq)