## Impute missing values using beagle software.
gDataDropsImputedBeagle <- codeMarkers(gData = gDataDropsMiss,
impute = TRUE,
imputeType = "beagle",
verbose = TRUE)
After a few seconds, you got the following error message:
Error in read.table(gzfile(paste0(tmpVcfOut, ".vcf.gz")), stringsAsFactors = FALSE) :
no lines available in input
In addition: Warning message:
In system(paste0("java -Xmx3000m -jar ", shQuote(paste0(sort(path.package()[grep("statgenGWAS", :
Well, I went to the temp folder where that system command is running, and copy the beagle.jar
from the package java directory trying to run it directly from the command line using the following call (names of the temp files may differ in your case):
java -Xmx2667m -jar beagle.28Jun21.220.jar gt=file424861223ebdinput.vcf out=file424859a64c75out gp=true seed=1234 nthreads=1 map=file42485d1a4254.map
And as a result, I got the following error message:
beagle.28Jun21.220.jar (version 5.2)
Enter "java -jar beagle.28Jun21.220.jar" to list command line argument
Start time: 10:54 PM EET on 29 Nov 2021
Command line: java -Xmx2372m -jar beagle.28Jun21.220.jar
gt=file424861223ebdinput.vcf
out=file424859a64c75out
gp=true
seed=1234
nthreads=1
map=file42485d1a4254.map
Reference samples: 0
Study samples: 246
Window 1 [chr1:3498-3498]
Reference markers: 1
Study markers: 1
Exception in thread "main" java.lang.IllegalArgumentException: Window has only one position: CHROM=chr1 POS=3498
at vcf.MarkerMap.meanSingleBaseGenDist(MarkerMap.java:98)
at phase.FixedPhaseData.markerMap(FixedPhaseData.java:169)
at phase.FixedPhaseData.<init>(FixedPhaseData.java:116)
at main.Main.phaseStage1Variants(Main.java:187)
at main.Main.phaseTarg(Main.java:181)
at main.Main.phaseAndImpute(Main.java:171)
at main.Main.main(Main.java:126)
Well, it looks like a kind of compatibility issue. Therefore, I start trying previous versions of beagle jar file where the following versions have been tested with no success:
beagle.29May21.d6d.jar
beagle.21Apr21.304.jar
beagle.17Apr21.dc6.jar
beagle.05Apr21.9b7.jar
Until, it works perfectly fine with beagle.18May20.d20.jar
(version 5.1), and I got the following message telling that all the 102633 missing values imputed (the 1% random missing values added in the marker matrix as mentioned in the vignette):
> gDataDropsImputedBeagle <- codeMarkers(gData = gDataDropsMiss,
+ impute = TRUE,
+ imputeType = "beagle",
+ verbose = TRUE)
Input contains 41722 SNPs for 246 genotypes.
0 genotypes removed because proportion of missing values larger than or equal to 1.
0 SNPs removed because proportion of missing values larger than or equal to 1.
67 duplicate SNPs removed.
102633 missing values imputed.
1131 duplicate SNPs removed after imputation.
Output contains 40524 SNPs for 246 genotypes.
But, when I checked for the missing values in the marker matrix after this imputation step sum(is.na(gDataDropsImputedBeagle$markers))
, I found 48457 missing values. When I dug deep, I noticed they were present in the marker matrix before the imputation, then they turned missing somehow!
I am not sure if the generated temporarily input VCF file via your statgenGWAS package is not compatible with version 5.1 of beagle software, or there is something else went wrong in here.
Your support to fix this beagle imputation issue is highly appreciated and required.
Thanks for all your digging.
I did a bit more of that and it turns out that when beagle was updated to version 5.0 something changed in the input format. I'm not yet completely sure what, but I will try to dig deeper into that.
For now as a workaround I put back beagle 4.1 as the beagle version used in the package. If you install the package from the develop branch the code above will work fine again with all missing values actually imputed.
Your temporary fix for beagle 5.2 issues has a few typos in the beagle.jar parameters in the "R/codeMarkers.R" script on lines 458 (it is gt, not gtgl) and 459 (it is gp, not gprobs). There is no gtgl nor gprobs parameters as per beagle documentation. Also, the binary beagle.jar file in the inst/java directory is still the old 4.1 version, not 5.2 (or the latest 5.4).
Well, even after fixing these issues, we still get the following error message:
Error in read.table(gzfile(paste0(tmpVcfOut, ".vcf.gz")), stringsAsFactors = FALSE) :
no lines available in input
In addition: Warning message:
In system(paste0("java -Xmx3000m -jar ", shQuote(paste0(sort(path.package()[grep("statgenGWAS", :
running command 'java -Xmx3000m -jar "C:/Users/Kel-shamaa/Documents/R/win-library/4.0/statgenGWAS/java/beagle.jar" gt=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e814707bcinput.vcf out=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e848ca5970out gp=true seed=1234 nthreads=1 map=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e8790b5801.map' had status 1
It looks like something related to the map file because when I excluded it from the parameters, the beagle call looks running on the CMD!
The temporary fix I made, was effectively reverting to beagle 4.1. This means the parameters and the .jar file are now back to what they were for beagle 4.1.
The beagle documentation you link to is for version 5.2 and indeed there some of the parameters were renamed.
The problem is indeed with the map. It seems in beagle 5.0 and higher certain map formats that were allowed in beagle 4.x are no longer allowed.