添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement . We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statgenGWAS version 1.0.7
You can reproduce the problem using the same code introduced in the CRAN vignettes:
Introduction to the statgenGWAS package

## Impute missing values using beagle software.
gDataDropsImputedBeagle <- codeMarkers(gData = gDataDropsMiss, 
                                       impute = TRUE,
                                       imputeType = "beagle",
                                       verbose = TRUE)

After a few seconds, you got the following error message:

Error in read.table(gzfile(paste0(tmpVcfOut, ".vcf.gz")), stringsAsFactors = FALSE) : 
  no lines available in input
In addition: Warning message:
In system(paste0("java -Xmx3000m -jar ", shQuote(paste0(sort(path.package()[grep("statgenGWAS",  :

Well, I went to the temp folder where that system command is running, and copy the beagle.jar from the package java directory trying to run it directly from the command line using the following call (names of the temp files may differ in your case):

java -Xmx2667m -jar beagle.28Jun21.220.jar gt=file424861223ebdinput.vcf out=file424859a64c75out gp=true seed=1234 nthreads=1 map=file42485d1a4254.map

And as a result, I got the following error message:

beagle.28Jun21.220.jar (version 5.2)
Enter "java -jar beagle.28Jun21.220.jar" to list command line argument
Start time: 10:54 PM EET on 29 Nov 2021
Command line: java -Xmx2372m -jar beagle.28Jun21.220.jar
  gt=file424861223ebdinput.vcf
  out=file424859a64c75out
  gp=true
  seed=1234
  nthreads=1
  map=file42485d1a4254.map
Reference samples:                    0
Study     samples:                  246
Window 1 [chr1:3498-3498]
Reference markers:                    1
Study     markers:                    1
Exception in thread "main" java.lang.IllegalArgumentException: Window has only one position: CHROM=chr1 POS=3498
        at vcf.MarkerMap.meanSingleBaseGenDist(MarkerMap.java:98)
        at phase.FixedPhaseData.markerMap(FixedPhaseData.java:169)
        at phase.FixedPhaseData.<init>(FixedPhaseData.java:116)
        at main.Main.phaseStage1Variants(Main.java:187)
        at main.Main.phaseTarg(Main.java:181)
        at main.Main.phaseAndImpute(Main.java:171)
        at main.Main.main(Main.java:126)
          

Well, it looks like a kind of compatibility issue. Therefore, I start trying previous versions of beagle jar file where the following versions have been tested with no success:

  • beagle.29May21.d6d.jar
  • beagle.21Apr21.304.jar
  • beagle.17Apr21.dc6.jar
  • beagle.05Apr21.9b7.jar
  • Until, it works perfectly fine with beagle.18May20.d20.jar (version 5.1), and I got the following message telling that all the 102633 missing values imputed (the 1% random missing values added in the marker matrix as mentioned in the vignette):

    > gDataDropsImputedBeagle <- codeMarkers(gData = gDataDropsMiss, 
    +                                        impute = TRUE,
    +                                        imputeType = "beagle",
    +                                        verbose = TRUE)
    Input contains 41722 SNPs for 246 genotypes.
    0 genotypes removed because proportion of missing values larger than or equal to 1.
    0 SNPs removed because proportion of missing values larger than or equal to 1.
    67 duplicate SNPs removed.
    102633 missing values imputed.
    1131 duplicate SNPs removed after imputation.
    Output contains 40524 SNPs for 246 genotypes.
    

    But, when I checked for the missing values in the marker matrix after this imputation step sum(is.na(gDataDropsImputedBeagle$markers)), I found 48457 missing values. When I dug deep, I noticed they were present in the marker matrix before the imputation, then they turned missing somehow!

    I am not sure if the generated temporarily input VCF file via your statgenGWAS package is not compatible with version 5.1 of beagle software, or there is something else went wrong in here.

    Your support to fix this beagle imputation issue is highly appreciated and required.

    Thanks for all your digging.
    I did a bit more of that and it turns out that when beagle was updated to version 5.0 something changed in the input format. I'm not yet completely sure what, but I will try to dig deeper into that.

    For now as a workaround I put back beagle 4.1 as the beagle version used in the package. If you install the package from the develop branch the code above will work fine again with all missing values actually imputed.

    Your temporary fix for beagle 5.2 issues has a few typos in the beagle.jar parameters in the "R/codeMarkers.R" script on lines 458 (it is gt, not gtgl) and 459 (it is gp, not gprobs). There is no gtgl nor gprobs parameters as per beagle documentation. Also, the binary beagle.jar file in the inst/java directory is still the old 4.1 version, not 5.2 (or the latest 5.4).

    Well, even after fixing these issues, we still get the following error message:

    Error in read.table(gzfile(paste0(tmpVcfOut, ".vcf.gz")), stringsAsFactors = FALSE) : 
      no lines available in input
    In addition: Warning message:
    In system(paste0("java -Xmx3000m -jar ", shQuote(paste0(sort(path.package()[grep("statgenGWAS",  :
      running command 'java -Xmx3000m -jar "C:/Users/Kel-shamaa/Documents/R/win-library/4.0/statgenGWAS/java/beagle.jar" gt=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e814707bcinput.vcf out=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e848ca5970out gp=true seed=1234 nthreads=1 map=C:\Users\KEL-SH~1\AppData\Local\Temp\Rtmpq0yOKZ\file53e8790b5801.map' had status 1
    

    It looks like something related to the map file because when I excluded it from the parameters, the beagle call looks running on the CMD!

    The temporary fix I made, was effectively reverting to beagle 4.1. This means the parameters and the .jar file are now back to what they were for beagle 4.1.
    The beagle documentation you link to is for version 5.2 and indeed there some of the parameters were renamed.
    The problem is indeed with the map. It seems in beagle 5.0 and higher certain map formats that were allowed in beagle 4.x are no longer allowed.