https://forum.lakefs.io/t/10064740/hi-team-i-want-to-to-use-lakefs-in-the-debugger-mode-can-you |
Vaibhav Kumar
03/20/2023, 2:13 PM
Hi team, I want to use lakeFS in the debugger mode, can you help?
Lynn Rozen
03/20/2023, 2:31 PM
Vaibhav Kumar
03/20/2023, 2:32 PM
Lynn Rozen
03/20/2023, 2:32 PM
Vaibhav Kumar
03/20/2023, 2:34 PM
Lynn Rozen
03/20/2023, 3:17 PM
Vaibhav Kumar
04/01/2023, 11:40 AM
For spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"), for the control to come to the HadoopFS client, lakeFS should be running somewhere, right? And within that I will have to put the debugger.
Does that make sense?
Yoni Augarten
04/01/2023, 11:45 AM
Vaibhav Kumar
04/01/2023, 11:51 AM
1. spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data") -> I know I can put a debugger on the Spark app side.
2. HadoopFS -> the server side I am not sure how to use in debugger mode. And how will things land here?
Yoni Augarten
04/01/2023, 11:55 AM
make all
4. Import the project to your IDE.
5. Run cmd/lakefs/cmd/main.go with the run --local-settings arguments. (You will later change these to connect your installation to the storage.)
6. Come back here and let me know how it works 🙂
Vaibhav Kumar
04/01/2023, 12:00 PM
I ran make all and got the below error:
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading golang.org/x/sys v0.0.0-20210510120138-977fb7262007
Makefile:103: *** "Missing dependency - no docker in PATH". Stop.
Yoni Augarten
04/01/2023, 12:06 PM
You can run make gen instead - but you may still need docker.
Vaibhav Kumar
04/01/2023, 12:57 PM
I ran go run main.go run --local-settings. Since main.go is a Go file and our HadoopFS client is in Java, how do I link these things together? Eventually I have to put breakpoints in the HadoopFS client code.
Yoni Augarten
04/01/2023, 1:17 PM
1. Open clients/hadoopfs as a separate IDE project.
2. Write a main method to call the code that you want to test. You will set the fs.lakefs.* configuration to point to your local instance of lakeFS.
This is simpler than running a Spark program and debugging it - although that's also possible. Let me know if that makes sense.
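A minimal sketch of such a main method, assuming a local lakeFS at http://localhost:8000 and the fs.lakefs.* keys that appear later in this thread (the credentials and repo path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DebugMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route lakefs:// paths to the lakeFS Hadoop filesystem and point it at the local server.
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem");
        conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1");
        conf.set("fs.lakefs.access.key", "<lakefs-access-key>");
        conf.set("fs.lakefs.secret.key", "<lakefs-secret-key>");

        Path p = new Path("lakefs://my-repo/main/1.txt");
        FileSystem fs = FileSystem.get(p.toUri(), conf); // resolves to LakeFSFileSystem
        FileStatus status = fs.getFileStatus(p);         // breakpoints go inside getFileStatus
        System.out.println(status.getPath() + " len=" + status.getLen());
    }
}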
Vaibhav Kumar
04/01/2023, 1:31 PM
getFileStatus (Line 749) is the first function the call goes to. This function expects path as a param, so I hope this is the same Spark path which we pass from spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"). Kindly confirm if my observation is correct.
Trace from issue 2801:
java.io.IOException: listObjects
at io.lakefs.LakeFSFileSystem$ListingIterator.readNextChunk(LakeFSFileSystem.java:901)
at io.lakefs.LakeFSFileSystem$ListingIterator.hasNext(LakeFSFileSystem.java:881)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:707)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:40)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
Yoni Augarten
04/01/2023, 1:38 PM
Correct - you use lakefs:// paths when calling getFileStatus on the LakeFSFileSystem.
Vaibhav Kumar
04/01/2023, 1:59 PM
Yoni Augarten
04/01/2023, 2:04 PM
Vaibhav Kumar
04/01/2023, 7:05 PM
Caused by: java.lang.IllegalArgumentException: Unsupported class file major version 63
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:196)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:177)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:163)
at net.bytebuddy.utility.OpenedClassReader.of(OpenedClassReader.java:86)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:3889)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2166)
at net.bytebuddy.dynamic.scaffold.inline.RedefinitionDynamicTypeBuilder.make(RedefinitionDynamicTypeBuilder.java:224)
at net.bytebuddy.dynamic.scaffold.inline.AbstractInliningDynamicTypeBuilder.make(AbstractInliningDynamicTypeBuilder.java:123)
at net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3659)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.transform(InlineBytecodeGenerator.java:391)
at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:169)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.triggerRetransformation(InlineBytecodeGenerator.java:276)
... 46 more
Running io.lakefs.FSConfigurationTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Results :
Tests in error:
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
Tests run: 105, Failures: 0, Errors: 90, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:05 min
[INFO] Finished at: 2023-04-02T00:19:55+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project hadoop-lakefs: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/simar/lakeFS/clients/hadoopfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Yoni Augarten
04/01/2023, 7:06 PM
Vaibhav Kumar
04/01/2023, 7:10 PM
[INFO] Building jar: /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar
[INFO]
[INFO] --- gpg:1.5:sign (sign-artifacts) @ hadoop-lakefs ---
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom (3.1 kB at 51 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar (239 kB at 4.1 MB/s)
/bin/sh: gpg: command not found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:15 min
[INFO] Finished at: 2023-04-02T20:55:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-gpg-plugin:1.5:sign (sign-artifacts) on project hadoop-lakefs: Exit code: 127 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Yoni Augarten
04/02/2023, 3:55 PM
You can skip signing by adding -P'!treeverse-signing' to the Maven command.
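For example, assuming the goal being run is package, the command would presumably look like:
mvn package -P'!treeverse-signing'
(the profile name is quoted so the shell does not expand the !).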
Vaibhav Kumar
04/02/2023, 4:39 PM
LakeFSFileStatus - I am not sure how to use it.
Yoni Augarten
04/02/2023, 4:42 PM
Vaibhav Kumar
04/02/2023, 4:44 PM
getFileStatus - and I will pass the lakeFS file path here to read it.
Yoni Augarten
04/02/2023, 4:46 PM
Path p = new Path("lakefs://my-repo/main/1.txt");
FileSystem fs = FileSystem.get(p.toUri(), hadoopConf);
fs.getFileStatus(p);
Vaibhav Kumar
04/02/2023, 4:54 PM
go run main.go run --local-settings
Yoni Augarten
04/02/2023, 4:58 PM
Vaibhav Kumar
04/02/2023, 6:14 PM
I used spark.sparkContext.hadoopConfiguration.set to set up the lakeFS client, but it seems the syntax is not working. In the doc it was mentioned to use it the below way:
def main(args : Array[String]) {
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
val path = new Path("s3a://test-repo/main/sample.json")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "oOal7VcsJQQnGoPcM9AEYXCe1Q76PHMpX4R1+Ai+")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "http:localhost:8000")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
val y = new LakeFSFileSystem()
y.getFileStatus(path)
Error
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "io.lakefs.LakeFSClient.getObjectsApi()" because "this.lfsClient" is null
Yoni Augarten
04/03/2023, 9:04 AM
You should get the filesystem instance using FileSystem.get like I mentioned above.
Vaibhav Kumar
04/03/2023, 10:01 AM
Yoni Augarten
04/03/2023, 10:02 AM
The fs.s3a configuration will not use LakeFSFileSystem, so it's not relevant for solving this issue.
Vaibhav Kumar
04/03/2023, 11:50 AM
Path p = new Path("lakefs://my-repo/main/1.txt");
FileSystem fs = FileSystem.get(p.toUri(), hadoopConf);
fs.getFileStatus(p);
In the above snippet, shall I pass hadoopConf as a map of fs.* params?
Yoni Augarten
04/03/2023, 11:52 AM
Vaibhav Kumar
04/07/2023, 2:39 PM
I tried with s3a://. Please confirm if I have used it the correct way.
One observation: FileSystem.get is working with the URI option, not directly with the path variable.
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.s3a.secret.key", "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")
conf.set("fs.s3a.endpoint", "http://localhost:8000")
conf.set("fs.s3a.path.style.access", "true")
val uri = "s3a://test-repo/main/sample.json"
val path = new Path("s3a://test-repo/main/sample1.json")
URI.create(uri)
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Elad Lachmi
04/07/2023, 2:53 PM
You need to use the lakefs:// uri, as you mentioned.
Vaibhav Kumar
04/07/2023, 2:57 PM
Elad Lachmi
04/07/2023, 3:01 PM
Vaibhav Kumar
04/07/2023, 3:05 PM
I am calling fs.getFileStatus(path) from Hadoop's FileSystem above, but according to the issue stack trace shouldn't the function call be from LakeFSFileSystem?
Elad Lachmi
04/07/2023, 3:10 PM
The lakefs:// uri tells the FileSystem instance to use lakeFSFS - that's why you need the lakefs:// uri.
Vaibhav Kumar
04/07/2023, 5:48 PM
With lakefs:// it doesn't look like it is referring to the HadoopFS client from lakeFS. The trace below still shows Hadoop's FileSystem error.
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "lakefs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:27)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Elad Lachmi
04/07/2023, 6:04 PM
Did you set fs.lakefs.impl? That's what registers a handler for the lakefs:// URIs.
Vaibhav Kumar
04/08/2023, 9:15 AM
I tried with S3 or lakefs under uri and path. Below is my code:
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "0yfZnzCeJdB9Y2i1")
conf.set("fs.s3a.secret.key", "uhYMtk6s97qLKs8jnJhrIMLfBs3uGkv6")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1")
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9090")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://s3bucket/sample1.json"
val path = new Path("lakefs://s3bucket/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Error
805 [main] WARN org.apache.hadoop.fs.FileSystem - Failed to initialize fileystem lakefs://test-repo/main/sample1.json: java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
Exception in thread "main" java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
at io.lakefs.storage.PhysicalAddressTranslator.translate(PhysicalAddressTranslator.java:29)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:153)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:29)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Ariel Shaqed (Scolnicov)
04/08/2023, 9:40 AM
Vaibhav Kumar
04/08/2023, 12:59 PM
Elad Lachmi
04/08/2023, 1:13 PM
Vaibhav Kumar
04/08/2023, 1:14 PM
Elad Lachmi
04/08/2023, 1:16 PM
Ariel Shaqed (Scolnicov)
04/08/2023, 2:20 PM
Vaibhav Kumar
04/08/2023, 4:36 PM
Ariel Shaqed (Scolnicov)
04/08/2023, 5:57 PM
Vaibhav Kumar
04/09/2023, 6:01 AM
object SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Error
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
682 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Stopping s3a-file-system metrics system...
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system stopped.
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system shutdown complete.
1201 [Thread-2] WARN org.apache.hadoop.util.ShutdownHookManager - ShutdownHook 'ClientFinalizer' failed, java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
Caused by: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at org.apache.hadoop.fs.s3a.S3AFileSystem.close(S3AFileSystem.java:3963)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:3678)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:3695)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1623)
Ariel Shaqed (Scolnicov)
04/09/2023, 7:00 AM
Glad to hear things are starting to improve!
These lines:
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
make me suspect that the lakefs-spark-client assembly version might not match the Spark version that you are using, or possibly the Hadoop version.
• Which Spark version are you using? Is this what you get from the Everything Bagel docker-compose, or something else?
• Which lakeFS client are you using? That is, I need either the full Maven coordinates ( it will look something like "io.lakefs
lakefs spark client 301 2.12
0.6.5", and we need to look at the entire name) or the name of the jar (it will look something like ".../lakefs-spark-client-312-hadoop3-assembly-0.6.5.jar", and we need to look at the entire name).
THANKS!
Vaibhav Kumar
04/09/2023, 7:03 AM
@Ariel Shaqed (Scolnicov) is it possible for us to connect sometime for this issue and resolve things in one go? I feel that would also cut down on the iterations. What do you suggest? 🙂
✔️ 1
I am using the below pom.xml, which shows my Spark, Hadoop, and lakeFS versions.
Within the Bagel compose file I removed everything else and just kept MinIO, the MinIO setup, and lakeFS:
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>" xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs</artifactId>
<version>0.1.0</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.3.5</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.3.5</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 8:35 AM
There are some mismatched versions in there 😞
• All Spark versions should be the same, and compatible with what you use for the Jar. I would consider using spark-3.1.2 for everything.
• The FS version here seems very odd! Our guide recommends using version 0.1.13, so something like:
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
Vaibhav Kumar
04/09/2023, 12:45 PM
I changed to the versions you suggested. Now my program goes into a hung state:
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
489 [main] INFO org.apache.commons.beanutils.FluentPropertyBeanIntrospector - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
498 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
Revised POM:
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>" xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.2</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.1.2</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 1:05 PM
The fact that it doesn't do anything is worrying 😞
Spark is running, no worrying messages 🙂
I am really not sure where to go from here -- I would expect to see some outputs from spark-submit, somewhere. Does Docker see any interesting log lines from any of the Spark containers?
Vaibhav Kumar
04/09/2023, 3:36 PM
I know it is a bit frustrating - I have spent more than a week setting it up but am still not able to replicate it 😑
I am using IntelliJ and running the app directly. Do I need to use spark-submit?
Ariel Shaqed (Scolnicov)
04/09/2023, 3:46 PM
I'll find someone who can advise with running in IntelliJ. I literally never run IntelliJ (or Spark in the debugger).
Vaibhav Kumar
04/09/2023, 3:48 PM
I have a Spark container running as well. If you help me with how I can run the JAR in that container, maybe I can give it another try. Spark 3.0 is running there, not 3.1.2.
Some good news: I tried spark-submit and got a different error. I hope the below trace helps, @Ariel Shaqed (Scolnicov). I feel conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem") is doing something fishy - it is not letting io.lakefs do things.
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at <http://org.apache.spark.deploy.SparkSubmit.org|org.apache.spark.deploy.SparkSubmit.org>$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>"
xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<groupId>com.sparkbyexamples</groupId>
<modelVersion>4.0.0</modelVersion>
<artifactId>spark-scala-examples</artifactId>
<version>1.0-SNAPSHOT</version>
<inceptionYear>2008</inceptionYear>
<packaging>jar</packaging>
<properties>
<scala.version>2.12.12</scala.version>
<spark.version>3.0.0</spark.version>
</properties>
<repositories>
<repository>
<id><http://scala-tools.org|scala-tools.org></id>
<name>Scala-Tools Maven2 Repository</name>
<url><http://scala-tools.org/repo-releases></url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id><http://scala-tools.org|scala-tools.org></id>
<name>Scala-Tools Maven2 Repository</name>
<url><http://scala-tools.org/repo-releases></url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.11</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.11</artifactId>
<version>0.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.0.0</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<resources><resource><directory>src/main/resources</directory></resource></resources>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
Yoni Augarten
04/10/2023, 9:16 AM
Hey @Vaibhav Kumar, how are you running your code? I was able to use your pom file to successfully run the attached code.
I used the following command:
mvn package exec:java -Dexec.mainClass=Main
(Assuming your code is in a main method in an object called Main)
Vaibhav Kumar
04/10/2023, 9:31 AM
@Yoni Augarten I am using:
spark-submit --class com.sparkbyexamples.spark.SparkSessionTest target/spark-scala-examples-1.0-SNAPSHOT.jar
Below is my whole code:
package com.sparkbyexamples.spark
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
import org.apache.spark.sql.SparkSession
import io.lakefs
object SparkSessionTest {
def main(args:Array[String]): Unit ={
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
}
Yoni Augarten
04/10/2023, 9:40 AM
I don't think your Jar contains any of your dependencies, that's why LakeFSFileSystem cannot be found. Make sure you are running your program with a classpath that includes all of the Maven dependencies. This can be done either using Maven like I did above, or by creating an "uber-jar", or by using an IDE to manage dependencies, or by some other way.
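For reference, one way to produce such an uber-jar is the Maven Shade plugin - a sketch to add under <build><plugins> in the pom; the plugin version here is an assumption:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <!-- bundle all compile-scope dependencies into the jar at package time -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

With this in place, mvn package produces a jar that includes hadoop-lakefs-assembly and the other dependencies, which can then be handed to spark-submit directly.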
Ariel Shaqed (Scolnicov)
04/10/2023, 9:44 AM
Thanks, Yoni, that really sounds like a Spark issue!
Vaibhav, are you using a downloaded jar or did you build it yourself? You'll want to run "mvn package" to assemble a jar that brings in all its dependencies.
Vaibhav Kumar
04/10/2023, 9:47 AM
I can see the jar under External Libraries. PFB the screenshots.
Yoni Augarten
04/10/2023, 9:50 AM
Then if you run it from your IDE it should work. When you simply spark-submit a jar, these external dependencies are not included.
Vaibhav Kumar
04/10/2023, 9:54 AM
But when we package a project into a JAR, all code and dependencies come to a single place - the JAR - right? Another point: if the lib was not found, my import statements would have failed, but those are passing.
Let me know if my understanding is correct.
Yoni Augarten
04/10/2023, 9:55 AM
In what way are they "passing"?
Import statements in Java are not executed in the same way code is
Regarding the JAR, it depends how you created it. Since your pom doesn't include any plugin that knows to create an uber-jar, your jar probably only includes your compiled classes, and not other dependencies.
Vaibhav Kumar
04/10/2023, 10:03 AM
@Yoni Augarten I think you are right - when I run from the IDE directly, it now throws a different error:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
... 16 more
aws-sdk is missing, I think.
Yoni Augarten
04/10/2023, 10:05 AM
@Vaibhav Kumar - thanks for updating. Developing Spark applications does have its learning curve when trying to work with external libraries. Some are easier to import using pom files, while with others there is no other choice but downloading the Jar from the internet. Unfortunately there is no one-size-fits-all approach as there are just too many moving parts. It's a matter of trial and error in the end.
Vaibhav Kumar
04/10/2023, 10:06 AM
No problem - at the end we are all learning and growing 🙂
😊 1
I reran after importing the below module and now my program is in a hung state:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.0.0</version>
</dependency>
While copying I get the below error most of the time:
lakeFS % ./lakectl fs upload -s /Users/simar/Downloads/sample1.json -d lakefs://example/main/
request failed: parameter "path" in query has an error: value is required but missing: value is required but missing
Ariel Shaqed (Scolnicov)
04/10/2023, 10:24 AM
Can you try adding a path at the end of the destination URL?
Vaibhav Kumar
04/10/2023, 10:27 AM
Path as in? lakefs://example/main/ is the path where I want to add the file.
Ariel Shaqed (Scolnicov)
04/10/2023, 10:38 AM
lakefs://example/main/ is a root, can you try lakefs://example/main/sample1.json?
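That is, presumably:
./lakectl fs upload -s /Users/simar/Downloads/sample1.json -d lakefs://example/main/sample1.json
(the destination must include the full object path, not just the repository and branch).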
Vaibhav Kumar
04/10/2023, 11:18 AM
@Ariel Shaqed (Scolnicov) @Yoni Augarten I saw a different error this time. This time there is some connection issue with MinIO. Please see the log below:
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on example: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused): Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:332)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:275)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
MinIO is already running at http://127.0.0.1:9001/ in my browser.
Ariel Shaqed (Scolnicov)
04/10/2023, 11:31 AM
I think you have a configuration issue with your S3 endpoint somewhere: it says
Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused):
and it looks to be using another port.
Vaibhav Kumar
04/10/2023, 11:55 AM
@Ariel Shaqed (Scolnicov) it has worked - I changed the port to the API one. :jumping-lakefs:
But how do I print anything out of it? I mean, when the file was present it worked without any error.
But when the file is not present, it didn't give me exactly the error that you showed in the issue stack trace:
Exception in thread "main" java.io.FileNotFoundException: lakefs://example/main/sample2.json not found
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:784)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:35)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
👀 1
Ariel Shaqed (Scolnicov)
04/10/2023, 12:11 PM
Yay, now we're cooking!
Yeah, that's the good flow. Now try a repo that doesn't exist, or a branch that doesn't exist.
Vaibhav Kumar
04/10/2023, 12:17 PM
Yup, it worked fully.
Now, moving on to solving the problem: I need to gather the response from the lakeFS server that is received in LakeFSFileSystem, right?
Ariel Shaqed (Scolnicov)
04/10/2023, 12:30 PM
Yeah, swagger.yml can give all possible return status codes and messages. This issue is to translate these exceptions into e.g. FileNotFoundException, which is more informative to the user.
Vaibhav Kumar
04/10/2023, 1:00 PM
https://github.com/treeverse/lakeFS/blob/master/api/swagger.yml
This one, right?
Ariel Shaqed (Scolnicov)
04/10/2023, 1:14 PM
Yup! lakeFSFS performs a bunch of calls, each can fail. I would start with the open and create methods, they hold many of the most common failures.
After that, list, getFileStatus, and listStatus might also have similar failure modes.
Vaibhav Kumar
05/10/2023, 5:28 PM
@Ariel Shaqed (Scolnicov) I went through the swagger doc and the responses we are getting using Postman.
I think I have figured out where we need to add the code. It should be somewhere in readNextChunk() in LakeFSFileSystem.java.
I am not getting how I shall get the response code in LakeFSFileSystem.java.
Elad Lachmi
05/10/2023, 5:33 PM
Hi @Vaibhav Kumar,
To make sure I understand what you're trying to do... the idea is to handle errors better instead of simply throwing them?
Vaibhav Kumar
05/10/2023, 5:37 PM
I want to get the response directly from the lakeFS server for clear messages. If you look at the issue which I am working on, the error thrown is not very clear, while if you hit the list-objects API we get a proper error like "repository not found". I already have the setup ready to reproduce the same error, but I am not sure how to get those response codes. @Elad Lachmi
Elad Lachmi
05/10/2023, 5:44 PM
It depends on how the client is implemented.
If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block.
If it throws only for e.g. 5xx, then it might require checking the response status on the resp.
"If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block." 👆🏻
Looking through some of the existing code, it looks like this is the case.
Vaibhav Kumar
05/10/2023, 5:49 PM
Yes, you are correct, but I am not getting any suggestions around the response code in the below screenshot.
Elad Lachmi
05/10/2023, 5:51 PM
You need to look at e. I believe resp is out of scope when you're in the catch block.
Vaibhav Kumar
05/10/2023, 5:53 PM
Same with e as well.
Elad Lachmi
05/10/2023, 5:56 PM
ApiException has a getCode() method and a few more you might want to use for this purpose.
👍🏼 1
Vaibhav Kumar
05/10/2023, 6:06 PM
OK, can you please share some document for my reference? I am not getting how I will use it.
Elad Lachmi
05/10/2023, 6:08 PM
You can find the ApiException class here:
clients/java/src/main/java/io/lakefs/clients/api/ApiException.java
You can look at it and see which methods are available.
Vaibhav Kumar
05/10/2023, 6:12 PM
Yes, I can see now that getCode() is there. So now all I have to do is e.getCode() in the catch block, right?
Elad Lachmi
05/10/2023, 6:24 PM
Yes, and you have the HttpStatus enum to compare to for different HTTP statuses.
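A minimal sketch of the pattern being discussed, assuming the OpenAPI-generated client where ApiException.getCode() returns the HTTP status, and org.apache.http.HttpStatus for the status constants. fetchFromLakeFS is a hypothetical stand-in for whichever generated-client call (statObject, listObjects, ...) is being wrapped - this is not the actual lakeFSFS code:

import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.http.HttpStatus;
import io.lakefs.clients.api.ApiException;

class ErrorTranslationSketch {
    // Hypothetical: stands in for a generated-client call that can fail with ApiException.
    String fetchFromLakeFS(String path) throws ApiException {
        throw new ApiException(HttpStatus.SC_NOT_FOUND, "repository not found"); // placeholder
    }

    String stat(String path) throws IOException {
        try {
            return fetchFromLakeFS(path);
        } catch (ApiException e) {
            if (e.getCode() == HttpStatus.SC_NOT_FOUND) {
                // Translate the raw API error into the clearer filesystem-level exception.
                throw new FileNotFoundException(path + " not found: " + e.getResponseBody());
            }
            throw new IOException("error calling lakeFS API", e);
        }
    }
}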
Vaibhav Kumar
05/10/2023, 6:29 PM
OK sure.
So after making the changes in this code, I can just run the lakeFS Go server and test my changes, right?
To replicate the error, till now I was using the Hadoop client (pasted below) from my IntelliJ. This talks to my lakeFS docker and MinIO as of now:
def main(args : Array[String]) {
// val spark = SparkSession.builder.master("local[1]").appName("SparkByExample").getOrCreate
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://test1/main/sample1.json"
val path = new Path("lakefs://test1/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
Elad Lachmi
05/10/2023, 6:30 PM
Sounds like a good plan
Good luck!
Vaibhav Kumar
05/10/2023, 6:31 PM
Thank you @Elad Lachmi. I appreciate this help from you.
Will let you know my findings.
Elad Lachmi
05/10/2023, 6:32 PM
Sure, happy to help 🙂
Vaibhav Kumar
05/13/2023, 6:38 PM
I have made the changes to my LakeFSFileSystem.java and run lakeFS from the Go command:
lakefs % go run main.go run --local-settings
I have created a Hadoop client on my local machine and am trying to test the changes (response code in exception) to my lakeFS code.
I cannot see the changes I have made reflected in the logs when I run my client, shown below.
Do you know what could be the issue here?
package org.example
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
object SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://test1/main/sample1.json"
val path = new Path("lakefs://test1/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
}
Isan Rivkin
05/14/2023, 8:28 AM
Hi, I would first make sure that I'm setting the correct log level; second, I would make sure that the code changes you made are reflected in the code that is running.
Vaibhav Kumar
05/14/2023, 8:37 AM
@Isan Rivkin
1. I am using LOG.info in LakeFSFileSystem.java. Let me know if that is incorrect.
2. I have the server running from this changed code, and the client running separately. This I have checked.
I have observed something unusual. Earlier, when I was running my lakeFS and MinIO in a docker setup (Bagel), as soon as I uploaded a file to a lakeFS repo it got reflected in MinIO as well.
Now that I am running lakeFS from the terminal using the go command, with MinIO still in docker, I cannot see the same behaviour - the files are not syncing. Now I am getting an error like:
lakeFS blockstore type local unsupported by this FileSystem
Isan Rivkin
05/14/2023, 9:04 AM
I can't tell if the INFO log level is the right one; it all depends on the logs you are trying to see. If you're logging at DEBUG then you should probably use that.
Regarding your error: looking at the PhysicalAddressTranslator.java class, the Hadoop client supports s3 and does not support local storage.
Vaibhav Kumar
05/14/2023, 10:16 AM
Yes, but I am using MinIO storage and I have set the properties accordingly in the client.
I have tried with LOG.debug as well, but it is not working.
Can someone from lakeFS connect with me for some time on this? It is not a really complex PR, but setting things up for testing took me a lot of time.
Isan Rivkin
05/14/2023, 10:54 AM
It would be more beneficial for the community to be able to search different questions.
I'm not sure what issue you are currently facing, outside the PR you are trying to do. What's the issue you are asking about? This thread contains a lot of conversations. Is your current problem that you run lakeFS and expect to see the logs that you added, but don't actually see them?
Vaibhav Kumar
05/14/2023, 11:37 AM
I am working on this issue. Upon further investigation I learned that I have to work on catching some server responses in this file.
So now I am trying to add some debug statements in the same file, under the getFileStatus function. I am trying to hit lakeFS from outside by creating the Hadoop client, and then checking whether I get those logs or not.
Isan Rivkin
05/14/2023, 12:01 PM
Thanks for the context. To solve the logs issue, I would suggest adding a simple log even at the start of the Java code you are running and seeing whether it logs or not; if not, debug it from there.
Vaibhav Kumar
05/14/2023, 12:28 PM
I have tried almost everything around logs, but it is not working.
Isan Rivkin
05/14/2023, 5:24 PM
If you see other logs, then I guess it's either that the Java code you are running is not the code you think is running (and therefore your logs are not appearing), or something overrides the log level. Try making a change that's not related to logs and see if the behavior changes at all.
Vaibhav Kumar
05/14/2023, 6:18 PM
@Isan Rivkin I think I got the issue. I am running this compose file, which obviously pulls the latest lakeFS image and not my changed code.
When I added some logs in the main.go file and ran it locally, I could see the logs from main.go in my terminal.
Now, to narrow down the problem: how shall I set the blockstore to MinIO when I run using the go run command?
In other words, what are the arguments required to set the below properties with go run main.go run --local-settings?
## Commands from docker compose file to set blockstore to MINIO
- LAKEFS_DATABASE_TYPE=local
- LAKEFS_BLOCKSTORE_TYPE=s3
- LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
- LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
- LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
- LAKEFS_STATS_ENABLED
- LAKEFS_LOGGING_LEVEL
- LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
Isan Rivkin
05/14/2023, 6:20 PM
Glad the logs issue is solved.
Vaibhav Kumar
05/14/2023, 6:26 PM
Yes, but I still have the other part left to be solved 😄
Ariel Shaqed (Scolnicov)
05/14/2023, 6:27 PM
I believe you can change your compose file to run treeverse/lakefs:dev. Now if you make build-docker and rm the lakeFS container and start a new one, you should be able to see your code.
Alternatively, you might be able to docker cp a lakefs executable into the container at /app/lakefs. Restart that container and your code should run. (This one can be trickier if your container runs a sufficiently different version of Linux than your physical machine.)
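A sketch of both options; the container and compose service names here are assumptions (check yours with docker ps):

# Option 1: rebuild the dev image and recreate the container
make build-docker
docker rm -f lakefs
docker compose up -d lakefs

# Option 2: copy a locally built lakefs binary over the one in the container
docker cp ./lakefs lakefs:/app/lakefs
docker restart lakefs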
Vaibhav Kumar
05/14/2023, 6:47 PM
As of now I found this config doc, so I was creating a .lakefs.yaml file and adding configs this way:
database.type="local"
Let me know if I am going in the right direction.
Sorry, but I didn't exactly follow you on the docker-related stuff you suggested. I think you are saying to build the local image using pom.xml? Can you share some doc for me to take a look at as well?
I have set the config the below way in .lakefs.yaml, but I am getting some syntax issues 🫣
database.type="local"
blockstore.type="s3"
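For what it's worth, lakeFS config is nested YAML rather than dotted keys. Mirroring the compose file's environment variables above, a .lakefs.yaml for this setup would look roughly like the following sketch (the MinIO endpoint host/port is an assumption for MinIO running outside compose):

database:
  type: local
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://localhost:9000
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin
auth:
  encrypt:
    secret_key: some random secret string

Note that database.type local here refers to lakeFS's metadata store; the blockstore type must be s3 for the Hadoop client to work, per the earlier "lakeFS blockstore type local unsupported" error.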