https://forum.lakefs.io/t/10064740/hi-team-i-want-to-to-use-lakefs-in-the-debugger-mode-can-you |
Vaibhav Kumar
03/20/2023, 2:13 PM
Hi team, I want to use lakeFS in the debugger mode, can you help?
Lynn Rozen
03/20/2023, 2:31 PM
Vaibhav Kumar
03/20/2023, 2:32 PM
Lynn Rozen
03/20/2023, 2:32 PM
Vaibhav Kumar
03/20/2023, 2:34 PM
Lynn Rozen
03/20/2023, 3:17 PM
Vaibhav Kumar
04/01/2023, 11:40 AM
For spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"), for the control to come to the HadoopFS client, lakeFS should be running somewhere, right? And within that I will have to put the debugger.
Does that make sense?
Yoni Augarten
04/01/2023, 11:45 AM
Vaibhav Kumar
04/01/2023, 11:51 AM
1. spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data") -> I know I can put a debugger on the Spark app side.
2. HadoopFS -> the server side I am not sure how to use in debugger mode. And how will things land here?
Yoni Augarten
04/01/2023, 11:55 AM
make all
4. Import the project to your IDE.
5. Run cmd/lakefs/cmd/main.go with the run --local-settings arguments. (You will later change these to connect your installation to the storage.)
6. Come back here and let me know how it works 🙂
Vaibhav Kumar
04/01/2023, 12:00 PM
I ran make all and got the below error:
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading golang.org/x/sys v0.0.0-20210510120138-977fb7262007
Makefile:103: *** "Missing dependency - no docker in PATH". Stop.
Yoni Augarten
04/01/2023, 12:06 PM
You can run make gen instead - but you may still need docker.
Vaibhav Kumar
04/01/2023, 12:57 PM
I ran go run main.go run --local-settings. Since main.go is a Go file and our HadoopFS client is in Java, how do I link these things together? Eventually I have to put breakpoints in the HadoopFS client code.
Yoni Augarten
04/01/2023, 1:17 PM
1. Open clients/hadoopfs as a separate IDE project.
2. Write a main method to call the code that you want to test. You will set the fs.lakefs.* configuration to point to your local instance of lakeFS.
This is simpler than running a Spark program and debugging it - although that's also possible. Let me know if that makes sense.
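A minimal sketch of such a main method, assuming a local lakeFS at http://localhost:8000 and the fs.lakefs.* keys that appear later in this thread (the credentials and repo path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DebugMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route lakefs:// paths to the lakeFS Hadoop filesystem and point it at the local server.
        conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem");
        conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1");
        conf.set("fs.lakefs.access.key", "<lakefs-access-key>");
        conf.set("fs.lakefs.secret.key", "<lakefs-secret-key>");

        Path p = new Path("lakefs://my-repo/main/1.txt");
        FileSystem fs = FileSystem.get(p.toUri(), conf); // resolves to LakeFSFileSystem
        FileStatus status = fs.getFileStatus(p);         // breakpoints go inside getFileStatus
        System.out.println(status.getPath() + " len=" + status.getLen());
    }
}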
Vaibhav Kumar
04/01/2023, 1:31 PM
getFileStatus (Line 749) is the first function the call goes to. This function expects path as a param, so I hope this is the same Spark path which we pass from spark.read.parquet("lakefs://repo-that-doesnt-exist/main/path/to/data"). Kindly confirm if my observation is correct.
Trace from issue 2801:
java.io.IOException: listObjects
at io.lakefs.LakeFSFileSystem$ListingIterator.readNextChunk(LakeFSFileSystem.java:901)
at io.lakefs.LakeFSFileSystem$ListingIterator.hasNext(LakeFSFileSystem.java:881)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:707)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:40)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
Yoni Augarten
04/01/2023, 1:38 PM
Correct - you use lakefs:// paths when calling getFileStatus on the LakeFSFileSystem.
Vaibhav Kumar
04/01/2023, 1:59 PM
Yoni Augarten
04/01/2023, 2:04 PM
Vaibhav Kumar
04/01/2023, 7:05 PM
Caused by: java.lang.IllegalArgumentException: Unsupported class file major version 63
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:196)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:177)
at net.bytebuddy.jar.asm.ClassReader.<init>(ClassReader.java:163)
at net.bytebuddy.utility.OpenedClassReader.of(OpenedClassReader.java:86)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default$ForInlining.create(TypeWriter.java:3889)
at net.bytebuddy.dynamic.scaffold.TypeWriter$Default.make(TypeWriter.java:2166)
at net.bytebuddy.dynamic.scaffold.inline.RedefinitionDynamicTypeBuilder.make(RedefinitionDynamicTypeBuilder.java:224)
at net.bytebuddy.dynamic.scaffold.inline.AbstractInliningDynamicTypeBuilder.make(AbstractInliningDynamicTypeBuilder.java:123)
at net.bytebuddy.dynamic.DynamicType$Builder$AbstractBase.make(DynamicType.java:3659)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.transform(InlineBytecodeGenerator.java:391)
at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:169)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.triggerRetransformation(InlineBytecodeGenerator.java:276)
... 46 more
Running io.lakefs.FSConfigurationTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Results :
Tests in error:
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemSimpleModeTest): (..)
testGetFileStatus_ExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryInSecondList(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsNoPrefix(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch120(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch123(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryContents(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcEqualsDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
getUri(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileNotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToExistingNonEmptyDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingDirName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusDirectory(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlashTwoLists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_nonExistingSrcFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_NotExistsPrefixWithNoSlash(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusNotFound(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_srcAndDstOnDifferentBranch(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_NotExistsRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_FileExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_EmptyDirectoryExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsObject(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch1(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch2(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch3(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDeleteDirectoryRecursiveBatch5(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_NoFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreateExistingFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingDirToNonExistingDirWithoutParent(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testListStatusFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testAppend(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testCreate(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_fallbackStageAPI(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testOpen_NotExists(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testMkdirs(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGetFileStatus_DirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testGlobStatus_SingleFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToExistingFileName(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFile(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testDelete_DirectoryWithFileRecursive(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testExists_ExistsAsDirectoryMarker(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
testRename_existingFileToNonExistingDst(io.lakefs.LakeFSFileSystemPresignedModeTest): (..)
Tests run: 105, Failures: 0, Errors: 90, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:05 min
[INFO] Finished at: 2023-04-02T00:19:55+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project hadoop-lakefs: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/simar/lakeFS/clients/hadoopfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Yoni Augarten
04/01/2023, 7:06 PM
Vaibhav Kumar
04/01/2023, 7:10 PM
[INFO] Building jar: /Users/simar/lakeFS/clients/hadoopfs/target/hadoop-lakefs-0.1.0.jar
[INFO]
[INFO] --- gpg:1.5:sign (sign-artifacts) @ hadoop-lakefs ---
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.pom (3.1 kB at 51 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar (239 kB at 4.1 MB/s)
/bin/sh: gpg: command not found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:15 min
[INFO] Finished at: 2023-04-02T20:55:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-gpg-plugin:1.5:sign (sign-artifacts) on project hadoop-lakefs: Exit code: 127 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Yoni Augarten
04/02/2023, 3:55 PM
You can skip signing by adding -P'!treeverse-signing' to the Maven command.
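For example, assuming the goal being run is package, the command would presumably look like:
mvn package -P'!treeverse-signing'
(the profile name is quoted so the shell does not expand the !).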
Vaibhav Kumar
04/02/2023, 4:39 PM
LakeFSFileStatus - I am not sure how to use it.
Yoni Augarten
04/02/2023, 4:42 PM
Vaibhav Kumar
04/02/2023, 4:44 PM
getFileStatus - and I will pass the lakeFS file path here to read it.
Yoni Augarten
04/02/2023, 4:46 PM
Path p = new Path("lakefs://my-repo/main/1.txt");
FileSystem fs = FileSystem.get(p.toUri(), hadoopConf);
fs.getFileStatus(p);
Vaibhav Kumar
04/02/2023, 4:54 PM
go run main.go run --local-settings
Yoni Augarten
04/02/2023, 4:58 PM
Vaibhav Kumar
04/02/2023, 6:14 PM
I used spark.sparkContext.hadoopConfiguration.set to set up the lakeFS client, but it seems the syntax is not working. In the doc it was mentioned to use it the below way:
def main(args : Array[String]) {
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
val path = new Path("s3a://test-repo/main/sample.json")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "oOal7VcsJQQnGoPcM9AEYXCe1Q76PHMpX4R1+Ai+")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "http:localhost:8000")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
val y = new LakeFSFileSystem()
y.getFileStatus(path)
Error
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "io.lakefs.LakeFSClient.getObjectsApi()" because "this.lfsClient" is null
Yoni Augarten
04/03/2023, 9:04 AM
You should get the filesystem instance using FileSystem.get like I mentioned above.
Vaibhav Kumar
04/03/2023, 10:01 AM
Yoni Augarten
04/03/2023, 10:02 AM
The fs.s3a configuration will not use LakeFSFileSystem, so it's not relevant for solving this issue.
Vaibhav Kumar
04/03/2023, 11:50 AM
Path p = new Path("lakefs://my-repo/main/1.txt");
FileSystem fs = FileSystem.get(p.toUri(), hadoopConf);
fs.getFileStatus(p);
In the above snippet, shall I pass hadoopConf as a map of fs.* params?
Yoni Augarten
04/03/2023, 11:52 AM
Vaibhav Kumar
04/07/2023, 2:39 PM
I tried with s3a://. Please confirm if I have used it the correct way.
One observation: FileSystem.get is working with the URI option, not directly with the path variable.
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.s3a.secret.key", "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")
conf.set("fs.s3a.endpoint", "http://localhost:8000")
conf.set("fs.s3a.path.style.access", "true")
val uri = "s3a://test-repo/main/sample.json"
val path = new Path("s3a://test-repo/main/sample1.json")
URI.create(uri)
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Elad Lachmi
04/07/2023, 2:53 PM
You need to use the lakefs:// uri, as you mentioned.
Vaibhav Kumar
04/07/2023, 2:57 PM
Elad Lachmi
04/07/2023, 3:01 PM
Vaibhav Kumar
04/07/2023, 3:05 PM
I am calling fs.getFileStatus(path) from Hadoop's FileSystem above, but according to the issue stack trace shouldn't the function call be from LakeFSFileSystem?
Elad Lachmi
04/07/2023, 3:10 PM
The lakefs:// uri tells the FileSystem instance to use lakeFSFS - that's why you need the lakefs:// uri.
Vaibhav Kumar
04/07/2023, 5:48 PM
With lakefs:// it doesn't look like it is referring to the HadoopFS client from lakeFS. The trace below still shows Hadoop's FileSystem error.
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "lakefs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:27)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Elad Lachmi
04/07/2023, 6:04 PM
Did you set fs.lakefs.impl? That's what registers a handler for the lakefs:// URIs.
Vaibhav Kumar
04/08/2023, 9:15 AM
I tried with S3 or lakefs under uri and path. Below is my code:
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "0yfZnzCeJdB9Y2i1")
conf.set("fs.s3a.secret.key", "uhYMtk6s97qLKs8jnJhrIMLfBs3uGkv6")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "http://localhost:8000/api/v1")
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9090")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://s3bucket/sample1.json"
val path = new Path("lakefs://s3bucket/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Error
805 [main] WARN org.apache.hadoop.fs.FileSystem - Failed to initialize fileystem lakefs://test-repo/main/sample1.json: java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
Exception in thread "main" java.lang.RuntimeException: lakeFS blockstore type local unsupported by this FileSystem
at io.lakefs.storage.PhysicalAddressTranslator.translate(PhysicalAddressTranslator.java:29)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:153)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.example.SparkSessionTest$.main(SparkSessionTest.scala:29)
at org.example.SparkSessionTest.main(SparkSessionTest.scala)
Ariel Shaqed (Scolnicov)
04/08/2023, 9:40 AM
Vaibhav Kumar
04/08/2023, 12:59 PM
Elad Lachmi
04/08/2023, 1:13 PM
Vaibhav Kumar
04/08/2023, 1:14 PM
Elad Lachmi
04/08/2023, 1:16 PM
Ariel Shaqed (Scolnicov)
04/08/2023, 2:20 PM
Vaibhav Kumar
04/08/2023, 4:36 PM
Ariel Shaqed (Scolnicov)
04/08/2023, 5:57 PM
Vaibhav Kumar
04/09/2023, 6:01 AM
object SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
Error
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
682 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
754 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Stopping s3a-file-system metrics system...
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system stopped.
1200 [shutdown-hook-0] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system shutdown complete.
1201 [Thread-2] WARN org.apache.hadoop.util.ShutdownHookManager - ShutdownHook 'ClientFinalizer' failed, java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
Caused by: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
at org.apache.hadoop.fs.s3a.S3AFileSystem.close(S3AFileSystem.java:3963)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:3678)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:3695)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1623)
Ariel Shaqed (Scolnicov)
04/09/2023, 7:00 AM
Glad to hear things are starting to improve!
These lines:
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'void org.apache.hadoop.fs.statistics.IOStatisticsLogging.logIOStatisticsAtLevel(org.slf4j.Logger, java.lang.String, java.lang.Object)'
make me suspect that the lakefs-spark-client assembly version might not match the Spark version that you are using, or possibly the Hadoop version.
• Which Spark version are you using? Is this what you get from the Everything Bagel docker-compose, or something else?
• Which lakeFS client are you using? That is, I need either the full Maven coordinates ( it will look something like "io.lakefs
lakefs spark client 301 2.12
0.6.5", and we need to look at the entire name) or the name of the jar (it will look something like ".../lakefs-spark-client-312-hadoop3-assembly-0.6.5.jar", and we need to look at the entire name).
THANKS!
Vaibhav Kumar
04/09/2023, 7:03 AM
@Ariel Shaqed (Scolnicov) is it possible for us to connect sometime for this issue and resolve things in one go? I feel that would also cut down on the iterations. What do you suggest? 🙂
✔️ 1
I am using the below pom.xml, which shows my Spark, Hadoop, and lakeFS versions.
Within the Bagel compose file I removed everything else and just kept MinIO, the MinIO setup, and lakeFS:
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>" xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.2.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs</artifactId>
<version>0.1.0</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.3.5</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.3.5</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 8:35 AM
There are some mismatched versions in there 😞
• All Spark versions should be the same, and compatible with what you use for the Jar. I would consider using spark-3.1.2 for everything.
• The FS version here seems very odd! Our guide recommends using version 0.1.13, so something like:
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
Vaibhav Kumar
04/09/2023, 12:45 PM
I changed to the versions you suggested. Now my program goes into a hung state:
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
489 [main] INFO org.apache.commons.beanutils.FluentPropertyBeanIntrospector - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
498 [main] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Scheduled Metric snapshot period at 10 second(s).
588 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system started
Revised POM:
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>" xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>test-scala</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderfull scala app</description>
<inceptionYear>2010</inceptionYear>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.5</maven.compiler.source>
<maven.compiler.target>1.5</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.13.0</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.2</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.1.2</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
</plugins>
</build>
</project>
Ariel Shaqed (Scolnicov)
04/09/2023, 1:05 PM
The fact that it doesn't do anything is worrying 😞
Spark is running, no worrying messages 🙂
I am really not sure where to go from here -- I would expect to see some outputs from spark-submit, somewhere. Does Docker see any interesting log lines from any of the Spark containers?
Vaibhav Kumar
04/09/2023, 3:36 PM
I know it is a bit frustrating - I have spent more than a week setting it up but am still not able to replicate it 😑
I am using IntelliJ and running the app directly. Do I need to use spark-submit?
Ariel Shaqed (Scolnicov)
04/09/2023, 3:46 PM
I'll find someone who can advise with running in IntelliJ. I literally never run IntelliJ (or Spark in the debugger).
Vaibhav Kumar
04/09/2023, 3:48 PM
I have a Spark container running as well. If you help me with how I can run the JAR in that container, maybe I can give it another try. Spark 3.0 is running there, not 3.1.2.
Some good news: I tried spark-submit and got a different error. I hope the below trace helps, @Ariel Shaqed (Scolnicov). I feel conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem") is doing something fishy - it is not letting io.lakefs do things.
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at <http://org.apache.spark.deploy.SparkSubmit.org|org.apache.spark.deploy.SparkSubmit.org>$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
<project xmlns="<http://maven.apache.org/POM/4.0.0>" xmlns:xsi="<http://www.w3.org/2001/XMLSchema-instance>"
xsi:schemaLocation="<http://maven.apache.org/POM/4.0.0> <http://maven.apache.org/maven-v4_0_0.xsd>">
<groupId>com.sparkbyexamples</groupId>
<modelVersion>4.0.0</modelVersion>
<artifactId>spark-scala-examples</artifactId>
<version>1.0-SNAPSHOT</version>
<inceptionYear>2008</inceptionYear>
<packaging>jar</packaging>
<properties>
<scala.version>2.12.12</scala.version>
<spark.version>3.0.0</spark.version>
</properties>
<repositories>
<repository>
<id><http://scala-tools.org|scala-tools.org></id>
<name>Scala-Tools Maven2 Repository</name>
<url><http://scala-tools.org/repo-releases></url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id><http://scala-tools.org|scala-tools.org></id>
<name>Scala-Tools Maven2 Repository</name>
<url><http://scala-tools.org/repo-releases></url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.11</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.11</artifactId>
<version>0.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>hadoop-lakefs-assembly</artifactId>
<version>0.1.13</version>
</dependency>
<!-- <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common> -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.0.0</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<resources><resource><directory>src/main/resources</directory></resource></resources>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
Yoni Augarten
04/10/2023, 9:16 AM
Hey @Vaibhav Kumar, how are you running your code? I was able to use your pom file to successfully run the attached code.
I used the following command:
mvn package exec:java -Dexec.mainClass=Main
(Assuming your code is in a main method in an object called Main)
Vaibhav Kumar
04/10/2023, 9:31 AM
@Yoni Augarten I am using:
spark-submit --class com.sparkbyexamples.spark.SparkSessionTest target/spark-scala-examples-1.0-SNAPSHOT.jar
Below is my whole code:
package com.sparkbyexamples.spark
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
import org.apache.spark.sql.SparkSession
import io.lakefs
object SparkSessionTest {
def main(args:Array[String]): Unit ={
val spark = SparkSession.builder()
.master("local[1]")
.appName("SparkByExample")
.getOrCreate();
println("First SparkContext:")
println("APP Name :"+spark.sparkContext.appName);
println("Deploy Mode :"+spark.sparkContext.deployMode);
println("Master :"+spark.sparkContext.master);
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://example/main/sample1.json"
val path = new Path("lakefs://example/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
}
Yoni Augarten
04/10/2023, 9:40 AM
I don't think your Jar contains any of your dependencies, that's why LakeFSFileSystem cannot be found. Make sure you are running your program with a classpath that includes all of the Maven dependencies. This can be done either using Maven like I did above, or by creating an "uber-jar", or by using an IDE to manage dependencies, or by some other way.
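For reference, one way to produce such an uber-jar is the Maven Shade plugin - a sketch to add under <build><plugins> in the pom; the plugin version here is an assumption:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <!-- bundle all compile-scope dependencies into the jar at package time -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

With this in place, mvn package produces a jar that includes hadoop-lakefs-assembly and the other dependencies, which can then be handed to spark-submit directly.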
Ariel Shaqed (Scolnicov)
04/10/2023, 9:44 AM
Thanks, Yoni, that really sounds like a Spark issue!
Vaibhav, are you using a downloaded jar or did you build it yourself? You'll want to run "mvn package" to assemble a jar that brings in all its dependencies.
Vaibhav Kumar
04/10/2023, 9:47 AM
I can see the jar under External Libraries. PFB the screenshots.
Yoni Augarten
04/10/2023, 9:50 AM
Then if you run it from your IDE it should work. When you simply spark-submit a jar, these external dependencies are not included.
Vaibhav Kumar
04/10/2023, 9:54 AM
But when we package a project into a JAR, all code and dependencies come to a single place - the JAR - right? Another point: if the lib was not found, my import statements would have failed, but those are passing.
Let me know if my understanding is correct.
Yoni Augarten
04/10/2023, 9:55 AM
In what way are they "passing"?
Import statements in Java are not executed in the same way code is
Regarding the JAR, it depends how you created it. Since your pom doesn't include any plugin that knows to create an uber-jar, your jar probably only includes your compiled classes, and not other dependencies.
Vaibhav Kumar
04/10/2023, 10:03 AM
@Yoni Augarten I think you are right - when I run from the IDE directly, it now throws a different error:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
... 16 more
aws-sdk is missing, I think.
Yoni Augarten
04/10/2023, 10:05 AM
@Vaibhav Kumar - thanks for updating. Developing Spark applications does have its learning curve when trying to work with external libraries. Some are easier to import using pom files, while with others there is no other choice but downloading the Jar from the internet. Unfortunately there is no one-size-fits-all approach as there are just too many moving parts. It's a matter of trial and error in the end.
Vaibhav Kumar
04/10/2023, 10:06 AM
No problem - at the end we are all learning and growing 🙂
😊 1
I reran after importing the below module and now my program is in a hung state:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.0.0</version>
</dependency>
While copying I get the below error most of the time:
lakeFS % ./lakectl fs upload -s /Users/simar/Downloads/sample1.json -d lakefs://example/main/
request failed: parameter "path" in query has an error: value is required but missing: value is required but missing
Ariel Shaqed (Scolnicov)
04/10/2023, 10:24 AM
Can you try adding a path at the end of the destination URL?
Vaibhav Kumar
04/10/2023, 10:27 AM
Path as in? lakefs://example/main/ is the path where I want to add the file.
Ariel Shaqed (Scolnicov)
04/10/2023, 10:38 AM
lakefs://example/main/ is a root, can you try lakefs://example/main/sample1.json?
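That is, presumably:
./lakectl fs upload -s /Users/simar/Downloads/sample1.json -d lakefs://example/main/sample1.json
(the destination must include the full object path, not just the repository and branch).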
Vaibhav Kumar
04/10/2023, 11:18 AM
@Ariel Shaqed (Scolnicov) @Yoni Augarten I saw a different error this time. This time there is some connection issue with MinIO. Please see the log below:
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on example: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused): Unable to execute HTTP request: Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:332)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:275)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at io.lakefs.LakeFSFileSystem.initializeWithClientFactory(LakeFSFileSystem.java:154)
at io.lakefs.LakeFSFileSystem.initialize(LakeFSFileSystem.java:110)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3288)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:34)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
MinIO is already running at http://127.0.0.1:9001/ in my browser.
Ariel Shaqed (Scolnicov)
04/10/2023, 11:31 AM
I think you have a configuration issue with your S3 endpoint somewhere: it says
Connect to 127.0.0.1:9090 [/127.0.0.1] failed: Connection refused (Connection refused):
and it looks to be using another port.
Vaibhav Kumar
04/10/2023, 11:55 AM
@Ariel Shaqed (Scolnicov) it has worked - I changed the port to the API one. :jumping-lakefs:
But how do I print anything out of it? I mean, when the file was present it worked without any error.
But when the file is not present, it didn't give me exactly the error that you showed in the issue stack trace:
Exception in thread "main" java.io.FileNotFoundException: lakefs://example/main/sample2.json not found
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:784)
at io.lakefs.LakeFSFileSystem.getFileStatus(LakeFSFileSystem.java:74)
at com.sparkbyexamples.spark.SparkSessionTest$.main(SparkSessionTest.scala:35)
at com.sparkbyexamples.spark.SparkSessionTest.main(SparkSessionTest.scala)
👀 1
Ariel Shaqed (Scolnicov)
04/10/2023, 12:11 PM
Yay, now we're cooking!
Yeah, that's the good flow. Now try a repo that doesn't exist, or a branch that doesn't exist.
Vaibhav Kumar
04/10/2023, 12:17 PM
Yup, it worked fully.
Now, moving on to solving the problem: I need to gather the response from the lakeFS server that is received in LakeFSFileSystem, right?
Ariel Shaqed (Scolnicov)
04/10/2023, 12:30 PM
Yeah, swagger.yml can give all possible return status codes and messages. This issue is to translate these exceptions into e.g. FileNotFoundException, which is more informative to the user.
Vaibhav Kumar
04/10/2023, 1:00 PM
https://github.com/treeverse/lakeFS/blob/master/api/swagger.yml
This one, right?
Ariel Shaqed (Scolnicov)
04/10/2023, 1:14 PM
Yup! lakeFSFS performs a bunch of calls, each can fail. I would start with the open and create methods, they hold many of the most common failures.
After that, list, getFileStatus, and listStatus might also have similar failure modes.
Vaibhav Kumar
05/10/2023, 5:28 PM
@Ariel Shaqed (Scolnicov) I went through the swagger doc and the responses we are getting using Postman.
I think I have figured out where we need to add the code. It should be somewhere in readNextChunk() in LakeFSFileSystem.java.
I am not getting how I shall get the response code in LakeFSFileSystem.java.
Elad Lachmi
05/10/2023, 5:33 PM
Hi @Vaibhav Kumar,
To make sure I understand what you're trying to do... the idea is to handle errors better instead of simply throwing them?
Vaibhav Kumar
05/10/2023, 5:37 PM
I want to get the response directly from the lakeFS server for clear messages. If you look at the issue which I am working on, the error thrown is not very clear, while if you hit the list-objects API we get a proper error like "repository not found". I already have the setup ready to reproduce the same error, but I am not sure how to get those response codes. @Elad Lachmi
Elad Lachmi
05/10/2023, 5:44 PM
It depends on how the client is implemented.
If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block.
If it throws only for e.g. 5xx, then it might require checking the response status on the resp.
"If the client throws an error for any non-200 HTTP response status, then you'll need to handle it in the catch block." 👆🏻
Looking through some of the existing code, it looks like this is the case.
Vaibhav Kumar
05/10/2023, 5:49 PM
Yes, you are correct, but I am not getting any suggestions around the response code in the below screenshot.
Elad Lachmi
05/10/2023, 5:51 PM
You need to look at e. I believe resp is out of scope when you're in the catch block.
Vaibhav Kumar
05/10/2023, 5:53 PM
Same with e as well.
Elad Lachmi
05/10/2023, 5:56 PM
ApiException has a getCode() method and a few more you might want to use for this purpose.
👍🏼 1
Vaibhav Kumar
05/10/2023, 6:06 PM
OK, can you please share some document for my reference? I am not getting how I will use it.
Elad Lachmi
05/10/2023, 6:08 PM
You can find the ApiException class here:
clients/java/src/main/java/io/lakefs/clients/api/ApiException.java
You can look at it and see which methods are available.
Vaibhav Kumar
05/10/2023, 6:12 PM
Yes, I can see now that getCode() is there. So now all I have to do is e.getCode() in the catch block, right?
Elad Lachmi
05/10/2023, 6:24 PM
Yes, and you have the HttpStatus enum to compare to for different HTTP statuses.
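A minimal sketch of the pattern being discussed, assuming the OpenAPI-generated client where ApiException.getCode() returns the HTTP status, and org.apache.http.HttpStatus for the status constants. fetchFromLakeFS is a hypothetical stand-in for whichever generated-client call (statObject, listObjects, ...) is being wrapped - this is not the actual lakeFSFS code:

import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.http.HttpStatus;
import io.lakefs.clients.api.ApiException;

class ErrorTranslationSketch {
    // Hypothetical: stands in for a generated-client call that can fail with ApiException.
    String fetchFromLakeFS(String path) throws ApiException {
        throw new ApiException(HttpStatus.SC_NOT_FOUND, "repository not found"); // placeholder
    }

    String stat(String path) throws IOException {
        try {
            return fetchFromLakeFS(path);
        } catch (ApiException e) {
            if (e.getCode() == HttpStatus.SC_NOT_FOUND) {
                // Translate the raw API error into the clearer filesystem-level exception.
                throw new FileNotFoundException(path + " not found: " + e.getResponseBody());
            }
            throw new IOException("error calling lakeFS API", e);
        }
    }
}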
Vaibhav Kumar
05/10/2023, 6:29 PM
OK sure.
So after making the changes in this code, I can just run the lakeFS Go server and test my changes, right?
To replicate the error, till now I was using the Hadoop client (pasted below) from my IntelliJ. This talks to my lakeFS docker and MinIO as of now:
def main(args : Array[String]) {
// val spark = SparkSession.builder.master("local[1]").appName("SparkByExample").getOrCreate
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAIOSFODNN7EXAMPLE")
conf.set("fs.lakefs.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://test1/main/sample1.json"
val path = new Path("lakefs://test1/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
Elad Lachmi
05/10/2023, 6:30 PM
Sounds like a good plan
Good luck!
Vaibhav Kumar
05/10/2023, 6:31 PM
Thank you @Elad Lachmi. I appreciate this help from you.
Will let you know my findings.
Elad Lachmi
05/10/2023, 6:32 PM
Sure, happy to help 🙂
Vaibhav Kumar
05/13/2023, 6:38 PM
I have made the changes to my LakeFSFileSystem.java and run lakeFS from the Go command:
lakefs % go run main.go run --local-settings
I have created a Hadoop client on my local machine and am trying to test the changes (response code in exception) to my lakeFS code.
I cannot see the changes I have made reflected in the logs when I run my client, shown below.
Do you know what could be the issue here?
package org.example
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import java.net.URI
object SparkSessionTest {
def main(args : Array[String]) {
val conf = new Configuration()
conf.set("fs.s3a.access.key", "minioadmin")
conf.set("fs.s3a.secret.key", "minioadmin")
conf.set("fs.lakefs.access.key", "AKIAJLDV6JMK2R5TRQSQ")
conf.set("fs.lakefs.secret.key", "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")
conf.set("fs.lakefs.endpoint", "<http://localhost:8000/api/v1>")
conf.set("fs.s3a.endpoint", "<http://127.0.0.1:9090>")
conf.set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
val uri = "lakefs://test1/main/sample1.json"
val path = new Path("lakefs://test1/main/sample1.json")
val fs = FileSystem.get(URI.create(uri), conf)
fs.getFileStatus(path)
}
}
Isan Rivkin
05/14/2023, 8:28 AM
Hi, I would first make sure that I'm setting the correct log level; second, I would make sure that the code changes you made are reflected in the code that is running.
Vaibhav Kumar
05/14/2023, 8:37 AM
@Isan Rivkin
1. I am using LOG.info in LakeFSFileSystem.java. Let me know if that is incorrect.
2. I have the server running from this changed code, and the client running separately. This I have checked.
I have observed something unusual. Earlier, when I was running my lakeFS and MinIO in a docker setup (Bagel), as soon as I uploaded a file to a lakeFS repo it got reflected in MinIO as well.
Now that I am running lakeFS from the terminal using the go command, with MinIO still in docker, I cannot see the same behaviour - the files are not syncing. Now I am getting an error like:
lakeFS blockstore type local unsupported by this FileSystem
Isan Rivkin
05/14/2023, 9:04 AM
I can't tell if the INFO log level is the right one; it all depends on the logs you are trying to see. If you're logging at DEBUG then you should probably use that.
Regarding your error: looking at the PhysicalAddressTranslator.java class, the Hadoop client supports s3 and does not support local storage.
Vaibhav Kumar
05/14/2023, 10:16 AM
Yes, but I am using MinIO storage and I have set the properties accordingly in the client.
I have tried with LOG.debug as well, but it is not working.
Can someone from lakeFS connect with me for some time on this? It is not a really complex PR, but setting things up for testing took me a lot of time.
Isan Rivkin
05/14/2023, 10:54 AM
It would be more beneficial for the community to be able to search different questions.
I'm not sure what issue you are currently facing, outside the PR you are trying to do. What's the issue you are asking about? This thread contains a lot of conversations. Is your current problem that you run lakeFS and expect to see the logs that you added, but don't actually see them?
Vaibhav Kumar
05/14/2023, 11:37 AM
I am working on this issue. Upon further investigation I learned that I have to work on catching some server responses in this file.
So now I am trying to add some debug statements in the same file, under the getFileStatus function. I am trying to hit lakeFS from outside by creating the Hadoop client, and then checking whether I get those logs or not.
Isan Rivkin
05/14/2023, 12:01 PM
Thanks for the context. To solve the logs issue, I would suggest adding a simple log even at the start of the Java code you are running and seeing whether it logs or not; if not, debug it from there.
Vaibhav Kumar
05/14/2023, 12:28 PM
I have tried almost everything around logs, but it is not working.
Isan Rivkin
05/14/2023, 5:24 PM
If you see other logs, then I guess it's either that the Java code you are running is not the code you think is running (and therefore your logs are not appearing), or something overrides the log level. Try making a change that's not related to logs and see if the behavior changes at all.
Vaibhav Kumar
05/14/2023, 6:18 PM
@Isan Rivkin I think I got the issue. I am running this compose file, which obviously pulls the latest lakeFS image and not my changed code.
When I added some logs in the main.go file and ran it locally, I could see the logs from main.go in my terminal.
Now, to narrow down the problem: how shall I set the blockstore to MinIO when I run using the go run command?
In other words, what are the arguments required to set the below properties with go run main.go run --local-settings?
## Commands from docker compose file to set blockstore to MINIO
- LAKEFS_DATABASE_TYPE=local
- LAKEFS_BLOCKSTORE_TYPE=s3
- LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
- LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
- LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
- LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
- LAKEFS_STATS_ENABLED
- LAKEFS_LOGGING_LEVEL
- LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
Isan Rivkin
05/14/2023, 6:20 PM
Glad the logs issue is solved.
Vaibhav Kumar
05/14/2023, 6:26 PM
Yes, but I still have the other part left to be solved 😄
Ariel Shaqed (Scolnicov)
05/14/2023, 6:27 PM
I believe you can change your compose file to run treeverse/lakefs:dev. Now if you make build-docker and rm the lakeFS container and start a new one, you should be able to see your code.
Alternatively, you might be able to docker cp a lakefs executable into the container at /app/lakefs. Restart that container and your code should run. (This one can be trickier if your container runs a sufficiently different version of Linux than your physical machine.)
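A sketch of both options; the container and compose service names here are assumptions (check yours with docker ps):

# Option 1: rebuild the dev image and recreate the container
make build-docker
docker rm -f lakefs
docker compose up -d lakefs

# Option 2: copy a locally built lakefs binary over the one in the container
docker cp ./lakefs lakefs:/app/lakefs
docker restart lakefs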
Vaibhav Kumar
05/14/2023, 6:47 PM
As of now I found this config doc, so I was creating a .lakefs.yaml file and adding configs this way:
database.type="local"
Let me know if I am going in the right direction.
Sorry, but I didn't exactly follow you on the docker-related stuff you suggested. I think you are saying to build the local image using pom.xml? Can you share some doc for me to take a look at as well?
I have set the config the below way in .lakefs.yaml, but I am getting some syntax issues 🫣
database.type="local"
blockstore.type="s3"
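For what it's worth, lakeFS config is nested YAML rather than dotted keys. Mirroring the compose file's environment variables above, a .lakefs.yaml for this setup would look roughly like the following sketch (the MinIO endpoint host/port is an assumption for MinIO running outside compose):

database:
  type: local
blockstore:
  type: s3
  s3:
    force_path_style: true
    endpoint: http://localhost:9000
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin
auth:
  encrypt:
    secret_key: some random secret string

Note that database.type local here refers to lakeFS's metadata store; the blockstore type must be s3 for the Hadoop client to work, per the earlier "lakeFS blockstore type local unsupported" error.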