
Monday, November 9, 2015

External jars not getting picked up in Zeppelin or the Spark CLI: an issue with the MySQL driver and Spark dependencies

If you can't get external jars picked up in Zeppelin or the Spark CLI, this post should help. I first tried pointing to local jars, but that did not work; the problem can be resolved with Maven dependencies, as shown below. In this example I am trying to use a MySQL JDBC jar from Zeppelin and the CLI and getting this exception:

java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/hive
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)

Apparently this is a poorly documented feature of Spark (and not an issue with Zeppelin itself). Here is the approach that works for me and solves the issue.

Dependency loading

When your code requires an external library, instead of downloading it, copying it, and restarting Zeppelin, you can do the following using the %dep interpreter:

- Load libraries recursively from a Maven repository
- Load libraries from the local filesystem
- Add an additional Maven repository
- Automatically add libraries to the Spark cluster (you can turn this off)

The %dep interpreter leverages the Scala environment, so you can write any Scala code here. Here is the usage:
%dep
z.reset() // clean up previously added artifact and repository

// add maven repository
z.addRepo("RepoName").url("RepoURL")

// add maven snapshot repository
z.addRepo("RepoName").url("RepoURL").snapshot()

// add artifact from filesystem
z.load("/path/to.jar")

// add artifact from maven repository, with no dependency
z.load("groupId:artifactId:version").excludeAll()

// add artifact recursively
z.load("groupId:artifactId:version")

// add artifact recursively except comma separated GroupID:ArtifactId list
z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")

// exclude with pattern
z.load("groupId:artifactId:version").exclude(*)
z.load("groupId:artifactId:version").exclude("groupId:artifactId:*")
z.load("groupId:artifactId:version").exclude("groupId:*")

// local() skips adding artifact to spark clusters (skipping sc.addJar())
z.load("groupId:artifactId:version").local()
Note that the %dep interpreter must be run before %spark, %pyspark, and %sql. Thanks to Ali and Neeraj from HWX for their help in solving this issue.
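
To tie this back to the original MySQL error, here is a minimal sketch of how the driver can be loaded and then used from a following paragraph. The connector version, table name, and credentials below are placeholders for your own environment; only the JDBC URL comes from the error above.

%dep
z.reset() // clear any previously added artifacts and repositories
// load the MySQL JDBC driver from Maven Central
// (5.1.38 is an example version; match it to your MySQL server)
z.load("mysql:mysql-connector-java:5.1.38")

Then, in a separate paragraph after %dep has run:

%spark
// read a table over JDBC using the driver loaded above;
// dbtable, user, and password are placeholders for your setup
val df = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:mysql://localhost:3306/hive",
  "driver" -> "com.mysql.jdbc.Driver",
  "dbtable" -> "TBLS",
  "user" -> "hiveuser",
  "password" -> "hivepassword"
)).load()
df.show()

With the driver on the interpreter's classpath this way, the "No suitable driver found" exception goes away.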
