Installation
Python
pip install hnswlib-spark==2.0.0b1
JVM
- sbt
  // for spark 3.4.x
  libraryDependencies += "com.github.jelmerk" %% "hnswlib-spark_3_4" % "2.0.0-beta.1"
  // for spark 3.5.x
  libraryDependencies += "com.github.jelmerk" %% "hnswlib-spark_3_5" % "2.0.0-beta.1"
- Maven
  <properties>
    <scala.binary.version>2.12</scala.binary.version>
  </properties>
  <dependencies>
    <!-- for spark 3.4.x -->
    <dependency>
      <groupId>com.github.jelmerk</groupId>
      <artifactId>hnswlib-spark_3_4_${scala.binary.version}</artifactId>
      <version>2.0.0-beta.1</version>
    </dependency>
    <!-- for spark 3.5.x -->
    <dependency>
      <groupId>com.github.jelmerk</groupId>
      <artifactId>hnswlib-spark_3_5_${scala.binary.version}</artifactId>
      <version>2.0.0-beta.1</version>
    </dependency>
  </dependencies>
- Gradle
  ext.scalaBinaryVersion = '2.12'
  dependencies {
    // for spark 3.4.x
    implementation("com.github.jelmerk:hnswlib-spark_3_4_$scalaBinaryVersion:2.0.0-beta.1")
    // for spark 3.5.x
    implementation("com.github.jelmerk:hnswlib-spark_3_5_$scalaBinaryVersion:2.0.0-beta.1")
  }
Databricks
- Create a cluster if you don’t have one already.
- In the Libraries tab of your cluster, go to Install New -> Maven -> Coordinates and enter
  for DBR 13.3 LTS:
    com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.1
  for DBR 14.3 LTS and above:
    com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.1
  then press Install.
- Optionally, add the following cluster settings for faster searches:
  Advanced Options -> Spark -> Environment variables:
    JNAME=zulu17-ca-amd64
  Advanced Options -> Spark -> Spark config:
    spark.executor.extraJavaOptions --enable-preview --add-modules jdk.incubator.vector
Now you can attach your notebook to the cluster and use Hnswlib spark!
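The optional Spark config entry from the last step can also be expressed as a plain key/value pair, which may help when the same flags have to be merged with JVM options a cluster already uses. The following is only a sketch: the helper name `vector_api_spark_conf` and the constant are hypothetical and not part of hnswlib-spark, and the flags are copied verbatim from the cluster settings above.

```python
# Flags copied from the Databricks cluster settings above (assumes executors run JDK 17).
VECTOR_API_JAVA_OPTIONS = "--enable-preview --add-modules jdk.incubator.vector"


def vector_api_spark_conf(existing_options: str = "") -> dict:
    """Hypothetical helper: append the Vector API flags to existing executor JVM options."""
    combined = f"{existing_options} {VECTOR_API_JAVA_OPTIONS}".strip()
    return {"spark.executor.extraJavaOptions": combined}


if __name__ == "__main__":
    # Print the entry in the "key value" form used by the Spark config text box.
    for key, value in vector_api_spark_conf().items():
        print(key, value)
```

Outside Databricks, each returned entry could be passed to `SparkSession.builder.config(key, value)` before the session is created. Whether the flags take effect still depends on the executors running a suitable JDK.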
Spark shell
# for spark 3.4.x
spark-shell --packages 'com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.1'
# for spark 3.5.x
spark-shell --packages 'com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.1'
Pyspark shell
# for spark 3.4.x
pyspark --packages 'com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.1'
# for spark 3.5.x
pyspark --packages 'com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.1'
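All of the coordinates in this section follow the same pattern: the Spark major and minor version and the Scala binary version are embedded in the artifact name. As a rough sketch of that naming scheme, the hypothetical helper below (not part of hnswlib-spark) builds the coordinate from a Spark version string. It assumes that only the 3.4.x and 3.5.x lines listed here are available and defaults to Scala 2.12, the only Scala version this section mentions.

```python
# Spark release lines listed in this section (assumption: no others are published).
SUPPORTED_SPARK_LINES = ("3.4", "3.5")


def hnswlib_spark_coordinate(
    spark_version: str,
    scala_binary_version: str = "2.12",
    version: str = "2.0.0-beta.1",
) -> str:
    """Hypothetical helper: build the Maven coordinate for a given Spark version."""
    # Reduce e.g. "3.5.1" to "3.5".
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor not in SUPPORTED_SPARK_LINES:
        raise ValueError(f"no artifact listed for Spark {spark_version}")
    # The artifact name uses underscores, e.g. hnswlib-spark_3_5_2.12.
    suffix = major_minor.replace(".", "_")
    return f"com.github.jelmerk:hnswlib-spark_{suffix}_{scala_binary_version}:{version}"


if __name__ == "__main__":
    print(hnswlib_spark_coordinate("3.5.1"))
```

The resulting string is what `--packages` expects in the shell commands above. It could equally be supplied through the `spark.jars.packages` Spark property.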