In this post, I will share my experience running the Spark SQL Thrift Server and connecting to it with Beeline. These are the OS and software versions I use on my laptop:
1. OS Windows 7 – Home Premium x64.
2. Java SDK 8u121 (JDK 1.8), download here.
3. Apache hadoop-2.7.2, download here.
4. Apache spark-2.4.3-bin-hadoop2.7, download here.
First, we have to start the Spark SQL Thrift Server. These are the steps:
1. Make sure Java is installed on your OS. Open the Command Prompt (CMD), then run the java -version command. If Java is not installed, please download JDK 1.8 here.
2. Download hadoop-2.7.2 and spark-2.4.3-bin-hadoop2.7, then create a folder named apache on drive C (C:\apache).
3. Extract the files hadoop-2.7.2.tar.gz and spark-2.4.3-bin-hadoop2.7.tgz into the folder C:\apache. This gives you the folders C:\apache\hadoop-2.7.2 and C:\apache\spark-2.4.3-bin-hadoop2.7.
4. Add the environment variables JAVA_HOME, HADOOP_HOME, and SPARK_HOME under System Variables. These are the paths on my laptop:
JAVA_HOME : C:\Program Files\Java\jdk1.8.0_121
HADOOP_HOME : C:\apache\hadoop-2.7.2
SPARK_HOME : C:\apache\spark-2.4.3-bin-hadoop2.7
5. Modify the PATH variable in the System Variables part by adding the following values:
6. Create the folder C:\tmp\hive.
7. Open a new Command Prompt, then run the following command:
winutils.exe chmod 777 C:\tmp\hive
8. To start Spark SQL Thrift Server, run the following command:
spark-submit --verbose --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --hiveconf hive.server2.thrift.port=10000 --driver-memory 1g
9. Spark SQL Thrift Server is now running and listening on port 10000, the port set in the command above.
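Before moving on, it can help to confirm that something is really listening on port 10000. The following is only a minimal sketch in Python, which is not part of the setup above and is assumed to be installed separately. The function name thrift_port_is_open is made up for this example. It only checks that a TCP connection can be opened; it does not verify that the listener is actually the Thrift Server.

```python
import socket


def thrift_port_is_open(host="localhost", port=10000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError when the connection is refused,
        # times out, or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if thrift_port_is_open():
        print("Port 10000 is accepting connections.")
    else:
        print("Nothing is listening on port 10000 yet.")
```

The server can take a little while to start, so a negative result right after launching spark-submit may simply mean it is still starting up.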
To run Beeline, open a new Command Prompt and follow these steps:
1. Type beeline, then Enter.
2. Type !connect jdbc:hive2://localhost:10000, then Enter.
3. When asked for a username and password, press Enter to leave both blank (the default).
At this point, Spark SQL Thrift Server and Beeline are both running. To list the databases on the server, type the command show databases;
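The same query can also be run without the interactive prompt, because Beeline accepts the JDBC URL with -u and a statement with -e. The sketch below wraps that call in Python (again assumed to be installed separately). The helper names beeline_command and run_query are made up for this example. It passes no username or password, matching the blank credentials used above, and assumes beeline can be found on PATH.

```python
import shutil
import subprocess


def beeline_command(sql, host="localhost", port=10000):
    """Build the argument list for a one-shot Beeline query."""
    url = f"jdbc:hive2://{host}:{port}"
    # -u is the JDBC URL to connect to, -e is the statement to execute.
    return ["beeline", "-u", url, "-e", sql]


def run_query(sql):
    """Run the query through Beeline and return its standard output."""
    # shutil.which resolves the launcher on PATH; on Windows it honours
    # PATHEXT, so it can find a .cmd launcher.
    launcher = shutil.which("beeline")
    if launcher is None:
        raise FileNotFoundError("beeline was not found on PATH")
    argv = beeline_command(sql)
    argv[0] = launcher
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout
```

With the Thrift Server running, print(run_query("show databases;")) should print the same list as the interactive session.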