
Hi, I am trying to run a simple Java program using Apache Hive and Apache Spark. The program compiles without any errors, but at runtime I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.hive.HiveContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
at SparkHiveExample.main(SparkHiveExample.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Following is my code:

import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.DataFrame;

public class SparkHiveExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkHive Example");
        SparkContext sc = new SparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);

        System.out.println("Hello World");
        DataFrame df = hiveContext.sql("show tables");
        df.show();
    }
}

My pom.xml file looks as follows:

<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
     <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.0</version>
     </dependency>
     <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.3.0</version>
     </dependency>
  </dependencies>
</project>

What could be the problem?

EDIT: I tried using the SQLContext.sql() method instead, and I still get a similar method-not-found error at runtime. This Stack Overflow answer suggests that the problem is caused by a dependency mismatch, but I am unable to figure out what is mismatched.
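
For reference, the SQLContext variant I tried looks roughly like this (a minimal sketch; the class name is just for illustration, the setup is the same as above):

import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.DataFrame;

public class SparkSqlExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkSql Example");
        SparkContext sc = new SparkContext(conf);
        // SQLContext instead of HiveContext; fails with the same kind of NoSuchMethodError
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.sql("show tables");
        df.show();
    }
}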

Answers

Make sure your spark-core and spark-hive dependencies are set to the provided scope, as shown below. These dependencies are supplied by the cluster at runtime, not packaged with your application.

Also ensure that the version of your Spark installation is 1.3 or above. Prior to 1.3, the sql() method returned an RDD (SchemaRDD) instead of a DataFrame, so a jar compiled against the 1.3 API looks for a method signature that does not exist on an older cluster, which produces exactly this NoSuchMethodError. Most likely the installed version of Spark is older than 1.3.

 <dependency> 
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.0</version>
    <scope>provided</scope>
 </dependency>
 <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.3.0</version>
    <scope>provided</scope>
 </dependency>
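
To confirm which version the cluster is actually running, you can run spark-submit --version, or print it from inside the program itself (a one-line check using the sc from the question's code):

 // prints the version of the Spark runtime the program is actually linked against
 System.out.println("Spark version: " + sc.version());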

It is also recommended to use a SparkSession object (available since Spark 2.0) to run queries instead of HiveContext. The Scala snippet below shows its usage:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("spark session example")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show tables").show()
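
Since the question's program is in Java, the equivalent with the Java API would look roughly like this (a sketch for Spark 2.x, where sql() returns Dataset<Row> rather than DataFrame):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkHiveExample {
    public static void main(String[] args) {
        // SparkSession replaces both SQLContext and HiveContext in Spark 2.x
        SparkSession spark = SparkSession.builder()
                .appName("SparkHive Example")
                .enableHiveSupport()
                .getOrCreate();

        // sql() returns Dataset<Row> here; DataFrame no longer exists in the Java API
        Dataset<Row> df = spark.sql("show tables");
        df.show();

        spark.stop();
    }
}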