My Spark job, which uses HiveContext and Saxon, works fine as long as no UDFs are defined in the code. As soon as a UDF is implemented, HiveContext initialization fails with the error below. I have heard of a Saxon/Java 8 incompatibility that is fixed in Saxon 9.5.1.5, which has not yet been released as a free version in the central Maven repository:
Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: java.util.ServiceConfigurationError: javax.xml.xpath.XPathFactory: jar:file:/JBOD_D19/hadoop/cdh/yarn/nm/usercache/u23120d1/appcache/application_1477998759081_5017/container_e45_1477998759081_5017_01_000001/saxon-xpath-9.1.0.8.jar!/META-INF/services/javax.xml.xpath.XPathFactory:2: Illegal configuration-file syntax
    at javax.xml.xpath.XPathFactory.newInstance(XPathFactory.java:102)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.(UDFXPathUtil.java:41)
    at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.(GenericUDFXPath.java:53)
Correct: calling XPathFactory.newInstance() under Java 8 with an older release of Saxon on the classpath causes this failure. The fix is to use a newer version of Saxon. The current release is 9.7.0.11.
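To confirm which XPathFactory implementation a given classpath actually resolves to, a small diagnostic along the following lines may help. It is only a sketch: the class name XPathFactoryCheck and the methods describe and probe are made up for illustration, and it uses nothing beyond the standard JAXP API. Run with the same JARs as the job, it should either report the factory class and the JAR it came from, or report the same configuration failure shown in the trace, which here points at saxon-xpath-9.1.0.8.jar.

```java
import java.security.CodeSource;
import javax.xml.xpath.XPathFactory;

// Hypothetical helper, not part of Saxon, Hive, or Spark.
class XPathFactoryCheck {

    // Returns the class name plus the location (typically a JAR URL) it was loaded from.
    static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "JDK built-in (no code source)"
                : src.getLocation().toString();
        return cls.getName() + " loaded from " + location;
    }

    // Tries the same call that fails in the stack trace and reports the outcome.
    static String probe() {
        try {
            XPathFactory factory = XPathFactory.newInstance();
            return describe(factory.getClass());
        } catch (RuntimeException | java.util.ServiceConfigurationError e) {
            // An unreadable META-INF/services entry on the classpath surfaces here.
            return "XPathFactory lookup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

If the reported location is an old Saxon JAR, replacing or excluding that JAR is the thing to try first; if the lookup succeeds with the JDK's built-in factory, the problem JAR is probably only present in the cluster's container classpath.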
Recent releases of Saxon can be found in Maven. We resisted putting Saxon in Maven for many years because downloading from Maven does not satisfy the condition imposed by some of the third-party software components we use, which say that you must not distribute the JAR files without also distributing the legal terms and conditions. We eventually relented because of overwhelming demand, despite the fact that distributing via Maven still violates this condition. Organisations that care about these things should not use Maven.