New to Spark and SparkR. For Hadoop, I only have a single file, winutils/bin/winutils.exe.
Running system:
- OS: Windows 10
- Java: version "1.8.0_101"
  - Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
  - Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
- R: platform x86_64-w64-mingw32 (arch: x86_64, os: mingw32)
- RStudio: Version 1.0.20 – © 2009-2016 RStudio, Inc.
- Spark: 2.0.0
Reading data works on my local machine, but the same reads fail on the deployed workers. Could anybody help me?
Run locally:
Sys.setenv(SPARK_HOME = "D:/SPARK2")
library(SparkR)
sparkR.session(master = "local[*]", enableHiveSupport = FALSE,sparkConfig = list(spark.driver.memory="4g",spark.sql.warehouse.dir = "d:/winutils/bin",sparkPackages = "com.databricks:spark-avro_2.11:3.0.1"))
Java ref type org.apache.spark.sql.SparkSession id 1
multiPeople <- read.json(c(paste(getwd(),"people.json",sep = "/"),"D:/RwizSpark_Private/people2.json"))
rand_10m_x <- read.df(x = "./demo.csv",source = "csv", inferSchema="true",na.strings= "NA")
Run on the deployed workers:
Sys.setenv(SPARK_HOME = "D:/SPARK2")
library(SparkR)
sparkR.session(master = "spark:/mymasterIP", enableHiveSupport = FALSE,appName = "sparkRenzhi", sparkConfig = list(spark.driver.memory="6g",spark.sql.warehouse.dir = "d:/winutils/bin",spark.executor.memory = "2g", spark.executor.cores= "2"),sparkPackages = "com.databricks:spark-avro_2.11:3.0.1")
Java ref type org.apache.spark.sql.SparkSession id 1
multiPeople <- read.json(c(paste(getwd(),"people.json",sep = "/"),"D:/RwizSpark_Private/people2.json"))
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 172.29.110.101): java.io.FileNotFoundException: File file:/D:/RwizSpark_Private/people2.json does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109) at org.apache.hadoop.mapre
rand_10m_x <- read.df(x = "./demo.csv",source = "csv", inferSchema="true",na.strings= "NA")
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, 172.29.110.101): java.io.FileNotFoundException: File file:/D:/RwizSpark_Private/demo.csv does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109) at org.apache.hadoop.mapred.T
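My guess is that file:/ paths get resolved against each worker's local filesystem, so the workers would need their own copy of the data. A minimal sketch of what I think I'd have to do instead (the hdfs:// URI is a placeholder, not my actual setup; I only have winutils.exe, no HDFS):

# Assumption: the files have been copied to the identical absolute path on
# every worker node, so each executor can resolve the file:/ URI locally
multiPeople <- read.json("file:///D:/RwizSpark_Private/people2.json")

# Alternative assumption: the data lives in shared storage such as HDFS
# ("hdfs://namenode:9000" is hypothetical)
rand_10m_x <- read.df(x = "hdfs://namenode:9000/demo.csv", source = "csv",
                      inferSchema = "true", na.strings = "NA")

Is copying the files to every worker (or moving them to shared storage) the expected fix, or is there a way to read driver-local files from the workers?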