I'm new to Spark, so I'm not sure where my problem is and am looking for a helpful hint. I'm trying to run Spark (pyspark) on a Windows 7 machine as an admin, but it doesn't seem to work: even as admin, I still get WindowsError 5 (Access is denied).
I downloaded release 1.2.0 (pre-built for Hadoop 2.4 or later), extracted it with tar from the command line, and set IPYTHON=1 before calling bin\pyspark. When I call it, pyspark starts, but I get that error.
When I then try to use the SparkContext, I get name 'sc' is not defined.
I have Python 2.7.8 and the Spyder IDE installed, and I'm on a corporate network.
Does anyone have a clue what could be going on here? I've looked at a few questions, such as Why am I getting WindowsError: [Error 5] Access is denied?, but couldn't find an answer.
Briefly:
I had what sounds like the same problem. For me, the *.cmd
files in the $spark/bin directory weren't marked as executable. Please try to confirm this by:
- right-clicking pyspark2.cmd, and
- opening Properties / Security tab and checking whether 'Read & execute' is allowed.
I found the workaround on another site, which recommended downloading hadoop-winutils-2.6.0.zip
(sorry, I don't have the link). Here is an example of the command to use (after changing to the proper directory, i.e. $spark/bin):
t:\hadoop-winutils-2.6.0\bin\winutils.exe chmod 777 *
I also had to run chmod 777 on /tmp/hive
to make it writable.
Good luck!
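To script the workaround instead of typing it per folder, here is a minimal Python sketch that runs the same winutils chmod 777 command on each launch script in $spark/bin and on /tmp/hive. The folder paths, the helper names build_chmod_commands and run_all, and the narrowing of the original * to *.cmd are all hypothetical choices; it only calls winutils when run on Windows with winutils.exe actually present.

```python
import glob
import os
import subprocess

# Hypothetical paths: point these at your own winutils and Spark folders.
WINUTILS = r"t:\hadoop-winutils-2.6.0\bin\winutils.exe"
SPARK_BIN = r"c:\spark-1.2.0-bin-hadoop2.4\bin"
HIVE_TMP = r"\tmp\hive"


def build_chmod_commands(winutils, spark_bin, hive_tmp):
    """Build one `winutils chmod 777 <path>` argument list per target.

    The *.cmd pattern is expanded here in Python (narrowing the original `*`
    to just the Windows launch scripts), then the Hive scratch dir is added.
    """
    targets = sorted(glob.glob(os.path.join(spark_bin, "*.cmd")))
    targets.append(hive_tmp)
    return [[winutils, "chmod", "777", path] for path in targets]


def run_all(commands):
    for argv in commands:
        # check_call works on Python 2.7 and 3; it raises
        # CalledProcessError if winutils exits with a non-zero status.
        subprocess.check_call(argv)


# Only attempt the calls on Windows when winutils is actually present.
if os.name == "nt" and os.path.isfile(WINUTILS):
    run_all(build_chmod_commands(WINUTILS, SPARK_BIN, HIVE_TMP))
```

Building the argument lists separately from running them makes it easy to print and review the commands before anything is chmod'ed.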
Root cause: the tar program I used on Windows (Cygwin's, via tar -zxf <file.tgz>) did not apply
the proper attributes to the extracted files; in this case, the executable permissions
weren't set. Yeah, maybe I should update my version of Cygwin.
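If you'd rather avoid Cygwin's tar altogether, here is a minimal sketch using Python's standard tarfile module: one helper reports which files the .tgz itself marks as executable, and another extracts the archive natively. The assumption is that, because os.chmod on Windows can only toggle the read-only flag, natively extracted files simply inherit the destination folder's permissions (which typically include Read & execute) instead of getting Cygwin's POSIX-to-ACL mapping. The function names and the example path below are hypothetical.

```python
import stat
import tarfile

# Any of the owner/group/other execute bits in a POSIX mode.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def executable_names(members):
    """Names of regular-file members whose stored POSIX mode has an execute bit."""
    return [m.name for m in members if m.isfile() and m.mode & EXEC_BITS]


def list_executables(tgz_path):
    """Report which files the .tgz itself marks as executable."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        return executable_names(tar.getmembers())


def extract(tgz_path, dest):
    """Extract with Python's tarfile rather than Cygwin's tar.

    Assumption: on Windows, os.chmod can only set the read-only flag, so
    extracted files end up with the destination folder's inherited ACLs.
    """
    with tarfile.open(tgz_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects unsafe members such as
            # absolute paths or entries that would land outside dest.
            tar.extractall(dest, filter="data")
        else:
            # Older Pythons (e.g. 2.7) have no extraction filters.
            tar.extractall(dest)
```

For example, list_executables(r"c:\downloads\spark-1.2.0-bin-hadoop2.4.tgz") (a hypothetical path) shows what the archive records, and extract(...) with the same path and a target folder unpacks it without going through Cygwin.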