How to run glue script from Glue Dev Endpoint

I have a glue script (test.py) written say in a editor. I connected to glue dev endpoint and copied the script to endpoint or I can store in S3 bucket. Basically glue endpoint is an EMR cluster, now how can I run the script from the dev endpoint terminal? Can I use spark-submit and run it ?

I know we can run it from glue console,but more interested to know if I can run it from glue end point terminal.

标签： amazon-web-services aws-glue

2条回答

beautiful°

2楼-- · 2019-02-15 19:07

You don't need a notebook; you can ssh to the dev endpoint and run it with the gluepython interpreter (not plain python).

e.g.

radix@localhost:~$ DEV_ENDPOINT=glue@ec2-w-x-y-z.compute-1.amazonaws.com
radix@localhost:~$ scp myscript.py $DEV_ENDPOINT:/home/glue/myscript.py
radix@localhost:~$ ssh -i {private-key} $DEV_ENDPOINT
...
[glue@ip-w-x-y-z ~]$ gluepython myscript.py

You can also run the script directly without getting an interactive shell with ssh (of course, after uploading the script with scp or whatever):

radix@localhost:~$ ssh -i {private-key} $DEV_ENDPOINT gluepython myscript.py

If this is a script that uses the Job class (as the auto-generated Python scripts do), you may need to pass --JOB_NAME and --TempDir parameters.

0人赞添加讨论(0) 举报

别忘想泡老子

3楼-- · 2019-02-15 19:10

For development / testing purpose, you can setup a zeppelin notebook locally, have an SSH connection established using the AWS Glue endpoint URL, so you can have access to the data catalog/crawlers,etc. and also the s3 bucket where your data resides.

After all the testing is completed, you can bundle your code, upload to an S3 bucket. Then create a Job pointing to the ETL script in S3 bucket, so that the job can be run, and scheduled as well.

Please refer here and setting up zeppelin on windows, for any help on setting up local environment. You can use dev instance provided by Glue, but you may incur additional costs for the same(EC2 instance charges).

Once you set up the zeppelin notebook, you can copy the script(test.py) to the zeppelin notebook, and run from the zeppelin.

According to AWS Glue FAQ:

Q: When should I use AWS Glue vs. Amazon EMR?

AWS Glue works on top of the Apache Spark environment to provide a scale-out execution environment for your data transformation jobs. AWS Glue infers, evolves, and monitors your ETL jobs to greatly simplify the process of creating and maintaining jobs. Amazon EMR provides you with direct access to your Hadoop environment, affording you lower-level access and greater flexibility in using tools beyond Spark.

Do you have any specific requirement to run Glue script in an EMR instance? Since in my opinion, EMR gives more flexibility and you can use any 3rd party python libraries and run directly in a EMR Spark cluster.

Regards

0人赞添加讨论(0) 举报

How to run glue script from Glue Dev Endpoint

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间