I have a few file-related use cases that I'm not sure how to best accomplish using Cloud Composer. How should I best accomplish these?
1) I need to use a private key (.pem) file to access an SFTP server. Where should this file be stored and how should it be accessed? In on-prem Airflow, I would just have the file in a folder /keys/ in the same directory as /dags/.
2) I need to move files from an SFTP server to Cloud Storage. With on-prem Airflow, I download these from the SFTP server to a specific location on the Airflow worker instance and then upload from there. Am I able to do something similar with Composer, or is there a workaround, given that I am unable to access the file system?
1) Assuming the .pem file only needs to be accessed at task run time (as opposed to DAG parse time), you can put it in the data/ folder of the environment's Cloud Storage bucket. That folder is mounted with Cloud Storage FUSE at /home/airflow/gcs/data. You can upload files to it with the Cloud Composer gcloud component (gcloud composer environments storage data import).
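As a rough sketch of how a task might locate the key at run time: the helper below assumes the key was uploaded under a keys/ subfolder of data/. The function name, the subfolder, and the file name are all hypothetical, not something Composer provides. The lookup happens inside the function, so nothing touches the mount while the DAG file is being parsed.

```python
import os

# Mount point of the bucket's data/ folder, as described in the answer above.
DATA_DIR = "/home/airflow/gcs/data"


def resolve_key_path(filename, data_dir=DATA_DIR, subdir="keys"):
    """Return the full path of a private key stored under data/<subdir>/.

    The 'keys' subfolder and the file name are hypothetical; adjust them
    to your own layout. Raises FileNotFoundError so the task fails loudly
    if the key has not been uploaded.
    """
    path = os.path.join(data_dir, subdir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError("private key not found: " + path)
    return path
```

You would call this from inside a PythonOperator callable and hand the returned path to your SFTP client, for example as the key_filename argument if you connect with paramiko.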
2) There are two options here:
a) Write from your SFTP server to /home/airflow/gcs/data, which is FUSE-mounted to your Cloud Storage bucket. You could leave the file there or use the GoogleCloudStorageToGoogleCloudStorageOperator to move it to where you really want it.
b) If you want to copy to local disk and then from local disk to Cloud Storage, you'll need to do both steps within the same task. Cloud Composer environments use the CeleryExecutor, so tasks within the same DAG aren't guaranteed to run on the same machine. You should be able to write to /home/airflow and /tmp.
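The second option above (download and upload inside one task) could look roughly like this. It is only a sketch: download and upload are hypothetical callables standing in for whatever SFTP and Cloud Storage clients you use, and the function name is made up for illustration.

```python
import os
import tempfile


def transfer_via_local_disk(remote_path, download, upload):
    """Download one file to scratch space and upload it, all in one task.

    download(remote_path, local_path) and upload(local_path) are
    hypothetical callables wrapping your SFTP and Cloud Storage clients.
    Both steps run in the same process, so the worker that fetched the
    file is the one that uploads it.
    """
    # TemporaryDirectory uses the system temp dir (usually /tmp) and
    # deletes the scratch files when the block exits, even on failure.
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, os.path.basename(remote_path))
        download(remote_path, local_path)
        return upload(local_path)
```

Wrapping this in a single PythonOperator callable keeps the intermediate file off any path that a later task, possibly on another worker, would expect to find.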
For 2), based on the Cloud Composer documentation:
When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster. Cloud Composer synchronizes the dags/ and plugins/ folders uni-directionally by copying locally and synchronizes data/ and logs/ folders bi-directionally by using Cloud Storage FUSE.
So you can write files to the local directory /home/airflow/gcs/data in your operators, and Cloud Composer will sync that directory with gs://bucket/data bi-directionally.
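To make the mapping concrete, here is a small sketch that turns a path under the mount into the gs:// URI it would sync to. The helper name is hypothetical and the bucket name is a placeholder for your environment's bucket.

```python
import posixpath

# Local mount of the bucket's data/ folder, as stated above.
DATA_MOUNT = "/home/airflow/gcs/data"


def gcs_uri_for(local_path, bucket):
    """Map a path under the data/ mount to the matching gs:// URI.

    bucket is the name of your environment's bucket (a placeholder here).
    Raises ValueError for paths outside the mount, since those are not synced.
    """
    rel = posixpath.relpath(local_path, DATA_MOUNT)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(local_path + " is not under " + DATA_MOUNT)
    if rel == ".":
        return "gs://" + bucket + "/data"
    return "gs://" + bucket + "/data/" + rel
```

This is handy when one task writes a file under the mount and a later operator needs the object's Cloud Storage location rather than the local path.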
For more details on how Cloud Composer interacts with Cloud Storage, see this document:
https://cloud.google.com/composer/docs/concepts/cloud-storage