Download files from personal OneDrive using Python

Posted 2020-05-29 09:13

Question:

I have a Python script that is running periodically on an AWS EC2 Ubuntu machine.

This script reads data from some files and sometimes changes data in them.

I want to download these files from OneDrive, do my own thing with them, and upload them back to OneDrive.

I want this to be done automatically, without the need for a user to approve any login or credentials. I'm ok with doing it once (i.e. approving the login on the first run) but the rest has to run automatically, without asking ever again for approvals (unless the permissions change, of course).

What is the best way to do this?

I've been reading the documentation on the Microsoft Graph API, but I'm struggling with the authentication part. I've created an application in Azure AD, granted it some sample permissions (for testing), and created a secret credential.

Answer 1:

I managed to do it. I'm not sure if it's the best way but it is working now. It's running automatically every hour and I don't need to touch it.

I followed the information on https://docs.microsoft.com/en-gb/azure/active-directory/develop/v2-oauth2-auth-code-flow

This is what I did.

Azure Portal

  • Create an application. Azure Active Directory -> App Registrations -> Applications from personal account
  • In Supported account types, choose the one that has personal Microsoft accounts.
  • In Redirect URI, choose Public client/native. We'll add the specific URI later.
  • In the application details, in the section Overview, take note of the Application (client) ID. We'll need this later.
  • In the section Authentication, click Add a Platform and choose Desktop + devices. You can use your own, I chose one of the suggested: https://login.microsoftonline.com/common/oauth2/nativeclient
  • In the section API permissions, you have to add all the permissions that your app will use. I added User.Read, Files.ReadWrite and offline_access. The offline_access is to be able to get the refresh token, which will be crucial to keep the app running without asking the user to login.
  • I did not create any Certificate or Secret.

Web

It looks like getting a token for the first time requires a browser, or something that emulates one.

There is probably a programmatic way to do this, but I didn't find one. I also considered using Selenium, but since this step is only needed once and my app requests new tokens every hour (which keeps them fresh), I dropped that idea.

If we add new permissions, the tokens that we have will become invalid and we have to do this manual part again.

  • Open a browser and go to the URL below. Use the Scopes and the Redirect URI that you set up in Azure Portal.

https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=your_app_client_id&response_type=code&redirect_uri=https%3A%2F%2Flogin.microsoftonline.com%2Fcommon%2Foauth2%2Fnativeclient&response_mode=query&scope=User.Read%20offline_access%20Files.ReadWrite

That URL will redirect you to the Redirect URI that you set up and with a code=something in the URL. Copy that something.
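The two manual steps above (building the authorize URL and copying the code out of the redirect) can be sketched in Python with only the standard library. This is an illustration, not part of the original answer: the function names `build_authorize_url` and `extract_code` are made up, and `your_app_client_id` is still a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs, quote

AUTHORIZE_URL = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"
REDIRECT_URI = "https://login.microsoftonline.com/common/oauth2/nativeclient"
SCOPES = ["User.Read", "offline_access", "Files.ReadWrite"]


def build_authorize_url(client_id):
    """Return the URL to open in a browser for the one-time login."""
    query = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": REDIRECT_URI,
        "response_mode": "query",
        "scope": " ".join(SCOPES),
    }
    # quote_via=quote encodes spaces as %20, matching the URL shown above
    return AUTHORIZE_URL + "?" + urlencode(query, quote_via=quote)


def extract_code(redirected_url):
    """Pull the value of code=... out of the URL the browser lands on."""
    params = parse_qs(urlsplit(redirected_url).query)
    if "code" not in params:
        raise ValueError("redirect URL has no code parameter")
    return params["code"][0]


if __name__ == "__main__":
    print(build_authorize_url("your_app_client_id"))
```

After logging in, paste the address from the browser's address bar into `extract_code` to get the value needed for the next step.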

  • Send a POST request with a form-URL-encoded body. I used https://reqbin.com/ for this.

Endpoint: https://login.microsoftonline.com/common/oauth2/v2.0/token

Form URL: grant_type=authorization_code&client_id=your_app_client_id&code=use_the_code_returned_on_previous_step

This will return an Access Token and a Refresh Token. Store the Refresh Token somewhere. I'm saving it in a file.
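The POST above can also be sent from Python instead of reqbin. The following is a sketch under assumptions: the helper names and the `refresh_token.txt` file name are hypothetical, and `redirect_uri` is added to the form because Microsoft's documentation for this flow lists it as part of the token request, even though the form shown above omits it. It uses only the standard library; `requests.post(TOKEN_URL, data=form)` would do the same job.

```python
import json
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://login.microsoftonline.com/common/oauth2/v2.0/token"
REDIRECT_URI = "https://login.microsoftonline.com/common/oauth2/nativeclient"


def build_code_form(client_id, code):
    """Form fields for exchanging the one-time code for tokens."""
    return {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        "redirect_uri": REDIRECT_URI,  # assumption: not in the answer's form
    }


def post_form(url, form):
    """POST a form-URL-encoded body and return the decoded JSON response."""
    body = urlencode(form).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def store_refresh_token(tokens, path="refresh_token.txt"):
    """Write the refresh token from a token response to a file."""
    with open(path, "w") as file:
        file.write(tokens["refresh_token"])


# Usage (performs a real network call, so it is left commented out):
# tokens = post_form(TOKEN_URL, build_code_form("your_app_client_id", "the_code"))
# store_refresh_token(tokens)
```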

Python

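import requests

# Placeholders: replace with your own values
TOKEN_URL = 'https://login.microsoftonline.com/common/oauth2/v2.0/token'
CLIENT_ID = 'your_app_client_id'
TOKEN_FILE = 'refresh_token.txt'   # file where the refresh token is stored
PATH_TO_FILE = '/path/to/folder'   # folder on OneDrive
FILE_NAME = 'file_name.ext'

# Read the refresh token saved in the previous step (or by the previous run)
with open(TOKEN_FILE) as file:
    refresh_token = file.read().strip()

# Build the POST parameters
params = {
    'grant_type': 'refresh_token',
    'client_id': CLIENT_ID,
    'refresh_token': refresh_token,
}

response = requests.post(TOKEN_URL, data=params)
response.raise_for_status()
tokens = response.json()

access_token = tokens['access_token']
new_refresh_token = tokens['refresh_token']

# Save the new refresh token by overwriting the file.
# This new one will be used next time.
with open(TOKEN_FILE, 'w') as file:
    file.write(new_refresh_token)

header = {'Authorization': 'Bearer ' + access_token}

# Download the file
response = requests.get('https://graph.microsoft.com/v1.0/me/drive/root:' +
                        PATH_TO_FILE + '/' + FILE_NAME + ':/content', headers=header)
response.raise_for_status()

# Save the file to disk
with open(FILE_NAME, 'wb') as file:
    file.write(response.content)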
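The question also asks about uploading the modified file back, which the code above does not cover. The following is only a sketch: it assumes the Graph API's simple upload (an HTTP PUT to the same `:/content` address used for the download), which is meant for small files, and the helper names `content_url` and `upload_file` are hypothetical. It also percent-encodes the path so that file names containing spaces work.

```python
import json
import urllib.request
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0/me/drive/root:"


def content_url(remote_path):
    """Build the :/content URL for a OneDrive path such as '/docs/data.csv'."""
    if not remote_path.startswith("/"):
        remote_path = "/" + remote_path
    # quote() keeps '/' but escapes spaces and other unsafe characters
    return GRAPH_ROOT + quote(remote_path) + ":/content"


def upload_file(access_token, local_path, remote_path):
    """PUT the local file's bytes to OneDrive, creating or updating the file."""
    with open(local_path, "rb") as file:
        data = file.read()
    request = urllib.request.Request(
        content_url(remote_path),
        data=data,
        method="PUT",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # metadata of the uploaded item
```

With the `requests` library used above, the equivalent call would be `requests.put(content_url(remote_path), data=data, headers=header)`.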

So basically, I always keep the Refresh Token up to date.

I call the Token endpoint using that Refresh Token, and the API gives me an Access Token to use during the current session and a new Refresh Token.

I use this new Refresh Token the next time I run the program, and so on.
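The rotation described above can be sketched as one function. This is an illustration rather than the author's code: `rotate_tokens` and the `refresh` callable are hypothetical names, and the callable stands in for the POST to the token endpoint so that the file handling can be shown (and tried) without network access. Writing to a temporary file and then renaming it is an added precaution, so that a crash in the middle of a write does not leave the only copy of the Refresh Token truncated.

```python
import os


def rotate_tokens(token_file, refresh):
    """Read the stored refresh token, trade it in, and store the new one.

    `refresh` is any callable that takes the old refresh token and returns
    the token endpoint's JSON response as a dict.
    Returns the access token for the current run.
    """
    with open(token_file) as file:
        old_refresh_token = file.read().strip()

    tokens = refresh(old_refresh_token)
    if "access_token" not in tokens:
        # Leave the stored token untouched if the refresh failed
        raise RuntimeError("token refresh failed: " + str(tokens.get("error")))

    # Keep the old refresh token if the response did not include a new one
    new_refresh_token = tokens.get("refresh_token", old_refresh_token)

    # Write to a temporary file first, then swap it into place
    temp_file = token_file + ".tmp"
    with open(temp_file, "w") as file:
        file.write(new_refresh_token)
    os.replace(temp_file, token_file)

    return tokens["access_token"]
```

In the hourly job, `refresh` would be a small function that posts `grant_type=refresh_token` to the token endpoint and returns `response.json()`.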



Answer 2:

A Python library that can help with this:

pip install cloudsync

then:

import cloudsync

prov = cloudsync.create_provider("onedrive")
creds = prov.authenticate()
prov.connect(creds)
with open("/my/local/file", "wb") as f:
    prov.download_path("/path/to/file/on/onedrive", f)