Preface
DBFS is the primary mechanism that Databricks uses to access data from external locations such as Amazon S3 buckets, Azure Blob containers, RDBMS databases, and more.
What makes DBFS powerful is its ability to behave like a local file system within Databricks by interacting with each of these storage platforms.
In other words, DBFS itself doesn’t store any data - it only acts as an interface for moving data between Databricks and the storage platforms it supports.
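For example, once a container has been mounted (which is exactly what we'll do below), its contents can be listed with dbutils or, on most cluster types, even opened through the driver's local /dbfs/ path like ordinary files. The mount path and file name here are placeholders purely for illustration:

# List the contents of a hypothetical mounted container
dbutils.fs.ls("/mnt/my-container-dbfs")

# The same data is also visible on the driver's local file system under /dbfs/
with open("/dbfs/mnt/my-container-dbfs/example.txt") as f:
    print(f.read())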
In this blog, I will quickly show you how to mount Azure Blob containers into Databricks using DBFS. Let’s begin!
Prerequisites
This post assumes you have a storage account and Blob container already set up in the Azure portal.
Steps
1. Create an Azure AD app
Go to the Azure portal
Go to Azure Active Directory
Under the Manage header on the left-hand pane, click on App registrations, then click on New registration
Enter a name for the Azure AD app
In the Redirect URI section, select Web as the type, then enter "https://localhost" as the URL.
Click Register
2. Create a client secret
Once the app is created, open the Certificates & secrets option under the Manage header
Click on New client secret
Add a description for the new client secret and select an expiration period
Click Add
Note the value of the new client secret (this will be required later)
3. Grant access to the Azure AD App
Go to the Blob container you want to grant access to
Click on Access control (IAM), then click on the + Add button
Select Add role assignment
Select the Storage Blob Data Contributor role then click Next
Click on +Select members and search for the Azure AD app you created in the previous steps
Click Select
Click Review + assign, then click Review + assign again on the next page
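Role assignments can take a few minutes to propagate. If you want to confirm the app can actually reach the container before moving on, here is a minimal sketch using the Azure Python SDK (this assumes the azure-identity and azure-storage-blob packages are installed, and every angle-bracketed value is a placeholder for your own details):

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# Authenticate as the Azure AD app created above
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net",
    credential=credential,
)

# Listing blobs only succeeds once the Storage Blob Data Contributor role has taken effect
for blob in service.get_container_client("<container-name>").list_blobs():
    print(blob.name)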
4. Create SAS token and connection string
Go to Storage accounts
Click on the storage account that contains the container you're after
Under the Security + networking pane on the left-hand menu, click on Shared access signature
Under Allowed resource types check the Service, Container and Object boxes
Configure expiry dates to your preference
Click Generate SAS and connection string
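If you'd rather generate the SAS token programmatically than through the portal, a rough equivalent with the azure-storage-blob package looks like the sketch below. The account key comes from the storage account's Access keys blade, and all the angle-bracketed values are placeholders:

from datetime import datetime, timedelta
from azure.storage.blob import generate_account_sas, ResourceTypes, AccountSasPermissions

# Account-level SAS covering the service, container and object resource types,
# mirroring the boxes ticked in the portal above
sas_token = generate_account_sas(
    account_name="<storage-account-name>",
    account_key="<storage-account-key>",
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, list=True, create=True),
    expiry=datetime.utcnow() + timedelta(days=30),
)
print(sas_token)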
5. Create a secret scope to store your Azure credentials as secrets
I’ve already written a blog post on how to create the secret scope and secrets for this stage here. Use the credentials listed below to create the secrets in that scope, then advance to the next step.
Here are the credentials you need to store as secrets at this point:
Client ID
Client secret
Tenant ID
Storage account name
Container name
SAS token
SAS connection string
Here's how to find each of these credentials:
Client ID
Go to Azure Active Directory
Click on App registrations
Click on the app you've just created
The client ID is the same as the Application (client) ID, which should appear on this page
Client secret
Go to Azure Active Directory
Click on App registrations
Click on the app you've just created
Click on Certificates & secrets
The client secret is the value of the secret created in the previous steps, which may be masked at this stage.
Note: If you don't have the value of the secret, you may be required to create another one as they are only displayed during the creation process. Follow the previous steps to create a new client secret for your app.
Tenant ID
Go to Azure Active Directory
Under the Overview header, click on the Properties tab
The Tenant ID should appear on the page
Storage account name
Go to the Azure portal
On the homepage click on Storage accounts under the Azure services pane
Click on the subscription that holds the storage account of your choice
The storage account name should be at the top left-hand side of the page, above the Storage account sub-header
Container name
Go to the Azure portal
Click on Storage accounts
Select the subscription that contains the blob storage account you're after
Under the Data storage pane on the left-hand menu, click on Containers
Select the container that contains the Blob you're after
Click on the blob you want to access
The container name should be displayed at the top of the page
SAS Token & connection string
You can only view these details immediately after creating them. If you don't have them, create a new SAS token and connection string and note their values as soon as they are generated.
The instructions for this are in step 4 above.
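Once the secrets are stored, a quick sanity check from a Databricks notebook confirms the scope and keys exist before you attempt the mount. This assumes the scope is named azure, matching the code in the next step:

# List the secret scopes visible to this workspace
for scope in dbutils.secrets.listScopes():
    print(scope.name)

# List the key names held in the "azure" scope (the secret values themselves are never returned here)
for secret in dbutils.secrets.list("azure"):
    print(secret.key)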
6. Run the configuration code to perform the mount
Retrieve the secrets from the secret scope
Read the secrets from your secret scope and other config details into Python objects:
client_id = dbutils.secrets.get(scope="azure", key="client_id")
client_secret = dbutils.secrets.get(scope="azure", key="client_secret")
tenant_id = dbutils.secrets.get(scope="azure", key="tenant_id")
storage_account_name = dbutils.secrets.get(scope="azure", key="storage_account_name")
container_name = dbutils.secrets.get(scope="azure", key="container_name")
sas_token = dbutils.secrets.get(scope="azure", key="sas_token")
sas_connection_string = dbutils.secrets.get(scope="azure", key="sas_connection_string")
blob_service_sas_url = dbutils.secrets.get(scope="azure", key="blob_service_sas_url")  # optional; not used in the mount below
source_path = f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net"
mount_point = f"/mnt/{container_name}-dbfs"
extra_configs = {
    f"fs.azure.account.auth.type.{storage_account_name}.blob.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage_account_name}.blob.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage_account_name}.blob.core.windows.net": client_id,
    f"fs.azure.account.oauth2.client.secret.{storage_account_name}.blob.core.windows.net": client_secret,
    f"fs.azure.account.oauth2.client.endpoint.{storage_account_name}.blob.core.windows.net": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    # The SAS setting below is what authorises the wasbs:// mount itself
    f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net": sas_token,
    # This setting normally expects the storage account access key rather than a connection string
    f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": sas_connection_string
}
Mount the Blob container to DBFS
dbutils.fs.mount(
    source=source_path,
    mount_point=mount_point,
    extra_configs=extra_configs
)
This will use the objects from the previous steps to mount the Blob container to the DBFS location specified.
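One thing worth knowing: dbutils.fs.mount raises an error if something is already mounted at that path. If you expect to re-run the notebook, an optional guard like this before the mount call avoids the failure:

# Unmount first if the mount point is already in use
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)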
Verify the Blob mount to DBFS
Confirm the mount job was successful by listing the objects in the DBFS mount location:
dbutils.fs.ls(mount_point)
The results should match the content in your actual Azure Blob container.
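From here the mount behaves like any other DBFS path. For example, a CSV file sitting in the container could be read straight into a Spark DataFrame (the file name below is just a placeholder):

# Read a file from the mounted container into a DataFrame
df = spark.read.csv(f"{mount_point}/example.csv", header=True, inferSchema=True)
display(df)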
Feel free to reach out via my handles: LinkedIn | Email | Twitter