How to get the metadata (file share list) of a storage account and configure the Copy Data activity for copying between storage accounts in ADF?

29-08-2024


Copying File Share Data between Azure Storage Accounts: A Detailed Guide

This article walks through copying file share data between Azure storage accounts using Azure Data Factory (ADF). We'll retrieve the file share listing with the Get Metadata activity and configure the Copy Data activity for an efficient transfer. It expands on an original Stack Overflow question on the same topic.

1. Understanding the Problem and Solution

The goal is to automate the transfer of file share data between different Azure Storage accounts, potentially residing in separate subscriptions. This requires a two-step approach:

  1. Get Metadata: List the file share contents (child items) in the source storage account.
  2. Copy Data: Copy those folders and files to the destination storage account, as shown in the pipeline sketch below.
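
To make the wiring concrete, here is a minimal sketch of a pipeline definition tying the two activities together. This is an illustration rather than the original poster's pipeline: it assumes Binary datasets over the Azure File Storage connector (sketched in the next section), the activity and dataset names are invented, and the JSON is written to a file for deployment with the PowerShell script in section 4.

# CopyFileSharesPipeline.json -- hypothetical definition file, written from PowerShell
@'
{
    "name": "CopyFileSharesPipeline",
    "properties": {
        "activities": [
            {
                "name": "GetFileShareItems",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": { "referenceName": "SourceFileShareDataset", "type": "DatasetReference" },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "CopyFileShareData",
                "type": "Copy",
                "dependsOn": [
                    { "activity": "GetFileShareItems", "dependencyConditions": [ "Succeeded" ] }
                ],
                "inputs": [ { "referenceName": "SourceFileShareDataset", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "DestinationFileShareDataset", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": { "type": "AzureFileStorageReadSettings", "recursive": true }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": { "type": "AzureFileStorageWriteSettings" }
                    }
                }
            }
        ]
    }
}
'@ | Set-Content -Path ".\CopyFileSharesPipeline.json"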

2. Setting up Azure Data Factory

Linked Services:

  • Source Storage: Create an Azure File Storage linked service for your source storage account (as shown in the original Stack Overflow post), configured with the account's connection string or access key.
  • Destination Storage: Create a matching Azure File Storage linked service for your destination storage account, again with its own connection string or access key. A sketch of a linked service definition follows below.
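
As a sketch of what the source linked service definition might contain (account-key authentication via the Azure File Storage connector; the account name, key, and share name are placeholders), written to a JSON file from PowerShell. The destination linked service is identical apart from its name and credentials:

# SourceStorageLinkedService.json -- hypothetical definition file; placeholder credentials
@'
{
    "name": "SourceStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=yourSourceStorageAccountName;AccountKey=yourSourceStorageAccountKey;EndpointSuffix=core.windows.net",
            "fileShare": "yourSourceFileShare"
        }
    }
}
'@ | Set-Content -Path ".\SourceStorageLinkedService.json"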

Get Metadata Activity:

  • Dataset: Create a new dataset over the source Azure File Storage linked service (a Binary dataset works well) to represent the file share data; the dataset carries the linked service, so it tells the activity where to look (see the sketch below).
    • Folder Path: Set this to the root of your file share in the source storage account, so the activity can enumerate everything beneath it.
    • File Name: Leave this field blank, since we want the folder listing rather than an individual file.
  • Field List: Add the Child Items field; this is what makes the activity return the list of folders and files under the path.
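
A sketch of the source dataset definition that the Get Metadata activity can point at (a Binary dataset over the linked service above; the folder path is a placeholder):

# SourceFileShareDataset.json -- hypothetical definition file
@'
{
    "name": "SourceFileShareDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "SourceStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureFileStorageLocation",
                "folderPath": "yourSourceFolderPath"
            }
        }
    }
}
'@ | Set-Content -Path ".\SourceFileShareDataset.json"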

Copy Data Activity:

  • Source:

    • Linked Service: Choose your source storage linked service.
    • Dataset: Select the source dataset you created for the Get Metadata activity.
    • Wildcard Path: To match everything under the folder, use a wildcard file path such as *; in the underlying JSON this becomes the wildcardFileName read setting.
  • Destination:

    • Linked Service: Choose your destination storage linked service.
    • Dataset: Create a matching dataset for the destination storage account (a sketch follows below).
    • Folder Path: Specify the path in the destination account where the data should land, typically the root of your desired destination folder.
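
A matching sketch for the destination dataset (again a Binary dataset; the folder path is a placeholder):

# DestinationFileShareDataset.json -- hypothetical definition file
@'
{
    "name": "DestinationFileShareDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "DestinationStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureFileStorageLocation",
                "folderPath": "yourDestinationFolderPath"
            }
        }
    }
}
'@ | Set-Content -Path ".\DestinationFileShareDataset.json"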

3. Addressing the Error and Additional Considerations

Error Analysis:

The original post mentions an error: {"code": "BadRequest", "message": null, ...}. This error usually means that the path specified in the dataset used by the Get Metadata activity does not exist, or that Data Factory lacks permission to access the data.

Troubleshooting Tips:

  • Verify Dataset Path: Double-check that the folder path in the source dataset used by the Get Metadata activity actually exists in the source account.
  • Access Permissions: Ensure your Azure Data Factory has read access to the source storage account and write access to the destination storage account. Both checks can be made from PowerShell, as shown below.
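
Outside of ADF, you can quickly confirm that the shares exist and that the account key works using the Az.Storage module; this sketch uses placeholder account details:

# Build a storage context from the source account name and key
$ctx = New-AzStorageContext -StorageAccountName "yourSourceStorageAccountName" `
    -StorageAccountKey "yourSourceStorageAccountKey"

# List the file shares in the account -- these names must match your dataset paths
Get-AzStorageShare -Context $ctx | Select-Object -Property Name

# List the top-level folders and files inside one share
Get-AzStorageFile -ShareName "yourSourceFileShare" -Context $ctx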

Additional Tips:

  • Incremental Transfers: To copy only new or changed data, filter the source on file modification timestamps using the modifiedDatetimeStart and modifiedDatetimeEnd settings of the Copy activity source (a sketch follows this list).
  • Performance Optimization: For large transfers, consider raising the Copy activity's parallelCopies and data integration unit settings, or partitioning your file shares, to increase throughput.
  • Data Integrity: Verify the data integrity after the copy is complete by comparing the files in the source and destination locations.
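
For the incremental case, the source side of the Copy activity can restrict the copy to a modification-time window. This fragment (with placeholder timestamps; in practice these usually come from pipeline parameters) would replace the "source" object in the pipeline sketch from section 1:

"source": {
    "type": "BinarySource",
    "storeSettings": {
        "type": "AzureFileStorageReadSettings",
        "recursive": true,
        "modifiedDatetimeStart": "2024-08-01T00:00:00Z",
        "modifiedDatetimeEnd": "2024-08-29T00:00:00Z"
    }
}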

4. Example Code (Using PowerShell)

The script below is a sketch using cmdlets from the Az.DataFactory module (Install-Module Az.DataFactory): it deploys the JSON definition files sketched in the sections above and then triggers a run. The resource group, data factory name, and definition file paths are placeholders.

# Common deployment parameters -- replace with your own values
$resourceGroupName = "yourResourceGroupName"
$dataFactoryName   = "yourDataFactoryName"

# Deploy the linked services from their JSON definition files
Set-AzDataFactoryV2LinkedService -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName `
    -Name "SourceStorageLinkedService" -DefinitionFile ".\SourceStorageLinkedService.json" -Force
Set-AzDataFactoryV2LinkedService -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName `
    -Name "DestinationStorageLinkedService" -DefinitionFile ".\DestinationStorageLinkedService.json" -Force

# Deploy the source and destination datasets
Set-AzDataFactoryV2Dataset -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName `
    -Name "SourceFileShareDataset" -DefinitionFile ".\SourceFileShareDataset.json" -Force
Set-AzDataFactoryV2Dataset -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName `
    -Name "DestinationFileShareDataset" -DefinitionFile ".\DestinationFileShareDataset.json" -Force

# Deploy the pipeline containing the Get Metadata and Copy activities
Set-AzDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName `
    -Name "CopyFileSharesPipeline" -DefinitionFile ".\CopyFileSharesPipeline.json" -Force

# Trigger a run and capture the run ID
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName -PipelineName "CopyFileSharesPipeline"

# Check the status of the run
Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName -PipelineRunId $runId

This PowerShell script deploys the linked services, datasets, and pipeline from their JSON definitions and starts a run. Remember to customize it with your own resource group, data factory name, and definition file paths.

5. Conclusion

By combining the Get Metadata and Copy Data activities in Azure Data Factory, you can automate the copying of file shares between Azure storage accounts. Understanding the error messages, configuring the datasets carefully, and tuning the copy operation let you move data reliably between storage locations. This automation supports efficient data management and scales to large file share transfers.