Wherever you look, you'll find tools and resources to help you migrate from something that is not Cosmos DB to Cosmos DB. The Cosmos DB data migration tool can, at best, export to a JSON file, and that's rarely good enough when you need to move around more than a handful of documents.
Now assume that you decided to move your data out of Cosmos DB. I ain't asking you why. Either way, you're gonna have a very hard time finding a ready-made solution for this.
I recently had such a use case for a client. They wanted to migrate their data from Cosmos DB to SQL Server. They figured out that they would benefit from it both in terms of costs as well as performance in their particular scenario.
It took me a while to realize that I’m not going to find a publicly available migration tool for the job. So… You guessed it… I decided to make one on my own.
Building a simple dotnet console app wasn't too hard, actually. There were a few quirks to work out, such as how to stream a lot of data out of Cosmos DB without it timing out, or how to efficiently bulk-insert data into SQL Server. Eventually, I got it working. Well… Roughly.
The app has a configuration file to let you easily configure a bunch of important stuff, such as connection details to your source and destination, and a few other settings. Other than that, it’s as straightforward as can be.
{
  "CosmosDB": {
    "sourceEndPoint": "https://<cosmosdbaccount>.documents.azure.com:443/",
    "sourceAuthKey": "<CosmosDbAccessKey>",
    "sourceDatabase": "CosmosDbDatabase",
    "sourceCollection": "CosmosDbCollection",
    "sourceQuery": "SELECT c.Field1, c.Field2, c.Field3 FROM c",
    "sourceMaxCountPerFetch": 1000
  },
  "SQLServer": {
    "targetHost": "<TargetSqlServerName>.database.windows.net",
    "targetDatabase": "SqlDatabaseName",
    "targetUsername": "SqlUsername",
    "targetPassword": "SqlPassword",
    "rowsPerChunk": 5000,
    "mergeProcedure": null,
    "useBulkCopy": true,
    "targetTable": "dbo.SqlTargetStagingTable",
    "truncateTargetTable": true,
    "targetProcedureTVP": "SqlStoredProcedureWithTVP",
    "maxRetries": 10,
    "delaySecondsBetweenRetries": 30
  },
  "FieldsToCopy": [
    "Field1",
    "Field2",
    "Field3"
  ]
}
How does it work?
The general idea is this:
The app executes a Cosmos DB query and collects a number of records into its “buffer”.
Once that "buffer" reaches a certain number of records (configurable), it's time to "flush" it into SQL Server. The target could be either a database table receiving a Bulk Copy stream, or a stored procedure receiving a table-valued parameter (again, configurable).
After the buffer is flushed, there's the option to execute a "merge" procedure. This is a stored procedure that implements "upsert" logic, moving data from the "staging" table into the actual destination table.
The merge procedure is optional (yup, configurable). If you don't specify it in the settings, the app simply continues on to the next group of records without executing a stored procedure between flushes.
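In other words, the buffer-and-flush cycle boils down to something like the following. This is only an illustrative sketch in Python (the actual app is a dotnet console app), and the `flush` and `merge` callables are placeholders standing in for the Bulk Copy / TVP call and the merge procedure:

```python
def migrate(documents, flush, merge=None, rows_per_chunk=5000):
    """Collect source records into a buffer and flush it chunk by chunk.

    documents: any iterable of source records (e.g. a Cosmos DB query stream)
    flush:     callable that writes one buffered chunk to SQL Server
               (bulk copy into a staging table, or a TVP stored procedure)
    merge:     optional callable that runs the "upsert" merge procedure
               after each flush

    Returns the total number of records flushed.
    """
    buffer = []
    flushed = 0
    for doc in documents:
        buffer.append(doc)
        if len(buffer) >= rows_per_chunk:
            flush(buffer)
            flushed += len(buffer)
            buffer = []  # start a fresh chunk (don't clear in place)
            if merge is not None:
                merge()  # upsert staging rows into the destination table
    if buffer:  # flush whatever is left over at the end
        flush(buffer)
        flushed += len(buffer)
        if merge is not None:
            merge()
    return flushed
```

Reassigning `buffer = []` instead of clearing it in place keeps each flushed chunk intact in case the flush target holds on to the list.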
Additionally, there's the "FieldsToCopy" setting, which holds a simple string array of the columns that should actually be copied over from the Cosmos DB results to the SQL Server target. This is useful because you cannot guarantee that every field you want is always returned with a value from the Cosmos DB query (after all, this is a NoSQL document database we're dealing with, so any given field may or may not exist in any given document).
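Conceptually, that projection step amounts to something like this (again, a Python sketch for illustration only, with a hypothetical `project_fields` helper rather than the app's actual code):

```python
def project_fields(document, fields_to_copy):
    """Build one output row containing only the configured columns.

    A field that's missing from the source document becomes None,
    which maps to NULL in the SQL Server target.
    """
    return {field: document.get(field) for field in fields_to_copy}
```

A document missing "Field2", for example, still produces a complete row; it just carries a NULL in that column.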
There isn’t much sophistication beyond that in the app, even though there’s plenty more that could’ve been added.
Where to get it
As usual, I’m using GitHub for my open-source projects. I created a new repository to host this project and it’s available here:
https://github.com/MadeiraData/CosmosDb-to-MSSQL-Migration
Feel free to download, experiment, and contribute (issues, pull requests… you know the drill).