Janne Mattila

From programmer to programmer -- Programming just for the fun of it

OneDrive tool to help me get out of the mess

Posted on: June 10, 2024

I have been taking pictures and videos for a long time. This means that I have had media files in various memory cards and hard drives over the years:

When OneDrive came, I started to use that to store my media files. I’ve tried to be careful and make backup of the files to external hard drives. But, as you can imagine, over the years I have ended up in mess.

I suddenly noticed that certain files are not in OneDrive. This is from times when phones and cameras were not automatically uploading the files to cloud.

I had to analyze the situation a bit:

Therefore, I decided to write a small tool to help me out: OneDrive tool.

OneDrive tool

Export

I started to think how to solve the problem. First, I needed to get the metadata of the files from OneDrive to a CSV file. If I have the metadata in CSV file, I can easily analyze it with PowerShell or in Excel. And all of this offline without calling to the OneDrive APIs.

Here is how I exported the data from OneDrive:

OneDriveTool `
  --export `
  --onedrive-file onedrive-export.csv

Tool starts to scan the files:

After 18 minute minutes, the tool has exported the data to CSV file:

Size of the file was 18 MB:

Here’s how the exported CSV looks like:

Few important columns from the CSV file:

I can now quickly analyze that I’ve indeed exported the data correctly:

$oneDriveCSV = Import-Csv -Path onedrive-export.csv -Delimiter ";"
"$($oneDriveCSV.Count) files found."

$oneDriveCSV | `
    Select-Object -Property Size -ExpandProperty Size | `
    Measure-Object -Sum | `
    Select-Object -ExpandProperty Sum | `
    ForEach-Object { $_ / 1GB }

Output:

73099 files found.
785,524600764737

Hard drive scan

Next, I wanted to scan the hard drives. Here’s the command I used:

OneDriveTool `
  --scan D:\Backup `
  --scan-file backup-harddrive1.csv `
  --onedrive-file onedrive-export.csv

Two new parameters needed:

Idea of this is to scan the files from hard drive and compare them to the files in OneDrive. By compare, I mean SHA1 hash comparison. In the export step, I exported the SHA1 hash of the files from OneDrive. For local files, I need to calculate the SHA1 hash so that I can compare it to the OneDrive hash. If the hashes match, then the file is already in OneDrive.

Tool starts to scan the files:

In the above screenshot, you can see that some of the files are already in OneDrive but some are not.

The above was test run but the real hard drive scan took a long time to finish:

After the scan, I have now CSV file with the scan results:

As you can see, there is InOneDrive column which tells if the local file is already in OneDrive or not.

The first hard drive had less than 100 files which are not in OneDrive. I decided to copy all the missing .jpg files to a temporary folder waiting for upload:

$csv = Import-Csv -Path backup-harddrive1.csv -Delimiter ";"

$filesMissingFromOneDrive = $csv | `
  Where-Object -Property InOneDrive -Value "FALSE" -EQ

"$($filesMissingFromOneDrive.Count) files found which are not available in OneDrive"

$jpgFiles = $filesMissingFromOneDrive | `
  Where-Object -Property Name -Value "*.jpg" -Like | `
  Select-Object @{
    Name       = 'Output';
    Expression = { $_.Path + "/" + $_.Name }
} | Select-Object -ExpandProperty Output

Copy-Item $jpgFiles \temp\copies -Force

The second hard drive had actually more files which were not in OneDrive. Also many of the file names were not unique.

I decided to generate unique names for the files and copy them to a temporary folder:

$csv = Import-Csv -Path backup-harddrive2.csv -Delimiter ";"

$filesMissingFromOneDrive = $csv | `
  Where-Object -Property InOneDrive -Value "FALSE" -EQ

"$($filesMissingFromOneDrive.Count) files found which are not available in OneDrive"

$allFiles = $filesMissingFromOneDrive | `
  Where-Object -Property Name -Value "*.info" -NotLike | `
  Where-Object -Property Name -Value "*.db" -NotLike | `
  Where-Object -Property Name -Value "*.ini" -NotLike | `
  Select-Object @{
    Name       = 'Output';
    Expression = { $_.Path + "/" + $_.Name }
} | Select-Object -ExpandProperty Output

$index = 10000
foreach ($source in $allFiles) {
    $destinationExtension = Split-Path -Path $source -Resolve -Extension
    Copy-Item $source "\temp\copies\$index$destinationExtension" -Force
    $index++
}

After the above, I just uploaded the files to the OneDrive using the browser. Here are updated stats after the upload:

78617 files found.
793,870651786216

It means that 5000 extra files were found during this process.

Here is the tool in GitHub:

What’s next?

One of the cameras I have used in the past is this Sony Camcorder:

It has been a good camera for me but there is one annoying thing with it. It saves the videos in .mts format. Unfortunately, OneDrive does not support playing .mts files directly in the app or in the browser. This means that I have to either download each video to local machine and watch it there or then I’ll have to convert the files to .mp4 format before uploading them to OneDrive.

Since I have my OneDrive data in CSV file, I can analyze it easily with PowerShell.

$oneDriveCSV = Import-Csv -Path onedrive-export.csv -Delimiter ";"
$mtsFiles = $oneDriveCSV | `
  Where-Object -Property Name -Value "*.mts" -Like

"$($mtsFiles.Count) .mts files found."

$mtsFiles | `
    Select-Object -Property Size -ExpandProperty Size | `
    Measure-Object -Sum | `
    Select-Object -ExpandProperty Sum | `
    ForEach-Object { $_ / 1GB }

Output from the above is:

811 .mts files found.
164,926849365234

So, I have 811 .mts files in my OneDrive and they take 164,9 GB of space. This is something I need to take care of next. Expect a blog post about that in the future.

I hope you find this useful!