As I have been speaking at a number of events recently, I have also been updating my GitHub Events repository. I usually include a markdown file with a short description, my demos and my slides. I had been uploading my slides as .pptx files, and I noticed that the repository had edged over 100 MB. This prompted me to reconsider this approach. I felt I needed to address the following:
- Use the most compatible format available: presentations should be viewable on any device
- Fonts should be correctly represented
- File size should be minimal
To use the available space more efficiently and to switch to a more compatible format, I decided to convert my presentations to .pdf.
Because I do not like doing things manually, I decided to use PowerShell in combination with a bit of bash scripting to get my repository updated. First, let's take a look at what kind of data we are dealing with:
```powershell
Get-ChildItem C:\git\Events -File -Filter *pptx -Recurse |
Select-Object -Property FullName
```
In total there are 29 presentations uploaded in .pptx format. Converting these by hand would take about 30 minutes. Exploring what is possible with the PowerPoint.Application COM object took about 5 minutes, and putting together the following script took another 5:
```powershell
Get-ChildItem C:\git\Events -File -Filter *pptx -Recurse |
ForEach-Object -Begin {
    # Load the PowerPoint interop types and start PowerPoint once
    $null = Add-Type -AssemblyName Microsoft.Office.Interop.PowerPoint
    $SaveOption = [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsPDF
    $PowerPoint = New-Object -ComObject "PowerPoint.Application"
} -Process {
    # Open each deck and save a .pdf copy next to the original
    $Presentation = $PowerPoint.Presentations.Open($_.FullName)
    $PdfNewName = $_.FullName -replace '\.pptx$','.pdf'
    $Presentation.SaveAs($PdfNewName,$SaveOption)
    $Presentation.Close()
} -End {
    # Quit PowerPoint; Stop-Process closes ANY running PowerPoint instance
    $PowerPoint.Quit()
    Stop-Process -Name POWERPNT -Force
}
```
This script recursively looks for all .pptx files in the Events repository and then runs the following code:
- In the begin block, load the type required for saving files as .pdf and create the PowerPoint COM object
- For each presentation, open it, generate the new file name and save it as .pdf
- Finally, in the end block, quit the PowerPoint application and then use Stop-Process to close the remaining window. Note that if you had any other PowerPoint windows open, they will also be closed.
Now that I have both the .pdf and the .pptx files stored in the folder, let's take a look at the difference in file size:
```powershell
foreach ($Extension in ('pptx','pdf')) {
    Get-ChildItem C:\git\Events -File -Filter "*$Extension" -Recurse |
    Measure-Object -Property Length -Sum | ForEach-Object {
        [pscustomobject]@{
            'SizeinMB'  = [math]::Round($_.Sum/1MB,2)
            'Extension' = $Extension
        }
    }
}
```
A nice decrease in size and a format that is more suitable for sharing, so this is looking good. After verifying that the .pdf files look correct, we can remove the .pptx files with the following code:
```powershell
Get-ChildItem C:\git\Events -File -Filter *pptx -Recurse |
Remove-Item -Force
```
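Remove-Item deletes every .pptx it finds, whether or not its conversion actually succeeded. As a more cautious alternative, here is a minimal POSIX shell sketch (it should also run in Git Bash) that only deletes a .pptx when a non-empty .pdf with the same base name sits next to it. The function name `remove_converted_pptx` is hypothetical, and unlike `-Filter *pptx` on Windows, `find -name '*.pptx'` is case-sensitive, so files ending in `.PPTX` would be skipped.

```shell
# Hypothetical helper: deletes each .pptx under the given folder only when a
# non-empty .pdf with the same base name exists next to it.
# Returns a non-zero status if any .pptx had to be kept.
remove_converted_pptx() {
  find "$1" -type f -name '*.pptx' -exec sh -c '
    status=0
    for pptx do
      pdf="${pptx%.pptx}.pdf"        # deck.pptx -> deck.pdf
      if [ -s "$pdf" ]; then          # -s: file exists and is not empty
        rm -f -- "$pptx"
      else
        echo "No PDF for: $pptx" >&2  # keep the .pptx and report it
        status=1
      fi
    done
    exit "$status"
  ' sh {} +
}
```

In Git Bash the repository from this post would live at `/c/git/Events`, so the call would be `remove_converted_pptx /c/git/Events`. Any presentation without a matching PDF is listed on stderr and left in place.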
The last step is to commit everything to GitHub and make it available to everyone. I found a nice Stack Overflow thread that explains how to remove multiple files at once:
Removing multiple files from a Git repo that have already been deleted from disk
This left me with the following bash commands to commit everything to the repository:
```bash
git ls-files --deleted -z | xargs -0 git rm
git add *
git commit -m "Removed pesky pptx and added glorious pdf"
git push origin master
```
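As a side note, `git add -A` (short for `--all`) stages new, modified and deleted files across the whole working tree in one step, which may make the separate `git ls-files --deleted ... | xargs -0 git rm` step unnecessary. Here is a minimal sketch in a throwaway repository; the file names are made up for illustration and nothing in the Events repository is touched.

```shell
# Throwaway repository to show what `git add -A` stages
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
printf 'slides\n' > talk.pptx
git add talk.pptx
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add pptx"

# Simulate the conversion: the .pptx is deleted and a .pdf appears
rm talk.pptx
printf 'pdf\n' > talk.pdf

# A single command stages both the deletion and the new file
git add -A
git status --porcelain
```

The status output should list `talk.pdf` as added and `talk.pptx` as deleted, both staged and ready for a single commit.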
To view the result, here is what the repository looks like on GitHub now, along with the commit:
GitHub – JaapBrasser – Events – Commits
Let me know what you think: is .pdf a more useful format than .pptx for sharing presentations, or would you rather see it the other way around?