I’m working through converting my blog from Drupal to Jekyll (it’s a long story), and one of the things I needed to do was convert a bunch of posts originally written in HTML into Markdown. With a little application of PowerShell, most of the heavy lifting was done fairly quickly, leaving just a manual review and tweak of each post.

Here’s the core of the PowerShell script I used:

```powershell
foreach( $source in (get-childitem .\_posts\*.md )) {
    $sourceName = $source.Name
    Write-Host $sourceName

    # Load the contents of the file as a single string, joining lines with CRLF
    $content = (get-content -LiteralPath $source.FullName) -join "`r`n"

    # Convert links from <a> to Markdown style
    $content = $content -replace '<a\s+href="([^"]+)">([^<]+)</a>', '[$2]($1)'

    # Convert paragraphs and lists
    $content = $content -replace "\s*<ul>\s*", "`r`n"
    $content = $content -replace "\s*</ul>\s*", "`r`n"
    $content = $content -replace "\s*<ol>\s*", "`r`n"
    $content = $content -replace "\s*</ol>\s*", "`r`n"
    $content = $content -replace "<p>", "`r`n"
    $content = $content -replace "</p>", "`r`n"
    $content = $content -replace "<li>", "`r`n  *  "
    $content = $content -replace "</li>", ""

    # Word wrap each paragraph at 120 characters
    # (wrap-string is not a built-in cmdlet; its definition isn't shown here)
    $content = ($content -split "`r`n" | foreach-object { wrap-string $_ 120 }) -join "`r`n"

    # Word/Phrase highlighting
    $content = $content -replace "<em>", "*"
    $content = $content -replace "</em>", "*"
    $content = $content -replace "<b>", "**"
    $content = $content -replace "</b>", "**"
    $content = $content -replace "<strong>", "**"
    $content = $content -replace "</strong>", "**"
    $content = $content -replace "&quot;", "'"

    # Drop Drupal's teaser break marker
    $content = $content -replace "<!--break-->", ""

    # Eliminate excess whitespace: empty out whitespace-only lines, then
    # collapse any run of three or more line breaks into a single blank line
    $content = $content -replace '(?m)^[ \t]+(?=\r?$)', ''
    $content = $content -replace '(\r\n){3,}', "`r`n`r`n"

    set-content .\_processed\$sourceName -value $content
}
```
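
The script calls `wrap-string`, which isn't a built-in cmdlet and whose definition isn't part of this post. If you need something to fill that gap, here's a minimal sketch of a greedy word-wrapper with the same calling convention (`wrap-string <line> <width>`). The parameter names and the handling of over-long words are assumptions, not the original helper.

```powershell
# Hypothetical stand-in for the wrap-string helper used in the script.
# Greedily packs whitespace-separated words into lines of at most $Width
# characters; a single word longer than $Width is left whole on its own line.
function Wrap-String {
    param(
        [string] $Text,
        [int] $Width = 120
    )

    # Blank or whitespace-only input stays blank so paragraph breaks survive.
    if ([string]::IsNullOrWhiteSpace($Text)) { return '' }

    # Keep any leading indentation (e.g. a list prefix) on the first line only.
    $indent = ([regex]::Match($Text, '^[ \t]*')).Value

    $lines = New-Object System.Collections.Generic.List[string]
    $current = ''
    foreach ($word in ($Text -split '\s+' | Where-Object { $_ -ne '' })) {
        if ($current.Length -eq 0) {
            # First word of the whole input
            $current = $indent + $word
        }
        elseif ($current.Length + 1 + $word.Length -le $Width) {
            # Word fits on the current line
            $current = "$current $word"
        }
        else {
            # Line is full: emit it and start a new one with this word
            $lines.Add($current)
            $current = $word
        }
    }
    if ($current.Length -gt 0) { $lines.Add($current) }

    return ($lines -join "`r`n")
}
```

Because each line is wrapped independently, the blank lines left behind by the `<p>` replacements pass through unchanged. Note that runs of spaces between words are collapsed to one, so a `  *  ` list prefix comes out as `  * `, which is still a valid Markdown bullet.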

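The link pattern in the script only matches anchors where `href` is the sole attribute. If the exported HTML also carries attributes such as `title` or `target`, a looser pattern will catch more of them. A sketch, with `Convert-HtmlLinks` as a hypothetical function name:

```powershell
# Hypothetical helper: converts <a ...href="url"...>text</a> to [text](url).
# Attributes before or after href are tolerated (and discarded).
# Links whose text contains nested tags (e.g. <em>) are left untouched.
function Convert-HtmlLinks {
    param([string] $Html)

    $pattern = '<a\s+[^>]*?href="([^"]+)"[^>]*>([^<]+)</a>'

    # Single quotes keep $1/$2 as regex group references rather than
    # PowerShell variables.
    return $Html -replace $pattern, '[$2]($1)'
}
```

Anchors with nested markup still need the manual review pass, as with the original pattern.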