Converting HTML to Markdown
Author: Tracy Wertz

At Alta3 Research we use Markdown to format all of our amazing hands-on labs, internal documentation, and even this blog!
Often, we want to convert HTML from a documentation website into Markdown. The conversion doesn’t need to be 100% accurate, but we do want that 80% solution, one that saves us from rewriting everything by hand.
These steps will show you how to acquire clean HTML from an online source and convert it to Markdown using the settings we prefer.
The first step in any problem-solving exercise is to narrow the scope of the problem. In this case, our goal is to acquire the cleanest, least cluttered version of the HTML to convert.
For this example we will use a documentation page on kubernetes.io.
- Navigate to the page, right-click the page title, and choose Inspect Element.
- With DevTools open, the `<h1>` element should be highlighted. As you mouse over other elements, the page highlights each one visually.
- Find the outermost DOM element that contains all of the HTML to be converted. In this case, we are looking for an element that looks like `<div id="docsContent"> ... lots of content here ... </div>`.
- Right-click the containing DOM element (in this case, `<div id="docsContent">`) and choose Copy > Copy Element.
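If you would rather script this step than click through DevTools, here is a minimal sketch using only Python’s standard-library `html.parser`. It assumes you already have the page’s full HTML as a string (for example, saved from the browser) and that every non-void element inside the target is explicitly closed; `ElementExtractor` and `extract_element_by_id` are hypothetical names, not part of DevTools or TurnDown.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementExtractor(HTMLParser):
    """Hypothetical helper: collect the outer HTML of the first element with a given id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; exactly as written so the copied HTML stays valid.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # open tags inside the target (0 = outside it)
        self.done = False   # True once the target's closing tag has been seen
        self.parts = []

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.target_id:
            return  # not yet inside the target element
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())  # self-closing, e.g. <br/>

    def handle_endtag(self, tag):
        if self._inside():
            self.parts.append(f"</{tag}>")
            self.depth -= 1
            self.done = self.depth == 0  # closed the target itself

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f"&#{name};")


def extract_element_by_id(html, target_id):
    """Hypothetical helper: return the element's outer HTML, or None if not found."""
    parser = ElementExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


page = (
    "<html><body><nav>menu</nav>"
    '<div id="docsContent"><h1>Pods</h1>'
    '<div class="note">A <em>Pod</em> &amp; more<br>text</div></div>'
    "<footer>footer</footer></body></html>"
)
print(extract_element_by_id(page, "docsContent"))
```

The depth counter is what lets the nested `<div class="note">` close without ending the capture early. The returned string can be pasted into TurnDown’s left HTML panel just like an element copied from DevTools.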
Next, we need to do the actual conversion. For this step we prefer a tool like TurnDown, which handles the HTML-to-Markdown conversion well and also lets us tweak the output to match our preferred Markdown style.
- Once on the TurnDown page, select all of the pre-populated content in the left HTML panel and delete it (Ctrl+A, then Delete).
- Paste the copied DOM element into the left panel. The converted Markdown should appear in the right panel.
- Before rushing off with that generated Markdown, we adjust a few of the drop-down options at the bottom of the page. Here are our preferences:
  - Heading style: `atx`
  - Horizontal rule: `___`
  - Bullet: `*`
  - Code block style: `fenced`
  - Fence: `` ``` ``
  - Em delimiter: `*`
  - Strong delimiter: `**`
  - Link style: `inlined`
  - Link reference style: `full`
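To make these preferences concrete, here is a toy converter in Python that applies the same choices: atx headings, `___` rules, `*` bullets, fenced code, `*` and `**` delimiters, and inlined links. It is not TurnDown, only a hypothetical sketch built on the standard-library `html.parser`; `TinyMarkdownConverter`, `html_to_markdown`, and the constant names are made up for illustration, and it handles just a handful of tags (no nested lists, tables, or images).

```python
import re
from html.parser import HTMLParser

# Our preferred settings from the list above, as plain constants (hypothetical names).
HR = "___"      # Horizontal rule
BULLET = "*"    # Bullet list marker
FENCE = "```"   # Fence used for fenced code blocks
EM = "*"        # Em delimiter
STRONG = "**"   # Strong delimiter


class TinyMarkdownConverter(HTMLParser):
    """Toy HTML-to-Markdown converter for a handful of tags (hypothetical, not TurnDown)."""

    def __init__(self):
        super().__init__()   # convert_charrefs=True, so entities arrive decoded
        self.out = []        # Markdown fragments, joined at the end
        self.href = ""       # href of the link currently being converted
        self.in_pre = False  # inside <pre>: collect raw text, skip inline formatting
        self.pre_buf = []

    def handle_starttag(self, tag, attrs):
        if self.in_pre:
            return  # ignore highlighting <span>s etc. inside code blocks
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")  # atx heading
        elif tag in ("p", "ul"):
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append(f"\n{BULLET} ")
        elif tag in ("em", "i"):
            self.out.append(EM)
        elif tag in ("strong", "b"):
            self.out.append(STRONG)
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag == "hr":
            self.out.append(f"\n\n{HR}\n\n")
        elif tag == "pre":
            self.in_pre, self.pre_buf = True, []
        elif tag == "code":
            self.out.append("`")  # inline code (code inside <pre> is skipped above)

    def handle_endtag(self, tag):
        if self.in_pre and tag != "pre":
            return
        if tag in ("em", "i"):
            self.out.append(EM)
        elif tag in ("strong", "b"):
            self.out.append(STRONG)
        elif tag == "a":
            self.out.append(f"]({self.href})")  # inlined link style
        elif tag == "ul":
            self.out.append("\n\n")
        elif tag == "pre":
            self.in_pre = False
            code = "".join(self.pre_buf).rstrip("\n")
            self.out.append(f"\n\n{FENCE}\n{code}\n{FENCE}\n\n")  # fenced code block
        elif tag == "code":
            self.out.append("`")

    def handle_data(self, data):
        if self.in_pre:
            self.pre_buf.append(data)  # keep code exactly as written
        else:
            self.out.append(re.sub(r"\s+", " ", data))  # collapse whitespace like a browser


def html_to_markdown(html):
    """Hypothetical helper: convert a small HTML fragment using the settings above."""
    parser = TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r"[ \t]+\n", "\n", text)  # drop trailing spaces (also inside code blocks)
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line (also inside code blocks)
    return text.strip() + "\n"


html = (
    "<h1>Pods</h1>"
    "<p>A <em>Pod</em> is the <strong>smallest</strong> unit. "
    'Run <code>kubectl</code> or see <a href="https://example.com">the docs</a>.</p>'
    "<ul><li>one</li><li>two</li></ul><hr>"
    "<pre><code>kubectl get pods\n</code></pre>"
)
print(html_to_markdown(html))
```

Tags outside this small set are dropped and only their text survives, a small-scale version of the losses described below.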
Now you have a well-formatted Markdown document that can be copied out of the right panel and pasted into other reference material. As we said at the start of this tutorial, this is likely to be an 80% solution, because there may be:
- Additional content you didn’t want, which will need to be edited out
- HTML elements or styling with no Markdown equivalent, which will be lost
- Elements that need Markdown styling applied by hand on top of the conversion results
As always: “Trust but verify!”
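One way to start verifying is a quick scan for raw HTML that survived the conversion. The sketch below is a hypothetical helper, `find_leftover_html`, that flags lines still containing something that looks like an HTML tag, skipping fenced code blocks and single-backtick inline code. Its regular expression is a rough heuristic, so expect the occasional false positive or miss.

```python
import re

# Rough heuristic: something shaped like an HTML opening or closing tag, e.g. <div> or </span>.
TAG_RE = re.compile(r"</?[a-zA-Z][a-zA-Z0-9-]*(\s[^<>]*)?/?>")


def find_leftover_html(markdown):
    """Hypothetical helper: return (line_number, line) pairs with HTML outside code."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        if in_fence:
            continue
        # Remove single-backtick inline code spans such as `<h1>` before searching.
        prose = re.sub(r"`[^`]*`", "", line)
        if TAG_RE.search(prose):
            hits.append((number, line))
    return hits


sample = "\n".join([
    "# Pods",
    "",
    "Use the `<h1>` element for titles.",
    '<div class="note">Leftover wrapper</div>',
    "```",
    "<p>HTML inside a code block is fine</p>",
    "```",
    "See <https://example.com> for more.",
])
for number, line in find_leftover_html(sample):
    print(f"line {number}: {line}")
```

Any flagged line is a candidate for manual cleanup; an empty result means only that this heuristic found nothing, not that the conversion is perfect.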
Hopefully, this short tutorial on sourcing and converting HTML to Markdown was useful!