- Square brackets with special handling for tables and spans
5. Adds prettier-ignore comments before and after tables to preserve formatting
6. Renames HTML classes to use hyphens instead of underscores
7. Applies special handling for reference pages
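
As a rough illustration of steps 5 and 6, the sketch below wraps tables in Prettier's range-ignore comments and rewrites class names. It is not the project's implementation: the helper names `wrap_tables` and `hyphenate_classes` are hypothetical, and it assumes tables appear as Markdown pipe tables (lines starting with `|`) and that classes are written as double-quoted `class="..."` attributes.

```python
import re


def wrap_tables(markdown: str) -> str:
    """Surround each run of pipe-table lines with prettier-ignore comments.

    Assumption: a table is any consecutive run of lines starting with "|".
    """
    out, in_table = [], False
    for line in markdown.split("\n"):
        is_row = line.lstrip().startswith("|")
        if is_row and not in_table:
            out.append("<!-- prettier-ignore-start -->")
        elif not is_row and in_table:
            out.append("<!-- prettier-ignore-end -->")
        out.append(line)
        in_table = is_row
    if in_table:
        # The document ended while still inside a table.
        out.append("<!-- prettier-ignore-end -->")
    return "\n".join(out)


def hyphenate_classes(markup: str) -> str:
    """Replace underscores with hyphens inside class="..." values only."""
    return re.sub(
        r'class="([^"]*)"',
        lambda m: 'class="' + m.group(1).replace("_", "-") + '"',
        markup,
    )
```

Restricting the replacement to the `class` attribute leaves underscores in IDs, URLs, and body text untouched.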
### Arguments
- `markdown_dir`: String path to the destination directory containing the generated Markdown files
- `is_cleanup`: `True` if additional formatting should be performed on the generated Markdown, `False` otherwise. This is `True` when converting from "dirty" HTML.
Mandatory preprocessing step for converting HTML to Markdown. **Operating on a single source HTML file**, it performs the following tasks:
1. Removes the table of contents, CSS links, headers/footers, buttons, and other Pandoc-generated elements
2. Removes flex and flex-item elements that aren't needed in Markdown
3. Formats abbreviations so that their meanings have the correct indentation and alignment, giving a more readable structure in Markdown
4. Restructures notes and examples so they are more manageable for humans to modify:
   - Converts inner divs to paragraphs
   - Moves related code blocks into the body of notes/examples (when the cleanup flag is enabled)
5. Simplifies headings by:
   - Removing Pandoc-generated metadata such as `data-number` attributes
   - Generating cleaner IDs based on heading text
   - Creating a mapping between old and new IDs
6. Updates links in the document to use expected filenames (when the cleanup flag is enabled)
7. Applies additional cleaning operations for "dirty" HTML sources if the `--cleanup` flag is passed
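
The heading simplification in step 5 can be sketched roughly as follows. This is a minimal, regex-based illustration under stated assumptions, not the actual implementation: the names `slugify` and `simplify_headings` are hypothetical, attributes are assumed to be double-quoted, and duplicate heading texts (which would yield identical IDs) are not handled.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6>, capturing tag, attributes, content.
HEADING_RE = re.compile(r"<(h[1-6])\b([^>]*)>(.*?)</\1>", re.DOTALL)


def slugify(text: str) -> str:
    """Build an ID from heading text: lowercase, hyphen-separated."""
    text = re.sub(r"<[^>]+>", "", text)  # drop inline tags inside the heading
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")


def simplify_headings(html: str) -> tuple[str, dict[str, str]]:
    """Strip data-number, regenerate IDs, and return (new_html, old_to_new)."""
    id_map: dict[str, str] = {}

    def rewrite(match: re.Match) -> str:
        tag, attrs, inner = match.groups()
        old = re.search(r'\sid="([^"]*)"', attrs)
        new_id = slugify(inner)
        if old:
            id_map[old.group(1)] = new_id
        # Remove the old id and data-number; keep any other attributes.
        attrs = re.sub(r'\s+(?:data-number|id)="[^"]*"', "", attrs)
        return f'<{tag} id="{new_id}"{attrs}>{inner}</{tag}>'

    return HEADING_RE.sub(rewrite, html), id_map
```

A production version would more likely walk a parsed DOM than rely on regular expressions, but the input and output shapes stay the same: rewritten HTML plus the old-to-new ID mapping.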
### Arguments
- `src_path`: The absolute or relative path to the source HTML file
- `dest_path`: The absolute or relative path where the processed HTML file, which Pandoc will then convert to Markdown, will be saved
- `is_cleanup`: `True` if additional formatting should be performed on the generated Markdown (when converting from "dirty" HTML), `False` otherwise
- `css_src`: A list of absolute or relative paths to all source CSS files, passed down to `preprocess_cleaning` for cleanup processing
- `filenames_mapping`: Dictionary mapping old filenames to expected filenames
### Returns
- A dictionary mapping old heading IDs to new heading IDs
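
One plausible use of the returned dictionary is to repoint anchor links that still reference the old heading IDs. The sketch below is an assumption about how a caller might consume the mapping, not something the source specifies; `update_anchor_links` is a hypothetical helper.

```python
import re


def update_anchor_links(html: str, id_map: dict[str, str]) -> str:
    """Rewrite href fragments using an old-ID to new-ID mapping.

    Fragments that are not in the mapping are left unchanged.
    """

    def rewrite(match: re.Match) -> str:
        prefix, old_id = match.group(1), match.group(2)
        return f'href="{prefix}#{id_map.get(old_id, old_id)}"'

    return re.sub(r'href="([^"#]*)#([^"]*)"', rewrite, html)
```

Because unknown fragments pass through unchanged, the same mapping can be applied across every file in a conversion run without corrupting links to headings in other documents.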