@@ -57,29 +57,7 @@ Ensure Pandoc version **3.7.0.2** is installed.
`pandoc --version`
## 1.2 Initialize folders
Run the following command to create the necessary folder structure:
```bash
python init.py --folder {folder_name}
```
This will generate the folder structure required for the conversion process.
The expected folder structure is as follows:
```
Current folder
└── GENERATED_FILES
    └── {folder_name}
        ├── md
        │   └── media
        ├── customCSS.css
        ├── ETSIstyles.css
```
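As a rough illustration of the layout above, the directories could be created with `pathlib` as shown here. This is a minimal sketch, not the actual `init.py`: the function name `create_structure` is hypothetical, `media` is assumed to be nested under `md`, and the CSS files are left out because this section does not say where they are copied from.

```python
from pathlib import Path


def create_structure(base_dir, folder_name):
    """Create GENERATED_FILES/<folder_name>/md/media under base_dir.

    Hypothetical helper for illustration only; init.py may work differently.
    """
    doc_dir = Path(base_dir) / "GENERATED_FILES" / folder_name
    # Assumption: the media folder lives inside the md folder.
    media_dir = doc_dir / "md" / "media"
    # parents=True creates every missing level; exist_ok=True makes reruns safe.
    media_dir.mkdir(parents=True, exist_ok=True)
    return doc_dir
```

Calling `create_structure(".", "API")` would create `GENERATED_FILES/API/md/media` in the current folder and leave any existing content untouched.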
### 1.2.1 For Use with WSL
## 1.2 For Use with WSL
To ensure the script runs correctly with WSL 1[^1], do one of the following...
@@ -159,20 +137,45 @@ The script `convert.py` handles the following conversions:
## 2.2 Conversion
`convert.py` requires "dirty" HTML to convert to Markdown. Therefore, if the "dirty" HTML does not already exist, it must be generated from the document in DOCX format.
`convert.py` is a script that handles the conversion of documents between different formats. It can be used to convert Markdown files to HTML, HTML files to Docx, and "dirty" HTML files to "clean" Markdown (when one wants to start a new document from some content in a DOCX file).
**NOTE**: For the following commands, `{docname}` refers to the filename of the document without the extension. For example, the `{docname}` _API_ refers to _API.docx_.
### 2.2.0 General Usage (Markdown to HTML to DOCX)
### 2.2.1 Preparation: Generate "dirty" HTML
It follows the general usage pattern:
### 2.2.1 Preparation
---
**From md to html (validation of Markdown):**
`convert.py --frm md --to html --folder {folder_name} --src relative/or/absolute/source/path`
**NOTE**: The `--src` argument is optional. Use it when the Markdown files are located in a different directory (e.g. a dedicated repository or workspace). When provided, the script uses it to locate the Markdown files, so it must point to the directory containing the Markdown files and the media folder.
**NOTE 2**: If `--src` is omitted, the script will expect to find a "md" folder in the "./GENERATED_FILES/<folder_name>" path containing the Markdown files and the media folder.
**From html to docx (generation of DOCX):**
`convert.py --frm html --to docx --folder {folder_name}`
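The two steps above can be chained from a small Python wrapper. The following is a minimal sketch under stated assumptions: the helper names `build_convert_command`, `md_to_docx_commands` and `run_pipeline` are hypothetical, and it assumes `convert.py` is in the current working directory and is launched with the current Python interpreter. Only the `--frm`, `--to`, `--folder` and `--src` flags shown above are used.

```python
import subprocess
import sys


def build_convert_command(frm, to, folder, src=None):
    """Return the argument list for one convert.py invocation."""
    cmd = [sys.executable, "convert.py", "--frm", frm, "--to", to, "--folder", folder]
    if src is not None:
        # --src is optional; it is appended only when a source path is given.
        cmd += ["--src", str(src)]
    return cmd


def md_to_docx_commands(folder, src=None):
    """Return the two commands of the Markdown -> HTML -> DOCX flow, in order."""
    return [
        build_convert_command("md", "html", folder, src),  # validate the Markdown
        build_convert_command("html", "docx", folder),     # generate the DOCX
    ]


def run_pipeline(folder, src=None):
    """Run both steps; check=True stops at the first step that fails."""
    for cmd in md_to_docx_commands(folder, src):
        subprocess.run(cmd, check=True)
```

With this sketch, `run_pipeline("API")` would run the Markdown validation first and only generate the DOCX if that step exits successfully.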
### 2.2.1 Starting from scratch or existing DOCX to Markdown
Run the following command to create the necessary folder structure:
```bash
python init.py --folder {folder_name}
```
Create Markdown from scratch or convert "dirty" HTML to "clean" Markdown
This will generate the folder structure required for the conversion process.
From now on, either of the following two paths can be taken:
- Create Markdown files from scratch, see [section 2.2.1.1](#2211---create-markdown-from-scratch)
- Convert an existing DOCX file to Markdown, see [section 2.2.1.2](#2212---convert-docx-to-markdown)
In both cases, you will end up with Markdown files in *GENERATED_FILES*/*{folder_name}*/*md*. Then proceed to [section 2.2.2](#222---dirty-html-to-markdown) to convert the "dirty" HTML to "clean" Markdown.
#### 2.2.1.1 Create Markdown from scratch
1. If it does not already exist, create the directory *GENERATED_FILES*/*{docname}*/*md* where `convert.py` is.
1. If it does not already exist, create the directory *GENERATED_FILES*/*{folder_name}*/*md* next to `convert.py` by running the init script.
2. Copy the template files from [Required Markdown files](./templates/document_skeleton/).
3. Decide whether to use the script's default naming convention or custom clause and annex names.
- Default: Name clauses *clause-{number 4-20}* and annexes *annex-{letter a-z}*. Clauses and annexes will be arranged in order according to the alphanumeric suffix.
@@ -187,67 +190,65 @@ At this point, the document's clauses and annexes can be created. It is importan
#### 2.2.1.2 Generate "dirty" HTML
##### 1) Preprocess *{docname}.docx*
##### 1) Preprocess *{folder_name}.docx*
`preprocessing.py {docname}.docx`
`preprocessing.py {folder_name}.docx`
##### 2) Copy *customCSS.css* to the document's directory
#### Delete temporary files that are no longer needed
`rm -r GENERATED_FILES/{docname}/temp`
`rm -r GENERATED_FILES/{folder_name}/temp`
### 2.2.2 Dirty HTML to Markdown
---
#### Generate Markdown from the "dirty" HTML
Starting with "dirty" HTML contained in the default source location (that is, _GENERATED_FILES_/_{docname}_/_html_dirty_), convert to "clean" Markdown, which will be contained in _GENERATED_FILES_/_{docname}_/_md_.
Starting with "dirty" HTML contained in the default source location (that is, _GENERATED_FILES_/_{folder_name}_/_html_dirty_), convert to "clean" Markdown, which will be contained in _GENERATED_FILES_/_{folder_name}_/_md_.
Starting with Markdown files contained in the default source location (_GENERATED_FILES_/_{docname}_/_md_), convert to HTML. The Markdown is assumed to be in a "clean" state.
Starting with Markdown files contained in the default source location (_GENERATED_FILES_/_{folder_name}_/_md_), convert to HTML. The Markdown is assumed to be in a "clean" state.
`convert.py --frm md --to html --folder {docname}`
`convert.py --frm md --to html --folder {folder_name}`
Specify a different directory containing the Markdown files.
`convert.py --frm md --to html --folder {docname} --src relative/or/absolute/source/path`
`convert.py --frm md --to html --folder {folder_name} --src relative/or/absolute/source/path`
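The choice between the default location and `--src` can be pictured as a simple fallback. This is a minimal sketch of the documented behavior, not the code inside `convert.py`: the function name `resolve_md_source` and the `base_dir` parameter are hypothetical.

```python
from pathlib import Path


def resolve_md_source(folder_name, src=None, base_dir="."):
    """Return the directory expected to contain the Markdown files.

    Hypothetical helper: mirrors the documented rule, not convert.py itself.
    """
    if src is not None:
        # An explicit --src (relative or absolute) takes precedence.
        return Path(src)
    # Default source location when --src is omitted.
    return Path(base_dir) / "GENERATED_FILES" / folder_name / "md"
```

For example, `resolve_md_source("API")` yields `GENERATED_FILES/API/md`, while `resolve_md_source("API", src="../specs/api")` yields `../specs/api`.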
### 2.2.4 HTML to Docx
### 2.2.3 HTML to Docx
---
Starting with HTML files contained in the default source location (_GENERATED_FILES_/_{docname}_/_html_), convert to Docx.
Starting with HTML files contained in the default source location (_GENERATED_FILES_/_{folder_name}_/_html_), convert to Docx.
`convert.py --frm html --to docx --folder {docname}`
`convert.py --frm html --to docx --folder {folder_name}`
Specify a different directory containing the HTML files.
`convert.py --frm html --to docx --folder {docname} --src relative/or/absolute/source/path`
`convert.py --frm html --to docx --folder {folder_name} --src relative/or/absolute/source/path`
[^1]: These steps may not be necessary with WSL 2, but it is recommended to follow them nevertheless.