Commit 968040d7 authored by Marco Cavalli

fix: wrong src folder when passing --src parameter

chore: add text in readme with workarounds for WSL users
parent 92d61390
+32 −5
@@ -20,7 +20,13 @@ Latest

Latest

### 1.1.1 Create a virtual environment
### 1.1.2 Optional Software

#### [WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/install)

See additional setup steps in [section 1.2.1](#121-for-use-with-wsl).

### 1.1.3 Create a virtual environment

If you prefer to use Miniconda or Pyenv, set up a virtual environment according to the following steps. If you use neither, you can use any other Python environment manager: skip to step 3 of the list and install the requirements using pip.

@@ -45,7 +51,7 @@ If you prefer to use Minconda or Pyenv, setup a virtual environment according to
   1. `cd path/to/ETSI-GS-CIM-009`
   2. `pip install -r requirements.txt`

### 1.1.2 Check Pandoc
### 1.1.4 Check Pandoc

Ensure Pandoc version **3.7.0.2** is installed.

@@ -73,6 +79,25 @@ Current folder
        ├── ETSIstyles.css
```

### 1.2.1 For Use with WSL

To ensure the script runs correctly with WSL 1[^1], do one of the following:

- Ensure the directory containing the script's files is saved in the Linux filesystem (e.g., at `\\wsl$\{Distro}\home\{user}\path\to\script\dir`).

  **OR**

- Ensure the current user has full control of the directory containing the script's files within the base Windows filesystem (e.g., at `C:\path\to\script\dir`).
    1. Right-click the top-level directory in File Explorer.
    2. Click *Properties*.
    3. Go to the *Security* tab.
    4. Click *Edit* to modify permissions.
    5. In the list, select the current user.

       If the current user is not in the list, click *Add*. In the dialog that opens, ensure that the *User* object type is selected and type the current user's username in the *Enter object names to select* box. Click *Check Names*, then click *OK*.

    6. Under *Permissions*, check the *Allow* box for *Full control*, then click *Apply* and *OK*.
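
Which of the two options applies depends on where the files live. As a minimal sketch (not part of the script), a path can be checked against WSL's drive mounts. It assumes the default automount root `/mnt`, which can be changed in `/etc/wsl.conf`; the helper name `is_on_windows_drive` is hypothetical.

```python
import os
from pathlib import PurePosixPath


def is_on_windows_drive(path: str, automount_root: str = "/mnt") -> bool:
    """Return True if `path` lies on a Windows drive mounted by WSL, e.g. /mnt/c/...

    Hypothetical helper: assumes drives are mounted as single-letter
    directories under `automount_root` (the WSL default is /mnt).
    """
    parts = PurePosixPath(os.path.abspath(path)).parts
    root_parts = PurePosixPath(automount_root).parts  # e.g. ('/', 'mnt')
    n = len(root_parts)
    return (
        len(parts) > n
        and parts[:n] == root_parts
        and len(parts[n]) == 1
        and parts[n].isalpha()
    )
```

For example, if `is_on_windows_drive(os.getcwd())` returns `True` from inside the script directory, either move the files into the Linux filesystem or follow the permission steps above.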

## 1.3 Expected Folder Structure after conversion

After conversion, you will end up with the following folder structure:
@@ -130,7 +155,7 @@ The script `convert.py` handles the following conversions:
  - If this argument is not provided, the source path will be _ETSI-GS-CIM-009_/_GENERATED_FILES_/_{value provided via `--folder`}_/_{value provided via `--frm`}_.

- `--file_order` - Provide the path to a JSON file that defines the order in which user-created clauses and annexes should appear.
    - If this argument is not provided, the default ordering convention will be used. See [section 2.2.1.1](#2.2.1.1-preparation-create-markdown-from-scratch) for more details.
    - If this argument is not provided, the default ordering convention will be used. See [section 2.2.1.1](#2211---preparation-create-markdown-from-scratch) for more details.

## 2.2  Conversion

@@ -154,7 +179,7 @@ Create Markdown from scratch or convert "dirty" HTML to "clean" Markdown
    - Custom: Create a file ordering JSON according to the [template](./templates/json/file_order.json) and provide it to the script with the `--file_order` argument.
    - In either case, any other files that follow the naming convention *clause-{some text}* or *annex-{some text}* will be ordered alphabetically after the predefined files in the appropriate section of the document.
4. Create a *media* directory inside the directory created in step 1.
    - As images are added to the Markdown, place the image in `PNG` and `EMF` form inside.[^1]
    - As images are added to the Markdown, place the image in `PNG` and `EMF` form inside.[^2]

At this point, the document's clauses and annexes can be created. Keep the following considerations in mind to ensure the document is created properly:
- Ensure any files defined in a [file ordering JSON](./templates/json/file_order.json) are present in the source directory created in step 1, otherwise the script will print an error and quit prematurely.
@@ -224,4 +249,6 @@ Specify a different directory containing the HTML files.

`convert.py --frm html --to docx --folder {docname} --src relative/or/absolute/source/path`
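
The fix in this commit concerns which source folder is used when `--src` is passed. Going by the argument description above, an explicit `--src` (relative or absolute) takes precedence; otherwise the path is built from `GENERATED_FILES`, `--folder` and `--frm`. A minimal sketch of that rule, assuming a hypothetical `resolve_src` helper and that relative paths resolve against the current working directory (the script's own code for this is not shown in this diff):

```python
import os
from typing import Optional


def resolve_src(src: Optional[str], folder: str, frm: str,
                generated_dir: str = "GENERATED_FILES") -> str:
    """Return the source directory for a conversion (hypothetical helper)."""
    if src:
        # Explicit --src: accept relative or absolute paths as given
        return os.path.abspath(src)
    # Default: GENERATED_FILES/{--folder}/{--frm}
    return os.path.abspath(os.path.join(generated_dir, folder, frm))
```

With no `--src`, `resolve_src(None, "docname", "html")` points at `GENERATED_FILES/docname/html`; with `--src`, the given directory is used unchanged apart from being made absolute.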

[^1]: Method subject to change
[^1]: These steps may not be necessary with WSL 2, but it is recommended to follow them nevertheless.

[^2]: Method subject to change
+8 −3
@@ -207,15 +207,20 @@ def convert():
        if os.path.exists(DEST):
            shutil.rmtree(DEST)

        preprocess_html(SRC, SRC_TYPE, CONSOLIDATED_MD_PATH, FILE_ORDER_JSON)
        filename_numbers_mapping = preprocess_html(SRC, SRC_TYPE, CONSOLIDATED_MD_PATH, FILE_ORDER_JSON)
        filename_numbers_mapping_path = os.path.join(SRC, "filename_numbers_mapping.json")
        with open(filename_numbers_mapping_path, "w") as f:
            json.dump(filename_numbers_mapping, f, indent=4)

        # Conversion
        command = get_md_to_html_command(SRC, DEST, CONSOLIDATED_MD_PATH, CSS_SRC)

        try:
            subprocess.run(command, check=True, capture_output=True, text=True)
            os.remove(filename_numbers_mapping_path)
        except subprocess.CalledProcessError as e:
            print(f"Error converting Markdown files in {SRC} to HTML:\n{e.stderr}")
            os.remove(filename_numbers_mapping_path)
            sys.exit(1)

        # Copy the media directory back over to preserve the emfs, since Pandoc doesn't bring those over
@@ -225,8 +230,8 @@ def convert():
        shutil.copytree(f"{SRC}/media", f"{DEST}/media")

        # Copy ETSIstyles.css into the parent folder
        styles_css = CSS_SRC[0]
        shutil.copy(styles_css, os.path.join(FILEGEN_DIR, FOLDER))
        for css_file in CSS_SRC:
            shutil.copy(css_file, os.path.join(FILEGEN_DIR, FOLDER))
        shutil.copy("advancedTOCLogic.js", DEST)

        # Cleanup the consolidated Markdown
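
In the new `convert()` code, `filename_numbers_mapping.json` is written into `SRC`, in line with the Lua filter now fetching it by a bare relative name instead of `../../../filename_numbers_mapping.json`, and it is deleted on both the success and the error path. A `try`/`finally` expresses that guarantee in one place. A minimal sketch, assuming a hypothetical `run_with_mapping` helper; the real Pandoc command comes from `get_md_to_html_command`, which is not shown here:

```python
import json
import os
import subprocess


def run_with_mapping(src: str, mapping: dict, command: list[str]) -> subprocess.CompletedProcess:
    """Write `mapping` to src/filename_numbers_mapping.json, run `command`, then delete the file.

    Hypothetical helper: the JSON file exists only while the command runs.
    """
    mapping_path = os.path.join(src, "filename_numbers_mapping.json")
    with open(mapping_path, "w") as f:
        json.dump(mapping, f, indent=4)
    try:
        # check=True raises CalledProcessError on a non-zero exit status;
        # the exception reaches the caller after the finally block has run
        return subprocess.run(command, check=True, capture_output=True, text=True)
    finally:
        # Runs on success and on failure, so no stale mapping file is left in src
        os.remove(mapping_path)
```

The caller can keep its `except subprocess.CalledProcessError` branch to print `e.stderr` and exit, without a separate `os.remove` call in each branch.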
+1 −1
local pandoc = require "pandoc"
-- local logfile = io.open("debug.log", "a")

local mt, filename_numbers_mapping = pandoc.mediabag.fetch("../../../filename_numbers_mapping.json")
local mt, filename_numbers_mapping = pandoc.mediabag.fetch("filename_numbers_mapping.json")
filename_numbers_mapping = pandoc.json.decode(filename_numbers_mapping, false)

local function split(str, delimiter)
+12 −0
@@ -148,3 +148,15 @@ DEFAULT_HTML_CLAUSES = [f"clause-{i}" for i in range(1, 21)]

DEFAULT_ANNEXES = [f"annex-{letter}" for letter in "abcdefghijklmnopqrstuvwxyz"]
# endregion

# region Markdown Preprocessing Formatting Checks
METADATA_REGEX = r"(?:\{\.)?\w+(?:\s+\.\w+)*\}?"  # Follows the form `className` or `{.class1 .class2 ... .classN}`
DIV_START_REGEX = rf"[-\s]*:::\s{METADATA_REGEX}\s*"
DIV_END_REGEX = r"\s*:::\s*"

BAD_COLON_GROUP_REGEX = (
    r"(?<!:)(?::{1,2}|:{4,})(?!:)"  # Match all colon groups except the valid `:::`
)
# Match lines that start with : or :: and are not followed by letters
BAD_DIV_DELINEATOR_REGEX = r"^\s*(?::{1,2})(?![a-zA-Z])"
# endregion
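
The new constants can be exercised in isolation. The sketch below copies the four regexes verbatim and classifies a line in the same order `check_divs` uses (bad delimiter first, then opener, then closer); the `classify` function is hypothetical and only for illustration. Under `re.match`, `DIV_END_REGEX` also matches the beginning of an opener such as `::: TF`, which is why the opener test has to run before the closer test.

```python
import re

METADATA_REGEX = r"(?:\{\.)?\w+(?:\s+\.\w+)*\}?"
DIV_START_REGEX = rf"[-\s]*:::\s{METADATA_REGEX}\s*"
DIV_END_REGEX = r"\s*:::\s*"
BAD_DIV_DELINEATOR_REGEX = r"^\s*(?::{1,2})(?![a-zA-Z])"


def classify(line: str) -> str:
    """Label a Markdown line as 'bad', 'start', 'end' or 'other' (order mirrors check_divs)."""
    if re.match(BAD_DIV_DELINEATOR_REGEX, line) and not line.startswith(":::"):
        return "bad"    # one or two colons instead of exactly three
    if re.match(DIV_START_REGEX, line):
        return "start"  # ::: followed by a class name or a {.class ...} block
    if re.match(DIV_END_REGEX, line):
        return "end"    # bare ::: (also matches the start of most openers, so checked last)
    return "other"
```

For instance, `classify("::: {.class1 .class2}")` is `"start"`, `classify(":::")` is `"end"`, and `classify("::")` is `"bad"`.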
+161 −27
import os, re, json
import sys
from typing_extensions import Literal

from src.constants import (
@@ -6,10 +7,20 @@ from src.constants import (
    INFORMATIVE_REF_FILE,
    DEFAULT_CLAUSES,
    DEFAULT_ANNEXES,
    REFS
    REFS,
    DIV_START_REGEX,
    DIV_END_REGEX,
    BAD_DIV_DELINEATOR_REGEX,
)

from src.utils import handle_consolidated_md, get_file_order, int_to_letter
from src.utils import (
    handle_consolidated_md,
    get_file_order,
    int_to_letter,
    p_warning,
    p_error,
    p_label,
)
from src.constants import MAX_HEADING_LEVEL

files_with_references = [NORMATIVE_REF_FILE, INFORMATIVE_REF_FILE]
@@ -18,6 +29,92 @@ files_with_references = [NORMATIVE_REF_FILE, INFORMATIVE_REF_FILE]
# region Helpers


def run_format_checks(filename: str, file_lines: list[str]):
    """Runs various checks on the Markdown file contents to ensure they are properly formatted. If any improper formatting is detected, display any fatal errors or warnings as necessary."""

    def check_divs():
        """
        ### Display an error and exit when...
        - An opening does not have a closing

        ### Display a warning when...
        - The number of openings and number of closings do not match
        - A closing is found without a corresponding opening; it is likely meant to be an opening and needs metadata
        """
        i = 0
        in_div = False
        in_div_no_metadata = (
            False  # For if/when a div is found that doesn't have any class
        )
        start_line_num = (
            0  # For keeping track of the line number at which the latest div was opened
        )

        # Keep track of numbers of div starts and div ends
        num_div_start = 0
        num_div_end = 0

        while i < len(file_lines):
            line = file_lines[i].replace("\n", "")
            line_num = i + 1

            bad_div_delin_match = re.match(BAD_DIV_DELINEATOR_REGEX, line)
            if bad_div_delin_match and line.startswith(":::") is False:
                # This div delineator doesn't have exactly three colons `:::`
                print(
                    p_error(
                        f"{p_label(filename)}:{p_label(line_num)}: Improperly formatted div delineator in line. Line: {p_label(line)}"
                    )
                )
                raise Exception("DIV_DELINEATOR_ERROR")

            start_match = re.match(DIV_START_REGEX, line)
            num_div_start += 1 if start_match else 0

            if start_match:
                in_div_no_metadata = False  # Set this to false in case it was true from a previous div without metadata
                if in_div:
                    # The previous div wasn't closed, print error and quit
                    print(
                        p_error(
                            f"{p_label(filename)}:{p_label(start_line_num)}: No end tag found for div starting at this line"
                        )
                    )
                    raise Exception("DIV_DELINEATOR_ERROR")
                else:
                    # A normal div opener
                    in_div = True

                start_line_num = line_num
                i += 1
                continue

            end_match = re.match(DIV_END_REGEX, line)
            num_div_end += 1 if end_match else 0

            if end_match:
                if not in_div and not in_div_no_metadata:
                    # This should open a div, but it doesn't have a class assigned to it
                    print(
                        p_warning(
                            f"{p_label(filename)}:{p_label(line_num)}: The delineator at this line seems to open a div; this div or one before it may not be correctly structured."
                        )
                    )
                    in_div_no_metadata = True

                elif not in_div and in_div_no_metadata:
                    # The closing to a classless div
                    in_div_no_metadata = False

                in_div = False
                i += 1
                continue

            i += 1

    check_divs()

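
`check_divs` counts openers and closers but never compares the counts, and a div still open at the end of the file is not reported, although the docstring lists both cases. Below is a simplified standalone sketch of the same rules that also covers them. It returns messages instead of printing and raising, assumes divs do not nest (as `check_divs` does), and inlines `METADATA_REGEX` into the opener pattern; `check_div_balance` is a hypothetical name.

```python
import re

# Copied from the constants above (METADATA_REGEX inlined into the opener pattern)
DIV_START_REGEX = r"[-\s]*:::\s(?:\{\.)?\w+(?:\s+\.\w+)*\}?\s*"
DIV_END_REGEX = r"\s*:::\s*"


def check_div_balance(lines: list[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for ::: div delimiters, assuming divs do not nest."""
    errors: list[str] = []
    warnings: list[str] = []
    open_line = None        # line number of the div currently open, if any
    classless_open = False  # a bare ::: that appears to open a div without a class
    num_start = num_end = 0

    for line_num, line in enumerate(lines, start=1):
        if re.match(DIV_START_REGEX, line):
            num_start += 1
            classless_open = False
            if open_line is not None:
                errors.append(f"{open_line}: no end tag found for div starting at this line")
            open_line = line_num
        elif re.match(DIV_END_REGEX, line):
            num_end += 1
            if open_line is not None:
                open_line = None        # normal closer
            elif classless_open:
                classless_open = False  # closes the classless div
            else:
                warnings.append(f"{line_num}: this delimiter seems to open a div without a class")
                classless_open = True

    if open_line is not None:
        # The last div was never closed before the end of the file
        errors.append(f"{open_line}: no end tag found for div starting at this line")
    if num_start != num_end:
        warnings.append(f"{num_start} div opener(s) but {num_end} closer(s)")
    return errors, warnings
```

Collecting all messages before deciding to exit lets a single run report every problem in a file, at the cost of diverging from the stop-at-first-error behaviour of `check_divs`.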

def handle_less_than_greater_than_text(file_contents: str):
    """Replace `<` and `>` with `&lt;` and `&gt;` respectively and wrap the whole section in single code ticks to allow the text to render in the HTML"""
    regex = r"\<(?!img\b|span\b|sup|/sup)(.+?)\>"
@@ -92,6 +189,7 @@ def auto_number_content(
    file_contents: str, content_type: Literal["clauses", "annexes"]
):
    global example_counter, note_counter, note_in_table_counter

    def auto_number_heading(line: str):
        global clauses_counters, annexes_counters, figure_counter, table_counter
        new_heading = ""
@@ -157,8 +255,12 @@ def auto_number_content(
        new_line = line
        if "EXAMPLE" not in line:
            example_counter += 1
            if example_counter != 1: # if it is one the number can be omitted, need to check later
                new_line = line.replace(">>> [!tip]", f">>> [!tip] EXAMPLE {example_counter}:")
            if (
                example_counter != 1
            ):  # if it is one the number can be omitted, need to check later
                new_line = line.replace(
                    ">>> [!tip]", f">>> [!tip] EXAMPLE {example_counter}:"
                )
        return new_line

    def auto_number_note(line: str) -> str:
@@ -166,8 +268,12 @@ def auto_number_content(
        new_line = line
        if "NOTE" not in line:
            note_counter += 1
            if note_counter != 1:  # if it is one the number can be omitted, need to check later
                new_line = line.replace(">>> [!note]", f">>> [!note] NOTE {note_counter}:")
            if (
                note_counter != 1
            ):  # if it is one the number can be omitted, need to check later
                new_line = line.replace(
                    ">>> [!note]", f">>> [!note] NOTE {note_counter}:"
                )
        return new_line

    def auto_number_figure(line: str) -> str:
@@ -194,8 +300,12 @@ def auto_number_content(
        global note_in_table_counter
        new_line = line
        note_in_table_counter += 1
        if note_in_table_counter != 1:  # if it is one the number can be omitted, need to check later
            new_line = line.replace(">>> [!note]", f">>> [!note] NOTE {note_in_table_counter}:")
        if (
            note_in_table_counter != 1
        ):  # if it is one the number can be omitted, need to check later
            new_line = line.replace(
                ">>> [!note]", f">>> [!note] NOTE {note_in_table_counter}:"
            )
        return new_line

    # take line and line number and replace the line number
@@ -214,39 +324,45 @@ def auto_number_content(
            previous_heading = new_heading

            if example_counter >= 1 and first_example_line_index != -1:
                lines[first_example_line_index] += f" EXAMPLE{' 1' if example_counter > 1 else ''}:"
                lines[
                    first_example_line_index
                ] += f" EXAMPLE{' 1' if example_counter > 1 else ''}:"
            example_counter = 0
            first_example_line_index = -1

            if note_counter >= 1 and first_note_line_index != -1:
                lines[first_note_line_index] += f" NOTE{' 1' if note_counter > 1 else ''}:"
                lines[
                    first_note_line_index
                ] += f" NOTE{' 1' if note_counter > 1 else ''}:"
            note_counter = 0
            first_note_line_index = -1


        elif line.startswith(">>> [!tip]"):
            new_line = auto_number_example(new_line)
            if example_counter == 1:
                first_example_line_index = i


        elif line.startswith(">>> [!note]"):
            new_line = auto_number_note(new_line)
            if note_counter == 1:
                first_note_line_index = i


        elif previous_line.startswith("::: TF"):
            new_line = auto_number_figure(new_line)


        elif previous_line.startswith("::: TH"):
            new_line = auto_number_table(new_line)

            if note_in_table_counter >= 1 and first_note_in_table_line_index != -1:
                note_string = f" NOTE{' 1' if note_in_table_counter > 1 else ''}:"
                first_index_after_bracket = lines[first_note_in_table_line_index].find("[!note]")
                lines[first_note_in_table_line_index] = lines[first_note_in_table_line_index][:first_index_after_bracket] + note_string + lines[first_note_in_table_line_index][first_index_after_bracket:]
                # Insert the label right after "[!note]", matching the ">>> [!note] NOTE n:" form used for later notes
                first_index_after_bracket = lines[first_note_in_table_line_index].find(
                    "[!note]"
                ) + len("[!note]")
                lines[first_note_in_table_line_index] = (
                    lines[first_note_in_table_line_index][:first_index_after_bracket]
                    + note_string
                    + lines[first_note_in_table_line_index][first_index_after_bracket:]
                )
            note_in_table_counter = 0
            first_note_in_table_line_index = -1

@@ -255,22 +371,29 @@ def auto_number_content(
            if note_in_table_counter == 1:
                first_note_in_table_line_index = i


        lines[i] = new_line
        previous_line = line

    ### Run the example/note numbering once more after the loop: it only fires at specific points (a new heading or the next table header), so elements under the last heading or in the last table would otherwise be skipped

    if example_counter >= 1 and first_example_line_index != -1:
        lines[first_example_line_index] += f" EXAMPLE{' 1' if example_counter > 1 else ''}:"
        lines[
            first_example_line_index
        ] += f" EXAMPLE{' 1' if example_counter > 1 else ''}:"

    if note_counter >= 1 and first_note_line_index != -1:
        lines[first_note_line_index] += f" NOTE{' 1' if note_counter > 1 else ''}:"

    if note_in_table_counter >= 1 and first_note_in_table_line_index != -1:
        note_string = f" NOTE{' 1' if note_in_table_counter > 1 else ''}:"
        first_index_after_bracket = lines[first_note_in_table_line_index].find("[!note]")
        lines[first_note_in_table_line_index] = lines[first_note_in_table_line_index][:first_index_after_bracket] + note_string + lines[first_note_in_table_line_index][first_index_after_bracket:]
        # Insert the label right after "[!note]", matching the ">>> [!note] NOTE n:" form used for later notes
        first_index_after_bracket = lines[first_note_in_table_line_index].find(
            "[!note]"
        ) + len("[!note]")
        lines[first_note_in_table_line_index] = (
            lines[first_note_in_table_line_index][:first_index_after_bracket]
            + note_string
            + lines[first_note_in_table_line_index][first_index_after_bracket:]
        )

    file_contents = "\n".join(lines) + "\n"
    return file_contents
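
Per heading (and, for notes inside tables, per table), `auto_number_content` applies one rule: a single note or example is labelled without a number (`NOTE:`), and two or more are numbered from 1 (`NOTE 1:`, `NOTE 2:`, ...). The first one is patched after the fact because the total is only known once the section ends. A minimal sketch of the same rule for one section that has already been split out; `number_callouts` is hypothetical, and unlike the heading-level code above it writes the label directly after the marker instead of appending it to the line, which gives the same result for a bare `>>> [!note]` line.

```python
def number_callouts(section_lines: list[str], marker: str = ">>> [!note]",
                    label: str = "NOTE") -> list[str]:
    """Label every callout in one section: 'NOTE:' if it is alone, else 'NOTE 1:', 'NOTE 2:', ...

    Hypothetical helper; lines that already contain `label` are left untouched,
    mirroring the check in auto_number_note.
    """
    indices = [
        i for i, line in enumerate(section_lines)
        if line.startswith(marker) and label not in line
    ]
    numbered = len(indices) > 1
    result = list(section_lines)
    for n, i in enumerate(indices, start=1):
        suffix = f" {n}" if numbered else ""
        # Put the label directly after the marker (first occurrence only)
        result[i] = result[i].replace(marker, f"{marker} {label}{suffix}:", 1)
    return result
```

Examples use the same function with `marker=">>> [!tip]"` and `label="EXAMPLE"`.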
@@ -340,12 +463,18 @@ def preprocess(
            try:
                text = open(input_path, "r", encoding="utf-8").read()

                run_format_checks(filename, text.splitlines())

                if filename in clauses_filenames:
                    text = auto_number_content(text, "clauses")
                    filename_numbers_mapping[filename_without_extension] = clauses_counters[0]
                    filename_numbers_mapping[filename_without_extension] = (
                        clauses_counters[0]
                    )
                elif filename in annexes_filenames:
                    text = auto_number_content(text, "annexes")
                    filename_numbers_mapping[filename_without_extension] = int_to_letter(annexes_counters[0]).lower()
                    filename_numbers_mapping[filename_without_extension] = (
                        int_to_letter(annexes_counters[0]).lower()
                    )
                text = add_ids_to_references(text, filename)
                text = handle_less_than_greater_than_text(text)
                text = add_ids_to_headings(text)
@@ -363,9 +492,14 @@ def preprocess(
                # print(
                #     f"Warning: Could not preprocess {input_path}. It may not be a valid UTF-8 text file or is missing."
                # )
                if e.args and e.args[0] == "DIV_DELINEATOR_ERROR":
                    # delete all files that start with --preprocessed--
                    for f in os.listdir(src):
                        if f.startswith("--preprocessed--"):
                            os.remove(os.path.join(src, f))
                    sys.exit(1)
                pass

    with open("filename_numbers_mapping.json", "w") as f:
        json.dump(filename_numbers_mapping, f, indent=4)

    handle_consolidated_md("create", src, consolidated_md_path, preprocessed_filenames)

    return filename_numbers_mapping
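
`preprocess` recognises the fatal formatting error by comparing `e.args[0]` with the string `"DIV_DELINEATOR_ERROR"`. A dedicated exception class is a common alternative that makes the intent explicit and does not depend on the exception's arguments. A minimal sketch of that alternative together with the `--preprocessed--` cleanup; `DivDelineatorError` and `remove_preprocessed_files` are hypothetical names, not part of the project:

```python
import os

PREPROCESSED_PREFIX = "--preprocessed--"


class DivDelineatorError(Exception):
    """Hypothetical exception for a malformed or unclosed ::: div delimiter."""


def remove_preprocessed_files(src: str) -> list[str]:
    """Delete the intermediate files in `src` whose names start with PREPROCESSED_PREFIX.

    Returns the deleted file names in sorted order.
    """
    removed = []
    for name in sorted(os.listdir(src)):
        if name.startswith(PREPROCESSED_PREFIX):
            os.remove(os.path.join(src, name))
            removed.append(name)
    return removed


# Sketch of the call site inside the per-file loop:
#     try:
#         run_format_checks(filename, text.splitlines())  # would raise DivDelineatorError
#     except DivDelineatorError:
#         remove_preprocessed_files(src)
#         sys.exit(1)
```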