Commit 2682d34d authored by Marco Cavalli

Align with latest changes

parent ea7fe5fe
+76 −0
# syntax=docker/dockerfile:1.7

FROM python:3.10.16-slim AS base

# Build argument for architecture (amd64 or arm64)
ARG TARGETARCH
RUN echo "Building for architecture: ${TARGETARCH}"

ENV PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# Install system dependencies required by convert.py and helper scripts
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    git \
    imagemagick \
    libreoffice \
    nodejs \
    npm \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Install Prettier globally
RUN npm install -g prettier

# Install Pandoc 3.7.0.2 based on architecture
RUN if [ "$TARGETARCH" = "arm64" ]; then \
        wget https://github.com/jgm/pandoc/releases/download/3.7.0.2/pandoc-3.7.0.2-1-arm64.deb \
        && dpkg -i pandoc-3.7.0.2-1-arm64.deb \
        && rm pandoc-3.7.0.2-1-arm64.deb; \
    else \
        wget https://github.com/jgm/pandoc/releases/download/3.7.0.2/pandoc-3.7.0.2-1-amd64.deb \
        && dpkg -i pandoc-3.7.0.2-1-amd64.deb \
        && rm pandoc-3.7.0.2-1-amd64.deb; \
    fi

WORKDIR /app

# Install Python dependencies first for better layer caching
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt


FROM base AS source

WORKDIR /src

# Copy repository content as build context
COPY . /src

# Remove gitignored files from copied source (if .git metadata is available in CI)
RUN if [ -d .git ]; then \
        git clean -Xdf; \
    else \
        echo "Warning: .git not found in build context, cannot prune gitignored files."; \
    fi


FROM base AS runtime

WORKDIR /app

# Copy pruned source tree into final image
COPY --from=source /src /app

# Runtime folders
RUN mkdir -p /app/GENERATED_FILES /data/sources

RUN git config --system --add safe.directory "*"

# Persist generated artifacts and optional external sources
VOLUME ["/app/GENERATED_FILES", "/data/sources"]

# Pass docker run args directly to convert.py
ENTRYPOINT ["python", "convert.py"]
CMD ["--help"]
 No newline at end of file
+0 −7
# User Inputs Folder

This folder contains the input files supplied to the script.

## Contents

**file_order**: Contains file ordering JSON files that follow the [template](../templates/json/file_order.json).
 No newline at end of file
+0 −65
# File Orderings

Place JSON files here that define how files contained in the conversion source directory should be ordered.

### Considerations

#### Ensure files defined in the JSON exist in the source directory

Unlike the script's default file ordering, a file ordering provided to the script causes the script to fail if it lists a file that is not present in the conversion source directory. Make sure that every file listed in a provided JSON exists in that directory.

#### Preceding numbers in top-level headings override the script ordering

In the top-level heading of each Markdown file, a leading number (for example, `# 4 Clause Heading`) overrides the file order defined in the script. It is important to ensure that:
1. No two Markdown source files use the same preceding number in their top-level heading. It is best to make sure the numbers correspond with the file's intended place in the overall hierarchy.
2. Numbering of non-standard clauses and annexes begins with **4**, because **1**, **2**, and **3** are reserved for the predefined clauses *Scope*, *References*, and *Definitions*.
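
The first rule above can be checked mechanically. The following is a minimal sketch and not part of the script: the helper name `find_duplicate_heading_numbers` and the regular expression are assumptions, and only the first line starting with `# ` in each file is treated as the top-level heading.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a top-level heading that begins with a number,
# for example "# 4 Clause Heading".
NUMBERED_H1_RE = re.compile(r"^#\s+(\d+)\b")


def find_duplicate_heading_numbers(files):
    """Return {number: [file names]} for numbers used by more than one file.

    `files` maps a file name to its Markdown text.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        for line in text.splitlines():
            if line.startswith("# "):
                match = NUMBERED_H1_RE.match(line)
                if match:
                    seen[int(match.group(1))].append(name)
                break  # only the first top-level heading is considered
    return {num: sorted(names) for num, names in seen.items() if len(names) > 1}
```

An empty result means no two files share a leading number.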

### Example

The following example specifies the ordering of a few example files. An [empty template](../../templates/json/file_order.json) is provided for convenience.

``` json
{
  "clauses": [
    "clause-example1",
    "clause-example3",
    "clause-example2",
    "example4"
  ],
  "annexes": [
    "annex-example1",
    "annex-3",
    "example2"
  ]
}
```

This will ensure that the following file order is produced.

1. **Universal ETSI initial files**
    1. Intellectual Property Rights
    2. Foreword
    3. Modal verbs terminology
    4. Executive summary
    5. Introduction
2. **ETSI universal initial clauses**
    1. Scope
    2. References
    3. Definition of terms, symbols and abbreviations
3. **Clauses defined in the JSON**
    1. *clause-example1.md*
    2. *clause-example3.md*
    3. *clause-example2.md*
    4. *example4.md*
4. **Other clauses**[^1]
5. **Annexes defined in the JSON**
    1. *annex-example1.md*
    2. *annex-3.md*
    3. *example2.md*
6. **Other annexes**[^2]
7. **Universal ETSI final files**
    1. History

[^1]: If any other files exist in the conversion source directory whose filenames follow the format *clause-{text}.md*, they will be added alphabetically after the JSON-defined clauses.

[^2]: Similar to [^1], any files in the source directory that follow the format *annex-{text}.md* will be added alphabetically after the JSON-defined annexes.
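
The clause part of this ordering can be sketched in Python as follows. This is an illustration only, not the script's code: `order_clauses` is a hypothetical helper, and it assumes that JSON entries are file names without the `.md` extension and that "other clauses" are the remaining `clause-*.md` files.

```python
def order_clauses(json_clauses, source_files):
    """Return clause file names: JSON-defined first, then the remaining
    clause-*.md files in alphabetical order."""
    available = set(source_files)
    ordered = []
    for stem in json_clauses:
        name = f"{stem}.md"
        if name not in available:
            # Mirrors the documented behaviour: a listed but missing file is an error.
            raise FileNotFoundError(
                f"{name} is listed in the file ordering but is not in the source directory"
            )
        ordered.append(name)
    others = sorted(
        f
        for f in available
        if f.startswith("clause-") and f.endswith(".md") and f not in ordered
    )
    return ordered + others
```

The same approach would apply to annexes with the `annex-` prefix.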
 No newline at end of file
+3 −3
@@ -202,7 +202,7 @@ At this point, the document's clauses and annexes can be created. It is importan

##### 2) Copy *customCSS.css* to the document's directory

`cp customCSS.css GENERATED_FILES/{folder_name}/customCSS.css`
`cp css/customCSS.css GENERATED_FILES/{folder_name}/customCSS.css`

##### 3) Prepare the images in *{folder_name}.docx* for the HTML

@@ -216,7 +216,7 @@ At this point, the document's clauses and annexes can be created. It is importan

#### Convert to HTML using Pandoc

`pandoc --resource-path GENERATED_FILES/{folder_name}/temp -f docx -t chunkedhtml -L filter_1.lua -L filter_2.lua --css=customCSS.css --css="{folder_name}.css" -s GENERATED_FILES/{folder_name}/temp/temp.docx -o GENERATED_FILES/{folder_name}/html_dirty --toc --toc-depth 4 --template=official.html --split-level=1`
`pandoc --resource-path GENERATED_FILES/{folder_name}/temp -f docx -t chunkedhtml -L filter_1.lua -L filter_2.lua --css=css/customCSS.css --css="{folder_name}.css" -s GENERATED_FILES/{folder_name}/temp/temp.docx -o GENERATED_FILES/{folder_name}/html_dirty --toc --toc-depth 4 --template=templates/html/official.html --split-level=1`
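
When this step is driven from Python rather than typed by hand, the updated command can be assembled as an argument list. This is a minimal sketch: `build_pandoc_command` is a hypothetical helper, the flags are copied from the command above, and the relative paths assume the command is run from the repository root.

```python
def build_pandoc_command(folder_name):
    """Assemble the Pandoc HTML conversion command as an argument list."""
    base = f"GENERATED_FILES/{folder_name}"
    return [
        "pandoc",
        "--resource-path", f"{base}/temp",
        "-f", "docx",
        "-t", "chunkedhtml",
        "-L", "filter_1.lua",
        "-L", "filter_2.lua",
        "--css=css/customCSS.css",
        f"--css={folder_name}.css",
        "-s", f"{base}/temp/temp.docx",
        "-o", f"{base}/html_dirty",
        "--toc",
        "--toc-depth", "4",
        "--template=templates/html/official.html",
        "--split-level=1",
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`; using a list avoids shell quoting problems when `{folder_name}` contains spaces.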

#### Delete temporary files that are no longer needed

@@ -326,7 +326,7 @@ The main scripts docx_to_md.sh (Linux/Mac) and docx_to_md.bat (Windows) automate
`--arch` optional: this parameter defines the target CPU architecture for the Docker image.
`--rebuild` optional: this parameter forces Docker to rebuild the image from the Dockerfile.
`--help` optional: this parameter displays the available commands in the terminal.
`--docker` optional: this parameter (Linux/Mac only) allows you to run the script using a Docker image
`--no-docker` optional: this parameter (Linux/Mac only) allows you to run the script using the local installation of Python and all its dependencies.


# 5. Debug
+115 −39
@@ -12,7 +12,6 @@ NOTE_PREFIX = ">>> [!note]"
EXAMPLE_NOTE_POSTFIX = ">>>"
BLOCK_CODE_PREFIX = "```"
BLOCK_CODE_POSTFIX = "```"
TABLE_PREFIX = "::: TAL"
IMAGE_PREFIX = "::: FL"
LABLE_PREFIX = "::: TF"
TABLE_HEADER_PREFIX = "::: TH"
@@ -22,29 +21,49 @@ CUSTOM_BLOCK_POSTFIX = ":::" # This is a common postfix for custom blocks like
PRETTIER_IGNORE_START_COMMENT = "<!-- prettier-ignore-start -->"
PRETTIER_IGNORE_END_COMMENT = "<!-- prettier-ignore-end -->"

TIP_NOTE_OPENING_RE = re.compile(r"^>>>\s*\[!(tip|note)\](.*)$", re.IGNORECASE)

def run_prettier_on_file(file_path):

def parse_tip_or_note_opening(content):
    """
    Parse tip/note opening lines supporting both forms:
    - >>> [!note]
    - >>>[!note]
    Returns (kind, suffix) where kind is 'tip' or 'note'.
    """
    match = TIP_NOTE_OPENING_RE.match(content)
    if not match:
        return None
    return match.group(1).lower(), match.group(2).strip()


def run_prettier_on_content(content, file_path):
    try:
        subprocess.run(
        result = subprocess.run(
            [
                "npx",
                "prettier",
                "--stdin-filepath",
                file_path,
                "--write",
                "--prose-wrap",
                "always",
            ],
            input=content,
            text=True,
            capture_output=True,
            check=True,
        )
    except FileNotFoundError:
        print("Error: 'npx' not found. Make sure Node.js/npm are installed.")
        return False
        return None
    except subprocess.CalledProcessError as error:
        print(
            f"Error: Prettier failed with exit code {error.returncode} on {file_path}."
        )
        return False
    return True
        if error.stderr:
            print(error.stderr.strip())
        return None
    return result.stdout
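
The change above moves from formatting a file in place to piping content through standard input and reading the result from standard output. The pattern can be tried in isolation with the sketch below. `run_filter` and `UPPERCASE` are hypothetical names, and a small Python one-liner stands in for Prettier so that the example runs without Node.js.

```python
import subprocess
import sys


def run_filter(command, content):
    """Pipe `content` to `command` on stdin; return its stdout, or None on failure."""
    try:
        result = subprocess.run(
            command,
            input=content,
            text=True,
            capture_output=True,
            check=True,
        )
    except FileNotFoundError:
        print(f"Error: '{command[0]}' not found.")
        return None
    except subprocess.CalledProcessError as error:
        print(f"Error: command failed with exit code {error.returncode}.")
        if error.stderr:
            print(error.stderr.strip())
        return None
    return result.stdout


# Stand-in for Prettier (assumption for illustration): upper-case whatever arrives on stdin.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
```

Returning `None` rather than `False` lets the caller distinguish a failure from a valid but empty output string.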


def remove_prettier_ignore_comments_not_from_this_script(md_lines):
@@ -138,7 +157,6 @@ def add_prettier_ignore(md_lines):
                    currently_in_table_block = True
                    table_type = "ascii"
                    table_beginning = i
                    i += 1
                    continue

                stripped = line.strip()
@@ -149,7 +167,6 @@ def add_prettier_ignore(md_lines):
                        )
                        currently_in_table_block = True
                        table_type = "markdown"
                        i += 1
                        continue

                i += 1
@@ -165,30 +182,36 @@ def add_prettier_ignore(md_lines):

                if should_end_table:
                    table_ending = i
                    i = add_prettier_ignore_comment(
                        lines, PRETTIER_IGNORE_END_COMMENT, i
                    )
                    # Simplify before adding comments (to avoid index shift)
                    if table_type == "ascii":
                        new_table = simplify_table(lines[table_beginning:table_ending])
                        if not new_table is None:
                        if new_table is not None and table_beginning is not None:
                            lines[table_beginning:table_ending] = new_table
                            # After substitution, recalculate where to add the end comment
                            i = table_beginning + len(new_table)
                    # Now add the prettier-ignore-end comment at the current position (before the external line)
                    i = add_prettier_ignore_comment(
                        lines, PRETTIER_IGNORE_END_COMMENT, i
                    )
                    table_beginning = None
                    table_ending = None
                    currently_in_table_block = False
                    table_type = None
                    i += 1
                    continue
                else:
                    i += 1

        if currently_in_table_block:
            table_ending = len(lines)
            add_prettier_ignore_comment(
                lines, PRETTIER_IGNORE_END_COMMENT, table_ending
            )
            if table_type == "ascii" and table_beginning is not None:
                new_table = simplify_table(lines[table_beginning:table_ending])
                if not new_table is None:
                if new_table is not None:
                    lines[table_beginning:table_ending] = new_table
                    table_ending = table_beginning + len(new_table)
            # Add prettier-ignore-end after the table content
            add_prettier_ignore_comment(
                lines, PRETTIER_IGNORE_END_COMMENT, table_ending
            )

        return lines
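
The reordering in this hunk (simplify first, insert the end comment afterwards) avoids a stale index: once a slice is replaced by a block of a different length, the old end position no longer points just past the table. A minimal sketch of the idea follows. `replace_then_mark` is a hypothetical helper and does not reproduce `simplify_table` or `add_prettier_ignore_comment`, whose bodies are not shown in this diff.

```python
def replace_then_mark(lines, start, end, new_block, end_marker):
    """Replace lines[start:end] with new_block, then insert end_marker right after it.

    Returns the index at which the marker was inserted.
    """
    lines[start:end] = new_block
    # The block may have changed length, so recompute the insertion point
    # from the start index instead of reusing the stale `end`.
    insert_at = start + len(new_block)
    lines.insert(insert_at, end_marker)
    return insert_at
```

For example, replacing three table lines with one simplified line moves the insertion point from index 4 to index 2.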

@@ -219,37 +242,50 @@ def add_prettier_ignore(md_lines):
            continue

        if not post_fix_expected:
            if line.startswith(EXAMPLE_PREFIX) or line.startswith(NOTE_PREFIX):
            stripped_line = line.lstrip(" \t").rstrip("\r\n")
            if parse_tip_or_note_opening(stripped_line):
                i = add_prettier_ignore_comment(
                    md_lines, PRETTIER_IGNORE_START_COMMENT, i
                )
                post_fix_expected = EXAMPLE_NOTE_POSTFIX
                i += 1
                continue
            elif line.startswith(BLOCK_CODE_PREFIX):
                i = add_prettier_ignore_comment(
                    md_lines, PRETTIER_IGNORE_START_COMMENT, i
                )
                post_fix_expected = BLOCK_CODE_POSTFIX
                i += 1
            elif line.startswith(TABLE_PREFIX):
                continue
            elif line.startswith(GENERIC_CUSTOM_BLOCK_PREFIX):
                i = add_prettier_ignore_comment(
                    md_lines, PRETTIER_IGNORE_START_COMMENT, i
                )
                post_fix_expected = CUSTOM_BLOCK_POSTFIX
                i += 1
                table_beginning = i
                # `i` points to the opening custom block line (e.g. ::: TAL).
                # Start simplification from the first table line inside the block.
                table_beginning = i + 1
                continue
            else:
                i += 1
        elif line.strip() == post_fix_expected.strip():
            # md_lines[i] is the closing line (:::, >>>, etc)
            # Simplify content BEFORE adding comments to avoid index shift
            table_ending = i
            i = add_prettier_ignore_comment(md_lines, PRETTIER_IGNORE_END_COMMENT, i)
            post_fix_expected = ""
            
            new_table = simplify_table(md_lines[table_beginning:table_ending])
            if not new_table is None:
            if new_table is not None and table_beginning is not None:
                md_lines[table_beginning:table_ending] = new_table
                # After substitution, closing line is now at: table_beginning + len(new_table)
                closing_line_pos = table_beginning + len(new_table)
            else:
                closing_line_pos = table_ending
            
            # Add prettier-ignore-end AFTER the closing line (insert at closing_line_pos + 1)
            i = add_prettier_ignore_comment(md_lines, PRETTIER_IGNORE_END_COMMENT, closing_line_pos + 1)
            
            table_beginning = None
            table_ending = None
            i += 1
            continue
        else:
            i += 1

@@ -602,7 +638,7 @@ def spacing_note_example_blocks(md_lines):
        stripped = line.strip()

        # Check if the current line starts with an Example or Note prefix
        if stripped.startswith((EXAMPLE_NOTE_POSTFIX, EXAMPLE_PREFIX, NOTE_PREFIX)):
        if stripped == EXAMPLE_NOTE_POSTFIX or parse_tip_or_note_opening(stripped):
            # Add a blank line before if the previous line is not empty
            if new_lines and new_lines[-1].strip() != "":
                new_lines.append("\n")
@@ -753,7 +789,7 @@ def try_open_tip_note(content, stack, i, file_path, in_grid_row):
    Check and handle the opening of a TIP or NOTE block (>>> [!tip] / >>> [!note]).
    Returns True if handled.
    """
    if content.startswith(EXAMPLE_PREFIX) or content.startswith(NOTE_PREFIX):
    if parse_tip_or_note_opening(content):
        # Rule: No same-type nesting for TIP_NOTE
        if any(b["type"] == "TIP_NOTE" for b in stack):
            print(f"CRITICAL ERROR: Cannot nest '{content}' inside another TIP/NOTE block at line {i + 1} in '{file_path}'.")
@@ -877,23 +913,55 @@ def are_tips_notes_syntax_valid(md_lines, file_path):
    - >>> [!tip] EXAMPLE 1: / >>> [!note] NOTE 1:
    """

    tip_pattern = r"^" + re.escape(EXAMPLE_PREFIX) + r"(?: EXAMPLE(?: \d+)?:?)?$"
    note_pattern = r"^" + re.escape(NOTE_PREFIX) + r"(?: NOTE(?: \d+)?:?)?$"
    tip_suffix_pattern = r"^(?:EXAMPLE(?: \d+)?:?)?$"
    note_suffix_pattern = r"^(?:(?:NOTE|EXAMPLE)(?: \d+)?:?)?$"

    for i, line in enumerate(md_lines):
        l_left = line.lstrip(' \t')
        l_clean = l_left.rstrip('\r\n')
        
        if l_clean.startswith(EXAMPLE_PREFIX):
            if not re.match(tip_pattern, l_clean):
        parsed = parse_tip_or_note_opening(l_clean)
        if not parsed:
            continue

        kind, suffix = parsed
        if kind == "tip":
            if not re.match(tip_suffix_pattern, suffix):
                print(f"CRITICAL ERROR: Invalid format after '{EXAMPLE_PREFIX}' at line {i + 1} in '{file_path}'.")
                return False
        elif l_clean.startswith(NOTE_PREFIX):
            if not re.match(note_pattern, l_clean):
        elif kind == "note":
            if not re.match(note_suffix_pattern, suffix):
                print(f"CRITICAL ERROR: Invalid format after '{NOTE_PREFIX}' at line {i + 1} in '{file_path}'.")
                return False
    return True

def correct_notes_and_tips(md_lines):

    for i, line in enumerate(md_lines):
        line_without_newline = line.rstrip("\r\n")
        leading_ws_len = len(line_without_newline) - len(
            line_without_newline.lstrip(" \t")
        )
        leading_ws = line_without_newline[:leading_ws_len]
        content = line_without_newline[leading_ws_len:]

        parsed = parse_tip_or_note_opening(content)
        if not parsed:
            continue

        kind, suffix = parsed
        prefix = EXAMPLE_PREFIX if kind == "tip" else NOTE_PREFIX

        if suffix and not suffix.endswith(":"):
            suffix = f"{suffix}:"

        normalized = f"{leading_ws}{prefix}"
        if suffix:
            normalized += f" {suffix}"
        md_lines[i] = normalized + "\n"

    return md_lines

def run_cleanup(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        content = f.read()
@@ -901,17 +969,25 @@ def run_cleanup(file_path):
    content_lines = content.splitlines(keepends=True)
    if not check_blocks_closed(content_lines, file_path):
        return
    correct_notes_and_tips(content_lines)
    if not are_tips_notes_syntax_valid(content_lines, file_path):
        sys.exit(1)
    content_lines = ensure_spacing_around_lists(content_lines)
    content_lines = spacing_note_example_blocks(content_lines)
    content_lines = remove_prettier_ignore_comments_not_from_this_script(content_lines)
    content_lines = use_md_image_syntax(content_lines)
    if not are_tips_notes_syntax_valid(content_lines, file_path):
        sys.exit(1)
    content_lines = use_md_label_syntax(content_lines)
    content_lines = format_json_blocks(content_lines)
    # add prettier ignore also simplifies tables.
    content_lines = add_prettier_ignore(content_lines)
    if not run_prettier_on_file(file_path):
    content_with_ignore = ''.join(content_lines)

    prettier_output = run_prettier_on_content(content_with_ignore, file_path)
    if prettier_output is None:
        return

    content_lines = prettier_output.splitlines(keepends=True)

    content_lines = remove_prettier_ignore(content_lines)
    content_lines = collapse_empty_spaces(content_lines)
