Align with latest changes (2682d34d) · Commits · CIM - Context Information Management / NGSI-LD API

md_to_docx_converter/Dockerfile.cicd

0 → 100644

+76 −0

		# syntax=docker/dockerfile:1.7

		FROM python:3.10.16-slim AS base

		# Build argument for architecture (amd64 or arm64)
		ARG TARGETARCH
		RUN echo "Building for architecture: ${TARGETARCH}"

		ENV PYTHONUNBUFFERED=1 \
		PIP_NO_CACHE_DIR=1

		# Install system dependencies required by convert.py and helper scripts
		RUN apt-get update && apt-get install -y --no-install-recommends \
		curl \
		git \
		imagemagick \
		libreoffice \
		nodejs \
		npm \
		wget \
		&& rm -rf /var/lib/apt/lists/*

		# Install Prettier globally
		RUN npm install -g prettier

		# Install Pandoc 3.7.0.2 based on architecture
		RUN if [ "$TARGETARCH" = "arm64" ]; then \
		wget https://github.com/jgm/pandoc/releases/download/3.7.0.2/pandoc-3.7.0.2-1-arm64.deb \
		&& dpkg -i pandoc-3.7.0.2-1-arm64.deb \
		&& rm pandoc-3.7.0.2-1-arm64.deb; \
		else \
		wget https://github.com/jgm/pandoc/releases/download/3.7.0.2/pandoc-3.7.0.2-1-amd64.deb \
		&& dpkg -i pandoc-3.7.0.2-1-amd64.deb \
		&& rm pandoc-3.7.0.2-1-amd64.deb; \
		fi

		WORKDIR /app

		# Install Python dependencies first for better layer caching
		COPY requirements.txt /tmp/requirements.txt
		RUN pip install -r /tmp/requirements.txt


		FROM base AS source

		WORKDIR /src

		# Copy repository content as build context
		COPY . /src

		# Remove gitignored files from copied source (if .git metadata is available in CI)
		RUN if [ -d .git ]; then \
		git clean -Xdf; \
		else \
		echo "Warning: .git not found in build context, cannot prune gitignored files."; \
		fi


		FROM base AS runtime

		WORKDIR /app

		# Copy pruned source tree into final image
		COPY --from=source /src /app

		# Runtime folders
		RUN mkdir -p /app/GENERATED_FILES /data/sources

		RUN git config --system --add safe.directory "*"

		# Persist generated artifacts and optional external sources
		VOLUME ["/app/GENERATED_FILES", "/data/sources"]

		# Pass docker run args directly to convert.py
		ENTRYPOINT ["python", "convert.py"]
		CMD ["--help"]
		No newline at end of file

md_to_docx_converter/INPUT/README.md

deleted 100644 → 0

+0 −7

		# User Inputs Folder

		This folder holds the input files passed to the script.

		## Contents

		file_order: Intended to contain file-ordering JSON files that follow the [template](../templates/json/file_order.json).
		No newline at end of file

md_to_docx_converter/INPUT/file_order/README.md

deleted 100644 → 0

+0 −65

		# File Orderings

		Place JSON files here that define how files contained in the conversion source directory should be ordered.

		### Considerations

		#### Ensure files defined in the JSON exist in the source directory

		Unlike the script's default file ordering, a provided file ordering that names a file missing from the conversion source directory causes the script to fail. Make sure that every file named in a provided JSON is present in that directory.

		#### Preceding numbers in top-level headings override the script ordering

		In the top-level heading of each Markdown file, a leading number (for example, `# 4 Clause Heading`) will override the file order defined in the script. It is important to ensure that:
		1. No two Markdown source files use the same preceding number in their top-level heading. It is best to make sure the numbers correspond with the file's intended place in the overall hierarchy.
		2. Numbering of non-standard clauses and annexes begins with 4, because 1, 2, and 3 are reserved for the predefined clauses Scope, References, and Definitions.

		### Example

		The following example specifies the ordering of a few example files. An [empty template](../../templates/json/file_order.json) is provided for convenience.

		``` json
		{
		  "clauses": [
		    "clause-example1",
		    "clause-example3",
		    "clause-example2",
		    "example4"
		  ],
		  "annexes": [
		    "annex-example1",
		    "annex-3",
		    "example2"
		  ]
		}
		```

		This will ensure that the following file order is produced.

		1. Universal ETSI initial files
		    1. Intellectual Property Rights
		    2. Foreword
		    3. Modal verbs terminology
		    4. Executive summary
		    5. Introduction
		2. ETSI universal initial clauses
		    1. Scope
		    2. References
		    3. Definition of terms, symbols and abbreviations
		3. Clauses defined in the JSON
		    1. clause-example1.md
		    2. clause-example3.md
		    3. clause-example2.md
		    4. example4.md
		4. Other clauses[^1]
		5. Annexes defined in the JSON
		    1. annex-example1.md
		    2. annex-3.md
		    3. example2.md
		6. Other annexes[^2]
		7. Universal ETSI final files
		    1. History

		[^1]: If any other files exist in the conversion source directory whose filenames follow the format clause-{text}.md, they will be added alphabetically after the JSON-defined clauses.

		[^2]: Similar to [^1], any files in the source directory that follow the format annex-{text}.md will be added alphabetically after the JSON-defined annexes.
		No newline at end of file
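
The ordering rules described in this deleted README can be illustrated with a short Python sketch. It is not the converter's actual code: `resolve_order` and its arguments are hypothetical names, and it covers only the clause and annex portion of the list (items 3 to 6), assuming listed names map to `{name}.md`.

```python
def resolve_order(source_files, file_order):
    """Return clause and annex files in the order the README describes.

    source_files: Markdown filenames present in the conversion source directory.
    file_order: dict with optional "clauses" and "annexes" lists (names without ".md").
    """
    available = set(source_files)
    ordered = []
    for key, prefix in (("clauses", "clause-"), ("annexes", "annex-")):
        listed = [f"{name}.md" for name in file_order.get(key, [])]
        missing = [name for name in listed if name not in available]
        if missing:
            # A file named in the JSON but absent from the directory is an error.
            raise FileNotFoundError(f"Listed in '{key}' but not found: {missing}")
        # Unlisted files that match the prefix follow alphabetically.
        others = sorted(
            name
            for name in available
            if name.startswith(prefix) and name.endswith(".md") and name not in listed
        )
        ordered.extend(listed + others)
    return ordered
```

Calling it with the example JSON above plus an extra `clause-zeta.md` in the source directory would place that file right after `example4.md`, matching footnote 1.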

md_to_docx_converter/README.md

+3 −3

		@@ -202,7 +202,7 @@ At this point, the document's clauses and annexes can be created. It is importan

		##### 2) Copy customCSS.css to the document's directory

		`cp customCSS.css GENERATED_FILES/{folder_name}/customCSS.css`
		`cp css/customCSS.css GENERATED_FILES/{folder_name}/customCSS.css`

		##### 3) Prepare the images in {folder_name}.docx for the HTML

		@@ -216,7 +216,7 @@ At this point, the document's clauses and annexes can be created. It is importan

		#### Convert to HTML using Pandoc

		`pandoc --resource-path GENERATED_FILES/{folder_name}/temp -f docx -t chunkedhtml -L filter_1.lua -L filter_2.lua --css=customCSS.css --css="{folder_name}.css" -s GENERATED_FILES/{folder_name}/temp/temp.docx -o GENERATED_FILES/{folder_name}/html_dirty --toc --toc-depth 4 --template=official.html --split-level=1`
		`pandoc --resource-path GENERATED_FILES/{folder_name}/temp -f docx -t chunkedhtml -L filter_1.lua -L filter_2.lua --css=css/customCSS.css --css="{folder_name}.css" -s GENERATED_FILES/{folder_name}/temp/temp.docx -o GENERATED_FILES/{folder_name}/html_dirty --toc --toc-depth 4 --template=templates/html/official.html --split-level=1`

		#### Delete temporary files that are no longer needed

		@@ -326,7 +326,7 @@ The main scripts docx_to_md.sh (Linux/Mac) and docx_to_md.bat (Windows) automate
		`--arch` optional: this parameter defines the target CPU architecture for the Docker image.
		`--rebuild` optional: this parameter forces Docker to re-build the image from the Dockerfile.
		`--help` optional: this parameter displays the available commands in the terminal.
		`--docker` optional: this parameter (Linux/Mac only) allows you to run the script using a Docker image
		`--no-docker` optional: this parameter (Linux/Mac only) allows you to run the script using the local installation of Python and all the dependencies


		# 5. Debug

md_to_docx_converter/cleanup_md.py

+115 −39

		@@ -12,7 +12,6 @@ NOTE_PREFIX = ">>> [!note]"
		EXAMPLE_NOTE_POSTFIX = ">>>"
		BLOCK_CODE_PREFIX = "```"
		BLOCK_CODE_POSTFIX = "```"
		TABLE_PREFIX = "::: TAL"
		IMAGE_PREFIX = "::: FL"
		LABLE_PREFIX = "::: TF"
		TABLE_HEADER_PREFIX = "::: TH"
		@@ -22,29 +21,49 @@ CUSTOM_BLOCK_POSTFIX = ":::" # This is a common postfix for custom blocks like
		PRETTIER_IGNORE_START_COMMENT = "<!-- prettier-ignore-start -->"
		PRETTIER_IGNORE_END_COMMENT = "<!-- prettier-ignore-end -->"

		TIP_NOTE_OPENING_RE = re.compile(r"^>>>\s*\[!(tip|note)\](.*)$", re.IGNORECASE)

		def run_prettier_on_file(file_path):

		def parse_tip_or_note_opening(content):
		"""
		Parse tip/note opening lines supporting both forms:
		- >>> [!note]
		- >>>[!note]
		Returns (kind, suffix) where kind is 'tip' or 'note'.
		"""
		match = TIP_NOTE_OPENING_RE.match(content)
		if not match:
		return None
		return match.group(1).lower(), match.group(2).strip()


		def run_prettier_on_content(content, file_path):
		try:
		subprocess.run(
		result = subprocess.run(
		[
		"npx",
		"prettier",
		"--stdin-filepath",
		file_path,
		"--write",
		"--prose-wrap",
		"always",
		],
		input=content,
		text=True,
		capture_output=True,
		check=True,
		)
		except FileNotFoundError:
		print("Error: 'npx' not found. Make sure Node.js/npm are installed.")
		return False
		return None
		except subprocess.CalledProcessError as error:
		print(
		f"Error: Prettier failed with exit code {error.returncode} on {file_path}."
		)
		return False
		return True
		if error.stderr:
		print(error.stderr.strip())
		return None
		return result.stdout


		def remove_prettier_ignore_comments_not_from_this_script(md_lines):
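
To make the behaviour of `parse_tip_or_note_opening` concrete, here is a minimal standalone sketch. It assumes the pattern is meant to accept optional whitespace after `>>>` and an arbitrary trailing suffix, as the docstring in this hunk suggests; the exact pattern in the repository may differ.

```python
import re

# Assumed pattern: optional whitespace after '>>>' and an optional free-form suffix.
TIP_NOTE_OPENING_RE = re.compile(r"^>>>\s*\[!(tip|note)\](.*)$", re.IGNORECASE)


def parse_tip_or_note_opening(content):
    """Return (kind, suffix) for a tip/note opening line, or None."""
    match = TIP_NOTE_OPENING_RE.match(content)
    if not match:
        return None
    return match.group(1).lower(), match.group(2).strip()


print(parse_tip_or_note_opening(">>> [!note] NOTE 1:"))  # ('note', 'NOTE 1:')
print(parse_tip_or_note_opening(">>>[!TIP]"))            # ('tip', '')
print(parse_tip_or_note_opening(">>>"))                  # None
```

A bare closing `>>>` returns `None`, which is why callers in this diff can use the result directly as a truthiness check for "is this an opening line".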
		@@ -138,7 +157,6 @@ def add_prettier_ignore(md_lines):
		currently_in_table_block = True
		table_type = "ascii"
		table_beginning = i
		i += 1
		continue

		stripped = line.strip()
		@@ -149,7 +167,6 @@ def add_prettier_ignore(md_lines):
		)
		currently_in_table_block = True
		table_type = "markdown"
		i += 1
		continue

		i += 1
		@@ -165,30 +182,36 @@ def add_prettier_ignore(md_lines):

		if should_end_table:
		table_ending = i
		i = add_prettier_ignore_comment(
		lines, PRETTIER_IGNORE_END_COMMENT, i
		)
		# Simplify before adding comments (to avoid index shift)
		if table_type == "ascii":
		new_table = simplify_table(lines[table_beginning:table_ending])
		if not new_table is None:
		if new_table is not None and table_beginning is not None:
		lines[table_beginning:table_ending] = new_table
		# After substitution, recalculate where to add the end comment
		i = table_beginning + len(new_table)
		# Now add the prettier-ignore-end comment at the current position (before the external line)
		i = add_prettier_ignore_comment(
		lines, PRETTIER_IGNORE_END_COMMENT, i
		)
		table_beginning = None
		table_ending = None
		currently_in_table_block = False
		table_type = None
		i += 1
		continue
		else:
		i += 1

		if currently_in_table_block:
		table_ending = len(lines)
		add_prettier_ignore_comment(
		lines, PRETTIER_IGNORE_END_COMMENT, table_ending
		)
		if table_type == "ascii" and table_beginning is not None:
		new_table = simplify_table(lines[table_beginning:table_ending])
		if not new_table is None:
		if new_table is not None:
		lines[table_beginning:table_ending] = new_table
		table_ending = table_beginning + len(new_table)
		# Add prettier-ignore-end after the table content
		add_prettier_ignore_comment(
		lines, PRETTIER_IGNORE_END_COMMENT, table_ending
		)

		return lines
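
The reordering in this hunk (simplify the table first, then insert the end comment) can be sketched in isolation as follows. `shrink_table` is a hypothetical stand-in for `simplify_table`, and `close_table` is an illustrative helper, not a function from the repository.

```python
END_COMMENT = "<!-- prettier-ignore-end -->\n"


def shrink_table(table_lines):
    # Hypothetical stand-in for simplify_table: drops separator-only rows.
    return [line for line in table_lines if set(line.strip()) - set("+-=")]


def close_table(lines, table_beginning, table_ending):
    """Replace the table slice first, then insert the end comment after it."""
    new_table = shrink_table(lines[table_beginning:table_ending])
    lines[table_beginning:table_ending] = new_table
    # Recompute the end index from the new table length, not the old one.
    insert_at = table_beginning + len(new_table)
    lines.insert(insert_at, END_COMMENT)
    return insert_at + 1  # index of the first line after the comment
```

If the comment were inserted before the slice replacement, `table_ending` would still refer to the pre-simplification length and the comment could land inside or beyond the shortened table.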

		@@ -219,37 +242,50 @@ def add_prettier_ignore(md_lines):
		continue

		if not post_fix_expected:
		if line.startswith(EXAMPLE_PREFIX) or line.startswith(NOTE_PREFIX):
		stripped_line = line.lstrip(" \t").rstrip("\r\n")
		if parse_tip_or_note_opening(stripped_line):
		i = add_prettier_ignore_comment(
		md_lines, PRETTIER_IGNORE_START_COMMENT, i
		)
		post_fix_expected = EXAMPLE_NOTE_POSTFIX
		i += 1
		continue
		elif line.startswith(BLOCK_CODE_PREFIX):
		i = add_prettier_ignore_comment(
		md_lines, PRETTIER_IGNORE_START_COMMENT, i
		)
		post_fix_expected = BLOCK_CODE_POSTFIX
		i += 1
		elif line.startswith(TABLE_PREFIX):
		continue
		elif line.startswith(GENERIC_CUSTOM_BLOCK_PREFIX):
		i = add_prettier_ignore_comment(
		md_lines, PRETTIER_IGNORE_START_COMMENT, i
		)
		post_fix_expected = CUSTOM_BLOCK_POSTFIX
		i += 1
		table_beginning = i
		# `i` points to the opening custom block line (e.g. ::: TAL).
		# Start simplification from the first table line inside the block.
		table_beginning = i + 1
		continue
		else:
		i += 1
		elif line.strip() == post_fix_expected.strip():
		# md_lines[i] is the closing line (:::, >>>, etc)
		# Simplify content BEFORE adding comments to avoid index shift
		table_ending = i
		i = add_prettier_ignore_comment(md_lines, PRETTIER_IGNORE_END_COMMENT, i)
		post_fix_expected = ""

		new_table = simplify_table(md_lines[table_beginning:table_ending])
		if not new_table is None:
		if new_table is not None and table_beginning is not None:
		md_lines[table_beginning:table_ending] = new_table
		# After substitution, closing line is now at: table_beginning + len(new_table)
		closing_line_pos = table_beginning + len(new_table)
		else:
		closing_line_pos = table_ending

		# Add prettier-ignore-end AFTER the closing line (insert at closing_line_pos + 1)
		i = add_prettier_ignore_comment(md_lines, PRETTIER_IGNORE_END_COMMENT, closing_line_pos + 1)

		table_beginning = None
		table_ending = None
		i += 1
		continue
		else:
		i += 1

		@@ -602,7 +638,7 @@ def spacing_note_example_blocks(md_lines):
		stripped = line.strip()

		# Check if the current line starts with an Example or Note prefix
		if stripped.startswith((EXAMPLE_NOTE_POSTFIX, EXAMPLE_PREFIX, NOTE_PREFIX)):
		if stripped == EXAMPLE_NOTE_POSTFIX or parse_tip_or_note_opening(stripped):
		# Add a blank line before if the previous line is not empty
		if new_lines and new_lines[-1].strip() != "":
		new_lines.append("\n")
		@@ -753,7 +789,7 @@ def try_open_tip_note(content, stack, i, file_path, in_grid_row):
		Check and handle the opening of a TIP or NOTE block (>>> [!tip] / >>> [!note]).
		Returns True if handled.
		"""
		if content.startswith(EXAMPLE_PREFIX) or content.startswith(NOTE_PREFIX):
		if parse_tip_or_note_opening(content):
		# Rule: No same-type nesting for TIP_NOTE
		if any(b["type"] == "TIP_NOTE" for b in stack):
		print(f"CRITICAL ERROR: Cannot nest '{content}' inside another TIP/NOTE block at line {i + 1} in '{file_path}'.")
		@@ -877,23 +913,55 @@ def are_tips_notes_syntax_valid(md_lines, file_path):
		- >>> [!tip] EXAMPLE 1: / >>> [!note] NOTE 1:
		"""

		tip_pattern = r"^" + re.escape(EXAMPLE_PREFIX) + r"(?: EXAMPLE(?: \d+)?:?)?$"
		note_pattern = r"^" + re.escape(NOTE_PREFIX) + r"(?: NOTE(?: \d+)?:?)?$"
		tip_suffix_pattern = r"^(?:EXAMPLE(?: \d+)?:?)?$"
		note_suffix_pattern = r"^(?:(?:NOTE|EXAMPLE)(?: \d+)?:?)?$"

		for i, line in enumerate(md_lines):
		l_left = line.lstrip(' \t')
		l_clean = l_left.rstrip('\r\n')

		if l_clean.startswith(EXAMPLE_PREFIX):
		if not re.match(tip_pattern, l_clean):
		parsed = parse_tip_or_note_opening(l_clean)
		if not parsed:
		continue

		kind, suffix = parsed
		if kind == "tip":
		if not re.match(tip_suffix_pattern, suffix):
		print(f"CRITICAL ERROR: Invalid format after '{EXAMPLE_PREFIX}' at line {i + 1} in '{file_path}'.")
		return False
		elif l_clean.startswith(NOTE_PREFIX):
		if not re.match(note_pattern, l_clean):
		elif kind == "note":
		if not re.match(note_suffix_pattern, suffix):
		print(f"CRITICAL ERROR: Invalid format after '{NOTE_PREFIX}' at line {i + 1} in '{file_path}'.")
		return False
		return True

		def correct_notes_and_tips(md_lines):

		for i, line in enumerate(md_lines):
		line_without_newline = line.rstrip("\r\n")
		leading_ws_len = len(line_without_newline) - len(
		line_without_newline.lstrip(" \t")
		)
		leading_ws = line_without_newline[:leading_ws_len]
		content = line_without_newline[leading_ws_len:]

		parsed = parse_tip_or_note_opening(content)
		if not parsed:
		continue

		kind, suffix = parsed
		prefix = EXAMPLE_PREFIX if kind == "tip" else NOTE_PREFIX

		if suffix and not suffix.endswith(":"):
		suffix = f"{suffix}:"

		normalized = f"{leading_ws}{prefix}"
		if suffix:
		normalized += f" {suffix}"
		md_lines[i] = normalized + "\n"

		return md_lines

		def run_cleanup(file_path):
		with open(file_path, "r", encoding="utf-8") as f:
		content = f.read()
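
The normalization performed by `correct_notes_and_tips` can be illustrated on a single line. This is a sketch under assumptions: the opening pattern is taken to allow optional whitespace and any suffix, and `EXAMPLE_PREFIX` is assumed to be `>>> [!tip]` (only `NOTE_PREFIX = ">>> [!note]"` is visible in this diff). `normalize_opening` is a hypothetical name.

```python
import re

OPENING_RE = re.compile(r"^>>>\s*\[!(tip|note)\](.*)$", re.IGNORECASE)  # assumed pattern
PREFIXES = {"tip": ">>> [!tip]", "note": ">>> [!note]"}  # tip prefix is an assumption


def normalize_opening(line):
    """Rewrite a tip/note opening line to canonical spacing, case, and trailing colon."""
    body = line.rstrip("\r\n")
    content = body.lstrip(" \t")
    leading_ws = body[: len(body) - len(content)]
    match = OPENING_RE.match(content)
    if not match:
        return line  # not an opening line: leave untouched
    kind, suffix = match.group(1).lower(), match.group(2).strip()
    if suffix and not suffix.endswith(":"):
        suffix += ":"
    normalized = leading_ws + PREFIXES[kind]
    if suffix:
        normalized += " " + suffix
    return normalized + "\n"
```

For instance, `"  >>>[!NOTE] NOTE 2\n"` becomes `"  >>> [!note] NOTE 2:\n"`, which is the form the suffix patterns in `are_tips_notes_syntax_valid` then check.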
		@@ -901,17 +969,25 @@ def run_cleanup(file_path):
		content_lines = content.splitlines(keepends=True)
		if not check_blocks_closed(content_lines, file_path):
		return
		correct_notes_and_tips(content_lines)
		if not are_tips_notes_syntax_valid(content_lines, file_path):
		sys.exit(1)
		content_lines = ensure_spacing_around_lists(content_lines)
		content_lines = spacing_note_example_blocks(content_lines)
		content_lines = remove_prettier_ignore_comments_not_from_this_script(content_lines)
		content_lines = use_md_image_syntax(content_lines)
		if not are_tips_notes_syntax_valid(content_lines, file_path):
		sys.exit(1)
		content_lines = use_md_label_syntax(content_lines)
		content_lines = format_json_blocks(content_lines)
		# add prettier ignore also simplifies tables.
		content_lines = add_prettier_ignore(content_lines)
		if not run_prettier_on_file(file_path):
		content_with_ignore = ''.join(content_lines)

		prettier_output = run_prettier_on_content(content_with_ignore, file_path)
		if prettier_output is None:
		return

		content_lines = prettier_output.splitlines(keepends=True)

		content_lines = remove_prettier_ignore(content_lines)
		content_lines = collapse_empty_spaces(content_lines)