Commit ea7fe5fe authored by Marco Cavalli

feat: add scripts for docx to md conversion

fix: styles are not lost from md to docx
parent 7338f9c2
+5 −0
@@ -8,10 +8,15 @@ RUN apt-get update && apt-get install -y \
    wget \
    curl \
    git \
    nodejs \
    npm \
    libreoffice \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*

# Install Prettier globally
RUN npm install -g prettier

# Install Pandoc 3.7.0.2 based on architecture
RUN if [ "$TARGETARCH" = "arm64" ]; then \
        wget https://github.com/jgm/pandoc/releases/download/3.7.0.2/pandoc-3.7.0.2-1-arm64.deb \
+14 −1
@@ -22,6 +22,8 @@
.TAC strong {
  font-weight: 700;
  color: #02488d;
  font-size: 1.25em;
  font-family: "Maven Pro", Arial;
}

.TAL {
@@ -88,10 +90,17 @@
.ZA {
  font-family: "Maven Pro", Arial;
  font-size: 20pt;
  text-align: right;
  /* text-align: right; */
  color: #121619;
}

.ZA > p {
  display: flex;
  justify-content: center;
  align-items: center;
  gap: 1rem;
}

/*
.ZT::before {
  content: url("https://www.etsi.org/templates/etsi/img/logo.svg");
@@ -345,3 +354,7 @@ div > .TAN:last-of-type {
  margin-top: 3pt;
  margin-bottom: 3pt;
}

img[alt="ETSI Logo"] {
  width: 100%;
}
 No newline at end of file
+49 −5
@@ -20,6 +20,12 @@ Latest

Latest

#### [Node.js & Prettier](https://nodejs.org/)

```npm install -g prettier```

Latest

### 1.1.2 Optional Software

#### [WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/install)
@@ -277,16 +283,54 @@ The accepted parameters are the same as those explained in [section 2.2](#22--co

**For Mac/Linux:**

`bash convert.sh --parameters [--arch amd64|arm64]`
`bash convert.sh --parameters [--arch amd64|arm64] [--rebuild]`

**For Windows:**

`./convert.bat --parameters [--arch amd64|arm64]`
`./convert.bat --parameters [--arch amd64|arm64] [--rebuild]`

Where `--parameters` are the same as those explained in [section 2.2](#22--conversion), `--arch` is an optional parameter and specifies the Docker architecture (default `amd64`), and `--rebuild` forces a rebuild of the Docker image. 

# 4 Other tools

## 4.1 Cleanup

The `cleanup_md.py` script refines and validates Markdown files at any stage of their lifecycle. Although the DOCX to MD conversion is standardized, running this script before converting Markdown to HTML is highly recommended because it checks the structural integrity of the source.

It validates that code blocks are correctly matched, converts grid tables into standard Markdown, automatically indents and validates JSON code blocks, and standardizes the document blocks. It also applies Prettier for better readability.
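
The fence-matching and JSON-indentation checks described above can be illustrated with a minimal sketch. This is not the code of `cleanup_md.py` (its diff is collapsed in this commit); the function name `check_and_format` and the two-space indent are assumptions made for illustration only.

```python
import json
import re

# A fence line: optional indentation, three backticks, then an info string
# that contains no backticks (so inline ```code``` spans are not matched).
FENCE = re.compile(r"^\s*`{3}([^`]*)$")


def check_and_format(markdown: str) -> str:
    """Re-indent JSON code blocks; raise ValueError on an unclosed fence.

    Hypothetical helper: assumes fences are never nested.
    """
    out, block, lang, open_line = [], [], None, 0
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if match and lang is None:
            # Opening fence: remember its language and position.
            lang, open_line, block = match.group(1).strip(), number, []
            out.append(line)
        elif match:
            # Closing fence: validate and pretty-print JSON bodies.
            body = "\n".join(block)
            if lang == "json":
                # json.loads raises ValueError if the block is not valid JSON.
                body = json.dumps(json.loads(body), indent=2)
            out.extend(body.splitlines())
            out.append(line)
            lang = None
        elif lang is not None:
            block.append(line)  # inside a code block
        else:
            out.append(line)    # ordinary Markdown text
    if lang is not None:
        raise ValueError(f"code fence opened on line {open_line} is never closed")
    return "\n".join(out)
```

A caller would pass the file contents to `check_and_format` and write the returned text back only if no exception was raised.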

**For Mac/Linux:**

`bash cleanup_md.sh --src {path/to/folder} [--arch amd64|arm64] [--rebuild]`

`python cleanup_md.py --src {path/to/folder}`

**For Windows:**

`./cleanup_md.bat --src {path/to/folder} [--arch amd64|arm64] [--rebuild]`

- `--src` (required): the source directory or a single Markdown file.
- `--arch` (optional): the target CPU architecture for the Docker image (default `amd64`).
- `--rebuild` (optional): forces Docker to rebuild the image from the Dockerfile.
- `--help` (optional): displays the available options in the terminal.
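
For readers who want to see the option set in one place, the parameters above can be modelled with Python's standard `argparse` module. This is a sketch under assumptions: the wrapper scripts parse their arguments in shell or batch, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented cleanup options."""
    parser = argparse.ArgumentParser(prog="cleanup_md")
    parser.add_argument("--src", required=True,
                        help="source directory or a single Markdown file")
    parser.add_argument("--arch", choices=["amd64", "arm64"], default="amd64",
                        help="target CPU architecture for the Docker image")
    parser.add_argument("--rebuild", action="store_true",
                        help="force a rebuild of the Docker image")
    # argparse adds --help automatically.
    return parser


args = build_parser().parse_args(["--src", "docs", "--arch", "arm64"])
print(args.src, args.arch, args.rebuild)  # prints: docs arm64 False
```

Passing a value other than `amd64` or `arm64` to `--arch` makes `argparse` exit with an error, which matches the validation the batch wrapper performs.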

## 4.2 DOCX to MD

The main scripts `docx_to_md.sh` (Linux/Mac) and `docx_to_md.bat` (Windows) automate the conversion from Word (.docx) to Markdown (.md).

**For Mac/Linux:**

`bash docx_to_md.sh --file {path/to/file.docx} [--start-from <step>] [--arch amd64|arm64] [--rebuild] [--docker]`

**For Windows:**

`./docx_to_md.bat --file {path/to/file.docx} [--start-from <step>] [--arch amd64|arm64] [--rebuild]`

- `--file` (required): the path to the .docx file you wish to convert.
- `--start-from` (optional): starts the conversion from a specific step.
- `--arch` (optional): the target CPU architecture for the Docker image.
- `--rebuild` (optional): forces Docker to rebuild the image from the Dockerfile.
- `--help` (optional): displays the available options in the terminal.
- `--docker` (optional, Linux/Mac only): runs the script using a Docker image.

Where `--parameters` are the same as those explained in [section 2.2](#22--conversion) and `--arch` is an optional parameter to specify the architecture of the Docker image to be built (default is `amd64`).

# 4. Debug
# 5. Debug

## 4.1 Show time reports
## 5.1 Show time reports

Using the `--time` parameter prints the duration of each operation; it is useful for debugging performance.
 No newline at end of file
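
As an illustration of what a per-operation time report involves, the sketch below times a single callable and optionally prints its duration. It is an assumption-laden example, not the converter's implementation: the name `run_timed`, the `show_time` flag, and the output format are all hypothetical.

```python
import time


def run_timed(label, operation, show_time=False):
    """Run operation() and return (result, elapsed_seconds).

    Hypothetical helper: show_time stands in for the --time parameter.
    """
    start = time.perf_counter()
    result = operation()
    elapsed = time.perf_counter() - start
    if show_time:
        print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


# Example: time a trivial operation and print its duration.
total, seconds = run_timed("sum", lambda: sum(range(1000)), show_time=True)
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which is the usual choice for measuring short durations.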
+447 −0

File added.

Preview size limit exceeded, changes collapsed.

+102 −0
@echo off
setlocal enabledelayedexpansion

set "ARCH=amd64"
set "SRC="
set "REBUILD=false"
set "APP_DIR=%CD%"

:parse_args
if "%~1"=="" goto args_done
if "%~1"=="--arch" (
  set "ARCH=%~2"
  shift
  shift
  goto parse_args
)
if "%~1"=="--src" (
  set "SRC=%~2"
  shift
  shift
  goto parse_args
)
if "%~1"=="--rebuild" (
  set "REBUILD=true"
  shift
  goto parse_args
)
if "%~1"=="--help" (
  rem Special characters in the help text are caret-escaped so cmd.exe prints them literally.
  echo Usage: cleanup_md.bat [--arch ^<amd64^|arm64^>] [--rebuild] --src ^<file_or_directory^>
  echo.
  echo Options:
  echo   --arch ^<amd64^|arm64^>        Specify the target architecture for the Docker image ^(default: amd64^).
  echo   --src ^<file_or_directory^>   Path to the source Markdown file or directory to process ^(required^).
  echo   --rebuild                    Rebuild the Docker image before running the cleanup.
  echo   --help                       Display this help message.
  exit /b 0
)
echo Unknown parameter passed: %~1
exit /b 1

:args_done
if "%ARCH%" NEQ "amd64" if "%ARCH%" NEQ "arm64" (
	echo Error: --arch must be either 'amd64' or 'arm64'
	exit /b 1
)
if "%SRC%"=="" (
  echo Error: --src argument is required.
  exit /b 1
)
rem Ensure the docker image exists; build if missing
docker image inspect md-converter >nul 2>&1
if errorlevel 1 (
	docker build --build-arg TARGETARCH=%ARCH% -t md-converter .
	if errorlevel 1 (
		echo Failed to build image md-converter.
		exit /b 1
	)
)
rem Rebuild the image if --rebuild is present
if "%REBUILD%"=="true" (
	docker build --build-arg TARGETARCH=%ARCH% -t md-converter .
	if errorlevel 1 (
		echo Failed to build image md-converter.
		exit /b 1
	)
)
rem check if SRC is a file or directory
set "SRC_TYPE="
set "SRC_DIR="
set "SRC_FILE="
if exist "%SRC%\*" (
  rem SRC is a directory
  set "SRC_TYPE=directory"
) else if exist "%SRC%" (
  rem SRC is a file
  set "SRC_TYPE=file"
  for %%I in ("%SRC%") do (
    set "SRC_DIR=%%~dpI"
    set "SRC_FILE=%%~nxI"
  )
) else (
  echo Error: The specified --src path does not exist.
  exit /b 1
)

if "%SRC_TYPE%"=="file" (
  rem Process the single file
  docker run --rm ^
    --user 1000:1000 ^
    -v "%APP_DIR%:/app:rw" ^
    -v "%SRC_DIR%:/data/sources:rw" ^
    md-converter cleanup_md.py --src "/data/sources/%SRC_FILE%"
) else if "%SRC_TYPE%"=="directory" (
  rem Process all .md files in the directory
  docker run --rm ^
    --user 1000:1000 ^
    -v "%APP_DIR%:/app:rw" ^
    -v "%SRC%:/data/sources:rw" ^
    md-converter cleanup_md.py --src "/data/sources"
)

endlocal
 No newline at end of file
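
The file-versus-directory handling in the batch script above can be restated in Python to make the volume mapping easier to follow. This is only a sketch: the function name `docker_command` is hypothetical, while the image name, user ID, mount targets, and `cleanup_md.py` arguments are copied from the script.

```python
from pathlib import Path


def docker_command(src, app_dir):
    """Build the docker run argument list used for a file or a directory.

    Hypothetical helper that mirrors the batch logic; it does not run Docker.
    """
    path = Path(src).resolve()
    if path.is_dir():
        # Directory: mount it directly and process everything inside.
        mount, target = path, "/data/sources"
    elif path.is_file():
        # Single file: mount its parent folder and point --src at the file.
        mount, target = path.parent, f"/data/sources/{path.name}"
    else:
        raise FileNotFoundError(f"--src path does not exist: {src}")
    return [
        "docker", "run", "--rm",
        "--user", "1000:1000",
        "-v", f"{Path(app_dir).resolve()}:/app:rw",
        "-v", f"{mount}:/data/sources:rw",
        "md-converter", "cleanup_md.py", "--src", target,
    ]
```

The returned list could be handed to `subprocess.run` unchanged; building it as a list avoids shell quoting problems with paths that contain spaces.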