Commit 1a2eeea1 authored by Jerediah Fevold

First commit of post-processing scripts and documentation to convert the markdown into HTML and PDF

parent 858476cd
+1487 −0

File added.

Preview size limit exceeded, changes collapsed.

images/image1.png

0 → 100644
+5.7 KiB

images/image2.png

0 → 100644
+6.22 KiB

scripts/README.md

0 → 100644
+306 −0
# Overview
This readme walks new users through the steps to clone the Git repository containing the example specification converted to Markdown, edit the specification, and produce HTML and PDF versions of the specification from the Markdown source.

## Markdown Basics
Markdown is the file format proposed for writing the next generation of 3GPP specifications. It is a lightweight markup language for presenting information and not much else, so its feature set is limited. The process developed here adds the remaining details in post-processing. This allows delegates to focus on the technical aspects while still producing a professional-grade specification for public consumption.

### Basic Elements
Markdown supports the following basic organizational elements.
* Headings
* Paragraphs
* Line Breaks
* Text Bolding
* Text Italicization
* Text Bolding and Italicization
* Block quotes
* Ordered Lists
* Unordered Lists
* Code Blocks
* Images
* Horizontal Rules
* Links
* References
* Inline HTML

More details can be found at [Markdown Guide](https://www.markdownguide.org/basic-syntax/).

### Special Characters
Special characters, used in Markdown syntax, must be preceded by a \\ character to display properly when part of the text. The only exception is for raw blocks of text and code blocks, which start and end with \`\`\` and \~\~\~, respectively.

Details on special characters can be found at [Markdown Guide](https://www.markdownguide.org/basic-syntax/#escaping-characters).

# Converting Markdown to HTML and PDF

## Prerequisites
For the purpose of this demo repository, please download the exact versions of the prerequisite tools to limit the scope of bug fixes.

1. Git
 - Instructions will be provided in this first version of the demo using the Git console.
 - Download and install Git [here](https://git-scm.com/downloads/win).
2. Generate an SSH public key if you don't have one.
 - Follow the instructions [here](https://git-scm.com/book/ms/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key).
3. Add your public key to ETSI Forge.
 - Navigate to the page [here](https://forge.etsi.org/rep/-/user_settings/ssh_keys)
 - Click "Add new key"
 - Print the public key to your screen by launching Git BASH and entering the following command.
 > cat \~/.ssh/id_rsa.pub
 - Copy the public key into the ETSI Forge interface and confirm.
4. Clone the git repository
 - Open Git Bash from the Windows Start Menu.
 - Navigate to the directory to which you would like to download the repository.
 - Enter the command, replacing ***username*** with your ETSI Forge username.
 > git clone ssh://git@forge.etsi.org:29419/***username***/markdown-specification.git
 
5. Pandoc
 - Pandoc flexibly converts to and from more than 40 file formats by converting to and from a Pandoc intermediate format.
 - [General Information](https://pandoc.org/)
 - [Documentation](https://pandoc.org/MANUAL.html)
 - Download Pandoc v3.4 [link](https://github.com/jgm/pandoc/releases/download/3.4/pandoc-3.4-windows-x86_64.zip)
 - Unzip Pandoc v3.4 and copy pandoc.exe to the scripts directory in the repository.
6. WeasyPrint
 - WeasyPrint converts HTML files into professional style PDF documents.
 - [General Information](https://weasyprint.org/)
 - [Documentation](https://doc.courtbouillon.org/weasyprint)
 - WeasyPrint v62.3 [link](https://github.com/Kozea/WeasyPrint/releases/download/v62.3/weasyprint-windows.zip)
 - Unzip Weasyprint v62.3 and copy weasyprint.exe to the scripts directory in the repository.
7. MSC Generator
 - MSC Generator converts MSC signalling charts and block diagrams to images.
 - MSC Generator v7.3 [link](https://gitlab.com/msc-generator/msc-generator/-/package_files/40764506/download)
 - Add MSC Generator to your user path.
   - Open the Windows Start Menu
   - Type env
   - Click "Edit environment variables for your account"
   - In the window that appears, click "Path" under "User variables for..."
   - Click "Edit..."
   - Click "New"
   - Enter the full path to your MSC Generator installation, which could be C:\Program Files (x86)\Msc-generator, for example.
   - Click "OK"
   - Click "OK"
 - **Note:** The change will not apply to currently open Windows Command Prompts. For the change to take effect, open a new Windows Command Prompt.
8. Create a directory called output in the main repository directory, markdown-specification.

After following these steps, you should be left with at least the following directory structure. For brevity, only crucial files are listed.

**markdown-specification**
- 38331-i00.md
- README.md
- *examples*
  - 38331-i00_cover_toc.html
- *images*
  - image1.png
  - image2.png
- *output*
- *scripts*
  - asn_render.lua
  - config.bat
  - html_to_pdf.bat
  - indent_procedural.lua
  - md_to_html.bat
  - msc_to_img.lua
  - pandoc.exe
  - print_style_px.css
  - README.md
  - weasyprint.exe

## Markdown to HTML
The steps in this version of the instructions will require the use of the Windows Command Prompt, which can be accessed by clicking "Command Prompt" in the start menu in the "Windows System" folder. The process will eventually be executed by double clicking a script file, such as a "batch" or "BAT" file, which will automatically run through the steps according to a configuration.

1. Open a Windows Command Prompt
- In the Windows Start Menu, simply type "command", and it should be the first result.

2. Navigate to the directory containing the conversion scripts, replacing the example directory below with the one used in Step 4 of the Prerequisites.
> cd C:\\Users\\\<username\>\\\<git_directory\>\\markdown-specification\\scripts

3. Edit config.bat to match your environment.
- OUTPUT_DIR - This can be the base repository directory or a new output directory can be created in the repository folder.
- INPUT_DIR - This should be the base repository directory, the path ending in markdown-specification.
- Nothing else needs to be changed for now.

4. Enter the command. Running the command in the command prompt enables the viewing of any errors or warnings which may arise during the process.
> md_to_html.bat

- While running the script, there will be a few prompts asking to delete a directory. Verify that the directories are listed as expected. If they are, enter a capital "Y" and press enter.

5. Copy the cover page and table of contents to the HTML file.
- Inserting the cover page and generating and inserting the table of contents are currently manual steps, but they will be automated in the future. Once the HTML version of the specification has been generated with the previous steps, copy the example cover page and table of contents and paste them at the top of the HTML file. Do this before converting the HTML to PDF.
- **Note:** The cover page and table of contents are only provided for the main branch, before any changes have been applied.
- Navigate to the examples directory.
- Open 38331-i00_cover_toc.html
- Select all the contents and copy them to the clipboard (e.g., ctrl+c).
- Navigate to the output directory.
- Open 38331-i00.html
- Delete the text between the first line and the line preceding "\<h1 id="foreword"\>Foreword\</h1\>".
  - Note that in Notepad++, this can be accomplished by clicking before the first line and pressing ctrl+shift+B, then clicking at the end of the line preceding the one described in the bullet above and pressing ctrl+shift+B again. Now the text is selected, and you can press the "del" key to delete the text, and then paste in the content of 38331-i00_cover_toc.html.

## HTML to PDF
The last step is to convert the HTML to a PDF. To do so, execute the following command while in the scripts directory as in the previous steps.
> html_to_pdf.bat

# Rendering Steps
The following information is not required to run the tools and explains how the Markdown file is post-processed to produce the publishable, human-friendly versions of the specification.

The rendering steps are executed directly in Pandoc with scripts written in the [Lua](https://www.lua.org/docs.html) language. The benefit of this method over the others that Pandoc supports is that Pandoc has a built-in Lua interpreter, so the scripts already run in the context of the document in Pandoc's intermediate format. This substantially reduces the number of lines of code required compared to other methods.

Check scripts/md_to_html.bat for an example of how Lua filters are provided to the Pandoc tool with the -L option.

The main action in the Lua scripts is at the bottom of each file, with supporting functions defined in the first part of the script.

More details on the integration between Pandoc and Lua can be found [here](https://pandoc.org/lua-filters.html).
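
As a rough illustration of the structure described above, the sketch below shows the general shape of a Pandoc Lua filter: supporting functions first, then a global function named after the element type it handles. The `demo` class and the `wrapDemoBlock` helper are hypothetical and are not part of this repository.

```lua
-- Hypothetical minimal filter, for illustration only.

-- Supporting function, defined first.
local function wrapDemoBlock(text)
	return '<div class="demo">'..text..'</div>'
end

-- Main action: Pandoc calls this function for every code block.
function CodeBlock(block)
	if block.classes[1] == "demo" then
		-- The `pandoc` module is only available when run inside Pandoc.
		return pandoc.RawBlock('html', wrapDemoBlock(block.text))
	end
	-- Returning nil leaves the block unchanged.
	return nil
end
```

Assuming the sketch is saved as `demo_filter.lua` (a made-up file name), it would be passed to Pandoc with the same -L option used in scripts/md_to_html.bat.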

## ASN Rendering
To colorize and indent the ASN.1 source in the specification, post-processing is required. The post-processing script can be found in scripts/asn_render.lua.

In the Markdown source, ASN.1 blocks are wrapped in \~\~\~ asn1 \<source\> \~\~\~, which are identified by the last function in the script.
> function CodeBlock(block)

The function is called on every *CodeBlock*, identified by an opening and closing series of \~\~\~. The script then checks whether the first class of the *CodeBlock* is "asn1".

The function makes four modifications to the ASN.1 blocks.
1. Colorize the ASN.1 keywords identified in the variable "keywords" defined at the top of the script.
  - Colorization is implemented with an HTML span with a style which applies a purple color to the font.  
2. Colorize the comments, which always start with --.
  - Colorization is implemented with an HTML span with a style which applies a grey color to the font. 
3. Indent the ASN.1 considering the brackets.
  - Indentation is implemented with an HTML div with a style corresponding to the indentation level, where the indentation level is provided as an integer after indent-. An example is provided below. Note that the formatting provided here is only for the purpose of legibility. It is not intended that we ever need to read the HTML file.
4. Apply the grey background color to the entire ASN.1 block.
  - A div with a style containing a grey background is applied by wrapping the entire ASN.1 block.  
  
**Markdown Input**
  ```
  ~~~ asn1
  -- ASN1START
  -- TAG-UEASSISTANCEINFORMATION-START

  UEAssistanceInformation ::= SEQUENCE {
  criticalExtensions CHOICE {
  ueAssistanceInformation UEAssistanceInformation-IEs,
  criticalExtensionsFuture SEQUENCE {}
  }
  }
  ```

**HTML Output**
  ```
  <div class="asn1">
    <div class="asn1-indent-0">
      <span class="asn1-comment">-- ASN1START</span>
    </div>
    <div class="asn1-indent-0">
      <span class="asn1-comment">-- TAG-UEASSISTANCEINFORMATION-START</span>
    </div>
    <div class="asn1-indent-0">
        UEAssistanceInformation ::= <span class="asn1-keyword">SEQUENCE</span> {
    </div>
    <div class="asn1-indent-1">
        criticalExtensions <span class="asn1-keyword">CHOICE</span> {
    </div>
    <div class="asn1-indent-2">
        ueAssistanceInformation UEAssistanceInformation-IEs,
    </div>
    <div class="asn1-indent-2">
        criticalExtensionsFuture <span class="asn1-keyword">SEQUENCE</span> {}
    </div>
    <div class="asn1-indent-1">
        }
    </div>
    <div class="asn1-indent-0">
        }
    </div> 
  </div>
  ```

## Equation Rendering
Equations in Markdown can be provided in the LaTeX equation format. A short "cheat sheet" is provided [here](https://tug.ctan.org/info/undergradmath/undergradmath.pdf). Equations are surrounded by `$ $` for inline equations and `$$ $$` for equations residing in a block apart from text. Basic examples include the following.

| Operator | Syntax |
|----|----|
| Greater than (>) | `>` |
| Greater than or equal to (≥) | `\geq` |
| Less than (<) | `<` |
| Less than or equal to (≤) | `\leq` |
| Superscript | `^{<text>}` |
| Subscript | `_{<text>}` |

Equation rendering is still under development, and there are various options. Simple equations can be rendered directly in HTML. For example, subscripts and superscripts, greater and less than / equal to, are all supported directly in HTML. More complicated symbols, such as summations and integrals may need to be converted into images.
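
For illustration, the syntax described above could be used as follows. The symbols and the formula are arbitrary examples and are not taken from the specification.

```latex
Inline: the condition $x_{i} \geq 2^{n}$ holds for all $i \leq N$.

Block:
$$
y = \sum_{i=1}^{N} a_{i} x^{i}
$$
```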

## Procedural Indentation
Markdown lacks support for leading whitespace such as spaces and tabs. Therefore, procedural text indentation, even if indentation is provided for readability in the Markdown file, is applied through a script during the conversion process. 

To indent the procedural text in the specification, post-processing is required. The post-processing script can be found in scripts/indent_procedural.lua.

Any line beginning with a digit followed by \\\>, e.g., `2\>`, will be processed as a procedural bullet. The \> symbol isn't used directly since \> is a special character in Markdown. As in the explanation of the ASN.1 rendering, the indentation is implemented using an HTML div with a class that sets the indentation margin. An example is shown below.

**Markdown Input**
  ```
  1\> for each stored version of a SIB:

  2\> if the *areaScope* is associated and its value for the stored version of the SIB is the same as the value received in the *si-SchedulingInfo* for that SIB from the serving cell:

  3\> if the UE is NPN capable and the cell is an NPN-only cell:

  4\> if the first NPN identity included in the *NPN-IdentityInfoList*, the *systemInformationAreaID* and the v*alueTag* that are included in the *si-SchedulingInfo* for the SIB received from the serving cell are identical to the NPN identity, the *systemInformationAreaID* and the *valueTag* associated with the stored version of that SIB:

  5\> consider the stored SIB as valid for the cell;
  ```

**HTML Output**
  ```
  <p><div class="b1">1&gt; for each stored version of a SIB:</div></p>
  <p><div class="b2">2&gt; if the <em>areaScope</em> is associated and its value for the stored version of the SIB is the same as the value received in the <em>si-SchedulingInfo</em> for that SIB from the serving cell:</div></p>
  <p><div class="b3">3&gt; if the UE is NPN capable and the cell is an NPN-only cell:</div></p>
  <p><div class="b4">4&gt; if the first NPN identity included in the <em>NPN-IdentityInfoList</em>, the <em>systemInformationAreaID</em> and the v<em>alueTag</em> that are included in the <em>si-SchedulingInfo</em> for the SIB received from the serving cell are identical to the NPN identity, the <em>systemInformationAreaID</em> and the <em>valueTag</em> associated with the stored version of that SIB:</div></p>
  <p><div class="b5">5&gt; consider the stored SIB as valid for the cell;</div></p>
  ``` 
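
The mapping from a procedural bullet to its div can be sketched in plain Lua as follows. This is a hypothetical illustration, not the content of scripts/indent_procedural.lua, which is not shown here; the `indentProcedural` name is made up, and the sketch works on the plain text of one paragraph, so it ignores inline formatting such as emphasis.

```lua
-- Hypothetical sketch; scripts/indent_procedural.lua may work differently.
-- Converts the plain text of one paragraph into an indented HTML div when
-- it starts with a level digit followed by ">", e.g. "2> if ...".
local function indentProcedural(text)
	local level, rest = text:match("^(%d)>%s*(.*)$")
	if not level then
		return nil -- not a procedural bullet
	end
	-- Escape the characters that are special in HTML.
	rest = rest:gsub("&", "&amp;"):gsub("<", "&lt;"):gsub(">", "&gt;")
	return string.format('<div class="b%s">%s&gt; %s</div>', level, level, rest)
end

print(indentProcedural("1> for each stored version of a SIB:"))
-- prints: <div class="b1">1&gt; for each stored version of a SIB:</div>
```

Text that does not start with a digit followed by `>` yields nil, so a caller can leave such paragraphs untouched.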

## MSC to Image Conversion
Our specifications contain many call flows and block diagrams which are written in the MSC "signalling" and "block" file formats. Normally, the diagrams are created in the MSC Generator graphical user interface (GUI) and copied into Microsoft Word, where they can later be edited by double clicking the diagram. Here, we insert the signalling and block diagrams as plaintext into the MD file. The Lua script in `scripts/msc_to_img.lua` inputs the signalling or block diagram directly into the `mscgen` command line tool, retrieves the image binary, and automatically inserts it back into the document. Additionally, the images produced are stored in a `media` directory for use by the HTML document.

Example MSC Signalling and Block diagrams are provided below.

**Markdown Input**

*Signalling Diagram*

The following signalling diagram is Figure 5.2.2.1-1.

``` 
~~~ mscgen
hscale="auto";

U: UE;
N: Network;

|||;
U<-N:MIB [au];
U<N:SIB1 [au];
U>N:SystemInformationRequest [au];
U<N:SystemInformation messages [au];
U<N;
|||;
~~~
```

*Block Diagram*

The following block diagram is Figure 4.2.1-1.

```
~~~ mscgenblock
col {
box A: NR RRC_CONNECTED [line.corner=round, width=300];
space 100;
box B: NR RRC_INACTIVE [line.corner=round, mleft=A@mleft];
space 100;
box C: NR RRC_IDLE [line.corner=round, mleft=A@mleft, width=A];
};

A<->B [routing=vertical, text.ident=left, label.align=middle, label.pos=left]: Resume / Release
 with Suspend;
B->C [routing=vertical, text.ident=left, label.pos=right]: Release;

(A@80%, A@bottom)<->(C@80%, C@top) [label.pos=right, label.align=middle, text.ident=left]: Establish /
  Release;
~~~
```
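
The way the two code block classes above select a diagram type can be sketched in plain Lua. This is a hypothetical illustration and not the content of `scripts/msc_to_img.lua`: the `planDiagram` helper and the `media/msc-<index>.png` naming are assumptions, and the call to the command line tool is left out.

```lua
-- Hypothetical sketch; scripts/msc_to_img.lua may differ in every detail.
-- Maps the code block class used in the Markdown source to the diagram type.
local diagramKinds = {
	mscgen = "signalling",
	mscgenblock = "block",
}

-- Returns the diagram type and an image path for a code block class, or nil
-- when the block is not an MSC diagram. `index` numbers the diagrams in
-- document order; the file naming scheme is an assumption.
local function planDiagram(class, index)
	local kind = diagramKinds[class]
	if not kind then
		return nil
	end
	return kind, string.format("media/msc-%d.png", index)
end

local kind, path = planDiagram("mscgenblock", 2)
print(kind, path)
```

In a real filter, a function of this kind would sit in front of the rendering step, so that code blocks with any other class, such as `asn1`, are passed through unchanged.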

scripts/asn_render.lua

0 → 100644
+139 −0
-- © 2025 Nokia
-- Licensed under the BSD 3-Clause License
-- SPDX-License-Identifier: BSD-3-Clause

-- This pandoc lua script colorizes and indents ASN.1 code blocks with HTML.

local keywords = {
	"BIT STRING",
	"BOOLEAN",
	"CHOICE",
	"ENUMERATED",
	"INTEGER",
	"OCTET STRING",
	"OF",
	"OPTIONAL",
	"SEQUENCE",
	"SIZE",
}

local function replaceInsideString(strIn, idxF, idxL, strRepl)

	-- strIn: original string
	-- idxF: first index of the text to replace
	-- idxL: last index of the text to replace
	-- strRepl: replacement text

	local firstPart = strIn:sub(1, idxF -1)
	local lastPart = strIn:sub(idxL + 1, string.len(strIn))
	return firstPart..strRepl..lastPart

end

local function colorizeString(str, colorClass)
	-- Wrap the text in a span carrying the given class.
	local colorizedString = string.format('<span class="%s">%s</span>', colorClass, str)
	return colorizedString
end

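local function colorizeKeywords(str)

	local colorClass = 'asn1-keyword'

	local processingString = str

	-- ipairs visits the keywords in the order they are declared.
	for _, keyword in ipairs(keywords) do
		local processingIndex = 1
		-- Frontier patterns restrict matches to whole words, so that "OF"
		-- is not colorized inside a longer alphanumeric identifier.
		local keywordPattern = "%f[%w]"..keyword.."%f[%W]"
		while true do
			local firstIndex, lastIndex = processingString:find(keywordPattern, processingIndex)
			if not firstIndex then
				break
			end

			local colorizedString = colorizeString(keyword, colorClass)
			processingString = replaceInsideString(processingString, firstIndex, lastIndex, colorizedString)
			-- Continue searching after the inserted span.
			processingIndex = firstIndex + string.len(colorizedString)
		end
	end

	return processingString
end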

local function colorizeComments(str)

	local colorClass = 'asn1-comment'
	-- A comment starts with "--" and runs to the end of the line.
	local commentPattern = "%-%-[^\r\n]+"

	local processingString = str
	local processingIndex = 1

	while true do
		local firstIndex, lastIndex = processingString:find(commentPattern, processingIndex)
		if not firstIndex then
			break
		end

		local colorizedString = colorizeString(processingString:sub(firstIndex, lastIndex), colorClass)
		processingString = replaceInsideString(processingString, firstIndex, lastIndex, colorizedString)
		-- Continue searching after the inserted span.
		processingIndex = firstIndex + string.len(colorizedString)
	end

	return processingString
end

local function indentASN1Source(str)

	local indentationLevel = 0
	local processingString = ""
	local openIndentDivTemplate = '<div class="asn1-indent-%d">'
	local closeDiv = '</div>'
	
	-- match all characters except newlines.
	for line in string.gmatch(str, "([^\n\r]+)") do
		--local has_open_brace = line:find("{")
		--local has_close_brace = line:find("}")
		
		-- usually braces that would require indenting the next line
		-- are at the very end of the line, before the new line character.
		local has_open_brace = (line:sub(-1) == "{")
		
		-- usually braces that would end an indentation level are
		-- at the very beginning of the line.
		local has_close_brace = (line:sub(1,1) == "}")
		
		if has_close_brace and not has_open_brace then
			-- Temporary fix in case there isn't a new line after a brace
			-- that should be opening a new level of hierarchy.
			indentationLevel = math.max(0, indentationLevel -1)
		end
		
		local openIndentDiv = string.format(openIndentDivTemplate, indentationLevel)
		processingString = processingString..openIndentDiv..line..closeDiv.."\n"
		
		if has_open_brace and not has_close_brace then
			indentationLevel = indentationLevel + 1
		end
	end
	
	return processingString

end

local function wrapDivASN1Block(asn1Text)

	return '<div class="asn1">'..asn1Text..'</div>'

end



function CodeBlock(block)
	-- Only code blocks whose first class is "asn1" are processed.
	if block.classes[1] == "asn1" then
		local asn1Text = colorizeKeywords(block.text)
		asn1Text = colorizeComments(asn1Text)
		asn1Text = indentASN1Source(asn1Text)
		asn1Text = wrapDivASN1Block(asn1Text)
		return pandoc.Para(pandoc.RawInline('html', asn1Text))
	end
	return block
end