ETSI's Bug Tracker - Part 01: TTCN-3 Core Language
View Issue Details
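ID: 0006789
Project: Part 01: TTCN-3 Core Language
Category: New Feature
View Status: public
Date Submitted: 23-07-2014 10:51
Last Update: 06-01-2015 18:26
Reporter: Gyorgy Rethy
Assigned To: Gyorgy Rethy
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: v4.6.1 (published 2014-06)
Target Version: v4.7.1 (published 2015-06)
Fixed in Version: v4.7.1 (published 2015-06)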
6.1.1 e)
L.M.Ericsson
0006789: Allow non-ISO646 (UTF-8 compatible) characters in universal charstrings
Nowadays more and more protocols are textual or carry textual content, and allow non-ASCII characters. For example, both XML and JSON allow non-ISO646 characters.

As users increasingly need to work with Unicode characters, there is a requirement to be able to enter non-ASCII characters directly in universal charstring values, instead of using the cumbersome quadruple notation. Using the new U-coded Unicode character reference (see CR6727) improves the syntax, but doesn't solve the problem completely. The two syntaxes would complement each other: not all Unicode characters are supported by UTF-8, not all text editors can represent all UTF-8 graphical characters, and some users may want to stick to ISO646 characters, e.g. for backward compatibility reasons.
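
For illustration, the relationship between the quadruple notation and a directly entered character can be sketched in Python (not TTCN-3). The sketch assumes the quadruple is the ISO/IEC 10646 (group, plane, row, cell) form, with each element being one byte of the code point; the helper name quadruple_to_char is hypothetical.

```python
def quadruple_to_char(group: int, plane: int, row: int, cell: int) -> str:
    """Map a (group, plane, row, cell) quadruple to a single character.

    Assumption: each element is one byte of the code point,
    most significant byte first.
    """
    for part in (group, plane, row, cell):
        if not 0 <= part <= 255:
            raise ValueError("each quadruple element must be in 0..255")
    code_point = (group << 24) | (plane << 16) | (row << 8) | cell
    # chr() raises ValueError for values above U+10FFFF.
    return chr(code_point)


# The quadruple (0, 0, 78, 45) is U+4E2D, which a UTF-8 aware editor
# would let the user type directly as one character.
assert quadruple_to_char(0, 0, 78, 45) == "\u4e2d"
```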

As TTCN-3 modules shall be stored in UTF-8 (see clause 8), such values could not be misread (as 2, 3 or 4 separate characters) when the code is transferred between tools (though an older tool may treat the code as erroneous if the version is not identified in the module header).
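
The kind of misreading mentioned above can be illustrated with a short Python sketch. It assumes a hypothetical tool that treats every byte of the file as one character; Latin-1 decoding stands in for that behaviour here, and the function name is made up for the example.

```python
def misread_as_single_bytes(text: str) -> str:
    """Decode the UTF-8 bytes of `text` as if each byte were one character.

    Assumption: Latin-1 models a tool that is not UTF-8 aware.
    """
    return text.encode("utf-8").decode("latin-1")


# One character each, taking 2, 3 and 4 bytes in UTF-8 respectively.
for ch in ("\u00e9", "\u4e2d", "\U0001F600"):
    utf8_length = len(ch.encode("utf-8"))
    misread_length = len(misread_as_single_bytes(ch))
    print(f"U+{ord(ch):04X}: {utf8_length} UTF-8 bytes, "
          f"misread as {misread_length} characters")
```

A tool that decodes the module as UTF-8 sees one character in each case, so the value survives the transfer unchanged.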
No tags attached.
docx draft-res-6789-v1.docx (14,935) 04-11-2014 13:58
http://oldforge.etsi.org/mantis/file_download.php?file_id=3158&type=bug
docx draft-res-6789-v2.docx (25,191) 06-11-2014 15:09
http://oldforge.etsi.org/mantis/file_download.php?file_id=3178&type=bug
Issue History
23-07-2014 10:51 | Gyorgy Rethy | New Issue
06-10-2014 10:55 | Gyorgy Rethy | Note Added: 0012224
06-10-2014 10:55 | Gyorgy Rethy | Target Version => v4.7.1 (published 2015-06)
09-10-2014 13:58 | Jacob Wieland - Spirent | Note Added: 0012323
03-11-2014 16:33 | Gyorgy Rethy | Assigned To => Axel Rennoch
03-11-2014 16:33 | Gyorgy Rethy | Status: new => assigned
04-11-2014 13:58 | Axel Rennoch | File Added: draft-res-6789-v1.docx
04-11-2014 14:00 | Axel Rennoch | Note Added: 0012401
06-11-2014 08:58 | Gyorgy Rethy | Note Added: 0012446
06-11-2014 15:09 | Axel Rennoch | File Added: draft-res-6789-v2.docx
06-11-2014 15:11 | Axel Rennoch | Note Added: 0012472
06-11-2014 15:12 | Axel Rennoch | Note Added: 0012473
06-11-2014 15:12 | Axel Rennoch | Assigned To: Axel Rennoch => Gyorgy Rethy
06-11-2014 15:12 | Axel Rennoch | Status: assigned => acknowledged
07-11-2014 11:50 | Gyorgy Rethy | Status: acknowledged => confirmed
07-11-2014 13:37 | Jacob Wieland - Spirent | Note Added: 0012498
06-01-2015 18:23 | Gyorgy Rethy | Status: confirmed => resolved
06-01-2015 18:23 | Gyorgy Rethy | Resolution: open => fixed
06-01-2015 18:26 | Gyorgy Rethy | Note Added: 0012649
06-01-2015 18:26 | Gyorgy Rethy | Status: resolved => closed
06-01-2015 18:26 | Gyorgy Rethy | Fixed in Version => v4.7.1 (published 2015-06)

Notes
(0012224)
Gyorgy Rethy   
06-10-2014 10:55   
*For STF discussion*
(0012323)
Jacob Wieland - Spirent   
09-10-2014 13:58   
As the only escape character in TTCN-3 charstring literals is the quote symbol, I guess this would have to be used.

"aaa"0706"bbb", for instance, could then be the same as "aaa" & <unicode of 0706> & "bbb".

As far as I can see, this does not introduce any backward incompatibility, as there is at the moment no grammar rule which allows a number directly behind a charstring literal.
(0012401)
Axel Rennoch   
04-11-2014 14:00   
Based on Jacob's idea, we may allow different representations. Please see the examples in the attachment, since the characters do not appear in this box. ;-)
(0012446)
Gyorgy Rethy   
06-11-2014 08:58   
We shall not extend the scope of the CR. If more or other features are needed, another CR shall be submitted.

The standard specifies that TTCN-3 modules are to be saved in UTF-8, and TTCN-3 editors should support UTF-8 characters (at least a reasonable subset), because they are allowed in comments. So, in principle, there are no technical difficulties in allowing their direct use in universal charstring values as well.

The additional syntax brings in new problems:
- In the case of "aaa"0706"bbb", how do we know what the user wanted to write? It may be a simple typing error, and he/she meant "aaa""0706""bbb"! For this reason I strongly oppose this syntax, i.e. extending the semantics associated with the " character.
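
The ambiguity can be made concrete with a minimal Python sketch of the existing escape rule, in which a pair of quote characters inside a charstring literal stands for one quote. The function name and the error handling are assumptions made for illustration, not part of any TTCN-3 tool.

```python
def decode_charstring_literal(literal: str) -> str:
    """Return the value of a charstring literal using doubled-quote escaping."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("literal must be enclosed in double quotes")
    body = literal[1:-1]
    result = []
    i = 0
    while i < len(body):
        if body[i] == '"':
            # A quote inside the body is only valid as a doubled pair.
            if i + 1 >= len(body) or body[i + 1] != '"':
                raise ValueError(f"unescaped quote at position {i + 1}")
            result.append('"')
            i += 2
        else:
            result.append(body[i])
            i += 1
    return "".join(result)


# With doubled quotes, the digits are ordinary text between two quote characters.
assert decode_charstring_literal('"aaa""0706""bbb"') == 'aaa"0706"bbb'
```

Under this rule the input "aaa"0706"bbb" is rejected as containing an unescaped quote, so a typing error is reported. If the proposed syntax gave that form a meaning, the same slip would silently produce a different value.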

Anyway, UTF-8 today covers the vast majority of characters in actual use, so the char(U4E2D, U56FD) syntax will be used only rarely, or because local style guides require it.
(0012472)
Axel Rennoch   
06-11-2014 15:11   
Following the discussion, only a note has been added to 6.1.1 e) in the attached file.
(0012473)
Axel Rennoch   
06-11-2014 15:12   
Please advise if the new note is sufficient.
(0012498)
Jacob Wieland - Spirent   
07-11-2014 13:37   
No problem with me. I just pointed out that a direct inclusion into the charstring literals needs to use the " character, as that is the only escape character we have (unfortunately). In principle, I agree with Gyorgy's reasoning that, mostly, there will be no need for the char syntax to be used.

The only exception I see is standardization bodies that want to publish their test suites in a non-UTF-8 based format.
(0012649)
Gyorgy Rethy   
06-01-2015 18:26   
Added to draft V4.6.3