Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc and other brackets #285

Open
Ozc-Y opened this issue Feb 12, 2025 · 0 comments

Hello. I am trying to familiarize myself with DITA XML 1.3 and the DITA Open Toolkit (DITA-OT) without using proprietary XML editors.

As DITA states that its RELAX NG XML schemas are normative, I am trying to convert DITA-OT's document-type shells from .rng to .rnc files (XML syntax to compact syntax) using trang.

I am able to run the conversion from the CLI; however, the conversion introduces additional text at the end of every line ending within the dita:moduleDesc section:

dita:moduleDesc [
  "\x{a}" ~
  "    "
  dita:moduleTitle [ "DITA Concept Shell" ]
  "\x{a}" ~
  "    "
  dita:headerComment [
    xml:space = "preserve"
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "                   HEADER                                    \x{a}" ~
    "=============================================================\x{a}" ~
    "Darwin Information Typing Architecture (DITA) Version 1.3 Plus Errata 02\x{a}" ~
    "OASIS Standard\x{a}" ~
    "16 January 2018 \x{a}" ~
    "Copyright (c) OASIS Open 2018. All rights reserved. \x{a}" ~
    "Source: http://docs.oasis-open.org/dita/dita/v1.3/errata02/csprd01/complete/part0-overview/dita-v1.3-errata02-csprd01-part0-overview-complete.html\x{a}" ~
    "\x{a}" ~
    "============================================================\x{a}" ~
    " MODULE:    DITA Concept Shell                                 \x{a}" ~
    " VERSION:   1.3                                              \x{a}" ~
    " DATE:      March 2014                                    \x{a}" ~
    "                                                             \x{a}" ~
    "=============================================================\x{a}" ~
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "                   PUBLIC DOCUMENT TYPE DEFINITION           \x{a}" ~
    "                   TYPICAL INVOCATION                        \x{a}" ~
    "                                                             \x{a}" ~
    " Refer to this file by the following public identifier or an \x{a}" ~
    "      appropriate system identifier \x{a}" ~
    "      \x{a}" ~
    'PUBLIC "-//OASIS//DTD DITA Concept//EN"\x{a}' ~
    "\x{a}" ~
    "The public ID above refers to the latest version of this DTD.\x{a}" ~
    "     To refer to this specific version, you may use this value:\x{a}" ~
    "\x{a}" ~
    'PUBLIC "-//OASIS//DTD DITA 1.3 Concept//EN"                       \x{a}' ~
    "\x{a}" ~
    "=============================================================\x{a}" ~
    "SYSTEM:     Darwin Information Typing Architecture (DITA)    \x{a}" ~
    "                                                             \x{a}" ~
    "PURPOSE:    DTD to describe DITA Concepts                    \x{a}" ~
    "                                                             \x{a}" ~
    "ORIGINAL CREATION DATE:                                      \x{a}" ~
    "            March 2001                                       \x{a}" ~
    "                                                             \x{a}" ~
    "            (C) Copyright OASIS Open 2005, 2014.             \x{a}" ~
    "            (C) Copyright IBM Corporation 2001, 2004.        \x{a}" ~
    "            All Rights Reserved.                             \x{a}" ~
    "                                                             \x{a}" ~
    " UPDATES:                                                    \x{a}" ~
    "   2006.06.07 RDA: Added indexing domain                     \x{a}" ~
    "   2006.06.21 RDA: Added props attribute extensions          \x{a}" ~
    "   2008.02.12 RDA: Modify imbeds to use specific 1.2 version \x{a}" ~
    "   2008.04.15 RDA: Added hazard domain                       \x{a}" ~
    "   2014.03.12 RDA: Updated for DITA 1.3. Implemented as \x{a}" ~
    "                   RELAX NG\x{a}" ~
    "=============================================================\x{a}" ~
    "  "
  ]
  "\x{a}" ~
  "    "
  dita:moduleMetadata [
    "\x{a}" ~
    "      "
    dita:moduleType [ "topicshell" ]
    "\x{a}" ~
    "      "
    dita:moduleShortName [ "concept" ]
    "\x{a}" ~
    "      "
    dita:shellPublicIds [
      "\x{a}" ~
      "        "
      dita:dtdShell [
        "-//OASIS//DTD DITA"
        dita:var [ presep = " " name = "ditaver" ]
        " Concept//EN"
      ]
      "\x{a}" ~
      "        "
      dita:rncShell [
        "urn:oasis:names:tc:dita:rnc:concept.rnc"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "        "
      dita:rngShell [
        "urn:oasis:names:tc:dita:rng:concept.rng"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "        "
      dita:xsdShell [
        "urn:oasis:names:tc:dita:xsd:concept.xsd"
        dita:var [ presep = ":" name = "ditaver" ]
      ]
      "\x{a}" ~
      "      "
    ]
    "\x{a}" ~
    "    "
  ]
  "\x{a}" ~
  "  "
]

Specifically, the extraneous text I am referring to is this:

"\x{a}" ~
"      "

Interestingly, the number of spaces on the second line seems to be correlated with indentation/nesting: When this text appears right after an additional level of nesting, the number of quoted spaces increases by 2. When this text (snippet? string? artifact?) appears right before a ] that ends a level of nesting, the number of quoted spaces decreases by 2.

For example, the number of quoted spaces increases from 4 to 6 after line 69 of the generated .rnc file (dita:moduleMetadata [).
On the line just before this bracket is closed (the closing ] is on line 109), the number of quoted spaces drops from 6 back down to 4.
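
The correlation with nesting suggests that these literals may simply be the indentation whitespace of the .rng source, carried over as text content of the annotation elements. The following is a minimal Python sketch of that idea, not a statement about trang's internals. The XML fragment and its namespace URI are made up for illustration and only mimic the indentation of dita:moduleDesc; the function names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment that mimics the indentation of dita:moduleDesc.
# The namespace URI is a placeholder, not the real DITA annotation namespace.
SAMPLE = (
    '<moduleDesc xmlns="urn:example:dita-annotations">\n'
    '    <moduleTitle>DITA Concept Shell</moduleTitle>\n'
    '    <moduleMetadata>\n'
    '      <moduleType>topicshell</moduleType>\n'
    '    </moduleMetadata>\n'
    '  </moduleDesc>'
)


def indentation_nodes(elem, found=None):
    """Collect whitespace-only text nodes below elem, in document order."""
    if found is None:
        found = []
    # Text between the start tag and the first child.
    if elem.text and not elem.text.strip():
        found.append(elem.text)
    for child in elem:
        indentation_nodes(child, found)
        # Text between this child's end tag and the next sibling or parent end tag.
        if child.tail and not child.tail.strip():
            found.append(child.tail)
    return found


def indent_widths(xml_text):
    """Return the number of characters after the last newline in each node."""
    root = ET.fromstring(xml_text)
    return [len(text.split("\n")[-1]) for text in indentation_nodes(root)]


if __name__ == "__main__":
    print(indent_widths(SAMPLE))  # prints [4, 4, 6, 4, 2]
```

Each whitespace-only node is a newline followed by the indentation of the next tag, which matches the shape of the "\x{a}" ~ "    " pairs above: the width grows by 2 after an opening tag and shrinks by 2 before a closing one.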

The issue also seems to occur within a:documentation blocks.

trang also writes out a large number of other .rnc files during this conversion, and these additional .rnc files seem to exhibit the same problem.

I would like to know if there is a way to sidestep this problem. Simplifying the .rng file with jing first and then converting the resulting simplified RELAX NG XML syntax to .rnc seems to work; however, I seem to have a separate problem with that process and thus would prefer to know if I can directly convert DITA-OT document-type shells from .rng to .rnc without problems.
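
One untested idea for a workaround, assuming the extra literals really are indentation whitespace inside annotation elements: remove whitespace-only text nodes from elements outside the RELAX NG namespace before running trang. The sketch below uses Python's xml.dom.minidom, which keeps namespace prefixes as written. The function names and the sample fragment (including its dita namespace URI) are hypothetical, and the approach leaves alone anything under xml:space="preserve" as well as text that contains real content, so it would not change dita:headerComment or the a:documentation case.

```python
from xml.dom import minidom

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def _strip(elem, preserve):
    """Drop whitespace-only text children of non-RELAX NG elements."""
    space = elem.getAttributeNS(XML_NS, "space")
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False
    foreign = elem.namespaceURI != RNG_NS
    for child in list(elem.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            _strip(child, preserve)
        elif (child.nodeType == child.TEXT_NODE and foreign
              and not preserve and not child.data.strip()):
            elem.removeChild(child)


def strip_annotation_indentation(xml_text):
    """Return the document element as XML with annotation indentation removed."""
    doc = minidom.parseString(xml_text)
    _strip(doc.documentElement, preserve=False)
    return doc.documentElement.toxml()


# Hypothetical fragment; the dita namespace URI is a placeholder.
SAMPLE = (
    '<grammar xmlns="http://relaxng.org/ns/structure/1.0"'
    ' xmlns:dita="urn:example:dita-annotations">\n'
    '  <dita:moduleDesc>\n'
    '    <dita:moduleTitle>DITA Concept Shell</dita:moduleTitle>\n'
    '    <dita:headerComment xml:space="preserve">\n HEADER \n</dita:headerComment>\n'
    '  </dita:moduleDesc>\n'
    '</grammar>'
)

if __name__ == "__main__":
    print(strip_annotation_indentation(SAMPLE))
```

Because trang follows includes, such a preprocessing step would presumably have to be applied to a copy of every .rng module in the directory, not just to concept.rng, and serializing through minidom drops the XML declaration and may alter other lexical details of the files.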

In case it matters, I am using Windows (10) PowerShell via Visual Studio Code's built-in terminal, and java -version (from the same terminal) has the following output:

openjdk version "23.0.2" 2025-01-21
OpenJDK Runtime Environment (build 23.0.2+7-58)
OpenJDK 64-Bit Server VM (build 23.0.2+7-58, mixed mode, sharing)

.zip containing the .rng and .rnc files relevant to this issue:

concept-rnc-and-rng.zip

Please let me know if I can supply any further information.

Edit:

It is likely that the given .rng file cannot be converted alone. If you would like to fully replicate my conversion environment, please download DITA-OT 4.2.4, extract the download, and navigate to .../dita-ot-4.2.4/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng (substitute \ for / if on Windows) before running trang on concept.rng within this directory.

Edit 2:

The problem is not limited to the dita:moduleDesc brackets. In other files, the same extra literals also occur elsewhere, in a pattern I am not sure I understand. Comments preceded by ## seem intact, but I suspect a:documentation might be related:

  • Line 29 of svg-basic-clip.rnc: a:documentation [ "\x{a}" ~ " SVG.Clip.attrib\x{a}" ~ " " ]

In the interest of clarity, I will provide the following .zip containing all of my generated .rnc files (disregard concept_simplified files) and some of the corresponding .rng files that were provided within the DITA-OT 4.2.4 distribution. They might not work without the full DITA-OT archive.

rng-files-with-rnc.zip

Ozc-Y changed the title on Feb 12, 2025, from "Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc" to "Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc and a:documentation brackets".
Ozc-Y changed the title on Feb 12, 2025, from "Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc and a:documentation brackets" to "Trang conversion from .rng to .rnc introduces extraneous text on every line ending within dita:moduleDesc and other brackets".