Commit 2f1cdcdf authored by Nick Kew

mod_proxy_html/mod_xml2enc code drop

Part 1: mod_xml2enc code + docs with Apache license,
coding and documentation standards, less some rough edges.


git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1187767 13f79535-47bb-0310-9956-ffa450edef68
parent 19dde943
+172 −0
<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>

<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<modulesynopsis metafile="mod_xml2enc.xml.meta">

<name>mod_xml2enc</name>
<description>Enhanced charset/internationalisation support for libxml2-based
filter modules</description>
<status>Base</status>
<sourcefile>mod_xml2enc.c</sourcefile>
<identifier>xml2enc_module</identifier>
<compatibility>Version 2.4 and later.  Available as a third-party module
for 2.2.x versions</compatibility>

<summary>
    <p>This module provides enhanced internationalisation support for
    markup-aware filter modules such as <module>mod_proxy_html</module>.
    It can automatically detect the encoding of input data and ensure
    they are correctly processed by the <a href="http://xmlsoft.org/"
    >libxml2</a> parser, including converting to Unicode (UTF-8) where
    necessary.  It can also convert data to an encoding of choice
    after markup processing, and will ensure the correct <var>charset</var>
    value is set in the HTTP <var>Content-Type</var> header.</p>
</summary>

<section id="usage"><title>Usage</title>
    <p>There are two usage scenarios: with modules programmed to work
    with mod_xml2enc, and with those that are not aware of it:</p>
    <dl>
    <dt>Filter modules enabled for mod_xml2enc</dt><dd>
    <p>Modules such as <module>mod_proxy_html</module> version 3.1
    and up use the <code>xml2enc_charset</code> optional function to retrieve
    the charset argument to pass to the libxml2 parser, and may use the
    <code>xml2enc_filter</code> optional function to postprocess to another
    encoding.  Using mod_xml2enc with an enabled module, no configuration
    is necessary: the other module will configure mod_xml2enc for you
    (though you may still want to customise it using the configuration
    directives below).</p>
    </dd>
    <dt>Non-enabled modules</dt><dd>
    <p>To use it with a libxml2-based module that isn't explicitly enabled for
    mod_xml2enc, you will have to configure the filter chain yourself.
    So to use it with a filter foo provided by a module mod_foo to
    improve the latter's i18n support with HTML and XML, you could use</p>
    <pre><code>
    FilterProvider iconv    xml2enc Content-Type $text/html
    FilterProvider iconv    xml2enc Content-Type $xml
    FilterProvider markup   foo Content-Type $text/html
    FilterProvider markup   foo Content-Type $xml
    FilterChain     iconv markup
    </code></pre>
    <p>mod_foo will now support any character set supported by either
    (or both) of libxml2 and apr_xlate/iconv.</p>
    </dd></dl>
</section>

<section id="api"><title>Programming API</title>
    <p>Programmers writing libxml2-based filter modules are encouraged to
    enable them for mod_xml2enc, to provide strong i18n support for their
    users without reinventing the wheel.  The programming API is exposed in
    <var>mod_xml2enc.h</var>, and a usage example is
    <module>mod_proxy_html</module>.</p>
</section>

<section id="sniffing"><title>Detecting an Encoding</title>
    <p>Unlike <module>mod_charset_lite</module>, mod_xml2enc is designed
    to work with data whose encoding cannot be known in advance and
    therefore cannot be configured.  It uses 'sniffing' techniques to
    detect the encoding of HTTP data as follows:</p>
    <ol>
        <li>If the HTTP <var>Content-Type</var> header includes a
        <var>charset</var> parameter, that is used.</li>
        <li>If the data start with an XML Byte Order Mark (BOM) or an
        XML encoding declaration, that is used.</li>
        <li>If an encoding is declared in an HTML <code>&lt;META&gt;</code>
        element, that is used.</li>
        <li>If none of the above match, the default value set by
        <directive>xml2EncDefault</directive> is used.</li>
    </ol>
    <p>The rules are applied in order.  As soon as a match is found,
    it is used and detection is stopped.</p>
</section>

<section id="output"><title>Output Encoding</title>
<p><a href="http://xmlsoft.org/">libxml2</a> always uses UTF-8 (Unicode)
internally, and libxml2-based filter modules will output that by default.
mod_xml2enc can change the output encoding through the API, but there
is currently no way to configure that directly.</p>
<p>Changing the output encoding should (in theory, at least) never be
necessary, and is not recommended due to the extra processing load on
the server of an unnecessary conversion.</p>
</section>

<section id="alias"><title>Unsupported Encodings</title>
<p>If you are working with encodings that are not supported by any of
the conversion methods available on your platform, you can still alias
them to a supported encoding using <directive>xml2EncAlias</directive>.</p>
</section>

<directivesynopsis>
<name>xml2EncDefault</name>
<description>Sets a default encoding to assume when absolutely no information
can be <a href="#sniffing">automatically detected</a></description>
<syntax>xml2EncDefault <var>name</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
<context>.htaccess</context></contextlist>
<compatibility>Version 2.4.0 and later; available as a third-party
module for earlier versions.</compatibility>

<usage>
    <p>If you are processing data with known encoding but no encoding
    information, you can set this default to help mod_xml2enc process
    the data correctly.  For example, to work with the default value
    of Latin1 (<var>iso-8859-1</var>) specified in HTTP/1.0, use</p>
    <example>xml2EncDefault iso-8859-1</example>
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2EncAlias</name>
<description>Recognise aliases for encoding values</description>
<syntax>xml2EncAlias <var>charset alias [alias ...]</var></syntax>
<contextlist><context>server config</context></contextlist>

<usage>
    <p>This server-wide directive aliases one or more encodings to another
    encoding.  This enables encodings not recognised by libxml2 to be handled
    internally by libxml2's encoding support using the translation table for
    a recognised encoding.  This serves two purposes: to support character sets
    (or names) not recognised either by libxml2 or iconv, and to skip
    conversion for an encoding where it is known to be unnecessary.</p>
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2StartParse</name>
<description>Advise the parser to skip leading junk.</description>
<syntax>xml2StartParse <var>element [element ...]</var></syntax>
<contextlist><context>server config</context><context>virtual host</context>
<context>directory</context><context>.htaccess</context></contextlist>

<usage>
    <p>Specify that the markup parser should start at the first instance
    of any of the elements specified.  This can be used as a workaround
    where a broken backend inserts leading junk that messes up the parser (<a
    href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/"
    >example here</a>).</p>
    <p>It should never be used for XML or for well-formed HTML.</p>
</usage>
</directivesynopsis>

</modulesynopsis>
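The first-match ordering described in the "Detecting an Encoding" section above can be illustrated with a small standalone sketch. This is not the code in mod_xml2enc.c (that file is collapsed in this diff). The function names `header_charset`, `bom_encoding` and `sniff_encoding` are hypothetical, the `charset=` match is case-sensitive for brevity, and the XML declaration and HTML META steps are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of mod_xml2enc: copy the charset parameter
 * of a Content-Type value into out and return out, or NULL if there is none.
 * Assumption: only a lower-case "charset=" is recognised here. */
static const char *header_charset(const char *ctype, char *out, size_t outlen)
{
    const char *p;
    size_t n = 0;

    if (ctype == NULL || outlen == 0)
        return NULL;
    p = strstr(ctype, "charset=");
    if (p == NULL)
        return NULL;
    p += strlen("charset=");
    if (*p == '"')
        p++;                          /* tolerate a quoted value */
    while (p[n] != '\0' && p[n] != ';' && p[n] != '"' && p[n] != ' '
           && n + 1 < outlen)
        n++;
    if (n == 0)
        return NULL;
    memcpy(out, p, n);
    out[n] = '\0';
    return out;
}

/* Hypothetical helper: name the encoding implied by a leading Byte Order
 * Mark, or return NULL.  UTF-32 BOMs are ignored in this sketch. */
static const char *bom_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return NULL;
}

/* Apply the rules in order and stop at the first match:
 * 1. charset parameter of Content-Type, 2. BOM, 3. configured default.
 * The XML declaration and META checks from the documentation are omitted. */
const char *sniff_encoding(const char *ctype,
                           const unsigned char *buf, size_t len,
                           const char *fallback,
                           char *scratch, size_t scratchlen)
{
    const char *enc = header_charset(ctype, scratch, scratchlen);
    if (enc != NULL)
        return enc;
    enc = bom_encoding(buf, len);
    if (enc != NULL)
        return enc;
    return fallback;                  /* plays the role of xml2EncDefault */
}
```

In this sketch, a response with `Content-Type: text/html; charset=iso-8859-15` yields `iso-8859-15` even if the body starts with a UTF-8 BOM, because the header rule is tried first. The `fallback` argument is returned only when no earlier rule matches.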
+12 −0
<?xml version="1.0" encoding="UTF-8" ?>
<!-- GENERATED FROM XML: DO NOT EDIT -->

<metafile reference="mod_xml2enc.xml">
  <basename>mod_xml2enc</basename>
  <path>/mod/</path>
  <relpath>..</relpath>

  <variants>
    <variant>en</variant>
  </variants>
</metafile>
+605 −0

File added.

Preview size limit exceeded, changes collapsed.

+55 −0
/* Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef MOD_XML2ENC
#define MOD_XML2ENC

#define ENCIO_INPUT 0x01
#define ENCIO_OUTPUT 0x02
#define ENCIO_INPUT_CHECKS 0x04
#define ENCIO (ENCIO_INPUT|ENCIO_OUTPUT|ENCIO_INPUT_CHECKS)
#define ENCIO_SKIPTO 0x10

/* declarations to deal with WIN32 compile-flag-in-source-code crap */
#if !defined(WIN32)
#define XML2ENC_DECLARE(type)            type
#define XML2ENC_DECLARE_NONSTD(type)     type
#define XML2ENC_DECLARE_DATA
#elif defined(XML2ENC_DECLARE_STATIC)
#define XML2ENC_DECLARE(type)            type __stdcall
#define XML2ENC_DECLARE_NONSTD(type)     type
#define XML2ENC_DECLARE_DATA
#elif defined(XML2ENC_DECLARE_EXPORT)
#define XML2ENC_DECLARE(type)            __declspec(dllexport) type __stdcall
#define XML2ENC_DECLARE_NONSTD(type)     __declspec(dllexport) type
#define XML2ENC_DECLARE_DATA             __declspec(dllexport)
#else
#define XML2ENC_DECLARE(type)            __declspec(dllimport) type __stdcall
#define XML2ENC_DECLARE_NONSTD(type)     __declspec(dllimport) type
#define XML2ENC_DECLARE_DATA             __declspec(dllimport)
#endif

APR_DECLARE_OPTIONAL_FN(apr_status_t, xml2enc_charset,
                        (request_rec* r, xmlCharEncoding* enc,
                         const char** cenc));

APR_DECLARE_OPTIONAL_FN(apr_status_t, xml2enc_filter,
                        (request_rec* r, const char* enc, unsigned int mode));

APR_DECLARE_EXTERNAL_HOOK(xml2enc, XML2ENC, int, preprocess,
                          (ap_filter_t *f, char** bufp, apr_size_t* bytesp))

#endif
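The `mode` argument of `xml2enc_filter` is an unsigned bitmask, and the `ENCIO_*` values in the header are what a caller would combine into it. The standalone sketch below only illustrates that bit arithmetic. The macro values are copied from the header, but `describe_mode` and `append_flag` are hypothetical helpers, and nothing here is a claim about what mod_xml2enc does with each flag.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from mod_xml2enc.h in this commit. */
#define ENCIO_INPUT 0x01
#define ENCIO_OUTPUT 0x02
#define ENCIO_INPUT_CHECKS 0x04
#define ENCIO (ENCIO_INPUT|ENCIO_OUTPUT|ENCIO_INPUT_CHECKS)
#define ENCIO_SKIPTO 0x10

/* Hypothetical helper: append name to out, separated by '|' when out is
 * not empty.  Silently truncates if out is too small. */
static void append_flag(char *out, size_t outlen, const char *name)
{
    size_t used = strlen(out);

    if (used > 0 && used + 1 < outlen) {
        out[used++] = '|';
        out[used] = '\0';
    }
    strncat(out, name, outlen - used - 1);
}

/* Hypothetical helper: write the names of the flags set in mode to out. */
void describe_mode(unsigned int mode, char *out, size_t outlen)
{
    if (outlen == 0)
        return;
    out[0] = '\0';
    if (mode & ENCIO_INPUT)
        append_flag(out, outlen, "INPUT");
    if (mode & ENCIO_OUTPUT)
        append_flag(out, outlen, "OUTPUT");
    if (mode & ENCIO_INPUT_CHECKS)
        append_flag(out, outlen, "INPUT_CHECKS");
    if (mode & ENCIO_SKIPTO)
        append_flag(out, outlen, "SKIPTO");
}
```

Note that `ENCIO` evaluates to `0x07` and does not include `ENCIO_SKIPTO`, so a caller wanting both would have to pass `ENCIO | ENCIO_SKIPTO` explicitly.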