Commit 256e130f authored by Richard Bowen

Rewrites the 'API Phases' section to give a brief intro to what an API
Phase is, and how mod_rewrite handles rewrite rules in two different
phases. Removes some of the condescending tone. References
https://issues.apache.org/bugzilla/show_bug.cgi?id=30021


git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1180925 13f79535-47bb-0310-9956-ffa450edef68
parent 1c84141c
+73 −73
@@ -39,81 +39,81 @@ and URL matching.</p>
<seealso><a href="advanced.html">Advanced techniques</a></seealso>
<seealso><a href="avoid.html">When not to use mod_rewrite</a></seealso>

<section id="Internal"><title>Internal Processing</title>

      <p>The internal processing of this module is very complex but
      needs to be explained once even to the average user to avoid
      common mistakes and to let you exploit its full
      functionality.</p>
</section>

<section id="InternalAPI"><title>API Phases</title>

      <p>First you have to understand that when Apache processes an
      HTTP request it does this in phases. A hook for each of these
      phases is provided by the Apache API. Mod_rewrite uses two of
      these hooks: the URL-to-filename translation hook which is
      used after the HTTP request has been read but before any
      authorization starts and the Fixup hook which is triggered
      after the authorization phases and after the per-directory
      config files (<code>.htaccess</code>) have been read, but
      before the content handler is activated.</p>

      <p>So, after a request comes in and Apache has determined the
      corresponding server (or virtual server) the rewriting engine
      starts processing of all mod_rewrite directives from the
      per-server configuration in the URL-to-filename phase. A few
      steps later when the final data directories are found, the
      per-directory configuration directives of mod_rewrite are
      triggered in the Fixup phase. In both situations mod_rewrite
      rewrites URLs either to new URLs or to filenames, although
      there is no obvious distinction between them. This is a usage
      of the API which was not intended to be this way when the API
      was designed, but as of Apache 1.x this is the only way
      mod_rewrite can operate. To make this point more clear
      remember the following two points:</p>

      <ol>
        <li>Although mod_rewrite rewrites URLs to URLs, URLs to
        filenames and even filenames to filenames, the API
        currently provides only a URL-to-filename hook. In Apache
        2.0 the two missing hooks will be added to make the
        processing more clear. But this point has no drawbacks for
        the user, it is just a fact which should be remembered:
        Apache does more in the URL-to-filename hook than the API
        intends for it.</li>

        <li>
          Unbelievably mod_rewrite provides URL manipulations in
          per-directory context, <em>i.e.</em>, within
          <code>.htaccess</code> files, although these are reached
          a very long time after the URLs have been translated to
          filenames. It has to be this way because
          <code>.htaccess</code> files live in the filesystem, so
          processing has already reached this stage. In other
          words: According to the API phases at this time it is too
          late for any URL manipulations. To overcome this chicken
          and egg problem mod_rewrite uses a trick: When you
          manipulate a URL/filename in per-directory context
          mod_rewrite first rewrites the filename back to its
          corresponding URL (which is usually impossible, but see
          the <code>RewriteBase</code> directive below for the
          trick to achieve this) and then initiates a new internal
          sub-request with the new URL. This restarts processing of
          the API phases.

          <p>Again mod_rewrite tries hard to make this complicated
          step totally transparent to the user, but you should
          remember here: While URL manipulations in per-server
          context are really fast and efficient, per-directory
          rewrites are slow and inefficient due to this chicken and
          egg problem. But on the other hand this is the only way
          mod_rewrite can provide (locally restricted) URL
          manipulations to the average user.</p>
        </li>
      </ol>

      <p>Don't forget these two points!</p>
    <p>The Apache HTTP Server handles requests in several phases. At
    each of these phases, one or more modules may be called upon to
    handle that portion of the request lifecycle. Phases include things
    like URL-to-filename translation, authentication, authorization,
    content, and logging. (This is not an exhaustive list.)</p>

    <p>mod_rewrite acts in two of these phases (or "hooks", as they are
    sometimes called) to influence how URLs may be rewritten.</p>

    <p>First, it uses the URL-to-filename translation hook, which occurs
    after the HTTP request has been read, but before any authorization
    starts. Second, it uses the Fixup hook, which runs after the
    authorization phases, and after per-directory configuration files
    (<code>.htaccess</code> files) have been read, but before the
    content handler is called.</p>

    <p>So, after a request comes in and a corresponding server or
    virtual host has been determined, the rewriting engine starts
    processing any <code>mod_rewrite</code> directives appearing in the
    per-server configuration (i.e., in the main server configuration file
    and <directive module="core" type="section">VirtualHost</directive>
    sections). This happens in the URL-to-filename phase.</p>

    <p>A few steps later, when the final data directories are found,
    the per-directory configuration directives (<code>.htaccess</code>
    files and <directive module="core"
    type="section">Directory</directive> blocks) are applied. This
    happens in the Fixup phase.</p>
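
    <p>The ordering described above can be sketched as a small Python
    model. This is only an illustration, not how the server is
    implemented: the phase list is the simplified, non-exhaustive one
    named in this section, and the names <code>PHASES</code>,
    <code>REWRITE_HOOKS</code>, and <code>trace_request</code> are
    hypothetical.</p>

```python
# Simplified, non-exhaustive phase list, using only the phases named in the text.
PHASES = [
    "read request",
    "URL-to-filename translation",
    "authentication",
    "authorization",
    "fixup",
    "content handler",
    "logging",
]

# The two phases in which mod_rewrite acts, and the rules it applies there.
REWRITE_HOOKS = {
    "URL-to-filename translation": "per-server rules",
    "fixup": "per-directory rules",
}


def trace_request():
    """Return one line per phase, noting where mod_rewrite would run."""
    steps = []
    for phase in PHASES:
        rules = REWRITE_HOOKS.get(phase)
        if rules is None:
            steps.append(phase)
        else:
            steps.append(f"{phase}: mod_rewrite applies {rules}")
    return steps


if __name__ == "__main__":
    for line in trace_request():
        print(line)
```

    <p>In this model, per-server rules always run before any
    authorization step, while per-directory rules run only after
    authorization and just before the content handler.</p>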

    <p>In each of these cases, mod_rewrite rewrites the
    <code>REQUEST_URI</code> either to a new URI, or to a filename.</p>

    <p>In per-directory context (i.e., within <code>.htaccess</code> files
    and <code>Directory</code> blocks), these rules are applied
    after a URI has already been translated to a filename. Because of
    this, mod_rewrite temporarily translates the filename back into a URI
    by stripping off the directory path before applying the rules. (See the
    <directive module="mod_rewrite">RewriteBase</directive> directive
    for how you can further control this.) Then, a new
    internal subrequest is issued with the new URI. This restarts
    processing of the API phases.</p>

    <p>Because of this further manipulation of the URI in per-directory
    context, you'll need to take care to craft your rewrite rules
    differently in that context. In particular, remember that the
    leading directory path will be stripped off of the URI that your
    rewrite rules will see. Consider the examples below for further
    clarification.</p>

    <table border="1">

        <tr>
            <th>Location of rule</th>
            <th>Rule</th>
        </tr>

        <tr>
            <td>VirtualHost section</td>
            <td>RewriteRule ^/images/(.+)\.jpg /images/$1.gif</td>
        </tr>

        <tr>
            <td>.htaccess file in document root</td>
            <td>RewriteRule ^images/(.+)\.jpg images/$1.gif</td>
        </tr>

        <tr>
            <td>.htaccess file in images directory</td>
            <td>RewriteRule ^(.+)\.jpg $1.gif</td>
        </tr>

    </table>
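
    <p>To see why the three rules in the table are equivalent, the
    prefix stripping can be approximated in Python. This is a rough
    sketch under simplifying assumptions: it works on URL-paths only
    (the real module strips a filesystem directory path), it ignores
    flags, conditions, and <code>RewriteBase</code>, and the function
    <code>rewrite</code> and its <code>dir_prefix</code> parameter are
    hypothetical names used for illustration.</p>

```python
import re


def rewrite(uri, pattern, substitution, dir_prefix=None):
    """Apply a single rewrite rule to a URL-path.

    dir_prefix is None for per-server context. For per-directory context it
    is the URL-path of the directory holding the rule, with a trailing slash
    (for example "/" or "/images/").
    """
    if dir_prefix is None:
        subject = uri                    # per-server: full path, leading slash included
    elif uri.startswith(dir_prefix):
        subject = uri[len(dir_prefix):]  # per-directory: leading directory path stripped
    else:
        return uri                       # the rule's directory does not cover this request

    match = re.search(pattern, subject)
    if match is None:
        return uri                       # pattern did not match: leave the URI unchanged

    # Convert $N backreferences into Python's \g<N> template syntax.
    template = re.sub(r"\$(\d)", r"\\g<\1>", substitution)
    result = match.expand(template)

    # A relative per-directory result gets the stripped prefix added back.
    if dir_prefix is not None and not result.startswith("/"):
        result = dir_prefix + result
    return result


if __name__ == "__main__":
    request = "/images/photo.jpg"
    print(rewrite(request, r"^/images/(.+)\.jpg", "/images/$1.gif"))
    print(rewrite(request, r"^images/(.+)\.jpg", "images/$1.gif", dir_prefix="/"))
    print(rewrite(request, r"^(.+)\.jpg", "$1.gif", dir_prefix="/images/"))
```

    <p>Under these assumptions, all three calls map
    <code>/images/photo.jpg</code> to <code>/images/photo.gif</code>;
    only the portion of the path that the pattern sees differs between
    contexts.</p>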

</section>

<section id="InternalRuleset"><title>Ruleset Processing</title>