Authorizing and Mapping URLs and Domains

Authorizing domains

In addition to optimizing HTML resources, PageSpeed restricts itself to optimizing resources (JavaScript, CSS, images) that are served from domains, with optional paths, that must be explicitly listed in the configuration file. For example:

pagespeed Domain http://example.com
pagespeed Domain cdn.example.com
pagespeed Domain http://styles.example.com/css
pagespeed Domain *.example.org

PageSpeed will rewrite resources found from these explicitly listed domains, although in the case of styles.example.com only resources under the css directory will be rewritten. Additionally, it will rewrite resources that are served from the same domain as the HTML file, or are specified as a path relative to the HTML. When resources are rewritten, their domain and path are not changed. However, the leaf name is changed to encode rewriting information that can be used to identify and serve the optimized resource.

The leading "http://" is optional; bare hostnames will be interpreted as referring to HTTP. Wildcards can be used in the domain.

These directives can be used in location-specific configuration sections.
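
As a rough illustration of the matching described above, the following Python sketch checks a resource URL against the four Domain entries from the example. The helper names (split_entry, is_authorized) are hypothetical, and the use of fnmatch for wildcards is an assumption; PageSpeed's own matcher may differ in details such as port handling.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Entries copied from the Domain directives in the example above.
DOMAINS = [
    "http://example.com",
    "cdn.example.com",
    "http://styles.example.com/css",
    "*.example.org",
]

def split_entry(entry):
    """Split a Domain entry into (scheme, host pattern, path prefix)."""
    if "://" not in entry:
        entry = "http://" + entry  # bare hostnames are interpreted as HTTP
    parts = urlsplit(entry)
    return parts.scheme, parts.netloc.lower(), parts.path.rstrip("/") + "/"

def is_authorized(url, entries=DOMAINS):
    """Return True if url falls under a listed domain (and path, if given)."""
    target = urlsplit(url)
    for entry in entries:
        scheme, host_pattern, path_prefix = split_entry(entry)
        if (target.scheme == scheme
                and fnmatchcase(target.netloc.lower(), host_pattern)
                and target.path.startswith(path_prefix)):
            return True
    return False
```

In this sketch the host and the path are matched separately, so a wildcard in the host can never match across a "/" into the path.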

Mapping origin domains

In order to improve the performance of web pages, PageSpeed must examine and modify the content of resources referenced on those pages. To do that, it must fetch those resources using HTTP, using the URL reference specified on the HTML page.

In some cases, the URL specified in the HTML file is not the best URL to use to fetch the resource. Scenarios where this is a concern include:

  1. The server is behind a load balancer, and it's more efficient to reference the server directly by its IP address, or as 'localhost'.
  2. The server has a special DNS configuration.
  3. The server is behind a firewall preventing outbound connections.
  4. The server is running in a CDN or proxy, and must go back to the origin server for the resources.
  5. The server needs to service https requests.

In these situations the remedy is to map the origin domain:

pagespeed MapOriginDomain origin_to_fetch_from origin_specified_in_html [host_header]

Note: The optional [host_header] argument is a new feature as of IISpeed 2.0 / PageSpeed 1.8.31.2.

Wildcards can also be used in the origin_specified_in_html, e.g.

pagespeed MapOriginDomain localhost *.example.com

The origin_to_fetch_from can include a path after the domain name, e.g.

pagespeed MapOriginDomain localhost/example *.example.com

When a path is specified, the source domain is mapped to the destination domain and the source path is mapped to the concatenation of the path from origin_to_fetch_from and the source path. For example, given the above mapping, http://www.example.com/index.html will be mapped to http://localhost/example/index.html.
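
The path concatenation described above can be sketched in Python as follows. The function name map_origin is hypothetical, wildcard matching via fnmatch is an assumption, and the sketch assumes origin_to_fetch_from is given without a scheme, as in the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def map_origin(url, origin_to_fetch_from, origin_specified_in_html):
    """Return the URL to fetch from, or url unchanged if the mapping does not apply."""
    parts = urlsplit(url)
    if not fnmatchcase(parts.netloc.lower(), origin_specified_in_html.lower()):
        return url
    # Split "localhost/example" into host "localhost" and path prefix "/example".
    host, _, prefix = origin_to_fetch_from.partition("/")
    prefix = "/" + prefix.strip("/") if prefix else ""
    query = "?" + parts.query if parts.query else ""
    # Fetches from the mapped origin always use http.
    return f"http://{host}{prefix}{parts.path}{query}"
```

Calling map_origin("http://www.example.com/index.html", "localhost/example", "*.example.com") yields http://localhost/example/index.html, which matches the mapping described above.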

The origin_specified_in_html can specify https but the origin_to_fetch_from can only specify http, e.g.

pagespeed MapOriginDomain http://localhost https://www.example.com

This directive lets the server accept https requests for www.example.com without requiring an SSL certificate to fetch resources. In fact, this is the only way PageSpeed can service https requests, because it currently cannot use https to fetch resources. For example, given the above mapping, and assuming the server is configured for https support, PageSpeed will fetch and optimize resources accessed using https://www.example.com, fetching the resources from http://localhost, which can be the same server process or a different server process.

pagespeed MapOriginDomain http://localhost https://www.example.com
pagespeed ShardDomain https://www.example.com https://example1.cdn.com,https://example2.cdn.com

In this example the https origin domain is mapped to localhost and sharding is used to parallelize downloads across hostnames. Note that the shards also specify https.

By specifying a source domain in this directive, you are authorizing PageSpeed to rewrite resources found in that domain. For example, in the above directives, '*.example.com' gets authorized for rewrites from HTML files, but 'localhost' does not. See Domain.

When PageSpeed fetches resources from a mapped origin domain, it specifies the source domain in the Host: header in the request. You can override the Host: header value with the optional third parameter host_header. See Mapping Origins with a Shared CDN for an example.
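
To make the Host: header behavior concrete, here is a minimal Python sketch that builds (but does not send) such a fetch request. The function name build_fetch is hypothetical, and the sketch ignores path prefixes in the mapping; it only shows which host is contacted and which Host: header is sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

def build_fetch(url_in_html, origin_to_fetch_from, host_header=None):
    """Build the request for a resource whose domain is mapped to another origin."""
    parts = urlsplit(url_in_html)
    request = Request(f"http://{origin_to_fetch_from}{parts.path}")
    # By default the Host: header carries the source domain from the HTML;
    # the optional third argument of MapOriginDomain overrides it.
    request.add_header("Host", host_header or parts.netloc)
    return request

request = build_fetch("https://www.example.com/styles.css", "localhost")
print(request.full_url, request.get_header("Host"))
```

With the mapping from the earlier example, the request goes to http://localhost/styles.css while the Host: header still names www.example.com.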

See also LoadFromFile to load origin resources directly from the filesystem and avoid an HTTP connection altogether.

These directives can be used in location-specific configuration sections.

Mapping rewrite domains

When PageSpeed rewrites a resource, it updates the HTML to refer to the resource by its new name. Generally PageSpeed leaves the resource at the same origin and path that was originally found in the HTML. However, it is possible to map the domain of rewritten resources. Examples of why this might be desirable include:

  1. Serving static content from cookieless domains, to reduce the size of HTTP requests from the browser.
  2. Moving content to a Content Delivery Network (CDN).

This is done using the configuration file directive:

pagespeed MapRewriteDomain domain_to_write_into_html domain_specified_in_html

Wildcards can also be used in the domain_specified_in_html:

pagespeed MapRewriteDomain cdn.com *.example.com

The domain_to_write_into_html can include a path after the domain name:

pagespeed MapRewriteDomain cdn.com/example *.example.com

When a path is specified, the source domain is rewritten to the destination domain and the source path is rewritten to the concatenation of the path from domain_to_write_into_html and the source path. For example, given the above mapping, http://www.example.com/index.html will be rewritten to http://cdn.com/example/index.html.

Note: It is the responsibility of the site administrator to ensure that PageSpeed is installed on the domain_to_write_into_html. This might be a separate server, or there may be a single server with multiple domains mapped into it. The files must be accessible via the same path on the destination server as was specified in the HTML file. No other files should be stored on the domain_to_write_into_html -- it should be functionally equivalent to domain_specified_in_html. See also MapProxyDomain which enables proxying content from a different server.

For example, if PageSpeed cache-extends http://www.example.com/styles/style.css to http://cdn.example.com/styles/style.css.pagespeed.ce.HASH.css, then cdn.example.com must have a mechanism in place to either rewrite that file in place, or refer back to the origin server to pull the rewritten content.

Note: It is the responsibility of the site administrator to ensure that moving resources onto domains does not create a security vulnerability. In particular, if the target domain has cookies, then any JavaScript loaded from a resource moved to a domain with cookies will gain access to those cookies. In general, moving resources to a cookieless domain is a great way to improve security. Be aware that CSS can load JavaScript in certain environments.

By specifying a domain in this directive, either as source or destination, you are authorizing PageSpeed to rewrite resources found in this domain. See Domain.

These directives can be used in location-specific configuration sections.

Mapping Origins with a Shared CDN

Consider a scenario where an installation serving multiple domains uses a single CDN for caching and delivery of all content. The origin fetches need to be routed to the correct VirtualHost on the server. This can be achieved by using a subdirectory per domain in the CDN, and then using that subdirectory to map to the correct VirtualHost at origin. The host-header control offered by the third argument to MapOriginDomain makes this feasible.

In the example below, resources with a domain of sharedcdn.example.com and path starting with /vhost1 will be fetched from localhost but with a Host: header value of vhost1.example.com. Without the third argument to MapOriginDomain, the Host: header would be sharedcdn.example.com.

pagespeed MapOriginDomain localhost sharedcdn.example.com/vhost1 vhost1.example.com
pagespeed MapRewriteDomain sharedcdn.example.com/vhost1 vhost1.example.com

This would be used in conjunction with a VirtualHost setup for vhost1.example.com, and a single CDN setup for multiple hosts segregated by subdirectory.

Sharding domains

Best practices suggest minimizing round-trip times by parallelizing downloads across hostnames. PageSpeed can partially automate this for resources that it rewrites, using the directive:

pagespeed ShardDomain domain_to_shard shard1,shard2,shard3...

Wildcards cannot be used in this directive.

This will distribute the domains for rewritten URLs among the specified shards. The shard selected for a particular URL is computed from the original URL.

pagespeed ShardDomain example.com static1.example.com,static2.example.com

Using this directive, PageSpeed will distribute roughly half the resources rewritten from example.com into static1.example.com, and the rest to static2.example.com. You can specify as many shards as you like. The optimum number of shards is a topic of active research, and is browser-dependent. Configuring between 2 and 4 shards should yield good results. Changing the number of shards will cause PageSpeed to choose different names for resources, resulting in a partial cache flush.
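
The idea of computing the shard from the original URL can be sketched in Python as below. This is only an illustration: the function name pick_shard is hypothetical, and CRC32 stands in for whatever hash PageSpeed uses internally.

```python
import zlib

def pick_shard(original_url, shards):
    """Deterministically choose a shard hostname for original_url."""
    # Assumption: CRC32 is a stand-in; PageSpeed's real hash is not documented here.
    index = zlib.crc32(original_url.encode("utf-8")) % len(shards)
    return shards[index]

shards = ["static1.example.com", "static2.example.com"]
print(pick_shard("http://example.com/styles.css", shards))
```

Because the index depends on the number of shards, adding or removing a shard moves some URLs to a different hostname, which is why changing the shard count causes a partial cache flush.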

When used in combination with MapRewriteDomain, the rewrite mappings will be done first. Then the shard selection occurs. Origin domains are always tracked so that when a browser sends a sharded URL back to the server, PageSpeed can find it.

Let's look at an example:

pagespeed ShardDomain example.com static1.example.com,static2.example.com
pagespeed MapRewriteDomain example.com www.example.com
pagespeed MapOriginDomain localhost example.com

In this example, example.com and www.example.com are "tied" together via MapRewriteDomain. The origin-mapping to localhost propagates automatically to www.example.com, static1.example.com, and static2.example.com. So when PageSpeed cache-extends an HTML stylesheet reference http://www.example.com/styles.css, it will be:

  1. Fetched by the server rewriting the HTML from localhost
  2. Rewritten to http://example.com/styles.css.pagespeed.ce.HASH.css
  3. Sharded to http://static1.example.com/styles.css.pagespeed.ce.HASH.css

Proxying and optimizing resources from trusted domains

Proxying resources is desirable under several scenarios:

  • The resources on the origin domain may benefit from optimizations done by PageSpeed.
  • SPDY may work better if there are fewer domains on a page.
  • The target domain running PageSpeed may have better serving infrastructure than the origin.

It is possible to proxy and optimize resources whose origin is a trusted domain that may not be running PageSpeed. This cannot be directly achieved with MapRewriteDomain because that is a declaration that the domains listed are functionally equivalent to one another, either because they are backed by the same storage, or because the target is acting as a proxy (e.g. a CDN). MapProxyDomain makes it technically possible to proxy and optimize resources from any domain that you trust.

You must only proxy resources that are controlled by an organization you trust because it is possible for malicious content (e.g. GIFAR) proxied from an untrustworthy domain to gain access to private content on your domain, compromising your site or its viewers. You must never map directories that may contain files that may be controlled by a third party.

There may be legal issues restricting the optimization of resources you don't own. If in doubt consult a lawyer.

pagespeed MapProxyDomain target_domain/subdir
                         origin_domain/subdir [rewrite_domain/subdir]

If the optional rewrite_domain/subdir argument is supplied then optimized resources will be rewritten to that location. This is useful for rewriting optimized resources proxied from an external origin to a CDN.

It is important to specify a subdirectory in the target domain, because PageSpeed will need to be able to unambiguously identify the origin domain given the target when fetching content. Thus each MapProxyDomain command should be given a distinct subdirectory of the target domain.

It is important to specify a subdirectory in the origin domain to limit the scope of the proxying.
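
The need for a distinct target subdirectory per mapping can be illustrated with a small Python sketch. All domain names and paths in PROXY_MAPPINGS are hypothetical, as is the function name resolve_origin, and the sketch ignores the .pagespeed. leaf-name encoding that real rewritten URLs carry.

```python
from urllib.parse import urlsplit

# Hypothetical mappings: target_domain/subdir -> origin_domain/subdir.
PROXY_MAPPINGS = {
    "www.example.com/proxied/a": "a.example.net/assets",
    "www.example.com/proxied/b": "b.example.net/media",
}

def resolve_origin(url):
    """Return the origin URL for a proxied URL, or None if no mapping applies."""
    parts = urlsplit(url)
    target = parts.netloc + parts.path
    for target_prefix, origin_prefix in PROXY_MAPPINGS.items():
        if target.startswith(target_prefix + "/"):
            return "http://" + origin_prefix + target[len(target_prefix):]
    return None
```

If both mappings shared the same target subdirectory, the lookup could not tell which origin a request belongs to.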

Real world example

You can also see proxy-mapping in action on this example.

Fetch server restrictions

PageSpeed will only fetch resources from localhost and domains explicitly mentioned in domain configuration directives such as Domain, MapRewriteDomain and MapOriginDomain.

Specifying additional URL-valued attributes

All PageSpeed filters that process URLs need to know which attributes of which elements to consider. By default they consider those in the HTML4 and HTML5 specifications and a few common extensions:

<A href=...>
<AREA href=...>
<AUDIO src=...>
<BLOCKQUOTE cite=...>
<BODY background=...>
<BUTTON formaction=...>
<COMMAND icon=...>
<DEL cite=...>
<EMBED src=...>
<FORM action=...>
<FRAME src=...>
<HTML manifest=...>
<IFRAME src=...>
<IMG src=...>
<INPUT type="image" src=...>
<INS cite=...>
<LINK href=...>
<Q cite=...>
<SCRIPT src=...>
<SOURCE src=...>
<TD background=...>
<TH background=...>
<TABLE background=...>
<TBODY background=...>
<TFOOT background=...>
<THEAD background=...>
<TRACK src=...>
<VIDEO src=...>

If your site uses a non-standard attribute for URLs, PageSpeed won't know to rewrite them or the resources they reference. To identify them to PageSpeed, use the UrlValuedAttribute directive. For example:

pagespeed UrlValuedAttribute span src hyperlink
pagespeed UrlValuedAttribute div background image

These would identify <span src=...> and <div background=...> as containing URLs. Further, the background attribute of div elements would be treated as referring to an image, just like an image resource referenced with <img src=...>. The general form is:

pagespeed UrlValuedAttribute ELEMENT ATTRIBUTE CATEGORY

All fields are case-insensitive. Valid categories are:

  • script
  • image
  • stylesheet
  • otherResource
    • Any other URL that will be automatically loaded by the browser along with the main page. For example, the manifest attribute of the html element or the src attribute of an iframe element.
  • hyperlink
    • A link to another page or resource that a browser wouldn't normally load in connection to this page (like the href attribute of an a element). These URLs will still be rewritten by MapRewriteDomain and similar directives, but they will not be sharded and PageSpeed will not load the URL and rewrite the resource.

When in doubt, hyperlink is the safest choice.
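
As a small illustration of the case-insensitive fields and the fixed set of categories, the following Python sketch validates the three arguments of a UrlValuedAttribute directive. The function name is hypothetical and this is not how PageSpeed itself parses its configuration.

```python
VALID_CATEGORIES = {"script", "image", "stylesheet", "otherresource", "hyperlink"}

def normalize_url_valued_attribute(element, attribute, category):
    """Lowercase the three fields and reject unknown categories."""
    element, attribute, category = (
        field.lower() for field in (element, attribute, category)
    )
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return element, attribute, category

print(normalize_url_valued_attribute("DIV", "Background", "Image"))
```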

Loading static files from disk

By default PageSpeed loads sub-resources via an HTTP fetch. Loading sub-resources directly from the filesystem would be faster, but this may not be safe, because the sub-resources may be dynamically generated or may not be stored on the same server.

However, you can explicitly tell PageSpeed to load static sub-resources from disk by using the LoadFromFile directive. For example:

pagespeed LoadFromFile "http://www.example.com/static/" "c:\www\static/"

tells PageSpeed to load all resources whose URLs start with http://www.example.com/static/ from the filesystem under c:\www\static/. For example, http://www.example.com/static/images/foo.png will be loaded from the file c:\www\static/images/foo.png. However, http://www.example.com/bar.jpg will still be fetched using HTTP.
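
The prefix mapping just described amounts to a simple string substitution, sketched here in Python. The function name map_url_to_file is hypothetical; the two prefixes are taken from the directive above.

```python
URL_PREFIX = "http://www.example.com/static/"
FILE_PREFIX = "c:\\www\\static/"

def map_url_to_file(url):
    """Return the file path for url, or None if it must be fetched over HTTP."""
    if url.startswith(URL_PREFIX):
        return FILE_PREFIX + url[len(URL_PREFIX):]
    return None

print(map_url_to_file("http://www.example.com/static/images/foo.png"))
print(map_url_to_file("http://www.example.com/bar.jpg"))
```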

If you need more sophisticated prefix-matching behavior, you can use the LoadFromFileMatch directive, which supports RE2-formatted regular expressions. (Note that this is not the same format as the wildcards used above and elsewhere in PageSpeed.) For example:

pagespeed LoadFromFileMatch "^https?://example.com/~([^/]*)/static/" "c:\www\static/\\1"

This will load http://example.com/~pat/static/cat.jpg from c:\www\static/pat/cat.jpg, http://example.com/~sam/static/images/dog.jpg from c:\www\static/sam/images/dog.jpg, and https://example.com/~al/static/css/ie from c:\www\static/al/css/ie. The resource http://example.com/~pat/images/static/puppy.gif, however, would not be matched by this directive and would be fetched using HTTP.

Because PageSpeed is loading the files directly from the filesystem, no custom headers will be set.

You can also use the LoadFromFile directive to load HTTPS resources which would not be otherwise fetchable directly. For example:

pagespeed LoadFromFile "https://www.example.com/static/" "c:\www\static/"

The filesystem path must be an absolute path.

You can specify multiple LoadFromFile associations in configuration files. Note that large numbers of such directives may impact performance.

If the sub-resource cannot be loaded from file in the directory specified, the sub-request will fail (rather than fall back to HTTP fetch). Part of the reason for this is to indicate a configuration error more clearly.

As an added benefit, if resources are loaded from file, the rewritten versions will be updated immediately when you change the associated file. Resources loaded via normal HTTP fetches are refreshed only when they expire from the cache (by default every 5 minutes), so their rewritten versions are only updated as often as the cache is refreshed. Resources loaded from file are not subject to caching behavior because they are accessed directly from the filesystem for every request for the rewritten version.

See also MapOriginDomain.

This directive cannot be used in location-specific configuration sections.

Limiting Direct Loading

A mapping set up with LoadFromFile allows filesystem loading for anything it matches. If you have directories or file types that cannot be loaded directly from the filesystem, LoadFromFileRule lets you add fine-grained rules to control which files will be loaded directly and which will fall back to the standard process, over HTTP.

When given a URL, PageSpeed first determines whether any LoadFromFile mappings apply. If one does, it calculates the mapped filename and checks for applicable LoadFromFileRules. Considering rules in the reverse order of definition, it takes the first applicable one and uses that to determine whether to load from file or fall back to HTTP.

Some examples may be helpful. Consider a website that is entirely static content except for a /bin directory:

c:\www\index.html
c:\www\css\style.css
c:\www\gfx\image.png
c:\www\bin\webapp.dll

While most of the site can be loaded directly from the filesystem, webapp.dll and web.config are files that need to be interpreted before serving -- or not served at all! Adding a rule disallowing the /bin directory tells PageSpeed to fall back to HTTP appropriately:

pagespeed LoadFromFile http://example.com/ c:\www\
pagespeed LoadFromFileRule Disallow c:\www\bin

The LoadFromFileRule directive takes two arguments. The first must be either Allow or Disallow while the second is a prefix that specifies which filesystem paths it should apply to. Because the default is to allow loading from the filesystem for all paths listed in any LoadFromFile statement, most of the time you will be using Disallow to turn off filesystem loading for some subset of those paths. You would use Allow only after a Disallow that was overly general.

Not all sites are well suited for prefix-based control. Consider a site with aspx files mixed in with ordinary static files:

  c:\www\index.html
  c:\www\webmail.aspx
  c:\www\webmail.css
  c:\www\blog/index.aspx
  c:\www\blog/header.png
  c:\www\blog/blog.css

Blacklisting just the .aspx files so they fall back to an HTTP fetch allows everything else to be loaded directly from the filesystem:

pagespeed LoadFromFile http://example.com/ c:\www\
pagespeed LoadFromFileRuleMatch Disallow \.aspx$

The LoadFromFileRuleMatch directive also takes two arguments. The first is either Allow or Disallow and functions just like for LoadFromFileRule above. The second argument, however, is a RE2-format regular expression instead of a file prefix. Remember to escape characters that have special meaning in regular expressions. For example, if instead of \.aspx$ we had simply .aspx$ then a file named example.notaspx would still be forced to load over HTTP because "." is special syntax for "match any single character".
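
The effect of escaping the dot can be checked with a short Python sketch. Python's re module is used here as a stand-in for RE2 (the two agree on this simple syntax), the function name matches_rule is hypothetical, and the sketch assumes the pattern may match anywhere in the filename.

```python
import re

def matches_rule(pattern, filename):
    """Return True if pattern matches somewhere in filename."""
    return re.search(pattern, filename) is not None

for name in ["blog/index.aspx", "example.notaspx", "webmail.css"]:
    # Columns: filename, escaped pattern result, unescaped pattern result.
    print(name, matches_rule(r"\.aspx$", name), matches_rule(r".aspx$", name))
```

Only the unescaped pattern matches example.notaspx, because its leading "." accepts the "t" before "aspx".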

Consider a site with the opposite problem: a few file types can be reliably loaded from file but the rest need interpretation first. For example:

  c:\www\index.html
  c:\www\site.css
  c:\www\script-using-ssi.js
  c:\www\generate-image.ashx
  c:\www\

In this site, generate-image.ashx needs to be interpreted to make images. The only resources on the site that are generally safe to load directly from the filesystem are the .css files. By first blacklisting everything and then whitelisting only the .css files, we can make PageSpeed do this:

pagespeed LoadFromFile http://example.com/ c:\www\
pagespeed LoadFromFileRuleMatch Disallow .*
pagespeed LoadFromFileRuleMatch Allow \.css$

This works because order is significant: later rules take precedence over earlier ones.
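
The precedence rule can be sketched in Python as follows: rules are scanned in reverse order of definition and the first match decides. The function name should_load_from_file is hypothetical, and Python's re module stands in for RE2.

```python
import re

def should_load_from_file(filename, rules):
    """Decide whether a mapped filename is loaded from disk.

    rules is a list of (action, regex) pairs in definition order,
    where action is "Allow" or "Disallow".
    """
    for action, pattern in reversed(rules):
        if re.search(pattern, filename):
            return action.lower() == "allow"
    return True  # no rule applies: LoadFromFile allows everything it maps

rules = [("Disallow", r".*"), ("Allow", r"\.css$")]
print(should_load_from_file(r"c:\www\site.css", rules))
print(should_load_from_file(r"c:\www\index.html", rules))
```

Swapping the two rules would disallow everything, since the catch-all Disallow would then be considered first.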

Inlining resources without explicit authorization

Note: New feature as of IISpeed 2.0 / PageSpeed 1.8.31.2

Several filters in PageSpeed operate by inlining content from resources into the HTML; inline_css, inline_javascript and prioritize_critical_css are a few of them. If resources from third-party domains are not authorized explicitly, the effectiveness of these filters decreases. For instance, prioritize_critical_css attempts to remove blocking CSS requests needed for the initial render by inlining critical CSS snippets into the HTML. However, CSS resources that are not authorized will continue to block. This option allows such resources to be inlined without having to authorize all the individual domains.

The InlineResourcesWithoutExplicitAuthorization directive can be used to allow resources from third-party domains to be inlined into the HTML without requiring explicit authorization for each domain. This option is “off” by default, and takes a comma-separated list of strings representing resource categories for which the option should be enabled. The list of valid resource categories is given here. Currently, only Script and Stylesheet resource types are supported for this option.

This option can be enabled as follows:

pagespeed InlineResourcesWithoutExplicitAuthorization Script,Stylesheet

Warning: Enabling InlineResourcesWithoutExplicitAuthorization could permit hostile third parties to access any machine and port that the server running PageSpeed has access to, including potentially those behind firewalls. Please read the following information for details.

This directive should only be enabled if all of the following conditions are met for the resource types for which this option is enabled:

  1. The webmaster is confident that the resources referenced on their pages are from trusted domains only.
  2. The site does not allow user-injected resources for the enabled resource types.
  3. Fetches from the PageSpeed server should have no more access to machines or ports than anyone on the Internet, and machines it can access should not treat its traffic specially. Specifically, the PageSpeed servers should not be able to access anything that is internal to a firewall. Please refer to the Fetch server restrictions section for more details.

Note that resources inlined into HTML via this option will not be accessible directly via a pagespeed URL, since that involves different security risks. Resources will also not be inlined into other non-HTML resources via this option. This means that flatten_css_imports will not flatten third-party CSS into another CSS resource, unless the relevant third-party domains are authorized explicitly via one of the techniques mentioned in the previous sections.



Some content on this website represents a modified version of the official Google PageSpeed documentation