Create your own free reverse proxy with Azure Web Apps

Tom Chantler

Summary

This article explains how to use Azure Web Apps (the new name for Azure Websites) to create a free reverse proxy such that all requests to tomssl-proxy.azurewebsites.net actually serve content from tomssl.com, without this being apparent to the end user. We will also force the connection to be made securely over SSL (using the azurewebsites.net SSL certificate, not the certificate from tomssl.com, which means we can do this even if the existing live website doesn't support SSL). Then, just for fun, we will edit some of the content that's returned and finally we'll add a red warning banner, which will give the game away (but you can be confident that somebody doing this maliciously wouldn't do that).

NOTE: It's possible to do all of these things using IIS URL rewriting and Application Request Routing (ARR) in a standard installation of Internet Information Services (IIS) and indeed that's what Azure Web Apps uses under the hood. However, you'd need a server for that, whereas not only do we not need a server, we don't even need to spend any money as we can use a free Microsoft Azure Web App. You can sign up for a free trial and try it yourself.

Let's see exactly what we're going to do:

  1. Make all requests to tomssl-proxy.azurewebsites.net secretly retrieve the content from tomssl.com, without us knowing (the address bar won't change).
  2. Rewrite all the references (including hyperlinks) to tomssl.com so they actually point to tomssl-proxy.azurewebsites.net, thus aiding the deception we're practising on ourselves.
  3. Blank out some of the text to make it look like some government agency has been censoring our internet connection.
  4. Make words that end in ing end in in'.
  5. Alter some dates (changing 2015 to 2014).
  6. Replace the pictures of me with pictures of somebody else.
  7. Add a floating header to the page explaining what's going on and featuring links to this blog post and to the original page, effectively undermining the subterfuge from (1) and (2).

I expect you'll agree that if we can do this then surely hilarity will ensue. But on a serious note, it's fairly easy to see how this could be useful in a more formal context.

  • You could improve the security of an insecure live website by forcing communication over HTTPS without having access to the source code of the original website (or anything other than the live URL).
  • Say you have separate websites for your blog (blog.onlineblogservice.com), your business (mydomain.example.com) and your store (mydomain.onlinestoreprovider.com). Using this method you could bring them all together under mydomain.com/blog, mydomain.com and mydomain.com/store.
  • You could append a new copyright notice to certain content.
  • You could add a cookie-acceptance header warning to a page which uses cookies but which predates such rules.
  • You could add JavaScript for tracking or analytics.
  • You could attempt to do all sorts of malicious things which we won't discuss here.
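As a rough illustration of the second bullet, the three sites could be stitched together with one inbound rewrite rule per path, using the same kind of rule this article builds up later. This is only a sketch: the rule names are made up, the hosts are the placeholder ones from the list above, and it assumes each back end is reachable over HTTPS and that the proxy has been enabled in applicationHost.xdt as described below.

```xml
<rules>
	<!-- Requests under /blog/ are fetched from the hosted blog (a bare /blog is not matched here) -->
	<rule name="ProxyBlog" stopProcessing="true">
		<match url="^blog/(.*)" />
		<action type="Rewrite" url="https://blog.onlineblogservice.com/{R:1}" />
	</rule>
	<!-- Requests under /store/ are fetched from the hosted store -->
	<rule name="ProxyStore" stopProcessing="true">
		<match url="^store/(.*)" />
		<action type="Rewrite" url="https://mydomain.onlinestoreprovider.com/{R:1}" />
	</rule>
	<!-- Everything else is fetched from the business site; this catch-all must come last -->
	<rule name="ProxyMain" stopProcessing="true">
		<match url="(.*)" />
		<action type="Rewrite" url="https://mydomain.example.com/{R:1}" />
	</rule>
</rules>
```

Because each rule sets stopProcessing="true", the first match wins, so the order matters. Links inside the returned pages would still point at the original hosts unless outbound rules rewrote them as well.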

A brief aside - a real example

As previously mentioned, if we weren't using Azure websites but were instead using a server installation of IIS then we could also do this, as in the case of a recent client of mine. Their infrastructure looked like this:

Web Server - exposed to the internet, connected internally to...

Application Server - not exposed to the internet, connected internally to...

Database server - not exposed to the internet or to the web server

This is a fairly common setup where the web server can see the application server, but cannot communicate directly with the database server. The problem they had was that they wanted to install a simple two-tier web application requiring database access and for it to be accessible via the internet.

  • If they installed the application on the web server then it couldn't see the database.
  • If they installed the application on the application server then, whilst it could see the database, it couldn't connect to the outside world.

The solution was essentially the same as what is described here, except that the configuration was done on their web server instead of in Azure. We installed a simple reverse proxy on the externally visible web server, which forwarded all requests to the application server. Security wasn't compromised, because the database server was never exposed to the web server, and neither the application server nor the database server had to be connected to the outside world. It took a few minutes to configure and worked well.

Back to our example

In our example we want to go to https://tomssl-proxy.azurewebsites.net/ (don't click it yet) and for it to retrieve content from https://tomssl.com/ without us noticing. We also want to edit the response which is sent back to the browser, changing all references to https://tomssl.com/ therein. We'll make a few other changes to the visible content, changing some dates and images, blanking out some words and dropping the terminal g from others. Finally we'll add a banner to the top of the page explaining what's been done.

If we had access to the source code then this would be fairly straightforward, although somewhat onerous and error-prone. For the purposes of this experiment we won't have access to anything except the URL of the live website.

Enter IIS URL rewriting and Application Request Routing (ARR). The bit we're interested in enables you to intercept requests and send them somewhere else and also to edit the response which is sent back to the browser, in my case changing any links therein.

For a resource giving some useful examples of URL rewriting in IIS, see Ruslan Yakushev's blog post 10 URL Rewriting Tips and Tricks. Not only that, he's already written briefly about using an Azure Web Site as a reverse proxy.

To achieve our aim we are going to use an XDT transform (part of Azure Site Extensions) to tweak the applicationHost.config file that is generated for us. Then we'll be able to access previously unavailable values in our web.config file.

We're only going to upload two files to our free Azure Web App: applicationHost.xdt and web.config. If you've already got an Azure subscription you can get this up and running in a few minutes.

applicationHost.xdt

We're going to add and configure the <proxy> node and also add some server variables which we will access in the web.config file.

UPDATE: The allowedServerVariables used to be marked as Insert, but should be InsertIfMissing, just in case any other xdt files are also adding the same server variables, since duplicate values can cause the site to break. I have now corrected this (see below). Thanks to David Ebbo for pointing this out in the comments.

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
	<system.webServer>
		<proxy xdt:Transform="InsertIfMissing" enabled="true" preserveHostHeader="false" reverseRewriteHostInResponseHeaders="false" />
		<rewrite>
			<allowedServerVariables>
				<add name="HTTP_X_ORIGINAL_HOST" xdt:Transform="InsertIfMissing" />
				<add name="HTTP_X_UNPROXIED_URL" xdt:Transform="InsertIfMissing" />
				<add name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" />
				<add name="HTTP_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" />
			</allowedServerVariables>
		</rewrite>
	</system.webServer>
</configuration>

Save the above in a new file called applicationHost.xdt and upload it to the /site folder of your Azure website.

There are a lot of different ways you can do this. In this case I'd suggest it might be easiest using SFTP via WinSCP, which is what I did when writing this article.

web.config

There are two main parts of the web.config file which we're going to edit (inbound and outbound rules) and they are both under the configuration/system.webServer/rewrite node.

Inbound Rules

<rules>
		<rule name="ForceSSL" stopProcessing="true">
			<match url="(.*)" />
			<conditions>
				<add input="{HTTPS}" pattern="^OFF$" ignoreCase="true" />
			</conditions>
			<action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
		</rule>
		<rule name="Proxy" stopProcessing="true">
			<match url="(.*)" />
			<action type="Rewrite" url="https://tomssl.com/{R:1}" />
			<serverVariables>
				<set name="HTTP_X_UNPROXIED_URL" value="https://tomssl.com/{R:1}" />
				<set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" /> 
				<set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
				<set name="HTTP_ACCEPT_ENCODING" value="" />
			</serverVariables>
		</rule>
</rules>

The first rewrite rule is fairly self-explanatory.

ForceSSL - makes sure that you are coming in using SSL by permanently redirecting requests from http to https.

The second rule is slightly more complex.

Proxy - retrieves content from tomssl.com, grabs the original HTTP_ACCEPT_ENCODING and HTTP_HOST headers and stores them for later use and then blanks out the HTTP_ACCEPT_ENCODING header. It also stores the original unproxied URL so that we can provide a link to the original page in our banner.

If you don't blank the HTTP_ACCEPT_ENCODING header on the way in, then the outbound rules won't work. It's not possible to rewrite content which has already been compressed.

In other words, if you remove this line,

<set name="HTTP_ACCEPT_ENCODING" value="" />

then add system.webServer/httpErrors (just to show more detailed errors) like this,

<system.webServer>
	<httpErrors errorMode="Detailed" />
    ...

you'll see this error:

HTTP 500.52 Error

Outbound Rules

If we only wanted to change outbound links and not general text in the body of the page, we'd use a regular expression to make sure we only rewrote the relevant parts, otherwise we'd be placing extra load on the server for no reason. In other words we'd check we were dealing with HTML and then we'd edit the filterByTags attribute to select <a> tags (and only those that didn't start with a single /, since relative links are already okay). Check the documentation for more.

<rule name="ChangeReferencesToOriginalUrl" patternSyntax="ExactMatch" preCondition="CheckContentType">
	<match filterByTags="None" pattern="https://tomssl.com" />
   	<action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}" />
</rule>

ChangeReferencesToOriginalUrl - this rule changes all references to https://tomssl.com so that they point to the host we originally used to visit the site (remember, we saved it in HTTP_X_ORIGINAL_HOST in the Proxy inbound rule).

I have omitted the trailing / from the pattern here as the Home link on my original web page doesn't end with '/'. If you want your rewrite rules to work properly every time, do it this way.
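For comparison, the narrower links-only variant mentioned at the start of this section might look something like the sketch below. It isn't part of the configuration used for the demo, and the rule and precondition names are invented for illustration. With filterByTags="A" the pattern is tested against the href value of each <a> tag rather than against the whole response, and the default regular-expression syntax is used instead of ExactMatch.

```xml
<outboundRules>
	<rule name="RewriteAnchorsOnly" preCondition="CheckHtmlOnly">
		<!-- Only absolute links back to the original site match; relative links (starting with a single /) are left alone -->
		<match filterByTags="A" pattern="^https://tomssl\.com(/.*)?$" />
		<!-- {R:1} is the path captured by the pattern (empty for the bare host) -->
		<action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}{R:1}" />
	</rule>
	<preConditions>
		<!-- Hypothetical precondition: only touch HTML responses -->
		<preCondition name="CheckHtmlOnly">
			<add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
		</preCondition>
	</preConditions>
</outboundRules>
```

This puts less load on the server, but it would leave references in scripts, stylesheets links and plain text untouched, which is why the demo uses the broader rule above.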

Next we need to add the rules for redaction and g dropping.

Redacting

<rule name="WordRedactionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
	<match filterByTags="None" pattern=" the " />
	<action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
</rule>

This rule replaces occurrences of " the " with XXXXX in black on a black background, adds a tooltip, changes the mouse pointer and adds a non-breaking space to the end, like this: XXXXX  (try selecting the text with your cursor).

We have a similar rule for " a " too. I'm checking for the space in each case to make sure we don't accidentally blank out partial words or, worse still, alter parts of URLs and stop them from working.

Dropping the g

The rewrite rule for changing ing to in' is a bit simpler, as shown below:

<rule name="WordSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
	<match filterByTags="None" pattern="ing " />
	<action type="Rewrite" value="in' " />
</rule>

The rule for changing 2015 to 2014 is very similar too.

Changing the images

You might be tempted to confine the image-substitution rule to <img> tags by setting a value in filterByTags and in some cases that might be sufficient, but if you want to be sure that you capture all references to your images then you might be better off doing something like this:

<rule name="ImageSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
	<match filterByTags="None" pattern="//www.gravatar.com/avatar/b32f804a7aaf295a3517e63e563c1a83" />
	<action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
</rule>
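
For comparison, the more tempting version confined to <img> tags might look like the sketch below (the rule name is made up and this isn't what the demo uses). With filterByTags="Img" the pattern is tested only against the src value of each <img> tag, so a reference to the same image anywhere else in the response would be missed.

```xml
<rule name="ImageSubstitutionImgTagsOnly" preCondition="CheckContentType">
	<!-- Regular-expression syntax (the default); the trailing .* allows for any query string after the hash -->
	<match filterByTags="Img" pattern="^//www\.gravatar\.com/avatar/b32f804a7aaf295a3517e63e563c1a83.*$" />
	<action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
</rule>
```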

Adding the floating header

It's unlikely that you'd add a header quite like this, but you might need to add a cookie acceptance notice to a legacy site, or a notice of an impending event of some kind.

<rule name="AppendHeader" patternSyntax="ExactMatch" preCondition="CheckContentType">
	<match filterByTags="None" pattern="&lt;/body&gt;" />
	<action type="Rewrite" value="&lt;div style='font-family:&#34;Open Sans&#34;,sans-serif;font-size:1.5rem;text-align:center;padding:2px;background-color:#FF0000;color:#FFFFFF;z-index:99;position:fixed;top:0px;width:100%;border-bottom:1px solid grey;'&gt;This page has been altered by a free Microsoft Azure proxy. Details &lt;a href='https://tomssl.com/'&gt;here&lt;/a&gt;. See the original page &lt;a href='{HTTP_X_UNPROXIED_URL}'&gt;here&lt;/a&gt;&lt;/div&gt;&lt;/body&gt;" />
</rule>

As you can see, I have html-encoded the markup for the banner and am adding it to the end of the page just before the closing </body> tag.

Preconditions

For each of the outbound rules, we have specified a precondition which performs a check to make sure we don't alter the wrong types of data (e.g. we don't want to apply our filter to any image data, etc).

<preCondition name="CheckContentType">
	<add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|text/plain|text/xml|application/rss\+xml)" />
</preCondition>

I am changing all text and also the RSS feed - remember, in the case of the redaction we are pretending that the data has been permanently removed (large red banner at the top of the screen notwithstanding), so I don't want to let you see the original just by consuming the RSS feed.

Don't be tempted to set pattern="^(text/*|application/rss\+xml)": it also matches text/css, so you risk altering your CSS files.

If you don't want to change all text remember that HTML is of type text/html, things like robots.txt are text/plain, sitemap data is text/xml and rss data is application/rss+xml, so you could always do something like pattern="^(text/html|application/rss\+xml)" (we need to escape the + with a \) to change the HTML and the RSS feed, but not the robots.txt or the sitemap. For our example we have to change everything.
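
As a small sketch, the narrower HTML-and-RSS-only check described above would sit in the outboundRules section like this. The name CheckHtmlAndRss is just an illustrative choice; any outbound rule that should use it would need to reference it in its preCondition attribute.

```xml
<preConditions>
	<!-- Matches text/html and application/rss+xml, but not text/plain (robots.txt) or text/xml (sitemap) -->
	<preCondition name="CheckHtmlAndRss">
		<add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|application/rss\+xml)" />
	</preCondition>
</preConditions>
```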

A note about gzip compression

I believe that by unlocking the httpCompression element in applicationHost.xdt, setting a few extra variables in the web.config and copying the AcceptEncoding on the way in, clearing it temporarily and then setting it again on the way out after the rewriting, it should be possible to get gzip compression to work on the rewritten content. However, this is only available if you are using a paid Azure tier; since we're using a free tier this won't work. Annoyingly, so far I have been unable to get that to work even when using a paid tier. With a bit of luck I'll get it working and update this article in due course.
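
For reference, the "setting it again on the way out" step would probably look something like the sketch below. It is untested here and the rule and precondition names are hypothetical. It relies on the Proxy inbound rule having saved the original header in HTTP_X_ORIGINAL_ACCEPT_ENCODING, and on HTTP_ACCEPT_ENCODING being listed in allowedServerVariables (both are already the case in this article). It does not unlock or configure httpCompression itself, so on its own it is not expected to produce compressed responses.

```xml
<outboundRules>
	<!-- Put the browser's original Accept-Encoding value back after the content rules have run -->
	<rule name="RestoreAcceptEncoding" preCondition="NeedsRestoringAcceptEncoding">
		<match serverVariable="HTTP_ACCEPT_ENCODING" pattern="^(.*)" />
		<action type="Rewrite" value="{HTTP_X_ORIGINAL_ACCEPT_ENCODING}" />
	</rule>
	<preConditions>
		<!-- Only run when the inbound rule actually saved a non-empty value -->
		<preCondition name="NeedsRestoringAcceptEncoding">
			<add input="{HTTP_X_ORIGINAL_ACCEPT_ENCODING}" pattern=".+" />
		</preCondition>
	</preConditions>
</outboundRules>
```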

Putting it all together

Now all that remains to be done is to combine the elements above into a web.config file and upload it to our Azure Web App.

web.config

<?xml version="1.0" encoding="utf-8"?>
<configuration>
	<system.webServer>
		<httpErrors errorMode="Detailed" />
		<rewrite>
			<rules>
				<rule name="ForceSSL" stopProcessing="true">
					<match url="(.*)" />
					<conditions>
						<add input="{HTTPS}" pattern="^OFF$" ignoreCase="true" />
					</conditions>
					<action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
				</rule>
				<rule name="Proxy" stopProcessing="true">
					<match url="(.*)" />
					<action type="Rewrite" url="https://tomssl.com/{R:1}" />
					<serverVariables>
						<set name="HTTP_X_UNPROXIED_URL" value="https://tomssl.com/{R:1}" /> 
						<set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" /> 
						<set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
						<set name="HTTP_ACCEPT_ENCODING" value="" />
					</serverVariables>
				</rule>
			</rules>
			<outboundRules>
				<rule name="ChangeReferencesToOriginalUrl" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern="https://tomssl.com" />
   					<action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}" />
  				</rule>
		        <rule name="WordRedactionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern=" the " />
   					<action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
  				</rule>
				<rule name="WordRedactionFilter2" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern=" a " />
   					<action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
  				</rule>
				<rule name="WordSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern="ing " />
   					<action type="Rewrite" value="in' " />
  				</rule>
				<rule name="WordSubstitutionFilter2" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern=" 2015" />
   					<action type="Rewrite" value=" 2014" />
  				</rule>
				<rule name="ImageSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern="//www.gravatar.com/avatar/b32f804a7aaf295a3517e63e563c1a83" />
   					<action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
  				</rule>
				<rule name="AppendHeader" patternSyntax="ExactMatch" preCondition="CheckContentType">
   					<match filterByTags="None" pattern="&lt;/body&gt;" />
   					<action type="Rewrite" value="&lt;div style='font-family:&#34;Open Sans&#34;,sans-serif;font-size:1.5rem;text-align:center;padding:2px;background-color:#FF0000;color:#FFFFFF;z-index:99;position:fixed;top:0px;width:100%;border-bottom:1px solid grey;'&gt;This page has been altered by a free Microsoft Azure proxy. Details &lt;a href='https://tomssl.com/'&gt;here&lt;/a&gt;. See the original page &lt;a href='{HTTP_X_UNPROXIED_URL}'&gt;here&lt;/a&gt;&lt;/div&gt;&lt;/body&gt;" />
  				</rule>
				<preConditions>
					<preCondition name="CheckContentType">
						<add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|text/plain|text/xml|application/rss\+xml)" />
					</preCondition>
				</preConditions>
			</outboundRules>
		</rewrite>
	</system.webServer>
</configuration>

Save the above in a new file called web.config and upload it to the /site/wwwroot folder of your Azure website.

And that's it, you can browse to your site and see the fruits of your labours, like this:

https://tomssl-proxy.azurewebsites.net/

TomSSL Proxied

Conclusion

IIS URL rewriting and Application Request Routing (ARR) are very powerful and can enable you to create a sophisticated reverse proxy with only a few lines of configuration code. In the past the barrier to entry was the requirement to have some kind of server running IIS. Now we can achieve the same thing using a free Azure web app. To demonstrate this, I've created https://tomssl-proxy.azurewebsites.net/ which is a reverse proxy of this website with a few (slightly) humorous changes. It's completely free; it doesn't cost me anything whatsoever (no MSDN dev credits; literally nothing).

I think it's pretty amazing. Why not sign up for a free trial of Azure and give it a go?