Create your own free reverse proxy with Azure Web Apps
Summary
This article explains how to use Azure Web Apps (the new name for Azure Websites) to create a free reverse proxy such that all requests to tomssl-proxy.azurewebsites.net actually serve content from tomssl.com, without this being apparent to the end user. We will also force the connection to be made securely over SSL (using the azurewebsites.net SSL certificate, not the certificate from tomssl.com, which means we can do this even if the existing live website doesn't support SSL). Then, just for fun, we will edit some of the content that's returned, and finally we'll add a red warning banner which will give the game away (but you can be confident that somebody doing this maliciously wouldn't do that).
NOTE: It's possible to do all of these things using IIS URL rewriting and Application Request Routing (ARR) in a standard installation of Internet Information Services (IIS) and indeed that's what Azure Web Apps uses under the hood. However, you'd need a server for that, whereas not only do we not need a server, we don't even need to spend any money as we can use a free Microsoft Azure Web App. You can sign up for a free trial and try it yourself.
Let's see exactly what we're going to do:
1. Make all requests to tomssl-proxy.azurewebsites.net secretly retrieve the content from tomssl.com, without us knowing (the address bar won't change).
2. Rewrite all the references (including hyperlinks) to tomssl.com so they actually point to tomssl-proxy.azurewebsites.net, thus aiding the deception we're practising on ourselves.
3. Blank out some of the text to make it look like some government agency has been censoring our internet connection.
4. Make words that end in "ing" end in "in'".
5. Alter some dates (changing 2015 to 2014).
6. Replace the pictures of me with pictures of somebody else.
7. Add a floating header to the page explaining what's going on and featuring links to this blog post and to the original page, effectively undermining the subterfuge from (1) and (2).
I expect you'll agree that if we can do this then surely hilarity will ensue. But on a serious note, it's fairly easy to see how this could be useful in a more formal context.
- You could access an intranet site over the internet without exposing the web server to the internet (e.g. by using an Azure Web Apps Hybrid Connection).
- You could improve the security of an insecure live website by forcing communication over HTTPS without having access to the source code of the original website (or anything other than the live URL).
- Say you have separate websites for your blog (blog.onlineblogservice.com), your business (mydomain.example.com) and your store (mydomain.onlinestoreprovider.com). Using this method you could bring them all together under mydomain.com/blog, mydomain.com and mydomain.com/store.
- You could append a new copyright notice to certain content.
- You could add a cookie-acceptance header warning to a page which uses cookies but which predates such rules.
- You could add JavaScript for tracking or analytics.
- You could attempt to do all sorts of malicious things which we won't discuss here.
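To make the blog/business/store idea a little more concrete, here is a minimal sketch of the inbound rules it might need. It assumes the proxy has been enabled via applicationHost.xdt as described later in this article; the host names are the illustrative ones from the list above and the rule names are hypothetical. Rules are evaluated in order, so the catch-all rule for the business site has to come last.

```xml
<rewrite>
  <rules>
    <!-- Hypothetical: mydomain.com/blog/... is served by the blog host -->
    <rule name="ProxyBlog" stopProcessing="true">
      <match url="^blog/(.*)" />
      <action type="Rewrite" url="https://blog.onlineblogservice.com/{R:1}" />
    </rule>
    <!-- Hypothetical: mydomain.com/store/... is served by the store host -->
    <rule name="ProxyStore" stopProcessing="true">
      <match url="^store/(.*)" />
      <action type="Rewrite" url="https://mydomain.onlinestoreprovider.com/{R:1}" />
    </rule>
    <!-- Catch-all: everything else goes to the business site -->
    <rule name="ProxyBusiness" stopProcessing="true">
      <match url="(.*)" />
      <action type="Rewrite" url="https://mydomain.example.com/{R:1}" />
    </rule>
  </rules>
</rewrite>
```

Bear in mind that ^blog/(.*) won't match a bare /blog without a trailing slash, and each back-end will still generate its own absolute links, so in practice you'd also want outbound rules to rewrite those.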
A brief aside - a real example
As previously mentioned, you can also do this with a server installation of IIS instead of Azure Web Apps, which is what I did for a recent client of mine. Their infrastructure looked like this:
Web Server - exposed to the internet, connected internally to...
↓
Application Server - not exposed to the internet, connected internally to...
↓
Database server - not exposed to the internet or to the web server
This is a fairly common setup where the web server can see the application server, but cannot communicate directly with the database server. The problem they had was that they wanted to install a simple two-tier web application requiring database access and for it to be accessible via the internet.
- If they installed the application on the web server then it couldn't see the database.
- If they installed the application on the application server then, whilst it could see the database, it couldn't connect to the outside world.
The solution was essentially the same as the one described here, except that the configuration was done on their web server instead of in Azure. We installed a simple reverse proxy on the externally visible web server which forwarded all requests to the application server. Security wasn't compromised: the database server didn't have to be exposed to the web server, and neither the application server nor the database server had to be connected to the outside world. It took a few minutes to configure and worked well.
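For the on-premises case, the core of it can be a single inbound rule in the web server's web.config. This is a minimal sketch rather than the client's actual configuration: it assumes ARR is installed on the web server with proxying switched on (Server Proxy Settings in IIS Manager), and appserver.internal:8080 is a hypothetical internal address for the application server.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Forward every request to the internal application server.
             appserver.internal:8080 is a hypothetical host and port. -->
        <rule name="ProxyToAppServer" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="http://appserver.internal:8080/{R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Because the action is a Rewrite to a different host rather than a Redirect, the browser never learns the application server's address; it only ever talks to the web server.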
Back to our example
In our example we want to go to https://tomssl-proxy.azurewebsites.net/ (don't click it yet) and for it to retrieve content from https://tomssl.com/ without us noticing. We also want to edit the response which is sent back to the browser, changing all references to https://tomssl.com/ therein. We'll make a few other changes to the visible content: changing some dates and images, blanking out some words and dropping the terminal g from others. Finally we'll add a banner to the top of the page explaining what's been done.
If we had access to the source code then this would be fairly straightforward, although somewhat onerous and error-prone. For the purposes of this experiment we won't have access to anything except the URL of the live website.
Enter IIS URL rewriting and Application Request Routing (ARR). Together they let you intercept requests and forward them somewhere else, and also edit the response which is sent back to the browser (in my case, changing any links therein).
For a resource giving some useful examples of URL rewriting in IIS, see Ruslan Yakushev's blog post 10 URL Rewriting Tips and Tricks. Not only that, he's already written briefly about using an Azure Web Site as a reverse proxy.
To achieve our aim we are going to use an XDT transform (part of Azure Site Extensions) to tweak the applicationHost.config file that is generated for us. Then we'll be able to access previously unavailable values in our web.config file.
We're only going to upload two files to our free Azure Web App: applicationHost.xdt and web.config. If you've already got an Azure subscription you can get this up and running in a few minutes.
applicationHost.xdt
We're going to add and configure the <proxy> node, and also allow our rewrite rules to set some server variables which we will use in the web.config file.
UPDATE: The allowedServerVariables used to be marked as Insert, but should be InsertIfMissing, just in case any other xdt files are also adding the same server variables, since duplicate values can cause the site to break. I have now corrected this (see below). Thanks to David Ebbo for pointing this out in the comments.
<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <proxy xdt:Transform="InsertIfMissing" enabled="true" preserveHostHeader="false" reverseRewriteHostInResponseHeaders="false" />
    <rewrite>
      <allowedServerVariables>
        <add name="HTTP_X_ORIGINAL_HOST" xdt:Transform="InsertIfMissing" />
        <add name="HTTP_X_UNPROXIED_URL" xdt:Transform="InsertIfMissing" />
        <add name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" />
        <add name="HTTP_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" />
      </allowedServerVariables>
    </rewrite>
  </system.webServer>
</configuration>
Save the above in a new file called applicationHost.xdt and upload it to the /site folder of your Azure website.
There are a lot of different ways you can do this. In this case I'd suggest it might be easiest to use FTP (or FTPS) via WinSCP, which is what I did when writing this article.
web.config
There are two main parts of the web.config file which we're going to edit (inbound and outbound rules) and they are both under the configuration → system.webServer → rewrite node.
Inbound Rules
<rules>
  <rule name="ForceSSL" stopProcessing="true">
    <match url="(.*)" />
    <conditions>
      <add input="{HTTPS}" pattern="^OFF$" ignoreCase="true" />
    </conditions>
    <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
  </rule>
  <rule name="Proxy" stopProcessing="true">
    <match url="(.*)" />
    <action type="Rewrite" url="https://tomssl.com/{R:1}" />
    <serverVariables>
      <set name="HTTP_X_UNPROXIED_URL" value="https://tomssl.com/{R:1}" />
      <set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" />
      <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
      <set name="HTTP_ACCEPT_ENCODING" value="" />
    </serverVariables>
  </rule>
</rules>
The first rewrite rule is fairly self-explanatory.
ForceSSL - makes sure that you are coming in using SSL by permanently redirecting requests from http to https.
The second rule is slightly more complex.
Proxy - retrieves content from tomssl.com, grabs the original HTTP_ACCEPT_ENCODING and HTTP_HOST headers and stores them for later use, and then blanks out the HTTP_ACCEPT_ENCODING header. It also stores the original unproxied URL so that we can provide a link to the original page in our banner.
If you don't blank the HTTP_ACCEPT_ENCODING header on the way in, then tomssl.com may send back compressed (e.g. gzip) content and the outbound rules won't work, because it's not possible to rewrite content which has already been compressed.
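In other words, if you remove this line,
<set name="HTTP_ACCEPT_ENCODING" value="" />
and then add system.webServer → httpErrors (just to show more detailed errors) like this,
<system.webServer>
  <httpErrors errorMode="Detailed" />
  ...
you'll see an HTTP Error 500.52 (URL Rewrite Module Error) telling you that outbound rewrite rules cannot be applied when the content of the HTTP response is encoded ("gzip").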
Outbound Rules
If we only wanted to change outbound links and not general text in the body of the page, we'd use a regular expression to make sure we only rewrote the relevant parts; otherwise we'd be placing extra load on the server for no reason. In other words, we'd check we were dealing with HTML and then we'd set the filterByTags attribute to select <a> tags (and only those that didn't start with a single /, since relative links are already okay). Check the documentation for more.
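A minimal sketch of that more targeted rule might look like this. It assumes the HTTP_X_ORIGINAL_HOST server variable has been set by the Proxy inbound rule above; the rule name and the IsHTML precondition are hypothetical. With filterByTags="A" the pattern is only tested against the href attribute of <a> tags, and since it only matches absolute links to tomssl.com, relative links are left alone.

```xml
<outboundRules>
  <!-- Only inspects href values of <a> tags; default regex syntax (ECMAScript) -->
  <rule name="RewriteAnchorLinksOnly" preCondition="IsHTML">
    <match filterByTags="A" pattern="^https://tomssl\.com(.*)$" />
    <!-- {R:1} is whatever followed the host name, e.g. /about/ or nothing at all -->
    <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}{R:1}" />
  </rule>
  <preConditions>
    <!-- Hypothetical precondition: only touch HTML responses -->
    <preCondition name="IsHTML">
      <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" />
    </preCondition>
  </preConditions>
</outboundRules>
```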
<rule name="ChangeReferencesToOriginalUrl" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern="https://tomssl.com" />
  <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}" />
</rule>
Change References To Original Url - this rule changes all references to https://tomssl.com so that they refer to the host we originally used to visit the site (remember we saved it in HTTP_X_ORIGINAL_HOST in the Proxy inbound rule).
I have omitted the trailing / from the pattern here as the Home link on my original web page doesn't end with '/'. If you want your rewrite rules to work properly every time, do it this way.
Next we need to add the rules for redaction and g dropping.
Redacting
<rule name="WordRedactionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern=" the " />
  <action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
</rule>
This rule replaces occurrences of " the " with XXXXX in black on a black background, adds a tooltip, changes the mouse pointer and adds a non-breaking space to the end, like this: XXXXX (try selecting the text with your cursor).
We have a similar rule for " a " too. I'm checking for the space in each case to make sure we don't accidentally blank out partial words or, worse still, alter parts of URLs and stop them from working.
Dropping the g
The rewrite rule for changing ing to in' is a bit simpler, as shown below:
<rule name="WordSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern="ing " />
  <action type="Rewrite" value="in' " />
</rule>
The rule for changing 2015 to 2014 is very similar too.
Changing the images
You might be tempted to confine the image-substitution rule to <img> tags by setting a value in filterByTags, and in some cases that might be sufficient, but if you want to be sure that you capture all references to your images then you might be better off doing something like this:
<rule name="ImageSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern="//www.gravatar.com/avatar/b32f804a7aaf295a3517e63e563c1a83" />
  <action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
</rule>
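For comparison, the narrower version would look something like this hypothetical rule. With filterByTags="Img" only the src attribute of <img> tags is examined, so the same avatar referenced elsewhere (say in a <meta> tag or some inline CSS) would be missed, which is why the rule above sticks with filterByTags="None".

```xml
<!-- Hypothetical alternative: only rewrites the src attribute of <img> tags -->
<rule name="ImageSubstitutionImgOnly" preCondition="CheckContentType">
  <!-- Regex (not ExactMatch): optional http:/https: scheme, then any trailing query string such as a size parameter -->
  <match filterByTags="Img" pattern="^(https?:)?//www\.gravatar\.com/avatar/b32f804a7aaf295a3517e63e563c1a83.*$" />
  <action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
</rule>
```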
Adding the floating header
It's unlikely that you'd add a header quite like this, but you might need to add a cookie acceptance notice to a legacy site, or a notice of an impending event of some kind.
<rule name="AppendHeader" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern="&lt;/body&gt;" />
  <action type="Rewrite" value="&lt;div style='font-family:&quot;Open Sans&quot;,sans-serif;font-size:1.5rem;text-align:center;padding:2px;background-color:#FF0000;color:#FFFFFF;z-index:99;position:fixed;top:0px;width:100%;border-bottom:1px solid grey;'&gt;This page has been altered by a free Microsoft Azure proxy. Details &lt;a href='https://tomssl.com/'&gt;here&lt;/a&gt;. See the original page &lt;a href='{HTTP_X_UNPROXIED_URL}'&gt;here&lt;/a&gt;&lt;/div&gt;&lt;/body&gt;" />
</rule>
As you can see, I have HTML-encoded the markup for the banner (it has to be escaped because it sits inside an XML attribute) and am adding it to the end of the page, just before the closing </body> tag.
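The same trick covers the analytics idea from the start of the article. Here's a hypothetical sketch that injects a script tag just before the closing </head> tag; the script URL is a placeholder rather than a real service, and as with the banner the markup is entity-encoded because it lives inside an XML attribute.

```xml
<!-- Hypothetical rule: analytics.example.com is a placeholder URL -->
<rule name="InjectAnalyticsScript" patternSyntax="ExactMatch" preCondition="CheckContentType">
  <match filterByTags="None" pattern="&lt;/head&gt;" />
  <!-- Re-emit </head> after the script tag so the page structure is unchanged -->
  <action type="Rewrite" value="&lt;script src='https://analytics.example.com/tracker.js' async&gt;&lt;/script&gt;&lt;/head&gt;" />
</rule>
```

In a full web.config this would sit alongside the other outbound rules and share their CheckContentType precondition.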
Preconditions
For each of the outbound rules, we have specified a precondition which checks the response's content type to make sure we don't alter the wrong types of data (e.g. we don't want to apply our filters to image data).
<preCondition name="CheckContentType">
  <add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|text/plain|text/xml|application/rss\+xml)" />
</preCondition>
I am changing all text and also the RSS feed - remember, in the case of the redaction we are pretending that the data has been permanently removed (large red banner at the top of the screen notwithstanding), so I don't want to let you see the original just by consuming the RSS feed.
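Don't be tempted to set pattern="^(text/*|application/rss\+xml)", as you risk altering your CSS files: this is a regular expression, not a wildcard, so text/* matches any content type that starts with text, including text/css.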
If you don't want to change all text, remember that HTML is of type text/html, things like robots.txt are text/plain, sitemap data is text/xml and RSS data is application/rss+xml, so you could always do something like pattern="^(text/html|application/rss\+xml)" (we need to escape the + with a \) to change the HTML and the RSS feed, but not the robots.txt or the sitemap. For our example we have to change everything.
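If you did want to leave robots.txt and the sitemap alone, the narrower precondition might look like this sketch (the name CheckHtmlAndRss is hypothetical). Each outbound rule would then reference it with preCondition="CheckHtmlAndRss" instead of CheckContentType.

```xml
<preConditions>
  <!-- Hypothetical: HTML pages and the RSS feed only; robots.txt (text/plain)
       and the sitemap (text/xml) pass through untouched -->
  <preCondition name="CheckHtmlAndRss">
    <add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|application/rss\+xml)" />
  </preCondition>
</preConditions>
```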
A note about gzip compression
I believe that by unlocking the httpCompression element in applicationHost.xdt, setting a few extra variables in the web.config and copying the Accept-Encoding header on the way in, clearing it temporarily and then setting it again on the way out after the rewriting, it should be possible to get gzip compression to work on the rewritten content. However, this is only available if you are using a paid Azure tier, so it won't work on the free tier we're using here. Annoyingly, so far I have been unable to get it to work even on a paid tier. With a bit of luck I'll get it working and update this article in due course.
Putting it all together
Now all that remains to be done is to combine the elements above into a web.config file and upload it to our Azure Web App.
web.config
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <httpErrors errorMode="Detailed" />
    <rewrite>
      <rules>
        <rule name="ForceSSL" stopProcessing="true">
          <match url="(.*)" />
          <conditions>
            <add input="{HTTPS}" pattern="^OFF$" ignoreCase="true" />
          </conditions>
          <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
        </rule>
        <rule name="Proxy" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="https://tomssl.com/{R:1}" />
          <serverVariables>
            <set name="HTTP_X_UNPROXIED_URL" value="https://tomssl.com/{R:1}" />
            <set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" />
            <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
            <set name="HTTP_ACCEPT_ENCODING" value="" />
          </serverVariables>
        </rule>
      </rules>
      <outboundRules>
        <rule name="ChangeReferencesToOriginalUrl" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern="https://tomssl.com" />
          <action type="Rewrite" value="https://{HTTP_X_ORIGINAL_HOST}" />
        </rule>
        <rule name="WordRedactionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern=" the " />
          <action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
        </rule>
        <rule name="WordRedactionFilter2" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern=" a " />
          <action type="Rewrite" value=" &lt;span style='background-color:black; color:black; cursor:help' title='REDACTED'&gt;XXXXX&lt;/span&gt;&#160;" />
        </rule>
        <rule name="WordSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern="ing " />
          <action type="Rewrite" value="in' " />
        </rule>
        <rule name="WordSubstitutionFilter2" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern=" 2015" />
          <action type="Rewrite" value=" 2014" />
        </rule>
        <rule name="ImageSubstitutionFilter1" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern="//www.gravatar.com/avatar/b32f804a7aaf295a3517e63e563c1a83" />
          <action type="Rewrite" value="/content/images/2015/06/einstein_250.jpg" />
        </rule>
        <rule name="AppendHeader" patternSyntax="ExactMatch" preCondition="CheckContentType">
          <match filterByTags="None" pattern="&lt;/body&gt;" />
          <action type="Rewrite" value="&lt;div style='font-family:&quot;Open Sans&quot;,sans-serif;font-size:1.5rem;text-align:center;padding:2px;background-color:#FF0000;color:#FFFFFF;z-index:99;position:fixed;top:0px;width:100%;border-bottom:1px solid grey;'&gt;This page has been altered by a free Microsoft Azure proxy. Details &lt;a href='https://tomssl.com/'&gt;here&lt;/a&gt;. See the original page &lt;a href='{HTTP_X_UNPROXIED_URL}'&gt;here&lt;/a&gt;&lt;/div&gt;&lt;/body&gt;" />
        </rule>
        <preConditions>
          <preCondition name="CheckContentType">
            <add input="{RESPONSE_CONTENT_TYPE}" pattern="^(text/html|text/plain|text/xml|application/rss\+xml)" />
          </preCondition>
        </preConditions>
      </outboundRules>
    </rewrite>
  </system.webServer>
</configuration>
Save the above in a new file called web.config and upload it to the /site/wwwroot folder of your Azure website.
And that's it! You can now browse to your site and see the fruits of your labours, like this:
https://tomssl-proxy.azurewebsites.net/
Conclusion
IIS URL rewriting and Application Request Routing (ARR) are very powerful and can enable you to create a sophisticated reverse proxy with only a few lines of configuration code. In the past the barrier to entry was the requirement to have some kind of server running IIS. Now we can achieve the same thing using a free Azure Web App. To demonstrate this, I've created https://tomssl-proxy.azurewebsites.net/ which is a reverse proxy of this website with a few (slightly) humorous changes. It's completely free; it doesn't cost me anything whatsoever (no MSDN dev credits; literally nothing).
I think it's pretty amazing. Why not sign up for a free trial of Azure and give it a go?