Following on from the previous article, step 1 of a resilient WordPress setup is to mirror the web page somehow. A second WordPress instance with file-level synchronization of the WordPress directory and MySQL multi-master replication sounds great… not.
WordPress keeps files in its directory tree (YAPB pictures, for example, and plugins), and while rsync could handle this, it gets messy quickly. Multi-master MySQL is possible, but overkill for my purpose.
The easier and more universal way is to simply grab the web page and keep a static copy available. The copy lacks the ability to log in and edit or write articles, but that’s fine, as most visitors will only read.
Naturally this became a Docker container. It’s hosted on Docker Hub under hkubota/webmirror.
The Dockerfile is simple:
FROM debian:8
MAINTAINER Harald Kubota <email@example.com>

RUN apt-get update ; apt-get -y install lighttpd wget curl openssh-client ; apt-get clean

# httrack in usr/local/
COPY usr/ /usr/
RUN ldconfig -v

# The script to run
COPY mirror.sh /root/

# The lighttpd configuration
COPY lighttpd.conf /root/

ENTRYPOINT ["dumb-init", "/root/mirror.sh"]

# It's a web server, so expose port 80
EXPOSE 80
WORKDIR /root
It mainly uses httrack, which I compiled from source, plus lighttpd, since the mirrored pages need to be served by a web server again. wget, curl and openssh-client are there more for completeness, as I was testing with both httrack and wget and ssh’ing out.
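The lighttpd.conf that ships in the image is not shown here, so the following is only a minimal sketch of what a static-file configuration could look like. The function name write_lighttpd_conf and the document root /root/mirror are assumptions made for illustration, not taken from the image.

```shell
#!/bin/sh
# Hypothetical minimal lighttpd.conf; the real file in the image may differ.

# Write a small static-file configuration to the path given as $1,
# serving the directory given as $2 (both are placeholders).
write_lighttpd_conf() {
    conf_path="$1"
    doc_root="$2"
    cat > "$conf_path" <<EOF
server.document-root = "$doc_root"
server.port          = 80
index-file.names     = ( "index.html" )
mimetype.assign      = (
  ".html" => "text/html",
  ".css"  => "text/css",
  ".js"   => "application/javascript",
  ".png"  => "image/png",
  ".jpg"  => "image/jpeg",
  ".gif"  => "image/gif"
)
EOF
}

# Example: write the config to a temporary file and print it.
tmp_conf="$(mktemp)"
write_lighttpd_conf "$tmp_conf" /root/mirror
cat "$tmp_conf"
```

A file like this can be syntax-checked with "lighttpd -t -f FILE" and run in the foreground with "lighttpd -D -f FILE".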
I tested it on several other web sites (www.heise.de, www.theregister.co.uk and some others) and it works quite well. Note that the default is 2 recursive levels, which makes every link on the first page clickable. Also, copying stops after 5 minutes, as I sometimes had endless loops. If your network connection is very fast or very slow, you might have to adjust this time limit.
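To make the depth and time limits concrete, here is a rough sketch of how such settings could map onto httrack options. It is not the actual mirror.sh from the image. The variable names follow the environment variables used when starting the container, while the output directory /root/mirror, the placeholder URL, and the reading of refresh as hours are assumptions. The httrack options used are -O (output path), -rN (mirror depth) and -EN (maximum mirror time in seconds).

```shell
#!/bin/sh
# Hypothetical sketch of a mirror loop; NOT the actual mirror.sh from the image.

# Settings, with the defaults described in the article.
web_source="${web_source:-http://www.example.com/}"  # site to mirror (placeholder)
recursive="${recursive:-2}"                # link depth to follow
max_time="${max_time:-300}"                # stop a mirror run after this many seconds
refresh="${refresh:-24}"                   # time between runs (assumed to be hours)
other_flags="${other_flags:-}"             # extra httrack flags, e.g. "-v"
mirror_dir="${mirror_dir:-/root/mirror}"   # hypothetical output directory

# Print the httrack command line that the settings translate to.
# other_flags is left unquoted on purpose so it splits into separate flags.
httrack_cmd() {
    echo httrack "$web_source" -O "$mirror_dir" "-r$recursive" "-E$max_time" $other_flags
}

# Run httrack, sleep until the next refresh, and repeat forever.
mirror_loop() {
    while true; do
        httrack "$web_source" -O "$mirror_dir" "-r$recursive" "-E$max_time" $other_flags
        sleep "$((refresh * 3600))"
    done
}

# Only start the loop when explicitly asked, so the sketch can be sourced safely.
if [ "${1:-}" = "run" ]; then
    mirror_loop
fi
```

Calling httrack_cmd prints the command without running it, which is handy for checking what a given set of environment variables would do before letting the loop run.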
To run the hkubota/webmirror Docker image, do:
docker run -e web_source="http://www.heise.de/" -e recursive=2 -e refresh=24 -e max_time=300 -e other_flags="-v" -p 80:80 -d hkubota/webmirror
If you want to watch what happens (mainly to see the httrack output), replace “-d” with “-it”, or watch the logs via “docker logs CONTAINER”.
Next is the actual load balancer…