JavaScript, nginx and prerender.io: helping search crawlers render single page applications

Some time ago I wrote a note about single page applications vs server-side rendering, where I mentioned nginx and prerender.io. I used prerender.io to render pages for search crawlers, because crawlers don't correctly render pages that are built in JavaScript, such as single page applications. I will just post my configuration here. I'm not using it anymore, but maybe it will help someone.

Nginx server config.

server {
    listen  80;
    server_name lastlog.it www.lastlog.it;
    client_max_body_size 20M;
    access_log /var/log/nginx/lastlog_access.log;
    error_log /var/log/nginx/lastlog_error.log;
    error_page 404 = @404;

    location @404 {
        rewrite  .*  / permanent;
    }

    root /projects/lastlog.it/project/project/assets/dist;

    location / {
        include prerender.conf;
        try_files $uri $uri/ /index.html;
    }

    location ~ "^/([a-zA-Z0-9]+)$" {
        # Social preview bots get a small HTML page with Open Graph tags from the API.
        if ($http_user_agent ~* "facebookexternalhit|LinkedInBot|(Google \(\+https\:\/\/developers\.google\.com\/\+\/web\/snippet\/\))|Twitterbot|Pinterest") {
            rewrite (.+) /api/content/item$1?social=1;
        }
        try_files $uri $uri/ /index.html;
    }

    location /api/ {
        uwsgi_pass  unix:///tmp/lastlog.sock;
        include     uwsgi_params;
    }
}

As you can see, all static files are in /projects/lastlog.it/project/project/assets/dist.

I've created a special parameter for my API, social=1. Prerender.io is needed for web crawlers, but we don't need it for sites like Facebook, LinkedIn, etc. When someone "likes" our page or shares it on one of those sites, a bot (Facebook's, for example) fetches the page to read data such as the title or the Open Graph tags. It doesn't need the whole page. The configuration above checks whether a request comes from Facebook, Twitter, LinkedIn, Google+ or Pinterest and, if so, calls the API with the parameter social=1. Without social=1 the API returns only JSON data, but with social=1 it returns simple HTML:

<!DOCTYPE html>
<html>
<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb#">
        <title>{{ title }}</title>
        <meta property="og:title" content="{{ title }}" />
        <meta property="og:description" content="lastlog.it - Watch last logs from IT world!" />
        <meta property="og:url" content="http://lastlog.it/{{ id }}" />
        <meta property="og:image" content="{{ image }}" />
</head>
<body>
        {{ title }}
</body>
</html>
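
The real API ran behind uWSGI and its code is not shown in this note, so the following is only a rough JavaScript sketch of the idea: one endpoint that returns JSON by default and the small Open Graph page when social=1 is present. The names escapeHtml, renderSocialHtml and buildItemResponse are hypothetical, as is the shape of the item object (id, title, image).

```javascript
// Hypothetical helper: escape a value for use in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer: fills the same template as above with item data.
function renderSocialHtml(item) {
  var title = escapeHtml(item.title);
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb#">',
    '        <title>' + title + '</title>',
    '        <meta property="og:title" content="' + title + '" />',
    '        <meta property="og:description" content="lastlog.it - Watch last logs from IT world!" />',
    '        <meta property="og:url" content="http://lastlog.it/' + escapeHtml(item.id) + '" />',
    '        <meta property="og:image" content="' + escapeHtml(item.image) + '" />',
    '</head>',
    '<body>',
    '        ' + title,
    '</body>',
    '</html>'
  ].join('\n');
}

// Hypothetical response chooser: HTML for social bots, JSON for everyone else.
// `query` is assumed to be an object of already parsed query string parameters.
function buildItemResponse(item, query) {
  if (query && query.social === '1') {
    return { contentType: 'text/html', body: renderSocialHtml(item) };
  }
  return { contentType: 'application/json', body: JSON.stringify(item) };
}
```

Escaping the values matters here because titles come from content and end up inside attribute values that third-party bots will parse.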

That's all. Nothing more is needed. Let's get back to prerender.io and nginx.

/etc/nginx/prerender.conf (it is included in the nginx server config)

set $needPrerender "";

if ($request_uri ~ '_escaped_fragment_') {
  set $needPrerender "Y";
}

if ($http_user_agent ~* (googlebot|google.com|bingbot|bing.com|yandexbot|yandex.com|yahooseeker|yahoo.com|slurp|feedfetcher|blekkobot|crawler) ) {
  set $needPrerender "Y";
}

#if ($http_accept ~* 'html') {
#  set $needPrerender "${needPrerender}ES";
#}

if ($needPrerender = "Y") {
  rewrite .* /$scheme://$http_host$request_uri? break;
  proxy_pass http://localhost:3000;
}
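
To make the logic of prerender.conf easier to follow, here is a small JavaScript sketch of the same decision. It is not part of the setup, only an illustration. The function names needsPrerender and prerenderUri are hypothetical, and the crawler pattern is copied from the config with the dots escaped.

```javascript
// Crawler user agents, taken from prerender.conf (case-insensitive match).
var CRAWLER_PATTERN = /googlebot|google\.com|bingbot|bing\.com|yandexbot|yandex\.com|yahooseeker|yahoo\.com|slurp|feedfetcher|blekkobot|crawler/i;

// Mirrors the two `if` blocks that set $needPrerender to "Y".
function needsPrerender(requestUri, userAgent) {
  if (requestUri.indexOf('_escaped_fragment_') !== -1) {
    return true;
  }
  return CRAWLER_PATTERN.test(userAgent || '');
}

// Mirrors the rewrite: the full original URL becomes the path that is
// sent to the prerender server listening on localhost:3000.
function prerenderUri(scheme, host, requestUri) {
  return '/' + scheme + '://' + host + requestUri;
}

// Example: a Googlebot request for /abc?x=1 would be proxied as
// /http://lastlog.it/abc?x=1
var ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
if (needsPrerender('/abc?x=1', ua)) {
  console.log(prerenderUri('http', 'lastlog.it', '/abc?x=1'));
}
```

The trailing `?` in the nginx rewrite stops nginx from appending the query string a second time, since $request_uri already contains it.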

As you can see, prerender.conf proxies the matching requests to http://localhost:3000, where a local prerender.io server is listening. I used supervisord to keep it running. Here is the config, /etc/supervisor/conf.d/prerender.conf:

[program:prerender-lastlog]
environment=PYTHONPATH="/projects/lastlog.it-prerender/"
command = /projects/lastlog.it-prerender/bin/node /projects/lastlog.it-prerender/prerender/lastlog.js
directory = /projects/lastlog.it-prerender/prerender/
autostart = true
autorestart = true
stopasgroup = true
process_name = %(program_name)s_%(process_num)02d
numprocs = 1
user = eshlox

There is not much to say about this one. It is a simple configuration file for supervisord.

Below is my config for prerender.io, lastlog.js (it is the script started by the supervisord configuration):

#!/usr/bin/env node
var prerender = require('./lib');

var server = prerender({
    workers: 2,
    iterations: 4,
    phantomBasePort: process.env.PHANTOM_CLUSTER_BASE_PORT,
    messageTimeout: process.env.PHANTOM_CLUSTER_MESSAGE_TIMEOUT
});

server.use(prerender.blacklist());
server.use(prerender.removeScriptTags());
server.use(prerender.httpHeaders());
server.use(prerender.inMemoryHtmlCache());

server.start();

That's all. Just adapt the configuration files to your project, reload nginx and start supervisord. Read the prerender.io documentation, because it has more options and... it really likes RAM ;-)