Large Web App Development

Tips and tricks for larger web apps.

Here is my (never up to date) list of web application tips.

Use a bundler

Using individual script tags quickly breaks down. Instead, each source file should declare what it needs. If our module "app.js" needs jQuery and the modules "foo" and "bar", we do not want to list the script tags in a particular "magic" order.

<!-- this does not scale -->
<script src="jquery.js"></script>
<script src="foo.js"></script>
<script src="bar.js"></script>
<script src="app.js"></script>

Instead, we want each source file to list its own dependencies. Then the page only needs a single entry point, in this case "app.js".

// app.js
const $ = require('jquery')
const foo = require('foo')
const bar = require('bar')

To create the bundle that the web page includes, use a "bundler". Common ones are Browserify and WebPack. I have used both, and they are great. A good intro to WebPack is Module Bundling and Webpack in Simple Terms, which I highly recommend. Both bundlers can even split library code and application code into separate bundles. Because library code rarely changes, the library bundle stays the same between releases, and the user's browser can keep it cached. You can find how to do this in my Using WebPack blog post.

Use a reactive stream library

Emitting events is the most flexible way for different parts of the application to communicate. One module can simply subscribe to another module's events and do something whenever an event happens. There are two ways to take this to the next level.

  1. Use a central pub-sub hub with hierarchical messages.

Instead of each module obtaining a reference to every module it wants to subscribe to, have a single central object whose only purpose is to dispatch events. For example, using pubsub-js, any module can subscribe to a topic:

const PubSub = require('pubsub-js')
PubSub.subscribe('topic', (msg, data) => {
  // msg is the topic name, data is whatever the publisher sent
  console.log(msg, data)
})

Since a module no longer needs a reference to another module before subscribing, this greatly simplifies startup and decouples modules from each other.
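The decoupling comes from both sides talking only to the hub. A minimal sketch, assuming a hand-rolled hub on top of Node's built-in EventEmitter instead of pubsub-js; the "auth" and "ui" modules and the login function are hypothetical.

```javascript
const { EventEmitter } = require('events')

// Hypothetical central hub. In a real app it would live in its own file
// that every module requires; here everything is inlined so it runs as one file.
const hub = new EventEmitter()

// "auth" module: publishes to the hub, knows nothing about its listeners
function login (name) {
  hub.emit('user.login', { name })
}

// "ui" module: subscribes through the hub, knows nothing about "auth"
const greetings = []
hub.on('user.login', data => greetings.push(`Hello, ${data.name}`))

login('joe')
console.log(greetings) // [ 'Hello, joe' ]
```

EventEmitter delivers synchronously, which keeps the sketch simple; neither module ever holds a reference to the other.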

Look for a pub-sub library that supports hierarchical topics. For example, you should be able to subscribe to all messages in the "user" channel:

const PubSub = require('pubsub-js')
const callback = (msg, data) => console.log(msg, data)
PubSub.subscribe('user', callback)
// somewhere else
PubSub.publish('user.login', {name: 'joe'})
PubSub.publish('user.logout', {name: 'joe'})
// callback function has been called twice (publish delivers asynchronously)

Or we could subscribe only to "user.login" messages:

const PubSub = require('pubsub-js')
const callback = (msg, data) => console.log(msg, data)
PubSub.subscribe('user.login', callback)
// somewhere else
PubSub.publish('user.login', {name: 'joe'})
PubSub.publish('user.logout', {name: 'joe'})
// callback function has been called once (publish delivers asynchronously)

The ability to "hear" all messages, or only a subset of them, leads to much cleaner callbacks, since you no longer have to filter the messages yourself.

See pubsub-js addressing documentation for more examples.
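To see what hierarchical matching involves, here is a hypothetical hand-rolled hub. It assumes dot-separated topics where a subscription matches the topic itself and any topic nested under it. It delivers synchronously and is not the pubsub-js implementation, so treat it as an illustration of the matching rule only.

```javascript
// Hypothetical hierarchical hub: a subscription to "user" also receives
// "user.login" and "user.logout", but not an unrelated topic like "username"
function createHub () {
  const subscribers = []

  // True when the topic equals the subscription or is nested below it
  const matches = (subscription, topic) =>
    topic === subscription || topic.startsWith(subscription + '.')

  return {
    subscribe (subscription, callback) {
      subscribers.push({ subscription, callback })
    },
    // Synchronous delivery keeps the sketch simple
    publish (topic, data) {
      subscribers
        .filter(s => matches(s.subscription, topic))
        .forEach(s => s.callback(topic, data))
    }
  }
}

const hub = createHub()
const allUser = []
const loginOnly = []
hub.subscribe('user', topic => allUser.push(topic))
hub.subscribe('user.login', topic => loginOnly.push(topic))

hub.publish('user.login', { name: 'joe' })
hub.publish('user.logout', { name: 'joe' })

console.log(allUser) // [ 'user.login', 'user.logout' ]
console.log(loginOnly) // [ 'user.login' ]
```

Matching on the "." boundary, rather than on a plain string prefix, is what keeps "user" from accidentally matching "username".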

Tip: you can use PubSub to drive the application from end-to-end tests; read Control The Application Through PubSub From Cypress.

  2. Build more complex streams from "primitive" events

Take the events from the previous example, and assume the following situation. A "user.login" event happens 100ms after start, and then again 1000ms after start. Maybe something went wrong, and the user could not log in the first time. Then, a second later, the user logs out. In code it looks like this:

const PubSub = require('pubsub-js')
setTimeout(() => {
  PubSub.publish('user.login')
}, 100)
setTimeout(() => {
  PubSub.publish('user.login')
}, 1000)
setTimeout(() => {
  PubSub.publish('user.logout')
}, 2000)
PubSub.subscribe('user', message => console.log(message))
/*
user.login
user.login
user.logout
*/

Our business logic requires us to keep track of user sessions, where a session is a "login" followed by a "logout" event. How do we know when a session "event" happens? We could keep track of the state in our own code, but that quickly generates lots of extra code. If only there were a "standard" way to work with event streams. Luckily, just as arrays have iterator methods like "map" and "filter", reactive stream libraries provide similar operators for events. Common libraries are RxJS, Bacon.js, Kefir, most.js and many others. First, we need to connect to pub-sub and make individual streams from events. For this example I will use the "kefir" library. We will create separate streams for "login" and "logout" events.

const PubSub = require('pubsub-js')
const kefir = require('kefir')
setTimeout(() => {
  PubSub.publish('user.login')
}, 100)
setTimeout(() => {
  PubSub.publish('user.login')
}, 1000)
setTimeout(() => {
  PubSub.publish('user.logout')
}, 2000)
// each stream subscribes to one topic and unsubscribes when it is deactivated
const login$ = kefir.stream(emitter => {
  const token = PubSub.subscribe('user.login', msg => emitter.emit(msg))
  return () => PubSub.unsubscribe(token)
})
const logout$ = kefir.stream(emitter => {
  const token = PubSub.subscribe('user.logout', msg => emitter.emit(msg))
  return () => PubSub.unsubscribe(token)
})
login$.log()
logout$.log()
/*
[stream] <value> user.login
[stream] <value> user.login
[stream] <value> user.logout
*/

By convention, I end variables that reference a stream with the $ character. It is like the plural "s" at the end of an English noun: a "login" is an event, and multiple "logins" flow through the "login$" stream.

To create a session event we need to combine the two streams: we need the latest "login" event followed by a "logout" event. Luckily, stream libraries have operators for this common use case, just like JavaScript arrays have methods for common cases such as "map", "filter" and "some".

For example, we could buffer login events until a logout event happens. The buffered login events are then passed along as an array, which we can map to a string.

login$.bufferBy(logout$)
.map(() => 'session ended')
.log('session')

Stream operators are usually illustrated with marble diagrams. In the above case the events look like this (I call the "login" event "L" and the "logout" event "O"). The first column shows the stream variables, and time flows from left to right along the arrows.

login$:    --L----L--------------------->
logout$:   ------------O---------------->
.bufferBy: ------------[L, L]----------->
.map:      ------------"session ended"-->

Events in the first stream are buffered until an event in the second stream occurs. Then the array becomes a new event, which the .map operator turns into a "session ended" event, which is then logged by the .log() operator. Notice that each operator returns a stream, even .log(), so the calls can be chained. We can confirm this by attaching another .log() operator with a different string label.

login$.bufferBy(logout$)
.map(() => 'session ended')
.log('session')
.log('session2')
/*
session <value> session ended
session2 <value> session ended
*/
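To make the marble diagram concrete, the sketch below replays a recorded list of events through a hypothetical array-based bufferBy function. Real stream operators react to live events over time; this version only models the buffering rule from the diagram, under the assumption that the notifier flushes the buffer even when it is empty.

```javascript
// Hypothetical, array-based imitation of a "buffer by" operator.
// It replays an already recorded list of events instead of live streams.
function bufferBy (events, isSource, isNotifier) {
  const output = []
  let buffer = []
  for (const event of events) {
    if (isSource(event)) {
      buffer.push(event)
    } else if (isNotifier(event)) {
      // The notifier flushes whatever has been buffered so far
      output.push(buffer)
      buffer = []
    }
  }
  return output
}

// Same timeline as the marble diagram: L, L, then O
const timeline = ['user.login', 'user.login', 'user.logout']
const buffers = bufferBy(
  timeline,
  e => e === 'user.login',
  e => e === 'user.logout'
)
console.log(buffers) // [ [ 'user.login', 'user.login' ] ]
console.log(buffers.map(() => 'session ended')) // [ 'session ended' ]
```

In this model a "logout" with nothing buffered produces an empty array. Whether a particular library behaves the same way is worth checking in its documentation, since it affects how sessions are counted.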

Using an off-the-shelf reactive library gives you the power to work with streams of asynchronous events. The hard part is finding the right operator! RxJS has an excellent page listing all the operators grouped by their main purpose, see operators.md. It also gives an example of making your own operator, which is a viable solution if none of the existing operators suit your needs.

Caution: you need to think about (and test) the order of events in the above scenarios. What happens if a "logout" event arrives before a "login" event for some reason? What happens if there are two "logout" events?
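One way to answer these questions is to write the intended rules down as a small, testable reference model. The rules in the sketch below are assumptions chosen for illustration (a logout without a preceding login is ignored, and repeated logins before a logout count as one session); your application may need different ones. The countSessions name is hypothetical.

```javascript
// Hypothetical reference model for the session rules, written as a pure
// function so the edge cases are easy to test:
//   - a logout with no preceding login is ignored
//   - several logins followed by one logout form a single session
//   - a second logout in a row is ignored
function countSessions (events) {
  let loggedIn = false
  let sessions = 0
  for (const event of events) {
    if (event === 'user.login') {
      loggedIn = true
    } else if (event === 'user.logout' && loggedIn) {
      loggedIn = false
      sessions += 1
    }
  }
  return sessions
}

console.log(countSessions(['user.login', 'user.login', 'user.logout'])) // 1
console.log(countSessions(['user.logout', 'user.login'])) // 0
console.log(countSessions(['user.login', 'user.logout', 'user.logout'])) // 1
```

The same event lists can then be replayed through the real stream pipeline, and its output compared against this model.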

Use a debug logging library

Something always goes wrong. That is why it pays to use a good logging library from the start. I prefer debug. It has two main features:

  • cross platform (Node and browser)

  • has hierarchical logging

    const logLogin = require('debug')('user:login')
    logLogin('user logged in')
    const logLogout = require('debug')('user:logout')
    logLogout('user logged out')

    To see only "user:login" messages:

    $ DEBUG=user:login npm start
    user logged in

    To see all "user:" messages:

    $ DEBUG=user:* npm start
    user logged in
    user logged out

There are lots of options for debug, including formatting and colors. You can safely leave the debug statements in your code: if you do not set the DEBUG environment variable (in Node) or localStorage.debug (in the browser), the log statements do nothing.
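To show the idea behind namespace patterns and no-op loggers, here is a hypothetical, much simplified imitation of DEBUG-style matching, where "*" matches any run of characters. The real debug module supports more (for example several comma-separated patterns) and has its own implementation; the isEnabled and createDebug names are made up for this sketch.

```javascript
// Hypothetical helper: does a DEBUG-style pattern such as "user:*"
// enable the given namespace? "*" matches any run of characters.
function isEnabled (pattern, namespace) {
  const source = pattern
    .split('*')
    // escape regular expression characters in the literal parts
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${source}$`).test(namespace)
}

// Hypothetical logger factory: a disabled logger is a no-op function,
// so calls to it can safely stay in the code
function createDebug (pattern, output) {
  return namespace => {
    if (!isEnabled(pattern, namespace)) {
      return () => {}
    }
    return message => output.push(`${namespace} ${message}`)
  }
}

const lines = []
const debug = createDebug('user:*', lines)
debug('user:login')('user logged in')
debug('user:logout')('user logged out')
debug('db:query')('select * from users') // not enabled, ignored
console.log(lines.join('\n'))
// user:login user logged in
// user:logout user logged out
```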

Local development using folders

If you split your large project into smaller NPM modules (as I strongly believe you should), development might slow down due to the release / upgrade cycle. While you can automate everything, you might still want to use local dependencies during development.

Luckily, NPM allows you to:

  • install a local folder with a single command. For example, npm install ../foo installs whatever is in the folder ../foo; it only assumes that ../foo/package.json describes a valid NPM module.
  • record a local folder dependency in package.json using the file:<path> syntax, see package.json#local-paths documentation.
  • use a local folder as the source for a module and bring it with you. You can even commit the local folder into your source repo to skip the install. A good way to do this is with the shrinkpack tool.

I am a little wary of the above approaches and would prefer using private NPM modules instead as a general solution.

Misc advice

  • Install a crash reporting service early on, see blog posts
  • Use Cypress.io to test your website, see blog posts
  • Use immutable deploys with end-to-end testing before switching the domain name alias. An example tool based on Zeit.co Now is now-pipeline.
