Your business plan needs hosting costs: start by modeling your website visits per hour
In the last couple of posts I explored how virtual worlds generate revenue. To understand the moderation and hosting costs we first need to build a model that predicts the number of visitors per hour.
While I’ve focused on virtual worlds, this model can be applied to any web property; you need enough server hardware to cope with your peak visitors. Using the spreadsheet developed in this post you’ll be able to estimate your peak visits per hour, page views per second, and consequently your hosting costs for your business plan.
By the end of the post we’ll have a model that looks something like this:

Normal Distribution
We’ll keep the model simple and assume that the number of visitors to the world is normally distributed over a 24-hour period. For a single geographic region the result is a chart that looks something like the one below.

Notice how the traffic peaks at 12 o’clock. Of course, it’s unlikely your world will be busiest in the middle of the day, but the assumption helps simplify the math. Don’t worry, it doesn’t affect the outcome; think of the chart as a snapshot of a 24-hour period, not a real day.
This chart tells us how likely a player is to visit at any time of day, whether that’s 4AM, 1PM, or any other hour. For example, to find the probability that a player visits between 4AM and 8AM we calculate the area under the graph, highlighted in blue in the chart below.

We’re going to use Excel’s NORMDIST function to find the area under the graph.
If you’re interested in the math behind the normal distribution, check out this Wikipedia article. But don’t worry about understanding it all; we just need to know how to use it!
NORMDIST takes four arguments: the time we’re interested in, the mean, the standard deviation, and a cumulative flag. The mean and the standard deviation are the two that shape the curve; setting cumulative to 1 (TRUE) makes NORMDIST return the area under the curve to the left of the given time.
To keep things simple we want a bell curve that is symmetrical within our 24-hour window, so we set the mean to 12 o’clock midday. Another way to think about the mean is as an average; for example, we might say that, on average, our players visit at 12 o’clock midday.
The standard deviation of the curve is set to 4 hours; think of it as a measure of the width of the bell, and therefore the spread of visitors throughout the day. A small standard deviation means that our players are even more likely to visit around 12 midday, while a large standard deviation spreads the visits out over the day.
So, to find the probability of a player visiting between 4AM and 8AM we use the following equation:
NORMDIST(8AM, MEAN, STANDARD DEVIATION, CUMULATIVE) - NORMDIST(4AM, MEAN, STANDARD DEVIATION, CUMULATIVE)
or with actual values:
NORMDIST(8, 12, 4, 1) - NORMDIST(4, 12, 4, 1) = 0.14
We find that the probability is 0.14, or roughly 1 in 7 players visit between 4AM and 8AM.
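If you want to sanity-check these numbers outside Excel, here is a minimal Python sketch. It assumes that NORMDIST with cumulative set to 1 is the standard normal cumulative distribution function, which Python can compute with math.erf; the helper names normdist_cdf and prob_between are my own, not part of the spreadsheet.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_between(start: float, end: float, mean: float = 12.0, sd: float = 4.0) -> float:
    """Area under the bell curve between two times of day (hours on a 24-hour clock)."""
    return normdist_cdf(end, mean, sd) - normdist_cdf(start, mean, sd)


p = prob_between(4, 8)  # 4AM to 8AM, mean at midday, 4-hour standard deviation
print(round(p, 2))  # 0.14
```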
Standard Deviation
Let’s look at the 8-hour stretch from 8AM through to 4PM, highlighted in the chart below.

We can calculate the probability using the same process:
NORMDIST(16, 12, 4, 1) - NORMDIST(8, 12, 4, 1) = 0.682
We find that the probability of a player visiting in this window is roughly 0.68; put another way, 68% of our players will visit between 8AM and 4PM.
This 8-hour stretch is known as ‘within one standard deviation of the mean’: the mean is 12 midday and the standard deviation is 4 hours, so 8AM is 4 hours before the mean and 4PM is 4 hours after it.
No matter what the standard deviation, roughly 68% of all occurrences in a normal distribution fall within one standard deviation of the mean.
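A quick way to convince yourself of this is to compute the one-standard-deviation probability for a few different widths. This sketch uses the same erf-based stand-in for NORMDIST (a hypothetical helper, not an Excel API); the standard deviations in the loop are arbitrary examples.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def within_one_sd(sd: float, mean: float = 12.0) -> float:
    """Probability of landing between (mean - sd) and (mean + sd)."""
    return normdist_cdf(mean + sd, mean, sd) - normdist_cdf(mean - sd, mean, sd)


# Illustrative standard deviations, in hours.
for sd in (1.0, 2.5, 4.0, 6.0):
    print(f"sd = {sd} hours: {within_one_sd(sd):.3f}")
```

Every line prints 0.683, whatever the width of the bell.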
We can use this to help us estimate a reasonable standard deviation. For example, if your world is targeting tweens then it’s reasonable to assume most of your players can only start playing after finishing school at 4PM, and will be in bed by 9PM.
Following these assumptions we would set our standard deviation to 2.5 hours, so that 68% of our visitors fall within the 5-hour window from 4PM to 9PM (2.5 hours either side of 6:30PM).
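The same rule of thumb can be run backwards: treat your busy window as the mean plus or minus one standard deviation. The sketch below assumes exactly that, and mean_and_sd_from_window is a hypothetical helper. Note that it centres the curve on 6:30PM, whereas the rest of this post keeps the simplified midday mean, so only the standard deviation carries over to the spreadsheet.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mean_and_sd_from_window(start_hour: float, end_hour: float) -> tuple[float, float]:
    """Assume the busy window is mean +/- one standard deviation (about 68% of visits)."""
    mean = (start_hour + end_hour) / 2
    sd = (end_hour - start_hour) / 2
    return mean, sd


mean, sd = mean_and_sd_from_window(16, 21)  # 4PM to 9PM
print(mean, sd)  # 18.5 2.5

# Check: the share of visits inside the window should be about 68%.
share = normdist_cdf(21, mean, sd) - normdist_cdf(16, mean, sd)
print(round(share, 3))  # 0.683
```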
Adjusting For 24 Hours
Up to this point I’ve been telling you that the probabilities described by the NORMDIST function are the probabilities that a player will visit at any given time. This isn’t quite correct.
If you look closely at the chart you’ll see that the line never touches zero. In fact the line will never ever touch zero; a normal distribution extends infinitely in both directions. It just goes on and on and on.
This is a problem for us. While some days might feel like they drag on and on and on, every day is in fact only 24 hours long. No longer, no shorter. Yet our normal distribution extends beyond both ends of the 24-hour day!
So when we said that 68% of our visitors come between 8AM and 4PM we were not adjusting for the fact that a day is only 24 hours.
We make the adjustment by finding the area under the graph for the whole day, from 0 to 24 hours, and then finding what percentage of the total area that 0.68 represents.
To find the area under the graph for the whole day we first find the probability that a player will visit after midday (from 12 to 24 hours), and then double the value to include the probability they’ll visit in the morning (that’s why symmetry around 12 midday makes it easier!).
The formula looks something like this:
(NORMDIST(24, 12, 4, 1) - NORMDIST(12, 12, 4, 1)) * 2 = 0.997
Then we can find what percentage of the whole-day area our 8AM-to-4PM probability represents. Using four decimal places (0.6827 and 0.9973) to keep rounding errors out of the result, the equation is:
0.6827 / 0.9973 = 0.685 = 68.5%
So we’ve calculated that 68.5% of our visitors will arrive between 8AM and 4PM.
In this case the difference is minor, well under half a percentage point. However, the adjustment does become more important when the standard deviation is larger.
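Here is the 24-hour adjustment as a short Python sketch, using the same erf-based stand-in for NORMDIST. Rather than doubling the afternoon half, it measures the area from hour 0 to hour 24 directly, which gives the same answer when the mean is midday and still works if you move the mean. day_area and adjusted_prob are hypothetical helper names.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def day_area(mean: float = 12.0, sd: float = 4.0) -> float:
    """Area under the curve between hour 0 and hour 24."""
    return normdist_cdf(24, mean, sd) - normdist_cdf(0, mean, sd)


def adjusted_prob(start: float, end: float, mean: float = 12.0, sd: float = 4.0) -> float:
    """Share of the day's visitors arriving between start and end, rescaled to a 24-hour day."""
    raw = normdist_cdf(end, mean, sd) - normdist_cdf(start, mean, sd)
    return raw / day_area(mean, sd)


print(round(day_area(), 3))            # 0.997
print(round(adjusted_prob(8, 16), 3))  # 0.685
print(round(day_area(sd=8.0), 3))      # 0.866
```

With an 8-hour standard deviation the 0 to 24 hour window only covers about 87% of the curve, which is why the adjustment matters more as the curve gets wider.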
Visits Per Hour
Now that we have a way to find the probability that a player will visit, calculating visits per hour is easy.
All we need to do is use the process above to find the probability for any given hour then multiply by our total daily visitors.
For example, if we have 1000 daily visitors, we can find out how many visit between 10AM and 11AM using the formula:
Probability × 1000 = Players
Probability = (NORMDIST(11, 12, 4, 1) - NORMDIST(10, 12, 4, 1)) / 0.997 = 0.093
0.093 × 1000 = 93 Players
Where 0.997 is the total area under the normal distribution for a single 24 hour period.
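Repeating that calculation for all 24 one-hour slots gives the whole visits-per-hour table. This is a sketch under the post’s assumptions (mean at midday, standard deviation of 4 hours, 1000 daily visitors); visits_per_hour is a hypothetical helper, and slot h covers the hour starting at h o’clock.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def visits_per_hour(daily_visitors: float, mean: float = 12.0, sd: float = 4.0) -> list[float]:
    """Expected visits in each one-hour slot [h, h+1) for h = 0..23, adjusted to a 24-hour day."""
    day_area = normdist_cdf(24, mean, sd) - normdist_cdf(0, mean, sd)
    return [
        daily_visitors * (normdist_cdf(h + 1, mean, sd) - normdist_cdf(h, mean, sd)) / day_area
        for h in range(24)
    ]


hourly = visits_per_hour(1000)
print(round(hourly[10]))   # 93 (10AM to 11AM)
print(round(sum(hourly)))  # 1000
print(round(max(hourly)))  # 99 (the busiest slots, either side of midday)
```

Because the adjustment divides by the area of the whole day, the 24 slots add back up to the daily total.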
Multiple Time Zones
Imagine your UK players leave school and start logging into your world at around 4PM GMT; 8 hours later their cousins on the American West Coast are doing the same. We can’t model these two geographies with a single distribution curve; instead we need to model each time zone separately.
If you haven’t already, now’s a good time to check out the accompanying Google spreadsheet.
I’ve used the formulas we created to build a table showing visits per hour customized by time zone, standard deviation, and daily visitors per geographic region. The result is our international visits per hour chart:

The red line shows the combined visits per hour across the three major geographies of UK/Europe, East Coast, and West Coast. Notice how three different normal distributions lead to our ‘humpy’ red line. Although the numbers I’ve entered are not real, we do see broadly similar traffic charts for some of the worlds and sites we run. Your results will be different depending on your audience, demographics, and geographies.
To use this table for your business model all you need to do is enter your major visitor geographies, how many daily visitors you expect per geography, and your assumed standard deviation.
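The table can be sketched in code by computing each region’s curve in local time and shifting it onto GMT. The region names, offsets (standard time, ignoring daylight saving), visitor counts, and standard deviations below are made-up placeholders, and the wrap-around rule (local hour h lands in GMT hour (h - offset) mod 24) is my assumption about how to handle visits that cross midnight GMT; the spreadsheet may treat this differently.

```python
import math


def normdist_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, intended to match Excel's NORMDIST(x, mean, sd, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def visits_per_hour(daily_visitors: float, mean: float = 12.0, sd: float = 4.0) -> list[float]:
    """Expected visits in each local one-hour slot [h, h+1), adjusted to a 24-hour day."""
    day_area = normdist_cdf(24, mean, sd) - normdist_cdf(0, mean, sd)
    return [
        daily_visitors * (normdist_cdf(h + 1, mean, sd) - normdist_cdf(h, mean, sd)) / day_area
        for h in range(24)
    ]


def combined_visits_per_hour(regions: list[tuple[str, int, float, float]]) -> list[float]:
    """Sum each region's local-time curve into one 24-slot curve indexed by GMT hour.

    regions: (name, utc_offset_hours, daily_visitors, sd) tuples.
    """
    combined = [0.0] * 24
    for _name, offset, daily, sd in regions:
        for local_hour, visits in enumerate(visits_per_hour(daily, sd=sd)):
            # e.g. noon on the East Coast (offset -5) lands in the 17:00 GMT slot
            combined[(local_hour - offset) % 24] += visits
    return combined


# Placeholder inputs; replace with your own audience figures.
regions = [
    ("UK/Europe", 0, 3000, 4.0),
    ("East Coast", -5, 2000, 4.0),
    ("West Coast", -8, 1000, 4.0),
]
combined = combined_visits_per_hour(regions)
peak_hour = max(range(24), key=lambda h: combined[h])
print(f"Peak: {combined[peak_hour]:.0f} visits/hour starting {peak_hour}:00 GMT")
```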
Concurrent Players
To find the number of concurrent players, we divide the visits per hour by the number of sessions that fit back to back into an hour, which is 60 minutes divided by the average session length.
For example, let’s assume that our average visitor spends 8 minutes in our world. With 100 visits per hour we can calculate the number of concurrent players:
100 visits per hour / (60 minutes in an hour / 8-minute session length) = 13 players
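The same calculation as a minimal Python sketch. It assumes visits are spread evenly across the hour; under that assumption this is Little’s law (average players online = arrival rate × average time spent). concurrent_players is a hypothetical helper name.

```python
def concurrent_players(visits_per_hour: float, session_minutes: float) -> float:
    """Average number of players online at once, assuming visits are spread evenly over the hour."""
    sessions_per_hour = 60.0 / session_minutes  # back-to-back sessions that fit in an hour
    return visits_per_hour / sessions_per_hour


print(round(concurrent_players(100, 8)))  # 13
```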
Next Steps
I hope this post will be useful for anyone who is building a business model for a web-based property and wants to estimate their hosting costs.
Your hosting costs are usually determined by your peak traffic. Using this spreadsheet you can find your peak visits per hour; from there it’s simple to estimate page views per second, and therefore hardware requirements.
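As a rough illustration of that last step, here is a sketch that converts peak visits per hour into average page views per second. The pages-per-visit figure is an assumption you’ll need to supply from your own analytics, and because traffic bunches up within an hour you’d want some headroom above this average; page_views_per_second is a hypothetical helper.

```python
def page_views_per_second(peak_visits_per_hour: float, pages_per_visit: float) -> float:
    """Average page views per second during the peak hour (3600 seconds in an hour)."""
    return peak_visits_per_hour * pages_per_visit / 3600.0


# Hypothetical inputs: 3,600 visits in the peak hour, 5 pages viewed per visit.
print(page_views_per_second(3600, 5))  # 5.0
```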
In the next post I’ll use the formulas to calculate the moderation costs for a virtual world.
Like the other posts in this series, there are plenty of places for typos and mistakes. Do let me know where I’ve gone wrong! Or, if I haven’t explained some of the concepts very well, post a comment and let me know.
Matthew Warneford