Time to First Byte: Why It Is Important and How You Can Improve Yours

Have you ever stood in a retail store checkout line for a long time? Then you understand the pain of waiting on something that should not keep you from doing what you want. On the web this pain is amplified: by 3 seconds most visitors to your site have started to leave. Psychological research shows you really have about 1 second (1000 ms) to start rendering before the human mind starts perceiving latency and building anxiety. Because study after study shows end users want fast web sites, Google, Bing and other search engines measure page load efficiency as a search engine ranking signal.

Bad Time to First Byte

Search engine marketers have correlated time to first byte directly with search engine rankings. Time to First Byte (TTFB) is the time your browser spends waiting for the web server to send back the first byte of the response. One reason it is used as a ranking factor is that it is easy to measure. To be honest, it is also a key metric because the browser cannot begin rendering content until it has the initial markup. The page's markup contains references to external resources like scripts, CSS, images and fonts, so a slow response from the server can be a killer. Ninety percent or more of web performance optimization issues relate to front-end architecture and code quality, but the remaining 10% or so related to the back-end can be the most deadly.

Fortunately, there are things developers on any platform can implement to improve time to first byte. I am going to break the initial request and response into phases and then focus on web server optimization. The first phase is DNS resolution, where the client machine interrogates name servers to find the web server's IP address. Once the server is located a connection must be made, which may involve SSL negotiation, and then the request is sent. At this point the server does what it needs to do to create the response. The response is then sent back across the Internet and received by the browser, which begins the rendering process.

Examining TTFB Using WebPageTest.org

There are four main areas that affect time to first byte: DNS resolution, SSL negotiation, server rendering and network latency. For the sake of this article I am going to ignore DNS, SSL and network latency; I will cover those in future posts. But let's look at a WebPageTest.org waterfall to see timings for each of these phases.

Example Bad Time to First Byte

In the image you can see the different phases by their corresponding colors. The dark aqua green is DNS resolution. The purple is SSL negotiation. The orange is the time spent creating a new connection to the server. The green is the time the server takes to render the markup, and the blue is the time the response takes to come across the network to the client machine. In this example there is a very large time to first byte. For the record, you should target 300ms or less if you want to achieve a sub-1-second render. My personal goal is always under 1000ms over broadband and 3000ms over GPRS. Below is a typical response for this blog:

Love2Dev.com Home Time to First Byte

A critical part of identifying web performance issues is using the right tool or set of tools. WebPageTest is my favorite free synthetic testing tool. Synthetic tools are like robots that exercise your site the way a real user would; they are synthetic because the pages are not being loaded by real users. For real user measurement (RUM) you need a different set of tools. Both approaches should be used, but I focus on synthetic testing in the development life cycle, and I highly recommend WebPageTest. Synthetic tests can flush out common TTFB issues.

How the Server Composes a Response

A perfect server would respond instantly, leaving only network latency to slow the response. In reality the server has many processes it must run to produce the markup before sending it down the wire. You can think of these processes as necessary friction. Unless the target page is static HTML, which is rare these days, you have at least two primary processes: data retrieval and the rendering pipeline.

To understand what the server does I want to examine two scenarios: static content and dynamic content. Static content is simply returning a file without any rendering, just as it is. Dynamic content involves a composition process, known as rendering. Typically this means examining the request route, querystring, headers and possibly cookies, and building the response accordingly. Usually the server process will also need to access a data store, either directly or through web services. This is why dynamic content often takes longer to return than static. If you add compression, which affects both static and dynamic content, you have yet another step the server must take before anything is served.

Dynamic server stacks include ASP.NET, JavaServer Pages, Ruby, Node.js, PHP, etc. These stacks are usually hosted on top of web servers like IIS, Apache, NGINX and WebSphere. As an ASP.NET developer I know there are over 20 separate events in the ASP.NET pipeline involved in rendering, before you even consider the Razor templates used to define the markup rendered against a data model. For almost all rendering engines, not only are you rendering against retrieved data, the engine must also compose the markup from multiple files, often called includes.
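
Include composition can be sketched in a few lines. This is a toy, assuming a made-up {{> name}} include marker (loosely echoing the partial syntax of engines like Handlebars), not how any particular engine is implemented.

```javascript
// Toy sketch of include composition: the engine stitches several template
// "files" into one page before rendering. The {{> name}} syntax is invented.
function composeIncludes(templates, pageName) {
  // templates maps a template name to its markup; {{> name}} marks an include.
  return templates[pageName].replace(
    /\{\{>\s*(\w+)\s*\}\}/g,
    (match, name) => templates[name] || ""
  );
}
```

Every include the engine resolves is, at some point, a file that had to be read and cached, which is where the disk I/O discussed next comes in.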

ASP.NET Page Life Cycle

Anytime the server process needs a file it must perform disk I/O to retrieve the content before it can compose the response. Disk I/O is expensive, even against modern SSD drives. For a good example of how this effect can be amplified, review Instagram's scalability case study and see how a missing favicon caused their servers to crash from disk thrashing.

A proper web server will attempt to cache as much as it can in memory and serve the response from cache instead of hitting the disk and performing the rendering process. ASP.NET utilizes a feature called Output Caching to enable granular cache control. It works by caching the final rendered product in memory. Developers can control how Output Caching works by varying cached entries on header values, querystring parameters, language, etc. This is helpful because a page can be composed from many different parameters, so it offers the most flexibility. When the web server can avoid hitting the disk and the raw composition process it can serve the response much faster.
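
The "vary by" idea can be sketched in plain Node.js. This is a minimal in-memory cache loosely modeled on the concept, assuming made-up names (cachedRender, varyHeaders, ttlMs), not ASP.NET's actual API.

```javascript
// Minimal output-cache sketch: cache the final rendered body in memory,
// keyed by path plus whichever headers the page varies by. Illustrative only.
const outputCache = new Map();

function cacheKey(path, headers, varyHeaders) {
  return path + "|" + varyHeaders.map((h) => headers[h] || "").join("|");
}

function cachedRender(path, headers, varyHeaders, ttlMs, render) {
  const key = cacheKey(path, headers, varyHeaders);
  const hit = outputCache.get(key);
  if (hit && hit.expires > Date.now()) return hit.body; // served from memory
  const body = render(); // full rendering pipeline runs only on a miss
  outputCache.set(key, { body, expires: Date.now() + ttlMs });
  return body;
}
```

Requests that share a key skip rendering and disk I/O entirely; a request that varies on, say, Accept-Language gets its own cached entry.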

When the server is composing a response and needs to wait on a slow database query (as an example), the response is delayed. Instead you should try to flush any content that does not depend on the database query. Most platforms offer some sort of response flushing mechanism; for example, PHP offers the ob_flush function. This feature seemed to be missing from ASP.NET MVC, so my friend Nik Molnar created Courtesy Flush to enable it. When flushing content you are sending early bytes while keeping the connection and process alive for additional content. The initial content should include the CSS and any scripts needed to start building the page. There is an art to crafting a good early content flush, so you will most likely need to experiment to get it optimized.

Of course you can always look to reduce your data and service access times. This is one reason document databases like Elasticsearch, MongoDB and Azure Table Storage are becoming popular. They allow applications to reduce database overhead by eliminating queries. Instead they take advantage of denormalized data, where known query results are stored ahead of time in a quickly retrievable form. Instead of composing a complicated SQL statement, you ask a document database for a record or set of records by a simple index. This provides a very quick response time and removes the traditional relational database from the web serving pipeline.
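
The denormalized pattern boils down to "precompute on write, look up by key on read." A toy sketch, with a plain Map standing in for the document store and made-up function names:

```javascript
// Sketch of denormalized, key-based retrieval. A Map stands in for a
// document database; the function names are invented for illustration.
const documentStore = new Map();

// Write side: precompute the full result and store it under a simple key.
function saveOrderSummary(customerId, summary) {
  documentStore.set(`order-summary:${customerId}`, summary);
}

// Read side: one indexed get -- no joins, no query planner, no SQL to compose.
function getOrderSummary(customerId) {
  return documentStore.get(`order-summary:${customerId}`);
}
```

The trade-off is doing more work at write time (and storing data redundantly) so the read path that sits in front of your TTFB does almost nothing.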


If your server's time to first byte is not around 100ms, you should review your server-side processing architecture. Make sure your web server is optimized for speed. Make sure you are leveraging a caching mechanism like ASP.NET Output Caching to eliminate the response rendering pipeline and disk I/O as much as possible. Flush responses early when the full response depends on a slower process, and consider moving your data from a slower relational database to a denormalized document database solution. These steps will help you get your initial markup to the client in record time, increasing customer satisfaction and hopefully improving your bottom line.

