Improve ⌚Time to First Byte for Better Page Speed & SEO
Everyone wants a fast website. This means your page must render quickly. Ideally, your page becomes interactive within 1-3 seconds.
Psychological research shows you really have 1 second, or 1000ms, to start rendering before the human mind starts perceiving latency and building anxiety. Because study after study shows end users want fast websites, Google, Bing and other search engines measure page load efficiency as a search engine ranking signal.
Of course, if the page's markup does not reach the browser quickly, the rest of the assets can't be loaded.
This time is called time to first byte, and it measures how long it takes the first bits of the response to reach the browser after the initial request is made to the server.
TTFB is a key metric because the browser cannot begin rendering content until it has the initial markup. The page's markup contains references to external resources like scripts, CSS, images, fonts, etc. A slow response from the server can be a killer.
In today's online world, a key question to ask is whether this performance metric is still worth measuring. Is it still relevant?
- What Is Time To First Byte?
- The Performance Golden Rule - Is it still Golden?
- Measuring Time to First Byte
- How the Server Composes a Response
- How to Improve Time to First Byte
- Let's Wrap This Up
What I will teach you today is what Time to First Byte (TTFB) is, why it is important, and how you can measure it.
Along the way I will review how relevant it is in today's world and how its importance has changed.
I will also offer some guidance you can apply to your servers to help improve your time to first byte.
What Is Time To First Byte?
Time to First Byte or TTFB is the amount of time spent waiting for the initial response. In technical terms it is the latency of a round trip to and from the server.
It is a key performance indicator to track because it reveals how fast or slow your server is. If you have a bad TTFB, your server needs a tune-up.
While network latencies affect time to first byte, often latency is introduced due to server inefficiencies. You want network latency to be the biggest part of your time to first byte because it indicates your server is optimized.
This is how 84 Lumber wasted millions on its 2017 Super Bowl ad campaign. Well, that and the ads were weird.
Their server was poorly optimized even when it was not getting hammered in response to the ad. I ran my tests a few days before the game, and the average TTFB I found on their site was over 30 seconds. Worse, when the commercial aired, the server crashed.
As much as you would like to think you control the network, you really don't. Requests are routed through different Internet routers all over the place, like a pinball, between the client and the server.
Distance and client network speed play a bigger role in network latency than your data center does. It really is the last mile, and trust me, you have no control over that.
Any reliable web performance measuring tool will break out the different steps in the full time to first byte. These include the following:
- DNS Lookup
- Time to Initiate the Connection
- SSL Negotiation
- Time to First Byte (time required by the server to start sending the response)
Many sites still have poor server configurations, so measuring and tracking TTFB is a key way to make sure your servers are humming along efficiently.
You may have everything optimized, only for a new deployment to mess everything up. If you have automated testing in place, you will recognize the issue quickly and be able to respond.
Google suggests a time to first byte of 200ms or less. That 200ms is the time for the first bytes of the response to hit the browser minus the network latency. In other words, it is your server's rendering cycle.
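To make that budget part of an automated test, you subtract network latency from each observed TTFB and compare the result to 200ms. Here is a minimal sketch, assuming your monitoring tool hands you `(ttfb_ms, network_rtt_ms)` pairs; the function names and sample format are hypothetical, not part of any specific tool.

```python
from statistics import median

# Google's suggested server-side budget, in milliseconds.
SERVER_BUDGET_MS = 200.0


def server_time_ms(ttfb_ms: float, network_rtt_ms: float) -> float:
    """Estimate server processing time: observed TTFB minus network latency."""
    return max(ttfb_ms - network_rtt_ms, 0.0)


def check_ttfb(samples: list[tuple[float, float]],
               budget_ms: float = SERVER_BUDGET_MS) -> tuple[float, bool]:
    """Return the median estimated server time and whether it meets the budget.

    Each sample is a hypothetical (ttfb_ms, network_rtt_ms) pair.
    The median keeps one slow outlier from failing the whole check.
    """
    times = [server_time_ms(ttfb, rtt) for ttfb, rtt in samples]
    mid = median(times)
    return mid, mid <= budget_ms
```

For example, samples of (350, 120), (300, 110) and (420, 130) give estimated server times of 230, 190 and 290ms. The median is 230ms, so the check fails and your deployment pipeline can flag the regression.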
The Performance Golden Rule - Is it still Golden?
What seems like decades ago, Steve Souders defined the web performance golden rule: roughly 80% of a web page's response time is spent on the client side and only about 20% on the server side. This was a key tenet of Steve's High Performance Web Sites, a book I consider one of the most important web development books ever.
Seriously, this book changed the way I look at web development and has probably had more influence over my career trajectory than any other reference.
TTFB plays only a small role, as little as 5%, in the overall time it takes to render a typical page.
I will even go back to High Performance Web Sites and quote the following prophetic statement:
there is more potential for improvement in focusing on the front-end. If we were able to cut back-end response times in half, the end user response time would decrease only 5-10% overall. If, instead, we reduce the front-end performance by half, we would reduce overall response times by 40-45%.
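Those percentages follow from simple arithmetic. Halving a portion of the response time saves half of that portion's share of the total. The sketch below assumes, per the golden rule, that the back end accounts for roughly 10-20% of response time and the front end for the remaining 80-90%. `overall_savings` is a hypothetical helper.

```python
def overall_savings(share: float, reduction: float = 0.5) -> float:
    """Fraction of total response time saved by shrinking one portion.

    share     -- fraction of total response time spent in that portion (0..1)
    reduction -- fraction of that portion removed (0.5 means "cut in half")
    """
    return share * reduction


# Assumed back-end shares from the golden rule; the front end gets the rest.
for back_end in (0.10, 0.20):
    front_end = 1.0 - back_end
    print(f"back end {back_end:.0%}: "
          f"halving it saves {overall_savings(back_end):.0%} overall, "
          f"halving the front end saves {overall_savings(front_end):.0%}")
```

The two runs give 5% versus 45% and 10% versus 40%, which matches the 5-10% and 40-45% ranges in the quote.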
So, even back then, was Steve saying time to first byte is not important?
Well, no. But the opportunity to really improve your user experience and beat your competition lies in improving your front-end, or what the browser compiles into the rendered page.
However, I still see the vast majority of developers fail to understand why this rule is true and still focus their efforts on server-side optimizations.
Don't get me wrong, server-side performance is important, but the server is a different environment than the client. It is much more controllable, which means it should be easier to realize performance gains by tuning your server than by tuning most front-ends today.
The impact just won't be as big.
That being said, if the HTML and other page resources are slow to traverse the network, your page may never get the chance to render.
The user will just leave.
If your page is not perceived as rendered within 3 seconds, about half of your visitors will leave. Google DoubleClick's research found that 53% of mobile site visits are abandoned if a page takes longer than 3 seconds to load.
A 3 second time to first byte means you can pretty much say goodbye to your traffic.
So how can you measure time to first byte and how can you improve your time?
Measuring Time to First Byte
The first step in any web performance optimization campaign is to measure everything. The first place I start is by collecting a network waterfall.
** Note: You can do this against a localhost website, but you will only get realistic data by testing against a real web server. **
This can be done using your browser's F12 developer tools. Each browser has a Network tab that records every network request as the page loads and displays the requests in a chart we call a waterfall.
Select the first request, which is usually for your page's HTML, to see more details about the request and response. Then open the Timing tab (Firefox calls it Timings).
The timing breakdown shows that the real time to first byte is a combination of steps:
- DNS Lookup
- Initial Connection
- SSL Negotiation
- Time to First Byte
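DevTools shows these phases for you, but for scripted, repeatable checks you may want to time them yourself. Here is a rough sketch using only Python's standard library. It assumes a single HTTP/1.1 GET over a fresh connection (no redirects, no HTTP/2, no connection reuse), and `measure_phases` is a hypothetical name, not an existing tool.

```python
import socket
import ssl
import time


def _ms(start: float, end: float) -> float:
    """Convert a perf_counter interval to milliseconds."""
    return (end - start) * 1000.0


def measure_phases(host: str, port: int = 443, path: str = "/",
                   use_tls: bool = True) -> dict[str, float]:
    """Time the DNS, connect, TLS and TTFB phases of one HTTP/1.1 GET, in ms."""
    t0 = time.perf_counter()
    # DNS lookup: resolve the host name to a socket address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    try:
        sock.settimeout(10)
        # Initial connection: the TCP handshake.
        sock.connect(addr)
        t2 = time.perf_counter()

        # SSL negotiation: wrap_socket() performs the TLS handshake here.
        if use_tls:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
        t3 = time.perf_counter()

        # Time to first byte: send the request, block until one byte arrives.
        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))
        sock.recv(1)
        t4 = time.perf_counter()
    finally:
        sock.close()

    return {"dns": _ms(t0, t1), "connect": _ms(t1, t2),
            "tls": _ms(t2, t3), "ttfb": _ms(t3, t4)}
```

Calling `measure_phases("example.com")` returns a dictionary of millisecond timings you can log or compare against a budget. Because the DNS result may be cached by your operating system, run it several times before drawing conclusions.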
For the full document to be loaded, you should also include the Content Download value. The browser cannot discover resources referenced further down in the markup until that part of the document has downloaded and been parsed.
If your average time to first byte is 500ms or greater, you need server-side optimizations. Common bottlenecks include poorly optimized database queries and poorly configured web servers. We recommend using a NoSQL database like MongoDB, a caching service like Redis, or one of the cloud-hosted equivalents.
There are many other places you can optimize, but first you need to understand how the server composes the HTML it serves.
How the Server Composes a Response
A perfect server responds instantly, leaving only network latency to slow the response.
The reality is the server has many processes it must run to render the markup before sending it down the wire. This assumes your server needs to render the markup each time the resource is requested.
You can think of these processes as necessary friction. Unless the target page is static HTML, which is more common today, you have at least two primary processes: data retrieval and the rendering pipeline.
To understand what the server does, I want to examine two scenarios: static content and dynamic content.
Static content is simply a file returned as-is, without any rendering. This typically means you have an index.html file in a folder. These days I use an AWS Lambda function running Node.js to pre-render my HTML files, then serve them via CloudFront with AWS S3 as the CDN origin.
Ok, that might have gotten a little too technical, but I wanted to clarify that static websites are what I recommend today. That's because on-demand rendering engines require too much overhead when page speed is critical.
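The core idea of pre-rendering is to do the rendering work once at build time instead of on every request. My own workflow runs on Lambda and Node.js, so treat this Python sketch as a language-neutral illustration rather than that pipeline. `PAGES`, `LAYOUT` and `prerender` are hypothetical, and the upload step to S3 is not shown.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page data; in practice it might come from a CMS or database.
PAGES = {
    "index": {"title": "Home", "body": "Welcome!"},
    "about": {"title": "About", "body": "Who we are."},
}

# Hypothetical layout shared by every page.
LAYOUT = Template(
    "<!doctype html><html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def prerender(pages: dict[str, dict[str, str]], out_dir: Path) -> list[Path]:
    """Render each page once, at build time, and write it as a static file."""
    written = []
    for slug, data in pages.items():
        # "index" becomes /index.html; other pages become /<slug>/index.html.
        if slug == "index":
            target = out_dir / "index.html"
        else:
            target = out_dir / slug / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        # Escape values so page data cannot inject markup into the layout.
        safe = {key: html.escape(value) for key, value in data.items()}
        target.write_text(LAYOUT.substitute(safe), encoding="utf-8")
        written.append(target)
    return written
```

Running `prerender(PAGES, Path("dist"))` produces plain files that any web server or CDN origin can return without a rendering step, for example after syncing the folder to a bucket with `aws s3 sync`.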
Dynamic content involves a composition process known as rendering. Typically this means examining the request route, querystring, headers, possibly cookies, etc., and building the response.
Usually the server process will also need to access a data store (database), either directly or through web services. This is why dynamic content often takes longer to return than static content. If you add compression, which affects both static and dynamic content, you have yet another step the server needs to complete before anything is served.
Dynamic server stacks include ASP.NET, JavaServer Pages, Ruby, Express, PHP, etc. Most CMS platforms, like WordPress, operate this way.
WordPress is a great example. Site owners are often a bit too enthusiastic about plugins and go overboard, installing too many of them, some of them poorly written. You should audit your plugins regularly to ensure they are up to date and add real value to your site.
These stacks are usually hosted on top of web servers like IIS, Apache, NGINX, WebSphere, etc. ASP.NET has over 20 separate events in its rendering pipeline before you even consider the Razor templates (MVC) used to define markup to render against a data model.
How to Improve Time to First Byte
With almost all rendering engines, you are not only rendering against retrieved data; the engine must also compose the markup from multiple files, often called includes.
These can be the main app shell, one or more child layouts and individual UI components.
Any time the server process needs a file, it must perform disk I/O to retrieve the content before composing the response. Disk I/O is expensive, even on modern SSDs. For a good example of how this effect can be amplified, review Instagram's scalability case study and see how not having a favicon caused their servers to crash from disk thrashing.
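One common way to avoid repeated disk reads for includes is to read each file once and keep it in memory. Here is a minimal sketch using Python's `functools.lru_cache`. `load_include`, `compose_page` and the `{{content}}` placeholder are hypothetical stand-ins for whatever your templating engine uses.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=256)
def load_include(path: str) -> str:
    """Read an include from disk on first use, then serve it from memory."""
    return Path(path).read_text(encoding="utf-8")


def compose_page(shell: str, *parts: str) -> str:
    """Compose markup from an app shell include and child includes.

    The hypothetical {{content}} placeholder in the shell receives the parts.
    """
    body = "".join(load_include(part) for part in parts)
    return load_include(shell).replace("{{content}}", body)
```

The trade-off is staleness: the cache does not notice when a file changes, so you would call `load_include.cache_clear()` as part of each deployment.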
A properly configured web server will attempt to cache as much as it can in memory and serve the response from cache instead of hitting the disk and repeating the rendering process. ASP.NET uses a feature called Output Caching to enable granular cache control. It works by caching the final rendered product in memory. You can even designate that parts of a page be cached in memory while others remain dynamic.
Developers can control how Output Cache works by varying cached objects by header values, querystring parameters, language, etc. This is helpful because a page can be composed from many different parameters, so it offers the most flexible scenario. When the web server can avoid hitting the disk and the raw composition process, it can serve the response much faster.
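To make the "vary by" idea concrete, here is a minimal in-memory output cache in Python. It is loosely modeled on the concept behind ASP.NET Output Caching but is not that API. The class name, parameters and time-to-live policy are assumptions for illustration.

```python
import time
from typing import Callable


class OutputCache:
    """Cache fully rendered responses per path plus chosen 'vary by' values."""

    def __init__(self, ttl_seconds: float,
                 vary_by_query: tuple[str, ...] = (),
                 vary_by_headers: tuple[str, ...] = ()):
        self.ttl = ttl_seconds
        self.vary_by_query = vary_by_query
        # Header names are case-insensitive, so normalize them once.
        self.vary_by_headers = tuple(h.lower() for h in vary_by_headers)
        self._store: dict[tuple, tuple[float, str]] = {}

    def _key(self, path: str, query: dict, headers: dict) -> tuple:
        lowered = {k.lower(): v for k, v in headers.items()}
        return (path,
                tuple(query.get(q) for q in self.vary_by_query),
                tuple(lowered.get(h) for h in self.vary_by_headers))

    def get_or_render(self, path: str, query: dict, headers: dict,
                      render: Callable[[], str]) -> str:
        key = self._key(path, query, headers)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # served from memory: no data access, no rendering
        html = render()    # miss or expired: do the expensive work once
        self._store[key] = (now + self.ttl, html)
        return html
```

With `vary_by_query=("page",)`, requests for `/blog?page=1` and `/blog?page=1&utm=x` share one cached copy, while `/blog?page=2` gets its own. Query parameters you do not vary by do not fragment the cache.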
When the server is composing a response and needs to wait on a slow database query (as an example), the response is delayed. Instead, you should try to flush any content that does not depend on the database query. Again, most platforms offer some sort of response flushing mechanism.
When flushing content, you are sending early bytes while keeping the connection and process alive for additional content. The initial content should include CSS and any scripts needed to start building the page. There is an art to crafting a good early content flush, so you will most likely need to experiment to get it optimized.
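In Python, one flushing mechanism is a WSGI application that yields its response in chunks. PEP 3333 requires servers not to delay transmitting each yielded block, so the document head can go out before the query finishes. This is a sketch: `slow_query`, `/site.css` and `/app.js` are hypothetical, and a buffering reverse proxy or compression layer in front of the app can still hold the early bytes back.

```python
import time
from typing import Iterator


def slow_query() -> list[str]:
    """Hypothetical stand-in for a slow database call."""
    time.sleep(0.3)
    return ["Widget", "Gadget"]


def app(environ, start_response) -> Iterator[bytes]:
    """WSGI app that flushes the document head before waiting on the database."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    # Early flush: everything that does not depend on the query, so the
    # browser can start fetching CSS and scripts right away.
    yield (b"<!doctype html><html><head>"
           b'<link rel="stylesheet" href="/site.css">'
           b'<script src="/app.js" defer></script>'
           b"</head><body><h1>Products</h1>")
    # Late content: rendered only after the slow query returns.
    items = "".join(f"<li>{name}</li>" for name in slow_query())
    yield f"<ul>{items}</ul></body></html>".encode("utf-8")
```

You can try it locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and watch the stylesheet request start before the list arrives.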
Of course, you can always look to reduce your data and service access times. This is one of the reasons why document databases like Elasticsearch, MongoDB, DynamoDB and Cosmos DB have become popular.
7 Ways to Improve Your Time to First Byte
- Use a Pre-Rendered Static Website
- Use a Content Delivery Network
- Optimize Page Rendering Process (ASP.NET, PHP, Express, etc)
- Optimize Database Queries
- Use a Document Database
- Cache Dynamic Assets in Memory
- Flush Rendered Content While Waiting on Additional Content
Document databases allow applications to reduce database overhead by eliminating complex queries. Instead, they take advantage of denormalized data, where known query results are stored ahead of time in a state that is quick to retrieve. Instead of composing a complicated SQL statement, you ask a document database for a record or set of records by a simple index. This provides a very quick response time and removes the traditional relational database from the web serving pipeline.
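The difference is easiest to see side by side. This sketch uses an in-memory SQLite database for the relational side and a plain Python dict standing in for a document store such as MongoDB or DynamoDB. The schema, author name and `publish` step are hypothetical.

```python
import json
import sqlite3

# Relational side (hypothetical schema): the page data is assembled with a
# JOIN every time a request comes in.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT UNIQUE,
                        title TEXT, author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Jane Doe');
    INSERT INTO posts VALUES (10, 'ttfb', 'Improve TTFB', 1);
""")


def post_from_sql(slug: str) -> dict:
    """Build the page's data with a relational query at request time."""
    title, author = db.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id WHERE p.slug = ?",
        (slug,),
    ).fetchone()
    return {"title": title, "author": author}


# Document side: a dict stands in for a document store. The joined result
# is denormalized once, when the post is published.
documents: dict[str, str] = {}


def publish(slug: str) -> None:
    """Precompute the query result and store it under a simple key."""
    documents[f"post:{slug}"] = json.dumps(post_from_sql(slug))


def post_from_document(slug: str) -> dict:
    """Serve the page's data with a single key lookup, with no joins."""
    return json.loads(documents[f"post:{slug}"])
```

The join cost moves from every request to the moment of publishing. The trade-off is that you must republish the document whenever the underlying data, such as the author's name, changes.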
If you need to query a relational database, make sure those queries are optimized. I can't tell you how often I have encountered poorly written stored procedures or missing indexes that cause high-demand SQL requests to take minutes instead of milliseconds.
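A quick way to spot a missing index is to ask the database for its query plan. Here is a sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. The `orders` table is hypothetical, and other databases expose the same idea through their own `EXPLAIN` or execution-plan tools.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN details as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index on customer_id, SQLite must scan every row.
before = query_plan(conn, sql, (42,))

# With the index, it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))

print("before:", before)
print("after: ", after)
```

The first plan reports a scan of the whole table, while the second reports a search using `idx_orders_customer`. That shift from scanning to searching is what turns minutes into milliseconds on large tables.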
Other steps you can take include using a content delivery network (CDN). Often you can use a CDN as a front-end cache that also distributes your content closer to users.
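Whether a CDN can act as that front-end cache depends largely on the `Cache-Control` headers your origin sends. Here is a sketch of one possible policy. The specific durations and the `cache_headers` helper are assumptions you would tune to your own content. `s-maxage` applies only to shared caches such as a CDN, and `immutable` tells browsers a fingerprinted file will never change.

```python
def cache_headers(kind: str) -> dict[str, str]:
    """Return an assumed Cache-Control policy for a given response type."""
    policies = {
        # Pre-rendered HTML: the CDN keeps it for a day, browsers re-check
        # after 5 minutes.
        "html": "public, max-age=300, s-maxage=86400",
        # Fingerprinted assets (e.g. app.3f9a1c.js) never change, so they
        # can be cached for a year.
        "asset": "public, max-age=31536000, immutable",
        # Personalized responses must not be stored by shared caches.
        "private": "private, no-store",
    }
    return {"Cache-Control": policies[kind]}
```

When you deploy new HTML before the shared cache expires, you would also invalidate the affected paths on the CDN.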
Let's Wrap This Up
Time to first byte is one of the most commonly measured page speed metrics among developers and marketers. While important, it is only a small piece of the modern web page's loading profile.
You should aim for 500ms or less time to first byte over a high speed connection and 1 second or less over 3G. This gives you a fighting chance to make your page interactive in the 3 seconds consumers allow before bouncing from your page.