Cache (as defined by Wikipedia) is a component that transparently stores data such that future requests for data can be faster. I hereby presume that you understand cache as a component and any architectural patterns around caching and thereby with this presumption I will not go into depth of caching in this article. This article will cover some of the very basics of fundamentals of caching (wherever relevant) and then will take a deep dive into the point-of-view on the caching architecture with respect a Content Management Plan in context to Adobe’s AEM implementation.
Principles for high performance and high availability don’t change but for conversation sakes lets assume we have a website where we have to meet the following needs.
- 1 Billion hits on a weekend (hit is defined by a call to the resource and includes static resources like CSS, JS, Images, etc.)
- 700 million hits in a day
- 7.2 million page views in a day
- 2.2 million page views in an hour
- 80K hits in a second
- 40K page views in a minute
- 612 page views in a second
- 24×7 site availability
- 99.99% uptime
- Content availability to consumers in under 5 minutes from the time editors publish content
While the data looks steep the use case is not uncommon one. In current world where everyone is moving to devices, and digital there will be cases when brands are running campaigns. When those campaigns are running there will be needs for support such steep loads. These loads don’t stay for long but when then come they come fast, they come thick and we will have to support them.
For the record, this is not some random theory I am writing, I have had the opportunity of being on a project (I cant name) where we supported similar number.
The use case I picked here is of a Digital Media Platform where we have a large portion of the content is static, but the principles I am going to talk here will apply to any other platform or application.
The problems that we want to solve here are:
- Performance: Caching is a pattern that we employ to increase the overall performance the application by storing the (processed) data in a store that is a) closest to the consumer of the data and b) is accessible quickly
- Scalability: In cases when we need to make the same data-set available to various consumers of the system, caching as a pattern makes it possible for us to scale the systems much better. Caching as we discussed earlier allows us to have processed data which takes away the need to run the same processing time and again which facilitates scalability
- Availability: Building on similar principles as of scalability, caching allows us to put in place data in areas where systems/components can survive outages be it network or other components. While it may lead to surfacing stale data at points, the systems are still available to the end users.
Adobe AEM Perspective
Adobe AEM as an application container (let’s remember AEM is not built on top of a true application container, though you can deploy in one), have it’s own nuances with scalability. In this article I will not dive into the scalability aspects, but as you scale the AEM publishers horizontally it leads to increase in several concerns around operations, licensing, cost, etc. The OOTB architecture and components we get with Adobe AEM themselves tell you to make use of cache on the web servers (using dispatcher). However, when you have to support Non Functional Requirements (NFRs) I listed above, the standard OOTB architecture will need a massive infrastructure falls short. We can’t just setup CQ with some backend servers and an apache front-ending with local cache, throw hardware at it with capacity and hope it will come together magically.
As I explained, there is no magic in this world and everything needs to happen via some sort of science. Let’s put in perspective the fact that a standard apache web servers can handle a few thousand requests in a second and when you need to handle 80K hits in a second which includes resources like HTML, JS, CSS, images, etc.; with a variety of sizes. Without going into the sizing aspects, it is pretty clear that you would need not just a cluster of servers but a farm of servers to cater to all that traffic. With a farm of servers, you get yourself a nightmare to setup an organization and processes around operations and maintenance to ensure that you keep the site up and running 24×7.
Cache, cache and cache
Before we dive into the design here, we need to understand some key aspects around caching. These concepts will be talked about in the POV below and it is critial that you understand these terminologies clearly.
- Cache miss refers to a scenario when a process requests the data from a cache store and the object does not exist in the store
- Cache hit refers to a scenario when a process requests the data from a cache store the data is available in the store. This event can only happen only when a cache object for the requests has been primed
- CachePrimeis a term associated with the process where we fill up the caching storage with the data. You can do this in two waysinwhichthiscan be achieved
- Pre-prime is the method where we run a process (generally at the startup of the application) proactively to load all the various objects whose state we are aware of we can cache. This can be achieved by either using an asynchronous process or by an event notification
- OnDemand-prime is a method where the cache objects are primes real-time i.e. when a process which needs the resource does not finds it in the cache store, the cache is primed up as the resource is served back to the process itself
- Expiration is a mechanism that allows the cache controller to remove objects from memory based on either a time duration or an event
- TTL known as Time To Live a method which defines a time duration for which a data can live on a computer. In caching world this is a common expiration strategy where a cache object is expired (flagged to be evicted if needed) based on a time duration provided when the cache object is created
- Event-based (I can’t find a standards naming convention for this) cache expiration method is one where we can fire an event to mark a cache object as expired so that if needed it can be evicted from memory to make way for new objects or re-primed on the next request
- Eviction is a mechanism when the cache objects is removed from the cache memory to optimize for space
The Design // Where Data should be Cached
This point-of-view builds on top of an architecture that designed by a team which I was a part of (I was the load architect) for a digital media platform. The NFRs we spoke about earlier are the ones that we have successfully supported on the platform during a weekend as part of a campaign the brand was running. Also, since then we continue to support very high traffic week everytime there are such events and campaigns. The design that we have in places takes care of various layers of caching in the architecture.
When we talk about such high traffic load, we must understand that this traffic has certain characteristics that work in our favor.
a) All this traffic is generated by a smaller subset of people who access the site. In our case, when we say that we serve 1 billion hits on a single weekend, it is worthwhile to note that there are only 300,000 visitors on the site on that day who generate this much load
c) As the users interact with the site/platform, there is content and data which maps back to their profile and preferences and it does not necessarily changes frequently and also this is managed directly and only by the user themselves
Content Delivery Networks (CDN) are service providers that enable serving content to end consumers with high performance and availability. To know more about CDN networks, you can read here. In the overall architecture CDN plays a very critical role. Akamai, AWS’s CoudFront and CloudFlare are some of the CDN providers with which we integrate very well.
CDN for us over other things provides a highly available architecture. Some of these CDN provide you an ability to configure your environment such that if the origin servers (pointing to your data center) are unavailable they continue to serve content for a limited time from their local cache. This essentially means that while the backend services may be down, the consumer facing site is never down. Some aspects of the platform like content delivery and new publications are activated and in certain cases like breaking news we may have an impact on a SLA, but the consumers /end users never see your site as unavailable.
In our architecture we use CDN to cache all the static content be it HTML pages, images or Videos. Once a static content is published via the Content Delivery Network, those pages are cached on the CDN for a certain duration. These durations are determined based on the refresh duration but the underlying philosophy is to break the content in tiers of Platinum, Gold, Silver and then assign duration for which each of these would be cached. In a platform like NFL where we are say pushing game feeds these need to be classified as Platinum and they had a TTL of 5 seconds, while content types like Home Page, News (not breaking news) etc. have a TTL of 10 minutes and has been classifies as Gold. Then on the same project we have TTL of 2 hours (or so) so sections like search and have been classified as Bronze. The intent was to identify and classify if not all most of the key sections and ensure that we leverage CDN cache effectively.
We have observed that even for shorter TTLs like Platinum with increase/spike in traffic the Offload %age (defined as the number of hits served by CDN to humber of hits sent to backend) grows and touched a peak of 99.9% where the average offload %age is around 98%.
Varnish is a web-accelerator which (if i may classify it as is as web server on steroids). If you are hearing its name for the first time, I strongly urge you to hop over here to get to know more about it.
We had introduced Varnish as a layer to solve for the following:
- Boost Performance (reduce number of servers) – We have realized that Varnish bring an in-memory accelerator gives you a boost of anywhere between 5x-10x over using apache. This basically means that you can handle several times the load with Varnish sitting on top of Apache. We had done rigorous testing to prove these numbers out. The x-factor was mostly dependent on the page size aka the amount of content we loaded over the network
- Avoid DoS attacks – we had realized that if there are cases where you see a lot of influx of traffic coming into your server (directed and intentional or arbitrary) and if you cant to block all such traffic, your chances of successfully blocking the traffic on varnish without bringing down the server when compared to doing same on apache increase many fold. We also use Varnish as a mechanism to block any traffic that we don’t want to hit our infrastructure and those could be spiders and bots from markets and regions not targeted by campaigns we run
- Avoid Dog-pile effect – if you have heard this term the first time, then hop over here hype-free: Avoiding the dogpile effect. In high traffic situations and even when you have CDN networks setup, it is quite normal for your infrastructure to be hit by a dog-pile as cache expired. Chances of dog-pile effect to increase, increase as you hit lower TTL. Using varnish we have setup something that we call a Grace Configuration where we don’t allow requests for same URLs to pass through. These are queued and after a certain while if the primary request is still not getting through consequent requests are served off of stale cache.
If you haven’t heard about Apache WebServer, you might have heard about httpd. If none of these ring a bell, then this (Welcome! – The Apache HTTP Server Project) will make explain things. AEM’s answer to scale is what sits on this layer and is famously known as Dispatcher. This is a neat little module which can be installed on the Apache HTTP server and acts as a reverse proxy with a local disk cache. This module only supports one model of cache eviction which is event based. We can configure either the authoring or the publishing systems to send events of deleting and invalidating cache files on these servers in which case the next call on this server will be passed back to the publisher. The simplest of the model in AEM world and also recommended by one of the Adobe is to let everything invalidate (set statlevelfile = 0 or 1). This design simplifies the page/component design as now we don’t have to figure out any inter-component dependencies.
While this is the simplest of the things to do, when we have to support such complex needs it calls for some sophistication in design and setups. I would recommend that is not the right way to go as it reduces the cache usage. We made sure that the site hierarchy is such that when a content is published we would never invalidate the entire site hierarchy and only relevant and contextual content is what gets evicted (invalidated in case of dispatcher).
AEM publishing layer which is the last layer in this layer cake, seems like something which should be simplest and all figured out. That’s not the case. This is where you can be hit most (and it will be below the belt). AEM’s architecture is designed to work in a specific way and if you deviate from it, you are bound to fall into this trap. There are 2 things you need to be aware of
- When we start writing components that are heavily dependent on queries it will eventually lead to system crumbling. You should be very careful with AEM Queries (which is dependent on underlying AEM’s Lucene implementation).
- This article tells us that we have about 4 layers of caching before anything should hit publisher. This means that the number of calls that should ever hit this layer is only a minuscule number. From here on, you need to establish how many calls your servers will receive in a second/minute. We have seen in cases where we have used search heavily AEM’s supported TPS takes a nose dive. I have instances across multiple projects where this number is lower then 5 transactions per second.
The answer is to build for some sort of an application cache which we used to do in a typical JEE application. This will solve this issues, assuming that content creation either manually by authors or via ingestion is limited which means the load we put on search can be reduced significantly. The caveat you should be aware of is that we are adding one more layer of cache which is difficult to mange and if you have cluster of publishers this is one layer which will have distributed cache across servers and can lead to old pages cached on dispatcher. The chances of that happening will increase as the number of calls coming into publishers increase or as the number of servers in the cluster increase.
In my next article I will explain how you can go about modeling the load your platform/site is expected to get and discuss about how the finer details of the architecture.