The Language of Risk « The IT Risk Manager | Kapil | Scratch Pad | Java | Architecture | Design | Open Source
The language is important because it helps you think about the problem in the right way.
This statement stuck a chord reminding me of an instance not so long ago . She asked me – “why did my project had just a couple of risks?” . She went on to probe us (my PM and me) to understand if we were not thinking of the risk.
At that time, I did not answer her in the way the Author summarizes it. We presumed that we had a functional scope documented and hence the risk of not being able to deliver what was required did not exist.
I was a young budding developers when I was first introduced to the concept of Cache. My Senior Architect then told me
Cache is a component that will magically store data so that future requests of that same data will not be to the Remote Server, and hence it will improve the performance of our application significantly faster
We were working on a website which was to integrate with an existing application through the use of APIs and for purposes of closer integration we had decided to store the data in form of XML as artifacts in this tool as tracker items – no RDBMS. It was like running two applications joint at the hip.
I had gathered enormous experience working with SourceForge platform APIs as I had integrated them with Ms-Excel and now we were going to build an entire application on ALM space using the same APIs. Our biggest challenge was going to be performance because of the use of APIs against a remote server sitting in a different geography. And Cache was going to be instrumental in helping us solve that problem.
We had just decided to implement Write-through cache that helped us build a system which was significantly faster that anyone could have thought about. And since then this is one pattern that I have come to use (if possible) whenever I am working with diversified systems. Surprisingly, the percentage of people who have used cache have never heard of this pattern of cache implementation when the underlying issues with the systems can be solved using this pattern (I am still mystified as to why not?).
Before we dig deep, I want to run through some definitions that we are going to use during the course of this article:
- Cache hit refers to a request to the data and finding it in the cache
- Cache miss refers to a request to the data and not finding it in the cache
- Dirty refers to a cached data if it has not the same as the original data
- Lazy refers to an action if it not performed real time, but only when it is required
In its simplest form, a cache implementation is going to something similar to the image below.
As you can already see that this is just one part of the cache implementation and implementing this workflow alone would mean that once I have a data in the cache it would only always fetch the data from the local cache location and go back to the original data source only if the cache data is Dirty. And there are several ways we have to mark the data as dirty – it can be an action we configure in our system like – “If we are update records in the data source, we try to find the key and mark it dirty”. Another way is to decide a time after which the the cache should be expired automatically.
While, this approach is simple enough, this does presents a unique “problem” (and it may not be a problem for everyone). Lets revisit the reason for which we decided to implement cache.
Cache is a component that will magically store data so that future requests of that same data will not be to the Remote Server, and hence it will improve the performance of our application significantly faster.
This implies that “Caching is a mechanism that is faster when compared to our data source when it comes to data loading”. I have observed cache implementations against RDBMS sitting next to a Application Server, which means that loading data from the RDBMS is still faster and hence there is really no need to improvise on the cache flow as defined earlier.
In our case, we were dealing with an external system from where we fetched XML over HTTPS and then converted the XML to an object. This entire process was time consuming – 3 seconds for one object and there was nothing we could do about reducing the transportation time. It also meant that the classic workflow for us would not work either, especially if a user would update a specific record, and if the cache would be marked dirty, the next request would mean a significant delay time.
We improvised and it was then we used the write-through cache logic that allowed us to manage the data in the cache in real-time with the data-source. The workflow was changed to the one below:
As simple as it may look it was not so. Lets see first what we did. We added a hook to the code which was required to save the data in the DataSource to do two things:
- Find the cache entry and mark it dirty if it was a hit and;
- Update the cache after a successful update to the data store
This allowed us to keep the cache in sync with the DataSource and hence not requiring to spend additional time to load the data back again.
But, as I feel that every solution will bring its challenges, this one had as well especially when we decided to scale and move over to a cluster of application server. This meant that a local cache would simply not work because it was meant that an update to the DataSource meant that the cache on other application servers was out dated and users would not get the most latest data making it impossible to work. We did use version to records to manage the concurrency checks, and not keeping the cache in sync meant that other users will see their updates fail because of that very fail-safe. Eventually, we had to find a cache that can be scaled in a cluster which only made things more complicated.
A pattern that I learn in my development adolescence, this had proved to be a powerful technique to build solutions that would work fine in a given scenario.
I came across this article which I just want to share with others – I found this a good read.
I heard about OSGI sometime early last year, but I did not care about it – it meant start thinking about a new way of development and deployment (thats what I heard from my friends) and I did not want to learn something else when Spring worked great for me. And, my colleagues who spoke about OSGI did not do a good job of advocating OSGI. Last month, I came across Adobe Day as a potential platform for a project implementation. Day was amongst some of the CMS platforms that I was evaluating, namely – SDL Tridion, Oracle Stellant and Interwoven.
It was Adobe Day that introduced me to OSGI and during one of the webinars, I was introduced to OSGI using a simple yet powerful graphic (see below).
And from that moment on, I was simply hooked on. I have been a big fan of Java and its deployment frameworks like Maven – but essentially using those frameworks and tools also meant that sooner or later I would be dealing with various versions of same library (ehCache, Log4j being the most common) and when I wen to deployment on JBoss or Apache Tomcat, more often than not I would have to tweak my project dependencies to or servers to ensure that those “dependencies” are resolved appropriately.
Suddenly, OSGI seems to be the answer to everything (or almost everything). I would be able to create and deploy components and then choose to include them as needed; I did not have to worry about backward compatibility or which service to deploy – opportunities were endless.
So today, I am going to start off my journey to read more about OSGI and explore some common available containers like Equinox (used by Eclipse) and Apache Felix (used by Adobe Day). And I hope that while I learn more I am able to share my thoughts and some of the best practices around the same.
- OSGI: The new Toy (scratchpad101.com)
Recently, when I decided to make my projects available to the Java Community under Open Source license, I did a lot of things like Hosting my code on Google Code, using Maven to build and test the code and also making sure that I have a project site up and running. However, even after all of this, there was one thing that was missing which was the most essential – “Hosting the project files on a repository so that users who may find these utilities useful may use it”.
I found out that Sonatype would allow me to deploy my code to Maven Central Repository. However, after 3 days of trial and error, I have finally been able to release a version of my code to the Central Repo. I am writing this post so that if you ever want to do this, you will not have to go through the same pain as I did.
Read the full article here (http://scratchpad101.com/2011/09/08/project-files-maven-central/).
EAMSteps is a framework that makes Unit Testing easier because of its following principles:
- External data store for test data like excel spreadsheet
- Automated assertions using co-related inputs and outputs
- Minimal lines of code to get started
It all started in 2009, when I decided that in long term I would need to have a framework in place that will help me meet my objectives and I started to look for any existing tools. When I did not find any framework that would help me, I decided that it was time to write one and here after 2 years I have something breathing.
You can read more about this here.
In my case, it was not an invention, but it sure was a learning experience as I started exploring maven-site-plugin. This plugin generates various reports for a maven project including those that have been configured in the pom.xml file. However, it was not a straight forward implementation because I had some additional requirements like placing Google Analytics Code and placing custom Project information on my site. In this post, I will make an attempt to explain some of the things I did to help me generate my site.
Read the complete article here (http://scratchpad101.com/2011/08/30/publishing-project-maven/).