http://www.springsource.com/newsevents/vmware-acquire-springsource
‘The Null’ Nuisance
3 February, 2009
While working on enhancements to a project already in production, I had a very interesting conversation. Some brief background: the core architecture is in place and we need to build new functionality. Of course, refactoring is being done along the way. In one specific scenario, I got into a conversation with a fellow architect about the usage of “nulls” and “null checks”. The theme of the conversation was “Should a method return a null or an initialized instance of the class?” Let me take an example:
There is a service method that connects to a database and loads the records for all users in the system. In the DAO we load the record set from the database and convert it to an ArrayList of DTOs (value objects). Sample code to map a DTO generally looks like this:
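List<User> users = null;
for (int index = 0; index < recordSet.size(); index++)
{
    if (users == null)
    {
        // the list is created only when at least one record exists
        users = new ArrayList<User>();
    }
    User user = new User();
    user.setFirstName(recordSet.getString("firstName"));
    user.setMiddleName(recordSet.getString("middleName"));
    user.setLastName(recordSet.getString("lastName"));
    users.add(user);
}
// null when the record set is empty
return users;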
I had an objection to this style of coding, for a simple reason: on the front end I had to add a null check that was unnecessary. The other classes that consumed the results had to write the following code:
List<User> users = loadAll();
if (users != null)
{
    for (int index = 0; index < users.size(); index++)
    {
        if (users.get(index).getMiddleName() != null)
        {
            // show the middle name
        }
        else
        {
            // do not show the middle name
        }
    }
}
// else: nothing to do, the list itself was null
Now, consider a scenario with complex objects that have lists all the way down the hierarchy. It means that before we access a property, we have to provide a null check. Soon, this “do nothing” null check becomes a headache. Someone has coded a null propagation somewhere and we cannot trace it, so we feel the easiest way out is to put in a null check. In my example, I would have my JSP strewn with null checks cluttering my code.
Unfortunately, this does not solve the real problem. A simple solution is to identify the code where a null reference can be introduced and handle it there. The rest of the code will be happier for it.
More importantly, let us pause for a minute and ask ourselves: is there something that the application can do with an object referring to nothing? Let us go back to my example and see how the application is going to use the user list. We need the list of users to display a report for the users listed. If no users are returned, the user should see “No users exist”. The UI is not sure what represents “no users”: a null object, an initialized object with size 0, or an exception. This means that the developer consuming the method has to write multiple conditions for a simple check.
We can do one of the following:
1. Throwing a business exception when business logic is violated can be an effective strategy. However, it largely depends on how you use exceptions in your applications. Remember, raising an exception is an expensive operation.
2. Alternatively, you can provide an empty implementation of the object that can do something useful, like logging an info or error message to the log system.
I am not a huge fan of throwing an exception, and because it is also expensive, I am exploring the second option. This changes my code to:
// start with an initialized list, so the method never returns null
List<User> users = new ArrayList<User>();
for (int index = 0; index < recordSet.size(); index++)
{
    User user = new User();
    user.setFirstName(recordSet.getString("firstName"));
    if (recordSet.getString("middleName") == null) // You can also use StringUtils from Apache Commons Lang
    {
        user.setMiddleName("");
    }
    else
    {
        user.setMiddleName(recordSet.getString("middleName"));
    }
    user.setLastName(recordSet.getString("lastName"));
    users.add(user);
}
// (the alternative, option 1, would throw a business exception here when no users were found)
// we always return an initialized list; it is simply empty when there are no records
return users;
This will change the UI code to:
List<User> users = loadAll();
// code to show the middle name; if it does not exist, it will show up as blank.
The most evident benefit is: “No more if statements for null checks on the UI. The check is pushed down the call hierarchy. Hence, multiple methods calling the same method do not have to worry about nulls.”
The most important question is “Is this approach safe?” Nothing ever is. There is no reason for someone to code incorrectly. Of course, we cannot rely on external libraries never to return null references, but when you write your own code, following this approach can lead to a less cluttered application and better control over the source code.
Remember: the approach is not always necessary; just ensure that a null reference is never catastrophic.
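As a minimal, self-contained sketch of the approach above, the DAO below always hands back an initialized list and normalizes a missing middle name at the one place where a null could enter. The `User` constructor, the `UserDao` class, the `Map`-based rows standing in for the record set, and the `displayName` helper are all assumptions made for illustration; they are not the project's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: User, UserDao and the row format are illustrative only.
class User {
    private final String firstName;
    private final String middleName;
    private final String lastName;

    User(String firstName, String middleName, String lastName) {
        this.firstName = firstName;
        // Normalize once, at the point where a null could enter the system.
        this.middleName = (middleName == null) ? "" : middleName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getMiddleName() { return middleName; }
    String getLastName() { return lastName; }
}

class UserDao {
    // Maps raw rows to users; returns an empty list, never null, when there are no rows.
    static List<User> loadAll(List<Map<String, String>> rows) {
        List<User> users = new ArrayList<User>();
        if (rows == null) {
            return users;
        }
        for (Map<String, String> row : rows) {
            users.add(new User(row.get("firstName"), row.get("middleName"), row.get("lastName")));
        }
        return users;
    }

    // Consumer-side code: no null checks are needed to build a display name.
    static String displayName(User user) {
        String middle = user.getMiddleName().isEmpty() ? "" : user.getMiddleName() + " ";
        return user.getFirstName() + " " + middle + user.getLastName();
    }
}
```

The caller can loop over the result directly and treat an empty list as “No users exist”, without first asking whether the list itself is null.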
ScribeFire
3 February, 2009
I have been facing issues using Firefox with WordPress lately and decided to check out alternatives. ScribeFire, a Firefox plugin, seems interesting. Let's see what it has to offer.
How to write unmaintainable code :: Humour
3 February, 2009
Read this famous article here: http://www.freevbcode.com/ShowCode.Asp?ID=2547
Unit Testing
23 January, 2009
Unit testing is a way of testing a small unit of functionality. A unit is the smallest testable part of a program, usually a function or procedure that is part of a class. Ideally, a unit test is independent of any other unit test.
Benefits
The primary goal of a unit test is to isolate the smallest unit of a program and prove that it works correctly. A unit test can be seen as a requirement that a function should satisfy. This is the primary reason it has several benefits; one of the key benefits is finding bugs early in development.
Facilitates Change
Unit testing allows a developer to refactor code at a later stage of development or during maintenance. It provides an umbrella ensuring that all the modules still work correctly and as expected after the change. It is a great way of identifying regression errors. The correct process is to write a unit test case for all methods and functions, so that if a change has introduced a defect it can be identified quickly.
A good unit test suite ensures that all the paths in the code are covered including if-conditions and loops.
Simplifies Integration
Unit testing helps to eliminate defects and ambiguity in the functions themselves. This leads to a much simpler integration approach when we want to test all the functions together. If all the methods do what they are expected to do, integration testing automatically becomes easier.
Documentation
In the last few years, how we document code has been changing. The community has been moving away from writing Word documents towards Javadoc and inline comments. Unit tests can also serve as documentation for the system. Developers looking to ramp up on an application can use the unit tests to gain a basic understanding of it.
The unit tests for a method should encompass all testing scenarios. While a positive test case provides information on how to use the method, a negative test case shows how not to use the method and what to expect when you do.
While a normal textual document has a high likelihood of drifting away from the actual implementation, unit tests remain aligned with the implementation. However, developers should not rely solely on unit tests for documentation.
Design
If you use Test Driven Development to develop software, unit tests can be used to provide a formal design. Each unit test can be viewed as a design specification providing information on interfaces, classes, methods, return types and error conditions. Let us look at the code sample below:
The test case specifies that there has to be a utility class called MathUtil with a static method called divide. This method takes two parameters of type int and returns a value of type double. In addition, you can expect an exception from the method.
import junit.framework.TestCase;

public class TestMathUtil extends TestCase
{
    public void testDivide()
    {
        double result = MathUtil.divide(10, 2);
        assertEquals(5.0, result, 0.0);

        try
        {
            MathUtil.divide(10, 0);
            fail("We expect an exception");
        }
        catch (Exception ex)
        {
            assertEquals("Divide by zero", ex.getMessage());
        }
    }
}
You can clearly see how, by looking at this test case, a developer (consumer) of the MathUtil class understands the requirements. One significant advantage of using unit tests as a design element over UML-based design is that they ensure the implementation adheres to the design. A developer reading a UML diagram could name the class MathUtilities, which would instantly put the implementation out of step with the design.
However, now we have code generation tools available for all major languages that eliminate such inconsistencies.
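For illustration, one implementation that would satisfy the test case above might look like the sketch below. The post does not show MathUtil itself, so the method body, the choice of `ArithmeticException`, and the private constructor are assumptions inferred from what the test expects.

```java
// Hypothetical implementation inferred from the test case; not the author's code.
final class MathUtil {
    private MathUtil() {
        // Utility class: no instances.
    }

    // Divides two ints and returns the quotient as a double.
    static double divide(int dividend, int divisor) {
        if (divisor == 0) {
            // The message matches the one the test asserts on.
            throw new ArithmeticException("Divide by zero");
        }
        return (double) dividend / divisor;
    }
}
```

The explicit check matters: without it, dividing the cast double by zero would quietly return infinity instead of raising the error the test specifies.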
Limitations of Unit Testing
We cannot expect to catch all errors during testing; it is impossible to evaluate every possible path for all but trivial scenarios. This applies to unit testing as much as to other forms of testing.
Unit testing, by definition, tests only units of functionality and does not guarantee catching integration errors across multiple units; it only facilitates integration.
Effort needed
Software testing is a combinatorial problem. For every decision, we need at least two test cases. If you have complex conditional logic, the number of unit test cases can grow exponentially. As a result, there will be times when the code written for test cases is much more than the code itself.
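To make the growth concrete, the sketch below enumerates every true/false combination for a given number of independent decisions. The `DecisionCombinations` class is a hypothetical helper written for this illustration, and it assumes each decision is a simple boolean condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the input combinations a fully exhaustive test suite would need.
final class DecisionCombinations {
    private DecisionCombinations() {
    }

    // Returns every true/false combination for the given number of independent decisions.
    static List<boolean[]> enumerate(int decisions) {
        List<boolean[]> combos = new ArrayList<boolean[]>();
        int total = 1 << decisions; // 2^decisions combinations
        for (int mask = 0; mask < total; mask++) {
            boolean[] combo = new boolean[decisions];
            for (int bit = 0; bit < decisions; bit++) {
                combo[bit] = (mask & (1 << bit)) != 0;
            }
            combos.add(combo);
        }
        return combos;
    }
}
```

Three decisions give 8 combinations and ten give 1024, which is why covering every path quickly becomes impractical.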
Discipline
To get the most from unit testing, a team needs a rigorous sense of discipline throughout the software development lifecycle. It is essential to keep a record of failing test cases and to keep a very close eye on when test cases fail. There should be a process in place that ensures test case failures are reviewed every day and addressed actively.
Using a Continuous Integration tool to publish test case results after each build cycle is a common practice in the software development lifecycle.
A team should also consider using version control for the development process so that they can compare baseline versions for changes in code and identify regression scenarios.
Writing unit test is an art
It is very easy to get overwhelmed when starting to write unit test cases. The best way is to create unit test cases for new code. Although it is possible to create unit test cases for existing code, it is not worth the effort to begin with. Start with the new code added to the application, get familiar with the process and then revisit the decision to write test cases for existing code.
Mastering the technique of writing unit test cases is 95% mental and 5% technical. You have to be patient with the Java compiler. When creating a new test case, you should assume that the class or the method exists. Put up with the various syntax errors that are displayed. When you write your class, things will fall into place.
Getting developers to think like testers will be your greatest challenge. Many of the projects I have seen fail at unit testing failed for this reason.
Walking the fine line
Very often it is not clear when a test case is actually a functional test case. To be honest, I am not sure I know where the line is myself. However, I try to stick to the following guidelines. A unit test case might be a functional test case:
- If a unit test case crosses class boundaries
- If a unit test case is becoming complicated
- If a unit test case becomes fragile
- If a unit test case is harder to write than the code itself
- If a unit test case has lots of asserts
Remember, there are no rules. If you find another approach that works best for you, feel free to use it. Be careful to document it so that the entire team can use it consistently.
Top 25 Most Dangerous Programming Errors
14 January, 2009
The 2009 CWE/SANS Top 25 Most Dangerous Programming Errors is a list of the most significant programming errors that can lead to serious software vulnerabilities. They occur frequently, are often easy to find, and easy to exploit. They are dangerous because they will frequently allow attackers to completely take over the software, steal data, or prevent the software from working at all.
Read the complete article here: http://cwe.mitre.org/top25/#CWE-319
Extract, Transform, Load
14 January, 2009
ETL in computing terminology refers to the Extract, Transform and Load process. It is mostly associated with data warehousing projects. An ETL framework involves the following three steps:
1. Extract: This is the process of reading the data from a data source, which could be a database or a file dump from another system.
2. Transform: This step involves massaging the data into an appropriate form. This may require trimming down the data or aggregating data from multiple data sources.
3. Load: This is the final step, which uploads the data into another data source such as a database, or generates a flat file.
Extract
This first part of the process involves reading data from various data sources. The data itself could be in different formats. Some very commonly used data sources are databases and flat files. In some cases the data sources may also include non-relational data sources.
Transform
This next step involves applying various rules to the data set and preparing the data for the Load step. Some data sets may need very little or no transformation, while other data sources may need very complex transformations to meet the business requirements. Some of the common operations that may be needed here are:
- Filtering the data set for a subset of records
- Generating new values based on existing columns (using pre-defined formulas)
- Splitting of data set into different tables
- Aggregating data from various data sets
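A few of the operations listed above can be sketched in plain Java as follows. The `Sale` record, its fields, and the rules in `SaleTransforms` are hypothetical and exist only to illustrate filtering, deriving a value from existing columns, and aggregating.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type used only to illustrate the operations listed above.
final class Sale {
    final String region;
    final int quantity;
    final double unitPrice;

    Sale(String region, int quantity, double unitPrice) {
        this.region = region;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }

    // Generating a new value from existing columns using a pre-defined formula.
    double total() {
        return quantity * unitPrice;
    }
}

final class SaleTransforms {
    private SaleTransforms() {
    }

    // Filtering: keep only the records with a positive quantity.
    static List<Sale> filterValid(List<Sale> sales) {
        List<Sale> valid = new ArrayList<Sale>();
        for (Sale sale : sales) {
            if (sale.quantity > 0) {
                valid.add(sale);
            }
        }
        return valid;
    }

    // Aggregating: sum the derived totals per region, keeping first-seen region order.
    static Map<String, Double> totalByRegion(List<Sale> sales) {
        Map<String, Double> totals = new LinkedHashMap<String, Double>();
        for (Sale sale : sales) {
            Double current = totals.get(sale.region);
            totals.put(sale.region, (current == null ? 0.0 : current) + sale.total());
        }
        return totals;
    }
}
```

In a real ETL job these rules would usually be configurable rather than hard-coded, but the shape of the work is the same.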
Load
The load phase loads the data into the target. This phase can do various things depending on the business needs. Sometimes the load uploads a fresh data set on an incremental basis. In other cases it may need to update an existing data set.
ETL Flow
1. Cycle Initiation: This is the very first step in the ETL process, where you collect all the reference data and validate that the settings provided are correct. This is the initialization phase. If there are errors during initialization, the ETL process fails.
2. Extract: In this step, you read the data from the data source.
3. Validate: Here the data is validated against a pre-defined business rule set.
4. Transform: Apply any transformation rules.
5. Stage: This can be categorized as a subset of the transform stage. A business requirement may need us to load the data into a temporary space, for example when we need to aggregate data from more than one data source. In that case, we use a temporary database to hold the data sets before we can apply the transformation.
6. Load: Load the data into the final data source.
7. Cleanup: Clean up any temporary files and databases.
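The flow above can be sketched as a small skeleton that runs the seven steps in order. The `EtlCycle` class and its method names are hypothetical, each step only records its name, and running the cleanup in a `finally` block is an assumption about how failures should be handled rather than something the flow itself prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton: each step only records its name so the ordering is visible.
class EtlCycle {
    private final List<String> log = new ArrayList<String>();

    // Runs one cycle in the order listed above.
    void run() {
        initialize();      // 1. a failure here aborts the cycle before any data is touched
        try {
            extract();     // 2
            validate();    // 3
            transform();   // 4
            stage();       // 5
            load();        // 6
        } finally {
            cleanup();     // 7. assumption: temporary data is removed even when a step fails
        }
    }

    void initialize() { log.add("initialize"); }
    void extract() { log.add("extract"); }
    void validate() { log.add("validate"); }
    void transform() { log.add("transform"); }
    void stage() { log.add("stage"); }
    void load() { log.add("load"); }
    void cleanup() { log.add("cleanup"); }

    // Returns the names of the steps that ran, in order.
    List<String> getLog() {
        return log;
    }
}
```

A concrete job would override the step methods with real work; the skeleton only fixes the order and the cleanup guarantee.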
Challenges
Some of the common challenges are:
- An ETL process involves considerable complexity, and significant problems can come up with an incorrectly designed solution
- Data sets in production can be vastly different from what the developers of the system use. This can lead to huge performance bottlenecks
- These types of solutions grow horizontally, by adding more data sources to extract from or load into. The solutions should be designed to support the addition of such data sources with minimal effort
Performance
This is the biggest challenge that any ETL solution has to struggle with. Most often the slowest part of the ETL process is the load phase, where we have to take care of the various database structures, the integrity of the records and the indexes. The transform phase can also lead to performance bottlenecks if extensive data transformations are needed.
Best Practices
Layered Architecture Design
Core Layer: This is the primary layer which holds all the business logic or core processing like Extract, Transform and Load
Job Management Layer: This layer should take care of scheduling jobs, managing queues and other operational activities like activation of tasks, alerts etc
Auditing and Error Handling: This layer should be dedicated to auditing and to logging entries to log files or a database. It should also provide error handling support
Utilities: A common layer to provide common functionality across layers
Core Layer
This is the most important layer and holds the most logic. As a good practice, this layer should be divided into three sections, which should be controlled by common processing logic. Some common components of this layer can be:
1. Controller: These hold the processing logic which coordinates the entire ETL lifecycle. They hold the details of the various utilities and invoke them as needed.
2. Readers: These hold logic to read data from data sources like databases and flat files. Their responsibility should be to load the data set and make it available for next phase.
3. Transformer: These components hold the logic for applying transformations to the data. Transformations can be business validations, mappings or other logic.
4. Mappers: These hold the mapping for a transformation. The controller should be aware of the mappings that are to be applied to the loaded data. In most common cases, the framework should make interfaces available to the consumers of the framework to define mappings. The framework (via the controller) should consume those mappings.
5. Validation: If validations need to be applied, these components should be defined individually. Again, in most cases, the framework should make these available as interfaces, and concrete implementations would be provided by consumers of the framework.
6. Loaders: These hold logic to load the data into a data source like databases and flat files.
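One way to picture how these components fit together is the sketch below, where the framework owns the controller and consumers plug in the rest through interfaces. All of the names (`RecordReader`, `RecordValidator`, `RecordMapper`, `RecordLoader`, `EtlController`) are hypothetical, and the transformer and mapper roles are collapsed into a single mapping interface to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces that consumers of the framework would implement.
interface RecordReader<S> {
    List<S> read();
}

interface RecordValidator<S> {
    boolean isValid(S record);
}

interface RecordMapper<S, T> {
    T map(S record);
}

interface RecordLoader<T> {
    void load(List<T> records);
}

// The controller coordinates the lifecycle and knows nothing about concrete sources or targets.
final class EtlController<S, T> {
    private final RecordReader<S> reader;
    private final RecordValidator<S> validator;
    private final RecordMapper<S, T> mapper;
    private final RecordLoader<T> loader;

    EtlController(RecordReader<S> reader, RecordValidator<S> validator,
                  RecordMapper<S, T> mapper, RecordLoader<T> loader) {
        this.reader = reader;
        this.validator = validator;
        this.mapper = mapper;
        this.loader = loader;
    }

    // Reads, validates, maps and loads; returns the number of records loaded.
    int run() {
        List<T> mapped = new ArrayList<T>();
        for (S record : reader.read()) {
            if (validator.isValid(record)) {
                mapped.add(mapper.map(record));
            }
        }
        loader.load(mapped);
        return mapped.size();
    }
}
```

Because the controller depends only on the interfaces, adding a new data source or target means writing a new reader or loader, not changing the core layer.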
I am not a subject matter expert on ETL, but I hope this helps.
2009
9 January, 2009
A very happy new year to all of you. Thanks for making Scratch Pad a success in 2008. Last year when I started this blog I could not have imagined bringing it to where it is today. For some time, I have been thinking about where I want to take this blog. If you think about it, it is no different from where I want to focus on technology.
1. Utilities Java Application Framework (code named ujaf) - Across projects, there are many things that I have been using over and over again. As these Java files are created in a project, they carry along their project-specific package names. This utility aims to bring all such utilities into an open source project under the sourceforge.net umbrella.
2. Flex controls - I started to write custom controls when I started this blog. For the last six months I have been working away from Flex. On Jan 10, I am kicking off a new project in Flex (I am very excited). With this, I will package any custom controls I need as a library and make it available as open source.
3. Spring - This year will find me focusing a lot on the Spring technology stack and using the various projects in my utilities.
4. Portal - I have been working on portal technologies for the last six months. This year will see me get deeper into portal technology and integrate it with RIA.
5. Converting my blog - By now this should have been done. It seems this has to wait for some time. By the middle of Q2 2009 this should be through. I hope you like the change.
There may be more things; I am unsure as of now. As I have been asking, if there are things on your mind, share them with me; they could be of shared interest.
God bless and have a great 2009.
We are still here
9 January, 2009
As much as I would like to move over, I have to be here still. I ran into roadblocks with hosting and it seems like this will take some time. Until then, let's continue …!
Bye Bye
7 January, 2009
This is going to be my penultimate entry. Not that I plan to stop blogging. I am moving to a hosted solution. The current setup is leading me to compromises, and I feel that I am unable to bring out the quality I want in my posts, such as posting PDFs for code.
This blog will go down during this week and come back in its new avatar. Catch you all there.
Posted by Kapil Viren Ahuja