Architect Toolkit: RAID Analysis

In this article, I will share one of the essential tools that should be used as soon as a project starts: RAID, which stands for Risks, Assumptions, Issues and Dependencies.

In my experience, architecture-related risks and issues that surface late in a project are often due to a lack of proper analysis in the initial stage. You may argue that this depends on the scale or type of project. All right! Let’s think of this as a toolkit from an architectural perspective, for medium to enterprise-level projects in which you, as the system architect, need to ensure the health and alignment of the whole ecosystem whenever something changes.

Let’s look at it from another perspective. As a project or program manager or director, would you accept a project plan without an analysis of risks, assumptions, issues and dependencies? The answer is certainly no if you are doing your job. 🙂

Enough long-winded opinion. In the sections that follow, I promise to keep it short and sweet: I briefly describe the goal of each step, followed by a list of questions whose answers will give you and your project team a nice RAID analysis report.

Let’s read on.

Risks Analysis

The goal is to let stakeholders know the key areas of uncertainty and to allow the project team to develop risk mitigation plans.

Questions:

  • What risks could cause the project to be delayed or not delivered?
  • Are there software development skills that are not currently available in your area?
  • Are new technologies required that the company is not familiar with?
  • Are there any specific contracting needs?
  • Are there any scaling needs that the business is not willing to pay for?
  • Are there any lower-environment needs that will not be fulfilled?
  • Are there any unfunded testing efforts?
  • Are any significant business risks being introduced?

Assumption Analysis

Capturing assumptions before the project starts helps level-set people’s thinking about the architecture and serves as a reference for resolving issues when problems arise later.

Questions:

  • What assumptions are being made?
  • Do you assume that you can successfully develop some new capacity?
  • Do you assume that a certain group will do a particular part of their work?
  • Do you assume that certain refactoring will occur in an existing system?
  • Do you assume that certain integrations will be required or will explicitly not be supported?
  • Do you assume that some research and development needs to occur?

Issues Analysis

The key objective of this step is to give the project a sense of which areas of the architecture have not been resolved and need to be dealt with in the future.

Questions:

  • What areas of the architecture have not been resolved?
  • What areas of the architecture have not been finalized?
  • Are there areas of technology where you or your team have concerns or know of problems?
  • Are there contractual issues in play?
  • Has a key resource recently moved to another part of the company?
  • Is the delivery deadline overly aggressive?

Dependencies Analysis

Dependencies are anything that the architecture depends on, including items, projects and tasks. They need to be clearly stated and made visible to the executive staff, who can then help manage them for you, since it is in their interest.

Questions:

  • What projects do you depend on for your project to complete?
  • What licensing agreements do you depend on to provide the needed functionality?
  • What purchases or other procurement are needed?
  • What business arrangements are needed?
  • What hardware needs to be purchased or operationalized?
  • What infrastructure software needs to be operationalized?
  • Is software integration with specific tools or services required?
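If you want to keep the answers to the four question lists in one place, a small register is enough. The sketch below is only one possible shape, not a standard RAID format: the names `Category`, `RaidItem`, `RaidLog` and the `owner` field are my own assumptions for illustration, and the sample entries are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Category(Enum):
    # The four RAID sections, in the order used in this article.
    RISK = "Risks"
    ASSUMPTION = "Assumptions"
    ISSUE = "Issues"
    DEPENDENCY = "Dependencies"


@dataclass
class RaidItem:
    category: Category
    question: str               # the question that prompted the entry
    answer: str                 # what the team found
    owner: str = "unassigned"   # hypothetical field: who follows up


@dataclass
class RaidLog:
    items: List[RaidItem] = field(default_factory=list)

    def add(self, category: Category, question: str, answer: str,
            owner: str = "unassigned") -> None:
        self.items.append(RaidItem(category, question, answer, owner))

    def report(self) -> str:
        # Group entries under their section heading, keeping insertion order.
        lines = []
        for category in Category:
            lines.append(category.value)
            for item in self.items:
                if item.category is category:
                    lines.append(
                        f"  - {item.question} -> {item.answer} (owner: {item.owner})"
                    )
        return "\n".join(lines)


if __name__ == "__main__":
    log = RaidLog()
    # Illustrative entries only; replace them with your own answers.
    log.add(Category.RISK, "Are new technologies required?",
            "Team has no experience with the chosen message broker",
            owner="architect")
    log.add(Category.DEPENDENCY, "What hardware needs to be purchased?",
            "Two additional database hosts")
    print(log.report())
```

A spreadsheet works just as well; the point is that every answer is recorded under one of the four headings and has someone responsible for it.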

I hope you enjoyed it.

Until we meet again, happy designing and coding.

Great Questions on Scalability

The definition

Scalability is one of the biggest architectural concerns in modern software development. In technical terms, scalability enables a system to respond gracefully to the demands placed upon it. Storage I/O, database access, CPU utilization, memory utilization, app server farms and network utilization are the most common areas that require scalability attention.

The challenges

In my experience, when designing or even developing a scalable solution, it is difficult to predict the future demand on the system and the areas that will need optimization. Those insights come from real experience once the system is up and running in production and is being used and assessed by users.

It is arguable.

As architects who are responsible for scalable designs and solutions, we must plan for scalability as part of the development and delivery cycle. This can be achieved through chunking, testing and detailed monitoring to validate the system’s behavior.

The options

The two most common options for scalability are scale-up and scale-out. Scale-up means buying bigger hardware. Scale-out means having multiple sets of hardware that can respond to the same requests.

In my early career, scale-up was often the favorite choice because it provides full control and ownership of the hardware and, most importantly, it is usually budgeted. Not even virtualization, or the concept of VMs, was employed yet, since the technology was not yet popular. Then, after the cloud was introduced in late 2008, momentum shifted to the scale-out option, which is more cost-effective. Why? It simply allows you to start small and add system resources as the demand on the system’s capability increases over time.

The questions

Now we come to the most interesting part of this article: the areas to consider when designing and implementing scalable solutions. I like to ask questions because I often get different answers, which interest me and encourage me to ask more. So, here they are.

1. How many users (online and batch) will concurrently access the system?

2. How much data will the system be able to manage?

3. How many read/write operations per second does the data store need to handle?

4. What is the peak concurrent access to the system?

5. How much data can be cached to minimize the depth that requests need to travel within the system before being responded to?

  • Can data be cached outside the system in a content distribution network (CDN) to help keep traffic away from the site?
  • Is it worth caching?

6. Is data replication required for the system? How long is it acceptable for data synchronization to take?

7. How much logging and event capture does the system require to support operational needs, both now and for future performance analysis?

8. Are there areas of data contention?

9. Are there CPU-intensive operations?

10. How do you plan to measure usage of the system?

11. Do you plan to meter services to throttle excessive usage?

12. Do you have the ability to auto-provision additional servers to meet demand?

13. Can you schedule batch operations to occur at non-peak times?

I leave it to you, the readers, to decide which questions are most important for you.
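To make question 5 ("Is it worth caching?") a little more concrete, here is a rough back-of-the-envelope sketch. It assumes a very simple model: a request is answered either by the cache or by the origin, and a miss pays only the origin cost. The function names and all numbers are hypothetical; real figures have to come from measuring your own system.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of requests is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * origin_ms


def origin_requests_per_second(total_rps: float, hit_ratio: float) -> float:
    """Traffic that still reaches the system behind the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


if __name__ == "__main__":
    # Hypothetical numbers: 10 ms from cache, 100 ms from the origin.
    for ratio in (0.0, 0.5, 0.75):
        print(ratio,
              effective_latency_ms(ratio, cache_ms=10.0, origin_ms=100.0),
              origin_requests_per_second(1000.0, ratio))
```

If the achievable hit ratio is low, or the origin is already fast, the extra moving part may not be worth it, which is exactly what the question is asking.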

The practice

For me, it is important to set up a set of rules and alerts so that key personnel are notified when certain thresholds related to system performance are reached. For example, an operational warning for the operations team is triggered when a system resource reaches 80% utilization. If utilization goes over 90%, an urgent notification is needed, and action must be taken to resolve the problem. I love the idea of auto-provisioning based on system usage: it is fully automated and greatly improves system performance. Of course, a rule for decommissioning underutilized VMs or instances should also be set.
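The rules above can be sketched in a few lines. The 80% and 90% thresholds come from the example in the text; everything else is an assumption for illustration: the function names, the 20% scale-in threshold, the one-instance-at-a-time step and the instance limits. A real setup would use the alerting and auto-scaling features of your monitoring or cloud platform rather than hand-written code.

```python
from typing import Optional

WARNING_THRESHOLD = 80.0    # percent utilization, from the example above
URGENT_THRESHOLD = 90.0     # percent utilization, from the example above
SCALE_IN_THRESHOLD = 20.0   # assumption: below this an instance counts as underutilized


def classify_utilization(percent: float) -> Optional[str]:
    """Return the alert level for a utilization reading, or None if no alert is needed."""
    if percent > URGENT_THRESHOLD:
        return "urgent"
    if percent >= WARNING_THRESHOLD:
        return "warning"
    return None


def desired_instances(current: int, percent: float,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Decide how many instances to run next, changing by one instance at a time."""
    if percent >= WARNING_THRESHOLD:
        return min(current + 1, maximum)   # scale out under pressure
    if percent < SCALE_IN_THRESHOLD:
        return max(current - 1, minimum)   # remove an underutilized instance
    return current


if __name__ == "__main__":
    for reading in (50.0, 80.0, 95.0, 10.0):
        print(reading, classify_utilization(reading), desired_instances(3, reading))
```

Keeping the scale-in threshold well below the scale-out threshold is a deliberate choice in this sketch: it avoids adding and removing instances back and forth when utilization hovers around a single value.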

The takeaway

The key to scalability is to test and validate our assumptions about system behavior. That means driving the system past its limits to the breaking point so that we can find out how it fails under load.
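As a rough illustration of driving a system to its breaking point, the sketch below raises the load step by step until a latency limit is exceeded. To keep it runnable on its own, the "system" is a toy model (`simulated_latency_ms`) whose latency grows sharply as load approaches an assumed capacity; in practice you would replace it with a function that sends real traffic through a load-testing tool and measures the response. All names, the capacity and the latency limit are assumptions for illustration.

```python
import math
from typing import Callable, Optional, Tuple


def simulated_latency_ms(rps: float, capacity_rps: float = 500.0,
                         base_ms: float = 20.0) -> float:
    """Toy stand-in for a real measurement: latency rises as load nears capacity."""
    if rps >= capacity_rps:
        return math.inf  # the modelled system no longer keeps up at all
    return base_ms / (1.0 - rps / capacity_rps)


def find_breaking_point(measure: Callable[[float], float],
                        start_rps: int = 50, step_rps: int = 50,
                        max_rps: int = 2000,
                        limit_ms: float = 150.0) -> Tuple[Optional[int], Optional[int]]:
    """Step the load up until latency exceeds limit_ms.

    Returns (last load that passed, first load that failed); either may be None.
    """
    last_ok = None
    rps = start_rps
    while rps <= max_rps:
        if measure(rps) > limit_ms:
            return last_ok, rps
        last_ok = rps
        rps += step_rps
    return last_ok, None  # never failed within the tested range


if __name__ == "__main__":
    passed, failed = find_breaking_point(simulated_latency_ms)
    print(f"last passing load: {passed} rps, first failing load: {failed} rps")
```

Knowing both numbers is useful: the last passing load tells you how much headroom you have, and the first failing load is where to look at how the system actually fails.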