Quantify Software Characteristics

To measure is to know, and knowing increases the odds of success.

Motivation

An architect’s job is largely to help the system have these desired qualities (fast, responsive, extensible), and to balance the inevitable conflicts and inconsistencies between them.
Without objective criteria, architects are at the mercy of capricious users (“no, I won’t accept it, still not fast enough”) and of obsessive programmers (“no, I won’t release it, still not fast enough”).
If these qualities are not quantified, then there is no basis for acceptance of the system by its users, valuable guidance is stolen from its builders as they work, and the vision is blurred for those architecting it.

Questions to ask

How many? In what period? How often? How soon? Increasing or decreasing? At what rate?
The answers to these questions should be in the business case for the system. If they are not, then the architect’s job is to get them from the business owner.
The next time someone tells you that a system needs to be “scalable”, ask that person where the new users are going to come from and why. Ask how many and by when. Reject “lots” and “soon” as answers.

Techniques

Uncertain quantitative criteria must be given as a range: the least, the nominal, and the most.
- If the range cannot be given, then the required behavior is not understood.
- Finding these ranges and checking against them is a time-consuming and expensive business.
- If no one cares enough to pay for performance trials, then most likely performance doesn’t matter. You are then free to focus your architecture efforts on aspects of the system that are worth paying for.
As the architecture unfolds, it can be checked against these criteria to see if it is still in tolerance. As the performance against some criteria drifts over time, valuable feedback is obtained.
Example of a good requirement:
- “Must respond to user input in no more than 1,500 ms. Under normal load (defined as …), the average response time must be between 750 and 1,250 ms. Response times less than 500 ms can’t be distinguished by the user, so we won’t pay to go below that.”
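A requirement stated this way can be checked mechanically. Here is a minimal sketch in Python, assuming the measurements were taken under the "normal load" the requirement defines. The function and constant names are my own, and treating an average below 750 ms as out of range is a literal reading of the requirement, not something the text insists on.

```python
from statistics import mean

# Thresholds copied from the example requirement above.
MAX_RESPONSE_MS = 1500
NORMAL_AVG_RANGE_MS = (750, 1250)


def check_response_times(samples_ms):
    """Return a list of violations for response times measured under normal load."""
    violations = []

    # Hard ceiling: no single response may exceed the maximum.
    worst = max(samples_ms)
    if worst > MAX_RESPONSE_MS:
        violations.append(f"slowest response {worst} ms exceeds {MAX_RESPONSE_MS} ms")

    # Nominal band: the average must stay inside the agreed range.
    average = mean(samples_ms)
    low, high = NORMAL_AVG_RANGE_MS
    if not low <= average <= high:
        violations.append(f"average {average:.0f} ms is outside {low}-{high} ms")

    return violations
```

An empty list means the measurements are still in tolerance; anything else is the kind of feedback worth tracking as the architecture unfolds.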

Clean Coder Summary

The Clean Coder by Robert C. Martin (Uncle Bob) is one of the classic books that every developer should read. Its 14 chapters provide guiding principles for becoming an outstanding developer.

This summary reflects only what stood out to me, so it works best as a quick reference after you have read the book at least once. Please read the book first to get the most out of it.

Here we go through the 14 chapters. I hope you enjoy it.

Chapter 1: Professionalism

Being a professional means taking full responsibility for one’s actions.
- “What would happen if you allowed a bug to slip through a module, and it cost your company $10,000?
- The non-professional would shrug his shoulders, say ‘stuff happens’, and start writing the next module.
- The professional would write the company a check for $10,000!”
- [real disaster story about field-failure of a routine each test of which took hours]
The first rule is to do no harm to the function or the structure of the software.
- You will always make occasional mistakes, but you must learn from each one, promptly.
You should be certain about all code you release and firmly expect QA to find nothing wrong with it.
- Test it. Test it again.
- Automate your tests.
- Demand 100% coverage.
- Design your code to be easy to test.
You should follow the Boy Scout rule and always leave a module a little cleaner than you found it so that it becomes easier to change over time, not harder.
- Suitable automated tests free you from the fear of changing the code, and changing it continually keeps it easy to change.
Your career is your responsibility, not your boss’s or your employer’s.
- Spend 20 hours a week beyond your normal work improving your knowledge and skills
- Read, experiment, practice (kata), talk to others, collaborate, look over the fence, mentor
- It should be fun.
Also, know your domain, identify with your customer (no “us vs. them”, ever).
- Be aware of arrogance inherent in programming and learn to be humble, too.

Chapter 2: Saying No

Real disaster story about premature deployment of a totally immature distributed system
Professionals speak truth to power. Professionals have the courage to say no to their managers.
Managers and developers have roles that are often adversarial because, in the short term, their goals tend to conflict.
- The manager will defend her objectives, but will also expect you to defend yours. The best overall outcome is reaching the largest goal that you and the manager share, which may be tricky to determine.
The higher the stakes, the more valuable a “no” becomes, and the harder to say.
Good teams will successfully work towards a yes, but only a right yes, that will later work out in practice.
- Saying “no” is often a prerequisite for getting that right yes.

Chapter 3: Saying Yes

There are three parts to making a commitment
- You say you will do it
- You mean it
- You actually do it
A professional will not stop after step 1 or step 2.
Unconditional commitment always takes a form equivalent to “I will achieve goal X by time Y”
- Commitment means taking full responsibility.
- Most results depend on conditions you cannot fully control, so you will often only commit to actions or commit only conditionally
Your commitment must respect the limits of what you expect (based on your experience) you can and cannot do
- If you recognize that you will probably not be able to meet a commitment, raise a red flag immediately

Chapter 4: Coding

Coding requires a level of concentration and focus that few other disciplines require.
- A clean coder codes only if she can guarantee enough focus
- Distractions (personal, environmental, or whatever) are a problem.
- Overtime is a problem.
Flow (“the zone”) is not as good as people think: you will be locally productive, but will often lose sight of the bigger picture and possibly produce not-so-good designs.
Interruptions are bad distractions.
- Pair programming is helpful to cope with them.
- TDD helps to make the pre-interruption context reproducible
  - Minimize time spent debugging
If you have writer’s block, start pair programming
- Make sure you take in enough creative input, e.g. reading fiction books. Find out what works for you
Coding is a marathon, not a sprint, so conserve the energy and creativity.
- Go home when it’s time, even in the middle of something important.
- Showers and cars are problem-solving resources too.
Continuously re-estimate your best/likely/worst completion time and speak up as soon as you recognize you will likely be late.
- Do not allow anyone to rush you.
- Consider overtime only for a short stretch (2 weeks max.) and only if there is a fallback plan as well.
- Use a proper definition of “done”, with sufficiently high quality requirements
Programming is too hard for anyone to do alone, so get help and provide help to others, in particular (but not only) in a mentoring style.
- Don’t protect your turf, don’t shy away from asking, and don’t shoo away others who ask.

Chapter 5: Test-driven Development

TDD is not a cure-all and is impractical or inappropriate in some (rare) cases.
Three principles:
1. You are not allowed to write any production code until you have first written a failing unit test
2. You are not allowed to write more of a unit test than is sufficient to fail, and not compiling is failing
3. You are not allowed to write more production code than is sufficient to pass the currently failing unit test
The cycle is only about 30 seconds long.
- It provides certainty that you have not broken anything when making changes
- It often reduces defect injection rates by a factor of 2 to 10
- It provides courage for cleaning up messy code
- It documents how code is to be used
- It makes you create designs with low coupling
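To make the three principles concrete, here is a small sketch of where a few turns of the cycle might end up, using Python's standard `unittest` module. FizzBuzz is my own choice of example, not one taken from the book. Each test was (notionally) written first and seen to fail, and each branch of `fizzbuzz` was added only to make the then-failing test pass.

```python
import unittest


def fizzbuzz(n):
    """Production code, grown only as far as the tests below demand."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


class FizzBuzzTest(unittest.TestCase):
    # Cycle 1: the simplest failing test, passed by returning str(n).
    def test_plain_number_is_returned_as_text(self):
        self.assertEqual(fizzbuzz(1), "1")

    # Cycle 2: fails until the "Fizz" branch exists.
    def test_multiple_of_three_is_fizz(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

    # Cycle 3: fails until the "Buzz" branch exists.
    def test_multiple_of_five_is_buzz(self):
        self.assertEqual(fizzbuzz(5), "Buzz")

    # Cycle 4: fails until the combined case is handled first.
    def test_multiple_of_fifteen_is_fizzbuzz(self):
        self.assertEqual(fizzbuzz(15), "FizzBuzz")
```

Saved as a module, the tests run with `python -m unittest`. They also serve as the usage documentation mentioned above.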

Chapter 6: Practicing

A programming Kata is a precise set of choreographed keystrokes and mouse movements that simulates the solving of some programming problem.
- You aren’t actually solving the problem because you already know the solution.
- Rather, you are practicing the movements and decisions involved in solving the problem (IDE, TDD, CI)

Chapter 7: Acceptance Testing

Avoid garbage in, garbage out. Make sure you understand the requirements, and expect your customer to initially not understand them.
- Creating this understanding means removing ambiguity
The best way to do this is to define acceptance tests:
- Ask the customer for all conditions they will plausibly want the software behavior to fulfill and turn them into automated tests.
  - Customers often will not want to answer all your questions, so developers or testers will have to guess, in particular for the failure cases, and then validate the result with them
- Success of those tests constitutes the definition of “Done”
Code implementation should start only when the test implementation is complete.
- Look out for silly, awkward, or plain incorrect tests and work with the test authors to improve them
Unlike unit tests (which are for programmers only), acceptance tests have both business people and developers as their audience.
- The prime purpose of both kinds is specification; testing is only secondary.
Test GUIs mostly one level below the actual GUI (on abstractions of the GUI elements) to reduce test volatility.
Run all tests in a continuous integration system and immediately fix any failures that occur.

Chapter 8: Test Strategies

Consider QA part of the team. They act as specifiers, writing acceptance tests (including the failure cases and corner cases), and they perform exploratory testing.
Testing pyramid
- Most tests are unit tests
  - Written by developers, for developers
  - Executing almost every statement of the class and asserting its behavior
- Many tests are component or integration tests
  - Written by QA or business people assisted by developers, for business people and developers
  - Run in a component testing framework, executing all relevant paths through larger combinations of classes
  - Component tests mock away other parts of the system and assert correct business rules.
  - Integration tests may or may not mock and assert correct choreography of the pieces
- Some tests are automated system tests of the whole, usually at GUI level with the respective tools
- A bit more testing is done manually at system level in creative, exploratory fashion
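As an illustration of "mocking away other parts of the system", here is a minimal sketch using Python's standard `unittest.mock`. The `Checkout` class, its `pay` method and the payment gateway are hypothetical names invented for this example. The point is only that the test asserts a business rule without touching a real gateway.

```python
from unittest.mock import Mock


class Checkout:
    """Hypothetical component that depends on an external payment gateway."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, amount):
        # Business rule under test: never charge a non-positive amount.
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(amount)


def test_valid_amount_is_charged_once():
    gateway = Mock()  # stands in for the real gateway
    gateway.charge.return_value = "receipt-1"
    assert Checkout(gateway).pay(10) == "receipt-1"
    gateway.charge.assert_called_once_with(10)


def test_non_positive_amount_never_reaches_gateway():
    gateway = Mock()
    try:
        Checkout(gateway).pay(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    gateway.charge.assert_not_called()
```

An integration test of the same component would swap the mock for the real collaborator and check that the pieces cooperate correctly.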

Chapter 9: Time management

Software development, especially in management roles, requires good time management discipline
Meetings are necessary but are also often huge time wasters, so avoid meetings that have no clear benefit. This is a professional obligation.
Meetings must have an agenda and a clear goal
- Agile stand up meetings can be an efficient format
- Iteration planning should take 5% of the iteration (2 hours for one-week iteration)
Any argument that can’t be settled in five minutes can’t be settled by arguing. So don’t try to; make measurements, flip a coin or vote
Concentration (focus) is a scarce resource
- Use it well when present and recharge with simpler tasks (meetings) and breaks in between
- How to improve?
  - Sport
  - Creative input
  - Pomodoro technique (25- to 30-minute time blocks)
Professionals work on their real tasks, in a sensible priority order, even if they don’t like some of them
- They admit when they have chosen the wrong path and leave it quickly
- They recognize messes (whether their own or others’) and never accept them; they clean up.
  - Nothing brings down productivity more than a mess.

Chapter 10: Estimation

Estimation is the source of most distrust between business people and developers, because the latter provide estimates that the former treat as commitments.
- Both are insufficiently aware that an estimate really is a probability distribution, not a fixed number
The PERT technique computes and uses such a distribution based on best-case, nominal, and worst-case estimates for the project or, better, for each task.
- Program Evaluation and Review Technique (PERT): analyses and represents the activities in a project to illustrate the flow of events
- PERT is a method for evaluating the time, and also the cost, required to complete a task within its deadline.
- PERT involves the following steps:
  1. Identifying Tasks and Milestones.
    - Every project involves a series of required tasks.
    - These tasks are listed in a table allowing additional information on sequence and timing to be added later
  2. Placing the tasks in a Proper Sequence
    - Tasks are analysed and placed in a sequence to get the desired results
  3. Network Diagramming
    - A network diagram is drawn using the activity sequence data showing the sequence of serial and parallel activities
  4. Time Estimating: this is the time required to carry out each activity, in three parts:
    1. Optimistic timing: the shortest time to complete an activity
    2. The most likely timing: the completion time having the highest probability
    3. Pessimistic timing: the longest time to complete an activity
  5. Critical Path Estimating:
    - Determines the total time required to complete a project
Wideband Delphi (e.g. Planning Poker) is an estimation procedure where several estimators iteratively work towards agreement.
- Can be combined with PERT
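The three time estimates are usually turned into a distribution with the standard PERT three-point formulas: expected time = (O + 4M + P) / 6 and standard deviation = (P - O) / 6. These formulas are not spelled out in the notes above, so treat the following Python sketch as my own illustration. The function names are made up, and combining tasks by adding variances assumes the tasks run in sequence and are independent.

```python
from math import sqrt


def pert_estimate(optimistic, most_likely, pessimistic):
    """Return (expected time, standard deviation) for one task."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


def pert_sequence(tasks):
    """Combine sequential, independent tasks given as (O, M, P) tuples."""
    estimates = [pert_estimate(*task) for task in tasks]
    total_expected = sum(expected for expected, _ in estimates)
    # Variances add for independent tasks; standard deviations do not.
    total_std_dev = sqrt(sum(std_dev ** 2 for _, std_dev in estimates))
    return total_expected, total_std_dev
```

For a task estimated at 1 day best case, 3 days nominal and 12 days worst case, `pert_estimate(1, 3, 12)` gives an expected time of about 4.2 days with a standard deviation of about 1.8 days. That is a more honest thing to report than "3 days".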

Chapter 11: Pressure

The professional developer is calm and decisive under pressure, adhering to his training and disciplines, knowing that they are the best way to meet the pressing deadlines and commitments
Avoid situations that cause pressure:
- Make only commitments you can fulfill
- Keep your code clean
- Work in a way that you will not need to change when in crisis
Don’t panic. Make a plan (and talk with your team). Don’t rush. Trust your disciplines.
Offer pairing to others in crisis

Chapter 12: Collaboration

Most programmers, though not all, like working alone. But we need to understand the goals of the people around us, including business folks.
- This requires communication.
Likewise, within the development team: only collective code ownership and pairing produce a good level of communication.
- Programming is all about communication.

Chapter 13: Teams and Projects

Teams need time (months) to gel, to really get to know each other and learn to truly work together
Assigning fractional people to projects is a bad idea, as is breaking up a good team at the end of a project.
Instead, assigning several projects to one team can work well

Chapter 14: Mentoring, Apprenticeship and Craftsmanship

Young programmers need mentoring.
- Mentoring can be implicit or explicit
  - Implicit: reading a good manual or observing someone working
- Medicine has established a system of apprenticeship for new practitioners (a full year) in which mentoring is likely to occur. Another 3 to 5 years of apprenticeship are required to become a professional in a medical specialty.
Given that we entrust software with all aspects of our lives, a reasonable period of training and supervised practice would be appropriate.
- A system of masters, journeymen and apprentices as in the crafts might be suitable.
- We currently do not impose on the elders a responsibility to teach the young.
  - We are missing the mindset of craftsmanship and so the elders fail to consciously act as the role model that would make the young adopt the craftsmanship attitude as well

Application And Tools

Version management:
- Enterprise tools
- CVS, SVN
- Git
- Branching
Editors / IDEs
- vi, Emacs, IntelliJ, Eclipse, TextMate
Issue Tracking
- Pivotal tracker
- Lighthouse
- Wiki
- Bulletin board
- Issue dumps
Continuous build: Jenkins
Unit testing tools
- JUnit, RSpec, NUnit, Midje, CppUTest
Component testing tools
- FitNesse
- RobotFX
- Green Pepper
- Cucumber
- JBehave
Integration Testing Tools
- Selenium
- Watir
UML / MDA (Model Driven Architecture)
- Code vs. details as main problem

Good questions architects should ask

I was inspired by a talk at the O’Reilly Software Architecture Conference New York 2018 on developing a chaos architecture mindset, given by Adrian Cockcroft (AWS). He mentioned that his best role as an architect early in his career was not to define the standard or to tell other people what the architecture should be. Instead, it was to ask awkward questions.

Question 1: What problem are you trying to solve?

I recall being asked this by my boss when I was an engineer or lead. I often described the technical challenges that I or the team was facing (e.g. building or installing a framework). That was the wrong answer. A good answer involves users or business needs. The question helps sharpen your thinking and lets you communicate the problem more effectively.

The remaining questions from the talk follow, along with the important points attached to each.

Question 2: What is the user need?

Question 3: What does the value chain look like to support the user needs?

What’s your time-to-value?
- Days
- Months
How is the value chain evolving?
Have you heard of Simon Wardley?
- Wardley maps are used to visualize evolution and to innovate further up the value chain
  - What is your position and movement on the map?

Question 4: What should your system do when something fails?

Stop, because it can’t do something safely?
Carry on with reduced functionality?

Question 5: If a permissions lookup fails, should you stop or continue?

Permissive failure: what’s the real cost of continuing?
See Memories, Guesses, and Apologies by Pat Helland
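The stop-or-continue choice in Questions 4 and 5 can be made explicit in code. The sketch below is my own illustration, not something from the talk. `is_allowed`, the `lookup` callable and the `audit_log` list are hypothetical names, and using `ConnectionError` as the failure signal is an assumption. Failing closed stops the request. Failing open continues, but records what was let through so that the real cost of continuing can be reviewed later.

```python
def is_allowed(user, action, lookup, fail_open=False, audit_log=None):
    """Decide whether `user` may perform `action`.

    `lookup` is any callable returning True/False that may raise
    ConnectionError when the permissions service is unreachable.
    """
    try:
        return lookup(user, action)
    except ConnectionError:
        if not fail_open:
            # Fail closed: stop, because we cannot proceed safely.
            return False
        # Fail open: carry on, but remember what was let through
        # so it can be checked (and apologized for) afterwards.
        if audit_log is not None:
            audit_log.append((user, action))
        return True
```

Which default is right depends on the answer to "what's the real cost of continuing?" for the action in question.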

Question 6: Do you have a backup datacenter?

How often do you fail over apps to it?
How often do you fail over the whole datacenter at once?
A fairy tale
- Once upon a time, in theory, if everything works perfectly, we had a plan to survive the disasters we thought of in advance.
- How did that work out?
Common problems. Things happened
- SaaS vendor:
  - Forgot to renew its domain name
- Entertainment site:
  - Didn’t update its security certificate, and it expired
- Finance company:
  - Datacenter flooded in Hurricane Sandy

You can’t legislate against failure, focus on fast detection and response (Chris Pinkham)

Question 7: Do you have a defined architecture?

Are processes and roles documented?
Are documents up to date?
Do people follow the documented process?
Is the architecture implemented as designed?

Question 8: How do you try to make people comply with your architecture?

Authoritarian High Modernism: workers should do what they’re told
Taylorism: management as an exact science
Synoptic Illegibility
- If you can’t write down exactly what really happens, you can’t write a synopsis, and the architecture is ad hoc and messy

Recommended books

The Safety Anarchist – Sidney Dekker
Drift Into Failure – Sidney Dekker
Release It! – Michael Nygard

I hope you find it helpful. Until we meet again, happy coding and questioning!

Cheers.
Mike

Architect Toolkit: RAID Analysis

In this article, I will share one of the essential tools that should be used as soon as a project starts: RAID, which stands for Risks, Assumptions, Issues and Dependencies.

In my experience, risks and issues related to architecture design that surface in the later stages of a project are often due to a lack of proper analysis in the initial stage. You may argue that this depends on the scale or type of project. All right! Let’s think of this as a tool from an architectural perspective, for medium to enterprise-level projects in which you, as the system architect, need to ensure the health and alignment of the whole ecosystem upon any change.

Let’s look at it from another perspective. As a project or program manager or director, would you accept a project plan without an analysis of risks, assumptions, issues and dependencies? The answer is certainly no, if you are playing your role. 🙂

Enough long-winded opinion. In the next sections, I promise to keep it short and sweet: I briefly describe the goal of each step and then list questions whose answers will give you or your project team a nice RAID analysis report.

Let’s read on.

Risks Analysis

Risk analysis aims to let stakeholders know the key areas of uncertainty and to allow the project team to develop risk mitigation plans.

Questions:

What risks could cause the project to be delayed or not delivered?
Are there any software development skills not currently available in your area?
Are new technologies required that the company is not familiar with?
Are there any specific contracting needs?
Are there any scaling needs that the business is not willing to pay for?
Are there any lower-environment needs that will not be fulfilled?
Are there any unfunded testing efforts?
Are any significant business risks being introduced?

Assumption Analysis

Capturing assumptions before the project starts helps level-set people’s thinking about the architecture and serves as a reference for resolving issues later on when problems arise.

Questions:

What assumptions are being made?
Do you assume that you can successfully develop some new capability?
Do you assume that a certain group will do a particular part of their work?
Do you assume that certain refactoring will occur with an existing system?
Do you assume that certain integrations will be required or will explicitly not be supported?
Do you assume that some research and development needs to occur?

Issues Analysis

The key objective of this step is to give the project a sense of which areas of the architecture have not been resolved and need to be dealt with in the future.

Questions:

What areas of the architecture have not been resolved?
What areas of the architecture have not been finalized?
Are there areas of technology where you or your team have concerns or know of problems?
Are there contractual issues in play?
Has a key resource recently moved to another part of the company?
Is the deadline for delivery overly aggressive?

Dependencies Analysis

Dependencies are anything that the architecture depends on, including items, projects and tasks. Dependencies need to be clearly stated and made visible to the executive staff. This helps them manage the dependencies for you, as doing so is in their interest.

Questions:

What projects does your project depend on to complete?
What licensing agreements do you depend on to provide needed functionality?
What purchases or other procurement are needed?
What business arrangements are needed?
What hardware needs to be purchased or operationalized?
What infrastructural software needs to be operationalized?
Is software integration with specific tools or services required?

Hope you enjoy it.

Until we meet again, happy designing and coding.

Great Questions on Scalability

The definition

Scalability is one of the biggest architectural concerns in modern software development. In technical terms, scalability enables a system to respond gracefully to the demands placed upon it. Storage I/O, database access, CPU utilization, memory utilization, app server farms and network utilization are the most common areas that require scalability attention.

The challenges

In my experience, when designing or even developing a scalable solution, it is difficult to predict correctly the demand on the future system and the potential areas of optimization. Those insights come from real experience once the system is up and running in production and is being used and assessed by users.

It is arguable.

As the architects responsible for the scalable design and solution, we must plan scalability in as part of the development and delivery cycle. This can be achieved by chunking, testing and detailed monitoring to validate the system’s behavior.

The options

The two most common options for scalability are scale-up and scale-out. Scale-up means buying bigger hardware. Scale-out means having multiple sets of hardware that can respond to the same requests.

In my early career, scale-up was often the favorite choice because it provides full control and ownership of the hardware and, most importantly, it is usually budgeted. Virtualization and the VM concept were not even employed yet, since the technology was not yet popular. Then, after the cloud was introduced in late 2008, there was a momentum shift to the scale-out option, which is more cost-effective. Why? It simply lets you start small and add system resources as the demand for the system’s capability increases over time.

The questions

Now it comes to the most interesting part of this article: the areas to consider when designing and implementing scalable solutions. I like to ask questions because I often get different answers, and those pique my interest and prompt me to ask more. So, here they are.

1. How many users (online and batch) will concurrently access the system?

2. How much data will the system be able to manage?

3. How many read / write operations per second does the data store need to handle?

4. What is the peak concurrent access to the system?

5. How much data can be cached to minimize the depth within the system that requests need to travel before being responded to?

Can data be cached outside the system in a content distribution network (CDN) to help keep traffic away from the site?
Is it worth caching?

6. Is data replication required for the system? How long is it acceptable for data synchronization to take?

7. How much logging and how many events does the system require to support its operational needs, both now and for future performance analysis?

8. Are there areas of data contention?

9. Are there CPU-intensive operations?

10. How do you plan to measure usage of the system?

11. Do you plan to meter services to throttle excessive usage?

12. Do you have the ability to auto-provision additional servers to meet the demand?

13. Can you schedule batch operations to occur at non-peak times?

I leave it to you, the reader, to decide which questions are most important for you.

The practice

For me, it is important to set up rules and alerts so that key personnel are notified when system performance crosses certain thresholds. For example, an operational warning for the operations team is triggered when a system resource reaches 80% utilization. If utilization goes over 90%, an urgent notification is needed, and action must be taken to resolve the problem. I love the idea of auto-provisioning based on system usage: it is fully automated and greatly improves system performance. A rule for decommissioning underutilized VMs or instances should certainly be set as well.
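The rules above are simple enough to sketch in a few lines of Python. The 80% and 90% thresholds come from the example. The function names and the 30% scale-in threshold are assumptions of mine for illustration, and a real setup would express this in the monitoring or auto-scaling tool's own configuration.

```python
WARNING_THRESHOLD = 0.80   # operational warning when utilization reaches 80%
URGENT_THRESHOLD = 0.90    # urgent notification when utilization is over 90%


def alert_level(utilization):
    """Map a utilization ratio (0.0 to 1.0) to an alert level."""
    if utilization > URGENT_THRESHOLD:
        return "urgent"
    if utilization >= WARNING_THRESHOLD:
        return "warning"
    return "ok"


def desired_instances(current, utilization, scale_in_below=0.30):
    """Suggest an instance count; `scale_in_below` is an assumed threshold."""
    if utilization >= WARNING_THRESHOLD:
        return current + 1          # provision one more instance
    if utilization < scale_in_below and current > 1:
        return current - 1          # decommission an underutilized instance
    return current
```

Keeping the scale-in rule next to the scale-out rule makes it harder to forget the decommissioning half of auto-provisioning.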

The takeaway

The key to scalability is to test and validate our assumptions about system behavior. That means driving the system past its limits to the breaking point so that we can find out how it fails under load.