Tuesday, September 3, 2013

Hyperion Testing Methodologies


Summary

In this post I try to apply some quality management methodologies to Hyperion development processes.
Information quality (IQ) is always assessed from the point of view of the information consumer (in our case FPNA and other teams that consume financial data). Software quality characteristics include functionality, reliability, usability, efficiency, maintainability, and flexibility.

Here I concentrate on testing processes as a means of data collection. Relevant and accurate data gathered from the test phase, bug reports from “the field”, and ongoing system monitoring can provide insights into the efficiency of existing processes and into which issues should be addressed in what order. It is the key to improving all the other quality characteristics.

Hyperion processes in many organizations rarely follow the same standards as other software development projects because of relatively low system complexity. The low complexity is explained by the following factors:
  • Hyperion software is developed with the intent of enabling small teams of consultants to build robust applications without the involvement of software engineers.
  • Applications that do not contain a significant amount of custom code and that implement consistent business logic across the organization impose different requirements on the development process, versioning, and quality control.
  • Often the complexity of an implementation becomes apparent during the development phase (as opposed to the requirements gathering phase) or after multiple change requests. To accommodate those changes the development budget is usually increased, but consulting companies rarely emphasise that beyond some threshold organizations need to implement full-scale information quality control processes (which would require additional resources). Complexity also grows with the number of users and after some period of maintaining and enhancing the applications.

The following factors contribute to the high complexity of a Hyperion environment and are the incentives to implement sound testing and quality control methodologies:
  • The Hyperion environment consists of a large number of applications.
  • Inconsistent business processes
  • A large amount of custom code
  • Lack of a standard ETL tool
  • Ongoing development and continuous addition of new features and applications.

1.0 Data gathering

You must quantify an action before you can improve it.
- W. Edwards Deming and J. M. Juran

Imagine the following Hyperion environment:
  • Testing processes are not standardized, and root cause analysis does not produce meaningful aggregate data.
  • Data is not gathered from each test case, and the available information is not categorized. A large number of bugs is reported due to an unstable environment, but in a format that does not support data gathering and analysis.
  • Changes to processes are not data driven.
  • A large percentage of tests fails not because of errors in feature code, but because other components or initial test conditions are in states inconsistent with the test requirements.
  • There is no database that specifies relationships and dependencies between the objects participating in testing. As a result:
    • Component testing starts and ends too far from the tested component, i.e. there are multiple components/integration points between the test's original input and the tested component. This is due to uncertainty about which objects may be affected by the current test or may affect the test input.
    • Test execution takes much longer than it would if the relationships between objects were known. Multiple tests cannot be scheduled simultaneously on the test environment because of potential conflicts.
    • When changes are made to large objects that span multiple applications, development on all of those applications is stalled.
  • There is no formal unit testing for custom scripts.

The common denominator of the issues above is testing that does not support data gathering about the system under test. By establishing standardized testing processes that provide accurate and relevant data, we get the benefits of efficient testing in the narrow sense and the foundation for information quality improvement in the broad sense.

2.0 Definitions

Testing spectrum. The spectrum of testing is defined by the test granularity: fine-grained test cases allow the tester to check low-level details, often internal to the system; a coarse-grained test case provides the tester with information about general system behaviour.

Structural (white box) tests are the most fine-grained tests; they find bugs in low-level operations, such as those that occur at the level of individual lines of code.

Unit testing is white-box testing, almost always done by the programmer, who knows the internal structure of the unit under test.

Behavioral (black box) tests are used to find bugs in high-level operations, at the level of features. They require a detailed understanding of the application domain, the business problem being solved, and the mission the system serves.

Live/alpha/beta testing involves putting customers, early adopters, and other users in front of the system.


3.0 Testing processes in a nutshell

Every test project involves the following stages:
  • Risk identification and prioritization.
  • A thorough test plan. Such a plan addresses the issue of scope, test strategy, responsibilities (who does what), resources (time slots, test environments), configuration management, scheduling, milestones and budgeting (working hours spent).
  • Test system architecture.
  • Staffing (assigning people to execute the tests)
  • Test development (building and deploying test scripts, creating test suites and test case libraries). Test suite development proceeds in priority order.
  • Test execution (running the tests, recording test status, and reporting results).
  • Monitoring. This assumes the existence of a comprehensive test tracking database, with the ability to roll individual test data up into aggregate views:
    • Tracking multiple iterations of the tests
    • Comparing estimated schedules with actual completion dates
    • Identifying factors that affected the plan and finding appropriate actions
  • Analysis of the common causes of deviation from the test plan.
  • Process improvement/elimination of the disruption factors

The benefits of formalized testing processes (as opposed to sporadic testing) are:
  • Standardization of test planning, scheduling, and implementation across projects in a common domain.
  • Faster identification of a problem and the ability to eliminate it before it propagates into other tests and system components.
  • Convergence of the actual testing to the test plans and, as a result, more efficient resource allocation.
  • Reduction in the number of test cycles
  • Transparency of the testing process
  • A faster and more flexible development process.

3.1 Risk identification and prioritization

One methodology for risk identification is “failure mode and effect analysis” (FMEA). Observed symptoms of bugs in the system/behavior/feature under test are referred to as failure modes. We break the system into more granular subsystems/components and associate potential risks with them. FMEA can be used for both structural and behavioral testing. It also provides a means of tracking process improvements, such as closed-loop corrective actions. Below are the major categories defined by FMEA; a minimal prioritization sketch follows the list.
  • Break down the system into more basic functions.
  • Potential failure modes - quality risks associated with each function
  • Failure impacts
  • Causes of failure
  • Severity
  • Detection methods
  • Likelihood
  • RPN (Risk Priority Number = Severity × Likelihood)
  • Recommended action
  • Who
  • When
  • Action results and effectiveness.
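
As a minimal sketch of how the prioritized list can be produced (the failure modes, the 1-5 scales, and the field names below are illustrative assumptions, not part of any Hyperion tooling):

```python
# Minimal FMEA prioritization sketch. Failure modes, scales, and field
# names are hypothetical examples, not part of any Hyperion API.
from dataclasses import dataclass

@dataclass
class FailureMode:
    function: str          # basic function of the system
    description: str       # potential failure mode (quality risk)
    severity: int          # 1 (minor) .. 5 (critical)
    likelihood: int        # 1 (rare)  .. 5 (frequent)
    detection: str = ""    # how the failure would be detected
    recommended_action: str = ""

    @property
    def rpn(self) -> int:
        # Risk Priority Number as defined above: Severity x Likelihood
        return self.severity * self.likelihood

risks = [
    FailureMode("Currency translation", "Wrong rates applied to EMEA entities", 5, 2),
    FailureMode("Data load", "Partial load leaves actuals incomplete", 4, 4),
    FailureMode("Consolidation", "Elimination entries double-counted", 5, 1),
]

# Highest RPN first: this is the order in which test development proceeds.
for risk in sorted(risks, key=lambda r: r.rpn, reverse=True):
    print(f"RPN {risk.rpn:>2}  {risk.function}: {risk.description}")
```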

With the FMEA approach we have a prioritized list of quality risks. Now we can apply schedule, cost, and quality constraints to that list. Cost in our case is the testers'/analysts' working hours. For the sake of the testing team's sanity we should assume a fixed supply of working hours in the short term, so the only constraints we can play with are schedules/deadlines and the subset of quality risks that we are going to address.

If we need to test a large system or component, FMEA can be done before detailed technical or functional design, so it will not break testing activities down to the level of individual test cases or test suites. Detailed test cases can be added to the FMEA at later stages.

3.2 Test Plan

Although the test plan structure may look similar to FMEA, they serve different purposes. The test plan, and testing itself, is a way to mitigate the identified risks: with testing we test the hypotheses provided by the FMEA. Obviously, when we design the testing management database we can and should cross-reference these concepts. From a process point of view we differentiate between risk analysis and the test plan because they have different objectives and are performed at different times.



Below is a list of the most common sections of a test plan:

  • Scope. In this section, boundaries are set for the test plan by discussing what will and will not be tested. It is important to define both categories: when a test fails or a bug is discovered at a later stage, we will want to know whether the feature that contains the bug was planned to be tested, planned not to be tested (and why), or not mentioned in the plan at all.
  • Quality Risks. References FMEA.
  • Proposed Schedule of Milestones. Here we consider the order in which the primary test types will be executed. A phased test approach marches across the granularity spectrum and has the following benefits:
    • Structural testing builds system stability. Some bugs are simple for developers to fix, do not require environments set up for integration testing, and, if not fixed, will significantly delay the overall testing (a failure at a subsequent phase will require resetting the testing environment, for example).
    • Structural tests can start early and usually do not depend on other tests.
    • Software industry studies show that the cost of fixing a bug found just one test phase earlier can be lower by an order of magnitude.
  • Transitions. For each test phase, the system under test must satisfy a minimal set of qualifications before the test team can effectively and efficiently run tests. This section of the test plan should specify the criteria essential for beginning and completing various test phases. These are usually referred to as entry, exit, and continuation criteria.
    • Entry Criteria. Should address questions such as:
      • Are the necessary documentation, design, requirements, and diagrams available, and will they allow testers to operate the system and judge correct behavior?
      • Is the system at the appropriate level of quality? Such a question implies that some or all of a previous test phase has been successfully completed.
      • Are related objects isolated and brought to the initial condition? Is the test environment ready?
    • Continuation criteria. Continuation criteria define those conditions and situations that must prevail in the testing process to allow testing to continue efficiently.
    • Exit Criteria. Exit criteria address the issue of how to determine when testing has been completed. For example, one exit criterion might be that all the planned test cases and the regression tests have been run.
  • Plan for configuration of test environments. This section defines which hardware, servers, and networks will be used to conduct the testing.
  • Testing Strategy. See below as a separate subsection.
  • Plan of test system development. Test projects typically include some amount of work to design and develop various test objects such as test cases, test tools, test procedures, test suites, automated test scripts, and so forth. Collectively these objects are referred to as the test system.
  • Test execution plan. Scheduling, resource allocation.
  • Key Participants. Accountable parties, escalation process, hand-off points.
  • Test Case Tracking. Refers to spreadsheet or database used to manage all the test cases in the test suites, and how progress is tracked through that listing.
  • Bug isolation and classification. Bugs in this sense are test case failures. Classification assigns a bug to a particular category that indicates how the bug should be communicated and handled. Bug information should also identify the party responsible for the bug. Classification allows further slicing and dicing by category. Aggregating multiple test cases and test projects by bug category shows which areas need immediate improvement, provides data about working hours lost in each category, and allows ROI calculation by category. Examples of bug categories would be:
    • Requirements failure. Failure of the system to meet its requirements. The appropriate party will resolve the problem.
    • Non-requirements failure. The bug reported is not covered by the system requirements, but it significantly affects the quality of the system in unacceptable ways. The appropriate party will resolve the problem.
    • Test system was in an inappropriate state for testing.
    • Release Management
  • Test Cycles. Refers to running all of the test suites planned for a given test phase. Test cycles are usually associated with a single test release of the system under test.
  • Risks and Contingencies. Potential or likely events that can make the test plan difficult or impossible to carry out.
  • Change History. This part of the document records the changes and revisions that have been made to the test plan itself. Results of the first test cycles may reveal that additional development work needs to be performed, test cases need to be modified, or the testing environment needs to be configured differently. It is important to document the reason for each test plan change. Changes in the test plan usually result in higher costs, delayed deadlines, or reduced project scope. The reasons for plan changes should be categorized similarly to bugs. By categorizing test plan changes we can see which business processes require improvement or additional budgeting (more time spent on design, requirements gathering, structural/unit testing, etc.).

3.2.1 Testing strategy

The criterion for judging whether testing goes well (and that drives test development) is test escapes: the number of field-reported bugs that your test system missed but could reasonably have detected during testing. The metric that reflects this concept is DDP (defect detection percentage):

DDP = bugs_test / (bugs_test + bugs_customer) × 100%

where bugs_test is the number of bugs found in testing and bugs_customer is the number of bugs reported by customers.
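
For example (with hypothetical numbers): if testing found 90 bugs before release and customers later reported 10 more, DDP = 90 / (90 + 10) × 100% = 90%, i.e. the test system caught nine out of every ten detectable defects.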

A related criterion that drives test case development is test coverage. Two types of coverage are usually considered.
The functional coverage approach involves assigning specific test cases to each requirement, specification, or feature from the design document. The caveat of this approach is that we test what we know the system does; we don't test what it shouldn't do. Also, by focusing narrowly on functions it is easy to overlook factors such as stability, performance, data quality, and other system problems.
Whatever approach is used for test case development, I need to ensure coverage of the quality risks from the FMEA analysis.

Configuration or state coverage. The initial conditions of the system under test can be defined by multiple parameters. Each parameter may accept multiple values, resulting in an N-dimensional matrix of possible states. The system under test may have different business logic or workflows in each or some of those states. Testing all of those states can be either impossible or very costly. The following techniques can be useful in managing this problem:
  • Pick the right subset of states. Factors to consider:
    • Distribution and frequency of parameters that define system state
    • States that differ in their behavior and business logic, i.e. when the system is programmed to behave differently in different subsets of states.
    • Risk of data corruption in particular states
    • Cost of error propagation to the next test phase from a particular state.
  • Increase state coverage through careful use of test cycles. By reshuffling the variables used in each test cycle you can get more complete coverage (see the sketch after this list). For example, picture a grid of states where each cell is a combination of Var1 (1:4) and Var2 (A:D) and each test cycle covers a different subset of cells; by the end of the fourth cycle the system has been tested at all values of Var2 (A:D) and Var1 (1:4). This method works if the system behaves consistently across all values of Var2 (A:D) and Var1 (1:4). It may not work when bugs arise in relation to a particular combination of states and their business logic, for example if the system behaves inconsistently in states A1 and A3. In that case you would need to run another cycle to confirm the bug resolution in that particular set of states.
  • Widespread beta testing. Although it may sound like an impossible option for financial systems, if business logic differs between states there are usually users responsible for managing those states. In that case beta testing can be a part of user acceptance testing.
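
A minimal sketch of the reshuffling idea above, assuming two hypothetical state variables Var1 (1:4) and Var2 (A:D): each cycle tests a different pairing, every value of both variables is exercised in every cycle, and all sixteen combinations are covered after four cycles:

```python
# Illustrative sketch of spreading state coverage across test cycles.
# Variable names and values are hypothetical.
var1 = [1, 2, 3, 4]
var2 = ["A", "B", "C", "D"]

for cycle in range(len(var1)):
    # Rotate Var2 by one position each cycle (a Latin-square style shuffle),
    # so each cycle covers a different diagonal of the Var1 x Var2 grid.
    pairing = [(v1, var2[(i + cycle) % len(var2)]) for i, v1 in enumerate(var1)]
    print(f"Cycle {cycle + 1}: {pairing}")
```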

Both functional and state coverage answer the question “Have we tested X?”. They don't answer the question “What bugs haven't we found?”. The most common technique for estimating the number of bugs we haven't found is to rely on historical data. Based on the size of the current project, the sizes of past projects, and the number of bugs found in past projects, we can estimate the number of bugs in the current project. Project size can be expressed in lines of code or function points.
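
For illustration (the numbers are hypothetical): if past projects averaged 5 bugs per thousand lines of code and the current project adds 20 thousand lines, we would expect on the order of 100 bugs; if testing has found 70 of them so far, roughly 30 may remain undetected.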


Another important concept in defining a testing strategy is regression analysis. If, as a result of a change, a new revision of the system contains a defect that was not present in the prior revision, the system has regressed; tests that fail to catch such defects leave a regression test gap. Plainly speaking, regression occurs when some previously correct operation misbehaves.
Below are some regression risk mitigation strategies (a selection sketch follows the list):
  • Test automation. An automated test system runs more tests in a given period than a manual one, and consequently the regression test gap is reduced. Nevertheless, automation is not a panacea. The most practical approach is to focus automation efforts on stable functionality introduced in previous releases.
  • Rerun every test case that failed the last time it was run. The rationale here is that the tests that have failed before are likely to fail again, and that changes tend to be localized and break features that are “nearby”. These assumptions are not valid for all systems and all types of bugs. But without any additional knowledge about the system, or if testers have time constraints to analyse structural relationships between components, this assumption can be a good starting point.
  • Change analysis. We sift through what we know about the system as it exists now and what we know about the proposed changes. We look at our risk assessments, coverage information, and past bug logs. Based on this data we conclude that the changes increase the probability of some particular failure modes in the system.
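
A minimal sketch combining two of the strategies above: rerunning previously failed tests plus tests that touch changed components. The test names, component names, and the coverage map are hypothetical and would normally come from the test tracking and dependency databases:

```python
# Sketch of regression suite selection. All names and data are illustrative.
last_run_results = {
    "test_fx_translation": "failed",
    "test_actuals_load": "passed",
    "test_elimination_entries": "passed",
    "test_headcount_alloc": "failed",
}

# Which components each test exercises (would come from a dependency database).
test_coverage = {
    "test_fx_translation": {"currency_rates", "hfm_consolidation"},
    "test_actuals_load": {"etl_staging", "essbase_load_rule"},
    "test_elimination_entries": {"hfm_consolidation"},
    "test_headcount_alloc": {"planning_business_rule"},
}

changed_components = {"essbase_load_rule"}

rerun = {name for name, status in last_run_results.items() if status == "failed"}
impacted = {name for name, deps in test_coverage.items() if deps & changed_components}

regression_suite = sorted(rerun | impacted)
print(regression_suite)  # ['test_actuals_load', 'test_fx_translation', 'test_headcount_alloc']
```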


3.3 Test system architecture

Here we assess the quality of the test system itself (as opposed to the system under test). The test system is an organizational capability for testing created by the testing processes, the testware, and the test environment. The test system characteristics are:
  • Functionality. What are the essential tasks of a good test system? The answer goes back to the quality risk analysis described by FMEA: we identify the risks to the system under test, prioritize them, and estimate how much time and money it will take to cover the higher-priority risks.
  • Reliability.
    • One attribute of a reliable test system is that it produces the same result repeatedly. Repeatable results depend very much on:
      • The ability to tear down test cases quickly
      • The ability to allocate time slots for tests
      • Independence from the subject knowledge of a particular tester.
    • A low degree of coupling between test cases is another attribute of reliability. The failure of one test case shouldn't prevent you from running the others. This requires that the state of shared system objects and output data remain unchanged.
  • Usability. No matter how automated the test system is, testers must set up the environment, start the tests, interpret the results, reproduce anomalies, deduce failure causes, and update test cases or the test plan accordingly. In addition, the tester will usually need to maintain test cases and suites.
  • Efficiency. Tests that run in sequence are less efficient than those that run in parallel. Performance is another aspect of efficiency.
  • Maintainability. One aspect of this is flexibility over time. A minor change in the behavior of the system under test should not topple the test system.

3.4 Test development

In this stage we start building the test case library: a collection of independent, reusable test cases. A test case is where actions are taken on the system under test. As a result of test case execution the system under test ends up in some state, with resulting outputs and behaviors that testers can compare to the expected results. Each test case consists of three stages:
  • The test case setup describes the steps needed to configure the test environment to run the test case.
  • Test conditions/logic allow the tester to assess the quality of the system in relation to a particular risk.
  • The test case teardown specifies the steps required to restore the test environment to a clean condition after execution of the test case.
Because test cases are reusable, I can incorporate each into various suites. In fact, in the testware, a test suite is simply a framework for the execution of test cases, a way of grouping cases. The advantage of a test suite is that it allows me to combine test cases to create unique test conditions.
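
As a sketch of the three-stage structure and of a suite as a grouping of reusable cases (the class names and the currency example are hypothetical, not part of any Hyperion tooling):

```python
# Minimal sketch of a three-stage test case and a suite that groups cases.

class TestCase:
    def setup(self):
        """Configure the test environment to the known initial state."""

    def run(self):
        """Exercise the system under test and assert against expected results."""

    def teardown(self):
        """Restore the environment to a clean condition."""

    def execute(self) -> bool:
        self.setup()
        try:
            self.run()
            return True          # expected and actual results matched
        except AssertionError:
            return False         # record a test case failure (a bug candidate)
        finally:
            self.teardown()      # always leave the environment clean


class FxTranslationCase(TestCase):
    def setup(self):
        self.rates = {"EUR": 1.10}   # stand-in for loading a rate table

    def run(self):
        assert round(100 * self.rates["EUR"], 2) == 110.0

    def teardown(self):
        self.rates = {}


# A test suite is simply a grouping that executes reusable cases together.
suite = [FxTranslationCase()]
results = {type(case).__name__: case.execute() for case in suite}
print(results)   # {'FxTranslationCase': True}
```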

3.5 Execution, Monitoring, and Analysis

The most basic information from the test tracking database is the cumulative number of bugs opened against the cumulative number of bugs closed, on a daily basis. This information gives insight into the readiness of the system for release or for the next testing phase. When the cumulative opened curve levels off at an asymptotic limit, testing is usually considered complete.
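
A minimal sketch of building that view from a bug log, with made-up dates and counts; a real implementation would query the test tracking database:

```python
# Sketch of the cumulative opened-vs-closed view from a bug tracking log.
from collections import Counter
from itertools import accumulate

opened = ["09-01", "09-01", "09-02", "09-03", "09-03", "09-03"]
closed = ["09-02", "09-03", "09-04"]

days = sorted(set(opened) | set(closed))
opened_daily = Counter(opened)
closed_daily = Counter(closed)

cum_open = list(accumulate(opened_daily[d] for d in days))
cum_closed = list(accumulate(closed_daily[d] for d in days))

for day, o, c in zip(days, cum_open, cum_closed):
    print(f"{day}: opened={o} closed={c}")
# When the opened curve flattens and the closed curve converges to it,
# testing for the phase is usually considered complete.
```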

The most valuable information coming from test execution is bug reports. Bugs are reported either during testing or from the field. Besides the obvious information that a bug report should provide (description, steps to reproduce, isolation, etc.), it should specify the following:
  • What the bug relates to in terms of subsystem, configuration, and quality risk
  • Where the bug came from: resolution and root cause. It is important to validate that the root cause and the related objects are specified in the test case as dependencies. For example, if the root cause was identified as an incorrect variable state, that variable should be among the test case dependencies and should be included in the test case configuration steps. Some bugs, especially stability problems, arise from complex interactions of components, no one of which is necessarily an error. In those cases the root cause could be the system design or the test case design itself.
One technique for identifying the root cause, rather than the symptom, is the “5 Whys”. The 5 Whys is a question-asking technique used to explore the cause-and-effect relationships underlying a particular problem.
The following example demonstrates the basic process:
The vehicle will not start. (the problem)
    • Why? - The battery is dead. (first why)
    • Why? - The alternator is not functioning. (second why)
    • Why? - The alternator belt has broken. (third why)
    • Why? - The alternator belt was well beyond its useful service life and not replaced. (fourth why)
    • Why? - The vehicle was not maintained according to the recommended service schedule. (fifth why, a root cause)

Root cause analysis and other fields that allow categorization organize bugs into a taxonomy. An example bug taxonomy for software development could be:
  • Functional
    • Specification. The specification is wrong.
    • Function. Implementation is wrong.
    • Test. The system is developed correctly, but the test reports a bug.
  • System
    • Software architecture. Technical design has deficiencies.
    • Hardware devices. Network, disk, memory limitations, etc.
    • Operating system or third party components failures
  • Process
    • Calculations
    • Initialization. Operation fails due to incorrect initial state.
    • Control of sequence. Action occurs at the wrong time or in the wrong order. Conflicts between tasks.
  • Documentation. Incomplete documentation of business logic.
  • Standards. The system fails to meet industry or client standards.

Root cause analysis data shows the contribution of each error factor to the total number of bugs. In a broader sense, data collected from the test system provides insight into quality variability. Statistical techniques such as ANOVA, hypothesis testing, and experimental design can be used to decompose the effect of various factors on quality variability.
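
As a minimal sketch of this kind of aggregation, assuming hypothetical bug categories and hour figures:

```python
# Sketch of aggregating classified bugs to see which categories cost the most.
from collections import defaultdict

bugs = [
    {"id": 101, "category": "Requirements failure", "hours_lost": 6},
    {"id": 102, "category": "Test system state",    "hours_lost": 3},
    {"id": 103, "category": "Requirements failure", "hours_lost": 4},
    {"id": 104, "category": "Release management",   "hours_lost": 8},
]

by_category = defaultdict(lambda: {"count": 0, "hours_lost": 0})
for bug in bugs:
    by_category[bug["category"]]["count"] += 1
    by_category[bug["category"]]["hours_lost"] += bug["hours_lost"]

for category, totals in sorted(by_category.items(),
                               key=lambda kv: kv[1]["hours_lost"], reverse=True):
    print(f"{category}: {totals['count']} bugs, {totals['hours_lost']} hours lost")
```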


4.0 Miscellaneous good practices for quality test systems

Differentiation of test activities by skill. Tests should be broken into steps simple enough to be executed by junior analysts or test technicians. People with broad knowledge of the processes and applications should write test scripts and test plans and analyse test data; their time is too expensive to be wasted on running and monitoring test jobs. Also, by documenting and writing test scripts they spread their knowledge across the organization.

Testing requires a tight focus. Testing an area that is already covered by another test adds little value and wastes time and money. Furthermore, avoiding testing areas that are not directly related to your test has the benefit of enforcing accountability. Conversely, if I test areas that cross the responsibilities of other testers, that opens the opportunity for others to do low-quality testing.


References:

“Managing the Testing Process: Practical Tools and Techniques for Managing Hardware and Software Testing”, Black, Rex
