Introduction
A software tester is any human being who tests software. So what would you answer if somebody asked you, 'Does software testing need a specialized skill?' The question looks simple to answer, doesn't it? But of course it is not. Software testing is not as easy as you might think. Needless to say, it is one of the most important and difficult tasks in the entire software development life cycle.
Human reactions in this complex world vary widely with situation, surroundings, emotions, needs, requirements, time frame, money, visualization, belief, education, knowledge, expertise, intuition and so on. Such is the complexity of human nature, and the workplace is certainly no exception. It is therefore naive to say that the job a software tester does is simple and complete.
Nevertheless, the quality of a software tester's work is directly proportional to the psychological maturity and depth he or she has acquired, adopted and developed with age and experience.
"The psychology of the persons carrying out the testing will have an impact on the testing process [Meyer 1979]."
Let's examine the psychology of the person under discussion (the tester) by considering the definition of software testing under three circumstances.
1. "Software testing is the process to prove that the software works correctly"
The above definition sounds good. If the person under discussion (the software tester) is the same one who developed the software, he or she will tend to adopt this definition. Such a person's intention revolves mostly around proving that the software works, so he or she will give only those inputs for which correct results are obtained.
This is the typical psychology of testing when the testing is done by the software developer.
2. "Testing is the process to prove that the software doesn't work"
This definition sounds very good, especially since the aim of the tester is to prove that the software does not do what it is supposed to do. This type of psychology brings out most of the defects in the software; a software tester should therefore adopt it and push the software beyond its boundaries.
Nevertheless, this definition involves a practical difficulty: if you ask 'for how many days should the software be tested before we conclude that it works perfectly?', the answer tends to remain an open question.
It is never correct to infer that there are no bugs in the software just because no bugs were found after testing for more than a week, a month or a year, nor is it wise to declare the software completely reliable.
Testing does not guarantee defect-free or zero-bug software; it can only reduce or minimize known defects, because the very nature of a bug can be volatile. If this definition were strictly applied, most of the operating systems and other commercial software packages we use today would not yet be on the market. In that sense, the definition is unrealistic.
3. "Testing is the process to detect the defects and minimize the risks associated with the residual defects"
This definition sounds realistic. In practice, testing should start as soon as development of the software product starts, and should keep track of the number of bugs detected as they are corrected.
At some stage of planned testing, a point is reached where no new bugs are identified after days, weeks or sometimes months of testing, which statistically allows you to conclude that the software is "good enough" to be released to the market; that is, some bugs may still remain undetected, but the risk associated with these residual defects is low or tolerable.
The decision to release the software may thus also depend on many other factors, such as business value, time constraints, client requirements and satisfaction, and the company's quality policies.
From the above three definitions, we can see that the psychology of a software tester plays a vital role throughout the software development cycle and the quality processes of software testing.
"The objective of testing is to uncover as many bugs as possible. The testing has to be done without any emotional attachment to the software. If someone points out the bugs, it is only to improve the quality of the software rather than to find fault with you."
Role & Characteristics of a Software Tester
Software Development & Software Testing go hand in hand. Both aim at meeting the predefined requirements and purpose. Both are highly creative jobs in nature but the general outlook towards these two individuals is psychological rather than distinctive classification.
If success is perceived as achieving something constructive or creative, like the job of developing a software to work, software testing in contrast is perceived as destructive job or negative job by majority of the masses. Nevertheless, software testing itself has to be perceived as an equally creative job on par with the development job, but this perception can only be practically possible with a different outlook and mindset.
Software Testers require technical skills similar to their development counterparts, but software testers need to acquire other skills, as well.
Keen Observation
Detective Skills
Destructive Creativity
Understanding Product as integration of its parts
Customer Oriented Perspective
Cynical but Affable Attitude
Organized, Flexible & Patience at job
Objective & Neutral attitude
Keen Observation – The software tester must have an 'eye for detail'; keen observation is the prime quality any tester must possess. Not all bugs in a piece of software are clearly visible to the naked eye, and with keen observation the tester can identify or detect many critical bugs. Checking the software against established parameters such as the 'look & feel' of the GUI, incorrect data representation and user-friendliness requires exactly this characteristic.
Detective Skills – Ideally, the software under development is documented before, after and throughout the development process. Unfortunately, there is every chance that this documentation (specifications, defect reports etc.) is not kept up to date because of time and resource constraints.
The software under test is supposed to be fully described by a well-defined library of functions and design specifications in documents such as the specs or SRS, and these need constant updating. The tester should therefore first learn about the product from the formal sources: system, design and functional specifications. From there, the tester should dig for more analytical information through non-formal sources of information such as developers, support personnel, bug reports and related product documents, and reviews of related and (seemingly) unrelated documents.
A tester should therefore possess the qualities of a 'detective' to explore the product under test more rationally.
Destructive Creativity – The tester needs to develop destructive skills, that is, skills to perturb and crash the software's workflow and functionality. In other words, the tester should not hesitate to break the software, whether out of fear of 'breaking it and buying it' or out of empathy for the developers' creation. In software testing, boundaries are meant to be crossed, not obeyed.
A creative but destructive approach is necessary while testing the software, to make it evolve more robustly and reliably.
"The Software under test should be tested like a Mercedes-Benz car. "
Understanding the Product as an Integration of its Parts – The software (read: product) is a culmination of lines of code interacting with data through the user interface and the database.
It is an integration of separate groups of code interacting with one another, assembled to function as a whole product. Developers typically work on their respective code modules, focusing at any one time mostly on the modules in hand.
It is no wonder that a developer sometimes does not even know the complete workflow of the product, and does not necessarily need to. The tester, however, being the one who drives the software under test, should know and understand the complete specifications (operational and workflow requirements) of the product.
The tester may not be the best expert on any one module of the software, but he or she should definitely gain expertise in the overall operation of the product. In fact, the tester should have a 'systems' view of the product, because testers are the only people who see and test the complete functionality of the interdependent modules, and the compatibility of the software with the hardware specifications laid down by the client, before it is sent to the customer.
Customer-Oriented Perspective – Testers need to possess a customer-oriented perspective. Customers (read: users) need not be as technically adept as software engineers or testers, and as the number of computer users grows every day, end-users will neither be comfortable with nor tolerate bugs, explanations, or frequent upgrades to fix those bugs. Since any customer is simply interested in consuming the product and deriving value from it, the software tester must adopt the customer's perspective while testing the software product.
Hence the tester must develop the ability to put himself or herself in the customer's place and test the product as an ordinary end-user would.
Cynical but Affable Attitude – Irrespective of the nature of the project, the tester needs to be tenacious in questioning even the smallest ambiguity until it is resolved. In other words, the tester's motto throughout the testing cycle must be 'In God we trust – everything else gets tested'.
Situations may arise during testing in which an unusually large number of bugs is encountered, forcing further delays in shipping the product. This can strain the relationship between the testers and the development teams. The tester should preserve this relationship, not by suppressing bugs, but by making clear that the intention is to 'assault the software, not the software developers'.
Organized, Flexible and Patient at the Job – Software testers must remember that ever since the advent of the internet and the web the world has been shrinking, change is very dynamic, and modern software is no exception. Not all planned tests are performed completely, and some tests that depend on others have to be blocked for later execution.
This requires an organized approach from the tester in attempting to phase out the bugs. Sometimes significant tests have to be rerun because a change has altered the fundamental functionality of the software. The tester must therefore have the patience to retest the planned scenarios and any new bugs that arise.
It is even more important for testers to stay patient and prepared under a dynamic development and test model. Development changes continuously as requirements and technology change rapidly, and so must testing. Testers must take these changes into account and plan their tests while keeping control of the test environment, to ensure valid test results.
Objective and Neutral Attitude – Nobody likes to hear or believe bad news, right? Well, testers are sometimes viewed as the messengers of bad news in a software project team. No matter how good the tester is and how brilliantly he or she does the job of identifying bugs (a job nobody likes done to them, even though most human beings are naturally rather good at finding fault, at least from childhood), he or she will always be the one communicating the bad parts of the software, which the creators (the developers) do not like.
"Nobody who builds houses likes to have them burned."
The tester must be able to deal with situations in which he or she is blamed for doing the job of detecting bugs too well. The tester's work should be appreciated and the bugs should be welcomed by the development team, because every bug found by the tester is one fewer bug that the customer might have encountered.
Irrespective of the negative perception of such a highly skilled job, the role of the tester is to report honestly every known bug encountered in the product, always with an objective and neutral attitude.
Here are a few links and definitions for keywords:
Psychology :
http://www.codeproject.com/KB/bugs/Pratap.aspx
Error incident :
error: A human action that produces an incorrect result. [After IEEE 610]
incident: Any event occurring that requires investigation. [After IEEE 1008]
Test ware:
testware: Artifacts produced during the test process required to plan, design, and execute
tests, such as documentation, scripts, inputs, expected results, set-up and clear-up
procedures, files, databases, environment, and any additional software or utilities used in
testing. [After Fewster and Graham]
Life cycle of bug:
http://www.softwaretestinghelp.com/bug-life-cycle/
http://www.bugzilla.org/docs/2.18/html/lifecycle.html
Bug effect:
http://www.mmm.ucar.edu/mm5/workshop/ws04/PosterSession/Berleant.Dan.pdf
Failure Mode and Effect Analysis (FMEA): A systematic approach to risk identification and analysis, identifying possible modes of failure and attempting to prevent their occurrence. See also Failure Mode, Effect and Criticality Analysis (FMECA).
Bug classification :
http://www.softwaretestingstuff.com/2008/05/classification-of-defects-bugs.html
Testing Principles:
http://www.the-software-experts.de/e_dta-sw-test-principles.htm
http://www.cs.rit.edu/~afb/20012/cs1/slides/testing-02.html
http://www.qbit-testing.net/ctp/ctp1.pdf
Verification of requirements
http://www.sstc-online.org/Proceedings/2002/SpkrPDFS/ThrTracs/p961.pdf
State table testing
http://www.cs.helsinki.fi/u/paakki/software-testing-s05-Ch611.pdf
Decision-based testing
http://www.cs.swan.ac.uk/~csmarkus/CS339/dissertations/FerridayC.pdf
error guessing: A test design technique where the experience of the tester is used to
anticipate what defects might be present in the component or system under test as a result
of errors made, and to design tests specifically to expose them.
mutation testing: Testing in which faults (mutants) are deliberately seeded into a component or system and the existing tests are run to determine whether they detect the seeded faults; the proportion of mutants detected gives a measure of the thoroughness of the test set.
Type of static testing
http://wiki.answers.com/Q/What_are_the_types_of_static_testing_in_software_testing
Static testing:
ISTQB Certified Tester Foundation Level Syllabus, page 58
http://www-dssz.informatik.tu-cottbus.de/information/testen/02_human_testing_6up.pdf
static testing: Testing of a component or system at specification or implementation level
without execution of that software, e.g. reviews or static code analysis.
Inspections;
inspection: A type of peer review that relies on visual examination of documents to detect
defects, e.g. violations of development standards and non-conformance to higher level
documentation. The most formal review technique and therefore always based on a
documented procedure. [After IEEE 610, IEEE 1028] See also peer review.
Inspection process
http://www.aqmd.gov/comply/compliance_inspections.html
http://en.wikipedia.org/wiki/Software_inspection
Inspection
Key characteristics:
o led by trained moderator (not the author);
o usually peer examination;
o defined roles;
o includes metrics;
o formal process based on rules and checklists with entry and exit criteria;
o pre-meeting preparation;
o inspection report, list of findings;
o formal follow-up process;
o optionally, process improvement and reader;
o main purpose: find defects.
Structured walk through
A step-by-step presentation by the author of a document in order to gather
information and to establish a common understanding of its content. [Freedman and
Weinberg, IEEE 1028]
advantage of static testing
http://issco-www.unige.ch/en/research/projects/ewg96/node81.html
Recovery testing
The process of testing to determine the recoverability of a software
product. See also reliability testing.
security testing: Testing to determine the security of the software product. See also
functionality testing.
stress testing: A type of performance testing conducted to evaluate a system or component at
or beyond the limits of its anticipated or specified work loads, or with reduced availability
of resources such as access to memory or servers. [After IEEE 610] See also performance
testing, load testing
performance testing: The process of testing to determine the performance of a software
product. See also efficiency testing.
usability testing: Testing to determine the extent to which the software product is
understood, easy to learn, easy to operate and attractive to the users under specified
conditions. [After ISO 9126]
Software measurement
http://www.testinggeek.com/measurment.asp
Reducing test cases
http://pdf.aiaa.org/preview/CDReadyMIA07_1486/PV2007_2979.pdf
http://www.testrepublic.com/forum/topics/1178155:Topic:31804?page=1&commentId=1178155%3AComment%3A31898&x=1#1178155Comment31898
Test case management
http://wiki.services.openoffice.org/wiki/Test_Case_Management
Bug life cycle
What is Bug/Defect?
Simple Wikipedia definition of Bug is: “A computer bug is an error, flaw, mistake, failure, or fault in a computer program that prevents it from working correctly or produces an incorrect result. Bugs arise from mistakes and errors, made by people, in either a program’s source code or its design.”
Other definitions can be:
An unwanted and unintended property of a program or piece of hardware, especially one that causes it to malfunction.
or
A fault in a program, which causes the program to perform in an unintended or unanticipated manner.
Lastly the general definition of bug is: “failure to conform to specifications”.
If you want to detect and resolve defects in the early development stages, defect tracking should start simultaneously with the software development phases.
We will discuss more on Writing effective bug report in another article. Let’s concentrate here on bug/defect life cycle.
Life cycle of Bug:
1) Log new defect
When a tester logs a new bug, the mandatory fields are:
Build version, Submit On, Product, Module, Severity, Synopsis and Description to Reproduce
To the above list you can add some optional fields if you are using a manual bug submission template:
These Optional Fields are: Customer name, Browser, Operating system, File Attachments or screenshots.
The following fields remain either specified or blank: if you have the authority to set the bug Status, Priority and 'Assigned to' fields, you can specify them; otherwise the test manager will set the status and priority and assign the bug to the respective module owner.
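For illustration only, here is a minimal Python sketch of such a bug record; the field names simply mirror the mandatory and optional fields listed above, and the sample values are invented rather than taken from any real tracking tool:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BugReport:
    # Mandatory fields when a new bug is logged
    build_version: str
    submitted_on: str
    product: str
    module: str
    severity: str            # e.g. "Major", "Minor", "Fatal"
    synopsis: str
    description: str         # steps to reproduce
    # Optional fields from a manual submission template
    customer_name: Optional[str] = None
    browser: Optional[str] = None
    operating_system: Optional[str] = None
    attachments: List[str] = field(default_factory=list)
    # Fields usually left for the test manager to set
    status: str = "New"
    priority: Optional[str] = None
    assigned_to: Optional[str] = None

bug = BugReport(
    build_version="1.4.2", submitted_on="2009-06-01", product="OrderDesk",
    module="Checkout", severity="Major",
    synopsis="Total not updated after discount",
    description="1) Add item 2) Apply 10% discount 3) Observe stale total")
print(bug.status)   # stays "New" until the test manager triages it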
Look at the following Bug life cycle:
[Figure: Bugzilla bug life cycle diagram]
The figure is quite complicated, but once you consider the significant steps in the bug life cycle you will quickly get an idea of a bug's life.
Once successfully logged, the bug is reviewed by the development or test manager. The test manager can set the bug status to Open, assign the bug to a developer, or defer the bug until the next release.
When the bug is assigned to a developer, he or she can start working on it. The developer can set the bug status to Won't fix, Couldn't reproduce, Need more information, or Fixed.
If the status set by the developer is either 'Need more info' or 'Fixed', QA responds with the appropriate action. If the bug is fixed, QA verifies it and can set the status to Verified/Closed or Reopen.
Bug status description:
These are the various stages of the bug life cycle. The status captions may vary depending on the bug tracking system you are using.
1) New: When QA files a new bug.
2) Deferred: If the bug is not related to the current build, cannot be fixed in this release, or is not important enough to fix immediately, the project manager can set the bug status to Deferred.
3) Assigned: The project lead or manager sets the 'Assigned to' field and assigns the bug to a developer.
4) Resolved/Fixed: When the developer has made the necessary code changes and verified them, he or she can set the bug status to 'Fixed', and the bug is passed to the testing team.
5) Could not reproduce: If the developer is unable to reproduce the bug from the steps given in the QA bug report, he or she can mark the bug as 'CNR'. QA then needs to check whether the bug still reproduces and reassign it to the developer with detailed reproduction steps.
6) Need more information: If the developer is not clear about the reproduction steps provided by QA, he or she can mark the bug as 'Need more information'. In this case QA needs to add detailed reproduction steps and assign the bug back to the developer for a fix.
7) Reopen: If QA is not satisfied with the fix and the bug is still reproducible after the fix, QA can mark it as 'Reopen' so that the developer can take appropriate action.
8) Closed: If the fix is verified by the QA team and the problem is solved, QA can mark the bug as 'Closed'.
9) Rejected/Invalid: Sometimes the developer or team lead can mark a bug as Rejected or Invalid if the system is working according to specifications and the bug arose only from a misinterpretation.
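To make the flow concrete, here is a minimal Python sketch of the transitions described above; the exact statuses and allowed moves differ between bug-tracking systems, so treat this only as one plausible mapping rather than the rule set of any particular tool:

# One plausible mapping from a bug status to the statuses that may follow it.
ALLOWED_TRANSITIONS = {
    "New": {"Assigned", "Deferred", "Rejected"},
    "Deferred": {"Assigned"},
    "Assigned": {"Fixed", "Could not reproduce", "Need more information", "Deferred"},
    "Could not reproduce": {"Assigned"},      # QA adds detailed steps and reassigns
    "Need more information": {"Assigned"},    # QA clarifies and reassigns
    "Fixed": {"Closed", "Reopen"},            # QA verifies the fix
    "Reopen": {"Assigned"},
    "Rejected": {"Closed"},
    "Closed": set(),
}

def move(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, otherwise raise."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current!r} -> {new!r}")
    return new

status = "New"
for step in ("Assigned", "Fixed", "Reopen", "Assigned", "Fixed", "Closed"):
    status = move(status, step)
print(status)   # Closed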
ABSTRACT
We investigate the impact of bugs in a well-known
weather simulation system, MM5. The findings help fill a
gap in knowledge about the dependability of this widely
used system, leading to both new understanding and
further questions.
In the research reported here, bugs were artificially added
to MM5. Their effects were analyzed to statistically
understand the effects of bugs on MM5. In one analysis,
different source files were compared with respect to their
susceptibility to bugs, allowing conclusions regarding for
which files software testing is likely to be particularly
valuable. In another analysis, we compare the effects of
bugs on sensitivity analysis to their effects on forecasting.
The results have implications for the use of MM5 and
perhaps for weather and climate simulation more generally.
1. MOTIVATION
Computer simulation is widely used, including in
transportation, digital system design, aerospace
engineering, weather prediction, and many other diverse
fields. Simulation results have significant impact on
decision-making and will continue to in the coming years.
However complex simulation programs, like other large
software systems, have defects. These adversely affect the
match between the model that the software is intended to
simulate, and what the software actually does. Simulation
programs produce quantitative outputs and thus typify
software for which bugs can lead to insidious numerical
errors. Weather simulators exemplify this danger because
of the large amount of code they contain, often with
insufficient comments; the complex interactions among
sections of code modeling different aspects of the domain;
and the plausibility that outputs can have even if they
contain significant error. These incorrect results may
escape notice even as they influence the decisions that they
are intended to support. This is an important dependability
issue for complex simulation systems. Hence, investigating
the effects of bugs in a simulation system can illuminate
the robustness of its outputs to the presence of bugs. This
in turn yields better understanding of important quality
issues, like trustworthiness of the outputs, and important
quality control issues, like software testing strategy.
The artificial generation of bugs and observation of their
effects is termed mutation analysis. We have done a
mutation analysis of MM5, an influential weather
simulator available from the National Center for
Atmospheric Research (NCAR). The results obtained
illuminate important aspects of this software system. We
focus on (1) what sections of the code are most likely to
have undetected bugs that cause erroneous results, and
hence are particularly important to test thoroughly; and (2)
the relative dependability of the software for sensitivity
analysis as compared to point prediction, and consequently
for forecasting vs. sensitivity analyses.
2. CONNECTION TO RELATED WORK
Model Error. Errors in a simulation program can be from
errors in the model underlying the system [1], or errors in
the implementation of a correct underlying model. The
current work deals with the latter, but the two are related.
Software Mutation Testing. This is the process of taking
a software system and injecting errors (i.e. adding bugs) to
it to see how its behavior changes [4][5][6]. Typically,
many different buggy versions will be tested and the
responses summarized statistically. The present work tests
several thousand versions of MM5, each with one unique
bug injected.
Sensitivity Analysis. A software system's sensitivity s = Δo/Δi is the amount of change Δo in its output that occurs due to a change of amount Δi in its input [2][3]. In
contrast, examining the change in output due to changes in
actual software code (rather than in input parameters) is
termed software mutation testing. The work reported here
uses both, because one of the questions we focus on is,
“How do software mutations affect sensitivity analyses?”
Ensemble Forecasting [7]. A sophisticated form of
sensitivity analysis in which the system is run many times,
each on its own set of perturbed input parameters, and each
therefore giving its own forecast as output. The set of
forecasts is then statistically characterized. This enables
conclusions such as forecasts that are relatively resistant to
inaccurate specification of initial conditions (using
ensemble means), and forecasts containing distribution
functions describing the probabilities of different values
for a forecasted quantity.
3. APPROACH
Two related studies were performed. In the first,
simulation results were obtained from numerous variants
of MM5. Each variant had a different mutation (“bug”),
deliberately inserted into code within a selected subset of
its Fortran source code files. The set of 24-hour forecasts
produced by these variants for a region of the U.S. midwest
with a time step of 4 minutes allows comparison of the
abilities of the source code files to resist erroneous outputs
despite the presence of bugs. That in turn has implications
for software testing strategy.
In the second study, the same mutated variants were used
but the initial conditions were slightly different. The
simulation results for this perturbed initial state were
compared to the results obtained in the first study for each
variant. This gives evidence about whether sensitivity
analyses (in which the ratio of output change to input
change is computed) tend to be affected by bugs less or
more compared to forecasts (in which a single weather
prediction scenario is computed). If less, the dependability
of MM5 for studies using sensitivity analysis is increased
relative to its dependability for forecasting. If more, MM5
would be relatively more dependable for forecasting.
3.1 Study #1: Effects of Bugs on Forecasts
Several thousand different mutations were tested. For each,
the MM5 weather simulator was compiled and run using a
typical American midwest weather scenario for
initialization. Effects of the mutation on the final state of
the forecast were recorded. Each mutation was then
classified into one of three categories (Figure 1).
Figure 1. The three possible effects of a mutation (bolded).
The “Fail to complete” category includes cases where (1)
the simulator terminated (“crashed”) prior to providing a
24-hour forecast, or (2) the simulator did not terminate.
The “Results affected” category includes cases in which
one or more members of a set of 11 important output
parameters had a different forecasted value than it had in
the original, unmutated version of MM5.
Each source code file c in which mutations were made was
analyzed as follows. Define
rc = # of mutations in the “Results affected” category; and
fc = # of mutations in the “Fail to complete” category.
A dependability metric, dc, for rating source code file c
was defined as
dc=fc/(rc+fc).
Value dc estimates the likelihood that a bug inadvertently
introduced during software development will be detected,
therefore removed, and hence not affect results
subsequently. Thus low dc for a file suggests a need to
compensate with extra effort in testing and debugging.
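For readers who prefer code to formulas, the arithmetic behind dc can be sketched in a few lines of Python; the counts below are invented for illustration and do not come from the study:

def dependability(results_affected: int, failed_to_complete: int) -> float:
    """dc = fc / (rc + fc): the fraction of consequential mutations in a file
    that announce themselves by crashing or hanging rather than silently
    corrupting the forecast."""
    return failed_to_complete / (results_affected + failed_to_complete)

# Hypothetical counts for two files: a low dc flags a file whose bugs tend
# to stay silent, i.e. a good candidate for extra testing and debugging.
print(dependability(results_affected=80, failed_to_complete=20))   # 0.2
print(dependability(results_affected=10, failed_to_complete=90))   # 0.9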
The “Results not affected” category, which has no
influence on dc, contains mutations for which (1) the
mutated code cannot have an effect (meaning the code,
despite its presence, has no function), (2) the mutated code
would fall into one of the previous two categories given
other initial conditions, or (3) the mutation affected some
output but not the output parameters we examined for
effects. Mutations in case (1) do not impact the
dependability issues of interest here, and therefore are
ignorable. Of mutations in case (2), there is no reason to
expect that their effects, when triggered by other initial
conditions, would cause other than random changes to the
various dc. Those in case (3) are in a gray area. If their
effects can be considered insignificant relative to effects on
the output parameters that were analyzed, they can be
ignored. Otherwise, had they been detected, dc would have
been lower. Thus the dc values calculated in this work are
upper bounds relative to an alternate view of the software
outputs that classifies all outputs as equally important.
3.2 Study #2: Comparing Effects on
Forecasts to Effects on Sensitivity Analyses
The results of study #1 provide data on the effects of bugs
on forecasts. By introducing a perturbation to the initial
data and then obtaining data parallel to the data of study
#1, there will be two sets of data that can be compared.
The perturbation in the initial data can be summarized as a number Δi. The resulting change in the output of the unmutated software can be summarized as a number Δo.
For each mutated version m that runs to completion under
the two input conditions, there are two corresponding
output scenarios whose difference can be summarized as a
number Δom. Sensitivities (changes in outputs divided by
changes in inputs) can now be calculated.
Let s be the sensitivity of the original, unmutated software:
s = Δo/Δi.
Let sm be the sensitivity of the software as modified by
mutation m:
sm = Δom/Δi.
To compare the effect of a mutation m on forecasting to its
effect on sensitivity analysis, we must define measures for
the magnitudes of its effects on forecasting and on
sensitivity analysis, and compare those measures. We
define the magnitude of its effect on forecasting as

Fm = |om − o| / o

where om is a number derived from the output parameters of the software as modified by mutation m, and o is an analogous number for the unmutated software. Thus, Fm describes the change in the forecast due to mutation m, as a proportion of the nominally correct output o.
The magnitude of mutation m's effect on sensitivity is analogously defined as

Sm = |sm − s| / s

where s and sm are as defined earlier.
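A small numeric sketch of these definitions in Python (the numbers are invented, and a single scalar output stands in for the composite of several output parameters used in the study):

def relative_change(new: float, old: float) -> float:
    """|new - old| / |old|: the proportional change used for both Fm and Sm."""
    return abs(new - old) / abs(old)

delta_i = 1e-4                 # size of the input perturbation (hypothetical)
o, o_pert = 250.0, 251.0       # unmutated outputs: base run and perturbed run
om, om_pert = 255.0, 257.0     # outputs of mutant m under the same two inputs

s = (o_pert - o) / delta_i     # sensitivity of the unmutated software
sm = (om_pert - om) / delta_i  # sensitivity of the software with mutation m

Fm = relative_change(om, o)    # effect of mutation m on the forecast
Sm = relative_change(sm, s)    # effect of mutation m on the sensitivity analysis
print(Fm, Sm, "forecast affected more" if Fm > Sm else "sensitivity affected more")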
For a given mutation, if Fm>Sm, then the mutation affected
the forecast more than the sensitivity analysis. On the other
hand, if Fm<Sm then the opposite is true. Considering the
mutations m collectively, if Fm>Sm for most of them, this
suggests that the MM5 software resists the deleterious
effects of bugs on sensitivity analysis better than it resists
their effects on forecasting. On the other hand, if Fm<Sm,
that suggests the opposite.
Meteorological uses of sensitivity analysis include
predicting the effects of interventions on climate, and
doing ensemble forecasting. Because the present study
relied on 24-hour forecasts, study #2 provides data
relevant to ensemble forecasting. The next section
summarizes the results.
4. RESULTS
Results related to study #1 on the effects of bugs on
forecasts are given in section 4.1. Results related to study
#2, which builds on study #1 with additional investigation
of sensitivity analyses are given in section 4.2. Some
caveats are given in section 4.3.
4.1 The Effects of Bugs on Forecasts
It is useful to be able to compare source code files based
on the likelihood of each that an important bug in it will be
readily detected. Such comparisons would help focus
software testing activities on files for which undetected
bugs are most likely to reside. This motivated defining the
metric dc for the dependability of source code file c in
section 3.1.
Low values of dc mean that a bug is relatively likely to
allow a simulation to complete but with error in its output.
High values of dc mean that a bug is relatively likely to
cause the program to crash without giving output. Thus, a
low value of dc indicates that file c is relatively likely to
contain undetected bugs and therefore that file c is a good
candidate for careful testing to find bugs. Values of dc for a
number of important files in MM5 are shown in Figure 2.
Conclusion: of the files tested, bugs in exmoiss.f, hadv.f,
init.f, mrfpbl.f, param.f, and vadv.f are more likely than
bugs in the others to have insidious rather than obvious
effects. Hence these files might be expected to benefit
from correspondingly thorough testing.
4.2 Effects on Forecasts Vs. Effects on
Sensitivity Analyses
MM5 and other weather and climate simulation systems
are useful for both forecasting and for sensitivity analysis.
It is therefore an interesting question about MM5 which,
forecasts or sensitivity analysis results, are more resistant
to bugs which, as noted earlier, are undoubtedly present.
[Figure 2: bar chart titled "Likelihood a serious bug will be evident", plotting dc (scale 0 to 1.2) for the source code files exmoiss, hadv, init, kfpara2, kfpara, lexmoiss, mrfpbl, param, paramr, solve, vadv and tridi2.]
Figure 2. Values of dc for some MM5 source code files. Lower values suggest files that are good candidates for focusing testing effort on.
To answer this question, the metrics described in section
3.2 were used. Instead of a composite number
summarizing 11 output parameters, for this study we used
8 output parameters and analyzed each separately. Because
a total of 10,893 different mutations were tested on both
the base and perturbed inputs, and 8 output parameters
were observed for each mutation, a total of 87,144
different sensitivities (i.e., values of Sm as defined in
section 3.2) were observed. Similarly, the same number of
forecasted parameter values (i.e., values of Fm as defined
in section 3.2) were observed. Although most were
unaffected by the mutation, 12,835 were affected (first two
rows of Table 1).
Perturbation to the input conditions was done as follows.
Some variables in the file init.f were changed by 0.0001%.
The variables chosen for this were the prognostic 3D
variables (UA, UB, VA, VB, TA, TB, QVA, QVB) for
each grid in the domain (these variables are described e.g.
in http://www.mmm.ucar.edu/mm5 /documents/mm5-
code-pdf/sec6.pdf.). The percentage of perturbation was
chosen to be close to the smallest percentage for which the
program produced significant changes to the output.
For many mutations and observed output parameters, the
sensitivities and forecasted values of an observed output
parameter r were exactly the same as for the unmutated
program, so for each such mutation m and parameter r,
Fm(r)=Sm(r). For other mutations and parameters, the
change caused by that mutation to the forecast was greater
than the change to the sensitivity. For those, Fm(r)>Sm(r).
Finally, for the remaining mutations and parameters, the
situation was reversed and Fm(r)<Sm(r). In order to
determine whether MM5’s forecasts or sensitivity analyses
were more resistant to bugs, we simply compare the
quantity of parameter/mutation pairs for which Fm(r)>Sm(r)
to the number for which Fm(r)<Sm(r). If more pairs have
Fm(r)>Sm(r) than have Fm(r)<Sm(r), then forecasts are more
likely to be affected by bugs than sensitivity analyses and
therefore MM5 is observed to be more dependable for
sensitivity analyses than forecasts. If fewer pairs, then the
opposite is observed: MM5 would be observed to be more
dependable for forecasts. Results will be presented at the
meeting.
4.3 Details, Caveats, and Needs for Further
Work
Study #1. The results of this study on the effects of bugs
on forecasts required mutations to be applied to a number
of different source files. A number of different types of
mutations were applied, each designed to be plausible as a
kind of bug a human programmer might accidentally make.
As examples, loops can suffer from “off-by-one” bugs,
additions can be programmed into calculations when
subtractions should be, multiplication and division can be
incorrectly substituted similarly, and so on. Table 2 shows
the types of mutations that were used. Each was applied
opportunistically to a source file at each point in it where
the source code would permit such a mutation. Thus, for
example, off-by-one bugs can be applied to points in the
code where loop control variables were tested.
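As a rough illustration only (the study mutated MM5's Fortran source with its own tooling; this sketch applies analogous operators to a small Python snippet), a mutation generator of this kind can be as simple as:

import re

# A few mutation operators of the kind described above: swapped arithmetic
# operators and off-by-one changes to comparison operators.
OPERATORS = [("+", "-"), ("-", "+"), ("*", "/"), ("/", "*"), ("<", "<=")]

def generate_mutants(source: str):
    """Yield one mutant per applicable operator occurrence in the source text."""
    for old, new in OPERATORS:
        for match in re.finditer(re.escape(old), source):
            i = match.start()
            yield source[:i] + new + source[i + len(old):]

original = "total = price * quantity\nif count < limit:\n    count = count + 1\n"
for mutant in generate_mutants(original):
    print("---\n" + mutant)   # each mutant would be compiled, run and compared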
This process of using a number of different, seemingly
plausible bug types leads to two caveats.
1) The proportion of each bug type in one source file
may not match the proportion in another source
file. The question this leads to is whether
differences observed across source code files
(Figure 2) could be due in part to differences in
the effects of mutations across different bug
types. (Similarly, differences in effects of
different bug types could be due in part to
differences across the source files containing
them.) Appropriate statistical analyses should be
able to separate the effects due to source code file
from those due to bug type.
2) Although the bug types used have intuitive appeal
as mistakes that human programmers might make,
there is no claim that all such mistakes are
captured by the set of bug types used for
mutations in this (or any) work. In particular,
humans can make diffuse mistakes that cover a
number of lines of code, and these are hard to
mimic when generating mutations artificially.
Mutation analyses have historically assumed that
automatically generated mutations are similar in
their effects to human programmer errors,
however.
Another limitation of the study is its reliance on a single
weather forecasting scenario. While within the range of
typical forecasting problems, it is possible that other initial
conditions could lead to different results for the
dependability metrics of the source files. This could in
principle be addressed by seeing if similar results are
obtained for a set of diverse forecasting problems.
Study #2. This study comparing the ability of sensitivity
analyses and forecasts to resist the effects of bugs relied on
a particular perturbation to the input conditions, a
particular weather scenario (as in the other study), and a
particular time period of 24 hours. Each of these may
potentially have an influence on the results.
The perturbation to the input conditions was chosen to be
small in order to stay within the linear response region of
the simulation system. However weather simulation is
well-known to be mathematically chaotic. Thus, there may
be legitimate doubt about whether in fact this experiment
did stay within the linear response region (or even if trying
to do so is worth doing). The question this raises is
whether different input perturbations might lead to
different assessments of which resists bugs better, weather
forecasts or sensitivity analyses. The solution here is more
extensive testing that includes and compares different
input perturbations.
The relative abilities of forecasts vs. sensitivity analyses to
resist bugs may also potentially depend on the time period
of the simulation. While 24 hours incorporates both day
and night, thereby exercising varied portions of the system,
other time periods are also of interest. This suggests
additional testing that incorporates a range of different
time periods.
Finally, as in the other study, more extensive testing could
usefully include different weather scenarios.
Severity Wise:
Major: A defect that will cause an observable product failure or departure from requirements.
Minor: A defect that will not cause a failure in execution of the product.
Fatal: A defect that will cause the system to crash or close abruptly, or affect other applications.
Work product wise:
SSD: A defect from System Study document
FSD: A defect from Functional Specification document
ADS: A defect from Architectural Design Document
DDS: A defect from Detailed Design document
Source code: A defect from Source code
Test Plan/ Test Cases: A defect from Test Plan/ Test Cases
User Documentation: A defect from User manuals, Operating manuals
Type of Errors Wise:
Comments: Inadequate/ incorrect/ misleading or missing comments in the source code
Computational Error: Improper computation of the formulae / improper business validations in code.
Data error: Incorrect data population / update in database
Database Error: Error in the database schema/Design
Missing Design: Design features/approach missed/not documented in the design document and hence does not correspond to requirements
Inadequate or sub-optimal Design: The design features/approach need additional inputs to be complete, or the design features described do not provide the best (optimal) approach towards the required solution.
Incorrect Design: Wrong or inaccurate design.
Ambiguous Design: Design feature/approach is not clear to the reviewer. Also includes ambiguous use of words or unclear design features.
Boundary Conditions Neglected: Boundary conditions not addressed/incorrect
Interface Error: Interfacing error internal or external to the application, incorrect handling of passed parameters, incorrect alignment, incorrect/misplaced fields/objects, unfriendly window/screen positions.
Logic Error: Missing or Inadequate or irrelevant or ambiguous functionality in source code
Message Error: Inadequate/ incorrect/ misleading or missing error messages in source code
Navigation Error: Navigation not coded correctly in source code
Performance Error: An error related to performance/optimality of the code
Missing Requirements: Implicit/Explicit requirements are missed/not documented during requirement phase
Inadequate Requirements: The requirement needs additional inputs for it to be complete.
Incorrect Requirements: Wrong or inaccurate requirements
Ambiguous Requirements: Requirement is not clear to the reviewer. Also includes ambiguous use of words – e.g. Like, such as, may be, could be, might etc.
Sequencing / Timing Error: Error due to incorrect/missing consideration to timeouts and improper/missing sequencing in source code.
Standards: Standards not followed like improper exception handling, use of E & D Formats and project related design/requirements/coding standards
System Error: Hardware and Operating System related error, Memory leak
Test Plan / Cases Error: Inadequate/ incorrect/ ambiguous or duplicate or missing - Test Plan/ Test Cases & Test Scripts, Incorrect/Incomplete test setup
Typographical Error: Spelling / Grammar mistake in documents/source code
Variable Declaration Error: Improper declaration / usage of variables, Type mismatch error in source code
Status Wise:
Open
Closed
Deferred
Cancelled
These are the major ways in which defects can be classified. I'll write more regarding the probable causes of these defects in one of my next posts... :)
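As a rough sketch of how such a classification might be captured in a defect-tracking tool, here is a small Python example; the enum values simply mirror the lists above, while the record fields and sample defect are hypothetical:

from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    MAJOR = "Major"
    MINOR = "Minor"
    FATAL = "Fatal"

class WorkProduct(Enum):
    SSD = "System Study document"
    FSD = "Functional Specification document"
    ADS = "Architectural Design document"
    DDS = "Detailed Design document"
    SOURCE_CODE = "Source code"
    TEST_ARTIFACTS = "Test Plan / Test Cases"
    USER_DOCUMENTATION = "User documentation"

class Status(Enum):
    OPEN = "Open"
    CLOSED = "Closed"
    DEFERRED = "Deferred"
    CANCELLED = "Cancelled"

@dataclass
class Defect:
    summary: str
    severity: Severity
    work_product: WorkProduct
    error_type: str                 # e.g. "Boundary Conditions Neglected"
    status: Status = Status.OPEN

d = Defect("Upper date limit not validated", Severity.MAJOR,
           WorkProduct.SOURCE_CODE, "Boundary Conditions Neglected")
print(d.status.value)   # Open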
Principles of Software Testing
What is not Software Testing?
Testing is not debugging. Debugging has the goal of removing errors; the existence and the approximate location of the error are already known. Debugging is not documented: there is no specification and there will be no record (log) or report. Debugging is the result of testing, but never a substitute for it.
Testing can never find 100% of the errors present. There will always be some remaining errors that cannot be found, and each kind of test will find a different kind of error.
Testing has the goal of finding errors, not their causes. The activity of testing therefore does not include any bug fixing or implementation of functionality; the result of testing is a test report. A tester must by no means modify the code he or she is testing. That has to be done by the developer, based on the test report received from the tester.
What is Software Testing?
Testing is a formal activity. It involves a strategy and a systematic approach. The different stages of testing supplement each other. Tests are always specified and recorded.
Testing is a planned activity. The workflow and the expected results are specified, so the duration of the activities can be estimated. The point in time at which tests are executed is defined.
Testing is the formal proof of software quality.
Overview of Test Methods
Static tests
The software is not executed but analyzed offline. In this category would be code inspections (e.g. Fagan inspections), Lint checks, cross reference checks, etc.
Dynamic tests
This requires the execution of the software or parts of the software (using stubs). It can be executed in the target system, an emulator or simulator. Within the dynamic tests the state of the art distinguishes between structural and functional tests.
Structural tests
These are the so-called "white-box" tests because they are performed with knowledge of the source code details. Input interfaces are stimulated with the aim of running through certain predefined branches or paths in the software. The software is stressed with critical values at the boundaries of the input values, or even with illegal input values. The behavior of the output interface is recorded and compared with the expected (predefined) values.
Functional tests
These are the so-called "black-box" tests. The software is regarded as a unit with unknown content. Inputs are stimulated and the resulting output values are recorded and compared to the expected, specified values.
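A minimal sketch of the black-box idea in Python follows; the function under test and its specified range are invented for illustration. Inputs are chosen at and around the specified boundaries, expected outputs are fixed in advance, and illegal inputs are deliberately included, all without looking at the implementation:

def classify_score(score: int) -> str:
    """Hypothetical unit under test: the spec says 0..100 is valid,
    >= 50 is 'pass' and < 50 is 'fail'; anything else is rejected."""
    if not 0 <= score <= 100:
        raise ValueError("score out of range")
    return "pass" if score >= 50 else "fail"

# Black-box cases at and around the boundaries, with expected results.
cases = [(0, "fail"), (49, "fail"), (50, "pass"), (100, "pass")]
for value, expected in cases:
    assert classify_score(value) == expected, (value, expected)

# Illegal inputs must be rejected, not silently accepted.
for illegal in (-1, 101):
    try:
        classify_score(illegal)
    except ValueError:
        pass
    else:
        raise AssertionError(f"accepted illegal input {illegal}")

print("all black-box cases passed")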
Test by Progressive Stages
The various tests are able to find different kinds of errors. Therefore it is not enough to rely on one kind of test and completely neglect the others. For example, white-box tests will be able to find coding errors; detecting the same coding error in the system test is very difficult, because the system malfunction that results from it will not necessarily allow conclusions about the location of the coding error. Testing should therefore be progressive, with stages that supplement each other, in order to find each kind of error with the appropriate method.
Module test
A module is the smallest compilable unit of source code. Often it is too small to allow functional tests (black-box tests). However it is the ideal candidate for white-box tests. These have to be first of all static tests (e.g. Lint and inspections) followed by dynamic tests to check boundaries, branches and paths. This will usually require the employment of stubs and special test tools.
Component test
This is the black-box test of modules or groups of modules which represent certain functionality. There are no rules about what can be called a component; it is just whatever the tester defines as a component, though it should make sense and be a testable unit. Components can be integrated step by step into bigger components and tested as such.
Integration test
The software is step by step completed and tested by tests covering a collaboration of modules or classes. The integration depends on the kind of system. E.g. the steps could be to run the operating system first and gradually add one component after the other and check if the black-box tests still run (the test cases of course will increase with every added component). The integration is still done in the laboratory. It may be done using simulators or emulators. Input signals may be stimulated.
System test
This is a black-box test of the complete software in the target system. The environmental conditions have to be realistic (complete original hardware in the destination environment).
Testing Principles
Out of Glenford Myers, ``The Art of Software Testing'':
A necessary part of a test case is a definition of the expected output or result.
A programmer should avoid attempting to test his or her own program.
A programming organization should not test its own programs.
Thoroughly inspect the results of each test.
Test cases must be written for invalid and unexpected, as well as valid and expected, input conditions.
Examining a program to see if it does not do what it is supposed to do is only half of the battle. The other half is seeing whether the program does what it is not supposed to do.
Avoid throw-away test cases unless the program is truly a throw-away program.
Do not plan a testing effort under the tacit assumption that no errors will be found.
The probability of the existence of more errors in a section of a program is proportional to the number of errors already found in that section.
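The last principle (defects cluster where defects have already been found) is often used to steer further testing effort; here is a minimal Python sketch of that idea, with invented bug counts:

# Hypothetical counts of defects found so far, per program section.
bugs_found = {"checkout": 14, "search": 3, "reports": 1, "login": 7}

# Myers' clustering principle: spend extra testing effort on the sections
# where the most errors have already been found.
priority_order = sorted(bugs_found, key=bugs_found.get, reverse=True)
print(priority_order)   # ['checkout', 'login', 'search', 'reports']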
(The following notes are adapted from STSC verification and validation slides by David A. Cook, dated 5/8/2002.)
Verification – Part 1: Inspections
• I would love to cover inspections and reviews, but time does not permit. If you are interested, email me and I will send you a presentation on reviews and inspections that I co-presented last year.
• If you can only do ONE inspection on a project, you get the biggest "bang for the buck" (ROI) by performing requirements inspections.
• Are the requirements complete?
Interpretation – Exercise
Count the number of occurrences of the letter e in the text below. No questions – just count!!
"Any activity you can perform to reduce testing errors is cost-efficient. Inspections are very effective, and requirements inspections provide the the biggest ROI of any inspection effort."
ANSWER: the letter e occurs ___ times.
Are the Requirements Complete?
• The best way to determine this is to use checklists to ensure that you are asking the right questions at inspection time.
• In addition, such things as a trained and prepared inspection team plus adequate preparation help.
What to Inspect
• The software requirements specification (or however you list the requirements).
• The sources or preliminaries for the SRS (concept of operations) or any documents that preceded the SRS. If it is used as a requirements reference, then inspect it!
Sample Checklists
• Following are some sample checklists.
• These checklists grew out of research with several customers.
Requirements Review Checklist
• Is problem partitioning complete?
• Are external and internal interfaces properly defined?
• Can each requirement be tested?
• Can each requirement be numbered and easily identified? (Is the requirement traceable?)
Requirements Review Checklist (Cont.)
• Has necessary prototyping been conducted for users?
• Have implied requirements (such as speed, errors, response time) been stated?
• Is performance achievable within constraints of other system elements?
Requirements Review Checklist (Cont.)
• Are requirements consistent with schedule, resources and budget?
• Is information to be displayed to the user listed in the requirements?
• Have future issues (upgrades, planned migration, long-term use) been addressed in requirements?
(Even I can't remember all of these at once – so I inspect requirements when I am finished writing them!)
What You Are Looking For…
You are looking for requirements that are:
• Unambiguous
• Complete
• Verifiable
• Consistent
• Modifiable
• Traceable
• Usable
• Prioritized (optional)
Metrics for Requirements
• During the requirements phase, there are a few metrics that are very useful.
• One simple metric is simply the % of requirements that have been inspected.
Metrics for Requirements (Cont.)
• Another useful metric is the ratio:
(# of requirements that reviewers interpreted the same) / (total # of requirements reviewed)
Configuration Management
"The most frustrating software problems are often caused by poor configuration management. The problems are frustrating because they take time to fix. They often happen at the worst time, and they are totally unnecessary."
– Watts Humphrey, "Managing the Software Process"
Configuration Management (Cont.)
• Again, time does not permit a complete discussion of CM. If you want an intro, email and I'll send you a presentation.
• Requirements require a centralized location and STRICT configuration management. If you are "sloppy" here, soon it all goes downhill.
Validation of Requirements
Validation Is Difficult for Software
• Validation is based on three concepts: testing, metrics, and quality assurance teams.
• Testing is the least important: it is difficult to test requirements prior to coding, and most requirements methodologies (prototyping, simulation) rely on many assumptions and simplifications.
Metrics for Validation
• Metrics can be useful, but
mostly in hindsight.
• If you know how many of
your bugs or how much of
your defect fix time are
requirements-related, you
can adjust inspections and
reviews accordingly.
Validation of Requirements
• The best way to validate requirements is to
involve customers in the requirements
inspection process.
• End-users.
• Program office.
• User management.
• End-users are the most effective, but hardest
to include.
• End-users typically only see the “small
picture”, and requirements are written in the
large.
Validation Typically Occurs Twice
• During requirements gathering/analysis/
review.
• After coding and during test. This SHOULD
be the function of the QA team.
Danger, Danger
• QA and testing are
EXTREMELY EXPENSIVE!
• Anything you can do to shorten the QA/test
phase is useful. If you rely on QA and testing
to find and fix errors – your
software is probably
• Late
• Over budget
• Still full of errors after you deliver it!!
Validation Summary
• In summary – validation requires a tie-in between implementers and users. If you can’t involve users as much as you would like, then designate people to Act Like A Customer (ALAC).
• I have no checklists for validation, but suggest
that you focus on two areas:
• External interfaces to systems.
• Internal interfaces between modules or sub-systems.
Good Source of Information
• Software Verification and Validation for
Practitioners and Managers, by Steven R.
Rakitin
In Short … There Are Solutions!
6. State-based testing
State machine: implementation-independent specification
(model) of the dynamic behavior of the system
§ state: abstract situation in the life cycle of a system entity
(for instance, the contents of an object)
§ event: a particular input (for instance, a message or
method call)
§ action: the result, output or operation that follows an event
§ transition: an allowable two-state sequence, that is, a
change of state (”firing”) caused by an event
§ guard: predicate expression associated with an event,
stating a Boolean restriction for a transition to fire
[Diagram notation: transitions are labelled “event [ guard ] / action”; the arrows marking the initial and final states likewise carry their actions (“initial state / action”, “final state / action”).]
There are several types of state machines:
§ finite automaton (no guards or actions)
§ Mealy machine (no actions associated with states)
§ Moore machine (no actions associated with transitions)
§ statechart (hierarchical states: common superstates)
§ state transition diagram: graphic representation of a state
machine
§ state transition table: tabular representation of a state machine
Example: Mealy model of a two-player video game
§ each player has a start button
§ the player who presses the start button first gets the
first serve
§ the current player serves and a volley follows:
– if the server misses the ball, the server’s opponent becomes
the server
– if the server’s opponent misses the ball, the server’s score
is incremented and the server gets to serve again
– if the server’s opponent misses the ball and the server’s
score is at game point, the server is declared the winner
(here: a score of 21 wins)
The model, written out as a state transition table:

Current state   | Event [ guard ]                  | Action(s)                         | Next state
Game started    | p1_Start                         | simulateVolley( )                 | Player 1 served
Game started    | p2_Start                         | simulateVolley( )                 | Player 2 served
Player 1 served | p1_WinsVolley [ p1_Score < 20 ]  | p1AddPoint( ), simulateVolley( )  | Player 1 served
Player 1 served | p1_WinsVolley [ p1_Score = 20 ]  | p1AddPoint( )                     | Player 1 won
Player 1 served | p2_WinsVolley                    | simulateVolley( )                 | Player 2 served
Player 2 served | p2_WinsVolley [ p2_Score < 20 ]  | p2AddPoint( ), simulateVolley( )  | Player 2 served
Player 2 served | p2_WinsVolley [ p2_Score = 20 ]  | p2AddPoint( )                     | Player 2 won
Player 2 served | p1_WinsVolley                    | simulateVolley( )                 | Player 1 served
Player 1 won    | p1_IsWinner ?                    | return TRUE                       | (final)
Player 2 won    | p2_IsWinner ?                    | return TRUE                       | (final)
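As an illustration, here is a minimal C++ sketch (not from the lecture notes) of how this Mealy machine might be implemented so that its abstract state stays observable to tests; the names simply mirror the diagram, and the volley simulation is left as a stub.

#include <cassert>

enum class State { GameStarted, P1Served, P2Served, P1Won, P2Won };

class Game {
public:
    void p1_Start() { if (state_ == State::GameStarted) { simulateVolley(); state_ = State::P1Served; } }
    void p2_Start() { if (state_ == State::GameStarted) { simulateVolley(); state_ = State::P2Served; } }

    void p1_WinsVolley() {
        if (state_ == State::P1Served) {
            if (p1Score_ < 20) { ++p1Score_; simulateVolley(); }      // [p1_Score < 20] / p1AddPoint(), simulateVolley()
            else               { ++p1Score_; state_ = State::P1Won; } // [p1_Score = 20] / p1AddPoint()
        } else if (state_ == State::P2Served) {
            simulateVolley();                                         // server's opponent becomes the server
            state_ = State::P1Served;
        }
    }
    void p2_WinsVolley() {
        if (state_ == State::P2Served) {
            if (p2Score_ < 20) { ++p2Score_; simulateVolley(); }
            else               { ++p2Score_; state_ = State::P2Won; }
        } else if (state_ == State::P1Served) {
            simulateVolley();
            state_ = State::P2Served;
        }
    }
    bool p1_IsWinner() const { return state_ == State::P1Won; }
    bool p2_IsWinner() const { return state_ == State::P2Won; }

    // Built-in test support: the current abstract state is checkable from outside.
    State state() const { return state_; }

private:
    void simulateVolley() { /* stub: the volley simulation is not modelled here */ }
    State state_ = State::GameStarted;
    int p1Score_ = 0, p2Score_ = 0;
};

int main() {                      // a tiny event-sequence test case against the model
    Game g;
    g.p1_Start();
    g.p1_WinsVolley();
    assert(g.state() == State::P1Served && !g.p1_IsWinner());
    return 0;
}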
General properties of state machines
§ typically incomplete
– just the most important states, events and transitions are given
– usually just legal events are associated with transitions; illegal events
(such as p1_Start from state Player 1 served) are left undefined
§ may be deterministic or nondeterministic
– deterministic: any state/event/guard triple fires a unique transition
– nondeterministic: the same state/event/guard triple may fire several
transitions, and the firing transition may differ in different cases
§ may have several final states (or none: infinite computations)
§ may contain empty events (default transitions)
§ may be concurrent: the machine (statechart) can be in several
different states at the same time
The role of state machines in software testing
§ Framework for model testing, where an executable model (state
machine) is executed or simulated with event sequences as test
cases, before starting the actual implementation phase
§ Support for testing the system implementation (program) against
the system specification (state machine)
§ Support for automatic generation of test cases for the
implementation
– there must be an explicit mapping between the elements of the state
machine (states, events, actions, transitions, guards) and the elements of
the implementation (e.g., classes, objects, attributes, messages, methods,
expressions)
– the current state of the state machine underlying the implementation must
be checkable, either by the runtime environment or by the implementation
itself (built-in tests with, e.g., assertions and class invariants)
Validation of state machines
Checklist for analyzing that the state machine is complete and
consistent enough for model or implementation testing:
– one state is designated as the initial state with outgoing transitions
– at least one state is designated as a final state with only incoming
transitions; if not, the conditions for termination shall be made explicit
– there are no equivalent states (states for which all possible outbound event
sequences result in identical action sequences)
– every state is reachable from the initial state
– at least one final state is reachable from all the other states
– every defined event and action appears in at least one transition (or state)
– except for the initial and final states, every state has at least one incoming
transition and at least one outgoing transition
– for deterministic machines, the events accepted in a particular state are
unique or differentiated by mutually exclusive guard expressions
– the state machine is completely specified: every state/event pair has at least
one transition, resulting in a defined state; or there is an explicit
specification of an error-handling or exception-handling mechanism for
events that are implicitly rejected (with no specified transition)
– the entire range of truth values (true, false) must be covered by the guard
expressions associated with the same event accepted in a particular state
– the evaluation of a guard expression does not produce any side effects in
the implementation under test
– no action produces side effects that would corrupt or change the resultant
state associated with that action
– a timeout interval (with a recovery mechanism) is specified for each state
– state, event and action names are unambiguous and meaningful in the
context of the application
Control faults
When testing an implementation against a state machine, one shall
study the following typical control faults (incorrect sequences of
events, transitions, or actions):
– missing transition (nothing happens with an event)
– incorrect transition (the resultant state is incorrect)
– missing or incorrect event
– missing or incorrect action (wrong things happen as a result of a transition)
– extra, missing or corrupt state
– sneak path (an event is accepted when it should not be)
– trap door (the implementation accepts undefined events)
[Figure: the video-game state machine, with the p1_WinsVolley / simulateVolley( ) transition out of ”Player 2 served” removed.]
Missing transition: Player 2 loses the volley, but continues as server
[Figure: the video-game state machine, with one transition redirected to the wrong resultant state.]
Incorrect transition: After player 2 misses, the game resets
[Figure: the video-game state machine, with the simulateVolley( ) actions missing from the p1_Start and p2_Start transitions.]
Missing actions: No volley is generated, and the system will wait forever
[Figure: the video-game state machine, where the p2_WinsVolley [ p2_Score < 20 ] transition calls p1AddPoint( ) instead of p2AddPoint( ).]
Incorrect action: Player 2 can never win
[Figure: the video-game state machine, with an extra p2_Start transition accepted during serve.]
Sneak path: Player 2 can immediately win by pressing the start button at serve
[Figure: the video-game state machine, with an undefined Esc event accepted during serve.]
Trap door: Player 1 can immediately win by pressing the Esc key at serve
Test design strategies for state-based testing
Test cases for state machines and their implementations can be
designed using the same notion of coverage as in white-box
testing:
§ test case = sequence of input events
§ all-events coverage: each event of the state machine is
included in the test suite (is part of at least one test case)
§ all-states coverage: each state of the state machine is exercised
at least once during testing, by some test case in the test suite
§ all-actions coverage: each action is executed at least once
§ all-transitions: each transition is exercised at least once
– implies (subsumes) all-events coverage, all-states coverage,
and all-actions coverage
– ”minimum acceptable strategy for responsible testing of a state machine”
§ all n-transition sequences: every transition sequence generated
by n events is exercised at least once
– all transitions = all 1-transition sequences
– all n-transition sequences implies (subsumes) all (n-1)-transition
sequences
§ all round-trip paths: every sequence of transitions beginning
and ending in the same state is exercised at least once
§ exhaustive: every path over the state machine is exercised at
least once
– usually totally impossible, or at least impractical
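For instance (event sequences derived from the video game model above, not listed in the notes): the two test cases ⟨p1_Start, p1_WinsVolley × 21, p1_IsWinner?⟩ and ⟨p2_Start, p2_WinsVolley × 21, p2_IsWinner?⟩ together already achieve all-states coverage, but all-transitions coverage additionally requires volleys lost by the current server, e.g. a p2_WinsVolley event while Player 1 is serving and a p1_WinsVolley event while Player 2 is serving.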
[Figure: the video-game state machine, with a set of transitions highlighted whose traversal visits every state: all-states coverage.]
[Figure: the video-game state machine, with a test suite highlighted that exercises every transition: all-transitions coverage.]
7. Testing object-oriented software
§ The special characteristics of the object-oriented software
engineering paradigm provide some advantages but also
present some new problems for testing
– advantages: well-founded design techniques and models (UML), clean
architectures and interfaces, reusable and mature software patterns
– problems: inheritance, polymorphism, late (dynamic) binding
§ There is a need for an object-oriented testing process, object-oriented test case design, object-oriented coverage metrics, and object-oriented test automation
Object-oriented code defects
– buggy interaction of individually correct superclass and subclass methods
– omitting a subclass-specific override for a high-level superclass method in a
deep inheritance hierarchy
– subclass inheriting inappropriate methods from superclasses (”fat
inheritance”)
– failure of a subclass to follow superclass contracts in polymorphic servers
– omitted or incorrect superclass-initialization in subclasses
– incorrect updating of superclass instance variables in subclasses
– spaghetti polymorphism resulting in loss of execution control (the ”yo-yo”
problem caused by dynamic binding and self and super objects)
– subclasses violating the state model or invariant of the superclass
– instantiation of generic class with an untested type parameter
– corrupt inter-modular control relationships, due to delocalization of
functionality in mosaic small-scale and encapsulated classes
– unanticipated dynamic bindings resulting from scoping nuances in multiple
and repeated inheritance
Object-oriented challenges for test automation
– low controllability, observability, and testability of the system, due to a
huge number of small encapsulated objects
– difficulties in analyzing white-box coverage, due to a large number of
complex and dynamic code dependencies
– construction of drivers, stubs, and test suites that conform to the inheritance
structure of the system
– reuse of superclass test cases in (regression) testing of subclasses
– incomplete applications (object-oriented frameworks)
Object-oriented challenges for testing process
– need for a larger number of testing phases, due to an iterative and
incremental software development process
– unclear notion of ”module” or ”unit”, due to individual classes being too
small and too much coupled with other classes
– (UML) models being too abstract to support test case design
– need for more test cases than for conventional software, due to the inherent
dynamic complexity of object-oriented code
– general and partly abstract reusable classes and frameworks, making it
necessary to test them also for future (unknown) applications
7.1. UML and software testing
UML (Unified Modeling Language): a design and modeling
language for (object-oriented) software
– widely used ”de facto” standard
– provides techniques to model the software from different perspectives
– supports facilities both for abstract high-level modeling and for more
detailed low-level modeling
– consists of a variety of graphical diagram types that can be extended for
specific application areas
– associated with a formal textual language, OCL (Object Constraint
Language)
– provides support for model-based testing: (1) testing of an executable
UML model, (2) testing of an implementation against its UML model
UML diagrams in software testing
§ Use case diagrams: testing of system-level functional
requirements, acceptance testing
§ Class diagrams: class (module / unit) testing, integration testing
§ Sequence diagrams, collaboration diagrams : integration testing,
testing of control and interaction between objects, testing of
communication protocols between (distributed) objects
§ Activity diagrams: testing of work flow and synchronization
within the system, white-box testing of control flow
§ State diagrams (statecharts): state-based testing
§ Package diagrams, component diagrams: integration testing
§ Deployment diagrams: system testing
7.2. Testing based on use cases
§ Use case: a sequence of interactions by which the user
accomplishes a task in a dialogue with the system
– use case = one particular way to use the system
– use case = user requirement
– set of all use cases = complete functionality of the system
– set of all use cases = interface between users (actors) and system
§ Scenario: an instance of a use case, expressing a specific task by
a specific actor at a specific time and using specific data
§ UML: use case model = set of use case diagrams, each
associated with a textual description of the user’s task
– for testing, the use case model must be extended with (1) the domain of
each variable participating in the use cases, (2) the input/output
relationships among the variables, (3) the relative frequency of the use
cases, (4) the sequential (partial) order among the use cases
Example: Automatic Teller Machine (ATM)
[Use case diagram: the actor Bank customer takes part in the use cases Establish session, Inquiry and Bank transaction, with Withdraw and Deposit marked as <<extends>> extensions of Bank transaction; the actor ATM technician takes part in Repair and Replenish.]
Some ATM use cases and scenarios
Establish session / Bank customer:
1) Wrong PIN entered. Display ”Reenter PIN”.
2) Valid PIN entered; bank not online. Display ”Try later”.
3) Valid PIN entered; account closed. Display ”Account closed, call your bank”.
Withdraw / Bank customer:
4) Requests 50 €; account open; balance 51 €. 50 € dispensed.
5) Requests 100 €; account closed. Display ”Account closed, call your bank”.
Replenish / ATM technician:
6) ATM opened; cash dispenser empty; 50000 € added.
7) ATM opened; cash dispenser full.
From use cases to test cases
1. Identification of the operational variables: explicit inputs and outputs,
environmental conditions, states of the system, interface elements
– Establish session: PIN on the card, PIN entered, response from the bank, status of
the customer’s account
2. Domain definitions (and equivalence classes) of the variables
– Establish session: PIN ∈ [ 0000 – 9999 ]
3. Development of an operational relation among the variables, modeling the
distinct responses of the system as a decision table of variants
– variant: combination of equivalence classes, resulting in a specific system action
– variants shall be mutually exclusive
– scenarios are represented by variants
4. Development of the test cases for the variants
– at least one ”true” test case that satisfies all the variant’s requirements on variable
values
– at least one ”false” test case that violates the variant’s requirements
– typically every ”false” test case is a ”true” test case for another variant
Establish session: Operational relation
(operational variables → expected action)

Variant | Card PIN | Entered PIN        | Bank response        | Account status | ATM message     | ATM card action
1       | Valid    | Doesn’t match card | –                    | –              | Reenter PIN     | None
2       | Valid    | Matches card       | Does not acknowledge | –              | Try later       | Eject
3       | Valid    | Matches card       | Acknowledges         | Closed         | Call bank       | Eject
4       | Invalid  | –                  | –                    | –              | Insert ATM card | Eject
5       | Valid    | Matches card       | Acknowledges         | Open           | Select service  | None
6       | Revoked  | –                  | Acknowledges         | –              | Card revoked    | Retain
7       | Revoked  | –                  | Does not acknowledge | –              | Invalid card    | Eject
Establish session: Test cases
(for each variant, one ”true” case nT that satisfies it and one ”false” case nF that violates it; nF (mT) means the false case is at the same time a true case of variant m)

Variant  | Card PIN | Entered PIN | Bank response | Account status | ATM message     | ATM card action
1T       | 1234     | 1134        | –             | –              | Reenter         | None
1F (2T)  | 1234     | 1234        | NACK          | –              | Try later       | Eject
2T       | 9999     | 9999        | NACK          | –              | Try later       | Eject
2F (3T)  | 9999     | 9999        | ACK           | Closed         | Call bank       | Eject
3T       | 0000     | 0000        | ACK           | Closed         | Call bank       | Eject
3F (4T)  | 000?     | –           | –             | –              | Insert ATM card | Eject
4T       | %&+?     | –           | –             | –              | Insert ATM card | Eject
4F (5T)  | 0001     | 0001        | ACK           | Open           | Select service  | None
5T       | 3210     | 3210        | ACK           | Open           | Select service  | None
5F (2T)  | 0001     | 0001        | NACK          | –              | Try later       | Eject
6T       | 5555     | –           | ACK           | –              | Card revoked    | Retain
6F (1T)  | 9998     | 9999        | –             | –              | Reenter         | None
7T       | 5555     | –           | NACK          | –              | Invalid card    | Eject
7F (1T)  | 9999     | 0000        | –             | –              | Reenter         | None
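Such a variant table maps almost directly onto table-driven test code. The following is a minimal sketch of the idea (my own illustration, not from the notes); the establishSession() function is only a placeholder standing in for the real ATM controller.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct SessionCase {
    std::string id;                                               // e.g. "1T", "1F"
    std::string cardPin, enteredPin, bankResponse, accountStatus; // operational variables
    std::string expectedMessage, expectedCardAction;              // expected actions
};

// Placeholder for the implementation under test; a real harness would call the ATM here.
std::pair<std::string, std::string> establishSession(const SessionCase&) {
    return {"", ""};
}

int main() {
    std::vector<SessionCase> cases = {
        {"1T", "1234", "1134", "-",    "-", "Reenter",   "None"},
        {"1F", "1234", "1234", "NACK", "-", "Try later", "Eject"},
        // ... the remaining rows 2T–7F from the table above
    };
    for (const auto& c : cases) {
        auto result = establishSession(c);
        if (result.first != c.expectedMessage || result.second != c.expectedCardAction)
            std::cout << "FAIL " << c.id << ": got \"" << result.first
                      << "\" / \"" << result.second << "\"\n";
    }
    return 0;
}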
7.3. Class-based testing
Some questions when designing the testing of subclasses:
§ Should we test inherited methods?
– extension: inclusion of superclass features in a subclass, inheriting method implementation and interface (name and arguments)
– overriding: new implementation of a subclass method, with the same inherited interface as that of a superclass method
– specialization: definition of subclass-specific methods and instance variables, not inherited from a superclass
§ Can we reuse superclass tests for extended and overridden methods?
§ To what extent should we exercise interaction among methods of all superclasses and of the subclass under test?
Superclass-subclass development scenarios
1. Superclass modifications: a method is changed in a superclass
– the changed method and all its interactions with changed and unchanged methods must be retested in the superclass
– the method must be retested in all the subclasses inheriting the method as extension
– the ”super” references to the method must be retested in subclasses
2. Subclass modifications: a subclass is added or modified, without changing the superclasses
– all the methods inherited from a superclass must be retested in the context of the subclass, even those which were just included by extension but not overridden
class Account {
protected:
    Money balance;                       // current balance
public:
    void openAccount (Money amount)
    { balance = amount; }                // account is opened
    // …
};

class Stocks : public Account {
public:
    void takeShares (Money shares)
    { balance = shares * unitPrice; }    // value of the shares is deposited into the account
    void giveProportion ( )
    { print (totalPrice / balance); }
    // …
private:
    Money unitPrice, totalPrice;         // assumed members, not shown on the original slide
};

Shared data: the balance may be computed in openAccount instead of takeShares, so openAccount must be retested to find the defect in giveProportion.
Contract of Account: balance may be 0.
Contract of Stocks: balance may not be 0 (giveProportion divides by it).
Superclass-subclass development scenarios
3. Reuse of the superclass test suite: a method is added to a subclass, overriding a superclass method
– the new implementation of the method must be tested
– test cases for the superclass method can be reused, but additional test cases are also needed for subclass-specific behavior
4. Addition of a subclass method: an entirely new method is added to a specialized subclass
– the new method must be tested with method-specific test cases
– interactions between the new method and the old methods must be tested with new test cases
Superclass-subclass development scenarios
5. Change to an abstract superclass interface
– ”abstract” superclass: some of the methods have been left without implementation; just the interface has been defined
– all the subclasses must be retested, even if they have not changed
6. Overriding a superclass method used by another superclass method
– the superclass method that is using the overridden method will behave differently, even though it has not been changed itself
– so, the superclass method must be retested
class Account {
public:
    void rollOver( )         { /* … */ yearEnd( ); /* … */ }
    virtual void yearEnd( )  { /* … */ foo( );     /* … */ }   // virtual, so the subclass override is bound dynamically
    // …
};

class Deposit : public Account {
public:
    void yearEnd( ) override { /* … */ bar( );     /* … */ }
    // …
};

The inherited rollOver will now activate ”bar” instead of ”foo”, so it must be retested.
8. Integration testing
Integration testing: search for module faults that cause failures
in interoperability of the modules
– ”module”: generic term for a program element that
implements some restricted, cohesive functionality
– typical modules: class, component, package
– ”interoperability”: interaction between different modules
– interaction may fail even when each individual module
works perfectly by itself and has been module-tested
– usually related to the call interface between modules: a
function call or its parameters are buggy
Typical interface bugs
§ missing, overlapping, or conflicting functions
§ incorrect or inconsistent data structure used for a file or database
§ violation of the data integrity of global store or database
§ unexpected runtime binding of a method
§ client sending a message that violates the server’s constraints
§ wrong polymorphic object bound to message
§ incorrect parameter value
§ attempt to allocate too many resources from the target, making the target crash
§ incorrect usage of virtual machine, object request broker, or
operating system service
§ incompatible module versions or inconsistent configuration
Interaction dependencies between modules:
– function calls (the most usual case)
– remote procedure calls
– communication through global data or persistent storage
– client-server architecture
– composition and aggregation
– inheritance
– calls to an application programming interface (API)
– objects used as message parameters
– proxies
– pointers to objects
– dynamic binding
Dependency tree: hierarchical representation of the
dependencies between modules
In integration testing: uses-dependency
– usually: module A uses module B =
(some function in) module A calls (some function in) module B
– can be generated by automated static dependency analyzers
Example: modules A, B, C, D where A uses (calls) B, A uses (calls) C, and C uses (calls) D.
Integration testing strategies
Example system: an air traffic control system with the modules
A: Air traffic control system (main module), B: Radar input module, C: Aircraft tracking module, D: Controller interface module, E: Aircraft position module, F: Aircraft identification module, G: Position prediction module, H: Display update module.
[Dependency tree, as reconstructed from the strategies below: A uses B, C and D; C uses E, F, G and H.]
(1) Big-bang integration: all the modules are tested at the same time
– One test configuration: {A, B, C, D, E, F, G, H}
– Failure => where is the bug?
– OK for small and well-structured systems
– OK for systems constructed from trusted components
(2) Top-down integration: the system is tested incrementally,
level by level with respect to the dependency tree,
starting from the top level (root of the tree)
1. The main module A is tested by itself. Stubs: B, C, and D.
2. The subsystem {A, B, C, D} is tested. Stubs: E, F, G, and H.
3. Finally, the entire system is tested. No stubs.
– Failure in step 3: the bug is in E, F, G, or H
Advantages:
– in case of failure, the suspect area is limited
– testing may begin early: when the top-level modules have been coded
– early validation of main functionality
– modules may be developed in parallel with testing
Disadvantages:
– lowest-level (most often used) modules are tested last, so performance
problems are encountered late
– requires stubs: partial proxy implementations of called modules
– since stubs must provide some degree of real functionality, it may be
necessary to have a set of test case specific stubs for each module
– dependency cycles must be resolved by testing the whole cycle as a
group or with a special cycle-breaking stub
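To make the stub/driver vocabulary concrete, here is a small sketch (my own illustration; the module interfaces are assumptions based on the example system): a stub is a canned stand-in for a module that the module under test calls, and a driver is a throwaway caller that feeds the module under test.

#include <cassert>

struct Position { double x, y; };

// Stub for the aircraft position module E: used when testing the tracking
// module C top-down, before E exists. It returns fixed, test-case-specific data.
Position aircraftPositionStub(int /*aircraftId*/) {
    return Position{0.0, 0.0};
}

// (Partial) real implementation of module E, as it would exist in bottom-up testing.
Position aircraftPosition(int aircraftId) {
    // simplified placeholder for the real computation
    return Position{1.0 * aircraftId, 2.0 * aircraftId};
}

// Driver for module E: a throwaway main() that plays the role of the calling
// module C, feeding E with inputs and checking its outputs.
int main() {
    Position fromStub = aircraftPositionStub(7);
    assert(fromStub.x == 0.0 && fromStub.y == 0.0);

    Position p = aircraftPosition(7);
    assert(p.x == 7.0 && p.y == 14.0);
    return 0;
}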
(3) Bottom-up integration: the system is tested incrementally,
starting from the bottom level (leaves of the dependency tree)
1. The module E is tested alone. Driver: C. [The driver may have been developed
already in module testing of E.]
2. The module F is tested. (Extended) driver: C.
3. The module G is tested. (Extended) driver: C.
4. The module H is tested. (Extended) driver: C.
5. The module B is tested. Driver: A.
6. The module D is tested. (Extended) driver: A.
7. The modules {C, E, F, G, H} are tested together. (Extended) driver: A.
Failure => bug in C or its downwards-uses interfaces.
8. Finally, the entire system is tested. No drivers.
Failure => bug in A or its uses interfaces.
Advantages:
– in case of failure, the suspect area is very narrow: one module and its
interfaces
– testing may begin early: as soon as any leaf-level module is ready
– initially, testing may proceed in parallel
– early validation of performance-critical modules
Disadvantages:
– main functionality (usability) and control interface of the system are
validated late
– requires drivers: partial proxy implementations of calling modules
– dependency cycles must be resolved
– requires many tests, especially if the dependency tree is broad and has
a large number of leaves
(4) Backbone integration: combination of big-bang, top-down, and bottom-up (backbone: E, F, G, H)
1. The backbone (kernel) modules of the system are tested first, bottom-up.
2. The system control is tested next, top-down.
3. The backbone modules are big-banged together, bottom-up.
4. Top-down integration is continued, until the backbone has been included.
For the example system:
1. Each backbone module E, F, G, H is tested alone, in isolation. Drivers are needed.
2. The control subsystem {A} is tested. Stubs are needed.
3. The backbone and its controller {E, F, G, H, C} are tested. Driver for C is needed.
4. The subsystem {A, B, C, D} is tested. Stubs are needed.
5. The entire system is tested. No stubs or drivers.
Usually bottom-up preferable:
+ Drivers are much easier to write than stubs, and can even be
automatically generated.
+ The approach provides a greater opportunity for parallelism than the
other approaches; that is, there can be several teams testing different
subsystems at the same time.
+ The approach is effective for detecting detailed design or coding errors
early enough.
+ The approach detects critical performance flaws that are generally
associated with low-level modules.
+ The approach supports the modern software engineering paradigms
based on classes, objects, and reusable stand-alone components.
- A prototype of the system is not available for (user) testing until the very end.
- The approach is not effective in detecting architectural design flaws of
large scale.
- May be too laborious for large systems.
9. Regression testing
ANSI/IEEE Standard of Software Engineering
Terminology:
selective testing of a system or component to
verify that modifications have not caused
unintended effects and that the system or
component still complies with its specifications
Usually integrated with maintenance, to check the validity of the
modifications:
§ corrective maintenance (fixes)
§ adaptive maintenance (porting to a new operational
environment)
§ perfective maintenance (enhancements and
improvements to the functionality)
Differences between “ordinary” testing and
regression testing:
§ regression testing uses a (possibly) modified
specification, a modified implementation, and an old test
plan (to be updated)
§ regression testing checks the correctness of some parts
of the implementation only
§ regression testing is usually not included in the total cost
and schedule of the development
§ regression testing is done many times in the life-cycle of
the system (during bug fixing and maintenance)
General approach:
– P: program (module, component), already tested
– P’: modified version of P
– T: test suite used for testing P
1. Select T’ ⊆ T, a set of tests to execute on P’.
2. Test P’ with T’, establishing the correctness of P’ with
respect to T’.
3. If necessary, create T’’, a set of new functional (black-box)
or structural (white-box) tests for P’.
4. Test P’ with T’’, establishing the correctness of P’ with
respect to T’’.
5. Create T’’’, a new test suite and test history for P’, from T, T’, and T’’: T’’’ = T ∪ T’’
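As a small illustration (not from the notes): if the change from P to P’ only touches one module, T’ would typically be the existing tests that exercise that module, T’’ would be new tests for its changed or added behaviour, and the updated suite T’’’ = T ∪ T’’ becomes the baseline for the next round of regression testing.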
§ naïve regression testing (re-running all the existing test
cases) not cost-effective, although the common strategy in
practice
§ principle 1: it is useless to test unmodified parts of the
software again
§ principle 2: modified parts shall be tested with existing test
cases (may not be possible, if the interface has been changed:
GUI capture-replay problem!)
§ principle 3: new parts shall be tested with new test cases
§ database of test cases needed
§ additional automation support possible (data-flow analysis)
When and how?
§ A new subclass has been developed: rerun the superclass tests on
the subclass, run new tests for the subclass
§ A superclass is changed: rerun the superclass tests on the
superclass and on its subclasses, rerun the subclass tests, test the
superclass changes
§ A server (class) is changed: rerun tests for the clients of the
server, test the server changes
§ A bug has been fixed: rerun the test that revealed the bug, rerun
tests on any parts of the system that depend on the changed code
§ A new system build has been generated: rerun the build test suite
§ The final release has been generated: rerun the entire system
test suite
Selective regression test strategies
§ Risk-based heuristics: rerun tests for (1) unstable,
(2) complex, (3) functionally critical, or (4) frequently
modified modules (classes, functions, and such)
§ Profile-based heuristics: rerun tests for those use cases,
properties, or functions that are the most frequently used
§ Coverage-based heuristics: rerun those tests that yield the
highest white-box (statement, branch, …) code coverage
§ Reachability-based heuristics: rerun those tests that reach an
explicitly or implicitly changed or deleted module
§ Dataflow-based heuristics: rerun those tests that exercise
modified or new definition-use pairs
§ Slice-based heuristics: rerun those tests that generate a similar
data-flow slice over the old and the new software version
10. Statistical testing
§ operational profile: distribution of functions actually used
⇒ probability distribution of inputs
§ most frequently used functions / properties tested more
carefully, with a larger number of test cases
§ more complex, error-prone functions tested more carefully
§ central “kernel” functions tested more carefully
§ useful strategy, when in lack of time or testing resources
§ based on experience with and statistics over previous use
§ history data over existing systems must be available
Example: file processing
- create: probability of use 0.5 (50 %)
- delete: probability of use 0.25 (25 %)
- modify: probability of use 0.25 (25 %)
⇒ create: 50 test cases
⇒ delete: 25 test cases
⇒ modify: 25 test cases
In total: 100
Improved strategy: relative probability of failure
- modify twice as complex as create and delete
⇒ create: 40 test cases
⇒ delete: 20 test cases
⇒ modify: 40 test cases
In total: 100
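One way to read these numbers (an interpretation, not spelled out in the notes) is that each function’s probability of use is weighted by its relative failure probability and the result renormalised; the small sketch below reproduces the 40/20/40 split that way.

#include <cstdio>

int main() {
    const char* fn[]    = {"create", "delete", "modify"};
    double use[]        = {0.50, 0.25, 0.25};   // operational profile (probability of use)
    double failWeight[] = {1.0, 1.0, 2.0};      // modify assumed twice as failure-prone
    const int budget = 100;                     // total number of test cases

    double total = 0.0;
    for (int i = 0; i < 3; ++i) total += use[i] * failWeight[i];
    for (int i = 0; i < 3; ++i)
        std::printf("%s: %.0f test cases\n", fn[i], budget * use[i] * failWeight[i] / total);
    return 0;
}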
Example: profile of a mobile phone as a graph (probabilities of use):
– Top level: Games 0.4, Calls 0.3, Messages 0.2, Calendar 0.05, Alarm 0.05
– Within Games: Snake 0.8, Logic 0.1, Memory 0.1
– Within Messages: ABC 0.7, 123 0.3
Resulting leaf probabilities: Snake 0.4 × 0.8 = 32 %, Logic 0.4 × 0.1 = 4 %, Memory 0.4 × 0.1 = 4 %; Games in total 40 %, Calls 30 %.
Complexity ?!?
11. Practical aspects of testing
When to stop?
§ all the planned test cases have been executed
§ required (white-box) coverage has been reached (e.g. all the
branches have been tested)
§ all the (black-box) operations, their parameters and their
equivalence classes have been tested
§ required percentage (e.g., 95%) of estimated total number of
errors has been found (known from company’s project history)
§ required percentage (e.g., 95%) of seeded errors has been found
§ mean time to failure (in full operation) is greater than a required
threshold time (e.g. 1 week)
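As a worked illustration of the seeded-error criterion (my own numbers, not from the notes): if 20 errors are deliberately seeded and testing finds 18 of them (90 %) alongside 45 unseeded errors, the seeding model estimates that those 45 are roughly 90 % of the real errors, i.e. about 50 exist in total and around 5 remain.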
Frequently made mistakes in software testing
§ Most organizations believe that the purpose of testing is
just to find bugs, while the key is to find the important bugs.
§ Test managers are reporting bug data without putting it into
context.
§ Testing is started too late in the project.
§ Installation procedures are not tested.
§ Organizations overrely on beta testing done by the
customers.
§ One often fails to correctly identify risky areas (areas that
are used by more customers or would be particularly severe
if not functioning perfectly).
§ Testing is a transitional job for new, novice programmers.
§ Testers are recruited from the ranks of failed programmers.
§ Testers are not domain experts.
§ The developers and testers are physically separated.
§ Programmers are not supposed to test their own code.
§ More attention is paid to running tests than to designing them.
§ Test designs are not reviewed / inspected.
§ It is checked that the product does what it’s supposed to do, but not that it doesn’t do what it isn’t supposed to do.
§ Testing is only conducted through the user-visible interface.
§ Bug reporting is poor.
§ Only existing test cases (no new ones) are applied in
regression testing of system modifications.
§ All the tests are automated, without considering economic
issues.
§ The organization does not learn, but makes the same testing
mistakes over and over again.
Empirical hints for defect removal
B. Boehm, V.R. Basili: Software Defect Reduction Top 10 List. (IEEE)
Computer, 34, 1 (January), 2001, 135-137.
1. Finding and fixing a software problem after delivery is often 100 times more
expensive than finding and fixing it during the requirements and design
phase.
2. Current software projects spend about 40 to 50 percent of their effort on
avoidable rework.
3. About 80 percent of avoidable rework comes from 20 percent of the defects.
4. About 80 percent of the defects come from 20 percent of the modules, and
about half of the modules are defect free.
5. About 90 percent of the downtime comes from, at most, 10 percent of the
defects.
6. Peer reviews catch 60 percent of the defects.
7. Perspective-based reviews catch 35 percent more defects than
nondirected reviews.
8. Disciplined personal practices can reduce defect introduction rates by up
to 75 percent.
9. All other things being equal, it costs 50 percent more per source
instruction to develop high-dependability software products than to
develop low-dependability software products. However, the investment is
more than worth it if the project involves significant operations and
maintenance costs.
10. About 40 to 50 percent of user programs contain nontrivial defects.
Conclusion: Try to avoid testing as much as possible. However, if
you have to test, do it carefully and with focus.
Observations
§ Testing is not considered as a profession, but rather as an art.
– Psychologically, testing is more “boring” than coding
§ Too little resources are allocated to testing.
– Exception: Microsoft has about as many testers as programmers
§ Quality of testing is not measured.
– A general test plan is usually written, but the process is not tracked
§ The significance of testing tools is overestimated.
§ The significance of inspections is underestimated.
A Review Paper on Decision Table-Based Testing (Cai Ferriday)
1 Introduction
From the beginning of software development, testing has always been incorporated into the final stages. Over the years the complexity of software has increased dramatically, and as this complexity increases, programmers realise that testing is just as important as the development stages.
Nowadays there are two main types of testing, White box testing and Black box testing. Grey box testing is another type, but it’s not so well known and is sometimes used with Decision Table-Based Testing:
White box – testing concerned with the internal structure of the program.
Black box – testing concerned with input/output of the program.
Grey box – using the logical relationships to analyse the input/output of
the program.
Testing has been divided into different categories as it has been practised and researched since the 1970s. This paper is going to discuss and analyse Decision
Table-Based Testing which is a Functional Testing method, also known as Black box
testing.
In this paper I aim to explore the fundamental concepts of Decision Table-Based
Testing and how it differs from other functional testing methods. Using examples I
will also explain how Decision Table-Based Testing operates and how to use it.
The remainder of this document is split up into three areas:
Background – An overview of DT-BT’s origin and its relationship with other
Functional Testing methods.
Applications – A discussion on the ways DT-BT can be used, along with
examples and how it compares with different Functional Testing methods.
Summary & Further Work – A brief outline of the material covered and
further work that will complement it.
2 Background
2.1 Origin
Decision Table-Based Testing has been around since the early 1960’s; it is used to
depict complex logical relationships between input data. There are two closely related
methods of Functional Testing:
• The Cause-Effect Graphing (Elmendorf, 1973; Myers, 1979), and
• The Decision Tableau Method (Mosley, 1993).
These methods are a little different to Decision Table-Based Testing, but use similar concepts, which I will explain later on. I won’t go into great detail as these methods are awkward and unnecessary with the use of Decision Tables.
2.2 Definitions
2.2.1 Decision Table-Based Testing?
A Decision Table is the method used to build a complete set of test cases without
using the internal structure of the program in question. In order to create test cases we
use a table to contain the input and output values of a program. Such a table is split up
into four sections as shown below in fig 2.1.
Figure 2.1 The Basic Structure of a Decision Table.
In fig 2.1 there are two lines which divide the table into its main structure. The solid
vertical line separates the Stub and Entry portions of the table, and the solid horizontal
line is the boundary between the Conditions and Actions. So these lines separate the
table into four portions, Condition Stub, Action Stub, Condition Entries and Action
Entries.
A column in the entry portion of the table is known as a rule. Values which are in the
condition entry columns are known as inputs and values inside the action entry
portions are known as outputs. Outputs are calculated depending on the inputs and
specification of the program.
In fig 2.2 there is an example of a typical Decision Table. The inputs in this given table derive the outputs depending on what conditions these inputs meet. Notice the use of “-“ in the table below; these are known as don’t care entries. Don’t care entries are normally viewed as being false values which don’t require the value to define the output.
Figure 2.2 a Typical Structure of a Decision Table
Figure 2.2 shows the input values as true (T) or false (F), i.e. binary conditions; tables which use binary conditions are known as limited entry decision tables. Tables which use multiple conditions are known as extended entry decision tables. One important aspect to notice about decision tables is that they aren’t imperative, in that they don’t impose any particular order on the conditions or actions.
2.2.2 Cause-Effect Graphing?
Cause-Effect Graphing is very similar to Decision Table-Based Testing, where logical
relationships of the inputs produce outputs; this is shown in the form of a graph. The
graph used is similar to that of a Finite State Machine (FSM). Symbols are used to
show the relationships between input conditions, those symbols are similar to the
symbols used in propositional logic.
2.3 Functional Relationships
There are 3 main functional methods:
Boundary Value Analysis (BVA)
Equivalence Class Testing (ECT)
Decision Table-Based Testing (DT-BT)
All three functional testing methods complement each other; the functional testing outcome cannot be completed to a satisfactory level using just one of these functional testing strategies, or even two.
Decision Table-Based Testing has evolved from Equivalence Class Testing in some
way; Equivalence Class Testing groups together inputs of the same manner which
behave similarly. DT-BT follows on from ECT by grouping together the input and
output behaviours into an “equivalence” rule and testing the logical dependencies of
these rules. These rules are regarded as test cases, therefore redundant rules are
discarded.
2.3.1 Effort
Although all three testing strategies have similar properties and all work towards the same goal, each of the methods is different in terms of application and effort.
Boundary Value Analysis is not concerned with the data or logical dependencies as
it’s a domain based testing technique. It requires low effort to develop test cases for
this method but on the other hand its sophistication is low and the number of test
cases generated is high compared with other functional methods.
Equivalence Class Testing is more concerned with data dependencies and treating
similar inputs and outputs the same by grouping them in classes. This reduces the test
cases and increases the effort used to create test cases due to the effort required to
group them. This is a more sophisticated method of test case development as it’s more
concerned with the values inside of the domain.
Decision Table-Based Testing, on the other hand, shares traits with Equivalence Class Testing; it tests logical dependencies, which increases the effort in identifying test cases and increases the sophistication of those test cases. Because DT-BT relies more on the logical dependencies of the equivalence classes in the decision table, this reduces the number of rules required to complete the set of test cases.
____________________________
Boundary Value Analysis: - A functional testing strategy which is concerned with the limits of an input/output domain.
Equivalence Class Testing: - A functional testing strategy where the inputs/outputs that behave similarly are grouped together
into equivalence partitions, in order to decrease test cases.
[Figure 2.3: two graphs plotting the three methods against sophistication – by number of test cases generated (Boundary Value Analysis highest, then Equivalence Class, then Decision Table) and by effort to identify test cases (Decision Table highest, then Equivalence Class, then Boundary Value Analysis).]
2.3.2 Efficiency
In order to give a sense of how efficient Decision Table-Based Testing is with respect
to other functional methods, Boundary Value Analysis and Equivalence Class Testing
have to be examined.
On average Boundary Value Analysis yields 5 times as many test cases as Decision Table-Based Testing, and Equivalence Class Testing 1½ times as many. On this basis we can say that there exists either test case redundancy or impossible test cases; either way this reduces the efficiency of these testing strategies and shows how efficient Decision Table-Based Testing is.
But as stated above, we cannot totally disregard the other functional testing methods
as they complement each other and are not totally redundant in all testing cases.
3 Applications
In order to demonstrate and aid the understanding of Decision Tables I will show some of the many applications it has, supported by examples. I am going to use the Triangle Problem to explore decision tables in more depth.
3.1 The Basics
As explained above, there are two types of decision table, limited and extended entry
tables. Below, in fig 3.1 is an example of a limited entry decision table where the
inputs are depicted using binary values.
Fig 3.1 Decision Table for the Triangle Problem
When creating a decision table there are many techniques people adopt to improve the construction. Most testers add two main techniques: the use of the “impossible” action stub and don’t care entries. The impossible action stub entry is used as a form of error catching; if out-of-range values are inputted then the impossible action entry is checked. Don’t care entries are another useful device: they are used when no other checks are required in the table, and therefore we don’t care what the rest of the values are. Often, these don’t care entries are treated as false values.
3.2 Rule Counts
Rule counts are used along with don’t care entries as a method to test for decision table completeness; we can count the number of test cases in a decision table using rule counts and compare it with a calculated value. Below is a table which illustrates rule counts in a decision table.
Fig 3.2 an example of Rule Counts in a Decision Table
The table above has a total rule count of 64; this can be calculated using the limited entry formula, as it’s a limited entry table:
Number of Rules = 2^(Number of Conditions)
So therefore, Number of Rules = 2^6 = 64
When calculating rule counts the don’t care values play a big part in the rule count of each rule. Every rule has a rule count of 1 initially, and each “don’t care” entry in the rule doubles its rule count. If both ways of computing the rule count bring us to the same value, we have a complete decision table.
Where the Rule Count value of the decision table does not equal the number of rules
computed by the equation we know the decision table is not complete, and therefore
needs revision.
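For example (an illustrative count, not one of the paper’s figures): in a limited entry table with 4 conditions the target is 2^4 = 16; a rule containing two don’t care entries has a rule count of 1 × 2 × 2 = 4, so that single column stands for 4 of the 16 combinations, and the rule counts of all columns must sum to exactly 16 for the table to be complete.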
3.3 Redundancy & Inconsistency
When using “don’t care” entries a level of care must be taken, as using these entries can cause redundancy and inconsistency within a decision table.
Using rule counts to check the completeness of the decision table can help to eliminate redundant rules within the table. An example of a decision table with a redundant rule can be seen in figure 3.3.
From the table you can see that there is some conflict between rules 1–4 and rule 9: rules 1–4 use “don’t care” entries as an alternative to false, but rule 9 replaces those “don’t care” entries with “false” entries. So when condition 1 is met, rules 1–4 or 9 may be applied; luckily, in this particular instance these rules have identical actions, so there is only a simple correction to be made to complete the table.
Figure 3.3 an example of a Redundant Rule
If on the other hand the actions of the redundant rule differ from that of rules 1-4 then
we have a problem. A table showing this can be seen in figure 3.4.
Figure 3.4 an example of Inconsistent Rules
From the above decision table, if condition 1 was true and conditions 2 and 3 were false, then rules 1–4 and 9 could be applied. This would be a problem because the actions of these rules are inconsistent, so the result is non-deterministic and would cause the decision table to fail.
3.4 Creating a Decision Table
When creating a decision table care must be taken when choosing your stub
conditions, and also the type of decision table you are creating. Limited Entry
decision tables are easier to create than extended entry tables. Here are some steps on
how to create a simple decision table using the Triangle Problem.
Step One – List All Stub Conditions
In this example we take three inputs, and from those inputs we perform conditional
checks to calculate if it’s a triangle, if so then what type of triangle it is. The first
condition we add must check whether all 3 sides constitute a triangle, as we don’t
want to perform other checks if the answer is false.
Then the remainder of the conditions will check whether the sides of the triangle are equal or not. As there are only three sides to a triangle, we have three pairwise conditions when checking all of the sides.
So the condition stubs for the table would be:
a, b, c form a triangle?
a = b?
a = c?
b = c?
Step Two – Calculate the Number of Possible Combinations (Rules)
So in our table we have 4 condition stubs and we are developing a limited entry decision table so we use the following formula:
Number of Rules = 2^(Number of Condition Stubs)
So therefore, Number of Rules = 2^4 = 16
So we have 16 possible combinations in our decision table.
Step Three – Place all of the Combinations into the Table
Figure 3.5 a Complete Decision Table
Here we have a complete decision table: the first rule has three don’t care entries, which gives it a rule count of 8, and the last 8 rules have a rule count of 1 each, so the total rule count for the table is 16. Therefore we know that this table is complete.
Step Four – Check Covered Combinations
This step is a precautionary step to check for errors and redundant and inconsistent
rules. We don’t want to go any further with the development of the decision table if
we have errors because this will complicate matters in the next step.
Step Five – Fill the Table with the Actions
For the final step of creating a decision table we must fill the Action Stub and Entry
sections of the table. The final decision table is shown in fig 3.6.
After completing the decision table and adding the actions we notice that each action stub is exercised once, and we have also added the “impossible” action into the table for catching rogue values.
Figure 3.6 the Final Decision Table
The above table can be explored and expanded by refining the first condition stub.
Instead of having “a, b, c form a triangle” we can expand this by using 3 conditions
rather than one, which will increase accuracy. This would also bring in a logical
dependency, because the actions of the first condition stub would affect the remaining
condition stubs. This is shown in figure 3.1.
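To show how such a table turns into executable tests, here is a minimal sketch (my own illustration, not from the paper); the classify() function and its result labels are assumptions standing in for the program under test, and each row of the test table corresponds to one rule.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical program under test: classifies three side lengths.
std::string classify(int a, int b, int c) {
    if (a + b <= c || a + c <= b || b + c <= a) return "Not a triangle";
    if (a == b && b == c)                       return "Equilateral";
    if (a == b || a == c || b == c)             return "Isosceles";
    return "Scalene";
}

struct Rule { int a, b, c; std::string expectedAction; };

int main() {
    // One test case per (feasible) rule of the decision table.
    std::vector<Rule> rules = {
        {1, 2, 5, "Not a triangle"},   // a, b, c do not form a triangle
        {3, 3, 3, "Equilateral"},      // a = b, a = c, b = c
        {3, 3, 2, "Isosceles"},        // a = b only
        {3, 2, 3, "Isosceles"},        // a = c only
        {2, 3, 3, "Isosceles"},        // b = c only
        {3, 4, 5, "Scalene"},          // no sides equal
    };
    int failures = 0;
    for (const auto& r : rules)
        if (classify(r.a, r.b, r.c) != r.expectedAction) {
            std::cout << "FAIL for (" << r.a << "," << r.b << "," << r.c << ")\n";
            ++failures;
        }
    std::cout << failures << " failure(s)\n";
    return failures;
}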
4 Summary & Future Work
Decision Table-Based Testing is an important part of Functional Testing; it explores testing routes that other functional strategies avoid. One key aspect of decision table-based testing is the use of logical dependencies; this enhances the tester’s ability to handle inputs in a program which rely upon other inputs to perform their operation, which is a strong characteristic in testing nowadays.
DT-BT is the most complete method of all of the functional testing strategies as it encourages strict logical relationships between conditions. Creating these logical dependencies can be tricky, especially for difficult and extensive programs. It works well with the Triangle problem as there are lots of decisions within the problem.
The differences between the functional testing strategies were outlined and shown in this report; we saw the difference in effort, sophistication and number of test cases these functional methods create. This illustrates that decision table-based testing is the final step of the functional testing process. There are many testing tools available for creating decision tables, which are excellent for new users wanting to become accustomed to this functional technique.
This report has outlined the importance of testing and the time required when creating software. I aim to make use of the knowledge I have gained while writing this report and apply it to my final year dissertation. For my dissertation I am developing a system which navigates users to rendezvous with each other, using music as their cue. This insight will aid me in creating a reliable software package, as I can use the input values from the devices which my software uses.
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 3
Static testing
! Involves analyses of source text by humans
! Can be carried out on ANY documents
produced as part of the software process
! Discovers errors early in the software process
! Usually more cost-effective than testing for
defect detection at the unit and module level
! Allows defect detection to be combined with
other quality checks
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 4
Static testing effectiveness
! More than 60% of program errors can be detected
by informal program inspections
(Meyers: 30 - 70 %)
! More than 90% of program errors may be
detectable using more rigorous mathematical
program verification
! The error detection process is not confused by the
existence of previous errors
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 5
Program inspections
! Formalised approach to document reviews
! Intended explicitly for defect DETECTION
(not correction)
! Defects may be logical errors, anomalies in the
code that might indicate an erroneous condition
(e.g. an uninitialised variable) or non-compliance
with standards
! Group code reading → team-based quality
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 6
Inspection pre-conditions
! A precise specification must be available
! Team members must be familiar with the
organisation standards
! Syntactically correct code must be available
! An error checklist should be prepared
! Management must accept that inspection will
increase costs early in the software process
! Management must not use inspections for staff
appraisal
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 7
The inspection process
Planning → Overview → Individual preparation → Inspection meeting → Rework → Follow-up
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 8
Inspection procedure
! System overview presented to inspection team
! Code and associated documents are
distributed to inspection team in advance
! Inspection takes place and discovered errors
are noted, no repair
! Modifications are made to repair discovered
errors
! extension of checklists
! Re-inspection may or may not be required
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 9
Inspection teams
! Made up of at least 4 members
! Author of the code being inspected
! Reader who reads the code to the team
! Inspectors who find errors, omissions and inconsistencies
! Moderator who chairs the meeting and notes
discovered errors
! Other roles are Scribe and Chief moderator
! no superior
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 10
Inspection rate
! 500 statements/hour during overview
! 125 source statements/hour during individual preparation
! 90-125 statements/hour can be inspected
(Meyers: 150 statements/hour; Balzert: 1 page/hour)
! Inspection is therefore an expensive process
! Inspecting 500 lines costs about 40 man-hours of effort = £2800
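The cost figures on this slide can be reconciled with a rough back-of-the-envelope calculation. The Python sketch below is only an illustration: the team size of four, the meeting rate of about 100 statements/hour and the hourly cost of £70 are assumptions chosen so that the 40 man-hours and £2800 come out; they are not stated on the slide.

# Back-of-the-envelope reconstruction of the slide's cost figures.
# Team size, meeting rate and hourly cost are assumed values.
statements = 500
team_size = 4

overview_hours = statements / 500      # 1 hour, whole team attends
preparation_hours = statements / 125   # 4 hours per person
meeting_hours = statements / 100       # about 5 hours, whole team attends

person_hours = team_size * (overview_hours + preparation_hours + meeting_hours)
print(person_hours)        # 40.0 person-hours
print(person_hours * 70)   # 2800.0 (GBP), at an assumed rate of 70 GBP/hour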
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 11
Inspection checklists
! Checklist of common errors should be used to
drive the inspection
! Error checklist may be programming language
dependent
! Complement static semantics checks
by compiler + static analyser
! The 'weaker' the type checking, the larger the
checklist
! Coding standard / programming guidelines
The Inspection Process
AQMD Inspectors use their observations of industrial and commercial processes and equipment to determine compliance with air quality rules and regulations, policies and state law (California Health and Safety Code). Although each inspection is unique, a series of general guidelines govern inspection procedures in the field. The typical inspection can be broken down into the following components:
Pre-Inspection Activities
These are the activities conducted by the inspector in preparation for the inspection which include the review of: the facility's permits to operate, the facility's compliance history, and other applicable requirements.
The Inspection
While in the company of a facility representative, the Inspector will tour the facility and make observations of equipment, processes and employee practices to determine if the facility is operating in compliance with applicable permit and clean air requirements.
Closing Conference
Before leaving the facility, the Inspectors usually discuss their findings with facility representatives during a closing conference, and later document these findings in written inspection reports.
Typically, AQMD earmarks facilities for inspection well ahead of time; however, an air quality complaint received from the public may prompt an unannounced inspection of a facility.
Software inspection
From Wikipedia, the free encyclopedia
Inspection in software engineering, refers to peer review of any work product by trained individuals who look for defects using a well defined process. An inspection might also be referred to as a Fagan inspection after Michael Fagan, the inventor of the process.
Introduction
An inspection is one of the most common sorts of review practices found in software projects. The goal of the inspection is for all of the inspectors to reach consensus on a work product and approve it for use in the project. Commonly inspected work products include software requirements specifications and test plans. In an inspection, a work product is selected for review and a team is gathered for an inspection meeting to review the work product. A moderator is chosen to moderate the meeting. Each inspector prepares for the meeting by reading the work product and noting each defect. The goal of the inspection is to identify defects. In an inspection, a defect is any part of the work product that will keep an inspector from approving it. For example, if the team is inspecting a software requirements specification, each defect will be text in the document which an inspector disagrees with.
The process
The inspection process was developed by Michael Fagan in the mid-1970s and has since been extended and modified.
The process should have entry criteria that determine whether the inspection process is ready to begin. This prevents unfinished work products from entering the inspection process. The entry criteria might be a checklist including items such as "The document has been spell-checked".
The stages in the inspections process are: Planning, Overview meeting, Preparation, Inspection meeting, Rework and Follow-up. The Preparation, Inspection meeting and Rework stages might be iterated.
Planning: The inspection is planned by the moderator.
Overview meeting: The author describes the background of the work product.
Preparation: Each inspector examines the work product to identify possible defects.
Inspection meeting: During this meeting the reader reads through the work product, part by part and the inspectors point out the defects for every part.
Rework: The author makes changes to the work product according to the action plans from the inspection meeting.
Follow-up: The changes by the author are checked to make sure everything is correct.
The process is ended by the moderator when it satisfies some predefined exit criteria.
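As a rough illustration of how entry criteria, the staged process and exit criteria fit together, the following Python sketch models the workflow described above; the checklist items and the exit rule are invented examples, not part of Fagan's definition.

# Minimal sketch of the inspection workflow described above.
# The checklist items and exit criterion are illustrative assumptions only.
STAGES = ["Planning", "Overview meeting", "Preparation",
          "Inspection meeting", "Rework", "Follow-up"]

entry_criteria = {
    "The document has been spell-checked": True,   # example item from the text
    "A precise specification is available": True,  # assumed additional item
}

def ready_for_inspection(criteria):
    # Unfinished work products must not enter the process.
    return all(criteria.values())

def run_inspection(defects_found, defects_fixed):
    if not ready_for_inspection(entry_criteria):
        return "rejected at entry"
    for stage in STAGES:
        print("stage:", stage)
    # Illustrative exit criterion: every logged defect has been reworked.
    return "passed exit" if defects_fixed >= defects_found else "re-inspect"

print(run_inspection(defects_found=5, defects_fixed=5))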
Inspection roles
During an inspection the following roles are used.
Author: The person who created the work product being inspected.
Moderator: This is the leader of the inspection. The moderator plans the inspection and coordinates it.
Reader: The person reading through the documents, one item at a time. The other inspectors then point out defects.
Recorder: The person that documents the defects that are found during the inspection.
Related inspection types
Code review
A code review can be done as a special kind of inspection in which the team examines a sample of code and fixes any defects in it. In a code review, a defect is a block of code which does not properly implement its requirements, which does not function as the programmer intended, or which is not incorrect but could be improved (for example, it could be made more readable or its performance could be improved). In addition to helping teams find and fix bugs, code reviews are useful for both cross-training programmers on the code being reviewed and for helping junior developers learn new programming techniques.
Peer Reviews
Peer Reviews are considered an industry best-practice for detecting software defects early and learning about software artifacts. Peer Reviews are composed of software walkthroughs and software inspections and are integral to software product engineering activities. A collection of coordinated knowledge, skills, and behaviors facilitates the best possible practice of Peer Reviews. The elements of Peer Reviews include the structured review process, standard of excellence product checklists, defined roles of participants, and the forms and reports.
Software inspections are the most rigorous form of Peer Reviews and fully utilize these elements in detecting defects. Software walkthroughs draw selectively upon the elements in assisting the producer to obtain the deepest understanding of an artifact and reaching a consensus among participants. Measured results reveal that Peer Reviews produce an attractive return on investment obtained through accelerated learning and early defect detection. For best results, Peer Reviews are rolled out within an organization through a defined program of preparing a policy and procedure, training practitioners and managers, defining measurements and populating a database structure, and sustaining the roll out infrastructure.
Glass box testing
Glass box testing has traditionally been divided up into static and dynamic analysis (Hausen82, 119,122).
Static analysis techniques
The only generally acknowledged and therefore most important characteristic of static analysis techniques is that the testing as such does not necessitate the execution of the program (Hausen84, 325). "Essential functions of static analysis are checking whether representations and descriptions of software are consistent, noncontradictory or unambiguous" (Hausen84, 325). It aims at correct descriptions, specifications and representations of software systems and is therefore a precondition to any further testing exercise. Static analysis covers the lexical analysis of the program syntax and investigates and checks the structure and usage of the individual statements (Sneed87, 10.3-3). There are principally three different possibilities of program testing (Sneed87, 10.3-3), i.e.
checking the program internally for completeness and consistency
checking it against pre-defined rules
comparing the program with its specification or documentation
While some software engineers consider it characteristic of static analysis techniques that they can be performed automatically, i.e. with the aid of specific tools such as parsers, data flow analysers, syntax analysers and the like (Hausen82, 126), (Miller84, 260) and (Osterweil84, 77), others also include manual techniques for testing that do not ask for an execution of the program (Sneed87, 10.3-3). Figure B.1 is an attempt to structure the most important static testing techniques as they are presented in SE literature between 1975 and 1994.
Figure B.1: Static Analysis Techniques
Syntax parsers, which split the program/document text into individual statements, are the elementary automatic static analysis tools. When checking the program/document internally, the consistency of statements can be evaluated.
When performed with two texts on different semantic levels, i.e. a program against its specification, the completeness and correctness of the program can be evaluated (Sneed87, 10.3-6) and (Hausen84, 328). This technique, which aims at detecting problems in the translation between specification and program realisation, is called static verification (Sneed87, 10.3-3) and (Hausen87, 126). Verification requires formal specifications and formal definitions of the specification and programming languages used, as well as a method of algorithmic proving that is adapted to these description means (Miller84, 263) and (Hausen87, 126). Static verification compares the actual values provided by the program with the target values as pre-defined in the specification document. It does not, however, provide any means to check whether the program actually solves the given problems, i.e. whether the specification as such is correct (Hausen87, 126). The result of automatic static verification procedures is described in boolean terms, i.e. a statement is either true or false (Hausen87, 127). The obvious advantage of static verification is that, being based on formal methods, it leads to objective and correct results. However, since it is both very difficult and time-consuming to elaborate the formal specifications which are needed for static verification, it is mostly only performed for software that needs to be highly reliable. Another technique which is normally subsumed under static analysis is called symbolic execution (Hausen84, 327), (Miller84, 263), (Hausen82, 117) and (Hausen87, 127). It analyses, in symbolic terms, what a program does along a given path (Miller84, 263). "By symbolic execution, we mean the process of computing the values of a program's variables as functions which represent the sequence of operations carried out as execution is traced along a specific path through the program." (Osterweil84, 79). Symbolic execution is most appropriate for the analysis of mathematical algorithms. Making use of symbolic values only, whole classes of values can be represented by a single interpretation, which leads to a very high coverage of test cases (Hausen82, 117). The development of programs for symbolic execution is very expensive, and symbolic execution is therefore mainly used for testing numerical programs, where the cost/benefit relation is acceptable.
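The idea of symbolic execution can be illustrated with a very small Python sketch. The traced program and the string representation of symbolic values are invented for illustration; a real symbolic executor would use expression trees and a constraint solver rather than plain strings.

# Minimal sketch of symbolic execution along one path.
# We trace the program  z = x + y; if z > 10: return 2 * z  else: return z - 1
# along the "true" branch, keeping variable values as symbolic expressions
# (here simple strings) and collecting the path condition instead of running
# the code with concrete numbers.

def symbolic_trace_true_branch():
    env = {"x": "x", "y": "y"}                 # inputs stay symbolic
    env["z"] = f"({env['x']} + {env['y']})"    # z = x + y
    path_condition = f"{env['z']} > 10"        # branch taken: z > 10
    result = f"2 * {env['z']}"                 # return 2 * z
    return path_condition, result

cond, res = symbolic_trace_true_branch()
print("path condition:", cond)   # (x + y) > 10
print("return value  :", res)    # 2 * (x + y)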
The most important manual technique which allows testing the program without running it is software inspection (Thaller94, 36), (Ackerman84, 14) and (Hausen87, 126). The method of inspection originally goes back to Fagan (Fagan76), who saw the practical necessity to implement procedures to improve software quality at several stages during the software life-cycle. In short, a software inspection can be described as follows: "A software inspection is a group review process that is used to detect and correct defects in a software workproduct. It is a formal, technical activity that is performed by the workproduct author and a small peer group on a limited amount of material. It produces a formal, quantified report on the resources expended and the results achieved" (Ackerman84, 14), (Thaller94, 36), (Hausen87, 126) and (Hausen84, 324).
During inspection either the code or the design of a workproduct is compared to a set of pre-established inspection rules (Miller84, 260) and (Thaller94, 37). Inspection processes are mostly performed along checklists which cover typical aspects of software behaviour (Thaller94, 37), (Hausen87, 126). "Inspection of software means examining by reading, explaining, getting explanations and understanding of system descriptions, software specifications and programs" (Hausen84, 324). Some software engineers report inspection as adequate for any kind of document, e.g. specifications, test plans etc. (Thaller94, 37). While most testing techniques are intimately related to the system attribute whose value they are designed to measure, and thus offer no information about other attributes, a major advantage of inspection processes is that any kind of problem can be detected and thus results can be delivered with respect to every software quality factor (Thaller94, 37) and (Hausen87, 126).
Walkthroughs are similar peer review processes that involve the author of the program, the tester, a secretary and a moderator (Thaller94, 43). The participants of a walkthrough create a small number of test cases by ``simulating'' the computer. Its objective is to question the logic and basic assumptions behind the source code, particularly of program interfaces in embedded systems (Thaller94, 44).
Dynamic analysis techniques
While static analysis techniques do not necessitate the execution of the software, dynamic analysis is what is generally considered as "testing", i.e. it involves running the system. "The analysis of the behaviour of a software system before, during and after its execution in an artificial or real applicational environment characterises dynamic analysis" (Hausen84, 326). Dynamic analysis techniques involve the running of the program formally under controlled circumstances and with specific results expected (Miller84, 260). It shows whether a system is correct in the system states under examination or not (Hausen84, 327).
Among the most important dynamic analysis techniques are path and branch testing. Path testing involves executing the program so that as many logical paths of the program as possible are exercised (Miller84, 260) and (Howden80, 163). The major quality attribute measured by path testing is program complexity (Howden80, 163) and (Sneed87, 10.3-4). Branch testing requires that tests be constructed in such a way that every branch in a program is traversed at least once (Howden80, 163). Problems encountered when running the branches point to the probability of later program defects.
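A minimal Python sketch of branch testing is given below: a small invented function is instrumented to record which branch outcomes each test input exercises, and the test data are chosen so that every branch is traversed in both directions.

# Minimal sketch of branch testing: record which branch outcomes a test
# suite exercises and check that every branch is taken both ways.
branches_hit = set()

def classify(x):
    if x > 10:                       # branch B1
        branches_hit.add(("B1", True))
        y = x * 2
    else:
        branches_hit.add(("B1", False))
        y = x - 1
    if y % 2 == 0:                   # branch B2
        branches_hit.add(("B2", True))
        return "even"
    branches_hit.add(("B2", False))
    return "odd"

# Test data chosen so that both branches are traversed in both directions.
for value in (20, 21, 3, 4):
    classify(value)

all_branches = {("B1", True), ("B1", False), ("B2", True), ("B2", False)}
print("branch coverage:", len(branches_hit) / len(all_branches))  # 1.0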
Today there are a number of dynamic analysers that are used during the software development process. The most important tools are presented in Table B.1 (Thaller94, 177):
Type of Dynamic Analyser: Functionality of Tool
Test coverage analysis: tests to which extent the code can be checked by glass box techniques
Tracing: follows all paths used during program execution and provides e.g. values for all variables etc.
Tuning: measures resources used during program execution
Simulator: simulates parts of systems, if e.g. the actual code or hardware are not available
Assertion checking: tests whether certain conditions are given in complex logical constructs
Table B.1: Dynamic Analysis Tools
Generation of test data in glass box tests
The selection and generation of test data in glass box tests is an important discipline. The most basic approach to test data generation is random testing. For random testing a number of input values are generated automatically without being based on any structural or functional assumption (Sneed87, 10.3-4) and (Bukowski87, 370). There are also two more sophisticated approaches to test data generation, i.e. structural testing and functional testing. "Structural testing is an approach to testing in which the internal control structure of a program is used to guide the selection of test data. It is an attempt to take the internal functional properties of a program into account during test data generation and to avoid the limitations of black box functional testing" (Howden80, 162). Functional testing as described by (Howden80) takes into account both functional requirements of a system and important functional properties that are part of its design or implementation and which are not described in the requirements (Howden80, 162). "In functional testing, a program is considered to be a function and is thought of in terms of input values and corresponding output values." (Howden80, 162). There are tools for test data generation on the market that can be used in combination with specific programming languages. Particularly for embedded systems, tools for test data generation are useful, since they can be used to simulate a larger system environment providing input data for every possible system interface (Thaller94, 178). In other words, if a system is not fully implemented or not linked to all relevant data sources, not all system interfaces can be tested, because no input values are given for non-implemented functions. Data generation tools provide input values for all available system interfaces as if a real module was linked to it.
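The following Python sketch illustrates random test data generation combined with a functional (input/output) check: inputs are drawn without any structural assumption and each output of Python's built-in sorted function is checked against two simple properties. The choice of program under test and of the properties is purely illustrative.

import random
from collections import Counter

# Minimal sketch of random test data generation: input values are generated
# without any structural or functional assumption, and each run is checked
# against simple functional properties (an illustrative oracle).
def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

random.seed(0)
for _ in range(1000):
    data = [random.randint(-100, 100) for _ in range(random.randint(0, 10))]
    result = sorted(data)
    # Functional view: the output is ordered and contains the same values.
    assert is_sorted(result)
    assert Counter(result) == Counter(data)
print("1000 randomly generated test cases passed")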
Testing Measurement
Someone has rightly said that if something cannot be measured, it cannot be managed or improved. There is immense value in measurement, but you should always make sure that you get some value out of any measurement that you are doing. You should be able to answer the following questions:
What is the purpose of this measurement program?
What data items are you collecting, and how are you reporting them?
What is the correlation between the data and the conclusions?
Value addition
Any measurement program can be divided into two parts. The first part is to collect data, and the second is to prepare metrics/charts and analyse them to get valuable insight that might help in decision making. Information collected during any measurement program can help in:
Finding the relation between data points,
Correlating cause and effect,
Providing input for future planning.
Normally, any metric program involves certain steps which are repeated over a period of time. It starts with identifying what to measure. After the purpose is known, data can be collected and converted into metrics. Based on the analysis of these metrics appropriate action can be taken, and if necessary metrics can be refined and measurement goals can be adjusted for the better.
Data presented by the testing team, together with their opinion, normally decides whether a product will go to market or not. So it becomes very important for test teams to present data and opinions in such a way that the data looks meaningful to everyone and decisions can be taken based on it.
Every testing project should be measured against its schedule and the quality requirements for its release. There are lots of charts and metrics that we can use to track progress and measure the quality requirements of the release. We will discuss here some of the charts and the value addition that they bring to our product.
Defect Finding Rate
This chart gives information on how many defects are found across a given period. This can be tracked on a daily or weekly basis.
Defect Fixing Rate
This chart gives information on how many defects are being fixed on a daily/weekly basis.
Defect distribution across components
This chart gives information on how defects are distributed across various components of the system.
Defect cause distribution chart
This chart gives information on the cause of defects.
Closed defect distribution
This chart gives information on how defects with closed status are distributed.
Test case execution
Traceability Metrics
Functional Coverage
Platform Metrics
A small sketch of how a few of these metrics can be computed from a raw defect log follows below.
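As a sketch of how some of these charts can be derived from raw data, the following Python fragment computes a defect finding rate, a fixing rate and two distributions from a small invented defect log; the field names and sample records are assumptions for illustration only.

from collections import Counter
from datetime import date

# Invented defect log: each record holds when the defect was found and fixed,
# the component it belongs to, and its cause.
defects = [
    {"found": date(2024, 1, 3),  "fixed": date(2024, 1, 5),  "component": "UI", "cause": "requirements"},
    {"found": date(2024, 1, 4),  "fixed": None,              "component": "DB", "cause": "coding"},
    {"found": date(2024, 1, 10), "fixed": date(2024, 1, 12), "component": "UI", "cause": "coding"},
]

# Defect finding rate per ISO week.
finding_rate = Counter(d["found"].isocalendar()[1] for d in defects)
# Defect fixing rate per ISO week (only defects that were actually fixed).
fixing_rate = Counter(d["fixed"].isocalendar()[1] for d in defects if d["fixed"])
# Defect distribution across components and causes.
by_component = Counter(d["component"] for d in defects)
by_cause = Counter(d["cause"] for d in defects)

print("found per week:", dict(finding_rate))
print("fixed per week:", dict(fixing_rate))
print("by component:  ", dict(by_component))
print("by cause:      ", dict(by_cause))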
Reducing Test Cases Created by Path-Oriented Test Case Generation
Nicha Kosindrdecha and Siripong Roongruangsuwan
Faculty of Science and Technology, Assumption University, Bangkok, Thailand
p4919741@au.edu, p4919742@au.edu
Jirapun Daengdej
Faculty of Science and Technology, Assumption University, Bangkok, Thailand
jirapun@scitech.au.edu
Abstract – "Path-Oriented" is one of the most widely used techniques for finding a set of test cases for testing software. Given a set of test cases generated by the Path-Oriented technique, this paper discusses the use of a number of case maintenance techniques, which have been investigated by Case-Based Reasoning (CBR) researchers to ensure that only a small number of cases are stored in the case base, thereby reducing the number of test cases that should be used in software testing. Similar to what happens in software testing, a number of CBR researchers have focused on finding approaches for reducing cases in the CBR systems' storage. We propose a number of techniques which adapt some of this research. Our preliminary experiments show that the proposed technique can be effectively used in reducing the number of test cases required for software testing, while maintaining the accuracy of the system's output.
I. Introduction
In the software development life cycle, software testing has proven to be one of the most crucial and expensive phases. The major goal of software testing is to ensure that as many errors as possible are identified, especially before releasing the software to the end-users. However, to ensure that the software is of high quality while minimizing the errors before delivery, software development providers have to expend a great deal of time and effort.
Test case generation has proven to be one of the most critical steps in software testing. The main objective of generating test cases is to ensure that the generated cases can be used to reveal as many faults as possible. Our research reveals that there are many test case generation techniques; according to [12], path-oriented test case generation is the most effective. Moreover, our research reveals that to lower the cost of software testing we have to use a small but efficient group of test cases. In some situations the growth of the number of test cases cannot be controlled. In order to solve this problem, we apply the combined concepts of software testing and CBR.
We assume that test cases are treated as cases in CBR. Given a set of test cases generated by the Path-Oriented technique, this paper discusses how to maintain the number of test cases in software testing by using the Case-Based Maintenance (CBM) concept. CBM has been investigated by CBR researchers in order to ensure that only a small number of efficient cases are stored in the case base. In the light of software testing, however, the proposed techniques focus on how to maintain the test cases while preserving their ability to reveal faults.
The next section presents the problem statement and motivation of this paper. Section III reviews the investigated papers on software testing and CBR. Section IV presents our proposed test case maintenance techniques. The evaluation is addressed in Section V. Finally, the conclusion, including our future work, is given in Section VI.
II. Terminology
Test Case or Test Data is a collection of nodes,
which can be traversed in the control flow graph.
Tn = {N1, N2, N3, …, Nn}
where T = test case and N = node in the control flow graph.
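To illustrate this representation, the Python sketch below stores test cases as the sets of control-flow-graph nodes they traverse and applies a naive reduction that discards any case whose nodes are covered by another case. This is only an illustration of the idea of keeping fewer cases; it is not the maintenance technique proposed in the paper, and the node names are invented.

# Test cases represented as the sets of control-flow-graph nodes they traverse,
# following the definition above. The reduction rule (drop a case whose nodes
# are a subset of another case's) is a naive illustration only.
test_cases = {
    "T1": {"N1", "N2", "N4"},
    "T2": {"N1", "N2"},            # covered entirely by T1
    "T3": {"N1", "N3", "N5"},
}

def reduce_test_cases(cases):
    kept = dict(cases)
    for name, nodes in cases.items():
        others = [n for other, n in kept.items() if other != name]
        if any(nodes <= other_nodes for other_nodes in others):
            del kept[name]
    return kept

print(sorted(reduce_test_cases(test_cases)))  # ['T1', 'T3']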
Probability is the defined frequency of a test case's usage: the higher the value of the test data, the higher the probability.
Impact is a measure of the criticality of the faults, exceptions and errors detected in the system. An example of high impact is functionality not working with no workaround; medium impact is functionality not working but with work
Test Case Management
From OpenOffice.org Wiki
The Test Case Management Portal (or in short TCM) is a web based portal for test case management (as the name states). This includes definition and translation of testcases as well as assigning test cases to specific testers and collecting the results.
General Information about TCM
TCM is developed and hosted by the Sun localization testing team. So, the initial focus of TCM is localization testing - not only for OOo. We are trying to extend the capabilities of TCM, so that it can be used for general testing tasks within the OOo project.
At the moment the OOo project is using the same TCM installation as the OpenSolaris project. This means, we use the same "program" but the data are distinct.
For a short introduction to TCM see about TCM. The introduction is written for a successor of the current TCM. So the screens may differ from what you see in current TCM.
Roles in TCM
There are three (or four) roles in TCM:
SQA (Software Quality Assurance)
Can view test cases, update results (pass/fail) per test case
SQE (Software QA Engineer)
can modify (add, remove, translate) test cases
MGR (Manager)
Has SQE access + can grant user access, assign tests to testers, enter new test cases / scenarios and create new test reports and report templates
All these roles are based on localizations. That means every SQA / SQE / MGR belongs to one (or more) localizations. SQAs will see test cases translated to their language, and SQEs will be able to translate the cases to a given language. MGRs can only assign tests to testers of the same localization.
The Manager of the en localization has some more rights. He is some kind of "Super Manager" (the fourth role).
Bugs in TCM
If you find bugs in TCM (the tool itself), you can use the OOo Issue Tracker to report them. Use category qa, subcategory tcm.
If you find bugs that apply to a test case (bug in OpenOffice.org) you need to file it for the appropriate Issue Tracker component. Don't use qa / tcm in this case. See Report Issues for failed tests.
You may ask at the dev@qa.openoffice.org mailing list if something is unclear with TCM.
Doing your daily work
SQA tasks
Doing your tests
Before you can do any tests, you need to have some tests assigned to you. You should ask your manager to assign tests to you. (Ask at dev@qa.openoffice.org if you do not know who "your manager" is).
Login to TCM
Login to TCM with your username and password
Go to "Test Result Update"
after login, you will see some menu items (depending on your role). You need to go to "Test Result Update" (Item #2, if you have SQA role only)
Select the Build number
in the next screen you will see a list of build numbers you have tests assigned for. In most cases you should only see one or two builds. Follow the link in column "Build Number" for the build you are going to test.
Select the assigned test scenario
you will now see all of your assigned tests for the selected build. Do not click on the assignment id (although this seems obvious, it is wrong). Follow the link in column "Assign by" instead.
If you have already done some of your tests, you may follow the link in column U (click on the number). You will see only untested cases in the next screen.
Enter your results
you will now see all test cases of the scenario. Each test case has a short description (what to do) and information about what results are expected.
At the top of each test case you will see option buttons. Select "pass" if the test meets the expected result. Select "fail" if not. You may also "skip" a test if you are not sure if you understand the description (or if it is not important for the current test). If you do not have the time to complete a test, leave it "untested".
In case, the test fails, you should file an issue and put the issue id to the input box "bug".
You may also leave a comment about the test. Your manager will be able to read the comments and may have better information about the quality of the build.
Update your results
navigate to the bottom of the page and "update" the results. You should do this from time to time, even if you entered only some of your results.
Hint: There is an option to download the test case descriptions in step 4. You will see a "download" link in the rightmost column. You may download a plain text file here. (In case your browser is going to save the file as .cgi, simply rename it to .txt). You may open the file with any text editor. The file header has some information about the file format. So you should be able to enter your test results offline. Once your test has been completed and all results have been entered into the file, you can upload it again. You can do this again in step 4. Enter the file name (full file path and name) in the input box on top of the table. Then press "upload".
Report Issues for failed tests
If a test fails, you should file an issue at the OpenOffice.org Issue Tracker. For such an issue, the issue writing guidelines apply.
As not every developer or member of the qa project has access to TCM, repeat the steps that lead to the problem in the issue.
It is planned to be able to see test case descriptions in TCM without being logged in. Once this has been implemented, you may enter a link to the test case description in Issue Tracker.
SQE tasks
translating test cases
Login to TCM
Login to TCM with your username and password
Go to "Test Case Maintenance"
After logging in, you will see the menu items appropriate to your rôle. Go to "Test Case Maintenance" (Item #1, if you have the SQE rôle).
Select the product "OpenOffice.org - Office Suites(2.0)"
"OpenOffice.org" is the only accessible product - click on the product-name link.
Choose a Category
Follow the link to the appropriate Category for the Test Case to be translated. You will now see all the Test Case Descriptions in this Category. The translation, if any, will also be displayed for each description.
Open a single Test Case Description to translate
Click on the Test Case ID. You'll see the English (original) text and input fields for translation.
Hint: you can also use HTML tags to format your translation text.
Update the description
Enter your translation, and press the "Update" button at the bottom of the page.
Hint: You will see "download" links shown in several places. You can use these links to download all the test cases in one text file, translate them offline and upload the resulting file when you're finished. The file includes a description of the file format.
MGR tasks
grant access to new testers
Login to TCM
Login to TCM with your username and password
Go to "Property Maintenance"
after login, you will see some menu items (depending on your role). You need to go to "Property Maintenance" (Item #7, if you have MGR role)
Go To "People"
The only property you are able to edit as normal MGR is "People" - follow the link
Add new People
you can add new people by following the link on the upper right
Enter user details
You need to enter
Login
that's the login name of the user (it's a good idea to use the OOo-account name as login name for TCM)
Name
the full name of the tester
Language
you may choose one or more languages you are responsible for
Location
free text, just a notice of where the tester is based
E-Mail
an e-mail address, in case you need to contact the tester (the openoffice.org mail address would work here)
Role
choose the role of the tester. Make sure you include the SQA role, even if you grant the SQE or MGR role.
Add the tester
press the "Add"-button at the bottom of the screen
Hint: you can change user details or reset the password for existing testers at this screen. You just need to click on the login name.
assign tests
Login to TCM
Login to TCM with your username and password
Go to "Test Assignment Maintenance"
after login, you will see some menu items (depending on your role). You need to go to "Test Assignment Maintenance" (Item #5, if you have MGR role)
Choose the Project "OpenOffice.org"
you will see a list of Projects that are managed in this TCM instance. Follow the link to OpenOffice.org
Select build number
the next screen will show a list of builds that are ready for testing (e.g. localisationXX for localisation tests or 2.XRC for release tests)
select "Scenario" in column "Assignemnt by" for the build that should be tested (don't follow the link to the build name, this will show a list of all testassignments for this build)
Select the scenario
choose a scenario you would like to assign. (Application scenarios are used for localization testing, the "OOo release sanity" scenario is used for release approval)
follow the link in column "Test Cases" (click on the number of test cases in this scenario)
Assign test scenarios per platform
now you can assign the scenario to a tester. Simply select a tester for any platform and click on Update.
you can assign multiple platforms to one tester and a platform to more than one tester
Hint: to go back to the Scenario selection screen simply use the Back button of your browser.