Introduction
A software tester is any human being who tests software. So what would you answer if somebody asked you, 'Does software testing need a specialized skill?' The question looks simple to answer, doesn't it? But of course it is not. Software testing is not as easy as you might think. Needless to say, it is one of the most important and difficult tasks in the entire software development life cycle.
Human reactions in this complex world vary widely with situation, surroundings, emotions, needs, requirements, time frame, money, visualization, belief, education, knowledge, expertise, intuition and so on. Such is the complexity of human nature, and the workplace is certainly no exception. It is therefore naive to say that the job a software tester does is simple and complete.
Nevertheless, the quality of a software tester's work is directly proportional to the psychological maturity and depth he or she has acquired, adopted and developed with age and experience.
"The psychology of the persons carrying out the testing will have an impact on the testing process [Meyer 1979]."
Let's examine the psychology of the person under discussion (the tester) by considering the definition of software testing under three circumstances.
1. "Software testing is the process to prove that the software works correctly"
The above definition sounds good. If the person under discussion (the software tester) is the same one who developed the software, he or she will tend to adopt this definition. Such a person's intention revolves mostly around proving that the software works, so he or she will give only those inputs for which correct results are obtained.
This is the typical psychology of testing when the testing is done by the software developer.
2. "Testing is the process to prove that the software doesn't work"
This definition sounds very good, especially since the aim of the tester is to prove that the software does not do what it is supposed to do. This type of psychology brings out most of the defects in the software; a software tester should therefore adopt it and push the software beyond its boundaries.
Nevertheless, this definition involves a practical difficulty: if you ask 'for how many days should the software be tested before we conclude that it works perfectly?', the answer tends to remain an open question.
It is never correct to infer that there are no bugs in the software just because no bugs were found after testing for more than a week, a month or a year, nor is it wise to declare the software completely reliable.
Testing does not guarantee defect-free or zero-bug software; it can only reduce or minimize known defects, because the very nature of a bug can be volatile. If this definition were strictly applied, most of the operating systems and other commercial software packages we use today would not yet be on the market. In that sense, the definition is unrealistic.
3. "Testing is the process to detect the defects and minimize the risks associated with the residual defects"
This definition sounds realistic. In practice, testing should start as soon as development of the software product starts, and should keep track of the number of bugs detected as they are corrected.
At some stage of planned testing, a point is reached where no new bugs are identified after days, weeks or sometimes months of testing, which statistically allows you to conclude that the software is "good enough" to be released to the market; that is, some bugs may still remain undetected, but the risk associated with these residual defects is low or tolerable.
The decision to release the software may thus also depend on many other factors, such as business value, time constraints, client requirements and satisfaction, and the company's quality policies.
From the above three definitions, we can see that the psychology of a software tester plays a vital role throughout the software development cycle and the quality processes of software testing.
"The objective of testing is to uncover as many bugs as possible. The testing has to be done without any emotional attachment to the software. If someone points out the bugs, it is only to improve the quality of the software rather than to find fault with you."
Role & Characteristics of a Software Tester
Software Development & Software Testing go hand in hand. Both aim at meeting the predefined requirements and purpose. Both are highly creative jobs in nature but the general outlook towards these two individuals is psychological rather than distinctive classification.
If success is perceived as achieving something constructive or creative, like the job of developing a software to work, software testing in contrast is perceived as destructive job or negative job by majority of the masses. Nevertheless, software testing itself has to be perceived as an equally creative job on par with the development job, but this perception can only be practically possible with a different outlook and mindset.
Software Testers require technical skills similar to their development counterparts, but software testers need to acquire other skills, as well.
Keen Observation
Detective Skills
Destructive Creativity
Understanding Product as integration of its parts
Customer Oriented Perspective
Cynical but Affable Attitude
Organized, Flexible & Patience at job
Objective & Neutral attitude
Keen Observation – The software tester must have an 'eye for detail'; keen observation is the prime quality any tester must possess. Not all bugs in a piece of software are clearly visible to the naked eye, and with keen observation the tester can identify or detect many critical bugs. Checking the software against established parameters such as the 'look & feel' of the GUI, incorrect data representation and user-friendliness requires exactly this characteristic.
Detective Skills – Ideally, the software under development is documented before, after and throughout the development process. Unfortunately, there is every chance that this documentation (specifications, defect reports etc.) is not kept up to date because of time and resource constraints.
The software under test is supposed to be fully described by a well-defined library of functions and design specifications in documents such as the specs or SRS, and these need constant updating. The tester should therefore first learn about the product from the formal sources: system, design and functional specifications. From there, the tester should dig for more analytical information through non-formal sources of information such as developers, support personnel, bug reports and related product documents, and reviews of related and (seemingly) unrelated documents.
A tester should therefore possess the qualities of a 'detective' to explore the product under test more rationally.
Destructive Creativity – The tester needs to develop destructive skills, that is, skills to perturb and crash the software's workflow and functionality. In other words, the tester should not hesitate to break the software, whether out of fear of 'breaking it and buying it' or out of empathy for the developers' creation. In software testing, boundaries are meant to be crossed, not obeyed.
A creative but destructive approach is necessary while testing the software, to make it evolve more robustly and reliably.
"The Software under test should be tested like a Mercedes-Benz car. "
Understanding the Product as an Integration of its Parts – The software (read: product) is a culmination of lines of code interacting with data through the user interface and the database.
It is an integration of separate groups of code interacting with one another, assembled to function as a whole product. Developers typically work on their respective code modules, focusing at any one time mostly on the modules in hand.
It is no wonder that a developer sometimes does not even know the complete workflow of the product, and does not necessarily need to. The tester, however, being the one who drives the software under test, should know and understand the complete specifications (operational and workflow requirements) of the product.
The tester may not be the best expert on any one module of the software, but he or she should definitely gain expertise in the overall operation of the product. In fact, the tester should have a 'systems' view of the product, because testers are the only people who see and test the complete functionality of the interdependent modules, and the compatibility of the software with the hardware specifications laid down by the client, before it is sent to the customer.
Customer-Oriented Perspective – Testers need to possess a customer-oriented perspective. Customers (read: users) need not be as technically adept as software engineers or testers, and as the number of computer users grows every day, end-users will neither be comfortable with nor tolerate bugs, explanations, or frequent upgrades to fix those bugs. Since any customer is simply interested in consuming the product and deriving value from it, the software tester must adopt the customer's perspective while testing the software product.
Hence the tester must develop the ability to put himself or herself in the customer's place and test the product as an ordinary end-user would.
Cynical but Affable Attitude – Irrespective of the nature of the project, the tester needs to be tenacious in questioning even the smallest ambiguity until it is resolved. In other words, the tester's motto throughout the testing cycle must be 'In God we trust – everything else gets tested'.
Situations may arise during testing in which an unusually large number of bugs is encountered, forcing further delays in shipping the product. This can strain the relationship between the testers and the development teams. The tester should preserve this relationship, not by suppressing bugs, but by making clear that the intention is to 'assault the software, not the software developers'.
Organized, Flexible and Patient at the Job – Software testers must remember that ever since the advent of the internet and the web the world has been shrinking, change is very dynamic, and modern software is no exception. Not all planned tests are performed completely, and some tests that depend on others have to be blocked for later execution.
This requires an organized approach from the tester in attempting to phase out the bugs. Sometimes significant tests have to be rerun because a change has altered the fundamental functionality of the software. The tester must therefore have the patience to retest the planned scenarios and any new bugs that arise.
It is even more important for testers to stay patient and prepared under a dynamic development and test model. Development changes continuously as requirements and technology change rapidly, and so must testing. Testers must take these changes into account and plan their tests while keeping control of the test environment, to ensure valid test results.
Objective and Neutral Attitude – Nobody likes to hear or believe bad news, right? Well, testers are sometimes viewed as the messengers of bad news in a software project team. No matter how good the tester is and how brilliantly he or she does the job of identifying bugs (a job nobody likes done to them, even though most human beings are naturally rather good at finding fault, at least from childhood), he or she will always be the one communicating the bad parts of the software, which the creators (the developers) do not like.
"Nobody who builds houses likes to have them burned."
The tester must be able to deal with situations in which he or she is blamed for doing the job of detecting bugs too well. The tester's work should be appreciated and the bugs should be welcomed by the development team, because every bug found by the tester is one fewer bug that the customer might have encountered.
Irrespective of the negative perception of such a highly skilled job, the role of the tester is to report honestly every known bug encountered in the product, always with an objective and neutral attitude.
Here are a few links and definitions for keywords:
Psychology :
http://www.codeproject.com/KB/bugs/Pratap.aspx
Error incident :
error: A human action that produces an incorrect result. [After IEEE 610]
incident: Any event occurring that requires investigation. [After IEEE 1008]
Test ware:
testware: Artifacts produced during the test process required to plan, design, and execute
tests, such as documentation, scripts, inputs, expected results, set-up and clear-up
procedures, files, databases, environment, and any additional software or utilities used in
testing. [After Fewster and Graham]
Life cycle of bug:
http://www.softwaretestinghelp.com/bug-life-cycle/
http://www.bugzilla.org/docs/2.18/html/lifecycle.html
Bug effect:
http://www.mmm.ucar.edu/mm5/workshop/ws04/PosterSession/Berleant.Dan.pdf
Failure Mode and Effect Analysis (FMEA): A systematic approach to risk identification and analysis, identifying possible modes of failure and attempting to prevent their occurrence. See also Failure Mode, Effect and Criticality Analysis (FMECA).
Bug classification :
http://www.softwaretestingstuff.com/2008/05/classification-of-defects-bugs.html
Testing Principles:
http://www.the-software-experts.de/e_dta-sw-test-principles.htm
http://www.cs.rit.edu/~afb/20012/cs1/slides/testing-02.html
http://www.qbit-testing.net/ctp/ctp1.pdf
Verification of requirements
http://www.sstc-online.org/Proceedings/2002/SpkrPDFS/ThrTracs/p961.pdf
State table testing
http://www.cs.helsinki.fi/u/paakki/software-testing-s05-Ch611.pdf
Decision-based testing
http://www.cs.swan.ac.uk/~csmarkus/CS339/dissertations/FerridayC.pdf
error guessing: A test design technique where the experience of the tester is used to
anticipate what defects might be present in the component or system under test as a result
of errors made, and to design tests specifically to expose them.
mutation testing: Testing in which faults (mutants) are deliberately seeded into a component or system and the existing tests are run to determine whether they detect the seeded faults; the proportion of mutants detected gives a measure of the thoroughness of the test set.
Type of static testing
http://wiki.answers.com/Q/What_are_the_types_of_static_testing_in_software_testing
Static testing:
ISTQB Certified Tester Foundation Level Syllabus, page 58
http://www-dssz.informatik.tu-cottbus.de/information/testen/02_human_testing_6up.pdf
static testing: Testing of a component or system at specification or implementation level
without execution of that software, e.g. reviews or static code analysis.
Inspections;
inspection: A type of peer review that relies on visual examination of documents to detect
defects, e.g. violations of development standards and non-conformance to higher level
documentation. The most formal review technique and therefore always based on a
documented procedure. [After IEEE 610, IEEE 1028] See also peer review.
Inspection process
http://www.aqmd.gov/comply/compliance_inspections.html
http://en.wikipedia.org/wiki/Software_inspection
Inspection
Key characteristics:
o led by trained moderator (not the author);
o usually peer examination;
o defined roles;
o includes metrics;
o formal process based on rules and checklists with entry and exit criteria;
o pre-meeting preparation;
o inspection report, list of findings;
o formal follow-up process;
o optionally, process improvement and reader;
o main purpose: find defects.
Structured walk through
A step-by-step presentation by the author of a document in order to gather
information and to establish a common understanding of its content. [Freedman and
Weinberg, IEEE 1028]
advantage of static testing
http://issco-www.unige.ch/en/research/projects/ewg96/node81.html
Recovery testing
The process of testing to determine the recoverability of a software
product. See also reliability testing.
security testing: Testing to determine the security of the software product. See also
functionality testing.
stress testing: A type of performance testing conducted to evaluate a system or component at
or beyond the limits of its anticipated or specified work loads, or with reduced availability
of resources such as access to memory or servers. [After IEEE 610] See also performance
testing, load testing
performance testing: The process of testing to determine the performance of a software
product. See also efficiency testing.
usability testing: Testing to determine the extent to which the software product is
understood, easy to learn, easy to operate and attractive to the users under specified
conditions. [After ISO 9126]
Software measurement
http://www.testinggeek.com/measurment.asp
Reducing test cases
http://pdf.aiaa.org/preview/CDReadyMIA07_1486/PV2007_2979.pdf
http://www.testrepublic.com/forum/topics/1178155:Topic:31804?page=1&commentId=1178155%3AComment%3A31898&x=1#1178155Comment31898
Test case management
http://wiki.services.openoffice.org/wiki/Test_Case_Management
Bug life cycle
What is Bug/Defect?
Simple Wikipedia definition of Bug is: “A computer bug is an error, flaw, mistake, failure, or fault in a computer program that prevents it from working correctly or produces an incorrect result. Bugs arise from mistakes and errors, made by people, in either a program’s source code or its design.”
Other definitions can be:
An unwanted and unintended property of a program or piece of hardware, especially one that causes it to malfunction.
or
A fault in a program, which causes the program to perform in an unintended or unanticipated manner.
Lastly the general definition of bug is: “failure to conform to specifications”.
If you want to detect and resolve defects in the early development stages, defect tracking should start simultaneously with the software development phases.
We will discuss more on Writing effective bug report in another article. Let’s concentrate here on bug/defect life cycle.
Life cycle of Bug:
1) Log new defect
When a tester logs a new bug, the mandatory fields are:
Build version, Submit On, Product, Module, Severity, Synopsis and Description to Reproduce
To the above list you can add some optional fields if you are using a manual bug submission template:
These Optional Fields are: Customer name, Browser, Operating system, File Attachments or screenshots.
The following fields remain either specified or blank: if you have the authority to set the bug Status, Priority and 'Assigned to' fields, you can specify them; otherwise the test manager will set the status and priority and assign the bug to the respective module owner.
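For illustration only, here is a minimal Python sketch of such a bug record; the field names simply mirror the mandatory and optional fields listed above, and the sample values are invented rather than taken from any real tracking tool:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BugReport:
    # Mandatory fields when a new bug is logged
    build_version: str
    submitted_on: str
    product: str
    module: str
    severity: str            # e.g. "Major", "Minor", "Fatal"
    synopsis: str
    description: str         # steps to reproduce
    # Optional fields from a manual submission template
    customer_name: Optional[str] = None
    browser: Optional[str] = None
    operating_system: Optional[str] = None
    attachments: List[str] = field(default_factory=list)
    # Fields usually left for the test manager to set
    status: str = "New"
    priority: Optional[str] = None
    assigned_to: Optional[str] = None

bug = BugReport(
    build_version="1.4.2", submitted_on="2009-06-01", product="OrderDesk",
    module="Checkout", severity="Major",
    synopsis="Total not updated after discount",
    description="1) Add item 2) Apply 10% discount 3) Observe stale total")
print(bug.status)   # stays "New" until the test manager triages it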
Look at the following Bug life cycle:
[Figure: Bugzilla bug life cycle diagram]
The figure is quite complicated, but once you consider the significant steps in the bug life cycle you will quickly get an idea of a bug's life.
Once successfully logged, the bug is reviewed by the development or test manager. The test manager can set the bug status to Open, assign the bug to a developer, or defer the bug until the next release.
When the bug is assigned to a developer, he or she can start working on it. The developer can set the bug status to Won't fix, Couldn't reproduce, Need more information, or Fixed.
If the status set by the developer is either 'Need more info' or 'Fixed', QA responds with the appropriate action. If the bug is fixed, QA verifies it and can set the status to Verified/Closed or Reopen.
Bug status description:
These are the various stages of the bug life cycle. The status captions may vary depending on the bug tracking system you are using.
1) New: When QA files a new bug.
2) Deferred: If the bug is not related to the current build, cannot be fixed in this release, or is not important enough to fix immediately, the project manager can set the bug status to Deferred.
3) Assigned: The project lead or manager sets the 'Assigned to' field and assigns the bug to a developer.
4) Resolved/Fixed: When the developer has made the necessary code changes and verified them, he or she can set the bug status to 'Fixed', and the bug is passed to the testing team.
5) Could not reproduce: If the developer is unable to reproduce the bug from the steps given in the QA bug report, he or she can mark the bug as 'CNR'. QA then needs to check whether the bug still reproduces and reassign it to the developer with detailed reproduction steps.
6) Need more information: If the developer is not clear about the reproduction steps provided by QA, he or she can mark the bug as 'Need more information'. In this case QA needs to add detailed reproduction steps and assign the bug back to the developer for a fix.
7) Reopen: If QA is not satisfied with the fix and the bug is still reproducible after the fix, QA can mark it as 'Reopen' so that the developer can take appropriate action.
8) Closed: If the fix is verified by the QA team and the problem is solved, QA can mark the bug as 'Closed'.
9) Rejected/Invalid: Sometimes the developer or team lead can mark a bug as Rejected or Invalid if the system is working according to specifications and the bug arose only from a misinterpretation.
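To make the flow concrete, here is a minimal Python sketch of the transitions described above; the exact statuses and allowed moves differ between bug-tracking systems, so treat this only as one plausible mapping rather than the rule set of any particular tool:

# One plausible mapping from a bug status to the statuses that may follow it.
ALLOWED_TRANSITIONS = {
    "New": {"Assigned", "Deferred", "Rejected"},
    "Deferred": {"Assigned"},
    "Assigned": {"Fixed", "Could not reproduce", "Need more information", "Deferred"},
    "Could not reproduce": {"Assigned"},      # QA adds detailed steps and reassigns
    "Need more information": {"Assigned"},    # QA clarifies and reassigns
    "Fixed": {"Closed", "Reopen"},            # QA verifies the fix
    "Reopen": {"Assigned"},
    "Rejected": {"Closed"},
    "Closed": set(),
}

def move(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, otherwise raise."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current!r} -> {new!r}")
    return new

status = "New"
for step in ("Assigned", "Fixed", "Reopen", "Assigned", "Fixed", "Closed"):
    status = move(status, step)
print(status)   # Closed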
ABSTRACT
We investigate the impact of bugs in a well-known
weather simulation system, MM5. The findings help fill a
gap in knowledge about the dependability of this widely
used system, leading to both new understanding and
further questions.
In the research reported here, bugs were artificially added
to MM5. Their effects were analyzed to statistically
understand the effects of bugs on MM5. In one analysis,
different source files were compared with respect to their
susceptibility to bugs, allowing conclusions regarding for
which files software testing is likely to be particularly
valuable. In another analysis, we compare the effects of
bugs on sensitivity analysis to their effects on forecasting.
The results have implications for the use of MM5 and
perhaps for weather and climate simulation more generally.
1. MOTIVATION
Computer simulation is widely used, including in
transportation, digital system design, aerospace
engineering, weather prediction, and many other diverse
fields. Simulation results have significant impact on
decision-making and will continue to in the coming years.
However complex simulation programs, like other large
software systems, have defects. These adversely affect the
match between the model that the software is intended to
simulate, and what the software actually does. Simulation
programs produce quantitative outputs and thus typify
software for which bugs can lead to insidious numerical
errors. Weather simulators exemplify this danger because
of the large amount of code they contain, often with
insufficient comments; the complex interactions among
sections of code modeling different aspects of the domain;
and the plausibility that outputs can have even if they
contain significant error. These incorrect results may
escape notice even as they influence the decisions that they
are intended to support. This is an important dependability
issue for complex simulation systems. Hence, investigating
the effects of bugs in a simulation system can illuminate
the robustness of its outputs to the presence of bugs. This
in turn yields better understanding of important quality
issues, like trustworthiness of the outputs, and important
quality control issues, like software testing strategy.
The artificial generation of bugs and observation of their
effects is termed mutation analysis. We have done a
mutation analysis of MM5, an influential weather
simulator available from the National Center for
Atmospheric Research (NCAR). The results obtained
illuminate important aspects of this software system. We
focus on (1) what sections of the code are most likely to
have undetected bugs that cause erroneous results, and
hence are particularly important to test thoroughly; and (2)
the relative dependability of the software for sensitivity
analysis as compared to point prediction, and consequently
for forecasting vs. sensitivity analyses.
2. CONNECTION TO RELATED WORK
Model Error. Errors in a simulation program can be from
errors in the model underlying the system [1], or errors in
the implementation of a correct underlying model. The
current work deals with the latter, but the two are related.
Software Mutation Testing. This is the process of taking
a software system and injecting errors (i.e. adding bugs) to
it to see how its behavior changes [4][5][6]. Typically,
many different buggy versions will be tested and the
responses summarized statistically. The present work tests
several thousand versions of MM5, each with one unique
bug injected.
Sensitivity Analysis. A software system's sensitivity s = Δo/Δi is the amount of change Δo in its output that occurs due to a change of amount Δi in its input [2][3]. In
contrast, examining the change in output due to changes in
actual software code (rather than in input parameters) is
termed software mutation testing. The work reported here
uses both, because one of the questions we focus on is,
“How do software mutations affect sensitivity analyses?”
Ensemble Forecasting [7]. A sophisticated form of
sensitivity analysis in which the system is run many times,
each on its own set of perturbed input parameters, and each
therefore giving its own forecast as output. The set of
forecasts is then statistically characterized. This enables
conclusions such as forecasts that are relatively resistant to
inaccurate specification of initial conditions (using
ensemble means), and forecasts containing distribution
functions describing the probabilities of different values
for a forecasted quantity.
3. APPROACH
Two related studies were performed. In the first,
simulation results were obtained from numerous variants
of MM5. Each variant had a different mutation (“bug”),
deliberately inserted into code within a selected subset of
its Fortran source code files. The set of 24-hour forecasts
produced by these variants for a region of the U.S. midwest
with a time step of 4 minutes allows comparison of the
abilities of the source code files to resist erroneous outputs
despite the presence of bugs. That in turn has implications
for software testing strategy.
In the second study, the same mutated variants were used
but the initial conditions were slightly different. The
simulation results for this perturbed initial state were
compared to the results obtained in the first study for each
variant. This gives evidence about whether sensitivity
analyses (in which the ratio of output change to input
change is computed) tend to be affected by bugs less or
more compared to forecasts (in which a single weather
prediction scenario is computed). If less, the dependability
of MM5 for studies using sensitivity analysis is increased
relative to its dependability for forecasting. If more, MM5
would be relatively more dependable for forecasting.
3.1 Study #1: Effects of Bugs on Forecasts
Several thousand different mutations were tested. For each,
the MM5 weather simulator was compiled and run using a
typical American midwest weather scenario for
initialization. Effects of the mutation on the final state of
the forecast were recorded. Each mutation was then
classified into one of three categories (Figure 1).
Figure 1. The three possible effects of a mutation (bolded).
The “Fail to complete” category includes cases where (1)
the simulator terminated (“crashed”) prior to providing a
24-hour forecast, or (2) the simulator did not terminate.
The “Results affected” category includes cases in which
one or more members of a set of 11 important output
parameters had a different forecasted value than it had in
the original, unmutated version of MM5.
Each source code file c in which mutations were made was
analyzed as follows. Define
rc = # of mutations in the “Results affected” category; and
fc = # of mutations in the “Fail to complete” category.
A dependability metric, dc, for rating source code file c
was defined as
dc=fc/(rc+fc).
Value dc estimates the likelihood that a bug inadvertently
introduced during software development will be detected,
therefore removed, and hence not affect results
subsequently. Thus low dc for a file suggests a need to
compensate with extra effort in testing and debugging.
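For readers who prefer code to formulas, the arithmetic behind dc can be sketched in a few lines of Python; the counts below are invented for illustration and do not come from the study:

def dependability(results_affected: int, failed_to_complete: int) -> float:
    """dc = fc / (rc + fc): the fraction of consequential mutations in a file
    that announce themselves by crashing or hanging rather than silently
    corrupting the forecast."""
    return failed_to_complete / (results_affected + failed_to_complete)

# Hypothetical counts for two files: a low dc flags a file whose bugs tend
# to stay silent, i.e. a good candidate for extra testing and debugging.
print(dependability(results_affected=80, failed_to_complete=20))   # 0.2
print(dependability(results_affected=10, failed_to_complete=90))   # 0.9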
The “Results not affected” category, which has no
influence on dc, contains mutations for which (1) the
mutated code cannot have an effect (meaning the code,
despite its presence, has no function), (2) the mutated code
would fall into one of the previous two categories given
other initial conditions, or (3) the mutation affected some
output but not the output parameters we examined for
effects. Mutations in case (1) do not impact the
dependability issues of interest here, and therefore are
ignorable. Of mutations in case (2), there is no reason to
expect that their effects, when triggered by other initial
conditions, would cause other than random changes to the
various dc. Those in case (3) are in a gray area. If their
effects can be considered insignificant relative to effects on
the output parameters that were analyzed, they can be
ignored. Otherwise, had they been detected, dc would have
been lower. Thus the dc values calculated in this work are
upper bounds relative to an alternate view of the software
outputs that classifies all outputs as equally important.
3.2 Study #2: Comparing Effects on
Forecasts to Effects on Sensitivity Analyses
The results of study #1 provide data on the effects of bugs
on forecasts. By introducing a perturbation to the initial
data and then obtaining data parallel to the data of study
#1, there will be two sets of data that can be compared.
The perturbation in the initial data can be summarized as a number Δi. The resulting change in the output of the unmutated software can be summarized as a number Δo.
For each mutated version m that runs to completion under
the two input conditions, there are two corresponding
output scenarios whose difference can be summarized as a
number Δom. Sensitivities (changes in outputs divided by
changes in inputs) can now be calculated.
Let s be the sensitivity of the original, unmutated software:
s = Δo/Δi.
Let sm be the sensitivity of the software as modified by
mutation m:
sm = Δom/Δi.
To compare the effect of a mutation m on forecasting to its
effect on sensitivity analysis, we must define measures for
the magnitudes of its effects on forecasting and on
sensitivity analysis, and compare those measures. We
define the magnitude of its effect on forecasting as

Fm = |om − o| / o

where om is a number derived from the output parameters of the software as modified by mutation m, and o is an analogous number for the unmutated software. Thus, Fm describes the change in the forecast due to mutation m, as a proportion of the nominally correct output o.
The magnitude of mutation m's effect on sensitivity is analogously defined as

Sm = |sm − s| / s

where s and sm are as defined earlier.
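A small numeric sketch of these definitions in Python (the numbers are invented, and a single scalar output stands in for the composite of several output parameters used in the study):

def relative_change(new: float, old: float) -> float:
    """|new - old| / |old|: the proportional change used for both Fm and Sm."""
    return abs(new - old) / abs(old)

delta_i = 1e-4                 # size of the input perturbation (hypothetical)
o, o_pert = 250.0, 251.0       # unmutated outputs: base run and perturbed run
om, om_pert = 255.0, 257.0     # outputs of mutant m under the same two inputs

s = (o_pert - o) / delta_i     # sensitivity of the unmutated software
sm = (om_pert - om) / delta_i  # sensitivity of the software with mutation m

Fm = relative_change(om, o)    # effect of mutation m on the forecast
Sm = relative_change(sm, s)    # effect of mutation m on the sensitivity analysis
print(Fm, Sm, "forecast affected more" if Fm > Sm else "sensitivity affected more")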
For a given mutation, if Fm>Sm, then the mutation affected
the forecast more than the sensitivity analysis. On the other
hand, if Fm<Sm then the opposite is true. Considering the
mutations m collectively, if Fm>Sm for most of them, this
suggests that the MM5 software resists the deleterious
effects of bugs on sensitivity analysis better than it resists
their effects on forecasting. On the other hand, if Fm<Sm,
that suggests the opposite.
Meteorological uses of sensitivity analysis include
predicting the effects of interventions on climate, and
doing ensemble forecasting. Because the present study
relied on 24-hour forecasts, study #2 provides data
relevant to ensemble forecasting. The next section
summarizes the results.
4. RESULTS
Results related to study #1 on the effects of bugs on
forecasts are given in section 4.1. Results related to study
#2, which builds on study #1 with additional investigation
of sensitivity analyses are given in section 4.2. Some
caveats are given in section 4.3.
4.1 The Effects of Bugs on Forecasts
It is useful to be able to compare source code files based
on the likelihood of each that an important bug in it will be
readily detected. Such comparisons would help focus
software testing activities on files for which undetected
bugs are most likely to reside. This motivated defining the
metric dc for the dependability of source code file c in
section 3.1.
Low values of dc mean that a bug is relatively likely to
allow a simulation to complete but with error in its output.
High values of dc mean that a bug is relatively likely to
cause the program to crash without giving output. Thus, a
low value of dc indicates that file c is relatively likely to
contain undetected bugs and therefore that file c is a good
candidate for careful testing to find bugs. Values of dc for a
number of important files in MM5 are shown in Figure 2.
Conclusion: of the files tested, bugs in exmoiss.f, hadv.f,
init.f, mrfpbl.f, param.f, and vadv.f are more likely than
bugs in the others to have insidious rather than obvious
effects. Hence these files might be expected to benefit
from correspondingly thorough testing.
4.2 Effects on Forecasts Vs. Effects on
Sensitivity Analyses
MM5 and other weather and climate simulation systems
are useful for both forecasting and for sensitivity analysis.
It is therefore an interesting question about MM5 which,
forecasts or sensitivity analysis results, are more resistant
to bugs which, as noted earlier, are undoubtedly present.
[Figure 2: bar chart titled "Likelihood a serious bug will be evident", plotting dc (scale 0 to 1.2) for the source code files exmoiss, hadv, init, kfpara2, kfpara, lexmoiss, mrfpbl, param, paramr, solve, vadv and tridi2.]
Figure 2. Values of dc for some MM5 source code files. Lower values suggest files that are good candidates for focusing testing effort on.
To answer this question, the metrics described in section
3.2 were used. Instead of a composite number
summarizing 11 output parameters, for this study we used
8 output parameters and analyzed each separately. Because
a total of 10,893 different mutations were tested on both
the base and perturbed inputs, and 8 output parameters
were observed for each mutation, a total of 87,144
different sensitivities (i.e., values of Sm as defined in
section 3.2) were observed. Similarly, the same number of
forecasted parameter values (i.e., values of Fm as defined
in section 3.2) were observed. Although most were
unaffected by the mutation, 12,835 were affected (first two
rows of Table 1).
Perturbation to the input conditions was done as follows.
Some variables in the file init.f were changed by 0.0001%.
The variables chosen for this were the prognostic 3D
variables (UA, UB, VA, VB, TA, TB, QVA, QVB) for
each grid in the domain (these variables are described e.g.
in http://www.mmm.ucar.edu/mm5 /documents/mm5-
code-pdf/sec6.pdf.). The percentage of perturbation was
chosen to be close to the smallest percentage for which the
program produced significant changes to the output.
For many mutations and observed output parameters, the
sensitivities and forecasted values of an observed output
parameter r were exactly the same as for the unmutated
program, so for each such mutation m and parameter r,
Fm(r)=Sm(r). For other mutations and parameters, the
change caused by that mutation to the forecast was greater
than the change to the sensitivity. For those, Fm(r)>Sm(r).
Finally, for the remaining mutations and parameters, the
situation was reversed and Fm(r)<Sm(r). In order to
determine whether MM5’s forecasts or sensitivity analyses
were more resistant to bugs, we simply compare the
quantity of parameter/mutation pairs for which Fm(r)>Sm(r)
to the number for which Fm(r)<Sm(r). If more pairs have
Fm(r)>Sm(r) than have Fm(r)<Sm(r), then forecasts are more
likely to be affected by bugs than sensitivity analyses and
therefore MM5 is observed to be more dependable for
sensitivity analyses than forecasts. If fewer pairs, then the
opposite is observed: MM5 would be observed to be more
dependable for forecasts. Results will be presented at the
meeting.
4.3 Details, Caveats, and Needs for Further
Work
Study #1. The results of this study on the effects of bugs
on forecasts required mutations to be applied to a number
of different source files. A number of different types of
mutations were applied, each designed to be plausible as a
kind of bug a human programmer might accidentally make.
As examples, loops can suffer from “off-by-one” bugs,
additions can be programmed into calculations when
subtractions should be, multiplication and division can be
incorrectly substituted similarly, and so on. Table 2 shows
the types of mutations that were used. Each was applied
opportunistically to a source file at each point in it where
the source code would permit such a mutation. Thus, for
example, off-by-one bugs can be applied to points in the
code where loop control variables were tested.
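As a rough illustration only (the study mutated MM5's Fortran source with its own tooling; this sketch applies analogous operators to a small Python snippet), a mutation generator of this kind can be as simple as:

import re

# A few mutation operators of the kind described above: swapped arithmetic
# operators and off-by-one changes to comparison operators.
OPERATORS = [("+", "-"), ("-", "+"), ("*", "/"), ("/", "*"), ("<", "<=")]

def generate_mutants(source: str):
    """Yield one mutant per applicable operator occurrence in the source text."""
    for old, new in OPERATORS:
        for match in re.finditer(re.escape(old), source):
            i = match.start()
            yield source[:i] + new + source[i + len(old):]

original = "total = price * quantity\nif count < limit:\n    count = count + 1\n"
for mutant in generate_mutants(original):
    print("---\n" + mutant)   # each mutant would be compiled, run and compared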
This process of using a number of different, seemingly
plausible bug types leads to two caveats.
1) The proportion of each bug type in one source file
may not match the proportion in another source
file. The question this leads to is whether
differences observed across source code files
(Figure 2) could be due in part to differences in
the effects of mutations across different bug
types. (Similarly, differences in effects of
different bug types could be due in part to
differences across the source files containing
them.) Appropriate statistical analyses should be
able to separate the effects due to source code file
from those due to bug type.
2) Although the bug types used have intuitive appeal
as mistakes that human programmers might make,
there is no claim that all such mistakes are
captured by the set of bug types used for
mutations in this (or any) work. In particular,
humans can make diffuse mistakes that cover a
number of lines of code, and these are hard to
mimic when generating mutations artificially.
Mutation analyses have historically assumed that
automatically generated mutations are similar in
their effects to human programmer errors,
however.
Another limitation of the study is its reliance on a single
weather forecasting scenario. While within the range of
typical forecasting problems, it is possible that other initial
conditions could lead to different results for the
dependability metrics of the source files. This could in
principle be addressed by seeing if similar results are
obtained for a set of diverse forecasting problems.
Study #2. This study comparing the ability of sensitivity
analyses and forecasts to resist the effects of bugs relied on
a particular perturbation to the input conditions, a
particular weather scenario (as in the other study), and a
particular time period of 24 hours. Each of these may
potentially have an influence on the results.
The perturbation to the input conditions was chosen to be
small in order to stay within the linear response region of
the simulation system. However weather simulation is
well-known to be mathematically chaotic. Thus, there may
be legitimate doubt about whether in fact this experiment
did stay within the linear response region (or even if trying
to do so is worth doing). The question this raises is
whether different input perturbations might lead to
different assessments of which resists bugs better, weather
forecasts or sensitivity analyses. The solution here is more
extensive testing that includes and compares different
input perturbations.
The relative abilities of forecasts vs. sensitivity analyses to
resist bugs may also potentially depend on the time period
of the simulation. While 24 hours incorporates both day
and night, thereby exercising varied portions of the system,
other time periods are also of interest. This suggests
additional testing that incorporates a range of different
time periods.
Finally, as in the other study, more extensive testing could
usefully include different weather scenarios.
Severity Wise:
Major: A defect that will cause an observable product failure or departure from requirements.
Minor: A defect that will not cause a failure in execution of the product.
Fatal: A defect that will cause the system to crash or close abruptly, or affect other applications.
Work product wise:
SSD: A defect from System Study document
FSD: A defect from Functional Specification document
ADS: A defect from Architectural Design Document
DDS: A defect from Detailed Design document
Source code: A defect from Source code
Test Plan/ Test Cases: A defect from Test Plan/ Test Cases
User Documentation: A defect from User manuals, Operating manuals
Type of Errors Wise:
Comments: Inadequate/ incorrect/ misleading or missing comments in the source code
Computational Error: Improper computation of the formulae / improper business validations in code.
Data error: Incorrect data population / update in database
Database Error: Error in the database schema/Design
Missing Design: Design features/approach missed/not documented in the design document and hence does not correspond to requirements
Inadequate or sub-optimal Design: The design features/approach need additional inputs to be complete, or the design features described do not provide the best (optimal) approach towards the required solution.
Incorrect Design: Wrong or inaccurate design.
Ambiguous Design: Design feature/approach is not clear to the reviewer. Also includes ambiguous use of words or unclear design features.
Boundary Conditions Neglected: Boundary conditions not addressed/incorrect
Interface Error: Interfacing error internal or external to the application, incorrect handling of passed parameters, incorrect alignment, incorrect/misplaced fields/objects, unfriendly window/screen positions.
Logic Error: Missing or Inadequate or irrelevant or ambiguous functionality in source code
Message Error: Inadequate/ incorrect/ misleading or missing error messages in source code
Navigation Error: Navigation not coded correctly in source code
Performance Error: An error related to performance/optimality of the code
Missing Requirements: Implicit/Explicit requirements are missed/not documented during requirement phase
Inadequate Requirements: The requirement needs additional inputs for it to be complete.
Incorrect Requirements: Wrong or inaccurate requirements
Ambiguous Requirements: Requirement is not clear to the reviewer. Also includes ambiguous use of words – e.g. Like, such as, may be, could be, might etc.
Sequencing / Timing Error: Error due to incorrect/missing consideration to timeouts and improper/missing sequencing in source code.
Standards: Standards not followed like improper exception handling, use of E & D Formats and project related design/requirements/coding standards
System Error: Hardware and Operating System related error, Memory leak
Test Plan / Cases Error: Inadequate/ incorrect/ ambiguous or duplicate or missing - Test Plan/ Test Cases & Test Scripts, Incorrect/Incomplete test setup
Typographical Error: Spelling / Grammar mistake in documents/source code
Variable Declaration Error: Improper declaration / usage of variables, Type mismatch error in source code
Status Wise:
Open
Closed
Deferred
Cancelled
These are the major ways in which defects can be classified. I'll write more regarding the probable causes of these defects in one of my next posts... :)
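As a rough sketch of how such a classification might be captured in a defect-tracking tool, here is a small Python example; the enum values simply mirror the lists above, while the record fields and sample defect are hypothetical:

from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    MAJOR = "Major"
    MINOR = "Minor"
    FATAL = "Fatal"

class WorkProduct(Enum):
    SSD = "System Study document"
    FSD = "Functional Specification document"
    ADS = "Architectural Design document"
    DDS = "Detailed Design document"
    SOURCE_CODE = "Source code"
    TEST_ARTIFACTS = "Test Plan / Test Cases"
    USER_DOCUMENTATION = "User documentation"

class Status(Enum):
    OPEN = "Open"
    CLOSED = "Closed"
    DEFERRED = "Deferred"
    CANCELLED = "Cancelled"

@dataclass
class Defect:
    summary: str
    severity: Severity
    work_product: WorkProduct
    error_type: str                 # e.g. "Boundary Conditions Neglected"
    status: Status = Status.OPEN

d = Defect("Upper date limit not validated", Severity.MAJOR,
           WorkProduct.SOURCE_CODE, "Boundary Conditions Neglected")
print(d.status.value)   # Open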
Principles of Software Testing
What is not Software Testing?
Testing is not debugging. Debugging has the goal of removing errors; the existence and the approximate location of the error are already known. Debugging is not documented: there is no specification and there will be no record (log) or report. Debugging is the result of testing, but never a substitute for it.
Testing can never find 100% of the errors present. There will always be some remaining errors that cannot be found, and each kind of test will find a different kind of error.
Testing has the goal of finding errors, not their causes. The activity of testing therefore does not include any bug fixing or implementation of functionality; the result of testing is a test report. A tester must by no means modify the code he or she is testing. That has to be done by the developer, based on the test report received from the tester.
What is Software Testing?
Testing is a formal activity. It involves a strategy and a systematic approach. The different stages of testing supplement each other. Tests are always specified and recorded.
Testing is a planned activity. The workflow and the expected results are specified, so the duration of the activities can be estimated. The point in time at which tests are executed is defined.
Testing is the formal proof of software quality.
Overview of Test Methods
Static tests
The software is not executed but analyzed offline. In this category would be code inspections (e.g. Fagan inspections), Lint checks, cross reference checks, etc.
Dynamic tests
This requires the execution of the software or parts of the software (using stubs). It can be executed in the target system, an emulator or simulator. Within the dynamic tests the state of the art distinguishes between structural and functional tests.
Structural tests
These are the so-called "white-box" tests because they are performed with knowledge of the source code details. Input interfaces are stimulated with the aim of running through certain predefined branches or paths in the software. The software is stressed with critical values at the boundaries of the input values, or even with illegal input values. The behavior of the output interface is recorded and compared with the expected (predefined) values.
Functional tests
These are the so-called "black-box" tests. The software is regarded as a unit with unknown content. Inputs are stimulated and the resulting output values are recorded and compared to the expected, specified values.
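A minimal sketch of the black-box idea in Python follows; the function under test and its specified range are invented for illustration. Inputs are chosen at and around the specified boundaries, expected outputs are fixed in advance, and illegal inputs are deliberately included, all without looking at the implementation:

def classify_score(score: int) -> str:
    """Hypothetical unit under test: the spec says 0..100 is valid,
    >= 50 is 'pass' and < 50 is 'fail'; anything else is rejected."""
    if not 0 <= score <= 100:
        raise ValueError("score out of range")
    return "pass" if score >= 50 else "fail"

# Black-box cases at and around the boundaries, with expected results.
cases = [(0, "fail"), (49, "fail"), (50, "pass"), (100, "pass")]
for value, expected in cases:
    assert classify_score(value) == expected, (value, expected)

# Illegal inputs must be rejected, not silently accepted.
for illegal in (-1, 101):
    try:
        classify_score(illegal)
    except ValueError:
        pass
    else:
        raise AssertionError(f"accepted illegal input {illegal}")

print("all black-box cases passed")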
Test by Progressive Stages
The various tests are able to find different kinds of errors. Therefore it is not enough to rely on one kind of test and completely neglect the others. For example, white-box tests will be able to find coding errors; detecting the same coding error in the system test is very difficult, because the system malfunction that results from it will not necessarily allow conclusions about the location of the coding error. Testing should therefore be progressive, with stages that supplement each other, in order to find each kind of error with the appropriate method.
Module test
A module is the smallest compilable unit of source code. Often it is too small to allow functional tests (black-box tests). However it is the ideal candidate for white-box tests. These have to be first of all static tests (e.g. Lint and inspections) followed by dynamic tests to check boundaries, branches and paths. This will usually require the employment of stubs and special test tools.
Component test
This is the black-box test of modules or groups of modules which represent certain functionality. There are no rules about what can be called a component; it is just whatever the tester defines as a component, though it should make sense and be a testable unit. Components can be integrated step by step into bigger components and tested as such.
Integration test
The software is step by step completed and tested by tests covering a collaboration of modules or classes. The integration depends on the kind of system. E.g. the steps could be to run the operating system first and gradually add one component after the other and check if the black-box tests still run (the test cases of course will increase with every added component). The integration is still done in the laboratory. It may be done using simulators or emulators. Input signals may be stimulated.
System test
This is a black-box test of the complete software in the target system. The environmental conditions have to be realistic (complete original hardware in the destination environment).
Testing Principles
Out of Glenford Myers, ``The Art of Software Testing'':
A necessary part of a test case is a definition of the expected output or result.
A programmer should avoid attempting to test his or her own program.
A programming organization should not test its own programs.
Thoroughly inspect the results of each test.
Test cases must be written for invalid and unexpected, as well as valid and expected, input conditions.
Examining a program to see if it does not do what it is supposed to do is only half of the battle. The other half is seeing whether the program does what it is not supposed to do.
Avoid throw-away test cases unless the program is truly a throw-away program.
Do not plan a testing effort under the tacit assumption that no errors will be found.
The probability of the existence of more errors in a section of a program is proportional to the number of errors already found in that section.
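The last principle (defects cluster where defects have already been found) is often used to steer further testing effort; here is a minimal Python sketch of that idea, with invented bug counts:

# Hypothetical counts of defects found so far, per program section.
bugs_found = {"checkout": 14, "search": 3, "reports": 1, "login": 7}

# Myers' clustering principle: spend extra testing effort on the sections
# where the most errors have already been found.
priority_order = sorted(bugs_found, key=bugs_found.get, reverse=True)
print(priority_order)   # ['checkout', 'login', 'search', 'reports']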
(The following notes are adapted from STSC verification and validation slides by David A. Cook, dated 5/8/2002.)
Verification – Part 1: Inspections
• I would love to cover inspections and reviews, but time does not permit. If you are interested, email me and I will send you a presentation on reviews and inspections that I co-presented last year.
• If you can only do ONE inspection on a project, you get the biggest "bang for the buck" (ROI) by performing requirements inspections.
• Are the requirements complete?
Interpretation – Exercise
Count the number of occurrences of the letter e in the text below. No questions – just count!!
"Any activity you can perform to reduce testing errors is cost-efficient. Inspections are very effective, and requirements inspections provide the the biggest ROI of any inspection effort."
ANSWER: the letter e occurs ___ times.
Are the Requirements Complete?
• The best way to determine this is to use checklists to ensure that you are asking the right questions at inspection time.
• In addition, such things as a trained and prepared inspection team plus adequate preparation help.
What to Inspect
• The software requirements specification (or however you list the requirements).
• The sources or preliminaries for the SRS (concept of operations) or any documents that preceded the SRS. If it is used as a requirements reference, then inspect it!
Sample Checklists
• Following are some sample checklists.
• These checklists grew out of research with several customers.
Requirements Review Checklist
• Is problem partitioning complete?
• Are external and internal interfaces properly defined?
• Can each requirement be tested?
• Can each requirement be numbered and easily identified? (Is the requirement traceable?)
Requirements Review Checklist (Cont.)
• Has necessary prototyping been conducted for users?
• Have implied requirements (such as speed, errors, response time) been stated?
• Is performance achievable within constraints of other system elements?
Requirements Review Checklist (Cont.)
• Are requirements consistent with schedule, resources and budget?
• Is information to be displayed to the user listed in the requirements?
• Have future issues (upgrades, planned migration, long-term use) been addressed in requirements?
(Even I can't remember all of these at once – so I inspect requirements when I am finished writing them!)
What You Are Looking For…
You are looking for requirements that are:
• Unambiguous
• Complete
• Verifiable
• Consistent
• Modifiable
• Traceable
• Usable
• Prioritized (optional)
Metrics for Requirements
• During the requirements phase, there are a few metrics that are very useful.
• One simple metric is simply the % of requirements that have been inspected.
Metrics for Requirements (Cont.)
• Another useful metric is the ratio:
(# of requirements that reviewers interpreted the same) / (total # of requirements reviewed)
Configuration Management
"The most frustrating software problems are often caused by poor configuration management. The problems are frustrating because they take time to fix. They often happen at the worst time, and they are totally unnecessary."
– Watts Humphrey, "Managing the Software Process"
Configuration Management (Cont.)
• Again, time does not permit a complete discussion of CM. If you want an intro, email and I'll send you a presentation.
• Requirements require a centralized location and STRICT configuration management. If you are "sloppy" here, soon it all goes downhill.
Validation of Requirements
Validation Is Difficult for Software
• Validation is based on three concepts: testing, metrics, and quality assurance teams.
• Testing is the least important: it is difficult to test requirements prior to coding, and most requirements methodologies (prototyping, simulation) rely on many assumptions and simplifications.
Metrics for Validation
• Metrics can be useful, but
mostly in hindsight.
• If you know how many of
your bugs or how much of
your defect fix time are
requirements-related, you
can adjust inspections and
reviews accordingly.
Validation of Requirements
• The best way to validate requirements is to
involve customers in the requirements
inspection process.
• End-users.
• Program office.
• User management.
• End-users are the most effective, but hardest
to include.
• End-users typically only see the “small
picture”, and requirements are written in the
large.
Validation Typically Occurs Twice
• During requirements gathering/analysis/
review.
• After coding and during test. This SHOULD
be the function of the QA team.
Danger, Danger
• QA and testing are
EXTREMELY EXPENSIVE!
• Anything you can do to shorten the QA/test
phase is useful. If you rely on QA and testing
to find and fix errors – your
software is probably
• Late
• Over budget
• Still full of errors after you deliver it!!
Validation Summary
• In summary – validation requires a tie-in between implementers and users. If you can’t involve users as much as you would like, then designate people to Act Like A Customer (ALAC).
• I have no checklists for validation, but suggest
that you focus on two areas:
• External interfaces to systems.
• Internal interfaces between modules or sub-systems.
Good Source of Information
• Software Verification and Validation for
Practitioners and Managers, by Steven R.
Rakitin
In Short … There Are Solutions!
6. State-based testing
State machine: implementation-independent specification
(model) of the dynamic behavior of the system
§ state: abstract situation in the life cycle of a system entity
(for instance, the contents of an object)
§ event: a particular input (for instance, a message or
method call)
§ action: the result, output or operation that follows an event
§ transition: an allowable two-state sequence, that is, a
change of state (”firing”) caused by an event
§ guard: predicate expression associated with an event,
stating a Boolean restriction for a transition to fire
[Diagram notation: transitions are labelled “event [ guard ] / action”; the arrows marking the initial and final states likewise carry their actions (“initial state / action”, “final state / action”).]
There are several types of state machines:
§ finite automaton (no guards or actions)
§ Mealy machine (no actions associated with states)
§ Moore machine (no actions associated with transitions)
§ statechart (hierarchical states: common superstates)
§ state transition diagram: graphic representation of a state
machine
§ state transition table: tabular representation of a state machine
Example: Mealy model of a two-player video game
§ each player has a start button
§ the player who presses the start button first gets the
first serve
§ the current player serves and a volley follows:
– if the server misses the ball, the server’s opponent becomes
the server
– if the server’s opponent misses the ball, the server’s score
is incremented and the server gets to serve again
– if the server’s opponent misses the ball and the server’s
score is at game point, the server is declared the winner
(here: a score of 21 wins)
The model, written out as a state transition table:

Current state   | Event [ guard ]                  | Action(s)                         | Next state
Game started    | p1_Start                         | simulateVolley( )                 | Player 1 served
Game started    | p2_Start                         | simulateVolley( )                 | Player 2 served
Player 1 served | p1_WinsVolley [ p1_Score < 20 ]  | p1AddPoint( ), simulateVolley( )  | Player 1 served
Player 1 served | p1_WinsVolley [ p1_Score = 20 ]  | p1AddPoint( )                     | Player 1 won
Player 1 served | p2_WinsVolley                    | simulateVolley( )                 | Player 2 served
Player 2 served | p2_WinsVolley [ p2_Score < 20 ]  | p2AddPoint( ), simulateVolley( )  | Player 2 served
Player 2 served | p2_WinsVolley [ p2_Score = 20 ]  | p2AddPoint( )                     | Player 2 won
Player 2 served | p1_WinsVolley                    | simulateVolley( )                 | Player 1 served
Player 1 won    | p1_IsWinner ?                    | return TRUE                       | (final)
Player 2 won    | p2_IsWinner ?                    | return TRUE                       | (final)
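As an illustration, here is a minimal C++ sketch (not from the lecture notes) of how this Mealy machine might be implemented so that its abstract state stays observable to tests; the names simply mirror the diagram, and the volley simulation is left as a stub.

#include <cassert>

enum class State { GameStarted, P1Served, P2Served, P1Won, P2Won };

class Game {
public:
    void p1_Start() { if (state_ == State::GameStarted) { simulateVolley(); state_ = State::P1Served; } }
    void p2_Start() { if (state_ == State::GameStarted) { simulateVolley(); state_ = State::P2Served; } }

    void p1_WinsVolley() {
        if (state_ == State::P1Served) {
            if (p1Score_ < 20) { ++p1Score_; simulateVolley(); }      // [p1_Score < 20] / p1AddPoint(), simulateVolley()
            else               { ++p1Score_; state_ = State::P1Won; } // [p1_Score = 20] / p1AddPoint()
        } else if (state_ == State::P2Served) {
            simulateVolley();                                         // server's opponent becomes the server
            state_ = State::P1Served;
        }
    }
    void p2_WinsVolley() {
        if (state_ == State::P2Served) {
            if (p2Score_ < 20) { ++p2Score_; simulateVolley(); }
            else               { ++p2Score_; state_ = State::P2Won; }
        } else if (state_ == State::P1Served) {
            simulateVolley();
            state_ = State::P2Served;
        }
    }
    bool p1_IsWinner() const { return state_ == State::P1Won; }
    bool p2_IsWinner() const { return state_ == State::P2Won; }

    // Built-in test support: the current abstract state is checkable from outside.
    State state() const { return state_; }

private:
    void simulateVolley() { /* stub: the volley simulation is not modelled here */ }
    State state_ = State::GameStarted;
    int p1Score_ = 0, p2Score_ = 0;
};

int main() {                      // a tiny event-sequence test case against the model
    Game g;
    g.p1_Start();
    g.p1_WinsVolley();
    assert(g.state() == State::P1Served && !g.p1_IsWinner());
    return 0;
}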
General properties of state machines
§ typically incomplete
– just the most important states, events and transitions are given
– usually just legal events are associated with transitions; illegal events
(such as p1_Start from state Player 1 served) are left undefined
§ may be deterministic or nondeterministic
– deterministic: any state/event/guard triple fires a unique transition
– nondeterministic: the same state/event/guard triple may fire several
transitions, and the firing transition may differ in different cases
§ may have several final states (or none: infinite computations)
§ may contain empty events (default transitions)
§ may be concurrent: the machine (statechart) can be in several
different states at the same time
The role of state machines in software testing
§ Framework for model testing, where an executable model (state
machine) is executed or simulated with event sequences as test
cases, before starting the actual implementation phase
§ Support for testing the system implementation (program) against
the system specification (state machine)
§ Support for automatic generation of test cases for the
implementation
– there must be an explicit mapping between the elements of the state
machine (states, events, actions, transitions, guards) and the elements of
the implementation (e.g., classes, objects, attributes, messages, methods,
expressions)
– the current state of the state machine underlying the implementation must
be checkable, either by the runtime environment or by the implementation
itself (built-in tests with, e.g., assertions and class invariants)
Validation of state machines
Checklist for analyzing that the state machine is complete and
consistent enough for model or implementation testing:
– one state is designated as the initial state with outgoing transitions
– at least one state is designated as a final state with only incoming
transitions; if not, the conditions for termination shall be made explicit
– there are no equivalent states (states for which all possible outbound event
sequences result in identical action sequences)
– every state is reachable from the initial state
– at least one final state is reachable from all the other states
– every defined event and action appears in at least one transition (or state)
– except for the initial and final states, every state has at least one incoming
transition and at least one outgoing transition
– for deterministic machines, the events accepted in a particular state are
unique or differentiated by mutually exclusive guard expressions
– the state machine is completely specified: every state/event pair has at least
one transition, resulting in a defined state; or there is an explicit
specification of an error-handling or exception-handling mechanism for
events that are implicitly rejected (with no specified transition)
– the entire range of truth values (true, false) must be covered by the guard
expressions associated with the same event accepted in a particular state
– the evaluation of a guard expression does not produce any side effects in
the implementation under test
– no action produces side effects that would corrupt or change the resultant
state associated with that action
– a timeout interval (with a recovery mechanism) is specified for each state
– state, event and action names are unambiguous and meaningful in the
context of the application
Control faults
When testing an implementation against a state machine, one shall
study the following typical control faults (incorrect sequences of
events, transitions, or actions):
– missing transition (nothing happens with an event)
– incorrect transition (the resultant state is incorrect)
– missing or incorrect event
– missing or incorrect action (wrong things happen as a result of a transition)
– extra, missing or corrupt state
– sneak path (an event is accepted when it should not be)
– trap door (the implementation accepts undefined events)
[Figure: the video-game state machine, with the p1_WinsVolley / simulateVolley( ) transition out of ”Player 2 served” removed.]
Missing transition: Player 2 loses the volley, but continues as server
[Figure: the video-game state machine, with one transition redirected to the wrong resultant state.]
Incorrect transition: After player 2 misses, the game resets
[Figure: the video-game state machine, with the simulateVolley( ) actions missing from the p1_Start and p2_Start transitions.]
Missing actions: No volley is generated, and the system will wait forever
[Figure: the video-game state machine, where the p2_WinsVolley [ p2_Score < 20 ] transition calls p1AddPoint( ) instead of p2AddPoint( ).]
Incorrect action: Player 2 can never win
[Figure: the video-game state machine, with an extra p2_Start transition accepted during serve.]
Sneak path: Player 2 can immediately win by pressing the start button at serve
[Figure: the video-game state machine, with an undefined Esc event accepted during serve.]
Trap door: Player 1 can immediately win by pressing the Esc key at serve
Test design strategies for state-based testing
Test cases for state machines and their implementations can be
designed using the same notion of coverage as in white-box
testing:
§ test case = sequence of input events
§ all-events coverage: each event of the state machine is
included in the test suite (is part of at least one test case)
§ all-states coverage: each state of the state machine is exercised
at least once during testing, by some test case in the test suite
§ all-actions coverage: each action is executed at least once
§ all-transitions: each transition is exercised at least once
– implies (subsumes) all-events coverage, all-states coverage,
and all-actions coverage
– ”minimum acceptable strategy for responsible testing of a state machine”
§ all n-transition sequences: every transition sequence generated
by n events is exercised at least once
– all transitions = all 1-transition sequences
– all n-transition sequences implies (subsumes) all (n-1)-transition
sequences
§ all round-trip paths: every sequence of transitions beginning
and ending in the same state is exercised at least once
§ exhaustive: every path over the state machine is exercised at
least once
– usually totally impossible, or at least impractical
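For instance (event sequences derived from the video game model above, not listed in the notes): the two test cases ⟨p1_Start, p1_WinsVolley × 21, p1_IsWinner?⟩ and ⟨p2_Start, p2_WinsVolley × 21, p2_IsWinner?⟩ together already achieve all-states coverage, but all-transitions coverage additionally requires volleys lost by the current server, e.g. a p2_WinsVolley event while Player 1 is serving and a p1_WinsVolley event while Player 2 is serving.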
[Figure: the video-game state machine, with a set of transitions highlighted whose traversal visits every state: all-states coverage.]
[Figure: the video-game state machine, with a test suite highlighted that exercises every transition: all-transitions coverage.]
7. Testing object-oriented software
§ The special characteristics of the object-oriented software
engineering paradigm provide some advantages but also
present some new problems for testing
– advantages: well-founded design techniques and models (UML), clean
architectures and interfaces, reusable and mature software patterns
– problems: inheritance, polymorphism, late (dynamic) binding
§ There is a need for an object-oriented testing process, object-oriented test case design, object-oriented coverage metrics, and object-oriented test automation
Object-oriented code defects
– buggy interaction of individually correct superclass and subclass methods
– omitting a subclass-specific override for a high-level superclass method in a
deep inheritance hierarchy
– subclass inheriting inappropriate methods from superclasses (”fat
inheritance”)
– failure of a subclass to follow superclass contracts in polymorphic servers
– omitted or incorrect superclass-initialization in subclasses
– incorrect updating of superclass instance variables in subclasses
– spaghetti polymorphism resulting in loss of execution control (the ”yo-yo”
problem caused by dynamic binding and self and super objects)
– subclasses violating the state model or invariant of the superclass
– instantiation of generic class with an untested type parameter
– corrupt inter-modular control relationships, due to delocalization of
functionality in mosaic small-scale and encapsulated classes
– unanticipated dynamic bindings resulting from scoping nuances in multiple
and repeated inheritance
Object-oriented challenges for test automation
– low controllability, observability, and testability of the system, due to a
huge number of small encapsulated objects
– difficulties in analyzing white-box coverage, due to a large number of
complex and dynamic code dependencies
– construction of drivers, stubs, and test suites that conform to the inheritance
structure of the system
– reuse of superclass test cases in (regression) testing of subclasses
– incomplete applications (object-oriented frameworks)
Object-oriented challenges for testing process
– need for a larger number of testing phases, due to an iterative and
incremental software development process
– unclear notion of ”module” or ”unit”, due to individual classes being too
small and too much coupled with other classes
– (UML) models being too abstract to support test case design
– need for more test cases than for conventional software, due to the inherent
dynamic complexity of object-oriented code
– general and partly abstract reusable classes and frameworks, making it
necessary to test them also for future (unknown) applications
7.1. UML and software testing
UML (Unified Modeling Language): a design and modeling
language for (object-oriented) software
– widely used ”de facto” standard
– provides techniques to model the software from different perspectives
– supports facilities both for abstract high-level modeling and for more
detailed low-level modeling
– consists of a variety of graphical diagram types that can be extended for
specific application areas
– associated with a formal textual language, OCL (Object Constraint
Language)
– provides support for model-based testing: (1) testing of an executable
UML model, (2) testing of an implementation against its UML model
UML diagrams in software testing
§ Use case diagrams: testing of system-level functional
requirements, acceptance testing
§ Class diagrams: class (module / unit) testing, integration testing
§ Sequence diagrams, collaboration diagrams : integration testing,
testing of control and interaction between objects, testing of
communication protocols between (distributed) objects
§ Activity diagrams: testing of work flow and synchronization
within the system, white-box testing of control flow
§ State diagrams (statecharts): state-based testing
§ Package diagrams, component diagrams: integration testing
§ Deployment diagrams: system testing
7.2. Testing based on use cases
§ Use case: a sequence of interactions by which the user
accomplishes a task in a dialogue with the system
– use case = one particular way to use the system
– use case = user requirement
– set of all use cases = complete functionality of the system
– set of all use cases = interface between users (actors) and system
§ Scenario: an instance of a use case, expressing a specific task by
a specific actor at a specific time and using specific data
§ UML: use case model = set of use case diagrams, each
associated with a textual description of the user’s task
– for testing, the use case model must be extended with (1) the domain of
each variable participating in the use cases, (2) the input/output
relationships among the variables, (3) the relative frequency of the use
cases, (4) the sequential (partial) order among the use cases
Example: Automatic Teller Machine (ATM)
[Use case diagram: the actor Bank customer takes part in the use cases Establish session, Inquiry and Bank transaction, with Withdraw and Deposit marked as <<extends>> extensions of Bank transaction; the actor ATM technician takes part in Repair and Replenish.]
Some ATM use cases and scenarios
Establish session / Bank customer:
1) Wrong PIN entered. Display ”Reenter PIN”.
2) Valid PIN entered; bank not online. Display ”Try later”.
3) Valid PIN entered; account closed. Display ”Account closed, call your bank”.
Withdraw / Bank customer:
4) Requests 50 €; account open; balance 51 €. 50 € dispensed.
5) Requests 100 €; account closed. Display ”Account closed, call your bank”.
Replenish / ATM technician:
6) ATM opened; cash dispenser empty; 50000 € added.
7) ATM opened; cash dispenser full.
From use cases to test cases
1. Identification of the operational variables: explicit inputs and outputs,
environmental conditions, states of the system, interface elements
– Establish session: PIN on the card, PIN entered, response from the bank, status of
the customer’s account
2. Domain definitions (and equivalence classes) of the variables
– Establish session: PIN ∈ [ 0000 – 9999 ]
3. Development of an operational relation among the variables, modeling the
distinct responses of the system as a decision table of variants
– variant: combination of equivalence classes, resulting in a specific system action
– variants shall be mutually exclusive
– scenarios are represented by variants
4. Development of the test cases for the variants
– at least one ”true” test case that satisfies all the variant’s requirements on variable
values
– at least one ”false” test case that violates the variant’s requirements
– typically every ”false” test case is a ”true” test case for another variant
Establish session: Operational relation
(operational variables → expected action)

Variant | Card PIN | Entered PIN        | Bank response        | Account status | ATM message     | ATM card action
1       | Valid    | Doesn’t match card | –                    | –              | Reenter PIN     | None
2       | Valid    | Matches card       | Does not acknowledge | –              | Try later       | Eject
3       | Valid    | Matches card       | Acknowledges         | Closed         | Call bank       | Eject
4       | Invalid  | –                  | –                    | –              | Insert ATM card | Eject
5       | Valid    | Matches card       | Acknowledges         | Open           | Select service  | None
6       | Revoked  | –                  | Acknowledges         | –              | Card revoked    | Retain
7       | Revoked  | –                  | Does not acknowledge | –              | Invalid card    | Eject
Establish session: Test cases
(for each variant, one ”true” case nT that satisfies it and one ”false” case nF that violates it; nF (mT) means the false case is at the same time a true case of variant m)

Variant  | Card PIN | Entered PIN | Bank response | Account status | ATM message     | ATM card action
1T       | 1234     | 1134        | –             | –              | Reenter         | None
1F (2T)  | 1234     | 1234        | NACK          | –              | Try later       | Eject
2T       | 9999     | 9999        | NACK          | –              | Try later       | Eject
2F (3T)  | 9999     | 9999        | ACK           | Closed         | Call bank       | Eject
3T       | 0000     | 0000        | ACK           | Closed         | Call bank       | Eject
3F (4T)  | 000?     | –           | –             | –              | Insert ATM card | Eject
4T       | %&+?     | –           | –             | –              | Insert ATM card | Eject
4F (5T)  | 0001     | 0001        | ACK           | Open           | Select service  | None
5T       | 3210     | 3210        | ACK           | Open           | Select service  | None
5F (2T)  | 0001     | 0001        | NACK          | –              | Try later       | Eject
6T       | 5555     | –           | ACK           | –              | Card revoked    | Retain
6F (1T)  | 9998     | 9999        | –             | –              | Reenter         | None
7T       | 5555     | –           | NACK          | –              | Invalid card    | Eject
7F (1T)  | 9999     | 0000        | –             | –              | Reenter         | None
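Such a variant table maps almost directly onto table-driven test code. The following is a minimal sketch of the idea (my own illustration, not from the notes); the establishSession() function is only a placeholder standing in for the real ATM controller.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct SessionCase {
    std::string id;                                               // e.g. "1T", "1F"
    std::string cardPin, enteredPin, bankResponse, accountStatus; // operational variables
    std::string expectedMessage, expectedCardAction;              // expected actions
};

// Placeholder for the implementation under test; a real harness would call the ATM here.
std::pair<std::string, std::string> establishSession(const SessionCase&) {
    return {"", ""};
}

int main() {
    std::vector<SessionCase> cases = {
        {"1T", "1234", "1134", "-",    "-", "Reenter",   "None"},
        {"1F", "1234", "1234", "NACK", "-", "Try later", "Eject"},
        // ... the remaining rows 2T–7F from the table above
    };
    for (const auto& c : cases) {
        auto result = establishSession(c);
        if (result.first != c.expectedMessage || result.second != c.expectedCardAction)
            std::cout << "FAIL " << c.id << ": got \"" << result.first
                      << "\" / \"" << result.second << "\"\n";
    }
    return 0;
}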
7.3. Class-based testing
Some questions when designing the testing of subclasses:
§ Should we test inherited methods?
– extension: inclusion of superclass features in a subclass, inheriting method implementation and interface (name and arguments)
– overriding: new implementation of a subclass method, with the same inherited interface as that of a superclass method
– specialization: definition of subclass-specific methods and instance variables, not inherited from a superclass
§ Can we reuse superclass tests for extended and overridden methods?
§ To what extent should we exercise interaction among methods of all superclasses and of the subclass under test?
Superclass-subclass development scenarios
1. Superclass modifications: a method is changed in a superclass
– the changed method and all its interactions with changed and unchanged methods must be retested in the superclass
– the method must be retested in all the subclasses inheriting the method as extension
– the ”super” references to the method must be retested in subclasses
2. Subclass modifications: a subclass is added or modified, without changing the superclasses
– all the methods inherited from a superclass must be retested in the context of the subclass, even those which were just included by extension but not overridden
class Account {
protected:
    Money balance;                       // current balance
public:
    void openAccount (Money amount)
    { balance = amount; }                // account is opened
    // …
};

class Stocks : public Account {
public:
    void takeShares (Money shares)
    { balance = shares * unitPrice; }    // value of the shares is deposited into the account
    void giveProportion ( )
    { print (totalPrice / balance); }
    // …
private:
    Money unitPrice, totalPrice;         // assumed members, not shown on the original slide
};

Shared data: the balance may be computed in openAccount instead of takeShares, so openAccount must be retested to find the defect in giveProportion.
Contract of Account: balance may be 0.
Contract of Stocks: balance may not be 0 (giveProportion divides by it).
Superclass-subclass development scenarios
3. Reuse of the superclass test suite: a method is added to a subclass, overriding a superclass method
– the new implementation of the method must be tested
– test cases for the superclass method can be reused, but additional test cases are also needed for subclass-specific behavior
4. Addition of a subclass method: an entirely new method is added to a specialized subclass
– the new method must be tested with method-specific test cases
– interactions between the new method and the old methods must be tested with new test cases
Superclass-subclass development scenarios
5. Change to an abstract superclass interface
– ”abstract” superclass: some of the methods have been left without implementation; just the interface has been defined
– all the subclasses must be retested, even if they have not changed
6. Overriding a superclass method used by another superclass method
– the superclass method that is using the overridden method will behave differently, even though it has not been changed itself
– so, the superclass method must be retested
class Account {
public:
    void rollOver( )         { /* … */ yearEnd( ); /* … */ }
    virtual void yearEnd( )  { /* … */ foo( );     /* … */ }   // virtual, so the subclass override is bound dynamically
    // …
};

class Deposit : public Account {
public:
    void yearEnd( ) override { /* … */ bar( );     /* … */ }
    // …
};

The inherited rollOver will now activate ”bar” instead of ”foo”, so it must be retested.
8. Integration testing
Integration testing: search for module faults that cause failures
in interoperability of the modules
– ”module”: generic term for a program element that
implements some restricted, cohesive functionality
– typical modules: class, component, package
– ”interoperability”: interaction between different modules
– interaction may fail even when each individual module
works perfectly by itself and has been module-tested
– usually related to the call interface between modules: a
function call or its parameters are buggy
Typical interface bugs
§ missing, overlapping, or conflicting functions
§ incorrect or inconsistent data structure used for a file or database
§ violation of the data integrity of global store or database
§ unexpected runtime binding of a method
§ client sending a message that violates the server’s constraints
§ wrong polymorphic object bound to message
§ incorrect parameter value
§ attempt to allocate too many resources from the target, making the target crash
§ incorrect usage of virtual machine, object request broker, or
operating system service
§ incompatible module versions or inconsistent configuration
Interaction dependencies between modules:
– function calls (the most usual case)
– remote procedure calls
– communication through global data or persistent storage
– client-server architecture
– composition and aggregation
– inheritance
– calls to an application programming interface (API)
– objects used as message parameters
– proxies
– pointers to objects
– dynamic binding
Dependency tree: hierarchical representation of the
dependencies between modules
In integration testing: uses-dependency
– usually: module A uses module B =
(some function in) module A calls (some function in) module B
– can be generated by automated static dependency analyzers
Example: modules A, B, C, D where A uses (calls) B, A uses (calls) C, and C uses (calls) D.
Integration testing strategies
Example system: an air traffic control system with the modules
A: Air traffic control system (main module), B: Radar input module, C: Aircraft tracking module, D: Controller interface module, E: Aircraft position module, F: Aircraft identification module, G: Position prediction module, H: Display update module.
[Dependency tree, as reconstructed from the strategies below: A uses B, C and D; C uses E, F, G and H.]
(1) Big-bang integration: all the modules are tested at the same time
– One test configuration: {A, B, C, D, E, F, G, H}
– Failure => where is the bug?
– OK for small and well-structured systems
– OK for systems constructed from trusted components
(2) Top-down integration: the system is tested incrementally,
level by level with respect to the dependency tree,
starting from the top level (root of the tree)
1. The main module A is tested by itself. Stubs: B, C, and D.
2. The subsystem {A, B, C, D} is tested. Stubs: E, F, G, and H.
3. Finally, the entire system is tested. No stubs.
– Failure in step 3: the bug is in E, F, G, or H
Advantages:
– in case of failure, the suspect area is limited
– testing may begin early: when the top-level modules have been coded
– early validation of main functionality
– modules may be developed in parallel with testing
Disadvantages:
– lowest-level (most often used) modules are tested last, so performance
problems are encountered late
– requires stubs: partial proxy implementations of called modules
– since stubs must provide some degree of real functionality, it may be
necessary to have a set of test case specific stubs for each module
– dependency cycles must be resolved by testing the whole cycle as a
group or with a special cycle-breaking stub
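To make the stub/driver vocabulary concrete, here is a small sketch (my own illustration; the module interfaces are assumptions based on the example system): a stub is a canned stand-in for a module that the module under test calls, and a driver is a throwaway caller that feeds the module under test.

#include <cassert>

struct Position { double x, y; };

// Stub for the aircraft position module E: used when testing the tracking
// module C top-down, before E exists. It returns fixed, test-case-specific data.
Position aircraftPositionStub(int /*aircraftId*/) {
    return Position{0.0, 0.0};
}

// (Partial) real implementation of module E, as it would exist in bottom-up testing.
Position aircraftPosition(int aircraftId) {
    // simplified placeholder for the real computation
    return Position{1.0 * aircraftId, 2.0 * aircraftId};
}

// Driver for module E: a throwaway main() that plays the role of the calling
// module C, feeding E with inputs and checking its outputs.
int main() {
    Position fromStub = aircraftPositionStub(7);
    assert(fromStub.x == 0.0 && fromStub.y == 0.0);

    Position p = aircraftPosition(7);
    assert(p.x == 7.0 && p.y == 14.0);
    return 0;
}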
(3) Bottom-up integration: the system is tested incrementally,
starting from the bottom level (leaves of the dependency tree)
1. The module E is tested alone. Driver: C. [The driver may have been developed
already in module testing of E.]
2. The module F is tested. (Extended) driver: C.
3. The module G is tested. (Extended) driver: C.
4. The module H is tested. (Extended) driver: C.
5. The module B is tested. Driver: A.
6. The module D is tested. (Extended) driver: A.
7. The modules {C, E, F, G, H} are tested together. (Extended) driver: A.
Failure => bug in C or its downwards-uses interfaces.
8. Finally, the entire system is tested. No drivers.
Failure => bug in A or its uses interfaces.
Advantages:
– in case of failure, the suspect area is very narrow: one module and its
interfaces
– testing may begin early: as soon as any leaf-level module is ready
– initially, testing may proceed in parallel
– early validation of performance-critical modules
Disadvantages:
– main functionality (usability) and control interface of the system are
validated late
– requires drivers: partial proxy implementations of calling modules
– dependency cycles must be resolved
– requires many tests, especially if the dependency tree is broad and has
a large number of leaves
(4) Backbone integration: combination of big-bang, top-down, and bottom-up (backbone: E, F, G, H)
1. The backbone (kernel) modules of the system are tested first, bottom-up.
2. The system control is tested next, top-down.
3. The backbone modules are big-banged together, bottom-up.
4. Top-down integration is continued, until the backbone has been included.
For the example system:
1. Each backbone module E, F, G, H is tested alone, in isolation. Drivers are needed.
2. The control subsystem {A} is tested. Stubs are needed.
3. The backbone and its controller {E, F, G, H, C} are tested. Driver for C is needed.
4. The subsystem {A, B, C, D} is tested. Stubs are needed.
5. The entire system is tested. No stubs or drivers.
Usually bottom-up preferable:
+ Drivers are much easier to write than stubs, and can even be
automatically generated.
+ The approach provides a greater opportunity for parallelism than the
other approaches; that is, there can be several teams testing different
subsystems at the same time.
+ The approach is effective for detecting detailed design or coding errors
early enough.
+ The approach detects critical performance flaws that are generally
associated with low-level modules.
+ The approach supports the modern software engineering paradigms
based on classes, objects, and reusable stand-alone components.
- A prototype of the system is not available for (user) testing until the very end.
- The approach is not effective in detecting architectural design flaws of
large scale.
- May be too laborious for large systems.
9. Regression testing
ANSI/IEEE Standard of Software Engineering
Terminology:
selective testing of a system or component to
verify that modifications have not caused
unintended effects and that the system or
component still complies with its specifications
Usually integrated with maintenance, to check the validity of the
modifications:
§ corrective maintenance (fixes)
§ adaptive maintenance (porting to a new operational
environment)
§ perfective maintenance (enhancements and
improvements to the functionality)
Differences between “ordinary” testing and
regression testing:
§ regression testing uses a (possibly) modified
specification, a modified implementation, and an old test
plan (to be updated)
§ regression testing checks the correctness of some parts
of the implementation only
§ regression testing is usually not included in the total cost
and schedule of the development
§ regression testing is done many times in the life-cycle of
the system (during bug fixing and maintenance)
General approach:
– P: program (module, component), already tested
– P’: modified version of P
– T: test suite used for testing P
1. Select T’ ⊆ T, a set of tests to execute on P’.
2. Test P’ with T’, establishing the correctness of P’ with
respect to T’.
3. If necessary, create T’’, a set of new functional (black-box)
or structural (white-box) tests for P’.
4. Test P’ with T’’, establishing the correctness of P’ with
respect to T’’.
5. Create T’’’, a new test suite and test history for P’, from T, T’, and T’’: T’’’ = T ∪ T’’
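As a small illustration (not from the notes): if the change from P to P’ only touches one module, T’ would typically be the existing tests that exercise that module, T’’ would be new tests for its changed or added behaviour, and the updated suite T’’’ = T ∪ T’’ becomes the baseline for the next round of regression testing.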
§ naïve regression testing (re-running all the existing test
cases) not cost-effective, although the common strategy in
practice
§ principle 1: it is useless to test unmodified parts of the
software again
§ principle 2: modified parts shall be tested with existing test
cases (may not be possible, if the interface has been changed:
GUI capture-replay problem!)
§ principle 3: new parts shall be tested with new test cases
§ database of test cases needed
§ additional automation support possible (data-flow analysis)
When and how?
§ A new subclass has been developed: rerun the superclass tests on
the subclass, run new tests for the subclass
§ A superclass is changed: rerun the superclass tests on the
superclass and on its subclasses, rerun the subclass tests, test the
superclass changes
§ A server (class) is changed: rerun tests for the clients of the
server, test the server changes
§ A bug has been fixed: rerun the test that revealed the bug, rerun
tests on any parts of the system that depend on the changed code
§ A new system build has been generated: rerun the build test suite
§ The final release has been generated: rerun the entire system
test suite
Selective regression test strategies
§ Risk-based heuristics: rerun tests for (1) unstable,
(2) complex, (3) functionally critical, or (4) frequently
modified modules (classes, functions, and such)
§ Profile-based heuristics: rerun tests for those use cases,
properties, or functions that are the most frequently used
§ Coverage-based heuristics: rerun those tests that yield the
highest white-box (statement, branch, …) code coverage
§ Reachability-based heuristics: rerun those tests that reach an
explicitly or implicitly changed or deleted module
§ Dataflow-based heuristics: rerun those tests that exercise
modified or new definition-use pairs
§ Slice-based heuristics: rerun those tests that generate a similar
data-flow slice over the old and the new software version
10. Statistical testing
§ operational profile: distribution of functions actually used
⇒ probability distribution of inputs
§ most frequently used functions / properties tested more
carefully, with a larger number of test cases
§ more complex, error-prone functions tested more carefully
§ central “kernel” functions tested more carefully
§ useful strategy, when in lack of time or testing resources
§ based on experience with and statistics over previous use
§ history data over existing systems must be available
Example: file processing
- create: probability of use 0.5 (50 %)
- delete: probability of use 0.25 (25 %)
- modify: probability of use 0.25 (25 %)
⇒ create: 50 test cases
⇒ delete: 25 test cases
⇒ modify: 25 test cases
In total: 100
Improved strategy: relative probability of failure
- modify twice as complex as create and delete
⇒ create: 40 test cases
⇒ delete: 20 test cases
⇒ modify: 40 test cases
In total: 100
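One way to read these numbers (an interpretation, not spelled out in the notes) is that each function’s probability of use is weighted by its relative failure probability and the result renormalised; the small sketch below reproduces the 40/20/40 split that way.

#include <cstdio>

int main() {
    const char* fn[]    = {"create", "delete", "modify"};
    double use[]        = {0.50, 0.25, 0.25};   // operational profile (probability of use)
    double failWeight[] = {1.0, 1.0, 2.0};      // modify assumed twice as failure-prone
    const int budget = 100;                     // total number of test cases

    double total = 0.0;
    for (int i = 0; i < 3; ++i) total += use[i] * failWeight[i];
    for (int i = 0; i < 3; ++i)
        std::printf("%s: %.0f test cases\n", fn[i], budget * use[i] * failWeight[i] / total);
    return 0;
}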
Example: profile of a mobile phone as a graph (probabilities of use):
– Top level: Games 0.4, Calls 0.3, Messages 0.2, Calendar 0.05, Alarm 0.05
– Within Games: Snake 0.8, Logic 0.1, Memory 0.1
– Within Messages: ABC 0.7, 123 0.3
Resulting leaf probabilities: Snake 0.4 × 0.8 = 32 %, Logic 0.4 × 0.1 = 4 %, Memory 0.4 × 0.1 = 4 %; Games in total 40 %, Calls 30 %.
Complexity ?!?
11. Practical aspects of testing
When to stop?
§ all the planned test cases have been executed
§ required (white-box) coverage has been reached (e.g. all the
branches have been tested)
§ all the (black-box) operations, their parameters and their
equivalence classes have been tested
§ required percentage (e.g., 95%) of estimated total number of
errors has been found (known from company’s project history)
§ required percentage (e.g., 95%) of seeded errors has been found
§ mean time to failure (in full operation) is greater than a required
threshold time (e.g. 1 week)
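As a worked illustration of the seeded-error criterion (my own numbers, not from the notes): if 20 errors are deliberately seeded and testing finds 18 of them (90 %) alongside 45 unseeded errors, the seeding model estimates that those 45 are roughly 90 % of the real errors, i.e. about 50 exist in total and around 5 remain.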
Frequently made mistakes in software testing
§ Most organizations believe that the purpose of testing is
just to find bugs, while the key is to find the important bugs.
§ Test managers are reporting bug data without putting it into
context.
§ Testing is started too late in the project.
§ Installation procedures are not tested.
§ Organizations overrely on beta testing done by the
customers.
§ One often fails to correctly identify risky areas (areas that
are used by more customers or would be particularly severe
if not functioning perfectly).
§ Testing is a transitional job for new, novice programmers.
§ Testers are recruited from the ranks of failed programmers.
§ Testers are not domain experts.
§ The developers and testers are physically separated.
§ Programmers are not supposed to test their own code.
§ More attention is paid to running tests than to designing them.
§ Test designs are not reviewed / inspected.
§ It is checked that the product does what it’s supposed to do, but not that it doesn’t do what it isn’t supposed to do.
§ Testing is only conducted through the user-visible interface.
§ Bug reporting is poor.
§ Only existing test cases (no new ones) are applied in
regression testing of system modifications.
§ All the tests are automated, without considering economic
issues.
§ The organization does not learn, but makes the same testing
mistakes over and over again.
Empirical hints for defect removal
B. Boehm, V.R. Basili: Software Defect Reduction Top 10 List. (IEEE)
Computer, 34, 1 (January), 2001, 135-137.
1. Finding and fixing a software problem after delivery is often 100 times more
expensive than finding and fixing it during the requirements and design
phase.
2. Current software projects spend about 40 to 50 percent of their effort on
avoidable rework.
3. About 80 percent of avoidable rework comes from 20 percent of the defects.
4. About 80 percent of the defects come from 20 percent of the modules, and
about half of the modules are defect free.
5. About 90 percent of the downtime comes from, at most, 10 percent of the
defects.
6. Peer reviews catch 60 percent of the defects.
7. Perspective-based reviews catch 35 percent more defects than
nondirected reviews.
8. Disciplined personal practices can reduce defect introduction rates by up
to 75 percent.
9. All other things being equal, it costs 50 percent more per source
instruction to develop high-dependability software products than to
develop low-dependability software products. However, the investment is
more than worth it if the project involves significant operations and
maintenance costs.
10. About 40 to 50 percent of user programs contain nontrivial defects.
Conclusion: Try to avoid testing as much as possible. However, if
you have to test, do it carefully and with focus.
Observations
§ Testing is not considered as a profession, but rather as an art.
– Psychologically, testing is more “boring” than coding
§ Too little resources are allocated to testing.
– Exception: Microsoft has about as many testers as programmers
§ Quality of testing is not measured.
– A general test plan is usually written, but the process is not tracked
§ The significance of testing tools is overestimated.
§ The significance of inspections is underestimated.
A Review Paper on Decision Table-Based Testing (Cai Ferriday)
1 Introduction
From the beginning of software development, testing has always been incorporated into the final stages. Over the years the complexity of software has increased dramatically, and as this complexity increases, programmers realise that testing is just as important as the development stages.
Nowadays there are two main types of testing, White box testing and Black box testing. Grey box testing is another type, but it’s not so well known and is sometimes used with Decision Table-Based Testing:
White box – testing concerned with the internal structure of the program.
Black box – testing concerned with input/output of the program.
Grey box – using the logical relationships to analyse the input/output of
the program.
Testing has been divided into different categories as it has been practised and researched since the 1970s. This paper is going to discuss and analyse Decision
Table-Based Testing which is a Functional Testing method, also known as Black box
testing.
In this paper I aim to explore the fundamental concepts of Decision Table-Based
Testing and how it differs from other functional testing methods. Using examples I
will also explain how Decision Table-Based Testing operates and how to use it.
The remainder of this document is split up into three areas:
Background – An overview of DT-BT’s origin and its relationship with other
Functional Testing methods.
Applications – A discussion on the ways DT-BT can be used, along with
examples and how it compares with different Functional Testing methods.
Summary & Further Work – A brief outline of the material covered and
further work that will complement it.
2 Background
2.1 Origin
Decision Table-Based Testing has been around since the early 1960’s; it is used to
depict complex logical relationships between input data. There are two closely related
methods of Functional Testing:
• The Cause-Effect Graphing (Elmendorf, 1973; Myers, 1979), and
• The Decision Tableau Method (Mosley, 1993).
These methods are a little different to Decision Table-Based Testing, but use similar concepts, which I will explain later on. I won’t go into great detail as these methods are awkward and unnecessary with the use of Decision Tables.
2.2 Definitions
2.2.1 Decision Table-Based Testing?
A Decision Table is the method used to build a complete set of test cases without
using the internal structure of the program in question. In order to create test cases we
use a table to contain the input and output values of a program. Such a table is split up
into four sections as shown below in fig 2.1.
Figure 2.1 The Basic Structure of a Decision Table.
In fig 2.1 there are two lines which divide the table into its main structure. The solid
vertical line separates the Stub and Entry portions of the table, and the solid horizontal
line is the boundary between the Conditions and Actions. So these lines separate the
table into four portions, Condition Stub, Action Stub, Condition Entries and Action
Entries.
A column in the entry portion of the table is known as a rule. Values which are in the
condition entry columns are known as inputs and values inside the action entry
portions are known as outputs. Outputs are calculated depending on the inputs and
specification of the program.
In fig 2.2 there is an example of a typical Decision Table. The inputs in this given table derive the outputs depending on what conditions these inputs meet. Notice the use of “-“ in the table below; these are known as don’t care entries. Don’t care entries are normally viewed as being false values which don’t require the value to define the output.
Figure 2.2 a Typical Structure of a Decision Table
Figure 2.2 shows the input values as true (T) or false (F), i.e. binary conditions; tables which use binary conditions are known as limited entry decision tables. Tables which use multiple conditions are known as extended entry decision tables. One important aspect to notice about decision tables is that they aren’t imperative, in that they don’t impose any particular order on the conditions or actions.
2.2.2 Cause-Effect Graphing?
Cause-Effect Graphing is very similar to Decision Table-Based Testing, where logical
relationships of the inputs produce outputs; this is shown in the form of a graph. The
graph used is similar to that of a Finite State Machine (FSM). Symbols are used to
show the relationships between input conditions, those symbols are similar to the
symbols used in propositional logic.
2.3 Functional Relationships
There are 3 main functional methods:
Boundary Value Analysis (BVA)
Equivalence Class Testing (ECT)
Decision Table-Based Testing (DT-BT)
All three functional testing methods complement each other; the functional testing outcome cannot be completed to a satisfactory level using just one of these functional testing strategies, or even two.
Decision Table-Based Testing has evolved from Equivalence Class Testing in some
way; Equivalence Class Testing groups together inputs of the same manner which
behave similarly. DT-BT follows on from ECT by grouping together the input and
output behaviours into an “equivalence” rule and testing the logical dependencies of
these rules. These rules are regarded as test cases, therefore redundant rules are
discarded.
2.3.1 Effort
Although all three testing strategies have similar properties and all work towards the same goal, each of the methods is different in terms of application and effort.
Boundary Value Analysis is not concerned with the data or logical dependencies as
it’s a domain based testing technique. It requires low effort to develop test cases for
this method but on the other hand its sophistication is low and the number of test
cases generated is high compared with other functional methods.
Equivalence Class Testing is more concerned with data dependencies and treating
similar inputs and outputs the same by grouping them in classes. This reduces the test
cases and increases the effort used to create test cases due to the effort required to
group them. This is a more sophisticated method of test case development as it’s more
concerned with the values inside of the domain.
Decision Table-Based Testing, on the other hand, shares traits with Equivalence Class Testing; it tests logical dependencies, which increases the effort in identifying test cases and increases the sophistication of those test cases. Because DT-BT relies more on the logical dependencies of the equivalence classes in the decision table, this reduces the number of rules required to complete the set of test cases.
____________________________
Boundary Value Analysis: - A functional testing strategy which is concerned with the limits of an input/output domain.
Equivalence Class Testing: - A functional testing strategy where the inputs/outputs that behave similarly are grouped together
into equivalence partitions, in order to decrease test cases.
[Figure 2.3: two graphs plotting the three methods against sophistication – by number of test cases generated (Boundary Value Analysis highest, then Equivalence Class, then Decision Table) and by effort to identify test cases (Decision Table highest, then Equivalence Class, then Boundary Value Analysis).]
2.3.2 Efficiency
In order to give a sense of how efficient Decision Table-Based Testing is with respect
to other functional methods, Boundary Value Analysis and Equivalence Class Testing
have to be examined.
On average Boundary Value Analysis yields 5 times as many test cases as Decision Table-Based Testing, and Equivalence Class Testing 1½ times as many. On this basis we can say that there exists either test case redundancy or impossible test cases; either way this reduces the efficiency of these testing strategies and shows how efficient Decision Table-Based Testing is.
But as stated above, we cannot totally disregard the other functional testing methods
as they complement each other and are not totally redundant in all testing cases.
3 Applications
In order to demonstrate and aid the understanding of Decision Tables I will show some of the many applications it has, supported by examples. I am going to use the Triangle Problem to explore decision tables in more depth.
3.1 The Basics
As explained above, there are two types of decision table, limited and extended entry
tables. Below, in fig 3.1 is an example of a limited entry decision table where the
inputs are depicted using binary values.
Fig 3.1 Decision Table for the Triangle Problem
When creating a decision table there are many techniques people adopt to improve the construction. Most testers add two main techniques: the use of the “impossible” action stub and don’t care entries. The impossible action stub entry is used as a form of error catching; if out-of-range values are inputted then the impossible action entry is checked. Don’t care entries are another useful device: they are used when no other checks are required in the table, and therefore we don’t care what the rest of the values are. Often, these don’t care entries are treated as false values.
3.2 Rule Counts
Rule counts are used along with don’t care entries as a method to test for decision table completeness; we can count the number of test cases in a decision table using rule counts and compare it with a calculated value. Below is a table which illustrates rule counts in a decision table.
Fig 3.2 an example of Rule Counts in a Decision Table
The table above has a total rule count of 64; this can be calculated using the limited entry formula, as it’s a limited entry table:
Number of Rules = 2^(Number of Conditions)
So therefore, Number of Rules = 2^6 = 64
When calculating rule counts the don’t care values play a big part in the rule count of each rule. Every rule has a rule count of 1 initially, and each “don’t care” entry in the rule doubles its rule count. If both ways of computing the rule count bring us to the same value, we have a complete decision table.
Where the Rule Count value of the decision table does not equal the number of rules
computed by the equation we know the decision table is not complete, and therefore
needs revision.
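For example (an illustrative count, not one of the paper’s figures): in a limited entry table with 4 conditions the target is 2^4 = 16; a rule containing two don’t care entries has a rule count of 1 × 2 × 2 = 4, so that single column stands for 4 of the 16 combinations, and the rule counts of all columns must sum to exactly 16 for the table to be complete.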
3.3 Redundancy & Inconsistency
When using “don’t care” entries a level of care must be taken, as using these entries can cause redundancy and inconsistency within a decision table.
Using rule counts to check the completeness of the decision table can help to eliminate redundant rules within the table. An example of a decision table with a redundant rule can be seen in figure 3.3.
From the table you can see that there is some conflict between rules 1–4 and rule 9: rules 1–4 use “don’t care” entries as an alternative to false, but rule 9 replaces those “don’t care” entries with “false” entries. So when condition 1 is met, rules 1–4 or 9 may be applied; luckily, in this particular instance these rules have identical actions, so there is only a simple correction to be made to complete the table.
Figure 3.3 an example of a Redundant Rule
If on the other hand the actions of the redundant rule differ from that of rules 1-4 then
we have a problem. A table showing this can be seen in figure 3.4.
Figure 3.4 an example of Inconsistent Rules
From the above decision table, if condition 1 was true and conditions 2 and 3 were false, then rules 1–4 and 9 could be applied. This would be a problem because the actions of these rules are inconsistent, so the result is non-deterministic and would cause the decision table to fail.
3.4 Creating a Decision Table
When creating a decision table care must be taken when choosing your stub
conditions, and also the type of decision table you are creating. Limited Entry
decision tables are easier to create than extended entry tables. Here are some steps on
how to create a simple decision table using the Triangle Problem.
Step One – List All Stub Conditions
In this example we take three inputs, and from those inputs we perform conditional
checks to calculate if it’s a triangle, if so then what type of triangle it is. The first
condition we add must check whether all 3 sides constitute a triangle, as we don’t
want to perform other checks if the answer is false.
Then the remainder of the conditions will check whether the sides of the triangle are equal or not. As there are only three sides to a triangle, we have three pairwise conditions when checking all of the sides.
So the condition stubs for the table would be:
a, b, c form a triangle?
a = b?
a = c?
b = c?
Step Two – Calculate the Number of Possible Combinations (Rules)
So in our table we have 4 condition stubs and we are developing a limited entry decision table so we use the following formula:
Number of Rules = 2^(Number of Condition Stubs)
So therefore, Number of Rules = 2^4 = 16
So we have 16 possible combinations in our decision table.
Step Three – Place all of the Combinations into the Table
Figure 3.5 a Complete Decision Table
Here we have a complete decision table: the first rule has three don’t care entries, which gives it a rule count of 8, and the last 8 rules have a rule count of 1 each, so the total rule count for the table is 16. Therefore we know that this table is complete.
Step Four – Check Covered Combinations
This step is a precautionary step to check for errors and redundant and inconsistent
rules. We don’t want to go any further with the development of the decision table if
we have errors because this will complicate matters in the next step.
Step Five – Fill the Table with the Actions
For the final step of creating a decision table we must fill the Action Stub and Entry
sections of the table. The final decision table is shown in fig 3.6.
After completing the decision table and adding the actions we notice that each action stub is exercised once, and we have also added the “impossible” action into the table for catching rogue values.
Figure 3.6 the Final Decision Table
The above table can be explored and expanded by refining the first condition stub.
Instead of having “a, b, c form a triangle” we can expand this by using 3 conditions
rather than one, which will increase accuracy. This would also bring in a logical
dependency, because the actions of the first condition stub would affect the remaining
condition stubs. This is shown in figure 3.1.
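To show how such a table turns into executable tests, here is a minimal sketch (my own illustration, not from the paper); the classify() function and its result labels are assumptions standing in for the program under test, and each row of the test table corresponds to one rule.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical program under test: classifies three side lengths.
std::string classify(int a, int b, int c) {
    if (a + b <= c || a + c <= b || b + c <= a) return "Not a triangle";
    if (a == b && b == c)                       return "Equilateral";
    if (a == b || a == c || b == c)             return "Isosceles";
    return "Scalene";
}

struct Rule { int a, b, c; std::string expectedAction; };

int main() {
    // One test case per (feasible) rule of the decision table.
    std::vector<Rule> rules = {
        {1, 2, 5, "Not a triangle"},   // a, b, c do not form a triangle
        {3, 3, 3, "Equilateral"},      // a = b, a = c, b = c
        {3, 3, 2, "Isosceles"},        // a = b only
        {3, 2, 3, "Isosceles"},        // a = c only
        {2, 3, 3, "Isosceles"},        // b = c only
        {3, 4, 5, "Scalene"},          // no sides equal
    };
    int failures = 0;
    for (const auto& r : rules)
        if (classify(r.a, r.b, r.c) != r.expectedAction) {
            std::cout << "FAIL for (" << r.a << "," << r.b << "," << r.c << ")\n";
            ++failures;
        }
    std::cout << failures << " failure(s)\n";
    return failures;
}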
4 Summary & Future Work
Decision Table-Based Testing is an important part of Functional Testing; it explores testing routes that other functional strategies avoid. One key aspect of decision table-based testing is the use of logical dependencies; this enhances the tester’s ability to handle inputs in a program which rely upon other inputs to perform their operation, which is a strong characteristic in testing nowadays.
DT-BT is the most complete method of all of the functional testing strategies as it encourages strict logical relationships between conditions. Creating these logical dependencies can be tricky, especially for difficult and extensive programs. It works well with the Triangle problem as there are lots of decisions within the problem.
The differences between the functional testing strategies were outlined and shown in this report; we saw the difference in effort, sophistication and number of test cases these functional methods create. This illustrates that decision table-based testing is the final step of the functional testing process. There are many testing tools available for creating decision tables, which are excellent for new users wanting to become accustomed to this functional technique.
This report has outlined the importance of testing and the time required when creating software. I aim to make use of the knowledge I have gained while writing this report and apply it to my final year dissertation. For my dissertation I am developing a system which navigates users to rendezvous with each other, using music as their cue. This insight will aid me in creating a reliable software package, as I can use the input values from the devices which my software uses.
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 3
Static testing
! Involves analyses of source text by humans
! Can be carried out on ANY documents
produced as part of the software process
! Discovers errors early in the software process
! Usually more cost-effective than testing for
defect detection at the unit and module level
! Allows defect detection to be combined with
other quality checks
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 4
Static testing effectiveness
! More than 60% of program errors can be detected
by informal program inspections
(Meyers: 30 - 70 %)
! More than 90% of program errors may be
detectable using more rigorous mathematical
program verification
! The error detection process is not confused by the
existence of previous errors
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 5
Program inspections
! Formalised approach to document reviews
! Intended explicitly for defect DETECTION
(not correction)
! Defects may be logical errors, anomalies in the
code that might indicate an erroneous condition
(e.g. an uninitialised variable) or non-compliance
with standards
! Group code reading → team-based quality
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 6
Inspection pre-conditions
! A precise specification must be available
! Team members must be familiar with the
organisation standards
! Syntactically correct code must be available
! An error checklist should be prepared
! Management must accept that inspection will
increase costs early in the software process
! Management must not use inspections for staff
appraisal
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 7
The inspection process
Planning → Overview → Individual preparation → Inspection meeting → Rework → Follow-up
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 8
Inspection procedure
! System overview presented to inspection team
! Code and associated documents are
distributed to inspection team in advance
! Inspection takes place and discovered errors
are noted, no repair
! Modifications are made to repair discovered
errors
! extension of checklists
! Re-inspection may or may not be required
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 9
Inspection teams
! Made up of at least 4 members
! Author of the code being inspected
! Reader who reads the code to the team
! Inspectors who find errors, omissions and inconsistencies
! Moderator who chairs the meeting and notes
discovered errors
! Other roles are Scribe and Chief moderator
! no superior
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 10
Inspection rate
! 500 statements/hour during overview
! 125 source statements/hour during individual preparation
! 90-125 statements/hour can be inspected
(Meyers: 150 statements/hour; Balzert: 1 page/hour)
! Inspection is therefore an expensive process
! Inspecting 500 lines costs about 40 man-hours of effort = £2800
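The cost figures on this slide can be reconciled with a rough back-of-the-envelope calculation. The Python sketch below is only an illustration: the team size of four, the meeting rate of about 100 statements/hour and the hourly cost of £70 are assumptions chosen so that the 40 man-hours and £2800 come out; they are not stated on the slide.

# Back-of-the-envelope reconstruction of the slide's cost figures.
# Team size, meeting rate and hourly cost are assumed values.
statements = 500
team_size = 4

overview_hours = statements / 500      # 1 hour, whole team attends
preparation_hours = statements / 125   # 4 hours per person
meeting_hours = statements / 100       # about 5 hours, whole team attends

person_hours = team_size * (overview_hours + preparation_hours + meeting_hours)
print(person_hours)        # 40.0 person-hours
print(person_hours * 70)   # 2800.0 (GBP), at an assumed rate of 70 GBP/hour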
©Ian Sommerville 1995 Software Engineering, 5th edition. Chapter 22 Slide 11
Inspection checklists
! Checklist of common errors should be used to
drive the inspection
! Error checklist may be programming language
dependent
! Complement static semantics checks
by compiler + static analyser
! The 'weaker' the type checking, the larger the
checklist
! Coding standard / programming guidelines
The Inspection Process
AQMD Inspectors use their observations of industrial and commercial processes and equipment to determine compliance with air quality rules and regulations, policies and state law (California Health and Safety Code). Although each inspection is unique, a series of general guidelines govern inspection procedures in the field. The typical inspection can be broken down into the following components:
Pre-Inspection Activities
These are the activities conducted by the inspector in preparation for the inspection which include the review of: the facility's permits to operate, the facility's compliance history, and other applicable requirements.
The Inspection
While in the company of a facility representative, the Inspector will tour the facility and make observations of equipment, processes and employee practices to determine if the facility is operating in compliance with applicable permit and clean air requirements.
Closing Conference
Before leaving the facility, the Inspectors usually discuss their findings with facility representatives during a closing conference, and later document these findings in written inspection reports.
Typically, AQMD earmarks facilities for inspection well ahead of time; however, an air quality complaint received from the public may prompt an unannounced inspection of a facility.
Software inspection
From Wikipedia, the free encyclopedia
Inspection in software engineering, refers to peer review of any work product by trained individuals who look for defects using a well defined process. An inspection might also be referred to as a Fagan inspection after Michael Fagan, the inventor of the process.
Introduction
An inspection is one of the most common sorts of review practices found in software projects. The goal of the inspection is for all of the inspectors to reach consensus on a work product and approve it for use in the project. Commonly inspected work products include software requirements specifications and test plans. In an inspection, a work product is selected for review and a team is gathered for an inspection meeting to review the work product. A moderator is chosen to moderate the meeting. Each inspector prepares for the meeting by reading the work product and noting each defect. The goal of the inspection is to identify defects. In an inspection, a defect is any part of the work product that will keep an inspector from approving it. For example, if the team is inspecting a software requirements specification, each defect will be text in the document which an inspector disagrees with.
The process
The inspection process was developed by Michael Fagan in the mid-1970s and has since been extended and modified.
The process should have entry criteria that determine whether the inspection process is ready to begin. This prevents unfinished work products from entering the inspection process. The entry criteria might be a checklist including items such as "The document has been spell-checked".
The stages in the inspections process are: Planning, Overview meeting, Preparation, Inspection meeting, Rework and Follow-up. The Preparation, Inspection meeting and Rework stages might be iterated.
Planning: The inspection is planned by the moderator.
Overview meeting: The author describes the background of the work product.
Preparation: Each inspector examines the work product to identify possible defects.
Inspection meeting: During this meeting the reader reads through the work product, part by part and the inspectors point out the defects for every part.
Rework: The author makes changes to the work product according to the action plans from the inspection meeting.
Follow-up: The changes by the author are checked to make sure everything is correct.
The process is ended by the moderator when it satisfies some predefined exit criteria.
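As a rough illustration of how entry criteria, the staged process and exit criteria fit together, the following Python sketch models the workflow described above; the checklist items and the exit rule are invented examples, not part of Fagan's definition.

# Minimal sketch of the inspection workflow described above.
# The checklist items and exit criterion are illustrative assumptions only.
STAGES = ["Planning", "Overview meeting", "Preparation",
          "Inspection meeting", "Rework", "Follow-up"]

entry_criteria = {
    "The document has been spell-checked": True,   # example item from the text
    "A precise specification is available": True,  # assumed additional item
}

def ready_for_inspection(criteria):
    # Unfinished work products must not enter the process.
    return all(criteria.values())

def run_inspection(defects_found, defects_fixed):
    if not ready_for_inspection(entry_criteria):
        return "rejected at entry"
    for stage in STAGES:
        print("stage:", stage)
    # Illustrative exit criterion: every logged defect has been reworked.
    return "passed exit" if defects_fixed >= defects_found else "re-inspect"

print(run_inspection(defects_found=5, defects_fixed=5))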
Inspection roles
During an inspection the following roles are used.
Author: The person who created the work product being inspected.
Moderator: This is the leader of the inspection. The moderator plans the inspection and coordinates it.
Reader: The person reading through the documents, one item at a time. The other inspectors then point out defects.
Recorder: The person that documents the defects that are found during the inspection.
Related inspection types
Code review
A code review can be done as a special kind of inspection in which the team examines a sample of code and fixes any defects in it. In a code review, a defect is a block of code which does not properly implement its requirements, which does not function as the programmer intended, or which is not incorrect but could be improved (for example, it could be made more readable or its performance could be improved). In addition to helping teams find and fix bugs, code reviews are useful for both cross-training programmers on the code being reviewed and for helping junior developers learn new programming techniques.
Peer Reviews
Peer Reviews are considered an industry best-practice for detecting software defects early and learning about software artifacts. Peer Reviews are composed of software walkthroughs and software inspections and are integral to software product engineering activities. A collection of coordinated knowledge, skills, and behaviors facilitates the best possible practice of Peer Reviews. The elements of Peer Reviews include the structured review process, standard of excellence product checklists, defined roles of participants, and the forms and reports.
Software inspections are the most rigorous form of Peer Reviews and fully utilize these elements in detecting defects. Software walkthroughs draw selectively upon the elements in assisting the producer to obtain the deepest understanding of an artifact and reaching a consensus among participants. Measured results reveal that Peer Reviews produce an attractive return on investment obtained through accelerated learning and early defect detection. For best results, Peer Reviews are rolled out within an organization through a defined program of preparing a policy and procedure, training practitioners and managers, defining measurements and populating a database structure, and sustaining the roll out infrastructure.
Glass box testing
Glass box testing has traditionally been divided up into static and dynamic analysis (Hausen82, 119,122).
Static analysis techniques
The only generally acknowledged and therefore most important characteristic of static analysis techniques is that the testing as such does not necessitate the execution of the program (Hausen84, 325). "Essential functions of static analysis are checking whether representations and descriptions of software are consistent, noncontradictory or unambiguous" (Hausen84, 325). It aims at correct descriptions, specifications and representations of software systems and is therefore a precondition to any further testing exercise. Static analysis covers the lexical analysis of the program syntax and investigates and checks the structure and usage of the individual statements (Sneed87, 10.3-3). There are principally three different possibilities of program testing (Sneed87, 10.3-3), i.e.
checking the program internally for completeness and consistency
checking it against pre-defined rules
comparing the program with its specification or documentation
While some software engineers consider it characteristic of static analysis techniques that they can be performed automatically, i.e. with the aid of specific tools such as parsers, data flow analysers, syntax analysers and the like (Hausen82, 126), (Miller84, 260) and (Osterweil84, 77), others also include manual techniques for testing that do not ask for an execution of the program (Sneed87, 10.3-3). Figure B.1 is an attempt to structure the most important static testing techniques as they are presented in SE literature between 1975 and 1994.
Figure B.1: Static Analysis Techniques
Syntax parsers, which split the program/document text into individual statements, are the elementary automatic static analysis tools. When checking the program/document internally, the consistency of statements can be evaluated.
When performed with two texts on different semantic levels, i.e. a program against its specification, the completeness and correctness of the program can be evaluated (Sneed87, 10.3-6) and (Hausen84, 328). This technique, which aims at detecting problems in the translation between specification and program realisation, is called static verification (Sneed87, 10.3-3) and (Hausen87, 126). Verification requires formal specifications and formal definitions of the specification and programming languages used, as well as a method of algorithmic proving that is adapted to these description means (Miller84, 263) and (Hausen87, 126). Static verification compares the actual values provided by the program with the target values as pre-defined in the specification document. It does not, however, provide any means to check whether the program actually solves the given problems, i.e. whether the specification as such is correct (Hausen87, 126). The result of automatic static verification procedures is described in boolean terms, i.e. a statement is either true or false (Hausen87, 127). The obvious advantage of static verification is that, being based on formal methods, it leads to objective and correct results. However, since it is both very difficult and time-consuming to elaborate the formal specifications which are needed for static verification, it is mostly only performed for software that needs to be highly reliable. Another technique which is normally subsumed under static analysis is called symbolic execution (Hausen84, 327), (Miller84, 263), (Hausen82, 117) and (Hausen87, 127). It analyses, in symbolic terms, what a program does along a given path (Miller84, 263). "By symbolic execution, we mean the process of computing the values of a program's variables as functions which represent the sequence of operations carried out as execution is traced along a specific path through the program." (Osterweil84, 79). Symbolic execution is most appropriate for the analysis of mathematical algorithms. Making use of symbolic values only, whole classes of values can be represented by a single interpretation, which leads to a very high coverage of test cases (Hausen82, 117). The development of programs for symbolic execution is very expensive, and symbolic execution is therefore mainly used for testing numerical programs, where the cost/benefit relation is acceptable.
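The idea of symbolic execution can be illustrated with a very small Python sketch. The traced program and the string representation of symbolic values are invented for illustration; a real symbolic executor would use expression trees and a constraint solver rather than plain strings.

# Minimal sketch of symbolic execution along one path.
# We trace the program  z = x + y; if z > 10: return 2 * z  else: return z - 1
# along the "true" branch, keeping variable values as symbolic expressions
# (here simple strings) and collecting the path condition instead of running
# the code with concrete numbers.

def symbolic_trace_true_branch():
    env = {"x": "x", "y": "y"}                 # inputs stay symbolic
    env["z"] = f"({env['x']} + {env['y']})"    # z = x + y
    path_condition = f"{env['z']} > 10"        # branch taken: z > 10
    result = f"2 * {env['z']}"                 # return 2 * z
    return path_condition, result

cond, res = symbolic_trace_true_branch()
print("path condition:", cond)   # (x + y) > 10
print("return value  :", res)    # 2 * (x + y)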
The most important manual technique which allows testing the program without running it is software inspection (Thaller94, 36), (Ackerman84, 14) and (Hausen87, 126). The method of inspection originally goes back to Fagan (Fagan76), who saw the practical necessity to implement procedures to improve software quality at several stages during the software life-cycle. In short, a software inspection can be described as follows: "A software inspection is a group review process that is used to detect and correct defects in a software workproduct. It is a formal, technical activity that is performed by the workproduct author and a small peer group on a limited amount of material. It produces a formal, quantified report on the resources expended and the results achieved" (Ackerman84, 14), (Thaller94, 36), (Hausen87, 126) and (Hausen84, 324).
During inspection either the code or the design of a workproduct is compared to a set of pre-established inspection rules (Miller84, 260) and (Thaller94, 37). Inspection processes are mostly performed along checklists which cover typical aspects of software behaviour (Thaller94, 37), (Hausen87, 126). "Inspection of software means examining by reading, explaining, getting explanations and understanding of system descriptions, software specifications and programs" (Hausen84, 324). Some software engineers report inspection as adequate for any kind of document, e.g. specifications, test plans etc. (Thaller94, 37). While most testing techniques are intimately related to the system attribute whose value they are designed to measure, and thus offer no information about other attributes, a major advantage of inspection processes is that any kind of problem can be detected and thus results can be delivered with respect to every software quality factor (Thaller94, 37) and (Hausen87, 126).
Walkthroughs are similar peer review processes that involve the author of the program, the tester, a secretary and a moderator (Thaller94, 43). The participants of a walkthrough create a small number of test cases by ``simulating'' the computer. Its objective is to question the logic and basic assumptions behind the source code, particularly of program interfaces in embedded systems (Thaller94, 44).
Dynamic analysis techniques
While static analysis techniques do not necessitate the execution of the software, dynamic analysis is what is generally considered as "testing", i.e. it involves running the system. "The analysis of the behaviour of a software system before, during and after its execution in an artificial or real applicational environment characterises dynamic analysis" (Hausen84, 326). Dynamic analysis techniques involve the running of the program formally under controlled circumstances and with specific results expected (Miller84, 260). It shows whether a system is correct in the system states under examination or not (Hausen84, 327).
Among the most important dynamic analysis techniques are path and branch testing. Path testing involves executing the program so that as many logical paths of the program as possible are exercised (Miller84, 260) and (Howden80, 163). The major quality attribute measured by path testing is program complexity (Howden80, 163) and (Sneed87, 10.3-4). Branch testing requires that tests be constructed in such a way that every branch in a program is traversed at least once (Howden80, 163). Problems encountered when running the branches point to the probability of later program defects.
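A minimal Python sketch of branch testing is given below: a small invented function is instrumented to record which branch outcomes each test input exercises, and the test data are chosen so that every branch is traversed in both directions.

# Minimal sketch of branch testing: record which branch outcomes a test
# suite exercises and check that every branch is taken both ways.
branches_hit = set()

def classify(x):
    if x > 10:                       # branch B1
        branches_hit.add(("B1", True))
        y = x * 2
    else:
        branches_hit.add(("B1", False))
        y = x - 1
    if y % 2 == 0:                   # branch B2
        branches_hit.add(("B2", True))
        return "even"
    branches_hit.add(("B2", False))
    return "odd"

# Test data chosen so that both branches are traversed in both directions.
for value in (20, 21, 3, 4):
    classify(value)

all_branches = {("B1", True), ("B1", False), ("B2", True), ("B2", False)}
print("branch coverage:", len(branches_hit) / len(all_branches))  # 1.0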
Today there are a number of dynamic analysers that are used during the software development process. The most important tools are presented in Table B.1 (Thaller94, 177):
Type of Dynamic Analyser: Functionality of Tool
Test coverage analysis: tests to which extent the code can be checked by glass box techniques
Tracing: follows all paths used during program execution and provides e.g. values for all variables etc.
Tuning: measures resources used during program execution
Simulator: simulates parts of systems, if e.g. the actual code or hardware are not available
Assertion checking: tests whether certain conditions are given in complex logical constructs
Table B.1: Dynamic Analysis Tools
Generation of test data in glass box tests
The selection and generation of test data in glass box tests is an important discipline. The most basic approach to test data generation is random testing. For random testing a number of input values are generated automatically without being based on any structural or functional assumption (Sneed87, 10.3-4) and (Bukowski87, 370). There are also two more sophisticated approaches to test data generation, i.e. structural testing and functional testing. "Structural testing is an approach to testing in which the internal control structure of a program is used to guide the selection of test data. It is an attempt to take the internal functional properties of a program into account during test data generation and to avoid the limitations of black box functional testing" (Howden80, 162). Functional testing as described by (Howden80) takes into account both functional requirements of a system and important functional properties that are part of its design or implementation and which are not described in the requirements (Howden80, 162). "In functional testing, a program is considered to be a function and is thought of in terms of input values and corresponding output values." (Howden80, 162). There are tools for test data generation on the market that can be used in combination with specific programming languages. Particularly for embedded systems, tools for test data generation are useful, since they can be used to simulate a larger system environment providing input data for every possible system interface (Thaller94, 178). In other words, if a system is not fully implemented or not linked to all relevant data sources, not all system interfaces can be tested, because no input values are given for non-implemented functions. Data generation tools provide input values for all available system interfaces as if a real module was linked to it.
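The following Python sketch illustrates random test data generation combined with a functional (input/output) check: inputs are drawn without any structural assumption and each output of Python's built-in sorted function is checked against two simple properties. The choice of program under test and of the properties is purely illustrative.

import random
from collections import Counter

# Minimal sketch of random test data generation: input values are generated
# without any structural or functional assumption, and each run is checked
# against simple functional properties (an illustrative oracle).
def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

random.seed(0)
for _ in range(1000):
    data = [random.randint(-100, 100) for _ in range(random.randint(0, 10))]
    result = sorted(data)
    # Functional view: the output is ordered and contains the same values.
    assert is_sorted(result)
    assert Counter(result) == Counter(data)
print("1000 randomly generated test cases passed")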
Testing Measurement
Someone has rightly said that if something cannot be measured, it cannot be managed or improved. There is immense value in measurement, but you should always make sure that you get some value out of any measurement that you are doing. You should be able to answer the following questions:
What is the purpose of this measurement program?
What data items are you collecting, and how are you reporting them?
What is the correlation between the data and the conclusions?
Value addition
Any measurement program can be divided into two parts. The first part is to collect data, and the second is to prepare metrics/charts and analyse them to get valuable insight that might help in decision making. Information collected during any measurement program can help in:
Finding the relation between data points,
Correlating cause and effect,
Providing input for future planning.
Normally, any metric program involves certain steps which are repeated over a period of time. It starts with identifying what to measure. After the purpose is known, data can be collected and converted into metrics. Based on the analysis of these metrics appropriate action can be taken, and if necessary metrics can be refined and measurement goals can be adjusted for the better.
Data presented by the testing team, together with their opinion, normally decides whether a product will go to market or not. So it becomes very important for test teams to present data and opinions in such a way that the data looks meaningful to everyone and decisions can be taken based on it.
Every testing project should be measured against its schedule and the quality requirements for its release. There are lots of charts and metrics that we can use to track progress and measure the quality requirements of the release. We will discuss here some of the charts and the value addition that they bring to our product.
Defect Finding Rate
This chart gives information on how many defects are found across a given period. This can be tracked on a daily or weekly basis.
Defect Fixing Rate
This chart gives information on how many defects are being fixed on a daily/weekly basis.
Defect distribution across components
This chart gives information on how defects are distributed across various components of the system.
Defect cause distribution chart
This chart gives information on the cause of defects.
Closed defect distribution
This chart gives information on how defects with closed status are distributed.
Test case execution
Traceability Metrics
Functional Coverage
Platform Metrics
A small sketch of how a few of these metrics can be computed from a raw defect log follows below.
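As a sketch of how some of these charts can be derived from raw data, the following Python fragment computes a defect finding rate, a fixing rate and two distributions from a small invented defect log; the field names and sample records are assumptions for illustration only.

from collections import Counter
from datetime import date

# Invented defect log: each record holds when the defect was found and fixed,
# the component it belongs to, and its cause.
defects = [
    {"found": date(2024, 1, 3),  "fixed": date(2024, 1, 5),  "component": "UI", "cause": "requirements"},
    {"found": date(2024, 1, 4),  "fixed": None,              "component": "DB", "cause": "coding"},
    {"found": date(2024, 1, 10), "fixed": date(2024, 1, 12), "component": "UI", "cause": "coding"},
]

# Defect finding rate per ISO week.
finding_rate = Counter(d["found"].isocalendar()[1] for d in defects)
# Defect fixing rate per ISO week (only defects that were actually fixed).
fixing_rate = Counter(d["fixed"].isocalendar()[1] for d in defects if d["fixed"])
# Defect distribution across components and causes.
by_component = Counter(d["component"] for d in defects)
by_cause = Counter(d["cause"] for d in defects)

print("found per week:", dict(finding_rate))
print("fixed per week:", dict(fixing_rate))
print("by component:  ", dict(by_component))
print("by cause:      ", dict(by_cause))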
Reducing Test Cases Created by Path-Oriented Test Case Generation
Nicha Kosindrdecha and Siripong Roongruangsuwan
Faculty of Science and Technology, Assumption University, Bangkok, Thailand
p4919741@au.edu, p4919742@au.edu
Jirapun Daengdej
Faculty of Science and Technology, Assumption University, Bangkok, Thailand
jirapun@scitech.au.edu
Abstract – "Path-Oriented" is one of the most widely used techniques for finding a set of test cases for testing software. Given a set of test cases generated by the Path-Oriented technique, this paper discusses the use of a number of case maintenance techniques, which have been investigated by Case-Based Reasoning (CBR) researchers to ensure that only a small number of cases are stored in the case base, thereby reducing the number of test cases that should be used in software testing. Similar to what happens in software testing, a number of CBR researchers have focused on finding approaches for reducing cases in the CBR systems' storage. We propose a number of techniques which adapt some of this research. Our preliminary experiments show that the proposed technique can be effectively used in reducing the number of test cases required for software testing, while maintaining the accuracy of the system's output.
I. Introduction
In the software development life cycle, software testing has proven to be one of the most crucial and expensive phases. The major goal of software testing is to ensure that as many errors as possible are identified, especially before releasing the software to the end-users. However, to ensure that the software is of high quality while minimizing the errors before delivery, software development providers have to expend a great deal of time and effort.
Test case generation has proven to be one of the most critical steps in software testing. The main objective of generating test cases is to ensure that the generated cases can be used to reveal as many faults as possible. Our research reveals that there are many test case generation techniques; according to [12], path-oriented test case generation is the most effective. Moreover, our research reveals that to lower the cost of software testing we have to use a small but efficient group of test cases. In some situations the growth of the number of test cases cannot be controlled. In order to solve this problem, we apply the combined concepts of software testing and CBR.
We assume that test cases are treated as cases in CBR. Given a set of test cases generated by the Path-Oriented technique, this paper discusses how to maintain the number of test cases in software testing by using the Case-Based Maintenance (CBM) concept. CBM has been investigated by CBR researchers in order to ensure that only a small number of efficient cases are stored in the case base. In the light of software testing, however, the proposed techniques focus on how to maintain the test cases while preserving their ability to reveal faults.
The next section presents the problem statement and motivation of this paper. Section III reviews the investigated papers on software testing and CBR. Section IV presents our proposed test case maintenance techniques. The evaluation is addressed in Section V. Finally, the conclusion, including our future work, is given in Section VI.
II. Terminology
Test Case or Test Data is a collection of nodes,
which can be traversed in the control flow graph.
Tn = {N1, N2, N3, …, Nn}
where T = test case and N = node in the control flow graph.
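To illustrate this representation, the Python sketch below stores test cases as the sets of control-flow-graph nodes they traverse and applies a naive reduction that discards any case whose nodes are covered by another case. This is only an illustration of the idea of keeping fewer cases; it is not the maintenance technique proposed in the paper, and the node names are invented.

# Test cases represented as the sets of control-flow-graph nodes they traverse,
# following the definition above. The reduction rule (drop a case whose nodes
# are a subset of another case's) is a naive illustration only.
test_cases = {
    "T1": {"N1", "N2", "N4"},
    "T2": {"N1", "N2"},            # covered entirely by T1
    "T3": {"N1", "N3", "N5"},
}

def reduce_test_cases(cases):
    kept = dict(cases)
    for name, nodes in cases.items():
        others = [n for other, n in kept.items() if other != name]
        if any(nodes <= other_nodes for other_nodes in others):
            del kept[name]
    return kept

print(sorted(reduce_test_cases(test_cases)))  # ['T1', 'T3']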
Probability is the defined frequency of a test case's usage: the higher the value of the test data, the higher the probability.
Impact is a measure of the criticality of the faults, exceptions and errors detected in the system. An example of high impact is functionality not working with no workaround; medium impact is functionality not working but with work
Test Case Management
From OpenOffice.org Wiki
The Test Case Management Portal (or in short TCM) is a web based portal for test case management (as the name states). This includes definition and translation of testcases as well as assigning test cases to specific testers and collecting the results.
General Information about TCM
TCM is developed and hosted by the Sun localization testing team. So, the initial focus of TCM is localization testing - not only for OOo. We are trying to extend the capabilities of TCM, so that it can be used for general testing tasks within the OOo project.
At the moment the OOo project is using the same TCM installation as the OpenSolaris project. This means, we use the same "program" but the data are distinct.
For a short introduction to TCM see about TCM. The introduction is written for a successor of the current TCM. So the screens may differ from what you see in current TCM.
Roles in TCM
There are three (or four) roles in TCM:
SQA (Software Quality Assurance)
Can view test cases, update results (pass/fail) per test case
SQE (Software QA Engineer)
can modify (add, remove, translate) test cases
MGR (Manager)
Has SQE access + can grant user access, assign tests to testers, enter new test cases / scenarios and create new test reports and report templates
All these roles are based on localizations. That means every SQA / SQE / MGR belongs to one (or more) localizations. SQAs will see test cases translated to their language, and SQEs will be able to translate the cases to a given language. MGRs can only assign tests to testers of the same localization.
The Manager of the en localization has some more rights. He is some kind of "Super Manager" (the fourth role).
Bugs in TCM
If you find bugs in TCM (the tool itself), you can use the OOo Issue Tracker to report them. Use category qa, subcategory tcm.
If you find bugs that apply to a test case (bug in OpenOffice.org) you need to file it for the appropriate Issue Tracker component. Don't use qa / tcm in this case. See Report Issues for failed tests.
You may ask at the dev@qa.openoffice.org mailing list if something is unclear with TCM.
Doing your daily work
SQA tasks
Doing your tests
Before you can do any tests, you need to have some tests assigned to you. You should ask your manager to assign tests to you. (Ask at dev@qa.openoffice.org if you do not know who "your manager" is).
Login to TCM
Login to TCM with your username and password
Go to "Test Result Update"
after login, you will see some menu items (depending on your role). You need to go to "Test Result Update" (Item #2, if you have SQA role only)
Select the Build number
in the next screen you will see a list of build numbers you have tests assigned for. In most cases you should only see one or two builds. Follow the link in column "Build Number" for the build you are going to test.
Select the assigned test scenario
you will now see all of your assigned tests for the selected build. Do not click on the assignment id (although this seems obvious, it is wrong). Follow the link in column "Assign by" instead.
If you have already done some of your tests, you may follow the link in column U (click on the number). You will see only untested cases in the next screen.
Enter your results
you will now see all test cases of the scenario. Each test case has a short description (what to do) and information about what results are expected.
At the top of each test case you will see option buttons. Select "pass" if the test meets the expected result. Select "fail" if not. You may also "skip" a test if you are not sure if you understand the description (or if it is not important for the current test). If you do not have the time to complete a test, leave it "untested".
In case, the test fails, you should file an issue and put the issue id to the input box "bug".
You may also leave a comment about the test. Your manager will be able to read the comments and may have better information about the quality of the build.
Update your results
navigate to the bottom of the page and "update" the results. You should do this from time to time, even if you entered only some of your results.
Hint: There is an option to download the test case descriptions in step 4. You will see a "download" link in the rightmost column. You may download a plain text file here. (In case your browser is going to save the file as .cgi, simply rename it to .txt). You may open the file with any text editor. The file header has some information about the file format. So you should be able to enter your test results offline. Once your test has been completed and all results have been entered into the file, you can upload it again. You can do this again in step 4. Enter the file name (full file path and name) in the input box on top of the table. Then press "upload".
Report Issues for failed tests
If a test fails, you should file an issue at the OpenOffice.org Issue Tracker. For such an issue, the issue writing guidelines apply.
As not every developer or member of the qa project has access to TCM, repeat the steps that lead to the problem in the issue.
It is planned to be able to see test case descriptions in TCM without being logged in. Once this has been implemented, you may enter a link to the test case description in Issue Tracker.
SQE tasks
translating test cases
Login to TCM
Login to TCM with your username and password
Go to "Test Case Maintenance"
After logging in, you will see the menu items appropriate to your rôle. Go to "Test Case Maintenance" (Item #1, if you have the SQE rôle).
Select the product "OpenOffice.org - Office Suites(2.0)"
"OpenOffice.org" is the only accessible product - click on the product-name link.
Choose a Category
Follow the link to the appropriate Category for the Test Case to be translated. You will now see all the Test Case Descriptions in this Category. The translation, if any, will also be displayed for each description.
Open a single Test Case Description to translate
Click on the Test Case ID. You'll see the English (original) text and input fields for translation.
Hint: you can also use HTML tags to format your translation text.
Update the description
Enter your translation, and press the "Update" button at the bottom of the page.
Hint: You will see "download" links shown in several places. You can use these links to download all the test cases in one text file, translate them offline and upload the resulting file when you're finished. The file includes a description of the file format.
MGR tasks
grant access to new testers
Login to TCM
Login to TCM with your username and password
Go to "Property Maintenance"
after login, you will see some menu items (depending on your role). You need to go to "Property Maintenance" (Item #7, if you have MGR role)
Go To "People"
The only property you are able to edit as normal MGR is "People" - follow the link
Add new People
you can add new people by following the link on the upper right
Enter user details
You need to enter
Login
that's the login name of the user (it's a good idea to use the OOo-account name as login name for TCM)
Name
the full name of the tester
Language
you may choose one or more languages you are responsible for
Location
free text, just a notice of where the tester is based
E-Mail
an e-mail address, in case you need to contact the tester (the openoffice.org mail address would work here)
Role
choose the role of the tester. Make sure you include the SQA role, even if you grant the SQE or MGR role.
Add the tester
press the "Add"-button at the bottom of the screen
Hint: you can change user details or reset the password for existing testers at this screen. You just need to click on the login name.
assign tests
Login to TCM
Login to TCM with your username and password
Go to "Test Assignment Maintenance"
after login, you will see some menu items (depending on your role). You need to go to "Test Assignment Maintenance" (Item #5, if you have MGR role)
Choose the Project "OpenOffice.org"
you will see a list of Projects that are managed in this TCM instance. Follow the link to OpenOffice.org
Select build number
the next screen will show a list of builds that are ready for testing (e.g. localisationXX for localisation tests or 2.XRC for release tests)
select "Scenario" in column "Assignemnt by" for the build that should be tested (don't follow the link to the build name, this will show a list of all testassignments for this build)
Select the scenario
choose a scenario you would like to assign. (Application scenarios are used for localization testing, the "OOo release sanity" scenario is used for release approval)
follow the link in column "Test Cases" (click on the number of test cases in this scenario)
Assign test scenarios per platform
now you can assign the scenario to a tester. Simply select a tester for any platform and click on Update.
you can assign multiple platforms to one tester and a platform to more than one tester
Hint: to go back to the Scenario selection screen simply use the Back button of your browser.