Ian Tibbles MA (Cantab.), MSc (London)
Common questions from potential licensees and from participants completing
a Personal Style Survey are "How reliable is this survey?", "Is
it relevant to me at work (or at home)?" and "Are the results
accurate?" These are important questions and must be answered clearly
if the results of the survey are to have any credibility for that person.
The difficulty is that the questions can be answered in a number of different
ways depending on the perspective of the questioner and their degree of
understanding of the issues of reliability and validity in the design
of behavioural surveys. This note seeks to give licensees a framework
for dealing with both the technical and non-technical questioner.
The term "psychometric" is widely used to describe ability, aptitude, behavioural and
personality surveys and questionnaires. Literally, "metric" means
"measure" and "psycho" means "mind"; its dictionary definition
is "the science of measuring mental capacities and processes".
This is done through the collection and interpretation of survey data.
The Personal Style Survey was designed using psychometric principles of
survey construction.
A test is designed to measure some aspect of ability, aptitude, personality
or motivation against a pre-determined standard. Potentially it can be
threatening to the participant, as there is inevitably a sense of pass
or fail in the analysis. It is therefore important that the use of tests
is demonstrated to be objective, fair and appropriate. Tests of personality,
for example, commonly have measures of:
- faking good
- faking bad and
- consistency
to ensure the results are not distorted.
The benefits of this process are that it is objective (as far as possible)
and usually rigorous. The potential disadvantages are that it is threatening
and can be a mystery to the participant who is trying to understand how
the results were arrived at.
The Personal Style Surveys are not tests and should never be described
or used as such. Everybody scores 100%. Each survey simply seeks to
measure how the person completing it prefers to behave when things are
going well (favourable conditions) and when they are experiencing stress
or conflict (unfavourable conditions). The surveys are not situation-specific
and are not a predictor of effective or ineffective behaviour: each
person's profile is capable of being effective or ineffective depending
on their understanding and management of their behavioural strengths and
potential weaknesses. Nevertheless, the results can be very powerful, giving
people insights into how to:
- make more of their strengths
- make more effective use of the strengths of others
- minimise potentially inappropriate or ineffective behaviour and
- get on well with people who are not like them

The Personal Style Survey is constructed as a "forced choice" ranking
of four different endings to each statement. The process of forcing the
person completing the survey to choose between four behaviours quickly is
designed to access the individual's sub-conscious self-understanding
and to bring it into conscious understanding through feedback and discussion
of the survey results.
Because the process is non-threatening, it is possible to openly discuss
and confirm the survey findings with the client: does this
feel or sound accurate to them? The licensee can encourage them
to discuss and validate the findings with friends and colleagues. It is
important to ensure that they choose someone whom they trust to know
them and to have a constructive opinion to offer. If necessary, they
should be allowed to modify the findings to create a "best fit"
profile of their behaviour.
However, some aspects of traditional reliability and validity measures
are helpful. Below is a description of the measures and how they relate
to the Personal Style Survey.
Reliability
Survey scores vary from one measurement to another. A range of factors
may cause this:
- differing degrees of effort
- variations in attention levels
- administration
- health
- circumstances etc.
The precision or consistency of measurement displayed by a survey is referred
to as its reliability. It is normally expressed in terms of a statistic,
the correlation coefficient, often referred to as the reliability
coefficient.
The three most common types of reliability measure are:
- Test-retest. This compares the results of the same survey
being completed by the same candidate at different points in time.
- Alternate form. This compares the results
of two or more forms of the same survey completed by the same group
of subjects.
- Internal consistency. This measures the performance of all items
(questions) in a survey by comparing the two halves of the survey
(the split-half technique) or using the Kuder-Richardson reliability coefficient
(the mean of all split-half coefficients).
Reliability coefficients are usually expressed as a number between 0
and 1.0. A coefficient of 0.2 would suggest a much lower level of reliability
than a coefficient of 0.6. It should be borne in mind, however, that this
reading is not strictly correct, as the figure is only an estimate based on a
particular group of people. It is not just the statistic but the quality
of the study from which it was derived which needs to be understood.
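The point that a coefficient is only an estimate from a particular group can be illustrated with a rough confidence interval. The sketch below is not part of the original analysis: it assumes the coefficient is a product-moment correlation (as in a test-retest study), uses the standard Fisher z-transformation, and the values of r and n are hypothetical.

```python
import math

def correlation_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a product-moment correlation.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly 95% coverage.
    r is the observed correlation, n the number of people in the study.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                     # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)           # standard error on the z scale
    lower = math.tanh(z - z_crit * se)    # transform the bounds back to r
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Hypothetical example: the same coefficient from a small and a large study.
for group_size in (30, 300):
    low, high = correlation_confidence_interval(0.6, group_size)
    print(group_size, round(low, 2), round(high, 2))
```

The same coefficient from a larger group gives a narrower interval, which is one reason the quality and size of the study matter as much as the statistic itself.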
It is important for any survey to measure consistently and with a reasonable
degree of accuracy. The reliability coefficient for the Personal Style
Survey was derived using Cronbach's coefficient alpha and is reported
below from an analysis by Dr Allan Katcher (co-developer of the Life Orientations®
Method) for the eight scales:
| Orientations | Favourable | Unfavourable |
| Supporting/Giving-in | 0.54 | 0.54 |
| Controlling/Taking-over | 0.70 | 0.61 |
| Conserving/Holding-on | 0.63 | 0.46 |
| Adapting/Dealing-away | 0.61 | 0.37 |
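For readers who want to see how a coefficient alpha of this kind is calculated, the following is a minimal sketch of the standard formula. It is not Dr Katcher's analysis: the item scores are invented for illustration, and the function name `cronbach_alpha` is an assumption of this example.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha for one scale.

    item_scores: one list per respondent, holding that person's score
    on each item of the scale (all lists the same length).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Variance of each respondent's total score on the scale.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: five respondents, four items on one scale.
example_scores = [
    [4, 3, 4, 2],
    [2, 2, 3, 1],
    [3, 4, 3, 3],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(example_scores), 2))
```

Alpha rises when items on a scale move together across respondents, and falls towards zero when they vary independently.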
Test/retest study By Dr Allan Katcher
The stability of test results over time is usually reported
as part of the data around the performance of any psychological instrument.
Test/retest data has a less clear meaning with regard to test reliability
than internal consistency data, because it cannot be determined whether
the person has changed over time, has reported him or herself from two
different standpoints (not test-related) or whether the survey evokes
different kinds of reporting at different times. There is also the attenuation
problem: on the second completion of the survey, it is no longer really
new, even though in the study reported below no meaning was put on
the test between the first and second administration. Still, all in all, one
should expect some amount of stability if the test measures salient variables,
though apparent shortcomings are very hard to interpret.
The Personal Style Survey was administered to 63 graduate students and
then re-administered after five weeks. The subjects were not given their
scores or any information about the meaning of the survey until after
the second administration. The simple product-moment correlations are
as follows:
| Orientations | Favourable | Unfavourable |
| Supporting/Giving-in | 0.49 | 0.53 |
| Controlling/Taking-over | 0.61 | 0.57 |
| Conserving/Holding-on | 0.62 | 0.60 |
| Adapting/Dealing-away | 0.69 | 0.39 |
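As an illustration of how a test/retest figure of this kind is obtained, here is a minimal sketch of the product-moment correlation between two administrations of one scale. The scores are hypothetical and are not taken from the study; the function name is an assumption of this example.

```python
import math

def product_moment_correlation(first, second):
    """Pearson product-moment correlation between two score lists.

    first, second: scores for the same people, in the same order,
    from the first and second administration of one scale.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    return cross / math.sqrt(ss_first * ss_second)

# Hypothetical scores for six people on one scale, five weeks apart.
first_administration = [12, 18, 9, 15, 21, 14]
second_administration = [13, 16, 11, 15, 19, 12]
print(round(product_moment_correlation(first_administration,
                                       second_administration), 2))
```

In a study like the one described, the same calculation would be repeated for each of the eight scales.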
It is of interest to see whether the Life Orientations method style
descriptions change from one administration to the next. Each pair of
test profiles was analysed to note whether the basic descriptions changed.
The results of this analysis are as follows:
| No change (favourable) | 38 of 63 | 60% |
| No change (unfavourable) | 31 of 63 | 49% |
| No change (considering both) | 19 of 63 | 30% |
Even though 30% of those tested showed virtually identical scores on
both administrations, it was suspected that those who showed a clearly
predominant style preference would be less likely to change; that is,
if the test really measures some genotype variables. Again, the test was
considered in two parts, the "favourable" style and "unfavourable"
style. 21 subjects showed a predominant style choice (5 points more than
any other score) on the "favourable" scales and of those, 14,
or 67%, showed the same style preference on the second administration.
20 subjects showed a predominant "unfavourable" style with 16,
or 80%, showing no change on the second taking.
These same data were also examined to pick out those subjects who had
clear "favourable" and "unfavourable" styles that
were the same, another gross measure of strength of preference. Of the
27 who showed such a pattern on the original administration, 17, or 63%,
showed no change with the second administration. The expectation that
those who have clear style preferences are less likely to change over
time is strongly supported.
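The kind of count described above can be sketched as follows. This is an illustration only, with invented profiles. It assumes that "predominant" means leading every other score by at least 5 points, and that "same style preference" means the same top-scoring style on the second administration; the study itself may have applied these rules differently.

```python
def predominant_style(scores, margin=5):
    """Return the style that leads all others by at least `margin` points.

    scores: dict mapping style name to score. Returns None when no
    style is clearly predominant (assumed rule, see the text above).
    """
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] - ranked[1] >= margin:
        return max(scores, key=scores.get)
    return None

def stability(pairs, margin=5):
    """Count people with a predominant style who keep it on retest.

    pairs: list of (first_profile, second_profile) dicts per person.
    Returns (unchanged, eligible).
    """
    eligible = 0
    unchanged = 0
    for first, second in pairs:
        style = predominant_style(first, margin)
        if style is None:
            continue  # no clear preference at the first administration
        eligible += 1
        # Assumption: "no change" means the same top-scoring style on retest.
        if max(second, key=second.get) == style:
            unchanged += 1
    return unchanged, eligible

# Hypothetical favourable-conditions profiles for three people.
profiles = [
    ({"Supporting": 30, "Controlling": 20, "Conserving": 22, "Adapting": 18},
     {"Supporting": 27, "Controlling": 24, "Conserving": 21, "Adapting": 18}),
    ({"Supporting": 20, "Controlling": 31, "Conserving": 19, "Adapting": 20},
     {"Supporting": 21, "Controlling": 22, "Conserving": 19, "Adapting": 28}),
    ({"Supporting": 25, "Controlling": 23, "Conserving": 22, "Adapting": 20},
     {"Supporting": 22, "Controlling": 25, "Conserving": 23, "Adapting": 20}),
]
unchanged, eligible = stability(profiles)
print(unchanged, "of", eligible, "kept their predominant style")
```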
Overall, it is evident that the Personal Style Survey measures pretty
much the same thing in people over time though, as stated earlier, the
interpretation of less than perfect stability is difficult. Some anecdotal
evidence suggests that changes in scores could be due to subjects focusing
on different parts of their lives as they took the test at different times,
or that they could respond differently according to mood. One person reported
some progress in his personal therapy between the first and second administrations,
and felt the second test results reflected more what he was going after
and the first a rather pessimistic view of himself. But this sort of evidence
only adds to the confidence in the survey's reliability and usefulness.
Key Points on the Reliability of the Personal Style Survey
In demonstrating why the survey should be considered to be reliable it
is important to make the following points:
- track record: the survey has been in use internationally in
all the major developed countries for over 25 years
- our experience of using the survey, combined with data from our licensees,
is used to constantly improve the product range
- translations into other languages are carefully checked by experienced
survey developers from each country for accuracy in terms of the culture
and linguistic nuances rather than just literally translated
- over 8 million people have completed the survey
- the model is based on well-respected and soundly based psychological
theories:
  - Erich Fromm in "Man for Himself"
    - the strength/weakness paradox
    - 4 behavioural orientations
  - Carl Rogers, the founding father of client-centred therapy
    - client-centred development
    - communication congruency
- the standard statistical measure of reliability often quoted is to
achieve a correlation coefficient of 0.7 or above. However, this measure
is relevant for psychometric tests, which are often used in isolation from other
data! The Personal Style Survey is not a test; its structure can be
easily explained and its results can therefore be checked and explored
openly and fully with the participant. Therefore a lower measure of
statistical reliability, 0.4 to 0.6, is perfectly acceptable
- conclusions are easily understood by the participants and (because
the process is non-threatening) can be openly checked against previous
scores and reasons for differences explored jointly to establish confidence
in the findings.
Future Developments
The technically minded will be aware that the transparent construction
of the survey limits its performance in test/retest. Having completed
the survey once, completing the same survey at a later date can allow some
unconscious manipulation of data: if the individual has had feedback
on their profile (unlike the study described above), they may answer on
the second occasion as they think they "should". Licensees may not be aware
that we already have a "Personal Style Survey Version Two"
for use with individuals who wish to assess how their behaviours may have
changed. During 1998 we will be making available for the first time a
range of surveys where the sequence of the answers has been randomised.
We will notify licensees in the quarterly newsletter when they are available
to purchase.
The Relationship Between Reliability and Validity
Reliability has importance because of its relationship to the validity
of the survey. Whilst reliability is about the measurement, validity is
about the relevance and usefulness of what is measured. It is possible
for a survey to be reliable (i.e. to measure the same thing consistently
and with precision) and for what it measures to be of no use or invalid.
For example, knowledge of a person's behavioural
preferences is not a valid measure of their intellectual ability (the
Personal Style Survey does not measure this). However, it is not possible
for survey results to be valid if the data is not reliable.
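This dependence of validity on reliability is often expressed in classical test theory as an upper limit: the correlation between a survey and a criterion cannot exceed the square root of the product of their two reliabilities. The sketch below is an addition for illustration and is not drawn from the Personal Style Survey studies; the function name and the example figures are hypothetical.

```python
import math

def maximum_validity(survey_reliability, criterion_reliability=1.0):
    """Upper limit on a validity coefficient under classical test theory.

    The observed correlation between survey and criterion can be no
    larger than sqrt(survey_reliability * criterion_reliability).
    """
    for value in (survey_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(survey_reliability * criterion_reliability)

# Hypothetical figures: an unreliable measure caps validity severely.
for reliability in (0.0, 0.25, 0.49, 0.81):
    print(reliability, round(maximum_validity(reliability), 2))
```

A reliability of zero gives a ceiling of zero, which is the formal version of the statement that results cannot be valid if the data is not reliable.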
Validity Measures
We shall distinguish three types of non-technical 'validity', which
in a sense could be argued not to be validity at all:
- face validity
- content-analytic validity
- faith validity
And four main types of technical validity:
- content validity
- construct validity
- concurrent validity
- predictive validity
Face validity
Face validity is concerned with whether an instrument appears to measure
what it was designed to measure. Whilst face validity has no technical
or statistical basis, it must not be overlooked if a survey is to be accepted
by participants or (psychometrically) untrained managerial staff.
Content-analytic validity
One sometimes hears test users speak of content-analytic validity where
the item content of a test has been analysed and related subjectively
to abilities that are of assumed importance in the job. As an illustration,
the argument might go:
- We wish to select a good salesman.
- This survey asks questions about selling.
- Therefore this is a valid survey.
This is often what untrained people call validity but it has obvious
flaws in failing to define what the specific characteristics of a good
salesman are and how the survey will measure these.
Faith validity
This is often the most difficult to deal with. It is a belief in the validity
of an instrument without any objective data to back it up, and the evidence
is not wanted!
The more empirically based concepts are:
Content validity
This is mainly in relation to attainment tests, e.g. a spelling test containing
only the names of politicians in America would be a poor test of general
spelling in the United Kingdom. High content validity should always be
checked with one of the empirical methods of validation described below
when using any survey as a test.
Construct validity
Construct validity is more abstract than the other forms of validity and
is the extent to which a test measures some theoretical construct or trait.
Such constructs might be mechanical, verbal or spatial ability, emotional
stability or intelligence. Building up a picture of the construct validity
of a test can be a long process and involves any information that throws
some light on the nature of the construct under investigation. The complex
statistical technique which goes beyond the simple visual inspection of inter-correlations
between different tests, and which is often met in construct validation,
is known as factor analysis.
Other information, which can lead to an understanding of the construct
validity of a test, includes internal consistency and the effect of experimentally
controlled variables and also variables such as age, sex and culture on
test scores.
Concurrent validity
Concurrent validity is the relationship between test scores and some criterion
of performance obtained at the same time. Thus, if we were to test a group
of computer programmers and correlate the results with supervisors' ratings
of work performance, we would have undertaken a concurrent validity study.
Where we wish to know the current status of an individual, concurrent
validity is the most appropriate form of validity. Some organisations,
for example, use attainment tests of job knowledge at the end of training
courses or in making decisions on staff promotion. However, although a
test may be of high concurrent validity it does not necessarily mean that
it will be useful in predicting later performance.
Predictive validity
This is the extent to which a test predicts some future outcome or criterion.
This is of crucial importance in personnel selection and placement. Two
difficulties in relation to this form of study are:
- The timescales for undertaking studies are often lengthy, reducing
the practical use of the findings.
- Results can be distorted by the tests themselves; for example, measuring
whether individuals assessed as "high flyers" achieved their potential
can produce false results. Success may be partly a function of having been
identified by tests as having potential, which itself enhances prospects, rather than
evidence that the potential identified by testing has been validated by
actual performance leading to career progression.
Statistical benchmarks for validity studies are set at much lower levels
than for reliability (usually a correlation of between 0.2 and 0.3, as opposed
to 0.6 to 0.7 for reliability), reflecting the difficulty of achieving secure
findings in validity studies!
The Personal Style Survey and Validity
Of the non-empirical measures only face validity has any relevance;
the other non-empirical measures are seriously flawed and therefore inapplicable.
The whole range of Personal Style Surveys has very high face validity
according to feedback received from licensees and course participants
over many years. The reasons for this are:
- The transparency of the analysis: clients can see how the results
are derived.
- The "deceptive" simplicity of the model: it is easily
understood by participants yet also produces powerful insights into
their behavioural strategies.
- Comparison of the feedback with self-perception: the forced
choice ranking is actually accessing sub-conscious self-understanding
and bringing it into conscious analysis, giving the client more
choices to consider.
- The ability to cross-reference the survey findings with the views
of others who know the individual (either in discussion or from analysis
of the results of the Personal Style Feedback Survey).
Face validity is important for both the user and the administrator of
the survey to have confidence in the appropriateness of the instrument
in individual, team or organisational development.
Faith and Content-Analytic validity are unsound measures and should be
discounted.
The empirical measures all presuppose some form of testing as they all
require some form of standard to measure the survey against:
- content validity depends on the purpose for which the survey is being used,
against which the content is measured.
- concurrent validity and predictive validity are both trying to measure
against a set of performance characteristics.
The difficulty here is that the Personal Style Survey is not designed
to measure performance or ability, only behavioural preferences.
As it is not used in isolation as a test, there is no basis for doing such
studies. A number of studies do exist on the use of the survey in career
development and assessment centres, but these are measuring the overall
effectiveness of the process (i.e. the combination of instruments and exercises),
not the Personal Style Survey on its own. Information from licensees
consistently indicates that the survey is very useful in processes where
other instruments and processes can validate its results. It provides
a helpful focus, which can be explored in more depth with the other techniques.
Conclusion
The Personal Style Survey is one of the most widely used behavioural surveys
in the world. Because of the open process which is employed it is one
of the most reliable and meaningful insights an individual can have into
their subconscious self-understanding. The individual completing the survey
can validate the findings against their self-experience and against the
knowledge of them that others have. This information can be used to amend
and extend the analysis provided by the survey results, giving
a refinement of measurement that is subtler and more robust than a statistical
coefficient in isolation.
The ability of the individual to understand, explore and check out the
survey results against real life data creates a more meaningful and valid
outcome than a validity study can provide: the understanding and
ownership of the conclusions are with the client rather than the coach/counsellor.
Statistically the level of confidence achieved by validity studies is
much lower than that derived from reliability studies and there are numerous
examples where difficulties in measuring with confidence and flawed study
techniques can all too often undermine the quality of the data generated.
Using a statistical framework to "prove" the reliability and validity
of findings can (unintentionally) disempower clients, as it is perceived
by many as an incomprehensible "black box" which can create
unnecessary threat and provoke caution and scepticism which is inhibiting
and unhelpful in a development setting.
In contrast, the Personal Style Survey and associated development exercises
give the client ownership of the analysis using a client-centred process,
promoting understanding and the confidence to consider new behavioural
choices validated by their self-understanding and the feedback of friends
and colleagues.