DRAFT Version: Wed Jun 6 17:48:37 EDT 2001

An Evaluation of the Effects of
Web Page Color and Layout Adaptations

D. C. Brown, E. Burbano, J. Minski & I. F. Cruz.

1. Introduction

An Adaptive Web Site molds itself to the user, creating a unique interaction [Brusilovsky, 1998]. The intention is to provide a more personalized and enjoyable experience, but also to increase the success of an interaction. Success can be measured in a variety of different ways, depending on the site and its use. These include the speed of task completion and related measures, such as the number of mouse clicks.

This research studied the effect of web page adaptations on information finding tasks at a web site [Burbano & Minski, 2001]. Many components in the user interface can be altered to produce adaptations, such as page content and web links, but we have limited our work exclusively to the alteration of color and layout.

The hypothesis studied in our research was that these adaptations would allow users to complete tasks in a shorter time, and that this effect would occur whether the adaptations were used individually or together.

While Adaptive Web Sites are normally dynamic, in order to focus on the `effects' of adaptations we used predetermined adaptations, creating a set of "static" web sites containing all the adaptations to be studied.

A web-based experiment was designed that required each subject to answer three questions. The answer for each could be found by searching through a web site. A local copy of a portion of IBM's Sydney 2000 web site was used for these experiments, in which one hundred and twenty eight students participated.

Each question was associated with a different adaptation of the web site. For each question, each subject saw either no adaptation (N), a color adaptation (C), a layout adaptation (L), or a combination of both color and layout adaptations (B). In order to reduce the potential effects of learning, and to compensate for adaptation order, the experiment was kept brief and a balanced experimental design was used.

The `Color' adaptations, the `Layout' adaptations, and the combined adaptations all reduced task completion time. It was concluded that there was significant support from the experimental data for the hypothesis.

2. Literature Review

2.1 The Impact of Color & Layout

There is a subtle and complex relationship between color usage and effectiveness. However, researchers agree on some major guidelines. For example: use color sparingly; use color consistent with cultural and standard meanings; use colors that contrast well; and avoid saturated colors [Shneiderman, 1998] [Najjar, 1990] [Doore et al, 1993] [Krebs & Wolf, 1979].

Color affects symbol legibility, and user performance for all colors improves with larger symbols [Durrett, 1987]. Color is an effective coding method for reducing visual search time on complex displays, and its advantage increases as symbol density increases. However, if the target's color is unknown, performance is inferior to searching without color [Durrett, 1987].

Search time for color-coded displays increases as the number of displayed items of the target's color increases, and also with the number of differently colored items. Even though the use of color aids a user's task for most situations, adding color that does not convey any meaning yields a longer search time [Krebs & Wolf, 1979].

Layout affects the efficiency of visual access. Guidelines, such as complying with the left-to-right, top-down reading direction, are often used. These include: following standard layout conventions; matching common eye scanning directions; using left or right justified fields and labels appropriately; good use of whitespace; using sufficiently large icons, buttons and other "targets"; and designing layouts to reduce cursor movements [Mullet & Sano, 1995] [Shneiderman, 1998].

2.2 Evaluation of User Interfaces

Evaluation can be done using a variety of methods. Expert review methods include Heuristic evaluation, Guidelines review, Consistency inspection, Cognitive walkthrough and Formal usability inspection [Shneiderman, 1998].

Usability testing and usability laboratories focus on identifying user needs and relating the interface to its users. Surveys are a very convenient method of evaluation, and are a familiar, inexpensive, and generally acceptable companion for usability tests and expert reviews.

Evaluation with a large number of participants lends a sense of authority to the results, compared to the possibly biased and variable results of the small number of usability-test participants, or even of expert reviewers. Web-based experiments allow large samples that differ demographically from the usual, available subjects [Birnbaum, 2000] [APS, 2001].

2.3 Evaluation of Adaptive Hypermedia

Adaptive hypermedia systems are developed, and therefore evaluated, with five key factors in mind: what application areas are suitable; what user features inform the adaptation (e.g., goal, interests, experience); what can be adapted (e.g., color, links, layout); what adaptation mechanism to use; and what the goal of the adaptation is (e.g., reduce errors, increase speed) [Brusilovsky, 1998].

Our work has focussed on evaluating the interaction between what can be adapted and the goal of the adaptation, with little or no attention paid to the other factors.

Every evaluation tends to include the following essential steps: identifying the purposes or objectives of the evaluation; designing the experiment, including selecting suitable methods, subjects, tasks, measurements, and analysis frameworks; running the experiments and collecting the relevant data; analyzing the data; and evaluating the results and drawing conclusions [Browne et al., 1990, pp. 163-164].

Usually, evaluating adaptive systems is not the same as evaluating regular interfaces, because of the nature of adaptive processes. Comparative evaluation is typically done against a non-adaptive, static system. In our case we have used static systems that represent the results of the possible adaptations, their combinations, as well as no adaptations.

***some notes here about evaluation of adaptation from the literature***
e.g., Hothi & Hall, Kobsa, Specht, Ardissono

3. System Design & Implementation

3.1 Interface Design

To be able to accurately measure performance improvements in our subjects, we needed to reduce stress on the user. Dealing with a new Website, and all that entails, is a significant source of stress. It is usual for much time to be lost while the user is getting to know a Website.

However, when an interface uses common content and well-known subject-subject relationships, used consistently, users tend to anticipate what the site will offer, and concentrate on taking part in the experiment. This suggested using a site developed for a large audience, with well-known subject matter.

Given this requirement, we then decided on the contents to be displayed. The 2000 Sydney Olympics Website appealed to us for several reasons:

this site was developed by the IBM e-business team, an experienced group of developers;
the site used easy-to-understand language, to accommodate the large readership;
its content was non-technical;
the site was structured in a similar fashion to the way we envisioned the material for our controlled experiment, as a wide tree of nodes.

The interface for the experiment presents the user with a set of tasks to be completed. The interface design has two frames: an upper one containing a question that defines the current task and a set of possible answers, and a lower one in which the subject can traverse a local copy of the Olympics site to locate the answer. Despite admonitions not to use frames [Nielsen, 1996], we decided to use them so that users would not have to toggle between two active browser windows.

3.2 Aspects of the Software Design

The software for the experiment was required to collect the number of clicks, the elapsed time, and the answers to each of the tasks for every user in the experiment. Data needed to be written to a file when each user had completed their tasks. Also, every task question needed to be read from a file located on the server side. Cookies were used as temporary storage during each interaction with a user.
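The data collection requirements above can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the study, it does not model the cookie-based temporary storage, and all class, method and field names (`TaskRecord`, `SessionLog`, `save`, and so on) are hypothetical rather than taken from the original software. It only shows the idea of accumulating clicks, elapsed time and answers per task, and writing a record only once all tasks are complete.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TaskRecord:
    """Measurements for one task (one question). Field names are hypothetical."""
    question_id: str
    started_at: float
    clicks: int = 0
    elapsed_seconds: Optional[float] = None
    answer: Optional[str] = None


@dataclass
class SessionLog:
    """All task records for one subject, held in memory until the end."""
    subject_id: str
    tasks: List[TaskRecord] = field(default_factory=list)

    def start_task(self, question_id, now=None):
        # "now" can be injected so the timing logic is testable.
        start = time.time() if now is None else now
        self.tasks.append(TaskRecord(question_id, start))

    def record_click(self):
        # Count one link-click against the current task.
        self.tasks[-1].clicks += 1

    def finish_task(self, answer, now=None):
        # Elapsed time runs from question prompt to question finish.
        task = self.tasks[-1]
        end = time.time() if now is None else now
        task.elapsed_seconds = end - task.started_at
        task.answer = answer

    def is_complete(self, expected_tasks=3):
        return (len(self.tasks) == expected_tasks
                and all(t.answer is not None for t in self.tasks))

    def to_json_line(self):
        return json.dumps(asdict(self))

    def save(self, path, expected_tasks=3):
        """Append one line to the results file, but only for a finished session."""
        if not self.is_complete(expected_tasks):
            return False
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(self.to_json_line() + "\n")
        return True
```

Deferring the write until the session is complete means a subject who abandons the experiment leaves no partial record in the results file.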

Perl [2000] was chosen as the implementation language as it could handle operations with files quickly and reliably, including opening, closing, reading and writing on the server side. It could also be used to handle cookies, to generate HTML dynamically, and to get information from forms.

A final, important feature of this setup is that whenever a request comes to the server for a specific Perl script, the server creates a separate process. This means that unforeseen interactions between subjects using the web pages can be avoided.

4. Design of the Experiment

4.1 The Adaptations

Color change was chosen as one of our adaptations, as it is easy to implement, and important for conveying information such as order, magnitude, etc. In this experiment use of color was limited to the enhancement of grouping and order relationships.

Page layout change was chosen as the second adaptation. While layout changes are not as easily implemented, their potential for great impact makes them important to study. Layout can easily make specific information more accessible to the user. It can emphasize importance and order when dealing with large amounts of information (e.g., data positioning in a list).

These two adaptations have the advantage that they can be used by themselves or can easily be combined.

For the experiment, it was important to keep in mind the order in which the results of the combinations are achieved. For example, knowing that layout adaptation A combined with color adaptation B yields a positive result does not mean that combining the adaptations in the opposite order will yield the same result.

A sample web page with both Layout and Color Adaptations is shown in Figure 1. In contrast to the page in the "None" category, on this page the countries are sorted alphabetically (layout) and are color coded to match the continents on the map (color).

Figure 1: A sample web page with both Layout and Color Adaptations.
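The two adaptations described for Figure 1 can be sketched in a few lines. This is an illustrative Python sketch only, not the code used to build the experimental pages (which were prepared in advance as static sites). The function name `render_country_list`, the data format, and the continent colors are all assumptions; the paper does not list the actual colors used.

```python
from html import escape

# Hypothetical continent-to-color mapping; the actual colors are not given in the paper.
CONTINENT_COLORS = {
    "Africa": "#cc6600",
    "Europe": "#003399",
    "Asia": "#990000",
}


def render_country_list(countries, color=False, layout=False):
    """Render (name, continent) pairs as an HTML list.

    layout=True sorts the countries alphabetically (the layout adaptation);
    color=True colors each country by continent (the color adaptation).
    With both flags False the original order and default color are kept.
    """
    rows = sorted(countries, key=lambda c: c[0]) if layout else list(countries)
    items = []
    for name, continent in rows:
        if color:
            shade = CONTINENT_COLORS.get(continent, "#000000")
            style = ' style="color: {}"'.format(shade)
        else:
            style = ""
        items.append("<li{}>{}</li>".format(style, escape(name)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    sample = [("Kenya", "Africa"), ("France", "Europe"), ("Ethiopia", "Africa")]
    print(render_country_list(sample, color=True, layout=True))
```

Because the two flags are independent, the same function covers all four conditions: None, Color, Layout and Both.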

4.2 Catering to the Subjects

For any statistical analysis to be significant one must account for variation by testing a substantial number of subjects. Our Internet-based approach allowed subjects to visit our experiment from anywhere and at any time. This helped to increase the sample size. Other benefits were that accurate records and measurements could be captured online, and that it relieved us from having to reserve time and space in which to conduct the experiments.

During our experiment we wanted to keep the subjects as comfortable as possible, and also to try to reduce the effects of learning. We achieved this by limiting their interaction with the system: we reduced the number of experiment phases and presented only three tasks, each defined by a question about the Olympics. We were able to keep the completion time for each subject to about 9 minutes.

4.3 The Stages of the Experiment

The presentation of the experiment itself was divided into four parts. First, an Experiment Briefing presented overall details that subjects needed to know before starting the experiment. This included what the site was about, and what Internet browser was preferred.

Next the subject saw the Tutorial. This included a sample screen capture from an actual experiment with important interface elements labeled, particularly the two frames. This familiarity with the experiment's interface should help speed up initial use.

Next the users filled out a Demographics form. Here, information such as age, major, username, citizenship, Internet experience and Olympic knowledge was recorded.

Users then entered the "Experiment" section. As each question was presented to the subject, he/she had to `surf' the Olympics site presented in the lower frame, and find the answer. Every link-click made by the subject was recorded, as was the time from "question-prompt" to "question-finish". Measurements were independent of whether the subject answered the question correctly.

Recordings were made for each question. Once the user had successfully finished their last task, they were thanked for their time and informed that their information had been saved. Saving the statistical information at that point prevented recording any incomplete data for a user.

4.4 The Web Site

Because the 2000 Sydney Olympics Web site that we used had to be modified -- to reduce its complexity and ensure control over its organization -- we aimed at achieving a broad, shallow tree. Shneiderman [1998] encourages designers to limit trees to three levels in depth: "when depth goes to four or five, there is a good chance of users becoming lost or disoriented." He mentions that better productivity (speed, accuracy, preference) occurs when users encounter at most eight nodes (in its leaf level) in a two level deep tree.
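A guideline like this can be checked mechanically against a site map. The following Python sketch assumes a hypothetical nested-dictionary representation of the site (the `title` and `children` keys and the function names are ours, not part of the original system) and reports the depth of the tree and the largest number of children under any single node. Depth is counted here as the number of link levels below the home page, which is one possible reading of "levels".

```python
def tree_depth(node):
    """Number of link levels below this node; a page with no children has depth 0."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(tree_depth(child) for child in children)


def max_branching(node):
    """Largest number of children found under any single node in the tree."""
    children = node.get("children", [])
    return max([len(children)] + [max_branching(child) for child in children])


if __name__ == "__main__":
    # Hypothetical miniature site map, for illustration only.
    site = {
        "title": "Home",
        "children": [
            {"title": "Sports", "children": [{"title": "Marathon"}]},
            {"title": "Countries"},
        ],
    }
    print(tree_depth(site), max_branching(site))
```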

In order to reduce user learning, tasks were selected such that the answer to each question they saw was located on a considerably separated leaf node in the Web site structure. In addition, tasks had to be challenging for the users and require them to actually browse in order to answer correctly. It was also very important to select tasks such that finding the answers could be enhanced by color, by layout or by both adaptations.

A sample question, used for Layout adaptation and for Color adaptation, was:

In the men's marathon what was Kenya's position in relationship to that of Ethiopia? The answer choices given were:

Same; Better; In between; Worse; with "In between" being the correct answer.

4.5 The Form of the Experiment

We designed a "diamond graph" (Figure 2) in which each of the four nodes was an adaptation. In this diagram, C is for `color' adaptation, L is for `layout' adaptation, N is for `no' adaptation, and B stands for `both' adaptations.

Figure 2: Experiment Design.

Each subject was exposed to one of the 4 paths between N and B. The paths are: BCN, BLN, NCB, NLB. Subjects were randomly assigned, dynamically at the time of browser use, into one of the four groups that corresponded to these four paths. This "counterbalancing" approach provides compensation for the potential effects of presentation order. In addition it keeps each subject's experiment short.
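The assignment of subjects to paths can be sketched as follows. This is a minimal Python illustration, not the original Perl implementation; the function names are hypothetical. It uses a simple uniform random choice, which does not by itself guarantee equal group sizes, and the paper does not state whether any further balancing was applied.

```python
import random

# The four paths through the "diamond graph" between N and B.
PATHS = ("BCN", "BLN", "NCB", "NLB")

# Meaning of each letter in a path.
LABELS = {"N": "none", "C": "color", "L": "layout", "B": "both"}


def assign_path(rng=random):
    """Pick one of the four paths uniformly at random for a new subject."""
    return rng.choice(PATHS)


def conditions_for(path):
    """Expand a path such as 'NCB' into the ordered list of adaptations shown."""
    return [LABELS[letter] for letter in path]


if __name__ == "__main__":
    path = assign_path()
    print(path, conditions_for(path))
```

Each path contains N and B plus exactly one of C or L, so every subject answers three questions and sees the single adaptation between the two extremes.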

5. Results

One hundred and twenty eight subjects participated in the experiment. From the Demographics forms completed we know that: all of the subjects were 18 to 23 years old; 73% of the subjects were Computer Science students; 58% were intermediate Internet users, 38% experts, and 3% beginners; and 57% had beginning Olympics knowledge, 30% intermediate, and 9% expert knowledge. As with Internet experience, subjects with more Olympics knowledge might have had an advantage, allowing faster task completion.

As two of the paths through the experiment included color adaptation and two included layout adaptation, it was necessary to analyze the data in two separate batches. One analysis was conducted for subjects who worked with the set of adaptations: Both, Color, and None. Another analysis was done for: Both, Layout, and None. A one-way repeated measures analysis of variance was conducted for time and for number of clicks, for both groups.
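For readers unfamiliar with the method, the F statistic of a one-way repeated measures analysis of variance can be computed as in the sketch below. This is a plain Python illustration under our own assumptions, not the statistics package used in the study (which the paper does not name). The function name and the sample times are hypothetical, and the lookup of the p-level from the F distribution is omitted.

```python
def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA.

    scores[i][j] is the measurement (e.g., task time) for subject i
    under condition j. Returns (F, df_conditions, df_error).
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of conditions
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 conditions, no gaps")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    # Partition the total variation into conditions, subjects and error.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_stat, df_conditions, df_error


if __name__ == "__main__":
    # Hypothetical times (None, Color, Both) for three subjects; not study data.
    times = [[9, 6, 3], [12, 8, 4], [9, 7, 5]]
    print(repeated_measures_anova(times))
```

Removing the between-subject variation from the error term is what distinguishes this analysis from an ordinary one-way ANOVA: each subject acts as his or her own control.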

Results were significant at the p < .0005 level, which means that, if the adaptations had no real effect, the probability of obtaining differences this large by chance alone would be less than 5 in 10,000. These results indicate that there are significant differences between the effects of each adaptation.

First, we analyzed the overall effects of the two Both-Color-None groups (BCN and NCB) and how each adaptation affected users' performance with respect to time (Figure 3).

Figure 3: Overall time average for B, C, and N.

Mean task completion time is graphed for each adaptation. In this case, 64 subjects account for the data. Clearly, the Both adaptation reduced task completion time to slightly less than half that of None, and also gave a significant reduction compared to Color alone. The p-level in this analysis was less than .0000001, making these results strongly significant.

Analysis of the Both-Layout-None group (BLN and NLB) also shows how each adaptation affected users' performance with respect to time (Figure 4).

Figure 4: Overall time average for B, L, and N.

Data from sixty-four subjects were used in this case, where the Both adaptation correlated with speedier task completion, being roughly twice as fast as Layout, and nearly three times faster than None. The p-level here was less than 0.000006, again highly significant.

After the two groups were analyzed for time, we analyzed the behavior of the users in terms of number of clicks, with similar, significant results. The strongest adaptation continued to be Both, yielding half as many clicks compared to Color, and nearly one third of the clicks in the None case.

We also included a "planned comparisons analysis" between individual adaptations for both groups. This identifies significant differences between individual adaptations. For the Both-Color-None group, Both was faster than Color with a p-level of 0.00038. It was faster than None with a p-level reported as zero. For the Both-Layout-None group, Both was faster than Layout with a p-level of 0.0017, and faster than None with a p-level of 0.00011.

The experiments were set up in such a way that no single user was exposed to both the Color-only and the Layout-only adaptations. It would therefore have been inaccurate to compare these adaptations statistically, since the data did not correspond to the same context or users. The graphs do suggest that the Color adaptation generated faster task completion than Layout, but this might have been caused by the complexity of the tasks or the degree of adaptation used.

6. Conclusion

It was concluded that there is significant support from the experimental data for the hypothesis that adaptations allow users to complete tasks in a shorter time, and that this effect occurs whether the adaptations are used individually or together.

Users achieved their task goals faster when adaptations were present. Color or layout adaptations by themselves reduced average times and number of clicks compared to when there was no adaptation. Even faster task completion occurred when color and layout adaptations were combined.

The study suggests that changes in color or layout tend to be more effective when the previous task was completed with no adaptation. In addition, color adaptation produced more effect than layout adaptation. However, the color and layout adaptations were used in totally different contexts. We also have no way of knowing whether these adaptations represent the same degree of change.

Future studies should categorize both layout and color adaptations, and more systematically vary them in an experimental situation, correlating the results with the type of task, and with user preferences and characteristics. In addition, these categories of adaptations should be matched with techniques for accomplishing the adaptations dynamically.

References

APS (2001) Psychological Research on the Net, http://psych.hanover.edu/APS/exponnet.html, American Psychological Society.

Liliana Ardissono

M. H. Birnbaum (Ed.) (2000) Psychological Experiments on the Internet. Academic Press.

Browne, D., Totterdell, P. & Norman, M. (1990). Adaptive User Interfaces. London: Academic Press.

Brusilovsky, P. (1998). Methods and Techniques of Adaptive Hypermedia. In: Adaptive Hypertext and Hypermedia, (Eds.) P. Brusilovsky, A. Kobsa & J. Vassileva, Kluwer Academic Publishers, pp. 1-43.

de Bra

E. Burbano & J. Minski (April 2001) Qualitative Analysis of Web Site Color and Layout Adaptations, Major Qualifying Project, MQP-DCB-0004, Advisors: D. C. Brown & I. F. Cruz, http://www.cs.wpi.edu/~dcb/MQPs/MinskiBurbano/, Computer Science Department, WPI.

G. S. Doore et al (1993) Guidelines for using color to depict meteorological information. Bull. Amer. Meteor. Soc., Vol. 74, No. 9, pp. 1709-1713. Available as: http://www.cdc.noaa.gov/iips/color.html

Durrett, H.J. (1987). Color and the Computer. Orlando, FL: Academic Press, Inc.

J. Hothi & W. Hall

Alfred Kobsa

M. J. Krebs & J. D. Wolf (1979) Design principles for the use of color in displays. Proc. Society for Information Display, Vol. 20, pp. 10-15.

Mullet, K. & Sano, D. (1995). Designing Visual Interfaces: Communication Oriented Techniques. Mountain View, CA: SunSoft Press.

Najjar, L. J. (1990). Using color effectively (or peacocks can't fly). IBM TR52.0018, Atlanta, GA: IBM Corporation. Available as: http://mime1.gtri.gatech.edu/mime/papers/colorTR.html

Nielsen, J. (May 1996) Top Ten Mistakes in Web Design. Alertbox, http://www.useit.com/alertbox/9605.html

Perkowitz, M. & Etzioni, O. (April 1997) Adaptive Sites: Automatically Learning from User Access Patterns. Proc. 6th Int. World Wide Web Conf., http://www.scope.gmd.de/info/www6/posters/722/

Perkowitz, M. (1999). Towards Adaptive Web Sites: Conceptual Framework and Case Study. Proc. 8th Int. World Wide Web Conf., http://www8.org/w8-papers/2b-customizing/towards/towards.html

Perl (2000) The Perl Reference Guide, O'Reilly & Associates, http://www.squirrel.nl/people/jvromans/perlref.html

Shneiderman, B. (1998). Strategies for Effective Human-Computer Interaction. Addison Wesley Longman, Inc.

Markus Specht

http://www.cs.wpi.edu/~dcb/MQPs/MinskiBurbano/paper.html
