


    (C) Paul Pangaro 1982. All Rights Reserved.


    Harvard Computer Graphics Week
    Harvard University
    Graduate School of Design


    This paper was written and presented in 1982. It was the first documentation of the system Do-What-Do (© 1979) which I had conceived and developed in the few years before, and which had been presented at seminars at MIT and elsewhere.

    The text was written at a time when non-keyboard interaction was achieved via "graphics tablet and pen" rather than mouse. All the concepts that allude to tablet touches and menu actions apply equally well to mouse clicking.

    Conversion of the original paper from a printed copy was via OCR, apologies for any artifacts remaining from that process.




    As graphical interfaces gain favor over keyboard and text-driven interaction, the search for techniques that are directly visual continues in earnest. Menus are considered to be the best in state-of-the-art computer interaction, but have been in use at least since Sutherland's SKETCHPAD in 1963. This paper argues that although present menu systems have some advantages for interaction, especially for non-technical users, they are unnecessarily restrictive.

    A research program funded by the Admiralty Marine Technology Establishment, UK, and the US Army Research Institute applies a theoretical framework to the problems of computer-based instruction and computer-aided decision and design. Implications for a generalized approach, applicable in any situation with persons interacting with machines, have emerged and are reported in this paper.

    The main breakthrough promised by the research program stems from the formal application of Pask's Theory of Conversations to person/machine dialogue. Useful definitions for transaction, interaction, and conversation are developed in a framework of knowledge representation. These concepts, and simple metrics for the utility of an interface, are offered as well-defined characterizations of person/machine interaction.

    The claim for the resulting system is that, at worst, it is a goal-directed system that is self-teaching and adaptive to the user's learning style; at best, it is extensible, it can be personalized easily, and actually aids in the design process in tangible ways.

    The necessary relationship between frameworks to study human communication and the advance of human/machine interaction is argued.


    The vocabulary of any individual or group is quite idiosyncratic. The acquisition of this vocabulary by anyone outside the group is often a haphazard process and is usually fraught with misunderstandings.

    A cousin of mine had two expressions that were particularly difficult for me to fathom when I was younger. They are "ratza-stratz" and "bahdeens". The words cropped up in what seemed like any context and they appeared to have the identical meaning, but I hadn't a clue what the meaning was.

    Finally I realized that they had no specific meaning; they were synonyms for "whosis" and "whatchamacallit". Part of their magic was the lack of apparent correspondence between their meaning and their origins. Jargon invariably arises in ways that are unpredictable and spontaneous.

    One's identity is very much encapsulated in such inventions; to communicate them to others is to express this identity. Without such freedom, the "individual" does not exist.

    Any use of computers must encounter these subtleties of communication and individuality. From the perspectives of training (system to user) and individualization (user to system), present interfaces are incapable of communicating personalized meanings and capture them only with great effort. For the individual to continue to exist, the present conditions must change.


    Advances in knowledge representation schemes have been used in expert systems (Steels 1979) with mixed success, but have not impacted user interaction directly. I believe that this has been the case because until recently no theory of human communication has provided a detailed set of procedures with which to characterize intelligent dialogue and which could be embodied in software at the human/machine interface.

    This paper presents the results of design work and software experiments to be applied to the construction of expert advisor systems. The concepts incorporated in software have been tested in a variety of component forms, and the purpose of the near-term work is to bring these components together in the construction of a unified Intelligent Support System. The appropriate extensions to knowledge representation schemes provide an environment which contains interactive modes (such as menus) as a subset, while additionally providing training and paired dialogue within a uniform system design.


    In 1963, Ivan Sutherland's SKETCHPAD contained all of the essential elements of present day interactive, simulation-based graphics systems. Menus were the mode of interaction. Since then, many changes have occurred (for example, physical switches may be replaced by virtual ones, Bolt 1977) and the speed and power behind the interface is greater; but basically, the transactions required of the user, and their semantic components, have not changed.

    Generically, a menu refers to a list of choices available at a given moment in the user's interaction with the software system. In that sense, command line systems (in which functions are invoked by text commands with parameter arguments, for example, DRAW-LINE 0 0 10 10 RED) are functionally identical to menus. For those who cannot type, menus are better, especially when displayed on graphics devices with some mechanism for the user to "pick" choices.

    One advantage cited for menu displays is that the possible choices are visible, and therefore do not need to be memorized. However, for any system with a substantial range of functions, there are too many options to be shown at one time. The usual solution is to break up the entire repertoire into subsets, and to arrange these subsets into a hierarchy. As was discovered at the Architecture Machine Group in both the PAINT system and the Spatial Data Management projects, this flipping about from level to level is wasteful and tiresome. There is no purpose in spending many precious tablet-pen motions and touches simply to traverse a fixed topology to get to a place, especially after the place and its contents are known.

    In nearly all menu systems, there is the implicit interpretation of "place" to mean "the ability to input or modify the state of specific parameters." Traveling to that place in the tree means that the succeeding motions (if they are required, and they nearly always are) refer to those specific parameters. A classic example is "Line mode." After invoking this function, the next two tablet touches are interpreted to mean the endpoints of the line to draw. Unless the next transaction is unusual (such as a touch to a special menu area for color, or a change of function altogether to Circle or some such thing), these two points must be given next.

    Consider a further problem: here I am in sub-mode, sub a, sub viii, and I want only to get to a different state within the same mode; say, to widen my line and to change its color. Two or perhaps four moves and touches of my tablet pen and I have what I want. Now my touches place a fat blue line instead of a thin red one. But, suppose that I want my next picture element to be that thin red line; once again my only course is to reset the earlier parameters. This means that 1) I must be able to reconstruct them exactly, whether by eye or by memorizing and resetting their values numerically, and 2) I am now unable to return to their newer values, namely, the fat blue line.

    These difficulties may be rephrased. Firstly, I must travel geographically in the fixed hierarchy of commands to give a particular interpretation to the just-succeeding transactions. This is equivalent to setting a "state" of the system. Secondly, any parameter settings (which technically are encompassed in the "state of the system") of those transactions are lost by further activity. Moreover, there is the additional burden of fixed hierarchies: I am at the mercy of the programmer's choices as to how functions are grouped.


    Each of these difficulties may be avoided by a simple re-interpretation of "place" in the menu of functions. Instead of defining each place to be equivalent to one "state" of the system, consider it to be the ability to define an "instance" of a function. Parameters of line width and color, for example, would be set in the same way, and the succeeding actions on the picture would reflect this new state of the system. But the system retains this "state" and provides it as a "user-defined" menu choice.

    Hence, upon defining a "thin red line" instance, I can reproduce it at any time from a single menu choice. Of course the management and arrangement of such proliferating menus requires careful design; it would in fact take more than a single action to find and activate the appropriate button. A further elaboration would make menu arrangements dynamic in themselves and sensitive to context, but this area is not yet sufficiently explored.

    In sophisticated systems, the new menu choice that results from the defining actions might be given full graphical appearance: a new menu button which is a picture rather than text, determined by the user. But even if left as a text string, the power derives from the "configuring", or extensibility, of the system: sequences of transactions are now coalesced into a single menu choice according to the user's goals.

    Another entrance into the pre-defined system function of "Line mode", and parameter setting to fat blue lines, produces another menu choice, side by side with the thin reds. Single transactions have the effect of many, with the precise repeatability that is needed. There must of course be an additional mechanism whereby undesirable instances are forgotten and desirable ones arranged.
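
    The instancing scheme can be sketched in a few lines of modern code; the paper of course predates such an implementation, and all names here (MenuInstance, draw_line) are illustrative only. The essential move is that a function plus its captured parameter state becomes a new, single-transaction menu choice:

```python
# Sketch of "instancing" as opposed to state-based menus.
# All names are illustrative, not from the original system.

def draw_line(start, end, width, color):
    """Stand-in for the system's pre-defined Line function."""
    return f"line {start}->{end} width={width} color={color}"

class MenuInstance:
    """A user-defined menu choice: a function frozen with its parameter state."""
    def __init__(self, label, func, **params):
        self.label = label
        self.func = func
        self.params = params          # the captured "state of the system"

    def activate(self, **overrides):
        # A single transaction reproduces the whole captured sequence.
        return self.func(**{**self.params, **overrides})

# Defining a "thin red line" once yields a reusable one-touch menu choice;
# a fat blue instance coexists beside it rather than overwriting it.
thin_red = MenuInstance("thin red line", draw_line,
                        start=(0, 0), end=(10, 10), width=1, color="red")
fat_blue = MenuInstance("fat blue line", draw_line,
                        start=(0, 0), end=(10, 10), width=8, color="blue")

print(thin_red.activate())
print(fat_blue.activate(start=(5, 5)))   # same instance, new position
```

    The point of the sketch is that neither instance destroys the other: both remain available as first-class menu choices, which is exactly what the state-based tree cannot offer.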

    Consider that a configuration of the system may be made by the user and without the intervention of the programmer. One complexity facing the uninitiated user of the usual menu system is the variation between each choice in the system: some require further actions, some do not; these actions are generally different in each case and require strictly correct responses on the user's part. An alternative would be the presentation of only a small set of choices, each of which is consistent with its neighbors. For example, a PAINT system which started with only four buttons, each of which draws a line of different color and width, would be relatively easy to teach. Once familiar, the user would still have access to the full set and also be able to arrange them according to design or to whim.

    It is fair to say that coherent action involves goals, whether well-specified or not, fixed or not. User transactions at an interface involve goals, and certainly a "user-oriented" system must be sensitive to this.

    The action of "menu choice" may be best understood as an attempt to complete a goal, or more likely, as one step toward the completion of a goal. Goals operate at many levels of abstraction and process. Decision and design have goals and without goals one can neither decide nor design.

    Yet, nearly all available menu systems totally ignore goal-directed activity. This is part of the tradition in computer science which places heavy emphasis on procedural descriptions (executable code in sequential machines). This encourages application builders to provide systems where the only possible transaction is to execute a procedural description: to hit a menu button, to type a command. Multiple-level structures in which goals interact with procedures may be expressed in a few well-known systems (Hewitt 1972), but their straightforward application to user interfaces and menu systems has been missed.

    The "instancing" scheme described above is one simple step toward such an improvement. The dynamic status of the function (its parameter settings) is captured and instanced and made available; insofar as this simple sequence is a goal or fragment of a goal, the system has captured it for subsequent use. The management and aggregation of experience by the interface as a focused activity which converges on "the design" (rather than merely proliferates previous and unwanted trials) is the topic of a sequel paper.


    Many of us have spent hours laboriously demonstrating and tutoring new users in the use of our software systems. We have pondered over the problems of over-the-shoulder coaching, lengthy manuals that are impossible to index, and the ubiquitous "help" facility. "All you need to know is how to use the help command" is a frequent but ridiculous excuse for programs that attempt to explain themselves.

    Given a need to arrange the repertoire of pre-defined functions in a menu-driven system, it makes sense to have all tutoring correspond to that hierarchy. Put another way, there are two fundamental aspects of human/machine interaction: commands to execute procedures ("do this") and descriptive transactions ("what does this do"); and they may flow in both directions between user and machine. What is required at the interface is a means to bring both these aspects into one-to-one (or many-to-one) correspondence.

    A formal calculus has been developed which establishes this correspondence (Pask 1979). Machine-based systems have been constructed for education and training which rely on a knowledge representation scheme called "entailment meshes" (Pask 1976, Pangaro 1981). This term is a good characterization of the structures involved, in two ways.

    First, the interconnections between topics (the "do-ables" and "know-ables" of the domain) are not hierarchical; they are a heterarchy, that is, there are no inherent "higher" or "lower" entities. Hierarchies emerge out of the mesh only during action and interrogation as a result of transactions with a user.

    Second, connections or relations between topics cannot be arbitrary and ubiquitous; they must follow the rule of entailment. Consider the example:

    The surrounding boundaries indicate topics which "entail" each other; that is, to understand/describe/communicate/construct a given topic, all neighbors from at least one neighborhood are required. The neighbors are a necessary and sufficient condition. And there is more than one way to comprehend a given topic. Above, for example, Line-Attributes may be derived/entailed/understood in terms of Width and Color. Or, in a slightly different case, Line-Attributes may be understood in terms of Line-Position and Line, which means that Line-Attributes are distinguishable in the context of Line and Line-Position.
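
    As a minimal sketch (in modern code, with topic names taken from the paper's example), an entailment mesh can be held as a mapping from each topic to its alternative neighborhoods — each neighborhood a set of neighbors jointly necessary and sufficient to derive the topic:

```python
# Sketch of an entailment mesh: each topic lists alternative "neighborhoods",
# i.e. sets of neighbors that jointly entail it. The structure is a
# heterarchy: no topic is inherently higher or lower than another.

mesh = {
    "Line-Attributes": [{"Width", "Color"},
                        {"Line-Position", "Line"}],   # two ways to derive it
    "Line":            [{"Line-Position", "Line-Attributes"}],
    "Line-Position":   [{"Start-Point", "End-Point"}],
}

def derivations(topic):
    """All alternative neighborhoods from which a topic may be understood."""
    return mesh.get(topic, [])

# There is more than one way to comprehend the same topic:
for neighborhood in derivations("Line-Attributes"):
    print(sorted(neighborhood))
```

    Note that "hierarchy" appears only when a particular topic is interrogated: choosing Line-Attributes as the focus makes Width and Color look "lower", but choosing Width as the focus would invert that view.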

    This example points up two aspects of the technique. First, the apparent hierarchy is misleading because topics become "more primitive" than others only as a result of dynamic interaction. The level of a topic is a subjective view, and relative to a desire for action or a desire to comprehend. Both kinds of goal consider the relations between an entity and its context, where context may have a variety of possible interpretations.

    Second, there is a fundamental requirement that the entailment mesh must contain entities which are distinct by virtue of their relations with other entities. It is this structure of knowledge which gives it sufficient consistency to remain stable and provide mechanisms for phenomena such as "memory."

    The complete knowledge representation calculus contains many significant rules for manipulation, including analogy elicitation, the automatic creation of generalizations and exemplars, and the orderly increase in the number of relations before a point of ambiguity or contradiction in structure. There are also conditions under which further structure must be provided either by the user or the system in order to resolve "confusion."

    One primary advantage of the application of entailment meshes to the machine interface is complete correspondence between the command aspects of user interaction ("do this") and the tutoring aspects ("what is this?"). This is evident from the following sample series of transactions:

    A user indicates "What-" "Line". (The following description is independent of the question of how the user indicates these particular topics beyond saying that they are displayed on a screen and picked, as if from a menu.) The system responds with a demonstration based on the two further topics, Line-Position and Line-Attributes. For example, the following text appears, accompanied by an appropriate motion graphic: "Line is a function whereby Position and Attributes are indicated by the user and a line is drawn on the screen." Helpful, but not the entire story so far.

    The user then indicates "What-" "Line-Attributes". The system responds: "Lines have the attributes of Width and Color." Seeing a palette at the screen bottom, the user presumes that color is chosen directly; touching a shade of gray in the palette causes the appearance of the Color topic on the menu screen to change; not because the topic is concerned with Color but because the goal of Color has been achieved. Continuing:

    User: "What-" "Start-Point". System: "Use the tablet pen to indicate the position for the line by first touching at the Start-point, then at the End-point." The user then wishes to perform this function, and indicates "Do-" "Line-Position". The next two tablet touches are interpreted, as the tutorial indicated, as the Start-Point and End-Point of the Line. At this stage, both of the Points and the Line-Position have been indicated by the user, and the appearance of each of these menu choices has been changed by the system to indicate they have been achieved as goals (or, more appropriately in this context, sub-goals).

    Nothing further happens at this time. In every Paint-type system I have seen, these two transactions terminate the Line activity by placing the Line into the picture. In the "Do-What-Do" (copyright 1979) system, however, the requirement is for all entailed topics to be specified before any other topics are "achieved." Thus there are two basic ways in which goals are achieved: directly, by "doing", if they are parameter settings; and indirectly, by achieving all of the required sub-goals. In this case, Width is the remaining requirement for the Line to be achieved.
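
    The two routes to achievement — directly by "doing", indirectly by completing a neighborhood of sub-goals — can be sketched as follows (modern code; the mesh and the propagation rule are reconstructed from the paper's example, the function names are illustrative):

```python
# Sketch of Do-What-Do goal achievement: a topic is achieved directly by
# "doing" a parameter setting, or indirectly once every topic in one of
# its entailment neighborhoods has been achieved.

mesh = {
    "Line":            [{"Line-Position", "Line-Attributes"}],
    "Line-Position":   [{"Start-Point", "End-Point"}],
    "Line-Attributes": [{"Width", "Color"}],
}
achieved = set()

def propagate():
    """Indirectly achieve any topic with one fully-achieved neighborhood."""
    changed = True
    while changed:
        changed = False
        for topic, neighborhoods in mesh.items():
            if topic not in achieved and any(n <= achieved for n in neighborhoods):
                achieved.add(topic)
                changed = True

def do(topic):
    """Directly achieve a parameter-setting topic, then propagate upward."""
    achieved.add(topic)
    propagate()

# Replaying the paper's transaction sequence:
for t in ["Start-Point", "End-Point", "Color"]:
    do(t)
# At this point Line-Position is achieved, but Line is not: Width remains.
do("Width")   # now Line-Attributes, and hence Line, are achieved
```

    The while-loop matters: achieving Width completes Line-Attributes, and that completion in turn completes Line, exactly the cascade the text describes.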

    User: "What-" "Width". System: "The Width is set by two tablet touches, the second Offset from the first by the desired Width of the Line to be drawn." In this case there is mere detail needed to clarify an otherwise obscure and terse sentence. Such clarification resides in consistent fashion inside the knowledge representation, to be revealed upon further questioning by the user, as in "What-" "Offset". The system responds, again with a short animation as well as text, by demonstrating that "After the first touch is made, the tablet pen should be Offset, or moved aside on the tablet, just the distance desired for the Width of the Line." The user then indicates "Do-" "Width"; the next two tablet touches are then interpreted as the Width parameter for Line; the appearance of the word Width on the menu changes to indicate that a valid setting has been achieved; and so does Line since it is a goal which has been achieved. At this stage also, the Line appears somewhere on the screen in the indicated Position with the specified Attributes.

    There are some subtleties in the methods of Do-What-Do. It is important to have many ways in which to achieve a goal, perhaps many alternate input methods and even user-defined methods and further goals. Also, once the system is given a goal, it actively seeks ways to achieve it. If Line-Position were also contained in another boundary with Type-Start-Point and Random-Vector, there would be more than one way to achieve Line-Position. Suppose Line-Attributes were already set from previous interaction, and that Random-Vector is provided by the system, meaning that it is a goal which is always achieved. After "Do-" "Line" is indicated, the system is trying to achieve what is entailed by Line by any available method; merely typing two integers on the keyboard to be interpreted as x,y Start-Point would then result in the appearance of the Line.

    The essence of the Do-What-Do method is specifically as follows:

    a) Order of specification of parameters is not pre-determined and fixed; design may emerge from the user in any order.

    b) A fully mixed initiative on the user's part, of "Do-" and "What-", is allowed. The distinction between tutoring and activating is blurred.

    c) Because the system holds a history of transactions, as to whatever topics have been tutored as well as successfully activated (i.e., without error and perhaps repeatedly), it can tutor at later moments in a way adapted to the user's previous understanding, and even based on the individual's conceptual style (Pask 1976, Pangaro 1982).

    d) The Do-What-Do system is not based on states; rather, it is goal-oriented. This means that the interconnections represent conditions which interact, and all requirements to achieve goals are easily seen from the displayed structure.

    Recalling the advantages of instancing as opposed to state menu systems, note that a further result of the transactions in the previous section is the appearance of a further topic in the heterarchy of the user: the topic which represents the procedure of "Line with specific parameter settings." A further indication at this or a later time to "Do-" "that-there-instance-of-Line" would reproduce the Line at the identical position with identical attributes. If desired, the further system function "Do-What-Did" could clear any specific parameter, say the Start-Point, by the sequence "Do-What-Did" "Start-Point". Then the sequence "Do-" "that-instance-of-Line" and a tablet touch in whatever place would reproduce the Width, Color and End-Point as before but would draw a new Line to the new Start-Point. Reflection on the generality of this will reveal a new meaning for extensible environments in the context of goal-directed systems.
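
    A minimal sketch of "Do-What-Did" (modern code; the representation of an instance as a parameter dictionary and the function names are assumptions of the sketch, not the original implementation): clearing one parameter leaves a partially-specified instance, and only the cleared slot must be re-supplied on the next activation.

```python
# Sketch of "Do-What-Did": clear one parameter of a captured Line instance
# so that only it must be re-specified when the instance is next activated.

instance = {"Start-Point": (0, 0), "End-Point": (10, 10),
            "Width": 1, "Color": "red"}

def do_what_did(inst, topic):
    """Clear a single parameter; the rest of the instance is retained."""
    cleared = dict(inst)
    cleared[topic] = None
    return cleared

def activate(inst, supplied):
    """Reproduce the instance, taking fresh input only for cleared slots."""
    return {k: (supplied[k] if v is None else v) for k, v in inst.items()}

partial = do_what_did(instance, "Start-Point")
# A tablet touch supplies the one missing parameter; Width, Color and
# End-Point are reproduced exactly as before.
new_line = activate(partial, {"Start-Point": (5, 5)})
print(new_line)
```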


    It has been disturbing to me for some time that popular terms such as "user friendly" and "English language interface" have been so long unchallenged for their imprecision and inaccuracy. I will sketch some early tries at characterizing with clear metrics what these vague terms are attempting to capture.

    A "transaction" is defined to be the minimal actions of the user to invoke a unitary procedure in a serial, digital computer. The transaction is considered "complete" when execution of the associated process is begun, whether or not there are further transactions which the user may perceive as continuous interaction.

    The term "interaction" is reserved for a sequence of transactions which achieves a goal or set of goals of the user. The goal(s) need not, and often is (are) not, clearly known beforehand. Note that the beginning and end of interaction is not clearly defined, although, for specific and local goals, delimiters based on transactions may be useful but not strictly appropriate.

    "Conversation" is duly reserved for the formal development in Pask (1975); the unified framework of objective and subjective transactions provides a mechanical means for measuring that communication at an interface has taken place. The strict interpretation of the paradigm as described above for Do-What-Do in tutoring mode is consistent with this formalism. The essence is that shared understandings between the user and the system can be shown to converge to stable concepts, both for the pre-existing system operations (so-called "tutoring") and the goal structures which are imbedded in the interface (so-called "individualization").

    The proposal here is for a "metric of utility" based on the above definitions. It is likely that the most fruitful approach would involve the concept of self-organisation, and such a measure is contemplated. For the present, a measure consistent with information theory is proposed.


    Imagine for simplicity a fixed set of choices on a menu. One crude measure of the number of bits required (in the Shannon sense) is simply the base-two logarithm of the number of choices.

    Presume a menu in a graphically-based system in which there are 16 possible modes to choose from. "DO-" "draw" is chosen by the user. Assume this requires 4 bits of distinction (although a careful ponder on the question "How many bits are required to express one point using a tablet pen in a 1024 x 1024 bit field?" reveals the real limitations of such Information Theoretic measures.)

    Suppose that this pick takes 1 transaction; a single tablet hit on the menu button. Suppose further that this reveals the menu choices of Line, Circle, Rectangle and Blotch modes; picking Line mode is an additional 2 bits with 1 further transaction. Let us say that to set the Line parameters of Start- and End-Points, Width and Color requires an additional 5 transactions which specify roughly 50 bits of data.

    Let us therefore say that relative to this interface and the indicated goal, the "Metric of Utility" is equal to the number of specified bits divided by the number of transactions required; in this case, the ratio of 56 to 7, or 8. This is a crude measure of the overall utility of the system in achieving the stated goal. Matters of processing time and ease of performing the transactions are simple and difficult to quantify, respectively, and are subordinate to the main point.

    Here now is the essence of the matter: Assume that further Lines of the same Width and Color are desired, but in different Positions. It is therefore necessary to backtrack by indicating "Do-What-Did" "Line Position" (4 bits in 2 transactions, say). New Lines will appear when this instance is activated and 2 Points chosen (38 bits in 3 transactions). The new Metric of Utility is 4 + 38 to 2 + 3, or 8.4. On the next occasion when no further backtracking is required, it is simply 38 to 3, or 12.6.

    It is unfair to consider these later values independent of the first; hence a further ratio is derived from the earlier figures, dubbed the "Metric of Individualization", e.g., 8.4 to 8 = 1.05; 12.6 to 8.4 = 1.5.
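
    The worked figures above check out directly (the numbers are the paper's own; the function names are illustrative, and the text's 12.6 is the truncation of 38/3 ≈ 12.67):

```python
# Reproducing the worked figures for the two proposed metrics.

def metric_of_utility(bits, transactions):
    """Specified bits divided by the transactions required to specify them."""
    return bits / transactions

first = metric_of_utility(56, 7)            # full Line specification: 8.0
second = metric_of_utility(4 + 38, 2 + 3)   # with Do-What-Did backtrack: 8.4
third = metric_of_utility(38, 3)            # no backtracking: ~12.67

def metric_of_individualization(later, earlier):
    """Ratio of successive utilities: amplification from individualization."""
    return later / earlier

print(first, second, third)
print(metric_of_individualization(second, first))   # 8.4/8 = 1.05
print(metric_of_individualization(third, second))   # ~1.5
```

    A memoryless interface keeps both ratios at unity; any value above 1 measures the amplification that instancing and extensibility provide.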

    When an interface has no memory whatever, this measure must remain at unity. When there is memory, of which parameter memory and extensibility are simple examples, the measure increases in value as the system becomes "individualized." This is a rough measure of the amplification achieved by the features of the system.

    I propose these measures in their primitive form to challenge the development of appropriate and useful metrics, which are an essential but missing component of interface research.


    Arguments for the power of extensible systems are well established; an argument-by-example was constructed in the context of animation environments (Pangaro, Steinberg, Davis and McCann, 1977). It is occasionally noted that the added complexity of training in the use of extensible systems may be too great for naive users. The implication of the metrics suggested above is that without extensibility, the utility of a given system is of constant value, that is, it does not increase with usage. The person may become more accustomed to "thinking along the lines of" the provided functions and hopefully will become "faster" at using the system. However, without extensibility the system cannot be tailored to individual needs.

    Extensibility which does not account for goal-directed activity on many levels of discourse does not allow true extension of the individual into that medium. If I now say to you that the whatsis is missing from the framls because you can't imbed the kinesis, then you will understand me in the context of this paper. Until the whosis goes inside the whatzit the me-go is too slow.


    The following positive results are obtained by application of concepts described in this paper and characterized by "Do-What-Do":

    a) A new power in interface design is revealed by the application of practical approaches to human communication: the incorporation of goals into the interface rather than at it can redefine the meaning behind extensible systems.

    b) There can be unity in the self-tutoring aspects of the system (normally only fudged by "help" commands); hence the user can ask the system "what is this function" in the same vocabulary as "do this function", the vocabulary here contained in the underlying knowledge representation.

    c) The system can track performance history to know how best to present tutorial material based on the known shared vocabulary of user and system.

    d) The system, to be both efficient and responsive to individual users, must be capable of incorporating new definitions. When coupled to capabilities for instancing, the ability to define goal structures, and dynamic seeking to achieve goals, a new meaning for individualization in interaction and interface design is achieved.


    Bolt, R.A. Touch Sensitive Displays, DARPA Report, MIT Architecture Machine Group, March 1977.

    Bolt, R.A. Spatial Data-Management, DARPA Report, MIT Architecture Machine Group, March 1979.

    Hewitt, C. "Description and theoretical analysis (using schemata) of PLANNER", Report No TR-258, A.I. Laboratory, MIT, 1972

    Pangaro, Steinberg, Davis and McCann. "EOM: A Graphically- Scripted, Simulation-Based Animation System", Architecture Machine Group, MIT, 1977.

    Pangaro, P. "CASTE: Course Assembly System and Tutorial Environment/A Short History and Description", System Research Limited, Richmond, Surrey, UK, 1981.

    Pangaro, P. "Overview of the CASTE 'AU' System and User Documentation", System Research Limited, Richmond, Surrey, UK 1982.

    Pask, G. Introduction to Chapter 1, "Aspects of Machine Intelligence", Soft Architecture Machines, by Nicholas Negroponte, MIT Press, 1975.

    Pask, G. Conversation Theory, Applications in Education and Epistemology, Elsevier, Amsterdam, 1976.

    Pask, G. "A Proto-Language", System Research Limited, Richmond, Surrey, UK, 1979.

    Pask, G. and Pangaro, P. Entailment Meshes as Representations of Knowledge and Learning, Conference in Computers in Education, Cardiff, Wales, 1980.

    Steels, L. "Procedural Attachment", A.I. Memo 543, A.I. Laboratory, MIT, August 1979.

    Sutherland, I. "SKETCHPAD: A Man-Machine Graphical Communication System", AFIPS Conference Proceedings, Spring Joint Computer Conference, 23, 347-353, 1963.

