DoxPara Research
01-Aug-1998 / Dan Kaminsky
Cluehunting: A Proposal Regarding The Intelligent Use of Available Data In The User Interface, Version 1.0
Where Cluehunting Comes From

(Editor's Note: I have received an impressive and incredible amount of email regarding cluehunting, and I thank everybody who mailed me. Much of the text here needs to be rewritten to accommodate the lucid and honestly surprising quantity and quality of research people put into advancing this proposal. Some of the material on the true history of cluehunting also needs to be revised. Bear with me.)

Cluehunting is an advanced Expansion Agent, defined as a system that lets the computer search for possible "expansions" throughout given contexts, given a "clue" from the user. Clues are segments of data (type irrelevant) that the computer can use to predict what the user intends to enter. Expansions are the presumed intentions of the user. Finally, contexts are the "search space" being scanned--the file system context, the launcher context, or even a thesaurus/spell check context are all valid options.

It would be completely unfair to describe cluehunting as a totally original concept--it stands, if you will, on the shoulders of giants. Tab-completion is the oldie, and as far as I know originated with the Unix shell tcsh, though it's also a hidden option in the NT command shell. This technology is quite file-system specific: Enter as much as you know about a path, starting with the root, and tab complete will expand what you type to fit. For example--enter /usr/home/eff and hit tab, and you will be given the first entry in /usr/home/ that begins with "eff". Some limited regular expressions are allowed--for example, if I'm in the directory /usr/home/effugas and type unzip *.zip, I will be able to tab through each zip file in my home directory. Very slick.

Tab Completion is nice, but it has its flaws. First of all, tab has become the de facto standard for "advance to next field" in GUIs, and there's no way I want to get rid of one of the best keyboard timesavers in existence. Secondly, it searches files and only files. There are other search contexts that should be hit. Finally, tab-complete provides no way to expand into anything but a single entry--what if the user didn't want just one of the group, but wanted to expand into all entries that fit the given form? In other words, what if, instead of just one zip, all of *.zip were inserted? That would be logical in a number of situations.

Tab Complete's newborn sibling, Autocomplete, was a web browser innovation that began at the much maligned UI shop known as Microsoft and was later adopted by Netscape for its Communicator browser. (To be as fair as possible, the emacs editor includes substantial autocomplete facilities. I am referring here to the fact that this was the first implementation of autocomplete for ordinary users, and as far as I know the first among the thousands of Windows apps of the last few years.) Because Microsoft integrated Internet Explorer and Windows Explorer, both the Run Dialog and the Web Open Dialog possess Autocomplete functionality. (Actually, Microsoft Word will also Autocomplete anything you type that is related to a few known categories, e.g. date, author name, etc., but I'll deal with this later.)

So what does this bring to the table? Well, we see the beginnings of clue contexts showing up here, since at first glance it appears that the run menu will autocomplete files and the web browser will autocomplete web sites. But these are both searches of the same clue context--the history context, in which things that have been typed before are called back to be expanded back into reality. And how does Autocomplete expand entries? In the middle of typing, inverted text will appear containing what the computer guesses the user is trying to get at. This text will only extend to the next valid level--http:// will expand to http://www.best.com, but it will not expand to http://www.best.com/~effugas nor http://www.best.com/~effugas/Personal/SILC/silc.html. There's no real way to scroll through possible entries in this history-based autocomplete--the first thing that matches will be matched to its first level, and that's all you get. (Ed Note: Holding shift and arrow down lets you scroll through possible autocompletes on Netscape.) Worse, sometimes a pause in typing is required simply to trigger an autocomplete. Still, this functionality is total joy, even with all of its warts.
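The "next valid level" behavior described above can be sketched in a few lines. This is only an illustration of the idea, not how any browser actually implements it: the function names, the most-recent-first history list, and the rule that a level ends at the next "/" after the scheme are all assumptions made for the example.

```python
from typing import List, Optional


def next_level(entry: str, typed: str) -> str:
    """Cut a history entry at the first '/' past what the user typed.

    Assumption: a "level" ends at the next '/' after the scheme's '//'.
    """
    scheme_end = entry.find("://")
    after_scheme = scheme_end + 3 if scheme_end != -1 else 0
    # Never cut inside the text the user has already entered.
    search_from = max(after_scheme, len(typed) + 1)
    cut = entry.find("/", search_from)
    return entry if cut == -1 else entry[:cut]


def autocomplete(typed: str, history: List[str]) -> Optional[str]:
    """Return the first matching history entry, expanded one level only."""
    for entry in history:  # assumed to be ordered most recent first
        if entry.startswith(typed) and entry != typed:
            return next_level(entry, typed)
    return None


history = [
    "http://www.best.com/~effugas/Personal/SILC/silc.html",
    "http://www.best.com/~effugas",
]
print(autocomplete("http://", history))               # http://www.best.com
print(autocomplete("http://www.best.com/", history))  # http://www.best.com/~effugas
```

Note that only the first match is ever offered, which is exactly the limitation complained about above: there is no way to step to a second candidate.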

What's New In Cluehunting

Cluehunting specifies the following advancements beyond present-day expansion technology:

  1. Universal Expansions
  2. Inputstream Aware Expansion Styles
  3. Application-Dependent Clue Contexts
  4. Pluggable Context Indexes
  5. Clue Context Overrides
  6. Regular Expressions
  7. Batch Expansions
  8. Cluelists
  9. General Accessibility

Definitions help, of course:

  1. Universal Expansions: Expansion should be available in all interface components. The primary limitation of present expansion methods is that they can't really be available everywhere. Cluehunting is designed to allow every interface construct to read the intentions of the user. It is the purpose of the next eight points to make sure that this works, and works well.
  2. Inputstream Aware Expansion Styles: Segmented streams of input data ought to implement commanded expansion, while unified inputstreams may take advantage of automatic expansion. A little background is necessary to understand this. First, you can't outclass something you can't recognize the class of. That being said, let's talk about Microsoft's UI department. Take Microsoft Word 95/97. Red and green spelling and grammar warning underlines are excellent interface components. They're unobtrusive enough to ignore in the heat of thought, yet visible enough to make it difficult to miss misspelled or inappropriate words. I miss them any time I type in anything else. They enhance the feedback loop of the inputstream. The inputstream is defined as the flow of commands from the user to the computer, plus any information fed back along the same channels as the input--for example, a clock in the lower right hand corner is not part of the inputstream, but the characters that appear in response to keys being pressed on the keyboard are. What does not work in Word, however, is Autocomplete. When I type Dan, I'm not always talking about myself, and when I type August, I'm not always talking about the present date. I don't want to have to interrupt my stream of thought to correct Word--my concepts are segmented into words, sentences, paragraphs, and full documents. This contrasts sharply with the very appropriate and useful usage of autocomplete for web sites, which have addresses that are single-phrase and thus unified. Therefore, while Word and any other segmented inputstream receiver ought to require a key to be pressed before the phrase is expanded (though a graphical hint like a different cursor would help), Netscape should attempt to expand automatically. NOTE: Research is required to make sure this inconsistency does not overly confuse users. It is very possible that automatically triggering an expansion in unified instances but delaying expansions in segmented cases is utterly confusing to users. In that case, I'd lean towards a completely delayed expansion interface.
  3. Application-Dependent Clue Contexts: Applications should search multiple clue contexts appropriate to the active application context. Strange words coming from someone who worships consistency in user interfaces, but I really think this is necessary. Applications generate context, and all clues should not expand from some single chosen source. For example: Suppose I enter the word "liffe" into a word processor. The ideal word processor would notify the user immediately and non-intrusively that the word was misspelled. Obviously, the appropriate clue context for a misspelled word is a search through alternative correct spellings. Multiple presses of the Continue Cluehunt keybinding would step through alternative spellings, until the user pressed either the Cancel Cluehunt keybinding (probably Escape) to revert to the misspelled form or the Cluehunt Successful keybinding (probably Enter). The user could, of course, reselect the correctly spelled word, and this time search through the default context for a correctly spelled word: the thesaurus. So, life would be replaced with various synonyms--or, the thesaurus dialog could come up to provide a multidimensional search between life-as-vocation, life-as-socialness, or life-as-complete-lack-thereof. All that cluehunting specifies is a precondition and a postcondition--dialogs do not violate this. It would be preferable if these weren't modal dialogs, however--it is rarely appropriate for the user to be locked out of his or her document.
  4. Pluggable Context Indexes: Clue contexts, either attached to an application or independent, should register themselves with a central index. This index of clue contexts would be categorized either by type or by owner application, would have MRU (most recently used) lists, and would be reconfigurable by the user.
  5. Clue Context Overrides: The user should be able to specify a specific clue context to expand from, in either a proactive manner or a reactive manner. Despite the fact that applications often have contexts that make sense, there are times when the user has another context in mind. For example, the user should be able to access the Thesaurus context while saving a file, or the filesystem context while documenting an application, or the web history context while creating a web page of links. This would be implemented with a Set Clue Context keybinding which would modify the present word's clue context--a reactive override. If the user had not yet typed a word, the next word would be the recipient of the entered context--this would be a proactive override. Contexts would be registered upon install as per the plug-in clue context interface, and manipulable via a replaceable dialog. Most probably, some degree of categorization would be appropriate, as well as expansion on the clue context type itself. (In other words, a box would be given, and you'd type in Th and Thesaurus might come up.) Of course, common clue contexts should be automatically recognized. A user typing in a path in any application, for example, should usually first trigger the file system history context, and then the literal file system search context. Similar results should await a user typing http://. However, there is an advantage to being able to select a context. By selecting the Execute Command context, the user could load any app directly from within any other app and have the stdout reply be pasted at the cursor. Much like ircii's /exec command, this would allow the contents of, say, an ls to be directly pasted at the cursor. Quite nice.
  6. Regular Expressions: Regular Expressions should be available for use in clue expansions. Many users are familiar with using * to signify a wildcard. While the default expansion would, in general, presume a * at the end of the provided clue and expand from there, there is no reason this is necessary. A user searching for dictionary words that end with "sort" should be able to expand *sort into resort, consort, and plain old sort. The only problem is how to differentiate between a clue containing a regex for search purposes (execute context for ls -l *.gz) and a clue that wants its regex expanded before search (command history context for ls -l *.gz). It's quite probable that most contexts will only fit one or the other, but I'm unsure. Email me if you think that a specific "begin regex" keybinding would be necessary.
  7. Batch Expansions: All entries that fit the provided clue should be available for simultaneous expansion. Through an "expand all" keybinding, the contents of all clues that fit the given context should be pasted at the cursor. This facilitates things such as "gunzip *.gz" being expanded into a list of all files to be gunzipped, allowing the user to make sure the shell was expanding the list correctly, among other uses.
  8. Cluelists: All entries that fit the provided clue should be listable in a multiselectable, sortable dialog. In some ways, a basic version of this is part of Microsoft Word 97: Right click on a misspelled word and note the four or five alternate correct spellings right there in front of you. Most GUI web browsers also allow you to search the typed-in history by clicking on the down arrow at the far right of the entry bar. Cluelists extend this behavior by giving the user a listmode or detailsmode (more windowspeak, so shoot me) interface to select between multiple options for expansion. Suppose the user wants to gunzip a couple of his or her .gz files. Simply typing gunzip *.gz inside of a cluehunt-enabled xterm and pressing the "cluelist" keybinding would generate a window containing a list of all files ending in ".gz". Then, the user would control-click or shift-click the specific gzipped files desired to be expanded, press OK, and hit enter to cause those files to be gunzipped.
  9. General Accessibility: All capabilities of cluehunting must be accessible by mouse as well as by keyboard. It is critical that Cluehunting be part of a self-documenting interface, defined as an interface that bolsters the user's understanding and mapping of available options. One major way to make an interface self-documenting is to provide multiple paths to the same destination that reference each other. Right-clicking on a batch of text should either bring up a single menu item containing "cluehunt" or a list of all the cluehunting options directly in the root right-click--research will be necessary to see which is preferable. Now, of course, each entry in the right-click menu would contain the keyboard shortcut right-justified, and the corresponding shortcut would be listed in the keybox (dev-note: Will be explained in upcoming proposal). Pretty slick.
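Several of the points above (a central index that contexts register with, user-selected context overrides, wildcard clues, and batch expansion) fit together naturally, and a small sketch may make that concrete. Everything here is hypothetical: ContextIndex, wordlist_context, and the sample entries are names invented for the illustration, and the wildcard matching uses shell-style patterns from Python's fnmatch module rather than full regular expressions.

```python
import fnmatch
from typing import Callable, Dict, List

# A clue context, in this sketch, is any function from a clue to its
# candidate expansions, in the order they should be offered.
Context = Callable[[str], List[str]]


def wordlist_context(entries: List[str]) -> Context:
    """Build a context over a fixed list of entries (words, file names, ...)."""
    def expand(clue: str) -> List[str]:
        # With no wildcard in the clue, presume a trailing "*".
        has_wildcard = any(ch in clue for ch in "*?[")
        pattern = clue if has_wildcard else clue + "*"
        return [e for e in entries if fnmatch.fnmatchcase(e, pattern)]
    return expand


class ContextIndex:
    """Central index that clue contexts register with (hypothetical API)."""

    def __init__(self) -> None:
        self._contexts: Dict[str, Context] = {}
        self._mru: List[str] = []  # most recently used first

    def register(self, name: str, context: Context) -> None:
        self._contexts[name] = context

    def expand(self, name: str, clue: str) -> List[str]:
        """Expand a clue in the named context; naming one is the override."""
        candidates = self._contexts[name](clue)
        if name in self._mru:
            self._mru.remove(name)
        self._mru.insert(0, name)
        return candidates

    def expand_all(self, name: str, clue: str, separator: str = " ") -> str:
        """Batch expansion: every match, joined for pasting at the cursor."""
        return separator.join(self.expand(name, clue))

    def mru(self) -> List[str]:
        return list(self._mru)


index = ContextIndex()
index.register("dictionary",
               wordlist_context(["resort", "consort", "sort", "sorted", "life"]))
index.register("filesystem", wordlist_context(["a.gz", "b.gz", "notes.txt"]))

print(index.expand("dictionary", "*sort"))     # ['resort', 'consort', 'sort']
print(index.expand_all("filesystem", "*.gz"))  # a.gz b.gz
```

Treating a context as a plain function keeps the index indifferent to where candidates come from, so a thesaurus, a history list, or a live directory listing could all register the same way.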
Default Cluehunting Keybindings

(Editor's Note: I have some association with the GNOME project, which hopefully will end up creating a world-class User Interface for Linux and other Unix systems. Nothing official, anymore.) Well, I'll be blunt: We're still working on a default keyspace for GNOME-compliant apps. However, the following is a preliminary set of keybindings for cluehunting:

          + Cluehunt Forwards:  Alt-Shift-Right Arrow
          + Cluehunt Backwards: Alt-Shift-Left Arrow
          + Accept Cluehunt:  Anything that moves the cursor.  Enter has
            its functionality modified to not clear the contents of the
            expansion.
          + Reject Cluehunt:  Esc
          + Expand All:  Alt-Shift-Enter
          + Scroll Through Cluelist:  Alt-Shift-Up and Down.
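As a rough sketch of how these bindings might drive a single cluehunt, the state involved is small: the original clue, the candidate list, and a position. The class name, the key strings, and the sample spellings below are all assumptions made for illustration; nothing here reflects an actual GNOME keyspace.

```python
from typing import List


class CluehuntSession:
    """Tracks one clue and the candidate currently shown in its place."""

    def __init__(self, clue: str, candidates: List[str]) -> None:
        self.clue = clue
        self.candidates = candidates
        self.position = -1  # -1 means the original clue is still displayed

    def display(self) -> str:
        if self.position < 0:
            return self.clue
        return self.candidates[self.position]

    def forward(self) -> str:
        if self.candidates:
            self.position = (self.position + 1) % len(self.candidates)
        return self.display()

    def backward(self) -> str:
        if self.candidates:
            if self.position < 0:
                self.position = len(self.candidates) - 1
            else:
                self.position = (self.position - 1) % len(self.candidates)
        return self.display()

    def reject(self) -> str:
        self.position = -1  # revert to what the user originally typed
        return self.display()

    def expand_all(self) -> str:
        return " ".join(self.candidates)


# Key names are illustrative strings, not real toolkit key codes.
KEYMAP = {
    "Alt-Shift-Right": "forward",
    "Alt-Shift-Left": "backward",
    "Esc": "reject",
    "Alt-Shift-Enter": "expand_all",
}


def handle_key(session: CluehuntSession, key: str) -> str:
    action = KEYMAP.get(key)
    if action is None:
        # Any other key (e.g. cursor movement) accepts the current expansion.
        return session.display()
    return getattr(session, action)()


session = CluehuntSession("liffe", ["life", "lift", "lice"])
print(handle_key(session, "Alt-Shift-Right"))  # life
print(handle_key(session, "Esc"))              # liffe
```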
The Future Of Cluehunting

Cluehunting is a developed proposal, but it's still in development. Research will be needed to check for areas of confusion and functionality. Still to be determined:

  1. How to notify the user that the existing text is expandable via a cluehunt? Different cursors, different text colors, a note in the title bar...?
  2. How to implement cluehunting? One possible way is to simply have a directory structure that corresponds to individual clue contexts and contains standard stdin/stdout apps that take in the appropriate segment and spit out a return value. Implementation isn't that much of an issue, though--possibility is more relevant than methodology.
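The directory-of-programs idea in point 2 could look something like the sketch below. It is one guess at the shape, under stated assumptions: each file in a hypothetical contexts directory is one clue context named after the file, a context program reads the clue on stdin, and it writes one candidate expansion per line on stdout. None of this layout is specified by the proposal itself.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence


def discover_contexts(root: Path) -> Dict[str, Path]:
    """Map context names to programs: one file per context (assumed layout)."""
    return {p.stem: p for p in sorted(root.iterdir()) if p.is_file()}


def expand_with_program(command: Sequence[str], clue: str,
                        timeout: float = 2.0) -> List[str]:
    """Feed the clue to a context program and collect its expansions.

    Assumed protocol: clue on stdin, one expansion per stdout line.
    """
    result = subprocess.run(
        list(command),
        input=clue + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,  # a hung context should not freeze the interface
        check=True,       # a failing context raises CalledProcessError
    )
    return [line for line in result.stdout.splitlines() if line]
```

A caller would look up a program with discover_contexts and pass its path as the command, for example expand_with_program([str(contexts["thesaurus"])], "life"), assuming such a program exists and is executable.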
Mission
DoxPara Research exists as a repository for information security analysis, UI theory, and the miscellaneous writings of its founder, Dan Kaminsky.
