Before attempting to follow these thoughts about requirements, it will be
very helpful to consult the outline example (1.3
MB). Reviewing MS Word's outlining feature, and its ability to toggle
between outline view and normal view, will help make the following much clearer.
It will also be necessary to review this glossary first.
The first Textop subproject to launch will probably be
the Collation Project, because (at least
according to the current plan) it will generate the outline other projects will
use.
It will help to review the screenshot first.
Very generally, this interface allows a contributor to divide a text into chunks and then put the chunks into an outline, which is itself editable. The outline, the chunks found at a node of the outline, the text, and the metadata for a selected chunk might all live in different columns of the Content Contribution Interface. Each column should be collapsible and resizable.
In column 1, the leftmost, is The Outline, although at any time only a small part of it can be displayed. Only node headings, not chunks, are displayed here. The outline is displayed in only one specific language at a time. It should be possible to edit node headings using a click-and-drag interface. Changes to the outline are trackable via a “Recent Changes” log, as in MediaWiki. Since node headings and their positions must be subject to constant change, each node should have a unique ID (usually hidden, of course, from the user). There might, perhaps, be a displayable node history and a forum of some sort attached to the IDs.
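The requirement that node headings and positions can change while node identity stays fixed can be sketched as follows. This is only an illustration under assumptions: the names `Node`, `Outline`, `rename`, `move`, and `recent_changes` are hypothetical, and the in-memory dictionary stands in for whatever storage the real system would use.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int                     # stable identifier, normally hidden from the user
    heading: str                     # editable display text
    parent_id: Optional[int] = None  # position in the outline; may change


class Outline:
    """Hypothetical in-memory outline with a Recent Changes log."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.nodes: dict[int, Node] = {}
        self.recent_changes: list[tuple[int, str, str]] = []  # (node_id, action, detail)

    def add(self, heading: str, parent_id: Optional[int] = None) -> int:
        node_id = next(self._ids)
        self.nodes[node_id] = Node(node_id, heading, parent_id)
        self.recent_changes.append((node_id, "add", heading))
        return node_id

    def rename(self, node_id: int, heading: str) -> None:
        self.nodes[node_id].heading = heading
        self.recent_changes.append((node_id, "rename", heading))

    def move(self, node_id: int, new_parent_id: Optional[int]) -> None:
        # Walk up from the target to make sure we are not moving a node beneath itself.
        cursor = new_parent_id
        while cursor is not None:
            if cursor == node_id:
                raise ValueError("cannot move a node beneath itself")
            cursor = self.nodes[cursor].parent_id
        self.nodes[node_id].parent_id = new_parent_id
        self.recent_changes.append((node_id, "move", f"parent={new_parent_id}"))


outline = Outline()
politics = outline.add("Politics")
ethics = outline.add("Ethics")
natural_law = outline.add("Laws of nature", parent_id=ethics)
outline.rename(natural_law, "Natural law")
outline.move(natural_law, politics)
```

Because chunks, histories, and forums would refer to `node_id` rather than to the heading text or position, renaming or moving a node leaves those references intact.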
In column 2 are The Chunks that are found at the node currently highlighted in column 1. When someone clicks on a node in the outline, then the chunks associated with that node are displayed (this might be just the default behavior). This column might have various useful bits of information for the content contributor, e.g., node history and node forum might be displayed here, as well as an account of what properly falls under the node, or the node's definition. Also, when one clicks on a text reference displayed under a chunk (see outline example), this should (by default?) center the display in the next (third) column, the text, on the part of the text from which the chunk is taken.
In column 3 is The Text; for example, Hobbes’ Leviathan.
Actually, there should be three text displays: editor view, markup
view, and normal view;
but the editor view, which displays the text in an easily readable and
selectable format, is displayed by default in the Content Contribution
Interface. (The other views will be discussed below.) When
an editor selects some part of the text (that has not already been
chunked), it brings up new metadata
fields in the rightmost column. It should be clearly and
elegantly displayed how the text has already been chunked; so, for
example, if
Ch. 14, Para. 3 has been made into a chunk, then that paragraph should
have brackets, or colors, or some other clear "marker" showing that it
has been chunked already. It should also be possible to click on
that "marker" and bring up the metadata for that chunk, as well as the
place in the outline where that chunk currently
resides. Note, however, that it should be possible to file the
same chunk, and overlapping parts of the text (i.e., chunks that share
some but not all sentences), in different parts of the outline.
This must be borne in mind in designing how markers
are displayed. We might make an upper limit on the number of
places under which any sentence can be filed (e.g., three). Also,
there should be a Recent Changes function for the text, which displays
a list of chunks that have been created, changed, or deleted
recently. Note that, probably, a chunk should not be considered created unless it has a specified function and summary (at least).
Bear in mind that contributors should be able to change what
sentences are included in a particular chunk. For example, a
chunk might begin life consisting of just the first sentence of a
paragraph; then someone might drag the second and third sentences of
the paragraph into the chunk. Moreover, we might want to say that
the
same set of sentences can be assigned to different parts of the outline
under different summaries (that, at least, has proven to be an
occasionally useful thing to do in the outline of the
Leviathan)--unless we want to say, as we might, that the same chunk
might have different summaries, and it is the summaries that
are filed
under a node, and not chunks. Further thought is necessary here. At the very least, though, we can say that chunks
need to be editable, in which case a unique identifier of a chunk needs
to be created. The chunk's function and summary are then
associated with that unique identifier, rather than with some
sentences; in that case, which sentences are assigned to a chunk is
itself a piece of metadata.
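One way to picture a chunk whose identity is separate from its sentences is the sketch below. It assumes a hypothetical sentence identifier format such as "lev.14.3.1" (text, chapter, paragraph, sentence), a hypothetical `Chunk` class, and a limit of three filings per chunk; none of these is settled by the discussion above.

```python
import uuid
from dataclasses import dataclass, field

MAX_FILINGS = 3  # assumed upper limit on places in the outline


@dataclass
class Chunk:
    function: str            # e.g. "argument", "explanation", "description"
    summary: str
    sentence_ids: list[str]  # references into the marked-up text, not copies of it
    node_ids: set[int] = field(default_factory=set)
    chunk_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def file_under(self, node_id: int) -> None:
        """File this chunk under an outline node, respecting the assumed limit."""
        if node_id not in self.node_ids and len(self.node_ids) >= MAX_FILINGS:
            raise ValueError(f"chunk already filed in {MAX_FILINGS} places")
        self.node_ids.add(node_id)


# A chunk begins life as the first sentence of a paragraph...
chunk = Chunk("argument", "Placeholder summary", ["lev.14.3.1"])
original_id = chunk.chunk_id

# ...and later gains the second and third sentences without changing identity.
chunk.sentence_ids += ["lev.14.3.2", "lev.14.3.3"]
chunk.file_under(101)
chunk.file_under(202)
```

Here the function and summary hang off `chunk_id`, and the sentence list is just another editable field, which is the arrangement the paragraph above arrives at.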
In column 4, the rightmost, are The Metadata. Under the
current conception of the tool, the
contributor selects part of the text, and upon doing so, the software
generates a set of metadata fields attached to the selected
sentences.
Technically speaking, the selected sentences are not stored, but
rather, a unique identifier of the sentences, based on the markup of
the text; this allows corresponding sentences in other languages to be
displayed. Then some metadata fields pop up in
column 4, and the contributor specifies the function of the
selection (argument, explanation, description, etc.) and summarizes it
in a sentence. (Again, see outline example.)
The software automatically generates the text reference based on what
text is selected (e.g.: "Hobbes, Lev XXVII 3"). Finally, the
contributor saves the metadata and drags an icon, representing
the chunk (including its metadata), from the rightmost (fourth)
column to the leftmost (first) column, dropping it into the outline at
the appropriate place(s). Note that it should be easy to move chunks
around from node to node in the second column, and that the same chunk should be
able to exist in several places (but perhaps a limited number of places) in the
outline.
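The automatic text reference could be produced along these lines. This is a sketch under assumptions: the function names `to_roman` and `format_reference` are hypothetical, and the format (author, abbreviated title, chapter in Roman numerals, paragraph number) is inferred only from the single example "Hobbes, Lev XXVII 3".

```python
ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert 1..3999 to a Roman numeral."""
    if not 0 < number < 4000:
        raise ValueError("number out of range for Roman numerals")
    parts = []
    for value, numeral in ROMAN_NUMERALS:
        count, number = divmod(number, value)
        parts.append(numeral * count)
    return "".join(parts)


def format_reference(author: str, short_title: str, chapter: int, paragraph: int) -> str:
    """Build a reference string from the location of the selected text."""
    return f"{author}, {short_title} {to_roman(chapter)} {paragraph}"


print(format_reference("Hobbes", "Lev", 27, 3))  # Hobbes, Lev XXVII 3
```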
It's possible that, instead of making a separate (fourth) column for the metadata, each chunk's metadata should be displayed in some sort of popup or hovering JavaScript bubble.
The initial version of a text used in the system should be whatever
scholars regard as the canonical version, in its original
language. Other editions and translations must be includable in
the database, marked up according to the same convention, so that the
markup schemes for a text and for all its editions and translations are identical
(or mappable, anyway);
so, for example, if there is some tag identifying a sentence in the
original, there must be an identical tag in an isomorphic position identifying a corresponding sentence in the translation.
So there needs to be a markup interface, displayed to people working "behind the scenes" to mark up the text with tags indicating parts, chapters, paragraphs, and so forth--in other words, marking the structure, locations, and interrelations of the parts of the text. Some of this markup should be generated automatically, but it will no doubt need to be edited by hand. Then, when a translation is added, it is linked to the original edition and marked up automatically using the original markup as a template. This will make it possible for people viewing the outline and the chunks in it to toggle back and forth between languages. This will have many nice effects; for one thing, it will make it possible to display a text in some desired language even if there is no summary of the chunk, yet, in that language.
For purposes of marking up translations of a text, there should be
two mutually-tracking columns, the first containing the original text
and its markup, the second containing the translation and whatever
automatically-generated markup was added to it. A markup editor
then scrolls through the translation, with the original text window
scrolling along (by default) to the same positions according to the
default markup, and the markup person (who reads both languages)
does copyediting, making sure that the same tags are used for corresponding chapters,
paragraphs, and sentences. (Given the vagaries of translation,
whole sentences will probably have to be the smallest level of markup
granularity.) Finally, a utility should check that the tagging
scheme of the translation is identical to the original. It should
not be possible to start chunking any
text, original or translation, until the markup is perfect--since the
rest of the software requires that text locations are
set and not changeable.
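The final checking utility might look something like the sketch below. It assumes, purely for illustration, that each edition's markup can be flattened into an ordered list of location tags such as "ch14.p3.s1"; the tag format and the function name `markup_mismatches` are hypothetical.

```python
def markup_mismatches(original_tags: list[str], translation_tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the tagging schemes match."""
    problems = []
    original_set = set(original_tags)
    translation_set = set(translation_tags)

    for tag in original_tags:
        if tag not in translation_set:
            problems.append(f"missing in translation: {tag}")
    for tag in translation_tags:
        if tag not in original_set:
            problems.append(f"not in original: {tag}")

    # Same set of tags, but the sequences still differ.
    if not problems and original_tags != translation_tags:
        problems.append("tags match as sets but differ in order or repetition")
    return problems


original = ["ch14.p3.s1", "ch14.p3.s2", "ch14.p3.s3"]
translation = ["ch14.p3.s1", "ch14.p3.s3"]

for problem in markup_mismatches(original, translation):
    print(problem)  # missing in translation: ch14.p3.s2
```

Chunking would be unlocked for a translation only when this check returns no problems.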
All parts of all interfaces should be displayable in about a
dozen common languages to begin with. But this means that there have to be
tools for closely comparing and translating text from one language
to another. In what follows we will briefly describe some of these tools.
It may be helpful to see internationalization for relevant
policy considerations. As argued there, it will probably be necessary for there to be one "master
version" of the outline
(but not necessarily of all other content), in English. The reason
for this is that, without a master version, there will be one outline
per language, which will not achieve the remarkable and unprecedented
benefits of tearing down language barriers. Remember that outline
nodes will have unique identifiers that are independent of position or
header wording. This
means that the same node can be assigned multiple translations, and that the
reordering and renaming of nodes in the English master version will not
"break" versions in other languages.
So, first, there should be an outline translation utility. The basic functionality of this utility is that it should display the English adjacent to the target language. It should also track and display nodes according to a triage system: new and higher-level added and changed nodes are placed in a queue for translation first; minor changes at lower levels are placed further down in the queue. As long as the queue can be worked on by multiple people interchangeably, as it should be, it seems that, with enough volunteer translators, the outline can be kept up to date in all languages nearly simultaneously.
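The triage idea can be sketched as a priority queue. The ordering used here is an assumption: substantive changes come before minor ones, then shallower nodes before deeper ones, then earlier arrivals before later ones. The class name `TranslationQueue` and its parameters are hypothetical.

```python
import heapq
import itertools


class TranslationQueue:
    """Hypothetical triage queue of outline nodes awaiting translation."""

    def __init__(self) -> None:
        self._heap: list[tuple[bool, int, int, int]] = []
        self._arrival = itertools.count()  # tie-breaker: first come, first served

    def push(self, node_id: int, depth: int, minor: bool) -> None:
        # False sorts before True, so substantive changes are served first;
        # among those, smaller depth (higher-level nodes) comes first.
        heapq.heappush(self._heap, (minor, depth, next(self._arrival), node_id))

    def pop(self) -> int:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


queue = TranslationQueue()
queue.push(node_id=10, depth=4, minor=True)   # small rewording deep in the outline
queue.push(node_id=11, depth=3, minor=False)  # new mid-level node
queue.push(node_id=12, depth=1, minor=False)  # new top-level node
queue.push(node_id=13, depth=1, minor=True)   # typo fix at the top level

order = [queue.pop() for _ in range(len(queue))]
print(order)  # [12, 11, 13, 10]
```

Any translator can take the next item with `pop`, which is what lets several people work the same queue interchangeably.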
Second, consider that, regarding the Collation Project, there is a fundamental decision to be made whether different
languages will have all the same chunks, or instead whether the same text will
be chunked differently for different languages. Given consistent markup, as described above,
different languages can have the same chunking. As argued in internationalization, probably, all
languages should have the same chunking.
In that case, clearly, the way a text is chunked should depend on how
it is chunked in the original language. And in that case, it
would be helpful to have a tool that allowed translators to compare the
same chunk in different languages, displaying the chunk in both
languages, the chunk summary in the original language, and a prompt for
a translation into the target language. As with the outline
translation utility, untranslated summaries (or summaries changed in
the original language) should be placed in order of priority, or
triaged. But note that this is not strictly necessary--one could
go through a text in the target language without consulting the
original, if (as seems likely) there is no reason for summaries to be
translated rather than written from scratch.
Third, there should be a utility that compares markup of different translations of the same text, so that the markup is isomorphic. The outlining of a translation should not be permitted to proceed until the markup has been proven to be isomorphic to the original.
The Display Interface is the one presented to the user, not the contributor, e.g., a person who is using Textop to do a comparative study of texts.
The user is likely to use search as much as anything. It should be possible to select what to search: the outline headings, the summaries, the sentences, and/or the other metadata; and selected subsets of these (e.g., 18th-century philosophical texts in French; only arguments; only one particular text). How search results are displayed might or might not differ depending on the type of search; certainly in any case search should be configurable. One main search result display would first place results in the context of the outline, and would be multi-step, as follows. First, in the left-hand column, after the user types in search terms, the search results appear below the search box. Second, one can click on a result and then the parts of the outline where the result is found appear in the next column. The outline should be displayed in such a way that a large number of results are displayed as part of the same outline, with non-matching nodes being collapsed and invisible. Third, the user clicks a node and in the third column the chunks associated with that node are displayed. Alternatively, if texts were included in the search, the third column automatically centers the displayed chunks on the first matching text. But perhaps this method of search would be most appropriate for searching the outline itself.
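The "select what to search" requirement can be illustrated with a small filter. This is a sketch under assumptions: chunks are represented as plain dictionaries with hypothetical keys ("heading", "summary", "sentences", "function"), matching is a naive case-insensitive substring test, and the sample strings are placeholders rather than real text.

```python
def search(chunks, terms, scopes=("summary", "sentences"), function=None):
    """Return ids of chunks whose selected fields contain the search terms."""
    needle = terms.lower()
    hits = []
    for chunk in chunks:
        # Optional restriction to one function, e.g. only arguments.
        if function is not None and chunk["function"] != function:
            continue
        if any(needle in chunk[scope].lower() for scope in scopes):
            hits.append(chunk["id"])
    return hits


chunks = [
    {"id": 1, "heading": "Liberty", "summary": "Placeholder summary on liberty",
     "sentences": "Placeholder sentence one.", "function": "argument"},
    {"id": 2, "heading": "Law", "summary": "Placeholder summary on law",
     "sentences": "Placeholder sentence mentioning liberty.", "function": "description"},
]

print(search(chunks, "liberty"))                       # [1, 2]
print(search(chunks, "liberty", scopes=("summary",)))  # [1]
print(search(chunks, "liberty", function="argument"))  # [1]
```

A real implementation would presumably use a full-text index, but the scope and function parameters show how one query interface could serve the different search types described above.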
Another type of search method would first display lists of matching summaries or of whole chunks. Then the user could click on a summary or a chunk, and then its "first" location in the outline would be displayed, with its surrounding outline context. At the same time, the context of the summary or chunk in the text it is taken from would be displayed.
Yet another type of search method would, after many texts had been added to the database, display matching texts, based on whether the search terms are found in their summaries. Then the user might open up a given text and, at the same time, be shown a list of places where the search terms are found in the text, together with a summary of the chunks in which the search terms are found.
As you can see, this whole system, while complex, would offer tremendous and
unprecedented text searching power. If well designed, it is the search
function that would make Textop an invaluable research tool. And none of
this is to mention the possibilities inherent in the semantic markup of
classic texts--and Textop would be a natural venue for such markup to be done.
Sometimes users will simply want to drill down through the outline to find topics they are interested in, particularly when they don't have a name or a clear idea of what they should be searching for. The basic functionality here is that one clicks one part of a header, such as a plus sign at its left side, to view the child nodes of a node, and one clicks on the header text itself to view (in the next column over) the chunks that live at that node. It should also be possible to collapse all chunks and view just the summaries and sources.
As there will be many thousands if not millions of outline headers, obviously not all of the outline will be able to be displayed at once. Hence, the outline browser should probably re-center the outline based on the most recent click, and (at the user's option) close open nodes that are not parents or children of the current node. Furthermore, the user might for simplicity's sake want to hide all nodes except a given node (which might be buried deep) and its children.
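The "close everything except parents and children" behavior reduces to a small set computation. The sketch below assumes the outline is available as a mapping from each node id to its parent id (with None for top-level nodes); the function name `visible_after_click` is hypothetical.

```python
def visible_after_click(parents: dict, current: int) -> set:
    """Nodes to keep open: the clicked node, its ancestors, and its direct children."""
    keep = {current}

    # Walk up to the top of the outline, keeping every ancestor.
    cursor = parents.get(current)
    while cursor is not None:
        keep.add(cursor)
        cursor = parents.get(cursor)

    # Keep the direct children of the clicked node.
    keep.update(node for node, parent in parents.items() if parent == current)
    return keep


# node id -> parent id
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

print(sorted(visible_after_click(parents, 2)))  # [1, 2, 4, 5]
```

Scanning all nodes to find children would not scale to millions of headers; a real implementation would keep a child index, but the rule itself is the same.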
Furthermore, as with the Digital Universe's Universal Navigator, the user should be able to select "bottom row favorites," except that in this case they will be top level (or left side) favorites. In either case, in a large system, one needs to be able to compile a list of favorites just as one can compile favorites in a Web browser.
Note also that special browsing tools might be associated with the part of the outline that takes the form of a chronology or timeline, although in principle this doesn't seem to be any different from the rest of the outline.
Another handy way to browse the database would be to view all the chunk summaries for a single given text. Since the text would have already been marked up into parts, chapters, and so forth, a handy display tool could be written that would automatically compile a summary of the entire text. Students would have free "Cliff Notes" for a huge number of public domain texts, not just the usual suspects.
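Compiling a text summary from chunk summaries is mostly a matter of sorting by location. The sketch below assumes each chunk record carries hypothetical "chapter" and "paragraph" fields taken from the markup, and it uses placeholder summaries rather than real ones.

```python
from itertools import groupby


def compile_summary(chunks) -> str:
    """Order chunk summaries by location in the text and group them by chapter."""
    ordered = sorted(chunks, key=lambda c: (c["chapter"], c["paragraph"]))
    lines = []
    for chapter, group in groupby(ordered, key=lambda c: c["chapter"]):
        lines.append(f"Chapter {chapter}")
        lines.extend(f"  {c['paragraph']}. {c['summary']}" for c in group)
    return "\n".join(lines)


chunks = [
    {"chapter": 14, "paragraph": 3, "summary": "Placeholder summary B"},
    {"chapter": 13, "paragraph": 1, "summary": "Placeholder summary A"},
    {"chapter": 14, "paragraph": 1, "summary": "Placeholder summary C"},
]

print(compile_summary(chunks))
```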
Summary browsing might be particularly useful for proprietary texts that the project has summarized, but the chunks of which are not viewable: at least the summary would be viewable, which would help scholars to determine if buying or otherwise obtaining the text is worthwhile.
The display interface should make it easy to switch languages. Having identically marked-up translations will make it possible for the right chunks to appear under the right nodes, regardless of what language the user has chosen as default, and regardless of what language the original text was written in.
Note that, as long as a translation of a text has been marked up, even if summaries have not been written for the translation, it will be possible to display the sentences themselves in the proper part of the outline. And that could be useful. But, while the outline is growing, there are bound to be parts of the outline, the summaries, and chunks, depending on the language displayed, that are available in one language and not others. The user, therefore, should be given a choice of languages. For example, it should be possible for a user to tell the system to display French if available, and failing that English, and failing that German. Moreover, this needs to be true of every part of the system. So if a newly-created part of the outline itself is available only in English, and a user's default language is French, then those node headers should be displayed in English. But for clarity it should be obvious from a glance what is available only in a non-preferred language; for example, the English-only node header would be (say) colored red for a Parisian.
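The language preference chain can be sketched as a simple lookup that also reports whether a fallback was used, so the interface can mark the item (for example, by coloring it). The function name `pick_language` and the use of two-letter language codes are assumptions made for illustration.

```python
def pick_language(available: dict, preferences: list):
    """Return (language, text, is_fallback), or None if no preferred language exists.

    `available` maps language codes to the text in that language;
    `preferences` lists language codes in the user's order of preference.
    """
    for rank, language in enumerate(preferences):
        if language in available:
            return language, available[language], rank > 0
    return None


preferences = ["fr", "en", "de"]  # French if available, then English, then German

new_heading = {"en": "Natural law"}  # newly created node, English only
print(pick_language(new_heading, preferences))  # ('en', 'Natural law', True)

old_heading = {"en": "Politics", "fr": "Politique"}
print(pick_language(old_heading, preferences))  # ('fr', 'Politique', False)
```

Applying the same function to node headers, summaries, and chunk text would satisfy the requirement that the fallback hold for every part of the system.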
So this tool is about the creation, search, and display of texts analyzed into chunks and organized into an outline, but also viewable in their original form. So what kind of tool is that? It's a tool for collating texts and viewing the results.
Note that, for the sake of usability, it might be convenient to display the same types of data always in the same column. In fact, instead of hard-wiring particular "displays," as the above might be taken to imply, one might instead want to let the user have all possible columns displayed at the same time (at least in principle), opening and closing columns as needed or not. This way the display is maximally user-friendly because designed by the user. Some users, for example, might want to search through the outline when they are summarizing a text; others might find the search column unhelpful while summarizing, because they don't put their chunks immediately into the outline while summarizing.
Also note that scholars may well wish to include several different same-language editions of the same work in the database. This is important because bits of text in one edition sometimes do not appear in another edition. The implications of this for the system are not examined above, but obviously it would complicate matters even further.
The discussion above concerns the requirements only for the Collation Project. According to our current reasoning, since it is that project that will generate the outline used by the other projects, the Collation Project will be launched first, and the other projects will be launched only after the Collation Project is well under way.
It is worth thinking now about how the other projects might make use of the software framework described above.
It seems clear that for the Analytical Dictionary Project as well as the Event Summary Project, the additions could be relatively minimal. The system would of course have to distinguish between a text chunk, a dictionary entry, and an event summary, but they would all be assigned to particular nodes. Beyond that, a dictionary entry and an event summary might essentially work like wiki pages.
But we need only to look at the special requirements of the fourth planned project, the Debate Guide Project, to be impelled to think twice about setting up just a simple text editor or wiki. New projects usually inherently carry with them new requirements and significant system changes. In the case of the debate guide, the differences are obvious. A debate guide will set opposing arguments on specific questions side-by-side. It might also allow the guide writers to elaborate points and sub-points, so that the entire debate guide can take an outline form. Sub-questions might be linked to particular nodes. Note that if a node at which a particular debate guide entry lived were deleted, then the software should prompt the person editing the outline to assign the debate guide to another entry, or it should in some other way make sure that people are not disoriented by broken and missing links. The point is that adding new functionality to the original Collation Project system will complicate the original system, so coders building the system in the first place should be aware of the necessity of making the code maximally extensible.
If we allow ourselves to think freely about the ideal system for the Analytical Dictionary Project, then we might produce a whole set of different software proposals. A simple wiki page would be only one. Another would be a collaborative wiki-like database, in which, under a given concept heading, such as "affection," the contributor would be prompted for different pre-defined classes of data, such as words, idioms, slang, and jargon; then, for each of these, what distinguishes them from other items that fall under the heading, what connotations they have, their cultural associations, some representative quotations, and so forth. Then project designers could massage this data into something very interesting and usable.
On second thought, the Event Summary Project, too, would seem to have some obvious special requirements. To make an event summary maximally useful to the reader, there should be some mechanism whereby the latest-added news is displayable somehow: the reader might, for instance, be able to highlight edits that were made since the last visit, or to aggregate them somehow at the top of an article for a quick update on a developing story. But event summarizers (the contributors) would, then, want to distinguish mere copyedits and rewordings of relatively old news from substantive new additions. So there should be a way to mark, or otherwise generate information about, the difference between a "copyedit" and a "substantive addition." Furthermore, since event summaries concern ongoing, complex events, which are not neatly distinguished into nicely separate bundles in advance, there needs to be a way to split a single summary into several; the best way to achieve that is not obvious.
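The distinction between a "copyedit" and a "substantive addition" could be recorded on each edit and then used to build the reader's quick update. The sketch below is only one possible shape: the `Edit` record, its `kind` field, and the function `updates_since` are hypothetical, and the sample edits are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Edit:
    when: datetime
    kind: str  # "copyedit" or "substantive", marked by the contributor
    text: str


def updates_since(edits, last_visit: datetime):
    """Substantive additions made after the reader's last visit, oldest first."""
    fresh = [e for e in edits if e.kind == "substantive" and e.when > last_visit]
    return sorted(fresh, key=lambda e: e.when)


edits = [
    Edit(datetime(2006, 3, 1, 9, 0), "substantive", "Placeholder: initial report"),
    Edit(datetime(2006, 3, 2, 14, 0), "copyedit", "Placeholder: reworded paragraph"),
    Edit(datetime(2006, 3, 3, 8, 30), "substantive", "Placeholder: new development"),
]

for edit in updates_since(edits, last_visit=datetime(2006, 3, 2, 0, 0)):
    print(edit.text)  # Placeholder: new development
```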
We could, simply because it seemed convenient, use a wiki for all of these projects. But participants and users and the world at large will thank us if we create specially-designed tools that achieve the very specific requirements of the tasks we set out to accomplish.
As you can see, we're talking about a very complex piece of software. Coders, are you up to the challenge?