Project
Project Motivation
The movement toward Web 2.0 technologies that use metadata descriptions of web content has created a need for a markup standard for encoding digital documents. Such encodings facilitate several operations, including annotation, complex search, and knowledge sharing. The Text Encoding Initiative (TEI) is a consortium devoted to creating a standard for the representation of digital texts for online research. The TEI guidelines consist of an XML schema that is used to mark up text documents. While the standard has many benefits, there is a gap between the expertise required to use the notation and the expertise of the intended user community. Specifically, the intended users are not typically trained in the use of XML and are more likely to use word processing software such as Microsoft Word.
In this project, we are developing a number of specialized add-ins for Microsoft Word 2007 that can be used to encode documents using the TEI standard. The goal of the project is to provide a set of tools that are closer to the habitat of the user community. Specifically, our intention is to reduce the learning curve for using markup languages, thus enabling adoption. In the long term, we would like to develop an approach that facilitates the creation of code generators for the automated markup tools.
Background
XML
TEI
VBA
Alternatives
The alternatives available for generating TEI-encoded documents include using XSLT transformations to convert existing electronic documents and using XML editors to create documents from a template. The former solution requires a third-party developer to create XSLT transforms that are then executed by the end-user. The challenge with this approach is that the end-user must trust that the transforms are correct and comprehensive. If the transforms are incorrect or not comprehensive enough, the end-user has no control over the tagging process and must resort to tagging the documents manually. The latter solution provides more end-user control, but requires detailed knowledge of XML, the TEI standard, and XML editors.
Our solution lies between these two techniques. The approach requires the development of add-ins, and at some point it must fall back on end-users making atomic markups. However, our approach is user-driven in the sense that end-users decide when and where to apply markups. As such, the approach provides automatic tagging capabilities while retaining the flexibility of low-level tagging. In addition, our approach exists within the context of Microsoft Word 2007, which allows end-users to mark up a scholarly document within an environment that they are likely to be familiar with.
Approach
The approach to this project began with the observation that Word 2007 stores documents in an XML-based format. Because Word 2007 is an environment that many users are familiar with and that most already have available, it addressed the need for a tool that is accessible and easy to use. Marking up a document in this environment does not require users to install a separate software package or update libraries on their computers; access to Word 2007 is the only requirement.
Given this new capability of Word, the first consideration was to access directly the XML structure and data that the .docx format provides. Our first idea was to use LINQ, which uses query-style programming, to retrieve the document information the user needs. LINQ required considerable setup, and supplemental resources explaining how to use it were difficult to find. We had considered using LINQ from within a C# program, the other option we had considered pursuing, based on the language's ability to access a Word document and on the available information about using C# to create add-ins for Word. C# was better documented, and an additional package, Visual Studio Tools for Office (VSTO)[1], could be installed into Visual Studio. VSTO provided extra tools for creating a customized toolbar in the Word environment. This approach would allow custom buttons and features to be created and would make the resulting tools easy to see and access. While working through examples to begin implementing this method, however, we found it difficult to gain the level of access needed to manipulate XML live in a Word document. Assigning the correct functionality to buttons was also challenging, since a Word document has an active state that must be continually checked for changes.
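To make the idea of reading the XML inside a .docx file concrete, here is a minimal sketch in Python (not the LINQ/C# route explored by the project). It assumes only that a .docx file is a ZIP package whose main text lives in word/document.xml. The helper make_sample_docx is hypothetical and builds a stripped-down stand-in so the example runs without a real Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes):
    """Return the text of each <w:p> paragraph in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        texts.append("".join(runs))
    return texts


def make_sample_docx(paragraphs):
    """Build a stripped-down stand-in for a .docx (hypothetical helper).

    Only word/document.xml is written, so Word itself would not open the
    result; it exists just to exercise paragraph_texts without a real file.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


sample = make_sample_docx(["Title page", "First stanza"])
print(paragraph_texts(sample))
```

Reading a saved file this way is straightforward, but it works on a stored copy of the document rather than on the document that is open in Word, which is the kind of live access the paragraph above describes as difficult to obtain.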
While we were reviewing material on the XML structure of a Word document, also referred to as Office Open XML[2], we found considerable discussion of macros and of directly accessing the contents of an active document using Visual Basic for Applications (VBA). The Microsoft Developer Network documents this topic extensively, and that documentation made VBA seem a very reasonable basis for developing this tool in Word 2007. VBA was already available as part of Word, and it allowed much easier access to the active parts of the document, which were the area of interest for this project. These discussions led us to look for more information about the capabilities of VBA, and several books were available that detailed its uses in Word. This information gave us a way to create a tool that supported the specific markups that were needed, and it also gave us options for creating automated functions that would reduce the repetitious tagging that occurs within certain literary structures.
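As an illustration of the kind of repetitious tagging that an automated function could take over, the following sketch wraps the lines of a stanza in the TEI elements lg (line group) and l (verse line). It is written in Python rather than the VBA used in the project, and the function name tag_stanza and the choice to number lines with the n attribute are assumptions made for the example, not the project's implementation.

```python
import xml.etree.ElementTree as ET


def tag_stanza(lines):
    """Wrap plain verse lines in a TEI <lg> (line group) of <l> elements."""
    stanza = ET.Element("lg", {"type": "stanza"})
    for number, text in enumerate(lines, start=1):
        # One <l> per verse line, numbered in reading order.
        line = ET.SubElement(stanza, "l", {"n": str(number)})
        line.text = text.strip()
    return stanza


stanza = tag_stanza(
    ["Tyger Tyger, burning bright,", "In the forests of the night;"]
)
print(ET.tostring(stanza, encoding="unicode"))
```

The user still decides which lines form a stanza; the function only removes the need to type the same pair of tags around every line.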
The ability to work directly in Microsoft Office provided many options for what functionality could be included and for how users would gain access to the toolbar and its functions. I started getting more familiar with VBA by reading a book about using VBA for Microsoft Office 2007[3]. The book also discussed a downloadable tool for creating a toolbar from an XML structure that is embedded directly into a Word document within its XML format. This allowed a toolbar to be created without needing to download an add-in or look for anything outside of an initial template file that provided all the needed functionality. The toolbar is defined as a tree of XML markup in which the tab is the root node and the various groups and buttons are child nodes.
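The tree structure described above can be sketched as follows. This Python example builds only the tab subtree, with groups as children of the tab and buttons as children of each group. The element and attribute names (tab, group, button, id, label, onAction) follow the Office ribbon customization markup as we understand it; the ids, labels, and macro names are invented for illustration, and in an actual customization file the tab is normally nested inside wrapper elements and a namespace that are omitted here.

```python
import xml.etree.ElementTree as ET


def build_tab(tab_label, groups):
    """Build a <tab> element whose children are groups of buttons.

    `groups` maps a group label to a list of button labels. The ids and
    the onAction macro names are generated here purely for illustration.
    """
    tab = ET.Element("tab", {"id": "teiTab", "label": tab_label})
    for g_index, (group_label, buttons) in enumerate(groups.items(), start=1):
        group = ET.SubElement(
            tab, "group", {"id": f"group{g_index}", "label": group_label}
        )
        for b_index, button_label in enumerate(buttons, start=1):
            ET.SubElement(
                group,
                "button",
                {
                    "id": f"group{g_index}Button{b_index}",
                    "label": button_label,
                    # Hypothetical name of the macro the button would call.
                    "onAction": "Tag" + button_label.replace(" ", ""),
                },
            )
    return tab


tab = build_tab("TEI", {"Verse": ["Line Group", "Line"], "Header": ["Title"]})
print(ET.tostring(tab, encoding="unicode"))
```

Keeping the layout as data (a mapping from group labels to button labels) means a new markup button can be added by editing one entry rather than hand-writing another XML element.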
Requirements
Enumeration of Requirements
Google Docs [[4]]
Mockups
Thinkature workspace [[5]]
Wizard Mockups for the Boilerplate and Formatted Text Search [[6]]
Links
Team Members
- Gerald Gannod (Advisor)
- Laura Mandell (Project Consultant)
- Holly Connor (Research Assistant)
