Information Floes & Eddies: 2008

Monday, June 16, 2008

The Next Version of MarkMail

The Mark Logic User Conference closed out on an amazing high note. Ian Small, Senior Vice President and General Manager of MarkMail, gave an exciting preview of the next iteration of that product. If you are unfamiliar with MarkMail, it is an archive of public mailing lists that Mark Logic engineers have enhanced with superb retrieval and analytical functions. The next iteration of MarkMail will apply those strengths to private mailing lists and even to an individual's private email collection. This is so clearly what Microsoft ought to have had the foresight to do with Outlook, and that failure could well cost them. Frankly, I would drop Outlook without a moment's hesitation if I could use this application instead to slice and dice the hundreds of emails that flow through my inbox.

I will be keeping an eye on the MarkMail blog, expecting progress reports. (When I asked how soon I, as a member of the general public, might be able to gain access to the service, Ian would only give me and the rest of the audience a cryptic "soon".)

For me, this session was the highlight of the Mark Logic User Conference. Even now, four days after watching Ian demo the product, I can still feel the excitement. Such a cool tool!

Friday, June 13, 2008

Emerging Themes

This morning, the speakers I have heard have been weaving into their own presentations the threads that emerged over the past two days of discussion. In the day's first presentation, Lisa Bos of Really Strategies, Inc. pointed out two such themes: first, that not all content is well-suited to the relational database model, and second, that quality metadata is what drives robust applications. Search by itself is inadequate.

What I am hearing as an emergent concern is that, at this point, the shift to an XML environment at the content level is more about converting cultural attitudes than about technological change. In the modern age, human beings are used to being presented with content in a fixed, linear form. XML offers the flexibility to re-form content for delivery in a way that divorces it from that fixed, linear narrative, and traditional thinking about content is alarmed by that concept. Context is absolutely critical to understanding a piece of content, and we do not want to lose it. But the ability to see pieces of content outside of an original document (whether as a search snippet, as a thumbnail in a gallery view, or as an abstract shape in a visualization tool) is critical if users are to browse effectively online. We are in fact learning how to browse content in new ways. Instead of flipping pages back and forth in a book and stopping at an interesting paragraph in a specific chapter, we see and grasp the significance of a piece of content online through different viewing options. XML makes that feasible in a digital environment. The most difficult challenge, though, is changing the mindset of content providers and consumers so that they accept this as an equally valid mechanism for absorbing and retaining information.

Thursday, June 12, 2008

Twitter at the Conference

Follow the tweets pouring out from this conference through this page. It's a marvelous way to view audience response to the formal presentations.

Chris Lindblad, Founder and CTO of Mark Logic, Speaks

Here are just three highlights from the interview that Andy Feit, Senior VP of Marketing, just finished with Chris Lindblad, Founder and CTO of Mark Logic.

How did you come up with the name, Mark Logic? What names did you discard? Cerisent was the original name of the company (based on cerise, a dramatic shade of red), but the investors hated that name. They struggled with the choice; finally, at the end of a very long meeting, someone asked half-jokingly, "Why don't you just call it Mark?" Chris said he had wanted to use Mark Schema as the corporate name ("...That's how geeky I am"). They finally settled on Mark Logic.

What will Mark Logic look like five years from now? He would like to see Mark Logic as a mainstream technology. Those attending this user conference are, in his eyes, early adopters in pushing the envelope with what can be done with content. He'd like such applications to be less remarkable in five years.

What do you think is the biggest feature in the forthcoming version 4.0 of Mark Logic Server for content providers and developers? Chris says that's like asking a parent to say which is his favorite kid. "I don't have a favorite, but I want them all to play nicely with one another. Integration is key."

What Would You Do If You Were The Publisher?

That's the question Elsevier is asking users to answer. How might the display of scientific articles be enhanced or improved? More information will be released over the summer about a contest Elsevier will be sponsoring (go here for the information that is currently available). If you are clever with XQuery, you may have an opportunity to help direct the future of STM publishing. Very cool!

The Importance of Customization

This morning I listened to two presentations from publishers (one commercial, one a scholarly society) discussing the ways in which they are leveraging the strength of Mark Logic's tools. Both services were designed to satisfy specific populations. The commercial publisher was developing an information service for industry analysts at brokerage and investment firms. The scholarly society's service was designed for an equally specialized population, exploration geophysicists. These are populations that cannot be satisfied with just a "good enough" answer. They need authoritative information, augmented by analytical tools.

In previous years, both of these populations would have been supported by systems that forced the searcher to run a query through the same interface, regardless of differences in the nature of the content being sought. It was very like Henry Ford's old line about buying a Model T: "You can have any color you want as long as it is black." There was no customization of display or rendering of content specifically suited to a community or profession. In the current environment, such customization seems to be one of the primary ways in which publishers are tailoring their products to user needs. In the case of Platts, a McGraw-Hill B2B information service, search results include not just the full text of an article or report, but also links to relevant pricing data, regional information and additional relevant content from McGraw-Hill publications that the user may not have thought to consider. In the case of the American Institute of Physics, the system brought together content from more than a half-dozen publishers to enable a unified search that even allowed searches to be run according to a variety of mark-up languages. That matters because mathematical equations are currently rendered in several different mark-up languages, each with its own population of supporters who need to work in that particular rendering. Such differences may seem minor, but they indicate how different user populations approach an information query, discover relevant content, and interact with or re-use that content.

Users will be better served by this style of customization. Mark Logic's contribution is to provide the technology in support of that publishing need.

McAfee and the Search Experience

Andrew McAfee, Professor at Harvard Business School and noted advocate of Enterprise 2.0, offered an interesting example of information seeking from his personal experience. Searching for a known item on a brand-name information service and on Google Scholar, he was amazed to find that he was more successful on Google Scholar than on the brand-name information platform. His point was that the current user experience of information retrieval is incredibly frustrating, and that the alternatives provided by organizations like Google or Wikipedia will inevitably be more successful over the long term because they offer a more satisfying user experience.

McAfee suggests that expert, authoritative, and perfect are slippery concepts. He agrees that "good enough" is not appropriate for every information query; one might want a cardiologist to accept only the best answer. Still, name-brand information providers need to understand that improving the user experience is imperative. Google forgives his stupidity; his query structure need not be perfect to retrieve an answer. Google offers convenience in the sense of not requiring a sign-on or any form of authentication. Google can offer the power of serendipitous discovery. He sees the dynamism of Google's changing results as a positive thing, since information itself is not necessarily static.

The overwhelming message from McAfee was that whether one is discussing search on the local intranet, search on an advanced information service platform, or search for a simple information need, the current approaches in use by providers are failing the user. Rising numbers of knowledge workers who grew up with the Web and Google will not be satisfied with the current offerings and, as McAfee points out, will vote with their feet.

Dave Kellogg has fantastic coverage of McAfee's talk here.

Wednesday, June 11, 2008

Corporate Approaches to Managing Documents

JetBlue, an airline less than ten years old, is part of a highly regulated, highly complex industry. As in any highly regulated industry, the demands within JetBlue for documentation of policies and procedures are intense. Murry Christenson spoke about their current project to enable knowledge workers (not necessarily technical engineers) to use standard office software tools to contribute to a highly structured, specialized XML content management system. As Christenson said, the problem was as much a social one as a technical one (changing a corporate mindset from publishing hefty manuals to documenting processes at a more granular, staff level). His presentation outlined the organizational changes envisioned, offering something of a template for managing such a shift in approach.

I will have to return to this presentation; right now I must get ready to blog the final panel segment of today's program.

Using Aquabrowser in Conjunction with MarkLogic Server

Taco Ekkel, Director of Development, MediaLab Solutions BV, delivered a lively presentation about the Aquabrowser visualization tool. This tool has gained popular acceptance in both Europe and the United States as a navigational aid in library catalog environments.

If you want to see this tool (and for visualization tools, actually *seeing* it is important), you can run a search or two on http://lens.lib.uchicago.edu. It is the product of MediaLab Solutions, a small company that Bowker acquired in 2007. The system allows users to indicate the context of their search by selecting terms from a dynamically generated word cloud. It also supports faceted navigation for refining a search.

Customers can use Aquabrowser as an integrated search and discovery interface on top of MarkLogic (see the March press release about the partnership here). The partnership between the two organizations enables an out-of-the-box solution, freeing clients from having to build their own search engine and interface for their product. The two systems interoperate simply, and implementation takes a relatively short time. Aquabrowser draws on both the Lexicon and the Thesaurus functionality in MarkLogic 3.1. Based on usage reports from the library environment, the presence of such a proven product as a layer over a publisher's content seems to increase the usage and visibility of that content.

It's useful to see how partnerships like this leverage the strength of both systems to support user success.

Building for Content Syndication

Tom Masciovecchio, Director of Publishing Systems, Simon and Schuster, discussed building a system for content syndication. He noted that trade publishers have different needs for their products than STM publishers do, including fewer opportunities to re-use and re-purpose content chunks.

That in itself is an important consideration for publishers approaching the shift to XML. Andy Feit had said earlier that Mark Logic wanted to help publishers maximize the value of their content. What did the publisher hope to achieve? Simon and Schuster, dealing as it does with ordinary books, actually has fewer opportunities to re-use chunks of its information. If I understood the speaker's meaning correctly, they recognize the potential for syndication, but they are still in the early stages of forming a strategy and are moving primarily to make themselves ready when the right moment comes. (Note: that's my interpretation of his comments, not statements made by the speaker.) They do realize that they must adapt in order to keep their authors satisfied.

Their focus has been on developing a digital warehouse to house three key elements of the content: the book content itself, the bibliographic information associated with that specific book title, and, most importantly for trade publishing, key rights information.

I believe I had already encountered the core of Tom's presentation, the S&S rationale for building their digital warehouse, at another conference. What I hadn't realized or heard elsewhere was the process by which they are transforming their book content into digital XML form. They create PDF files of the book content, which are then OCR'ed; the OCR output is subsequently tagged in XML. They have not yet disrupted existing production processes.

Building a Publishing Platform One Product At A Time

Alex Humphreys, Director of Business Technology Services, Oxford University Press, offered an interesting glimpse into the actual experience of a publisher transitioning products onto a new platform. Working with Mark Logic since 2004, this scholarly enterprise was dealing with journals, monographs, major reference works, etc., content that is not necessarily straightforward and that carries fairly stringent expectations for discoverability. Internally, the hope was that adopting MarkLogic would enable OUP to deliver subscription-based information services to the market rapidly and cost-effectively. The past two to four years have shown them that owning one's own platform, so that you can deliver product using multiple vendors on top of the Mark Logic server, is not without its challenges. Humphreys reviewed half a dozen high-quality information services to demonstrate how each experience in delivering a unique service drew on previous investment in time and staff resources. These diverse services included AASC, Oxford Language Dictionaries Online, Oxford Music Online, UK Who's Who, and similar high-profile library products.

The unique requirements of each product drove development of graphical templates and display options. However, the fact that the products were all being built on a single platform helped when Oxford chose to work with a variety of vendors (Semantico, IFactory and CDSI), and that sped up product development. The standardization saved them both time and money and spread staff resources more effectively across products. Development time frames have shrunk, allowing some projects to go from kickoff to launch in five months.

This held true even when they began work on non-reference information services, such as Investment Claims (http://www.investmentclaims.com), which solve slightly different problems of discoverability and consumption of content. Investment Claims is at present a PDF-oriented site, although new content is being prepped for the service. (Alex noted that this was the first instance of using "bootstrap" code with a vendor, which resulted in shorter development time, measured in days or weeks rather than months.)

To his credit, Alex was candid about the costs they had seen in building the platform and maximizing its benefits across multiple projects. His bullet points were these:

Per-product development costs have not yet decreased, due to the unique requirements of each product and its content.

Iterative platform development might not have saved money in building the platform, but it allowed them to spread those costs over time and across multiple projects.

As the platform matures, we do predict development cost savings.

In closing, Alex noted these lessons learned in the Oxford University Press experience:

It takes multiple products before a feature becomes truly generic.

It is easy to underestimate the soft support costs for a system.

It takes a strong stomach to own your own platform.

Solid experience from a traditional scholarly publisher that seems to be thriving in the new publishing environment.

Product Vision

Ron Avnur, Director of Engineering, and Andy Feit, Senior VP of Marketing, are up now with a little bit of product history. Mark Logic is based on the idea of what you can do with the inherent flexibility of XML: "If you can mark it up, we can query it...If we can query it, you can deliver it." "You can deliver your vision" is the marketing message. The vision for your content and its use can be made real within the framework of XML and XQuery.

In the initial release of Mark Logic's product, they would take your content, whether or not it conformed to a schema, and make it accessible through a W3C standard query language (XQuery) that could both retrieve and render it.

The second generation of Mark Logic Server dealt with formats, metadata, and high-precision aspects of search (stemming, wildcards, etc.), and improved scalability.

The version 3 series of Mark Logic Server focused on the mark-up of more formats (PDF, MS Office) and on additional linguistic capabilities. Analysis of the data was enhanced through Lexicons and Frequency Analysis. Fields, highlighting, a debugger and a profiler were improvements for purposes of delivery. The result: an XML repository, powerful search and analytics, and a platform for content applications.

In discussing the future of Mark Logic Server, they seem to be talking about trends: client requests for new mark-up functionality for concept and entity extraction, visualization tools, and content mining, as well as an industry emphasis on standardization. How do we cope with user-generated content, including ratings and comments contributed by external parties? Changes are taking place in content creation, in the explosion in the volume of content, and in the emphasis on embedding different versions of content in workflow, depending on the individual's role or the task at hand.

The next version of MarkLogic (4.0), due out in the third quarter of this year, is engineered to respond to these new trends and better serve customers. I don't want to reveal too much here, but the improvements being discussed seem entirely practical and useful for any organization serving knowledge workers. Additional delivery enhancements, understanding usage through analytics, and the marriage of geospatial data with text are all referenced, along with improved security, greater control, and more automation of specific kinds of activity when working with the XML. The current slide has buzzwords like enrich, manage, mirror and re-use as part of the cycle of pushing content through the Mark Logic Server out to the user in whatever way best suits the provider of that content and data. I need to get some additional background on what this signifies before I try to express it here. Entity enrichment, administrative enhancements and accelerated development of applications and services can sometimes involve complexities that I (as a non-technical person) don't always grasp.

Dave Kellogg's Opening Keynote

I'm told about 400 people are attending this event. Client and sponsor logos flashed on screen include Nerac, Elsevier, Oxford University Press, American Psychological Association, New England Journal of Medicine, Temis, Access Innovations, Data Harmony, etc.

Speed, scale, power, agility are the watchwords here this morning. Andy Feit just stepped up to welcome us all. This year's theme is "Discovering Agility". A particularly good analogy emerging from Andy's welcome is the power of the peloton. A peloton, for those of you unfamiliar with the concept, is the main pack of cyclists in a bike race. That pack moves together for two reasons: going downhill, the riders gain momentum from the draft the pack creates, and going uphill, the leaders set the pace. Clearly, the point is that developers working with XML for their content gain from the knowledge of the pack, i.e., a user conference such as this one.

Dave Kellogg has just stepped up with a presentation titled "Carving a Niche in the Infrastructure Market". If you aren't following Dave's blog, you are making something of a mistake. Dave writes his own CEO blog; it's not the product of his corporate communications staff. I learn about the technology and business community when I read his blog over at http://marklogic.blogspot.com/. He just indicated that he spends 2 to 4 hours a week working on it. Subscriptions to his RSS feed greatly outnumber site visits, which is an indication of the power of RSS.

The company strategy and vision are up next. Their mission is to unlock content. That breaks out into the following steps: Find it, Database-ize it, Build applications on top of it, and Analyze it. Finding it means rounding up content from all the various places where it may be housed (internally and externally). Database-izing it means aggregating it in a repository managed by a database management system. Building applications involves re-using the content in a variety of forms. Analyzing it means seeing what other information may be contained in that content and surfacing it for its value.

He's suggesting that a better database management system is currently being envisioned and built by search, database and content people. It's a question of bridging the gaps between text people, data people and content people in order to optimize the content owned by publishers and other organizations (federal agencies, enterprises, etc.). The difference in approach is that Mark Logic doesn't envision content just in terms of tables and columns; they see it in terms of flexibility. How can the content be handled so that it works in the way required to best satisfy the needs of the content provider or publisher and the community they serve?

He's suggesting that Mark Logic has a "bowling alley strategy" - you have the head pin, in this instance the first thing you hit, and that subsequently drives what else goes down. The success you have with the first customer will create an impact with others surrounding that head pin. Then you pick up on adjacent organizations with similar problems (he references JetBlue and the Church of Latter Day Saints). These are not traditional publishers but they have a high volume of documents. You build success out from that. Dave references Geoffrey Moore, author of "Inside the Tornado" as one of the sources for more on this topic.

Dave uses humor very effectively in his presentation as he talks about how Mark Logic tries to take its internal thinking outside the traditional box and to bring its customers outside the box as well (in terms of database solutions). How do they position their services? In an interview with Stephen Arnold, the answer (delivered with some humor) was "better". The company's growth over the past five years, displayed on one of his slides, is impressive. They're balancing focus with experimentation (try things, run with what works, learn). He references MarkMail (http://www.markmail.org) and Mark Logic's Facebook application, KickIt, as instances of experimentation. The creativity inherent in such experiments may help to resolve customer problems.

Mark Logic User Conference (MLUC08)

I'm sitting in the ballroom of the InterContinental Hotel in San Francisco at the Mark Logic User Conference. I'm fortunate enough to be watching the early morning activity as Dave Kellogg, CEO, Andy Feit, Senior VP of Marketing, and John Kreisa, Director of Product Marketing, fiddle with slides to ensure perfection. As someone who serves as ringmaster for numerous events like this, I know that the last hour before the opening of any conference is always a little nerve-wracking. You've got people assembled, you have a workforce on site ready to iron out any last-minute wrinkles or problems, and the adrenaline is pumping. I can actually feel the excitement and energy. This seems like it will be fun.

Just to reiterate the point being made, bloggers of this event (as well as Flickr users, etc.) should tag their entries as MLUC08. If you are twittering the event, make sure you include #MLUC08 so that the tweets can be properly aggregated on services such as Summize.com.

Tuesday, May 27, 2008

Participation Levels

Business Week has "Beyond Blogs" as the cover story of its June 2, 2008 issue. That relatively lengthy piece picks up on the incursion of Web 2.0 tools into American work life. There is a kind of wonderment expressed in the article that the business community has come so far in just three years in employing new technologies to interact and collaborate. Bill Ives, writing at Portals and KM, posted fairly recently about a rise in the hiring of chief blogging officers at major corporations. It certainly sounds like we've reached the tipping point with regard to use of these tools in daily online communication.

A Silicon Valley marketing professional recently posted his statistics on his levels of participation in a variety of social media environments such as Twitter, Facebook, FriendFeed, etc. He noted his activity levels in blogging as well as his shared bookmarks on del.icio.us. He then briefly outlined how he follows a stream of social content in the course of a daily workflow. Based on his outline, Louis Gray expends significant effort in listening to and communicating with others in online social networks, and I must commend him for that effort. Tracking 969 individuals is no mean feat.

I don't think I'm saying anything new when I suggest that any proper inventory of activity on these various communication networks ought to include some indication as to the level of participation rather than just the rate of consumption. Louis Gray follows 270 RSS feeds (consumption) but what I would want to read on his blog is some indication as to how he fosters connections to the individuals behind those feeds (individual interaction). Based on a week or so of following his activity on FriendFeed, my belief is that he does foster interactive exchange, but he gives no sense of it in his blog entry.

I have watched with interest Michele Martin's 31-Day-Comment-Challenge at The Bamboo Project Blog. Her focus during May has been the fostering of participation. If the business community wants to enhance the value of social media as well as improve the success of their communications, in my opinion, they should be encouraged to offer up an inventory of participation levels.

Wednesday, May 21, 2008

Picking Out The Important Bits

Recently, I encountered an excitingly useful tool called AideRSS. In a nutshell, the application provides new metrics on the items in an RSS feed (such as comments added to a blog entry or links in Twitter messages) and allows the individual to filter according to benchmarks. The user can say, "I only want to read the items that generate the most discussion," or "I want to see those that exceed a particular level on one of the metrics AideRSS uses." The intent is to reduce the volume of content flowing to the individual while still bringing important items that are creating buzz to his or her attention.

Type the URL of a blog or website RSS feed into the AideRSS box on their home page. The system runs a rapid analysis of the most recent postings and returns a table that displays PostRank (their metric), the date of each entry, the headline associated with the entry and then several "conversational" metrics (number of comments left, del.icio.us bookmarkings, Google conversations, Twitter messages, and digg votes). The PostRank metric (generated by an algorithm using the volume and frequency of those various indicators) lets you rapidly spot the most popular entry; clicking a column heading on the interface sorts the table by that column. One entry may generate a number of Twitter messages but no comments, while another may be bookmarked at a high rate and have five comments. Looking at this feedback allows the user to quickly identify the important items in an RSS feed for follow-up. Another graphic indicates the consistency of quality over time by noting the most popular item, the least popular item, and the popularity level of the most recent posting on the site.
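For readers who want a concrete picture of what "filter according to benchmarks" means, here is a minimal Python sketch of the general idea: keep only the items that clear a threshold on one chosen metric, then sort the survivors by an overall score, much as clicking a column heading would. Everything in it is hypothetical. The FeedItem fields, the sample posts, and the engagement_score function (a plain unweighted sum of counts) are illustrative stand-ins. The post does not describe how PostRank is actually weighted, and this is not AideRSS's API.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    """One feed entry with hypothetical conversation counts."""
    title: str
    comments: int
    bookmarks: int
    tweets: int


def engagement_score(item: FeedItem) -> int:
    # Hypothetical popularity score: an unweighted sum of the counts.
    # This is NOT AideRSS's PostRank, whose formula is not given here.
    return item.comments + item.bookmarks + item.tweets


def filter_feed(items: list[FeedItem], metric: str, threshold: int) -> list[FeedItem]:
    """Keep items whose chosen metric meets the threshold, busiest first."""
    kept = [item for item in items if getattr(item, metric) >= threshold]
    return sorted(kept, key=engagement_score, reverse=True)


# Made-up sample feed, for illustration only.
items = [
    FeedItem("Post A", comments=5, bookmarks=2, tweets=0),
    FeedItem("Post B", comments=0, bookmarks=12, tweets=4),
    FeedItem("Post C", comments=1, bookmarks=0, tweets=1),
]

# Only items with at least one comment, ordered by overall buzz.
for item in filter_feed(items, "comments", 1):
    print(item.title, engagement_score(item))
```

Post B drops out of the comment-based view because nobody commented on it, even though it is the most bookmarked item. That mirrors the point above: different metrics surface different "important" entries.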

Click through here to see how NewsGator (one prominent feed reader and aggregator) can support AideRSS:
http://www.aiderss.com/newsgatorHowTo.html

The need to filter information is of increasing importance to Web knowledge workers. For those of us who monitor in excess of 200 feeds for purposes of industry news and analysis, headlines are frequently faulty indicators of a particular entry's content and importance. The development of something like AideRSS could allow me to gauge in a single screen what within a feed might be important to me as a reader or to my audience of readers. Publishers of all forms of content should take note!

Friday, January 4, 2008

NFAIS Fiftieth Anniversary Conference

NFAIS

1958 – 2008: 50th Anniversary

The cut-off date for the early bird conference registration fee is only days away: Tuesday, January 8, 2008! The conference, scheduled for February 24-26, 2008 in Philadelphia, PA, is for all information providers (publishers, librarians and educators) who want to learn more about the user behavior and expectations that are driving the new information order, and about the technologies, business practices and strategies required to adapt products and services to a new generation of information seekers.

The conference theme, "The New Information Order: Its Culture, Content and Economy," will look at how the rapid adoption of information technology is creating a user-centric, technology-driven society with its own unique culture, value propositions, behavior and economy, and will highlight the opportunities available to all who are willing to adapt to the New Order. The preliminary program, registration forms and general information are now available at: http://www.nfais.org/2008_Tier_Program.htm.

Highlights include:

Trends that are driving the new information order, from noted author David Weinberger (Everything Is Miscellaneous: The Power of the New Digital Disorder, The Cluetrain Manifesto, etc.)

Emerging technologies and the future of information discovery

User perceptions of the value of content based upon recent surveys from Outsell, Inc.

Corporate and library business practices and revenue models that reflect the culture of today’s information society

The geographic shift in the information economy and the opportunities offered by China as a new source of content

Strategies for success in the New Information Order from the perspective of corporate, academic and government executives

This 2008 NFAIS Annual Conference will be a very special event, as NFAIS will be marking the 50th anniversary of its founding. The City of Philadelphia will proclaim the opening day, February 24, 2008, as “NFAIS Day”; the gala celebration will be held in the ballroom of the historic Academy of Music, the oldest grand opera house in the U.S. still used for its original purpose; and the meeting itself will be held at the Park Hyatt at the Bellevue, a national historic hotel. Join us and find out how your organization can thrive in the New Information Order!

For more information, contact Jill O'Neill, NFAIS Director of Communication and Planning (jilloneill@nfais.org or 215-893-1561) or visit the NFAIS Web site at http://www.nfais.org/events/event_details.cfm?id=44.
