IBM's Unstructured Information Management Architecture (UIMA) was released to the open source community in early 2006, when the entire source code was made available on SourceForge. After more than a year on SourceForge, UIMA is now part of the Apache Incubator.
UIMA is pitched to become the first and only open standard for unstructured information management. In short, UIMA is a framework for building analytics solutions for the new world of structured-unstructured information sharing. Other frameworks like CALAIS are narrowly focused on Semantic Web technologies rather than providing a framework for building rich Text Analytics applications.
UIMA lets developers build applications out of analysis components and chain those components through its framework. Each analysis component in the framework is an annotator. Consider, for example, an application that identifies person names in a text document. The algorithm can be implemented as an annotator that works against the UIMA interface to the Common Analysis Structure (CAS), which is JCas if you are using Apache UIMA in Java. A CAS is a general representation schema and can store arbitrary data structures for the analysis of documents. Using the CAS, the span of an annotation can be represented easily. The data can be passed through several Analysis Engines (AEs) as long as each of them complies with its descriptor. Details on using UIMA and how to build Aggregate Analysis Engines are available here.
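To make the annotator-and-CAS idea concrete, here is a minimal sketch in plain Python. It is not the UIMA API: the names `Cas`, `Annotation`, `token_annotator`, `person_annotator` and `run_pipeline` are all made up for illustration, and the person-name rule (a title followed by a capitalized word) is a deliberately naive stand-in for a real algorithm. It only shows the shape of the idea: a shared structure holds the text plus annotation spans, and each annotator in the chain adds its own spans.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the document text (hypothetical, not a UIMA class)."""
    type: str
    begin: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span


@dataclass
class Cas:
    """Toy stand-in for a CAS: the document text plus its annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


def token_annotator(cas: Cas) -> None:
    """Adds one 'Token' annotation per run of word characters."""
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", m.start(), m.end()))


def person_annotator(cas: Cas) -> None:
    """Naive rule (an assumption): a title followed by a capitalized word."""
    for m in re.finditer(r"\b(?:Mr|Ms|Dr)\.? ([A-Z][a-z]+)", cas.text):
        cas.annotations.append(Annotation("Person", m.start(1), m.end(1)))


def run_pipeline(text: str, annotators) -> Cas:
    """Passes one Cas through each annotator in order, like a chain of engines."""
    cas = Cas(text)
    for annotate in annotators:
        annotate(cas)
    return cas


cas = run_pipeline("Dr. Smith met Ms. Jones in Oslo.",
                   [token_annotator, person_annotator])
people = [cas.covered_text(a) for a in cas.annotations if a.type == "Person"]
print(people)  # ['Smith', 'Jones']
```

Because every annotator reads and writes the same structure, a later component can build on spans added by an earlier one without knowing how they were produced, which is roughly what chaining through the framework buys you.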
One of the most exciting engagements will be between UIMA and Semantic Search. Semantic Search is the next generation of search technology: it uses metadata (read: information) created through Advanced Text Analytics to enable 'contextual' search. The underlying technologies from NLP, Machine Learning and Statistics have existed for decades and have been explored in fine detail by the research community. With the increasing adoption of enabling frameworks like UIMA, it is now easier to develop scalable solutions using Advanced Research Tools.
Some useful links to learn how UIMA can be used for building advanced text analytics solutions:
1. Background information on UIMA
2. UIMA and Semantic Search
Undercover information: When at IBM, I was part of the gang that developed ProAct - a UIMA-based Customer Satisfaction Analysis technology.
Tuesday, February 12, 2008
Friday, February 1, 2008
What is Microsoft up to?
This sure is a no-brainer. It is gearing up for a whole new battle for market share on the Web - ad revenues, et al. Mr. Gates announced at the World Economic Forum in Davos this year that his company is getting ready with some new and exciting products in the WWW services/offerings arena. Interesting was his comment that there is not just one company in this world of "Search". And why not? The online ad market is projected to grow to $61 billion. You sure have arrived, Mr. Gates - inorganically, but who cares.
Microsoft announced its intention to acquire Yahoo! and made an offer of $31/share, valuing the company at $44.6 billion. This comes after the two companies decided to work together in 2006.
There has been a consistent effort by Microsoft to consolidate its portfolio of offerings related to enterprise search and web/mobile advertising. MSDN's enterprise search blog gives good insight into the company's initiatives in this realm.
In 2007 alone, MSFT made 7 acquisitions related to the two offerings. The recent acquisition of Oslo-based Fast Search & Transfer (FAST) for $1.2 billion is aimed at giving SharePoint (MSFT's enterprise offering) a smarter search capability. Earlier in 2007, MSFT announced a partnership with Atlassian (of Confluence fame) and Newsgator to provide more features that take SharePoint closer to the Web 2.0 world. SharePoint contributes $800 million to MSFT's revenues, and it seems the Redmond gang is beginning to feel the heat from open-source collaboration platforms such as MediaWiki.
SharePoint badly needed a makeover to get rid of its very Office-like feel. I am a big fan of MS Office, but I sure am not game for using a Word app to maintain personal notes and technical meeting records. I would any day prefer a wiki with LaTeX and mindmap plugins. The Atlassian and Newsgator partnership is probably aimed at building such easy-to-use features around SharePoint.
To add to users' woes, imagine digging out relevant information from hundreds of documents and not being able to search within the document contents. For large enterprises this is a non-trivial problem, and a very annoying one too. The FAST acquisition will probably give SharePoint a potential solution to these bottlenecks.
With enterprises slowly (but steadily) accepting open-source technologies, will MSFT's pursuit to revive and empower SharePoint and its enterprise technologies really work? This is a question whose answer we will all eagerly await.
The game has already begun. This is a three-cornered contest - Google, MSFT and open source, with Google trying to get identified as an open-source angel. My personal favourite is the world of SourceForge, Slashdot and Wikipedia. Let the best win the battle.
