Today Udo Hahn gave an interesting presentation on a new method of extracting technical terms from a large text corpus. Traditional methods rely on statistical analysis of how often a phrase occurs. His new method uses limited paradigmatic modifiability: it tests the frequency of each individual word of a given phrase and thereby computes how likely it is that the phrase is part of a term and not just a chance combination of frequently used words. The new p-mod method beat the t-test and c-value methods in testing on the UMLS Metathesaurus. Supplementary tools used were the GENIA POS tagger, the YAMCHA (support vector machine) chunker and a stop-word filter.
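To make the idea of limited paradigmatic modifiability a bit more concrete, here is a rough Python sketch of how I understand it. This is a simplification and not necessarily the exact formulation from the talk: it only varies one word slot at a time, and the function names (slot_modifiability, pmod_score) and the toy trigram counts are made up for illustration. For each slot in a phrase, it asks how often that slot is filled by the phrase's own word compared with any word at all, keeping the other words fixed. A real term should resist substitution, so its ratios stay close to 1.

```python
from collections import Counter
from math import prod


def slot_modifiability(phrase, counts):
    """For each slot i, return freq(phrase) / freq(all n-grams that agree
    with the phrase everywhere except possibly at slot i)."""
    scores = []
    for i in range(len(phrase)):
        # Sum the counts of every n-gram matching the phrase outside slot i.
        total = sum(
            c for gram, c in counts.items()
            if len(gram) == len(phrase)
            and gram[:i] == phrase[:i]
            and gram[i + 1:] == phrase[i + 1:]
        )
        scores.append(counts.get(phrase, 0) / total if total else 0.0)
    return scores


def pmod_score(phrase, counts):
    """Simplified score: product of the per-slot ratios (1.0 = never modified)."""
    return prod(slot_modifiability(phrase, counts))


# Toy trigram counts, invented purely for illustration.
counts = Counter({
    ("open", "reading", "frame"): 8,
    ("open", "reading", "room"): 2,
    ("of", "the", "cell"): 5,
    ("of", "the", "gene"): 5,
    ("in", "the", "cell"): 10,
    ("of", "a", "cell"): 5,
})

term = pmod_score(("open", "reading", "frame"), counts)
non_term = pmod_score(("of", "the", "cell"), counts)
print(term, non_term)
```

With these toy counts, "open reading frame" scores much higher than "of the cell", because the slots of the latter are filled by many different words in the corpus. A plain frequency count would not separate the two nearly as well.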
Some US Army and IBM researchers were experimenting with ways to detect whether a particular piece of speech contained a story. Their vision is to attach small recording devices to every soldier and automatically record the war stories they tell. Stories are the best way to entice people to take up military life, entertain them, keep up their morale and record the "human" side of military service. They used the WEKA toolkit to rapidly try out different machine learning algorithms and ultimately settled on support vector machines with polynomial kernels. The classifier would be used in real time on textual speech data transcribed by IBM ViaVoice 10. Certain kinds of figures of speech indicate that a story is being told. The SVM was therefore trained to recognize the structure and grammar of story-speech. Ultimately, their experiment failed. The speech recognition was only about 70% accurate, which wasn't high enough to reliably distinguish stories from regular conversation.
Carole Goble from Manchester (the co-leader of the IMG research group) gave the closing keynote presentation. She talked about the Montagues and Capulets, the two families from William Shakespeare's Romeo and Juliet. The Montagues are equivalent to the logicians and knowledge engineers in the realm of research. Ian Horrocks, for example, falls squarely into this camp. They are interested in cool technology, advanced tools, logical rigor, writing research papers and solving interesting (though often not practical) problems. The Capulets, in contrast, are the biomedical researchers, such as the people who created the Gene Ontology (GO). They don't care about the theory, but do care about solving practical problems. They also tend to be better at the social engineering necessary to get people to actually use the tools they provide. A third camp is the philosophers (like Barry Smith), who say that everyone else is doing everything completely wrong, but don't offer any practical advice or help on how to do it better. Her conclusion: let's not all kill each other and instead try to work together and have a happy ending.
Need: a seamless ontology authoring and annotation tool that lets people annotate data and extend the ontology at the same time. At the moment we not only need to switch between tools to accomplish this task, we also need to switch between people. Currently only the biologists can do the annotation and only the logicians can build the ontologies.
Jim Hendler's principle: "A little bit of semantics goes a long way". Just using OWL as a common knowledge interchange format is of great benefit to the e-science community.