Text mining and chance discovery
TRANSCRIPT
Chance discovery means discovering chances: the breaking points in systems, the marketing windows in business, etc. It involves determining the significance of some piece of information about an event and then using this new knowledge in decision making. The techniques developed combine data mining methods for finding rare but important events with knowledge management, groupware, and social psychology.
Theoretical Computer Science, Springer.com
ser·en·dip·i·ty 1. The faculty of making fortunate discoveries by accident. 2. The fact or occurrence of such discoveries. 3. An instance of making such a discovery.
Fortuitous accidents
Accidents in medicine: The idea sends chills down your spine as you conjure up thoughts of misdiagnoses, mistakenly prescribed drugs, and wrongly amputated limbs. Yet while accidents in the examining room or on the operating table can be regrettable, even tragic, those that occur in the laboratory can sometimes lead to spectacular advances, life-saving treatments, and Nobel Prizes.
PBS NOVA
“It takes years of study to create a chance discovery,” writes Ashley Hay,
author of the Science of Serendipity
Can we use text mining techniques to speed up this process?
There are so many different things included in text mining, from social network dynamics to searching the web
The importance of the corpus
Using a corpus: here on the left is an example from the social sciences, but
these techniques are also used to evaluate the capacities of industrial companies via what they put on their webpages and so on...
To give you some concrete situations to deal with, two examples are suggested:
Poisonous or Venomous Animals
The Curious Case of Dental & Arterial Plaques
Most venomous animals appear to produce their own toxins
Not the blue-ringed octopus
Its venom contains a neurotoxin produced by bacteria
This toxin is tetrodotoxin,
synthesized by several bacterial species, including strains of the family Vibrionaceae, Pseudomonas sp., and Photobacterium phosphoreum.
The venom is stored in its salivary glands
Manually identified keywords or keyphrases
Categories
Placement of the venom
Phrases associated with the toxin, venom, poison
Structures associated with hosting the bacteria
Possible Results
Modified salivary glands
The poison is actually a cocktail of chemicals
Gland, duct, mucus, etc.
What will you discover? Should you make separate corpuses for before & after the source of the poison was correctly identified as bacterial?
Once you have begun to manually identify key phrases, add synonyms, and see the patterns that result ... then you can try to automate this process.
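One possible first step toward automating this is sketched below in Python. It counts how often the phrases in each category appear in a text. The category names and synonym lists are hypothetical, loosely based on the categories above, and the matching is a simple whole-word search that also accepts a plural "s". It is a starting point under those assumptions, not a finished system.

```python
import re
from collections import Counter

# Hypothetical categories and synonym lists; extend them as you read more texts.
CATEGORIES = {
    "toxin": ["toxin", "venom", "poison", "tetrodotoxin"],
    "structure": ["salivary gland", "gland", "duct", "mucus"],
    "source": ["bacteria", "bacterial", "symbiont"],
}

def count_categories(text, categories=CATEGORIES):
    """Count whole-word phrase matches per category, ignoring case."""
    counts = Counter()
    for category, phrases in categories.items():
        # Longest phrases first, so "salivary gland" is not also counted as "gland".
        ordered = sorted(phrases, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")s?\b"
        counts[category] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

sample = ("The venom is stored in its salivary glands. "
          "Its venom contains a neurotoxin produced by bacteria.")
print(count_categories(sample))
```

Note that "neurotoxin" is not counted as "toxin" here because only whole words match. Whether it should be counted is exactly the kind of decision the manual pass helps you make.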
How can this system be optimized?
When you have some results there is also the challenge of putting them into perspective. If the corpus of the king cobra has many phrases similar to the blue-ringed octopus, what does this mean?
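One simple way to put such a comparison into numbers is the Jaccard similarity of the two phrase sets: the number of shared phrases divided by the number of distinct phrases overall. The sketch below uses made-up phrase sets purely for illustration; they are not results from any real corpus.

```python
def jaccard(a, b):
    """Jaccard similarity of two phrase collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def compare(a, b):
    """Return the shared phrases and the phrases unique to each side."""
    a, b = set(a), set(b)
    return sorted(a & b), sorted(a - b), sorted(b - a)

# Made-up phrase sets, for illustration only.
octopus = {"salivary gland", "neurotoxin", "bacteria", "venom"}
cobra = {"venom gland", "neurotoxin", "venom", "fang"}

shared, only_octopus, only_cobra = compare(octopus, cobra)
print(round(jaccard(octopus, cobra), 2), shared)
```

A high score may only reflect vocabulary that all venom texts share, so it helps to compare both against a third, unrelated corpus before reading much into it.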
For many years medicine has known that dental plaque was caused by oral bacteria
However, in 2008 University of Florida researchers cornered the bacterial ringleaders of gum disease inside human artery-clogging plaque; see
Human Atherosclerotic Plaque Contains Viable Invasive Actinobacillus actinomycetemcomitans and Porphyromonas gingivalis by Emil V. Kozarov, Brian R. Dorn, Charles E. Shelburne, William A. Dunn Jr., and Ann Progulske-Fox
The Curious Case of Dental & Arterial Plaques
What does this mean?
If these two have the same cause, perhaps other conditions where the keyword is plaque are also bacterial in origin, or
at least microbes are implicated
Again, more or less the same protocol:
Manually search texts to identify keyphrases
Include synonyms; search ScienceDirect, Google Scholar, etc.
Find other medical conditions that refer to plaque or use similar phrases
Create corpuses, manipulate searches, automate
Analyze the results and present them
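The protocol above could be sketched roughly as follows in Python: for each document, find the sentences that mention plaque and check how many of them also mention a microbe-related term. The term list and the three sample texts are hypothetical placeholders, and the sentence splitting is deliberately crude.

```python
import re

# Hypothetical synonym list; grow it as the manual search turns up new phrases.
MICROBE_TERMS = ["bacteria", "bacterial", "microbe", "microbial", "pathogen"]
MICROBE_RE = re.compile(r"\b(?:" + "|".join(MICROBE_TERMS) + r")s?\b", re.IGNORECASE)
PLAQUE_RE = re.compile(r"\bplaques?\b", re.IGNORECASE)

def plaque_sentences(text):
    """Split text into rough sentences and keep those that mention plaque."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if PLAQUE_RE.search(s)]

def microbe_cooccurrence(docs):
    """Map each document name to (plaque sentences, those also naming a microbe term)."""
    results = {}
    for name, text in docs.items():
        hits = plaque_sentences(text)
        with_microbe = [s for s in hits if MICROBE_RE.search(s)]
        results[name] = (len(hits), len(with_microbe))
    return results

# Placeholder texts standing in for real corpus documents.
docs = {
    "dental": "Dental plaque is caused by oral bacteria. Brushing removes it.",
    "arterial": "Arterial plaque narrows the vessel. Viable bacteria were found inside the plaque.",
    "other": "Skin plaques are raised patches.",
}
print(microbe_cooccurrence(docs))
```

Documents where plaque is mentioned but never alongside a microbe term are the interesting ones to read by hand: either the link is absent or nobody has looked for it yet.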