Luigi Abruzzese, Luciano Bonvissuto, Giuseppe Carluccio, Mario Ceresa, Michele Garbugli, Davide Lo Pinto, Luca Di Rienzo. Residenza Universitaria Torrescalla, Milano, Italy


Post on 29-Dec-2015


Page 1

Luigi Abruzzese, Luciano Bonvissuto, Giuseppe Carluccio, Mario Ceresa, Michele Garbugli, Davide Lo Pinto, Luca Di Rienzo

Residenza Universitaria Torrescalla

Milano, Italy

Page 2

Scientific discovery is one of the most characteristic activities of the human mind.

Can computers emulate the human mind in scientific discovery?

Or are computers only a powerful aid in this activity?

Page 3

Artificial Intelligence

Theoretical foundations

Methodologies Techniques

Page 4

Artificial Intelligence

Hardware Software

Performance of the human mind

Page 5

Philosophical background

[Diagram: PHENOMENON, MODEL and LAW, related through ABDUCTION, INDUCTION, DEDUCTION and ADDUCTION]

Page 6

Historical analysis

The early days of artificial intelligence saw various attempts to automate the creative tasks of scientific and mathematical inference. Perhaps the earliest examples (on electronic computers) of symbolic mathematical or scientific inference were master’s theses at MIT (J. F. Nolan) and at Temple (H. G. Kahrimanian) in 1953 on analytical differentiation in the calculus. Starting in the 1960s, Lederberg invented an algorithm for generating molecular structures efficiently. This led to the Stanford Dendral project, whose goal was to elucidate molecular structure on the basis of mass spectrograms and other experimental evidence.

Page 7

State-space search

Page 8

BACON DISCOVERS KEPLER’S THIRD LAW:

The squares of the periods of the planets are proportional to the cubes of the mean radii of their orbits: P^2 / D^3 = constant.

Page 9

BACON

• An AI program developed in the 1970s.

• BACON is provided with knowledge of certain mathematical relationships.

• It carries out a search through the space of possible compositions of those relationships.

Page 10

Heuristics of BACON

D^m P^n = constant
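
A minimal sketch of a search of this kind, not BACON's actual implementation: it tries small integer exponents m and n and keeps the pairs for which D^m P^n is nearly constant across the planets. The orbital data (D in astronomical units, P in years) are rounded textbook values; the function names, the exponent range and the tolerance are assumptions made for this illustration.

```python
from itertools import product

# Approximate mean orbital radius D (astronomical units) and period P (years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
}


def relative_spread(values):
    """Return (max - min) / |mean|: small when the values are nearly constant."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / abs(mean)


def find_invariants(data, max_exp=3, tolerance=0.02):
    """Search integer exponents (m, n) for which D**m * P**n is nearly constant."""
    found = []
    for m, n in product(range(-max_exp, max_exp + 1), repeat=2):
        if n <= 0:
            continue  # skip P-free terms and sign-flipped duplicates
        values = [d ** m * p ** n for d, p in data.values()]
        if relative_spread(values) < tolerance:
            found.append((m, n))
    return found


print(find_invariants(PLANETS))
```

On this data the only surviving pair is (m, n) = (-3, 2), that is P^2 / D^3 = constant. The search itself attaches no meaning to the result, which is the point made in the conclusion of this part.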

Page 11

In conclusion:

• BACON does not know what it has discovered. It is BACON's creators who comprehend the significance of the discovery.

So who is the real discoverer, human or machine?

Page 12

Computer (Artificial Intelligence)

Mathematical definitions and proofs

Page 13

The Four Color Theorem

Page 14

First proof

• 1922, Franklin: maps with at most 25 regions

• Heesch: reducibility and discharging

• Appel and Haken: 1476 particular cases, 1200 hours of processing

Page 15

A non-rigorous proof

The proof could not be verified by a human brain.

Proving something would then mean persuading a sufficient number of qualified people. If we accept this kind of definition, in the future computers may help people discover new laws of mathematics.

Page 16

FROM AUTOMATED DISCOVERY PROGRAMS TO COMPUTATIONAL SCIENTIFIC DISCOVERY SYSTEMS

• Until the 1980s, automated discovery programs rediscovered laws that were already known

• The state-space approach leads to a combinatorial explosion of possibilities

• Only scientists can introduce heuristics that limit the number of possible states

• Who really makes the discovery: man or machine?
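
To give a rough sense of the explosion, suppose, purely for illustration, that candidate laws are products of integer powers of k measured variables, with each exponent between -e and e. This counting model is an assumption made here, not something taken from any particular discovery program:

```latex
\[
N(k, e) = (2e + 1)^k - 1
\]
\[
e = 3:\quad N(2, 3) = 48,\qquad N(5, 3) = 16\,806,\qquad N(10, 3) = 282\,475\,248
\]
```

Even this restricted family grows exponentially with the number of variables, which is why heuristics that prune the search matter so much.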

Page 17

RECENT RESEARCH

• The idea of totally automated discovery has been abandoned

• The new trend is towards computer-supported scientific discovery

• The new goal is to obtain genuinely new discoveries that can be published in the specialized literature

Page 18

MACHINE LEARNING

Different kinds of machine learning:

• Supervised learning

• Unsupervised learning

• Reinforcement learning

Page 19

SUPERVISED LEARNING

NEURAL NETS

• An ensemble of elementary units combined in a network structure

• The network elements, called neurons, are organized in layers and are tightly interconnected

• Each link has an associated weight that represents a kind of internal knowledge

MAIN FEATURES
• Learning
• Prediction

TRAINING
• Training set of examples (input/output pairs)
• Learning algorithm
• Weight “calibration”

GOAL
• Generalization of the training results, assessed on a test set (prediction)

[Figures: training phase; formal neuron structure]
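
A minimal sketch of a single formal neuron and of weight calibration from input/output examples. The perceptron learning rule and the logical AND training set are choices made for this illustration; the slides do not name a specific learning algorithm at this point, and all function names are hypothetical.

```python
def step(x):
    """Threshold activation: output 1 when the weighted input is non-negative."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Formal neuron: weighted sum of the inputs followed by the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(examples, epochs=20, rate=1.0):
    """Calibrate the weights with the perceptron learning rule."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Each weight moves in proportion to the error and to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training set: input/output examples of the logical AND function.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_EXAMPLES)
print(weights, bias)
print([predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES])
```

After a few passes over the examples the weights stop changing and the neuron reproduces AND on all four inputs. A single neuron can only learn linearly separable functions, which is one reason real networks arrange many neurons in layers.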

Page 20

NEURAL NETS

The effort to emulate real neural networks has led to various kinds of nets, which can be classified according to several parameters:

• Use

• Learning algorithm

• Link structure

The best-known models are:

• Feedforward nets (with the back-propagation algorithm; the most widely used)

• Associative nets

• Stochastic nets

• Self-organizing nets (Kohonen, also unsupervised)

• Genetic nets (models inspired by Darwin's theory of evolution)

Page 21

UNSUPERVISED LEARNING

DATA MINING: the process of exploring and analysing vast amounts of data, with automatic and semi-automatic tools, in order to discover significant structures and rules and to develop predictive or explanatory models of a specific phenomenon.

There are many data mining techniques:

decision trees, data warehousing, clustering, association rules, temporal sequences, ...

Page 22

CLUSTERING

Cluster: a collection of objects (data) that are
– similar to the other objects in the same cluster
– different from the objects in other clusters

CLUSTER ANALYSIS: grouping objects together into clusters

• Clustering is defined as unsupervised classification: it does not use any background knowledge about the data set under study

TYPICAL APPLICATIONS
• As a stand-alone tool to understand how data are distributed (e.g. in gene expression data analysis, astronomical data processing, ...)
• As a preprocessing step for other algorithms
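
A minimal sketch of one well-known clustering algorithm, k-means, on a handful of 2-D points. The slides do not name a specific algorithm, so k-means is only an example here; the data, the function names and the choice of the first k points as initial centres are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as the initial centres."""
    centres = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centres = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centres


POINTS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
clusters, centres = kmeans(POINTS, k=2)
print(clusters)
print(centres)
```

No labels are supplied: the two groups emerge only from the distances between points, which is what makes the classification unsupervised.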

Page 23

REINFORCEMENT LEARNING

• The system acts directly on the problem by making attempts

• A teacher “rewards” or “punishes” the system through a numerical reinforcement signal, depending on the system's instantaneous behaviour

Rewards and punishments
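
A minimal sketch of learning by trial and error from a numerical reinforcement signal. The teacher, the three actions and the update rule (move the value of the chosen action toward the signal received) are assumptions made for this illustration, not a method described in the slides.

```python
def teacher(action):
    """Hypothetical teacher: rewards (+1) the action 'b', punishes (-1) the others."""
    return 1.0 if action == "b" else -1.0


def learn(actions, trials=10, rate=0.5):
    """Try the action currently valued highest, then move its value
    toward the reinforcement signal that the teacher returns."""
    values = {a: 0.0 for a in actions}
    history = []
    for _ in range(trials):
        # Ties are broken in favour of the action listed first.
        action = max(actions, key=lambda a: values[a])
        reward = teacher(action)
        values[action] += rate * (reward - values[action])
        history.append(action)
    return values, history


values, history = learn(["a", "b", "c"])
print(history)
print(values)
```

The system tries 'a' first, is punished, switches to 'b', is rewarded, and keeps choosing 'b' from then on. This purely greedy strategy never explores 'c'; practical systems usually add some random exploration.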

Page 24

An example: GOLEM (built by Muggleton and Feng, 1992)

PROBLEM: prediction of the secondary structure of a protein from its sequence of amino acids.

• A traditional method for discovering the secondary structure is X-ray crystallography, but a crystal structure determination may require one or more man-years.

• In general, the other techniques used for this problem are also costly, time-consuming and often limited by some parameters of the protein (such as size ...).

• Hence the need for computational support.

Page 25

GOLEM: problem description

The two main substructures of proteins are:
• α-helix structure
• β-strand structure

GOLEM:
• Restricts its analysis to α-helix proteins
• Attempts to predict, from the primary structure, whether or not a particular residue (amino acid) is of the α-helix type

[Figures: β-strand structure; α-helix structure]

Page 26

GOLEM FUNCTIONING (1)

TRAINING SET (LEARNING): 12 non-homologous proteins of α-helix type with well-known structures, comprising 1612 residues

+ BACKGROUND KNOWLEDGE

= SMALL SET OF RULES used for predicting which residues belong to α-helices

TEST SET: 4 proteins (structure known) of α-helix type, comprising 416 residues

ACCURACY (on test set): 81% (±2)
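
The slides do not say how the ±2 margin was obtained. One plausible reading, offered here only as an assumption, is the standard error of a proportion p = 0.81 estimated from the n = 416 test residues:

```latex
\[
\sigma = \sqrt{\frac{p\,(1 - p)}{n}}
       = \sqrt{\frac{0.81 \times 0.19}{416}}
       \approx 0.019 \quad (\text{about } 2\%)
\]
```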

Page 27

GOLEM FUNCTIONING (2)

• Information is coded in one- or two-place predicates

Ex: α(155C,105) means that the residue at position 105 of a particular protein (155C) is of the α-helix type.

• The search is directed preferentially toward residues that show particular relationships with the others (data mining)

• The search for rules is carried out with an iterative procedure that involves a bootstrapping learning process.

• The rules generated by GOLEM can then be considered hypotheses about the ways in which α-helices form in nature. They define the pattern of relations that, if present in the sequence of residues, indicates that a specific residue could be part of an α-helix.

Page 28

GOLEM RESULTS

• One of the rules produced by GOLEM concerning protein structure is, for example, RULE 12:

There is an α-helix residue in protein A at position B if:
1 – The residue at B-2 is not proline
2 – The residue at B-1 is neither aromatic nor proline
3 – The residue at B is large, not aromatic and not lysine
4 – The residue at B+1 is hydrophobic and not lysine
5 – The residue at B+2 is neither aromatic nor proline
6 – The residue at B+3 is neither aromatic nor proline, and is either small or polar
7 – The residue at B+4 is hydrophobic and not lysine

This rule has an accuracy of 95% on the training set and of 81% on the test set. The rule was not known before GOLEM discovered it, and it has contributed to one of the most important current problems in the natural sciences. That is why we can credit GOLEM with the discovery of a natural law.
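
A minimal sketch of how Rule 12 could be checked against an amino acid sequence written in one-letter codes. The property classes below (aromatic, hydrophobic, large, small, polar) are rough illustrative assumptions, not GOLEM's actual background knowledge, and the function name and test sequences are hypothetical. Positions are 0-based indices here.

```python
# Illustrative property classes (one-letter amino acid codes).
# ASSUMPTION: these sets are approximate and are not GOLEM's real tables.
AROMATIC = set("FWYH")
HYDROPHOBIC = set("AVLIMFWCY")
LARGE = set("LIMEQRKFWYH")
SMALL = set("AGSTCDNPV")
POLAR = set("STNQDEKRHY")


def rule_12(sequence, b):
    """Return True if the residue at index b satisfies the seven conditions
    of Rule 12, which look at the window from b-2 to b+4."""
    if b - 2 < 0 or b + 4 >= len(sequence):
        return False  # the rule needs the whole window to exist
    m2, m1, r0, p1, p2, p3, p4 = sequence[b - 2:b + 5]
    return (
        m2 != "P"                                                 # condition 1
        and m1 not in AROMATIC and m1 != "P"                      # condition 2
        and r0 in LARGE and r0 not in AROMATIC and r0 != "K"      # condition 3
        and p1 in HYDROPHOBIC and p1 != "K"                       # condition 4
        and p2 not in AROMATIC and p2 != "P"                      # condition 5
        and p3 not in AROMATIC and p3 != "P"
        and (p3 in SMALL or p3 in POLAR)                          # condition 6
        and p4 in HYDROPHOBIC and p4 != "K"                       # condition 7
    )


print(rule_12("AAELASL", 2))  # window A A E L A S L around E
print(rule_12("PAELASL", 2))  # proline at B-2 violates condition 1
```

Written this way, the rule is visibly a pattern over a seven-residue window: it is readable by a biochemist, which is what allows it to be treated as a hypothesis about helix formation rather than as an opaque predictor.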

Page 29

CONCLUSION

We have presented some attempts to create artificial intelligence programs that make scientific discoveries.

The historical analysis shows that the original idea of AI programs that autonomously discover scientific laws has been abandoned.

The new trend is computer-supported scientific discovery, in which artificial intelligence is a useful and sometimes necessary tool for scientific research.