
Page 1:

BIG proposal for the AUDI Urban Future Award 2010 – full of imagined machine learning… http://www.archdaily.com/77103/bigs-proposal-for-the-audi-urban-future-award/

Page 2:

Machine Learning: Overview - imagine this simple problem: you have a microphone in a room and want to use data from the microphone to find out whether there are people in the room or not. For simplicity's sake, let's assume you have only loudness (amplitude) and no frequency data. Here is a simulated data stream from the microphone.

(Figure: simulated microphone data stream, amplitude vs. time)

Machine learning can 'read' such a data stream for potential 'meaning'.

Page 3:

Machine Learning:

You might try to check for the most obvious first: peak values.

(Figure: amplitude vs. time plot with a prominent peak annotated "something happened here….")

(Figure: the same amplitude vs. time stream with quieter events annotated "but what about here and here…")
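A minimal sketch of this naive first approach, assuming a hypothetical list of amplitude samples and an arbitrary threshold (both made up here); real microphone data and a sensible cutoff would have to come from the application:

    import random

    # simulated amplitude stream: mostly quiet background noise, with a few louder events
    random.seed(1)
    stream = [abs(random.gauss(0.1, 0.05)) for _ in range(100)]
    stream[40] = 0.9   # an obvious peak
    stream[70] = 0.25  # a quieter event that a fixed threshold may miss

    THRESHOLD = 0.5    # arbitrary cutoff; quieter events fall below it

    for t, amplitude in enumerate(stream):
        if amplitude > THRESHOLD:
            print("possible presence at t=%d (amplitude %.2f)" % (t, amplitude))

As the annotated plots suggest, a fixed threshold catches the obvious peak but misses the quieter events, which is what motivates a learning approach.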

Page 4:

Machine Learning: Generalizing this: How do you differentiate the (two) states you seek (presence vs. non-presence) given the particularities of the input data?

Here this translates into a classification problem that can be addressed by several methods. Machine learning seeks to use existing data to guess the meaning of new data. Here, this would mean classifying several examples of data related to 'people in the room' and then letting the system search for similar patterns in the new data, which would then be labeled as 'people in the room' again. Ideally, such a system will check its assumptions periodically and change its rules.
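A minimal sketch of that supervised flow, under made-up assumptions: a few hand-labeled windows of amplitude samples, a single feature (the mean amplitude), and a nearest-prototype rule standing in for a real classifier:

    # hand-labeled training windows (made-up amplitude samples) with their labels
    labeled = [
        ([0.05, 0.07, 0.06], "empty"),
        ([0.04, 0.05, 0.06], "empty"),
        ([0.30, 0.45, 0.38], "people in the room"),
        ([0.50, 0.42, 0.47], "people in the room"),
    ]

    def feature(window):
        # reduce a window of samples to a single feature: its mean amplitude
        return sum(window) / len(window)

    # one prototype per class: the average feature value of its labeled examples
    by_label = {}
    for window, label in labeled:
        by_label.setdefault(label, []).append(feature(window))
    prototypes = {label: sum(vals) / len(vals) for label, vals in by_label.items()}

    def classify(window):
        # label new data with the class whose prototype is closest
        f = feature(window)
        return min(prototypes, key=lambda label: abs(prototypes[label] - f))

    print(classify([0.41, 0.39, 0.44]))   # -> people in the room
    print(classify([0.06, 0.05, 0.07]))   # -> empty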

Page 5:

Machine Learning: Machine Learning (ML) attempts to build theoretical and practical frameworks for synthetic systems that improve with experience. In particular, ML attempts to define relationships between types of tasks, desired performances and necessary experience.

ML grew out of the field of Artificial Intelligence (AI), but is less concerned with symbolic logic and more interested in interaction with the 'real' world, where a fixed algorithm might not be available.

A focus of ML research is the production of models and patterns from data, as in improving performance over time based on input from sensors or queries of databases. For this reason, ML is closely related to data mining, inductive reasoning and pattern recognition.

Page 6:

Supervised Learning: ML, as opposed to AI, bases its operations on data. This data, be it from sensors, animals or people, is used to train a synthetic system, often with a reward function, to act (or reason) like the source that produced the data. The diligent collection and expert-guided classification of data becomes central to the learning process. Because of this training with pre-classified data, this kind of machine learning is often referred to as supervised learning. After exposure to labeled training data, a computer is typically confronted with new unlabeled data and responds to it based on the experience gained from the labeled data. It is often necessary to correct the computer when it reacts erroneously to the new data. By rewarding the system for correct responses one can improve the learning process on some types of problems. Systems that use reward functions (as opposed to labeled data) are referred to as reinforcement learning systems. Typical problem domains of supervised learning are classification and regression.
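A minimal sketch of learning from a reward signal rather than labels, on a deliberately tiny made-up task (two possible responses, a reward of 1 for the correct one); real reinforcement learning systems are far more elaborate:

    import random

    random.seed(0)
    actions = ["say_empty", "say_people"]
    value = {a: 0.0 for a in actions}   # estimated value of each response
    counts = {a: 0 for a in actions}
    correct = "say_people"              # hidden ground truth for this toy episode

    for step in range(200):
        # mostly pick the currently best-valued response, sometimes explore
        if random.random() < 0.1:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: value[a])
        reward = 1.0 if action == correct else 0.0
        # running-average update of the action's value from the reward signal
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]

    print(value)   # the rewarded response ends up with the higher estimated value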

Page 7:

Unsupervised Learning: Altering behavior through training and reward constitutes learning in the synthetic system and is, in many ways, similar to the way human beings learn, although the details are very different. Importantly, humans have a vastly more complex, and sometimes altruistic, conception of reward than any machine has. Also, human beings are able to learn to learn. In ML, the ability to learn in unstructured data environments is called unsupervised learning. Here, computers typically learn to cluster data into different groups or patterns depending on the kinds of features that can be determined. There is, however, little knowledge or understanding on the part of the computer of what these features mean.

Page 8:

Details of implementing ML depend on the choice of method. The choice of method in turn depends on the application domain and the data collected from it. Statistical methods, neural networks of various topologies, probabilistic reasoning, fuzzy logic and case-based reasoning, often in combination, are common tools. ML has produced fundamental statistical-computational theories of learning processes and has designed learning algorithms that are routinely used in commercial systems, from speech recognition to computer vision. Current research trends in ML, as discussed by Tom Mitchell in "The Discipline of Machine Learning" listed below, include synergies between ML and human learning, where social and cultural constraints carry agency in addition to the biological substructures of learning, and the question of never-ending learning that continuously and indefinitely improves performance and maybe begins to question its very premise over time.

Page 9:

Additional Introductory Texts/Sources:

Mitchell, T., The Discipline of Machine Learning, CMU-ML-06-108, July 2006.

Mitchell, T., Brains, Meaning and Corpus Statistics, 2009. http://www.youtube.com/watch?v=QbTf2nE3Lbw

Bishop, C., A New Framework for Machine Learning, in: Computational Intelligence: Research Frontiers, pp. 1-24, Springer-Verlag, Berlin, 2008.

Alpaydin, E., Introduction to Machine Learning (Second Edition), MIT Press, 2010.

Page 10:

Methods of ML:

Supervised Learning - Neural nets: connectionist networks that store information in their nodes and weighted node connections. Bayesian nets/filtering: probabilistic graphical models of the properties of random variables. Support vector machines: a set of classification methods that produce hyperplanes maximally separating data clusters.

Unsupervised Learning - Data Clustering: discovering and visualizing groups of related items. Self-Organizing Maps: neural nets with neighbourhood functions that produce low-dimensional views of high-dimensional data.

Page 11:

Methods of ML: Supervised Learning - Neural Networks: Back Propagation:

Back Propagation is a neural network technique that calculates an error function and tries to minimize it in repeated steps. The error function is based on the measurable difference between the desired (target) outputs and the current outputs. The smaller the difference between target and output, the better the pattern match. Detailed example: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html

Neural Nets are inspired by but abstracted from biological networks. They use examples (hence the supervision) to train a network to recognize similarities to previously defined patterns. See: http://en.wikiversity.org/wiki/Learning_and_neural_networks
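A minimal sketch of the underlying idea, repeatedly shrinking an error by adjusting weights, shown here with a single linear neuron and the delta rule rather than a full multi-layer back-propagation network; all values are made up:

    # one training example: 3 inputs and a target output
    inputs, target = [0.0, 1.0, 1.0], 0.0
    weights = [0.5, 0.5, 0.5]     # arbitrary starting weights
    learning_rate = 0.1

    for step in range(20):
        output = sum(w * x for w, x in zip(weights, inputs))
        error = target - output   # difference between target and current output
        # nudge each weight a little in the direction that reduces the error
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]

    # after repeated steps the output approaches the target as the error shrinks
    print(weights, sum(w * x for w, x in zip(weights, inputs)))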

Page 12:

Biological Neuron and its abstraction (McCulloch-Pitts model; illustrations from Jackson and Hertz)

Page 13:

THE FIVE BASIC CHARACTERISTICS of neural networks (Pfeifer): (1) The characteristics of the node. The terms nodes, units, processing elements, neurons, and model neurons are used synonymously. It is important to define the way in which the node sums the inputs, how they are transformed into a level of activation, how this level of activation is updated, and how it is transformed into an output which is transmitted along the axon. (2) The connectivity. It must be specified which nodes are connected to which, and in what direction. (3) The propagation rule. It must be specified how a given activation that is traveling along an axon is transmitted to the neurons to which it is connected. (4) The learning rules. It must be specified how the strengths of the connections between the neurons change over time. (5) Embedding the network in the physical system. In neural networks for embedded systems, one must always specify how the network is embedded, i.e. how it is connected to the sensors and the motor components.
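A minimal sketch of characteristic (1), a single node in the spirit of the McCulloch-Pitts model: it sums its weighted inputs and turns the activation into an output through a step function; the weights, inputs and threshold are arbitrary illustration values:

    def node_output(inputs, weights, threshold=0.5):
        # (1) the node sums its weighted inputs ...
        activation = sum(w * x for w, x in zip(weights, inputs))
        # ... and transforms the activation into an output via a step function
        return 1 if activation >= threshold else 0

    print(node_output([1, 0, 1], [0.4, 0.9, 0.2]))   # -> 1 (activation 0.6 >= 0.5)
    print(node_output([0, 1, 0], [0.4, 0.2, 0.9]))   # -> 0 (activation 0.2 <  0.5)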

Page 14:

Neocortical column (rat brain), considered the smallest functional unit of the neocortex (the part of the brain thought to be responsible for higher functions such as conscious thought). http://bluebrain.epfl.ch/

Page 15:

Back propagation implemented in Python (skeleton of the NN class and its use):

    class NN:
        def __init__(self, ni, nh, no):
            ...   # build a network with ni input, nh hidden and no output nodes
        def update(self, inputs):
            ...   # feed the inputs forward and return the network's outputs
        def backPropagate(self, targets, N, M):
            ...   # adjust the weights from the output error (N = learning rate, M = momentum)
        def train(self, patterns):
            ...   # repeatedly update() and backPropagate() over the training patterns
        def test(self, patterns):
            ...   # run update() on each pattern and report the response

    def main():
        # define a training pattern: output 1 only for [0,0,0] and [1,1,1]
        train_pat = [
            [[0,0,0], [1]], [[0,0,1], [0]], [[0,1,0], [0]], [[0,1,1], [0]],
            [[1,0,0], [0]], [[1,0,1], [0]], [[1,1,0], [0]], [[1,1,1], [1]],
        ]
        # select a new, similar test pattern
        test_pat = [[[0,0,0.99]]]
        # create a network with 3 inputs, 1 hidden layer (2 nodes) and 1 output
        n = NN(3, 2, 1)
        print("training..")
        n.train(train_pat)
        print("testing..")
        n.test(test_pat)

Page 16:

Methods of ML: Unsupervised Learning - Data Clustering: Hierarchical Clustering: This method builds up a hierarchy of groups by continuously merging the two most similar groups. Each of these groups starts as a single item. In each iteration this method calculates the distances between every pair of groups, and the closest ones are merged together to form a new group.

Page 17:

Methods of ML: Unsupervised Learning - Data Clustering: Hierarchical Clustering: After the clustering has been achieved, one typically visualizes the result in a tree-like structure, a dendrogram, as it retains the nodes and the node relationships:

This is computationally expensive (slow) because the relationship between every pair of items must be calculated and recalculated as the items are merged.
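A minimal sketch of the merging loop on a handful of one-dimensional values, using the distance between group means as the similarity measure; the data are arbitrary:

    items = [1.0, 1.2, 5.0, 5.1, 9.0]
    groups = [[x] for x in items]          # each item starts as its own group

    def distance(a, b):
        # distance between two groups: difference of their mean values
        return abs(sum(a) / len(a) - sum(b) / len(b))

    while len(groups) > 1:
        # find the closest pair of groups ...
        pairs = [(distance(groups[i], groups[j]), i, j)
                 for i in range(len(groups)) for j in range(i + 1, len(groups))]
        _, i, j = min(pairs)
        # ... and merge them into a new group
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        print(groups)                      # each printed line is one level of the hierarchy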

Page 18:

Methods of ML: Unsupervised Learning - Data Clustering: K-means Clustering: This method uses k randomly placed centroids, the assumed centers of clusters, and assigns each item to the nearest centroid. After that, the centroids are moved to the average location of all the nodes assigned to them, and the process is repeated (until the centroids stop changing).

(Figure: items assigned to two centroids, c1 and c2)
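A minimal sketch of k-means with k = 2 on one-dimensional data; the starting centroid positions and the data values are arbitrary:

    data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    centroids = [0.0, 10.0]                      # k = 2 arbitrarily placed centroids

    while True:
        # assign each item to its nearest centroid
        clusters = [[], []]
        for x in data:
            nearest = min(range(2), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # move each centroid to the average of the items assigned to it
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:           # stop when the centroids no longer change
            break
        centroids = new_centroids

    print(centroids, clusters)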

Page 19:

Methods of ML: Unsupervised Learning - Data Clustering: 2-Dimensional Scaling: For each pair of items, the target distance is compared with the current distance and the error (difference) is calculated. On each iteration of the algorithm, each of the items is 'nudged' a bit (in proportion to the error between the items). Each node is moved according to the combination of all the other nodes pushing and pulling on it. On each iteration, the overall distance (between all node points and the targets) decreases. The algorithm terminates when this distance no longer changes.

(Figure: pairwise node distances in the current state, the change step, and the target state)
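A minimal sketch of the nudging loop for three items placed in two dimensions, with made-up target distances; positions start at random and are pushed and pulled by the pairwise errors (here simply run for a fixed number of iterations rather than until the error stops changing):

    import math, random

    random.seed(0)
    # target distances between 3 items (symmetric, arbitrary values)
    target = {(0, 1): 0.3, (0, 2): 0.7, (1, 2): 0.5}
    # start each item at a random 2-D position
    pos = [[random.random(), random.random()] for _ in range(3)]
    rate = 0.01

    for iteration in range(1000):
        # accumulate a movement for each item from every pairwise error
        move = [[0.0, 0.0] for _ in range(3)]
        for (i, j), t in target.items():
            d = math.dist(pos[i], pos[j])
            if d == 0:
                continue
            err = (d - t) / d            # positive: too far apart, pull the pair together
            for k in range(2):
                delta = (pos[j][k] - pos[i][k]) * err
                move[i][k] += rate * delta
                move[j][k] -= rate * delta
        # nudge every item by the combination of all pushes and pulls on it
        for i in range(3):
            for k in range(2):
                pos[i][k] += move[i][k]

    print(pos)   # final 2-D positions whose pairwise distances approximate the targets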

Page 20:

Topics in ML: Optimization: Optimization finds the best solution to a problem by trying many different solutions and scoring them to determine their quality. Optimization is typically used in cases where there are too many possible solutions to try them all. Optimization techniques are typically used in problems that have many possible solutions across many variables, and whose outcomes can change greatly depending on the combinations of these variables. Machine learning includes several optimization techniques, among them the training of neural nets and genetic algorithms.

ST5-4W-03 antenna design optimized by a genetic algorithm.

http://ti.arc.nasa.gov/projects/esg/research/antenna.htm

Page 21:

Topics in ML: Cost Function: Any time one is faced with finding the best solution to a complicated problem, one needs to decide what the important factors are. After choosing some variables that represent those factors and impose costs, one needs to determine how to combine them into a single number. The cost function is the key to solving a problem using optimization, but it is usually difficult to determine. The goal of any optimization algorithm is to find a set of inputs that minimizes the cost function; the cost function has to return a value that represents how bad a solution is. There is no particular scale for badness; the only requirement is that the function returns larger values for worse solutions. It is often difficult to determine what makes a solution good or bad across many variables. Cost functions assume that all things that matter (to the evaluation) can be represented numerically. This is not always the case, particularly in cultural uses of information. Caveat.
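A minimal sketch of a cost function under made-up assumptions: a 'solution' is a list of (price, travel time) choices, and the function combines the two factors into a single number where larger means worse; the weighting is arbitrary:

    def schedule_cost(solution):
        # solution: hypothetical list of (price, travel_minutes) choices
        total_price = sum(price for price, _ in solution)
        total_time = sum(minutes for _, minutes in solution)
        # combine the factors into a single badness value; the weight 0.5 is arbitrary
        return total_price + 0.5 * total_time

    cheap_but_slow = [(100, 300), (120, 280)]
    fast_but_pricey = [(400, 60), (380, 70)]
    print(schedule_cost(cheap_but_slow), schedule_cost(fast_but_pricey))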

Page 22:

Topics in ML: Random Search: Random searching isn't a very good optimization method, but it makes it easy to understand exactly what all search algorithms are trying to do, and it also serves as a baseline against which other algorithms can be compared. However, randomly trying different solutions is very inefficient because it does not take advantage of the good solutions that have already been discovered.

Hill Climbing: An alternative to pure random search is hill climbing. Hill climbing starts with a random solution and moves to neighboring solutions that are better (have a lower cost) than the existing one.

(Figure: schematic of the hill climbing algorithm)
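A minimal sketch of both ideas on a toy one-dimensional cost function (the function, domain and step size are arbitrary): random search simply samples many solutions and keeps the best, while hill climbing repeatedly moves to a better neighbor until none is left:

    import random

    random.seed(0)

    def cost(x):
        # toy cost function with a single minimum at x = 3
        return (x - 3) ** 2

    # random search: try many random solutions and keep the one with the lowest cost
    candidates = [random.uniform(-10, 10) for _ in range(1000)]
    best_random = min(candidates, key=cost)

    # hill climbing: start from a random solution, then keep moving to a better neighbor
    x = random.uniform(-10, 10)
    step = 0.1
    while True:
        neighbors = [x - step, x + step]
        better = [n for n in neighbors if cost(n) < cost(x)]
        if not better:
            break                   # no better neighbor left (possibly only a local minimum)
        x = min(better, key=cost)

    print(best_random, x)           # both end up near the minimum at x = 3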

Page 23:

Topics in ML: Simulated Annealing: Simulated annealing is an optimization method inspired by physics (thermodynamics in particular). Annealing is the process of heating up an alloy and then cooling it down slowly. Because the atoms are first made to jump around a lot and then gradually settle into a lower energy state, they find a well-defined low-energy configuration.

The algorithmic version of annealing begins with a random solution to the problem. It uses a variable representing the temperature, which starts very high and gradually gets lower. In each iteration, one of the numbers in the solution is randomly chosen and changed in a certain direction. At the onset of the algorithm, the temperature variable has a strong influence on the overall behaviour: while the temperature is high, the algorithm will sometimes accept a worse solution, which helps it escape local minima. Over time the influence of 'temperature' (as it 'cools' down) is reduced, and worse solutions are accepted less and less often.
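A minimal sketch of simulated annealing on the same kind of toy cost function; the cooling schedule, acceptance rule and constants follow the usual textbook form and are not a prescription:

    import math, random

    random.seed(0)

    def cost(x):
        return (x - 3) ** 2            # toy cost function with its minimum at x = 3

    x = random.uniform(-10, 10)        # random starting solution
    temperature = 10.0

    while temperature > 0.01:
        # randomly change the solution in some direction
        candidate = x + random.uniform(-1, 1)
        delta = cost(candidate) - cost(x)
        # always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature 'cools' down
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate
        temperature *= 0.99

    print(x)   # ends up close to the minimum at x = 3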

Page 24:

Topics in ML: 2010: Never-Ending Language Learning (NELL). Goal: to build a never-ending machine learning system that acquires the ability to extract structured information from unstructured web pages. The inputs to NELL include (1) an initial ontology defining hundreds of categories and relations that NELL is expected to read about, and (2) seed examples of each category and relation. Given these inputs, plus a collection of 500 million web pages, NELL runs 24 hours per day, continuously, performing two continuous jobs: extract new instances of categories and relations, and learn to read better than yesterday. NELL uses a variety of methods to extract beliefs from the web. These are retrained using the growing knowledge base as a self-supervised collection of training examples. Further reading: http://rtw.ml.cmu.edu/rtw/publications

Page 25:

Topics in ML: 2012: Distributed Computing aka Cloud Computing

Cloud computing is the convergence of three major trends:
• Virtualization: applications are separated from infrastructure (the operating system from the underlying hardware).
• Utility Computing: server capacity is accessed across a grid as a variably priced shared service.
• Software as a Service: applications are available on demand on a subscription basis.

Software for the cloud: MapReduce / Hadoop. Hadoop Map/Reduce is a software framework for writing applications which process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Map/Reduce splits the input data-set into independent chunks which are processed by the 'map tasks' in parallel. The framework sorts the outputs of the maps, which are then input to the 'reduce tasks'. Further reading: http://www.ibm.com/developerworks/cloud/library/cl-cloudintro/index.html http://www.cloudera.com/blog/2010/04/scaling-social-science-with-hadoop/ http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html#Overview
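A minimal sketch of the map/reduce idea in plain Python (not the Hadoop API itself), counting words: the input is split into chunks, a map step emits key/value pairs per chunk, the framework's sort/group step is simulated with a sort, and a reduce step combines the values for each key:

    from itertools import groupby

    documents = ["the cloud", "the grid", "the cloud and the grid"]   # toy input data-set

    def map_step(chunk):
        # emit a (word, 1) pair for every word in one independent chunk
        return [(word, 1) for word in chunk.split()]

    # run the map step over each chunk (a real framework runs these in parallel)
    pairs = [pair for chunk in documents for pair in map_step(chunk)]

    # the framework sorts/groups the map outputs by key ...
    pairs.sort(key=lambda kv: kv[0])

    def reduce_step(key, values):
        # ... and the reduce step combines all the values for one key
        return (key, sum(values))

    counts = [reduce_step(key, [v for _, v in group])
              for key, group in groupby(pairs, key=lambda kv: kv[0])]
    print(counts)   # word counts aggregated across all chunks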

Page 26:

Topics in ML: 2012: Very Large Data Sets. Cloud computing with unsupervised learning – deep learning. From "Building High-level Features Using Large Scale Unsupervised Learning": "…Is it possible to learn a face detector using only unlabeled images? … new experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not."

Further reading: http://arxiv.org/pdf/1112.6209.pdf

"…We train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images. The model has 1 billion trainable connections [the human visual cortex is 10^6 times larger in terms of # of neurons]; the dataset has 10 million 200x200 pixel images downloaded from the Internet. We train this network using model parallelism and asynchronous stochastic gradient descent on a cluster with 1,000 machines for three days."

Page 27:

(Figure captions: The optimal stimulus according to numerical constraint optimization. Histograms of faces (red) vs. no faces (blue). 48 stimuli of the best neuron from the test set. Most responsive stimuli on the test set for the cat neuron.)