Financial Information Grid: an ESRC e-Social Science Pilot
FINGRID
RES-149-25-0028
All Hands Meeting, 31/08-3/09, 2004 Nottingham
Financial Information Grid: an ESRC e-Social Science Pilot
Khurshid Ahmad, Department of Computing, University of Surrey
Jon Nankervis, Department of Accountancy and Finance, University of Essex
Fingrid (RES-149-25-0028), All-Hands Meeting, Nottingham, 3 September 2004
FINGRID Project
The FINGRID project is a collaboration between econometricians at Essex and computing academics at Surrey, particularly in grid computing and artificial intelligence (plus financial traders).
The FINGRID project aims to provide a solution for the information management/processing challenge in the social sciences: the analysis and fusion of distributed quantitative and qualitative data and programs.
FINGRID is the third project at Surrey that deals with qualitative data (news and reports) and quantitative data (time series), following the EU projects ACE (1996-99) and GIDA (2001-03).
FINGRID Objectives
Create a Grid environment based on the Open Grid Services Architecture (OGSA) to provide a demonstrable software application for analysing financial information in the form of quantitative and qualitative data.
Evaluate the benefits of the Grid approach.
FINGRID Reflections
DAME (York): Engine Behaviour Time-series + Reports in a controlled language; Case-based Reasoning;
Belfast e-Science Centre: Value at Risk Computation;
MYGRID and MIAKT: Information Extraction + Image Annotation
FINGRID Project Team
David Cheng, Research Officer, Text Analysis (ESRC funded);
Tuğba Taşkaya-Temizel, Tutor, Grid Computing, Grid Architect;
Lee Gillam, Research Officer, Grid Implementation;
Pensiri Manumapousat, Research Student, Text Categorisation;
Saif Ahmad, Research Student, Wavelet Analysis;
Hayssam Trablousi, Research Student, Named Entity Extraction;
Ademola Popoula, Research Student, Fuzzy Logic Analysis;
Gary Dear, Computing Officer, Grid Implementation;
Khurshid Ahmad, Principal Investigator;
Jon Nankervis, Co-Investigator (Essex)
ESRC Funding: Fifty Thousand Pounds Sterling (Gross).
The Problem
Social science research requires the capture and analysis of data that is quantitative (numerical data) and data that is qualitative (opinions expressed in language or other sign systems).
The fusion of multi-modal information is critical to social sciences research.
The Problem – Decision Making
Challenges: Hypothesis formation and theory development in financial and political economics, both by researchers and by financial traders, now involve the analysis of streaming time-serial data and of financial and political news.
The Data:
Numerical data: time series of the price/value movement of financial instruments; c. 5MB/day, per instrument
Textual data: text streams of different genres (news items; financial reports; company brochures; government documents); c. 20MB/day
Streaming Time-serial Data and News Service
Streaming economic/political news: Reuters; Yahoo; Bloomberg; BBC; Al Jazeera
The Problem – Decision Making
• Financial and political analysis requires data over short time periods (daily) or longer time periods (5-10 years).
• This is a large volume of data that requires instant processing, much like data emerging from particle or gene factories, except that in our case the data is in two or more modalities.
• Financial/political analysis requires access to data tombs (archives) and data nurseries (streaming news and time series).
The Problem – Decision Making
• Decision making involves dealing with factual news (who, where, what, when) and with news related to ‘market sentiment’.
• Decision making involves dealing with time-ordered data that lacks stochastic stability and shows considerable changes in variance.
Market Sentiment?
In addition to quantitative data on trading volumes and price movements, financial traders, and increasingly economists, rely on market sentiment.
The behaviour of investors, security analysts, and financial/monetary theoreticians is influenced by information other than market data: investor credulity; herding sentiment analysis
Market Sentiment? Motivation: Bounded Rationality
Herbert Simon (Nobel Prize in Economics 1978), Rational Decision Making in Business Organisations: mechanisms of bounded rationality are failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences.
Daniel Kahneman (Nobel Prize in Economics 2002), Maps of bounded rationality (intuitive judgement & choice): two generic modes of cognitive function, an intuitive mode with automatic and rapid decision making, and a controlled mode that is deliberate and slower.
E-Economics? FINGRID? Computing at the limits of rationality: distributed multi-modal data analysis and fusion
Market Sentiment, Behavioral Psychology
Investor sentiment and stock market bubbles have some causal relationship:
1961: -tronics mania
1967: franchise and computer ‘crazies’
1983: high tech issues
2001: dot.com
Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and cross-section of stock returns’. Proc. Conf. on Investor Sentiment.
Market Sentiment, Quantitative Behavioral Psychology
Investor sentiment can be affected by:
Closed-end fund discount (CEFD);
Turnover ratio (in NYSE, for example) (TURN);
Number of Initial Public Offerings (N-IPO);
Average first-day returns on IPOs (R-IPO);
Equity share (S);
Dividend premium (P);
Age of the firm, external finance, ‘size’ (log(equity)), ...
A novel composite index:
$$\text{Sentiment}_t = -0.358\,\text{CEFD}_t + 0.402\,\text{TURN}_{t-1} + 0.414\,\text{NIPO}_t + 0.464\,\text{RIPO}_t + 0.371\,S_t - 0.431\,P_{t-1}$$
A very complex non-linear regression on large data sets, computed on a monthly basis
Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and cross-section of stock returns’. Proc. Conf. on Investor Sentiment.
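As a minimal sketch, the composite index above can be evaluated as a weighted sum of the six proxies. The coefficients are those printed on the slide; the function name, the dictionary layout and the sample inputs are hypothetical, and the inputs are assumed to be already standardised, which the slide does not state.

```python
# Coefficients as printed on the slide; the lag of each proxy is kept in the key.
COEFFICIENTS = {
    "CEFD_t": -0.358,
    "TURN_t-1": 0.402,
    "NIPO_t": 0.414,
    "RIPO_t": 0.464,
    "S_t": 0.371,
    "P_t-1": -0.431,
}


def sentiment_index(proxies):
    """Return the weighted sum of the six proxy values (hypothetical helper)."""
    missing = set(COEFFICIENTS) - set(proxies)
    if missing:
        raise KeyError(f"missing proxies: {sorted(missing)}")
    return sum(COEFFICIENTS[name] * proxies[name] for name in COEFFICIENTS)


# Illustrative, made-up inputs: every proxy set to 1.0.
example = {name: 1.0 for name in COEFFICIENTS}
print(round(sentiment_index(example), 3))
```

With every proxy set to 1.0 the result is simply the sum of the coefficients, which makes the sign of each contribution easy to check.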
FINGRID Contribution
• Extraction of market sentiments using a ‘local grammar’ of rise/fall, growth/decay coupled with attributed and un-attributed news (rumours).
• Automatic analysis of terminology and ontology: Financial Trading has 25 sub-domains.
• An integrated framework of time-series analysis (pre-processing, filtering, trend and seasonality, variance change) using wavelet analysis and fuzzy-logic.
• Neural network based classifiers for classifying streaming news.
• Implementation of a Grid-based solution and ‘daily’ market report service.
Fusing Qualitative and Quantitative Data Analysis
We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series, both as text (and numbers) and graphically.
What we need…
A common infrastructure:
for interoperability and reusability;
for aggregating distributed resources to create a single source of computing power;
that provides seamless access, allowing geographically distributed resources to be shared.
Is Grid Computing the Solution?
IBM on Financial Grid Computing: Grid computing enables the virtualisation of distributed computing and data resources
@ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml
Is Grid Computing the Solution?
GRID: Resource Sharing; Collaboration: Financial Economics, Sociology of Poverty, Policy Formation
Working with living data:
much Grid work relates to data tombs; the social sciences work with data nurseries
living data is unstable, incomplete, and requires at least two interdependent modalities, where one compensates for the other
Software, including legacy software, is in silos and its operation is based on tradition. Packages come with experts!
‘Home’ punters: everybody plays the market
Speed-up: a factor of 5 in text analysis; 3-4 in Monte Carlo simulations
@ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml
FINGRID Infrastructure in Surrey
A 24-node data and compute Grid interfaced to a ‘real world’ data stream (Reuters News and Financial Time series Feed) for capturing, analysing and fusing quantitative and ‘qualitative’ data.
Reuters Feed: 2 dedicated data lines, PC and Sun for feed management and associated networking
FINGRID Infrastructure: Reuters Financial Services Streaming Data and News Service
FINGRID Architecture
A 3-tier architecture
The first tier lets the client send a request to one of the services: Text Processing Service or Time Series Service;
The second tier handles the execution of parallel tasks: the main cluster distributes them to a set of slave machines (nodes);
The third tier comprises the connection of the slave machines to the data providers.
FINGRID Architecture
[Diagram: Surrey Grid. Components: Text and Time Series Service; Main Cluster; GRID Cluster (24 Slaves); Streaming Textual Data; Streaming Numeric Data. Arrows numbered 1 to 4, labelled: Send Service Request; Distribute Tasks; Receive Results; Notify user about results.]
• Given an allocated task, the slave machines retrieve the corresponding data from the data providers.
• The main cluster monitors the slave machines until they have completed their tasks, and subsequently combines the interim results.
• The final result is sent back to the client machine.
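The request flow described above can be sketched on a single machine. This is only an illustration of the distribute, monitor and combine pattern: local threads stand in for the slave nodes, a dictionary stands in for the data providers, and none of the names below come from the FINGRID code or from the Globus Toolkit.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the data providers: task id -> raw text (made-up sample data).
DATA_PROVIDER = {
    "news-1": "shares rose sharply",
    "news-2": "profits fell",
    "news-3": "the index rose",
}


def worker(task_id):
    """Slave node: fetch the data for the allocated task and process it."""
    text = DATA_PROVIDER[task_id]      # retrieve data from the provider
    return len(text.split())           # interim result: a word count


def main_cluster(task_ids, n_workers=4):
    """Main cluster: distribute tasks, wait for completion, combine results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        interim = list(pool.map(worker, task_ids))  # blocks until all finish
    return sum(interim)                              # combined final result


result = main_cluster(list(DATA_PROVIDER))
print(result)
```

The value returned by `main_cluster` plays the role of the final result that is sent back to the client.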
FINGRID Technology
Globus Toolkit 3.0 (based on the Open Grid Services Architecture (OGSA))
Java CoG Kit (Java Commodity Grid) for resource management
Languages for development: Java + Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data
Applications integrated: existing statistical programs in FORTRAN; Matlab via JMatLink (adapted to the Linux environment for communication with Matlab)
Other technologies: XML (NewsML) for the news information; CGI for communication of the Java applet with the server side
FINGRID Services
News Analysis: service for extracting MARKET SENTIMENT.
Correlation: Market sentiment correlation with financial time series.
Bootstrapping: service for computing standard errors, confidence intervals and hypothesis testing by a simulation of the time series or market sentiment series.
FINGRID Service: Market Sentiment
At one level market sentiment is often expressed in news reports and editorials, and ranges from views about national economies to the imminent take-overs, mergers and acquisitions and from people leaving/joining an organization to news about political and economic successes and failures.
Market Sentiment
Sentiments are expressed using metaphors.
The metaphors bullish and bearish, so-called animal metaphors, refer to the aggressive or recessive (shy) mood of the investors and perhaps of the traders.
The sentiment words are typically used metaphorically and in general are ambiguous (‘rose’ may be used in different contexts and indeed as a proper noun).
The local grammar reduces the ambiguity by constraining the use of the sentiment words.
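A toy example may make the idea of constraining sentiment words concrete. The pattern below is hand-written for illustration and is not the grammar learnt by the FINGRID system; the word lists, the pattern shape and the function name are all assumptions. It accepts a movement verb only when an instrument noun precedes it and a quantity follows it, so that ‘rose’ as a name or a flower is ignored.

```python
import re

# Hand-written toy pattern: <instrument noun> <movement verb> [by] <number><unit>
# The word lists are illustrative assumptions, not the learnt grammar.
UP = {"rose", "climbed", "gained"}
DOWN = {"fell", "dropped", "slipped"}
PATTERN = re.compile(
    r"\b(?:shares|stocks|prices|index)\s+"
    r"(?P<verb>" + "|".join(sorted(UP | DOWN)) + r")\s+"
    r"(?:by\s+)?\d+(?:\.\d+)?\s*(?:%|percent|points)",
    re.IGNORECASE,
)


def sentiment(sentence):
    """Return +1 or -1 for a match, or 0 when the constrained pattern fails."""
    match = PATTERN.search(sentence)
    if match is None:
        return 0
    return 1 if match.group("verb").lower() in UP else -1


for s in ("Shares rose by 3.5% in early trading",
          "The index fell 120 points",
          "Rose Smith bought a rose"):
    print(sentiment(s), s)
```

A regular expression is used here because it is a convenient finite-state notation; a learnt automaton would cover far more constructions than this single pattern.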
Market Sentiment
A finite state automaton (local grammar), learnt by our system from a news corpus to identify ‘sentiments’ in free text unambiguously, was used for extracting sentiment information.
Market Sentiment
A finite state automaton (local grammar), learnt by our system from a news corpus to identify names of persons and organisations in free text unambiguously, was used for attributing sentiment information to people and organisations.
Case Studies & Results: Text Analysis Service
For the Brown Corpus, the number of words processed per second is similar to Hughes et al.: 7,120 versus 6,670 in a single CPU system.
Our 2-node grid implementation shows a 98% gain in performance, whereas the Hughes et al. implementation (SMP configuration, equivalent to our 2-node grid) shows a 27% gain.
Relative performance of the word frequency counting experiment on the RCV1 corpus is lower than on the Brown corpus, because the XML files must be parsed prior to processing.

                        Brown     RCV1
Words/s (1 machine)     7,120     -
Words/s (2 machines)    14,091    5,334
Words/s (4 machines)    23,944    10,532
Words/s (8 machines)    31,453    14,590
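The scaling behaviour in these figures can be made explicit by computing speedup (throughput relative to one machine) and efficiency (speedup per machine). The sketch below uses only the Brown corpus numbers reported above; the function and variable names are illustrative.

```python
# Words per second on the Brown corpus, as reported in the table above.
BROWN_WORDS_PER_SEC = {1: 7120, 2: 14091, 4: 23944, 8: 31453}


def speedup_and_efficiency(throughput):
    """Map machine count -> (speedup vs. one machine, speedup per machine)."""
    base = throughput[1]
    return {n: (rate / base, rate / base / n) for n, rate in throughput.items()}


scaling = speedup_and_efficiency(BROWN_WORDS_PER_SEC)
for n, (s, e) in sorted(scaling.items()):
    print(f"{n} machines: speedup {s:.2f}, efficiency {e:.2f}")
```

For two machines this gives a speedup of about 1.98, which is the 98% gain quoted above; efficiency then falls as machines are added, to roughly 0.55 on eight machines.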
Case Studies & Results: Text Analysis Service
A Java program for sentiment extraction has been developed.
Experiments on the Reuters RCV1 corpus (2.3GB) were conducted. Processing time improved from 15.9 hours on a 4-node grid to 13.1 hours on an 8-node grid (about 18% less).
[Chart: Text Analysis. Time required to process one month of news with different configurations. x-axis: # of machines (1, 2, 4, 8); y-axis: time in seconds (0-600); legend: Text Analysis (process time in ms).]
FINGRID Service: Fusing quantitative & qualitative information
Time-serial data related to financial instruments (for example, currency, stocks, derivatives) often exhibit nonstationarity.
In order to extract long-term trends, seasonal variation, and the random component in a complex time series, multi-scale analysis and fuzzy logic are increasingly used.
The positive and negative sentiments related to a financial instrument may be ordered as a time series.
This sentiment series is then correlated with the movement of a financial instrument.
Such correlation can be used for prediction, or better still for the analysis of (volatile) movements in the market.
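One simple way to picture this step is to turn daily counts of positive and negative sentiment into a net sentiment series and compute its Pearson correlation with the instrument's daily price change. The slides do not say which correlation measure was used, so Pearson is an assumption here, and all the data values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)   # undefined for a constant series


# Made-up daily data: (positive, negative) sentiment counts and price changes.
counts = [(5, 1), (2, 4), (6, 2), (1, 5), (4, 4)]
price_change = [0.8, -0.5, 0.9, -1.1, 0.1]

net_sentiment = [pos - neg for pos, neg in counts]  # the sentiment series
r = pearson(net_sentiment, price_change)
print(round(r, 3))
```

Because financial series are often nonstationary, as noted above, a raw correlation of this kind is best read as a first diagnostic rather than as evidence of a stable relationship.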
FINGRID Service: Bootstrapping & Large-scale simulations
The bootstrap method assumes that the observed data are representative of the unknown population.
Bootstrap procedures are data-based simulation methods that estimate the distribution of estimators by re-sampling observed data.
Statistical inferences obtained from distributions of simulated data are reported to be more reliable than inferences based on asymptotic theory, which holds strictly only when the sample size is infinitely large (MacKinnon 2002).
Bootstrap tests and Monte Carlo tests are examples of simulation-based tests.
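A minimal sketch of the resampling idea, assuming independent observations: draw samples with replacement from the observed data, recompute the statistic on each, and take the spread of the replicated estimates as its standard error. This is a plain non-parametric bootstrap written for illustration; it is not the project's Fortran implementation, and the sample data are made up.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, statistic=mean, reps=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling `data`."""
    rng = random.Random(seed)          # fixed seed for repeatable runs
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [rng.choice(data) for _ in range(n)]  # with replacement
        estimates.append(statistic(resample))
    return stdev(estimates)            # spread of the replicated estimates


# Made-up sample of daily returns (per cent).
returns = [0.4, -1.2, 0.7, 0.1, -0.3, 1.5, -0.8, 0.2, 0.9, -0.5]
se = bootstrap_se(returns, reps=500)
print(se)
```

Each replication is independent of the others, which is why the replications can be split across grid nodes and the estimates combined afterwards.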
Case Studies & Results: Bootstrapping
Java-wrapped (Fortran) implementations of the bootstrapping algorithm.
The processing time of the bootstrapping program was measured with different grid node configurations, from two nodes to eight nodes.
[Chart: Simple Bootstrapping. x-axis: # of machines (1, 2, 4, 8); y-axis: time in seconds (0-2500); series: Bootstrap rep=500, Bootstrap rep=1000.]
When the number of bootstrap replications was set to 1000, 1050 seconds were required on a 2-node grid and 404 seconds on an 8-node grid.
Conclusion
We have identified the following problems that may cause performance degradation in a grid environment:
The configurations of the machines: During the distribution of tasks, we did not consider the configuration of the machines, so faster machines were idling while the rest were still processing.
One common data source: Network latency arises because many nodes share the same bandwidth to retrieve files.
Amdahl’s law: Amdahl’s law is applicable to our grid, where the fraction of code f that cannot be parallelised limits the speedup factor.
Program constraints: In the task distribution process, the file size is not considered.
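Amdahl's law gives the ideal speedup on n nodes as S(n) = 1 / (f + (1 - f) / n), where f is the fraction of the work that cannot be parallelised. As a rough illustration, the sketch below inverts the law to find the f implied by the Brown corpus throughput reported in the text analysis results. Attributing the whole shortfall to f is a simplifying assumption, since the other problems listed above (machine configuration, shared bandwidth, file sizes) also reduce the measured speedup.

```python
def amdahl_speedup(f, n):
    """Ideal speedup on n nodes when a fraction f of the work is serial."""
    return 1.0 / (f + (1.0 - f) / n)


def serial_fraction(speedup, n):
    """Invert Amdahl's law: the f implied by an observed speedup on n nodes."""
    return (n / speedup - 1.0) / (n - 1.0)


# Observed Brown-corpus throughput from the text analysis results (words/s).
observed = 31453 / 7120                 # speedup on 8 machines vs. 1 machine
f = serial_fraction(observed, 8)
print(f"implied serial fraction: {f:.3f}")
print(f"speedup limit as n grows: {1 / f:.1f}")
```

Under this reading the implied serial fraction is about 0.116, so adding nodes beyond eight would bring diminishing returns unless the serial part or the shared data source is addressed.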
Conclusions
The FinGrid project has achieved three major objectives.
The project demonstrates how both quantitative and qualitative data from multiple sources can be processed, analysed, and fused.
It has raised considerable interest in the financial news information market (Ahmad et al. 2004).
It contributes in terms of improvements to goods and services: financial software houses and news vendors have shown interest in the project.
A Master’s level Grid Computing module has been developed based on our experience in FinGrid.
Next Steps
Investigate and evaluate Condor-G, MPICH2 and OGSA-DAI for effective job management, parallel processing and database management.
Towards a knowledge grid:
PARALLEL and DISTRIBUTED KNOWLEDGE DISCOVERY: Continual analysis and fusion of text and numerical data, both real-time and historical.
KNOWLEDGE GRID SERVICES:
KNOWLEDGE RETRIEVAL: Adapt information extraction methods and systems (e.g. Surrey’s SYSTEM QUIRK) onto a GRID architecture for extended semantic analysis.
KNOWLEDGE MODELLING: Representation of non-stationary time series using Wavelet Analysis, Neural Networks and Fuzzy Logic, such that the system learns from its past experience.