1
Peter Fox
Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01
Week 1, January 26, 2010
Introduction to Xinformatics
Course Scope, Assessments
Contents
• Introductions
• Course Outline
• Application areas
• Logistics and resources
• Assessment and assignments
• Learning objectives, outcomes
• Introduction to Xinformatics
• Next class(es)
2
Introductions
• Name, major, year
• Interests, goals, outcomes
• Have you completed any *suggested* prerequisites:
– Knowledge such as that gained in a Data Base class (e.g., CSCI-4380)
– Knowledge such as that gained in a Data Structures class (e.g., CSCI-1200)
– Knowledge such as that gained in a Data Science class (e.g. ITEC/CSCI/ERTH 6961-01)
• Questions
3
Course Outline
• Introduction to Informatics
• State-of-the-Art, informatics applications
• Use case methodologies
• Capturing the problem: Use case development and requirement analysis
• Information theory, models, tools
• Foundations; semiotics, library, cognitive and social science
• Information life-cycle
• Information architectures (Internet, Web, Grid, Cloud)
• Information Visualization, Information and Workflow Management
• Information Discovery, Information Integration
• Class exercises, presentations along the way
4
Application Areas
• Geoinformatics
• Astroinformatics
• Cheminformatics
• Bioinformatics
• Helioinformatics
• Healthinformatics
• Ecoinformatics
• Nursing informatics
• and the list goes on, and on
5
Logistics
• Class: ITEC 4960,6960/CSCI 4962,6961/ERTH-4963,6963-01
• Hours: 9am-11:50am Tuesdays
• Location: SAGE 3705
• Instructor: Peter Fox
• Instructor contact: [email protected] , x4862
• Contact hours: Mondays 3pm-4pm (or by appt)
• Contact location: Winslow 2120
• Wiki: http://tw.rpi.edu/wiki/Xinformatics_%282010_Spring%29
– Schedule, syllabus, reading, assignments, etc.
6
Resources
• TA: Mandeep Singh
• Instructor contact: [email protected]
• Contact hours: by appt
7
Assessment and Assignments
• Via written assignments with specific percentage of grade allocation provided with each assignment
• Via individual oral presentations with specific percentage of grade allocation provided
• Via group presentations – depending on class size
• Via participation in class (not to exceed 10% of total)
• Late submission policy: first time with valid reason – no penalty, otherwise 20% of score deducted each late day
8
Assessment and Assignments
• Reading assignments
– Are given almost every week
– Most are background and informational
– Some are key to completing assignments
– Some are relevant to the current week’s class (i.e. follow up reading)
– Others are relevant to following week’s class (i.e. pre-reading)
– Will not be tested on but we will often discuss these in class and participation in these is taken into account
• You will progress from individual work to group work
9
Objectives
• To instruct future information architects how to sustainably generate information models, designs and architectures
• To instruct future technologists how to understand and support essential data and information needs of a wide variety of producers and consumers
• For both to know the tools and requirements needed to properly handle data and information
• Students will learn and be evaluated on the underpinnings of informatics, including theoretical methods, technologies and best practices.
10
Learning Objectives
• Through class lectures, practical sessions, written and oral presentation assignments and projects, students should:
– Understand and develop skill in Development and Management of multi-skilled teams in the application of Informatics
– Understand and know how to develop Conceptual and Information Models and Explain them to non-experts
– Gain knowledge of, and be able to apply, Informatics Standards
– Develop skill in Informatics Tool Use and Evaluation
11
Academic Integrity
• Student-teacher relationships are built on trust. For example, students must trust that teachers have made appropriate decisions about the structure and content of the courses they teach, and teachers must trust that the assignments that students turn in are their own. Acts that violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities defines various forms of Academic Dishonesty and you should make yourself familiar with these. In this class, all assignments that are turned in for a grade must represent the student’s own work. In cases where help was received, or teamwork was allowed, a notation on the assignment should indicate your collaboration. Submission of any assignment that is in violation of this policy will result in a penalty. If found in violation of the academic dishonesty policy, students may be subject to two types of penalties. The instructor administers an academic (grade) penalty, and the student may also enter the Institute judicial process and be subject to such additional sanctions as: warning, probation, suspension, expulsion, and alternative actions as defined in the current Handbook of Student Rights and Responsibilities. If you have any question concerning this policy before submitting an assignment, please ask for clarification.
12
Questions so far?
13
Introduction to Informatics
• E.g. Bioinformatics
– Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.
– http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
14
Tell us more…
• Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.
• The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
• At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences.
• Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could both access existing data and submit new or revised data.
15
And…
• Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states.
• Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures.
16
And…
• The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
– the development and implementation of tools that enable efficient access to, and use and management of, various types of information
– the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences
17
Definitions
• Data - are pieces of information that represent the qualitative or quantitative attributes of a variable or set of variables.
• Data (plural of "datum", which is seldom used) - are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables.
• Data - are often viewed as the lowest level of abstraction from which information and knowledge are derived
18
Definitions ctd.
• Information
– Representations (of facts? data?) in a form that lends itself to human use
– The word information derives from the Latin informare (in+formare) meaning to give form, shape, or character to. It is therefore to be the formative principle of, or to imbue with some specific character or quality.
• Knowledge
– Check out Wikipedia…. meaning
19
Definitions ctd.
• Metadata – data about data
• Metainformation – information about information
• Documentation – integrated collection of information and metadata intended to support all aspects of data (find, access, use…)
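The three definitions above can be made concrete with a small sketch. Everything here is an assumption for illustration only: the class names, the field names and the temperature values are hypothetical and are not taken from the course material. The point is simply that data is the list of numbers, metadata describes that list, and documentation bundles information and metadata so the data can be found, accessed and used.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    """Data about data: hypothetical descriptive fields for one dataset."""
    units: str
    instrument: str
    collected_on: str  # ISO date string, kept simple for illustration


@dataclass
class Documentation:
    """Information plus metadata, bundled to support find, access and use."""
    title: str
    description: str   # information: a human-usable account of the data
    metadata: Metadata  # data about the data


# Data: raw measurements with no meaning attached yet (made-up values).
temperatures = [21.4, 22.0, 19.8]

doc = Documentation(
    title="Example air temperature series",
    description="Three illustrative air temperature readings.",
    metadata=Metadata(units="degC", instrument="thermometer",
                      collected_on="2010-01-26"),
)


def describe(values, documentation):
    """Combine data with its metadata to produce a human-usable statement."""
    mean = sum(values) / len(values)
    return f"{documentation.title}: mean {mean:.1f} {documentation.metadata.units}"


print(describe(temperatures, doc))
```

Without the metadata, the mean is only a number; the units and title are what turn it into information a person can use.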
20
Full life cycle of data
22
The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.
Fox CI and X-informatics - CSIG 2008, Aug 11
23
Shifting the Burden from the User to the Provider
19 April 2023 © GEO Secretariat
slide 24
Earth is a complex system of systems
Data is required from multiple observation
networks . . . and systems . . .
Local in-situ Networks and Systems
Air pollution measurement station, Emden, Germany
Local and national air pollution networks: Venice, Italy, and Indonesia
BORROMEAN RINGS
Three interlinked circles that represent inseparable parts of the whole. Remove any one ring and the other two fall apart. Because of this property, Borromean Rings have been used as a symbol of unity in many fields.
THE PHYSICS OF INFORMATION
• Information has three indivisible ingredients – content, context and structure.
• The ability to automatically utilize the inherent structure of information is the threshold in information management from hardcopy to digital media.
© 2005 EvREsearch LTD
27
Mind the gap
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)
28
Progression after progression
IT Cyber Infrastructure (CI)
Cyber Informatics
Core Informatics
Science Informatics
Science, Benefit to others
Informatics
• CI = Discipline neutral, e.g. web server, database, wiki
• Cyberinformatics = mapping to discipline neutral aspects
• Core informatics = Reasoning engine, semantics, computer science
• Science (X) informatics = Use cases, science domain terms, concepts in an ontology or controlled vocabulary
29
A moment of history
• In the late 1950s (actually around 1957-1958 or 1962 depending on what you read) the modern informatics term was coined
• It existed as one field for a while, then split into library science and computer science, which developed as separate fields and became disconnected
• Now coming back to be relevant to science
• Informatics IS NOT just having a scientist work with an “IT/ICT” person (NOT, NOT, NOT)
30
Cyberinformatics
• The first match between the domain and the underlying domain-neutral e-infrastructure/ cyberinfrastructure
• When the underlying infrastructure changes (once it becomes real infrastructure and not just software), this is one part that needs to change
• Less brittle since upper layers remain intact
31
Core informatics
• The realm of computer science (for the most part, also librarians)
• Strongly influenced by science (and medical applications) above and below this layer
• If we can leverage this, we do not need to do the specialist work, however …
• We must work with these scientists, sustainably
32
Science Informatics
• Where science meets the underlying technical capabilities and methods
• Must be expressible in science terms; increasingly use cases
• The people in this area are multi-lingual and both interdisciplinary and multi-disciplinary; few are trained or literate here ******
• Team, or really a community of practice (CoP)
33
Information theory
• Semiotics, also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols. It is usually divided into three branches:
– Syntactics: Relation of signs to each other in formal structures
– Semantics: Relation between signs and the things to which they refer; their denotata
– Pragmatics: Relation of signs to their impacts on those who use them
34
Library science
• Curates the artifacts of knowledge
• Organizes and manages them for consumers
– Cataloging and classification
• Preservation
– ‘maintaining or restoring access to artifacts, documents and records through the study, diagnosis, treatment and prevention of decay and damage’ (wikipedia)
• Digital age
– Curation and preservation
35
HISTORY OF INFORMATION THRESHOLDS
[Figure labels: INFORMATION ERAS (STONE, CLAY, PAPYRUS, PAPER, DIGITAL); TIME (years before present): 6000, 5000, 4000, 3000, 2000, 1000, 0, FUTURE; INFORMATION VOLUME; INFORMATION TRANSPORT; INFORMATION INTEGRATION]
© 2005 EvREsearch LTD
Cognitive Science
• Cognitive science is an interdisciplinary study of the mind and intelligence
• It operates at the intersection of psychology, philosophy, computer science, linguistics, anthropology, and neuroscience.
• Of relevance for data and information science are three significant theoretical underpinnings
– mental representation,
– the nature of expertise,
– and intuition
• Very relevant to model, metamodel choice
37
Social Science
• Branch of humanities
• Especially as it relates to networks of scientists
• Exploits sociology of groups, teams
• Cultural norms as well as discipline norms
– Modes of what and how rewards are given
– Between those who produce and those who consume data and information
– How you collect, understand, model and design models and architectures is as much social as technical skill
38
Use Case
• … is a collection of possible sequences of interactions between the system under discussion and its actors, relating to a particular goal.
• The collection of Use Cases should define all system behavior relevant to the actors to assure them that their goals will be carried out properly.
• Any system behavior that is irrelevant to the actors should not be included in the use cases.
– is a prose description of a system's behavior when interacting with the outside world.
– is a technique for capturing functional requirements of business systems and, potentially, of an IT system to support the business system.
– can also capture non-functional requirements
Use Case
• Must be documented (or it is useless)
• Should be implemented (or it is not well scoped)
• Is used to identify: objects ~ resources, processes, roles (aka actors), requirements, etc.
• Should iterate with your end-user on wording and details at least once
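One way to picture a documented use case is as a small record that holds the goal, the actors and the possible interaction sequences. The sketch below is only an assumption about how such a record might look: the `UseCase` class, its field names, the readiness rule and the example scenario are all hypothetical, not a template from this course.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """Hypothetical record: a goal, its actors, and interaction sequences."""
    goal: str
    actors: list
    resources: list = field(default_factory=list)
    requirements: list = field(default_factory=list)
    # Each scenario is one possible sequence of actor/system interactions.
    scenarios: dict = field(default_factory=dict)
    end_user_reviews: int = 0  # times the wording was iterated with the end-user

    def is_documented(self):
        """Assumed rule: a goal, at least one actor, and at least one scenario."""
        return bool(self.goal and self.actors and self.scenarios)

    def ready_to_implement(self):
        """Assumed rule: documented and reviewed with the end-user at least once."""
        return self.is_documented() and self.end_user_reviews >= 1


find_data = UseCase(
    goal="Find air pollution measurements for a city",
    actors=["scientist", "data catalog"],
    resources=["station records"],
    requirements=["search by place name"],
    scenarios={
        "main": [
            "scientist enters a city name",
            "system lists matching stations",
            "scientist selects a station and downloads its records",
        ]
    },
)

print(find_data.is_documented())       # documented, but not yet reviewed
print(find_data.ready_to_implement())
find_data.end_user_reviews += 1        # one round of wording review
print(find_data.ready_to_implement())
```

Keeping the record this small makes it easy to check scope: if the scenario list cannot be written down, the use case is probably not yet well scoped.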
Preview of Information Models
• Conceptual models, sometimes called domain models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts as they are used to explore the high-level static business or science or medicine structures and concepts.
• Conceptual models are often created as the precursor to logical models or as alternatives to them
• Followed by logical and physical models
41
Object models
• A data model is a logical organization of the real world objects (entities), constraints on them, and the relationships among objects.
– A database (DB) language is a concrete syntax for an object (data) model.
– A DB system implements that model.
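A minimal sketch of these three ideas, assuming SQL as the database language and SQLite (through Python's standard `sqlite3` module) as the DB system. The `station` and `measurement` tables, the `pm10` column and the values are hypothetical, chosen only to show entities, constraints on them, and a relationship between them.

```python
import sqlite3

# The DB system: an in-memory SQLite database implements the model.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The DB language: SQL gives the model a concrete syntax.
conn.executescript("""
CREATE TABLE station (                       -- entity
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE                -- constraint on the entity
);
CREATE TABLE measurement (                   -- entity
    id         INTEGER PRIMARY KEY,
    station_id INTEGER NOT NULL REFERENCES station(id),  -- relationship
    pm10       REAL NOT NULL CHECK (pm10 >= 0)           -- constraint
);
""")

conn.execute("INSERT INTO station (id, name) VALUES (1, 'Emden')")
conn.execute("INSERT INTO measurement (station_id, pm10) VALUES (1, 18.5)")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A negative concentration violates the CHECK constraint.
negative_ok = try_insert("INSERT INTO measurement (station_id, pm10) VALUES (1, -3.0)")
# A measurement for a station that does not exist violates the relationship.
orphan_ok = try_insert("INSERT INTO measurement (station_id, pm10) VALUES (99, 5.0)")

rows = conn.execute(
    "SELECT s.name, m.pm10 FROM measurement m JOIN station s ON s.id = m.station_id"
).fetchall()
print(negative_ok, orphan_ok, rows)
```

The same model could be written in another database language or run on another DB system; the entities, constraints and relationships would stay the same.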
42
Architectures
• Building on content, context, and users, some illustrate information architecture as an iceberg.
• Just like an iceberg, the majority of information architecture work is out of sight, "below the water."
• The work includes the creation of plans, controlled-vocabularies, and blueprints all before any user interfaces are created.
43
44
Information life-cycle
45
Visualization
46
Workflow Management
47
Discovery, Integration
• Discovery (mostly about libraries!)
– Digital Fluencies
– Federated Search
– Folksonomies
– Information Literacy
– Intelligent Agents
– Search Engines
– Taxonomies
• Integration (mostly about application tools)
48
Discussion
• About informatics?
• Definitions?
• Applications?
• Components?
• Theory (we’ll start on this soon)
49
Skills needed
• Modeling, theory, architecture experience?
– Nah
• Literacy with computers and applications that can handle information
– Yep
• Ability to access internet and retrieve/acquire data
– Oh yea
• Presentation of assignments
– Ditto
50
What is expected
• Attend class, complete assignments (esp. reading)
• Participate
• Ask questions
• Work both individually and in a group
• Work constructively in group and class sessions
• Next class Feb 2 and 9 …
51
Also on the wiki
• Reading assignments – are intended to prepare you for following lectures and may be considered materials for written assignments or project
• Assignments will be posted there
– Individual
– Group
• Mandeep is your first contact for assignment questions
52
What is next
• Next week – State of the art
• First assignment handed out in week 3 due week 4
• Reading
53