network analysis / introduction to...

44
Mark Lombardi Network Analysis Sources of Networks Vladimir Batagelj University of Ljubljana ECPR Summer School, July 30 – August 16, 2008 Faculty of Social Sciences, University of Ljubljana version: August 5, 2008 / 02 : 03

Upload: others

Post on 28-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • '

    &

    $

    %

    Mark Lombardi

    Network AnalysisSourcesof Networks

    Vladimir Batagelj

    University of Ljubljana

    ECPR Summer School, July 30 – August 16, 2008Faculty of Social Sciences, University of Ljubljana

    version: August 5, 2008 / 02 : 03

    http://www.pierogi2000.com/flatfile/lombardi.html

  • V. Batagelj: Network Analysis / Sources of Networks 2'

    &

    $

    %

    Outline1 Some names in the development of SNA . . . . . . . . . . . . . . . . . . . . . . 1

    2 Some important events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    3 Selected Books on SNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Courses on NA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Software for SNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 Roman roads (Peutinger) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    7 Moreno: Who shall survive? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Development of DNA (Garfield) . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    10 Organic molecule 3CRO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    11 Hijackers (Krebs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    12 Wall Street Follies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213 They Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    14 Lombardi’s networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1418 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1819 Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    21 How to get a network? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    22 Complete and ego-centered networks . . . . . . . . . . . . . . . . . . . . . . . 22

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.fh-augsburg.de/\protect \unhbox \voidb@x \penalty \@M \ {}harsch/Chronologia/Lspost03/Tabula/tab_pe05.htmlhttp://www.amazon.com/o/dt/assoc/handle-buy-box=9992695722http://www.garfield.library.upenn.edu/papers/http://www.rcsb.org/pdb/cgi/explore.cgi?pid=278591076701573&pdbId=3CROhttp://www.orgnet.com/hijackers.htmlhttp://www.wallstreetfollies.com/diagrams.htmhttp://www.theyrule.net/

  • V. Batagelj: Network Analysis / Sources of Networks 3'

    &

    $

    %

    23 Use of existing network data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

    25 Genealogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    28 Molecular networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    29 GraphML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    32 Approaches to computer-assisted text analysis . . . . . . . . . . . . . . . . . . . 32

    42 Neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    45 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    46 Networks from the Internet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    51 Random networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 1'

    &

    $

    %

    Sociograph

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 2'

    &

    $

    %

    Development of DNA (Garfield)

    In 1964 E. Garfield with collabora-tors produced, on the basis of thebook Asimov I.: The Genetic Code(1963), a corresponding ’citation’network. It was shown that the anal-ysis ’demonstrated a high degree ofcoincidence between an historian’saccount of events and the citationalrelationship between these events’.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.garfield.library.upenn.edu/papers/

  • V. Batagelj: Network Analysis / Sources of Networks 3'

    &

    $

    %

    Organic molecule 3CRO

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.rcsb.org/pdb/cgi/explore.cgi?pid=278591076701573&pdbId=3CRO

  • V. Batagelj: Network Analysis / Sources of Networks 4'

    &

    $

    %

    Hijackers (Krebs)Mapping Networks of Terrorist Cells / Krebs50

    Figure 4. Hijacker’s Network Neighborhood

    This dense under-layer of prior trusted relationships made the hijacker network both stealth andresilient. Although we don’t know all of the internal ties of the hijackers’ network it appears that manyof the ties were concentrated around the pilots. This is a risky move for a covert network. Concen-trating both unique skills and connectivity in the same nodes makes the network easier to disrupt –once it is discovered. Peter Klerks (Klerks 2001) makes an excellent argument for targeting those nodesin the network that have unique skills. By removing those necessary skills from the project, we caninflict maximum damage to the project mission and goals. It is possible that those with unique skillswould also have unique ties within the network. Because of their unique human capital and their highsocial capital the pilots were the richest targets for removal from the network. Unfortunately they werenot discovered in time.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.orgnet.com/hijackers.html

  • V. Batagelj: Network Analysis / Sources of Networks 5'

    &

    $

    %

    Wall Street Follies

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.wallstreetfollies.com/diagrams.htm

  • V. Batagelj: Network Analysis / Sources of Networks 6'

    &

    $

    %

    Lombardi’s networks

    Mark Lombardi(1951-2000)transformed businessrelations into art.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.pierogi2000.com/flatfile/lombardi.html

  • V. Batagelj: Network Analysis / Sources of Networks 7'

    &

    $

    %

    FAS: The scientific field of Austria

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.fas.at/business/en/galery/index.htm

  • V. Batagelj: Network Analysis / Sources of Networks 8'

    &

    $

    %

    Katy Börner: Text analysis

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://ella.slis.indiana.edu/~katy/

  • V. Batagelj: Network Analysis / Sources of Networks 9'

    &

    $

    %

    Ulrik Brandes: Discourse network

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.inf.uni-konstanz.de/~brandes/

  • V. Batagelj: Network Analysis / Sources of Networks 10'

    &

    $

    %

    How to get a network?Collecting data about the network N = (V,L,P,W) we have first todecide, what are the units (vertices) – network boundaries, when are twounits related – network completness, and which properties of vertices/lineswe shall consider.

    How to measure networks (questionaires, interviews, observations, archiverecords, experiments, . . . )?

    What is the quality of measured networks (reliability and validity)?

    Several networks are already available in computer readable form or can beconstructed from such data.

    For large sets of units we often can’t measure the complete network.Therefore we limit the data collection to selected units and their neighbors.We get ego-centered networks.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 11'

    &

    $

    %

    Complete and ego-centered networks

    Pajek

    Egos Alters

    COMPLETE NETWORK EGO-CENTEREDNETWORKS

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 12'

    &

    $

    %

    Use of existing network dataPajek supports input of network data in several formats: UCINET’sDL files, graphs from project Vega, molecules in MDLMOL, MAC, BS;genealogies in GEDCOM.

    Davis.DAT, C84N24.VGR, MDL, 1CRN.BS, DNA.BS, ADF073.MAC,Bouchard.GED.

    Several network data sets are already available in computer readable formand need only to be transformed into network descriptions.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/Davis.dathttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/C84N24.VGRhttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/1CRN.bshttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/DNA.bshttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/ADF073.machttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/Bouchard.ged

  • V. Batagelj: Network Analysis / Sources of Networks 13'

    &

    $

    %

    Krebs Internet industries

    Each node in the network rep-resents a company that competesin the Internet industry, 1998 do2001.n = 219, m = 631.red – content,blue – infrastructure,yellow – commerce.Two companies are connectedwith an edge if they have an-nounced a joint venture, strategicalliance or other partnership.

    URL: http://www.orgnet.com/netindustry.html. Recode,InfoRapid.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.orgnet.com/netindustry.htmlhttp://vlado.fmf.uni-lj.si/pub/networks/data/progs/RecodeLinks.ziphttp://inforapid.com/html/searchreplace.htm

  • V. Batagelj: Network Analysis / Sources of Networks 14'

    &

    $

    %

    GenealogiesFor describing the genealogies on computer most often the GEDCOMformat is used (GEDCOM standard 5.5).

    Many such genealogies (files *.GED) can be found on the Web – forexample Roper’s GEDCOMs or Isle-of-Man GEDCOMs.

    Several programs are available for preparation and maintainance of ge-nealogies: free GIM and commercial Brothers Keeper (Slovenian version isavailable at SRD).

    From the data collected in Phd. thesis:Mahnken, Irmgard. 1960. Dubrovački patricijat u XIV veku. Beograd,Naučno delo.the Ragusa network was produced.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htmhttp://www.roperld.com/gedcom/http://www.isle-of-man.com/interests/genealogy/gedcom/index.htmhttp://www.gimsoft.com/http://ourworld.compuserve.com/homepages/brothers_keeper/http://www2.arnes.si/~rzjtopl/rod/rod.htmhttp://vlado.fmf.uni-lj.si/pub/networks/data/esna/ragusa.htm

  • V. Batagelj: Network Analysis / Sources of Networks 15'

    &

    $

    %

    GEDCOM

    GEDCOM is a standard for storing genealogical data, which is used tointerchange and combine data from different programs, which were usedfor entering the data.

    0 HEAD 0 @I115@ INDI1 FILE ROYALS.GED 1 NAME William Arthur Philip/Windsor/... 1 TITL Prince0 @I58@ INDI 1 SEX M1 NAME Charles Philip Arthur/Windsor/ 1 BIRT1 TITL Prince 2 DATE 21 JUN 19821 SEX M 2 PLAC St.Mary’s Hospital, Paddington1 BIRT 1 CHR2 DATE 14 NOV 1948 2 DATE 4 AUG 19822 PLAC Buckingham Palace, London 2 PLAC Music Room, Buckingham Palace1 CHR 1 FAMC @F16@2 DATE 15 DEC 1948 ...2 PLAC Buckingham Palace, Music Room 0 @I116@ INDI1 FAMS @F16@ 1 NAME Henry Charles Albert/Windsor/1 FAMC @F14@ 1 TITL Prince... 1 SEX M... 1 BIRT0 @I65@ INDI 2 DATE 15 SEP 19841 NAME Diana Frances /Spencer/ 2 PLAC St.Mary’s Hosp., Paddington1 TITL Lady 1 FAMC @F16@1 SEX F ...1 BIRT 0 @F16@ FAM2 DATE 1 JUL 1961 1 HUSB @I58@2 PLAC Park House, Sandringham 1 WIFE @I65@1 CHR 1 CHIL @I115@2 PLAC Sandringham, Church 1 CHIL @I116@1 FAMS @F16@ 1 DIV N1 FAMC @F78@ 1 MARR... 2 DATE 29 JUL 1981... 2 PLAC St.Paul’s Cathedral, London

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm

  • V. Batagelj: Network Analysis / Sources of Networks 16'

    &

    $

    %

    Network representations of genealogiesIn usual Ore graph every person is represented with a vertex; they are linked withtwo relations: are married (blue edge) and has child (black arc) – partitioned into ismother of and is father of .

    In p-graph the vertices are married couples or singles; they are linked with tworelations: is son of (solid blue) and is dauther of (dotted red). More about p-graphsD. White.

    I wife

    fathermother

    sisterbrother

    son daughter

    f-grandfather f-grandmotherm-grandfatherm-grandmother

    stepmother

    son-in-lawdaughter-in-law

    sister-in-law

    grandson

    sister

    grandson

    I & wife

    father & mother

    f-grandfather & f-grandmother m-grandfather & m-grandmother

    father & stepmother

    son-in-law & daughterson & daughter-in-law

    brother & sister-in-law Iwife

    father mother

    sister brother

    son daughter

    f-grandfather f-grandmother m-grandfather m-grandmother

    stepmother

    son-in-lawdaughter-in-law

    sister-in-law

    grandson

    I & wife

    father & mother

    f-grandfather & f-grandmother m-grandfather & m-grandmother

    father & stepmother

    son-in-law & daughterson & daughter-in-law

    brother & sister-in-law

    Ore graph, p-graph, and bipartite p-graph

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://eclectic.ss.uci.edu/~drwhite/pgraph/p-graphs.html

  • V. Batagelj: Network Analysis / Sources of Networks 17'

    &

    $

    %

    Molecular networks

    virus 1GDY: n = 39865, m = 40358

    In the Brookhaven Protein Data Bankwe can find many large organic molecula(for example: Simian / 1AZ5.pdb )stored in PDB format.They can be inspected in 3D using the pro-gram Rasmol ( RasMol, program, RasWin) or Protein Explorer.A molecule can be converted from PDBformat into BS format (supported byPajek) using the program BabelWin +Babel16.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.rcsb.org/pdb/http://www.umass.edu/microbio/rasmol/index2.htmhttp://www.umass.edu/microbio/rasmol/getras.htmhttp://www.openrasmol.org/OpenRasMol.html#Softwarehttp://molvis.sdsc.edu/protexpl/frntdoor.htmftp://ftp.brunel.ac.uk/pc/chemhttp://www.ccl.net/cca/software/UNIX/babel/babel16.zip

  • V. Batagelj: Network Analysis / Sources of Networks 18'

    &

    $

    %

    GraphMLGraphML – XML format for network description.

    L’Institut de Linguistique et Phonétique Générales et Appliquées (ILPGA),Paris III; Traitement Automatique du Langage (TAL): BaO4 : Des TextesAux Graphes Plurital

    LibXML, xsltproc download, XSLT, Xalan, Python, Sxslt.

    xsltproc GraphML2Pajek.xsl graph.xml > graph.net

    java -jar saxon8.jar graph.xml GraphML2Pajek.xsl > graph.net

    java org.apache.xalan.xslt.Process -IN p.xml -XSL m.xsl -OUT p.txt

    XSLT/Zvon

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://graphml.graphdrawing.org/http://www.cavi.univ-paris3.fr/ilpga/ilpga/tal/cours/bao4-textesauxgraphes.htmlhttp://www.cavi.univ-paris3.fr/ilpga/ilpga/tal/cours/bao4-textesauxgraphes.htmlhttp://tal-p3.wordpress.com/tag/http://www.zlatkovic.com/libxml.en.htmlhttp://www.zlatkovic.com/pub/libxml/http://www.xmlsoft.org/XSLT.htmlhttp://xml.apache.org/xalan-j/http://www.xmlsoft.org/XSLT/python.htmlhttp://www.omegahat.org/Sxslt/index.htmlhttp://zvon.org/xxl/XSLTutorial/Output/index.html

  • V. Batagelj: Network Analysis / Sources of Networks 19'

    &

    $

    %

    GraphML→ Pajek *Vertices 12 1 "a" 2 "b" 3 "c" 4 "d"

    5 "e"Label of the node NoLabel 6 "f"

    7 "g" 8 "h"

    Weight (value) of the edge 1 9 "i" 10 "j" 11 "k"

    a 12 "l"b *Edgesc 2 5d 3 4e 5 7f 6 8g *Arcsh 1 2i 2 1j 1 4k 1 6l 2 6 3 2 3 3 3 7 3 7 5 3 5 6 5 8 6 11 8 4 10 8 12 5 12 7 8 12 12 8

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 20'

    &

    $

    %

    GraphML→ Pajek using XSLT

    *Vertices

    *Edges

    *Arcs

    "

    "

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 21'

    &

    $

    %

    Approaches to computer-assisted text analysisR. Popping: Computer-Assisted Text Analysis (2000) distinguishes threemain aproaches to CaTA: thematic TA, semantic TA, and network TA.

    Terms considered in TA are collected in a dictionary (it can be fixed inadvance, or built dynamically). The main two problems with terms areequivalence (different words representing the same term) and ambiguity(same word representing different terms). Because of these the coding –transformation of raw text data into formal description – is done mainlymanually or semiautomaticly. As units of TA we usually consider clauses,statements, paragraphs, news, messages, . . .

    Till now the thematic and semantic TA mainly used statistical methods foranalysis of the coded data.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.amazon.com/exec/obidos/tg/detail/-/0761953795/

  • V. Batagelj: Network Analysis / Sources of Networks 22'

    &

    $

    %

    . . . approaches to CaTA

    In thematic TA the units are coded as rectangular matrixText units × Concepts which can be considered as a two-mode network.

    Examples: M.M. Miller: VBPro, H. Klein: Text Analysis/ TextQuest.

    In semantic TA the units (often clauses) are encoded according to the S-V-O(Subject-Verb-Object) model or its improvements.

    subject

    verb

    object

    Examples: Roberto Franzosi; KEDS, Tabari.

    This coding can be directly considered as network with Subjects ∪ Objectsas vertices and lines labeled with Verbs.

    See also RDF triples in semantic web.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://mmmiller.com/vbpro/vbpro.htmlhttp://www.textanalysis.info/http://www.textquest.de/eindex.htmlhttp://www.fp.rdg.ac.uk/sociology/people/academic/roberto/publications/socmeth89.pdfhttp://web.ku.edu/keds/http://web.ku.edu/keds/software.dir/tabari.htmlhttp://en.wikipedia.org/wiki/Resource_Description_Frameworkhttp://www.w3.org/RDF/

  • V. Batagelj: Network Analysis / Sources of Networks 23'

    &

    $

    %

    Network CaTA

    TextAnalyst’s ’semantic network’

    This way we already steped intothe network TA.Examples:Carley: Cognitive maps,J.A. de Ridder: CETA,Megaputer: TextAnalyst.

    See also: W. Evans: Computer Environments for Content Analysis, K.A.Neuendorf: The Content Analysis Guidebook / Online and H.D. White:Publications.

    There are additional ways to obtain networks from textual data.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.megaputer.com/products/ta/tutorial/textanalyst_tutorial_1.htmlhttp://www.casos.ece.cmu.edu/automap/index.htmlhttp://www.lib.uni-miskolc.hu/lib/archive/szoft/leirasok/ceta/ceta-2.htmhttp://www.megaputer.com/products/ta/index.php3http://www.press.uillinois.edu/epub/books/burton/ch04.htmlhttp://www.sagepub.com/book.aspx?pid=4808http://academic.csuohio.edu/kneuendorf/content/http://www.cis.drexel.edu/faculty/HUD.Web/HDWpubs.html

  • V. Batagelj: Network Analysis / Sources of Networks 24'

    &

    $

    %

    TA – Dictionary networks

    book description in ODLIS

    In a dictionary graph the terms determinethe set of vertices, and there is an arc(u, v) from term u to term v iff the term vappears in the description of term u.Online Dictionary of Library and Infor-mation Science ODLIS, Odlis.net (2909 /18419).Free On-line Dictionary of Comput-ing FOLDOC, Foldoc2b.net (133356 /120238).Artlex, Wordnet, ConceptNet, OpenCyc.

    The Edinburgh Associative Thesaurus (EAT) / net; NASA Thesaurus.

    Paper.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.wcsu.edu/library/odlis.html#Bookhttp://www.wcsu.edu/library/odlis.htmlhttp://vlado.fmf.uni-lj.si/pub/networks/data/dic/Odlis.ziphttp://www.nightflight.com/foldoc/http://vlado.fmf.uni-lj.si/pub/networks/data/dic/foldoc.ziphttp://www.artlex.com/http://www.cogsci.princeton.edu/~wn/http://web.media.mit.edu/~hugo/conceptnet/http://www.opencyc.org/http://monkey.cis.rl.ac.uk/Eat/htdocs/http://vlado.fmf.uni-lj.si/pub/networks/data/dic/EAT/EAT.htmhttp://www.sti.nasa.gov/products.html#pubtoolshttp://nl.ijs.si/isjt02/zbornik/sdjt02-24abatagelj.pdf

  • V. Batagelj: Network Analysis / Sources of Networks 25'

    &

    $

    %

    TA – Citation networks

    In a citation graph the vertices are differ-ent publications from the selected area;two publications are connected by an arcif the first is cited by the second. The ci-tation networks are almost acyclic.E. Garfield: HistCite / Pajek, papers.An example of very large citation net-work is US Patents / Nber,n = 3774768, m = 16522438.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.garfield.library.upenn.edu/histcomp/index.htmlhttp://vlado.fmf.uni-lj.si/pub/networks/data/cite/http://www.garfield.library.upenn.edu/http://patft.uspto.gov/netahtml/srchnum.htmhttp://www.nber.org/patents/

  • V. Batagelj: Network Analysis / Sources of Networks 26'

    &

    $

    %

    TA – Collaboration networksUnits in a collaboration network are usually indi-viduals or institutions. Two units are related if theyproduced a joint work. The weight is the number ofsuch works.A famous example of collaboration network is TheErdős Number Project, Erdos.net.A rich source of data for producing collaborationnetworks are the BibTEX bibliographies Nelson H.F. Beebe’s Bibliographies Page.For example B. Jones: Computational geometrydatabase (2002), FTP, Geom.net.An initial collaboration network from such data canbe produced using some programming. Then fol-lows a tedious ’cleaning’ process.

    Interesting datasets: The Internet Movie Database and Trier DBLP.

    Both citation and collaboration networks can be obtained from Web of Science using

    WoS2Pajek.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.acs.oakland.edu/~grossman/erdoshp.htmlhttp://www.acs.oakland.edu/~grossman/erdoshp.htmlhttp://vlado.fmf.uni-lj.si/pub/networks/data/Erdos/Erdos02.nethttp://www.math.utah.edu/~beebe/bibliographies.htmlhttp://www.math.utah.edu/~beebe/bibliographies.htmlhttp://compgeom.cs.uiuc.edu/\char 126\relax jeffe/compgeom/biblios.htmlhttp://compgeom.cs.uiuc.edu/\char 126\relax jeffe/compgeom/biblios.htmlftp://ftp.cs.usask.ca/pub/geometry/http://vlado.fmf.uni-lj.si/pub/networks/data/collab/geom.nethttp://www.imdb.com/interfaces#plainhttp://www.informatik.uni-trier.de/~ley/db/http://portal.isiknowledge.com/portal.cgihttp://vlado.fmf.uni-lj.si/pub/networks/pajek/WoS2Pajek/default.htm

  • V. Batagelj: Network Analysis / Sources of Networks 27'

    &

    $

    %

    TA – International Relations

    Paul Hensel’s International Relations Data Site,

    International Conflict and Cooperation Data,

    Correlates of War,

    Kansas Event Data System KEDS,

    KEDS in Pajek’s format.

    Recoding programs in R.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://garnet.acns.fsu.edu/~phensel/data.htmlhttp://garnet.acns.fsu.edu/~phensel/intlconf.htmlhttp://cow2.la.psu.edu/http://web.ku.edu/keds/http://vlado.fmf.uni-lj.si/pub/networks/data/KEDS/http://vlado.fmf.uni-lj.si/pub/networks/data/KEDS/KEDS.zip

  • V. Batagelj: Network Analysis / Sources of Networks 28'

    &

    $

    %

    Recoding of KEDS/WEIS data in Pajek’s format% Recoded by WEISmonths, Sun Nov 28 21:57:00 2004% from http://www.ku.edu/˜keds/data.dir/balk.html*vertices 3251 "AFG" [1-*]2 "AFR" [1-*]3 "ALB" [1-*]4 "ALBMED" [1-*]5 "ALG" [1-*]...

    318 "YUGGOV" [1-*]319 "YUGMAC" [1-*]320 "YUGMED" [1-*]321 "YUGMTN" [1-*]322 "YUGSER" [1-*]323 "ZAI" [1-*]324 "ZAM" [1-*]325 "ZIM" [1-*]*arcs :0 "*** ABANDONED"*arcs :10 "YIELD"*arcs :11 "SURRENDER"*arcs :12 "RETREAT"...

    *arcs :223 "MIL ENGAGEMENT"*arcs :224 "RIOT"*arcs :225 "ASSASSINATE TORTURE"*arcs224: 314 153 1 [4] 890402 YUG KSV 224 (RIOT) RIOT-TORN212: 314 83 1 [4] 890404 YUG ETHALB 212 (ARREST PERSON) ALB ETHNIC JAILED IN YUG224: 3 83 1 [4] 890407 ALB ETHALB 224 (RIOT) RIOTS123: 83 153 1 [4] 890408 ETHALB KSV 123 (INVESTIGATE) PROBING...

    42: 105 63 1 [175] 030731 GER CYP 042 (ENDORSE) GAVE SUPPORT212: 295 35 1 [175] 030731 UNWCT BOSSER 212 (ARREST PERSON) SENTENCED TO PRISON43: 306 87 1 [175] 030731 VAT EUR 043 (RALLY) RALLIED13: 295 35 1 [175] 030731 UNWCT BOSSER 013 (RETRACT) CLEARED121: 295 22 1 [175] 030731 UNWCT BAL 121 (CRITICIZE) CHARGES122: 246 295 1 [175] 030731 SER UNWCT 122 (DENIGRATE) TESTIFIED121: 35 295 1 [175] 030731 BOSSER UNWCT 121 (CRITICIZE) ACCUSED

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

  • V. Batagelj: Network Analysis / Sources of Networks 29'

    &

    $

    %

    . . . Recoding programs in RTo recode the KEDS/WEIS data we used short programs in R, such as thefollowing one:# WEISmonths# recoding of WEIS files into Pajek’s multirelational temporal files# granularity is 1 month# ------------------------------------------------------------------# Vladimir Batagelj, 28. November 2004# ------------------------------------------------------------------# Usage:# WEISmonths(WEIS_file,Pajek_file)# Examples:# WEISmonths(’Balkan.dat’,’BalkanMonths.net’)# ------------------------------------------------------------------# http://www.ku.edu/˜keds/data.html# ------------------------------------------------------------------

    WEISmonths

  • V. Batagelj: Network Analysis / Sources of Networks 30'

    &

    $

    %

    . . . Recoding programs in Rrecode

  • V. Batagelj: Network Analysis / Sources of Networks 31'

    &

    $

    %

    NeighborsLet V be a set of multivariate units and d(u, v) a dissimilarity on it. Theydetermine two types of networks:

    The k-nearest neighbors network: N (k) = (V,A, d)

    (u, v) ∈ A ⇔ v is among k nearest neighbors of u, w(u, v) = d(u, v)

    The r-neighbors network: N (r) = (V, E , d)

    (u : v) ∈ E ⇔ d(u, v) ≤ r, w(u, v) = w(v, u) = d(u, v)

    These networks provide a link between data analysis and network analysis.Efficient algorithms ?!

    Fisher’s Iris data.

    Details on Multivariate networks and procedures in R.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/pub/networks/data/mix/iris.ziphttp://vlado.fmf.uni-lj.si/pub/networks/doc/seminar/netanaE.pdf

  • V. Batagelj: Network Analysis / Sources of Networks 32'

    &

    $

    %

    Nearest k neighbors in Rk.neighbor2Net

  • V. Batagelj: Network Analysis / Sources of Networks 33'

    &

    $

    %

    Fisher’s Irises

    Draw/Draw-Partition-2Vectors

    The size of vertices is proportional to normalized (Sepal.Length, Sepal.Width) and (Petal.Length,

    Petal.Width). The color of vertices is determined by the original partition. Iris data.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/pub/networks/data/mix/iris.zip

  • V. Batagelj: Network Analysis / Sources of Networks 34'

    &

    $

    %

    TransformationsWords graph – words from a given set are vertices; two words are relatediff one can be obtained from the other by change (add, delete, replace) of asingle character. DIC28, Paper.

    Text network – vertices are (selected) words from a given text; two words arerelated if they coappeared in the selected type of ’window’ (same sentence,k consecutive words, . . . ) The weights count such coappearances. ExampleCRA.

    Game graph – vertices are states in the game; two states are linked with anarc if the rules of the game allow the transiton from first to the second state.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/pub/networks/data/dic/dic28.ziphttp://nl.ijs.si/isjt02/zbornik/sdjt02-24bbatagelj.pdfhttp://locks.asu.edu/terror/

  • V. Batagelj: Network Analysis / Sources of Networks 35'

    &

    $

    %

    Networks from the Internet

    KartOO network

    Internet Mapping Project.Links among WWW pages.KartOO, TouchGraph.Derived from archives of E-mail, blogs, . . . , server’s logs.Cybergeography, CAIDA.Tools: MedlineR, SocSciBot.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.kartoo.com/http://research.lumeta.com/ches/map/index.htmlhttp://www.kartoo.com/http://www.touchgraph.com/TGGoogleBrowser.htmlhttp://www.cybergeography.org/http://www.caida.org/http://www.dbsr.duke.edu/pub/MedlineR/MedlineR_slide.PDFhttp://socscibot.wlv.ac.uk/help/tutorial.htm

  • V. Batagelj: Network Analysis / Sources of Networks 36'

    &

    $

    %

    Collecting Networks from WWW

    Web wrappers are special programs for collecting information from webpages – often returned in XML format.

    Examples in R: Titles of patents from Nber, Books from Amazon.

    Several tools for automatic generation of wrappers: (paper / list / LAPIS).

    Free programs: XWRAP (description / page) in TSIMMIS (description /page).

    Among commercial programs it seems the best is lixto.

    Additional URLs 1, 2, 3.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/vlado/podstat/AO/PatNamesR.ziphttp://vlado.fmf.uni-lj.si/vlado/podstat/AO/AmazonR.zipftp://ftp.wifo.uni-mannheim.de/pub/PEOPLE/kuhlins/paper/wrapper.pdfhttp://www.wifo.uni-mannheim.de/~kuhlins/wrappertools/index.htmlhttp://graphics.csail.mit.edu/lapis/http://www.cse.ogi.edu/~lingliu/Papers/xwrapTechRep.pshttp://www.cc.gatech.edu/projects/disl/XWRAPElite/http://www-db.stanford.edu/pub/papers/extract.pshttp://www-db.stanford.edu/tsimmis/tsimmis.htmlhttp://www.lixto.com/http://www.isi.edu/sims/naveen/sig.pshttp://www.cs.uku.fi/research/publications/reports/A-2002-2.ps.gzftp://ftp.cs.sunysb.edu/pub/TechReports/kifer/data-extraction.ps.gz

  • V. Batagelj: Network Analysis / Sources of Networks 37'

    &

    $

    %

    Networks from Amazon in Ramazon

  • V. Batagelj: Network Analysis / Sources of Networks 38'

    &

    $

    %

    while (length(books)>0){bk

  • V. Batagelj: Network Analysis / Sources of Networks 39'

    &

    $

    %

    Networks from Amazon – books on SNABooks in SNA from Amazon, 10.november 2006; Starting point P.Doreian &: Generalized Block-modeling.SVG picture. Files/ZIP.The program in R is just a skele-ton. Possible improvements: listof starting points; continuation af-ter interrupts; . . .

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://www.amazon.com/exec/obidos/tg/detail/-/0521840856?v=glancehttp://www.amazon.com/exec/obidos/tg/detail/-/0521840856?v=glancehttp://vlado.fmf.uni-lj.si/pub/networks/docs/http://vlado.fmf.uni-lj.si/pub/networks/docs/

  • V. Batagelj: Network Analysis / Sources of Networks 40'

    &

    $

    %

    Random networksSeveral types of networks can be produced randomly using special genera-tors. The theoretical background of these generators is beyond the goals ofthis workshop.

    Some of them are implemented in Pajek underNet / Random network

    but can be also described by the following functions in R.

    Available is also a program GeneoRnd for generating random genealogies.

    ECPR Summer School, Ljubljana, July 30 – August 16, 2008 s s y s l s y ss * 6

    http://vlado.fmf.uni-lj.si/pub/networks/doc/preprint/BBrnd.pdfhttp://vlado.fmf.uni-lj.si/pub/networks/progs/random/http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/geneoRnd.htm

  • V. Batagelj: Network Analysis / Sources of Networks 41'

    &

    $

    %

    Random undirected graph of Erdős-Rényi typedice