Accelerating the Pace of Discovery In the Age of Data Intensive Science

Post on 19-Jun-2020


“The flood of sequence data, human and non-human that may impact human health, is certainly growing and in need of being integrated, mined, and understood.”

Dr. Jack Collins, Director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research, NCI

Science today is increasingly driven by Big Data and the need for collaboration. Sometimes the data is generated in one place – CERN, for example – and must be distributed to ten thousand far-flung researchers for analysis. Other times, the data is generated by thousands of geographically dispersed instruments – DNA sequencers, for example – and is most useful when aggregated.

Consider the way researchers use the Cancer Genome Hub (CGHub). CGHub hosts The Cancer Genome Atlas (TCGA) [1] – a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

In mid-March 2015, CGHub held roughly 2.1 petabytes [2] of genomic data, with researchers contributing more every day. Numerous sequencers spread around the U.S. and internationally contribute data to the TCGA database. Storing this data is just the beginning. Sifting this treasure trove for insight requires ready access to the data by the worldwide cancer research community.
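To give a sense of scale, the arithmetic behind moving a repository of this size can be sketched in a few lines. This is a rough illustration only: it assumes a decimal petabyte (10^15 bytes), a link that is fully dedicated to the transfer, and a hypothetical `efficiency` factor for protocol overhead. None of these assumptions come from CGHub or Internet2.

```python
# Idealized transfer-time estimate; every parameter value here is an assumption.
PETABYTE_BYTES = 10**15  # decimal petabyte (assumed; the brief does not specify)


def transfer_time_seconds(size_bytes: float, rate_bps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move size_bytes over a link of rate_bps bits per second.

    efficiency is the fraction of the nominal rate actually achieved (0 < efficiency <= 1).
    """
    if size_bytes < 0 or rate_bps <= 0:
        raise ValueError("size_bytes must be >= 0 and rate_bps must be > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (size_bytes * 8) / (rate_bps * efficiency)


if __name__ == "__main__":
    dataset = 2.1 * PETABYTE_BYTES  # size quoted in the text
    for label, rate in (("1 Gbps", 1e9), ("10 Gbps", 10e9), ("100 Gbps", 100e9)):
        days = transfer_time_seconds(dataset, rate) / 86400
        print(f"{label:>8}: {days:.1f} days")
```

Under these idealized assumptions, 2.1 petabytes takes about 47 hours at 100 Gbps and about 19 days at 10 Gbps. Real transfers would be slower once storage throughput and protocol overhead are taken into account.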

Dr. Jack Collins, Director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research, NCI, sums up the challenge nicely: “The flood of sequence data, human and non-human that may impact human health, is certainly growing and in need of being integrated, mined, and understood. Further, there are emerging technologies in imaging and high resolution structure studies that will be generating a huge amount of data that will need to be analyzed, integrated, and understood.” [3]

Managing Big Data is a challenge for an organization regardless of its size. A single researcher with access to a gene sequencer can create an avalanche of data. Data repositories like CGHub are increasingly turning to cloud computing to store massive datasets and co-locate them with HPC resources and analytic tools. Moving data to a capable cloud, providing sustainable storage, and sharing data among collaborators present major challenges for data intensive research.

Internet2 can help life sciences organizations of all sizes tackle many of these challenges. Perhaps best known for its pioneering success in developing high-speed networks for research and education (R&E) nearly 20 years ago, Internet2 has since grown into an expansive technology ecosystem able to help not only with high-performance networking needs but also with broader technology development and domain-specific collaboration.

“The big picture is that the Internet2 community, with all products taken into consideration, has the ability to enhance the way researchers work, accelerating workflows, shortening the time-to-discovery, and providing flexibility that traditional networks cannot deliver,” says Michael Sullivan, M.D., from Internet2’s Office of the Chief Technology Officer.

Internet2 has long played an important role in biomedical research, initially among universities and government, and more recently within the commercial sector. Moreover, Internet2 is well positioned to serve the life science community’s unusual advanced networking requirements typified by high speed and bursty transmission of massive datasets.

“Internet2 members have high-performance networking options designed to support research collaboration simply not available elsewhere,” says Rodney Wilson, Senior Director, External Research, Ciena, a long-time Internet2 member and key technology provider.

The Internet2 value proposition has many components: collaboration with a large community of scientists and technologists, access points to supercomputing and HPC resources, direct peering agreements with major cloud providers, and powerful networking tools, to name just a few. It starts, however, with the network itself, which includes a 100G nationwide backbone with a total capacity of 8.8 Terabits and extensive links to U.S. universities, federally funded research, and federal agencies, as well as to other global high-speed R&E networks.

THE INTERNET2 NETWORK

There are several ‘flavors’ of networking services available to Internet2 members, allowing them to customize cyber-infrastructure as needed. The three core offerings include:

• Advanced Layer 3 Services (AL3S)

• Advanced Layer 2 Services (AL2S)

• Advanced Layer 1 Services (AL1S)

[Figure: Internet2 Network Infrastructure Topology, October 2014]

INTERNET2 NETWORK BY THE NUMBERS

• 17 Juniper MX960 routers supporting Layer 3 Service
• 34 Brocade and Juniper switches supporting Layer 2 Service
• 62 custom collocation facilities
• 250+ amplification racks
• 15,717 miles of newly acquired dark fiber
• 8.8 Tbps of optical capacity
• 100 Gbps of hybrid Layer 2 and Layer 3 capacity
• 300+ Ciena Activeflex 500 network elements
• 2,400 miles partnered with Zayo Communications in support of the Northern Tier Region


AL1S, AL2S, and AL3S present members with a full menu of options to meet virtually every type of high-bandwidth networking requirement (for a more complete list of networking services, see supplement on last page, Delivering Uninhibited Performance). A brief summary of the key distinguishing features of the core services paints a clearer picture.

ADVANCED LAYER 3 SERVICES (AL3S) provide the ubiquitous high-performance, high-speed IP network over a highly reliable, globally interconnected fabric. AL3S includes TransitRail-Commercial Peering Services (TR-CPS), which provides high-performance, low-latency, and efficient (1 hop) access to some of the top content destinations in the world, including Google, Yahoo, Netflix, and other commercial content providers. The service supports IPv4, IPv6 and multicast.

“Layer 3 is what people traditionally think of when thinking about Internet2. It’s the ubiquitous IP network and there are many use cases,” says George Loftus, Associate Vice President, Network Services, Internet2.

ADVANCED LAYER 2 SERVICES (AL2S) deliver wide area 100 gigabit Ethernet technology that allows for flexible, user-controlled provisioning of virtual networks across the AL2S backbone. Whether static or dynamic, point-to-point or multipoint, intra-domain or inter-domain, AL2S puts control of the backbone VLANs into the users’ hands for the creation of purpose-built private circuits using infrastructure already in place.

“Layer 2 provides cost effective internal network connections between internal resources as well as ad hoc arrangements with external collaborators. It is very flexible with the user maintaining control. Because it is a user provisioned service, AL2S users will have an excellent idea of how their traffic is moving through the network at any time because they actually set it up,” says Loftus.

Within the life sciences community, AL2S has gained considerable traction. “It’s very cost effective and allows R&D groups within a corporate setting to deploy a specialized science network to work directly with collaborative partners and still be able to tell their internal corporate IT, ‘We’re not really messing with primary internet access or the network access of the corporation; we’re setting up a specialized science network that is going to be utilized to work directly with our collaborative partners,’” says Loftus.

ADVANCED LAYER 1 SERVICES (AL1S) are the most specialized and cost-effective way to build a custom, high-capacity network. AL1S is a state-of-the-art network at 10 gigabit, 100 gigabit, and eventually 1 terabit speeds. Internet2’s 100 gigabit networking technology, national fiber network, optical system, and network operations center (NOC) provide a set of strategic services for organizations seeking to offer their constituents the most reliable, high-capacity network solution.

“This optical wave network essentially allows for dedicated point waves across the Internet2 backbone. Researchers often have multiple collaboration centers they need to connect directly and they also want to have a very deterministic service between those two centers,” says Loftus. Organizations requiring rigorous security can use Internet2 ‘dark fiber’ capabilities to extend their existing dark fiber deployment in a metropolitan area across the country for a fraction of the cost that would be required to do it themselves.

VIGILANT NOC

Delivering consistent performance is a necessity and a priority. Member researchers transmit data at extremely high rates such as 6, 8, and 9 Gbps. Sustaining those rates is not possible with even an infrequent loss rate. Dropping just one packet in ten thousand, for example, can slow transmission to 1.5 to 2 Gbps.
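One common way to reason about this sensitivity to loss is the Mathis et al. approximation for steady-state TCP throughput: rate ≤ (MSS / RTT) × (C / √p), where p is the packet loss rate. The brief does not say which model or measurements lie behind its figures, so the sketch below is an illustration under assumed values (segment size, round-trip time, the constant C), not a reproduction of Internet2's numbers.

```python
import math

# Constant from the Mathis et al. model for periodic loss (about 1.22).
# Other variants of the model use slightly different constants.
MATHIS_CONSTANT = math.sqrt(3.0 / 2.0)


def tcp_throughput_ceiling_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Approximate upper bound on single-flow TCP throughput, in bits per second."""
    if mss_bytes <= 0 or rtt_seconds <= 0:
        raise ValueError("mss_bytes and rtt_seconds must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    bits_per_round_trip = mss_bytes * 8
    return (bits_per_round_trip / rtt_seconds) * (MATHIS_CONSTANT / math.sqrt(loss_rate))


if __name__ == "__main__":
    loss = 1e-4  # one packet in ten thousand, as in the text
    mss = 8960   # assumed: jumbo frames (9000-byte MTU minus 40 bytes of headers)
    for rtt_ms in (5, 20, 70):  # assumed round-trip times
        ceiling = tcp_throughput_ceiling_bps(mss, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>3} ms: ceiling of about {ceiling / 1e9:.2f} Gbps")
```

Because the ceiling scales with 1/√p and 1/RTT, quadrupling the loss rate halves the achievable rate, and long paths are penalized far more than short ones. With the assumed 8960-byte segments and a 5 ms round trip, the ceiling at a loss rate of one in ten thousand works out to roughly 1.76 Gbps.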

In addition to a vigilant network operations center (NOC), Internet2 also provides deep transparency to users. This is an important feature for life sciences researchers, who tend to rely on a wide range of commercial, open source, and custom in-house-developed applications. Information about the traffic and status of the network is made available to the pertinent users.

“If you are a researcher and are developing an application, you are actually able to go and see how the interface that you’re traversing is configured and troubleshoot any incompatibilities or issues. All of this is designed to give users as much information as they can use,” says Loftus.

Members are able to take advantage of Internet2’s comprehensive set of analytics tools (see supplement on last page, Optimizing Your Network Investment) to monitor and manage network performance. Direct peering and pre-negotiated rates are also available with a large number of cloud computing and storage providers (Cloud Services & Applications) such as Amazon, Microsoft Azure, Box, Google and many others.

TRUST MANAGEMENT

Effective trust and identity management are always important and nowhere more so than in the heavily regulated and intensely competitive life sciences and healthcare domains. Internet2 operates the InCommon trust federation for the U.S. research and education community, offering certificates, assurance, and multifactor authentication, among other features.

InCommon is a federated identity management framework that allows single sign-on at home organizations, gives the data owner a very granular level of control over the data, and releases the data owner from having to manage individual user names, passwords, and permissions.

For example, the data owner sets access attributes for the data they are making available. If the requester satisfies those attributes, then the requester gains access to the data. The attributes can be very broad or very granular. As a hypothetical example, the attribute might be as broad as being a student at a particular university or as narrow as being a particular professor in a specific department of that university. Internet2 offers a full complement of Trust, Identity, & Middleware tools.
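The attribute check described above can be sketched as follows. This is a minimal illustration of the idea, not InCommon's interface: the attribute names (`home_organization`, `affiliation`, `department`, `user_id`), their values, and the `is_authorized` helper are all hypothetical.

```python
from typing import Dict, Set

# Attribute name -> set of values. All names and values are hypothetical.
Attributes = Dict[str, Set[str]]


def is_authorized(required: Attributes, asserted: Attributes) -> bool:
    """Return True when the requester satisfies every attribute the data owner requires.

    For each required attribute, at least one value asserted for the requester
    must appear in the owner's set of acceptable values.
    """
    for name, acceptable in required.items():
        if not (asserted.get(name, set()) & acceptable):
            return False
    return True


# Broad policy: any student at one (made-up) university.
broad_policy: Attributes = {
    "home_organization": {"university.example"},
    "affiliation": {"student"},
}

# Narrow policy: one particular professor in one specific department.
narrow_policy: Attributes = {
    "home_organization": {"university.example"},
    "affiliation": {"faculty"},
    "department": {"genomics"},
    "user_id": {"prof.a@university.example"},
}

# Attributes a home organization might assert after single sign-on (made up).
student: Attributes = {
    "home_organization": {"university.example"},
    "affiliation": {"student", "member"},
}
professor: Attributes = {
    "home_organization": {"university.example"},
    "affiliation": {"faculty", "member"},
    "department": {"genomics"},
    "user_id": {"prof.a@university.example"},
}

if __name__ == "__main__":
    print("student, broad policy:   ", is_authorized(broad_policy, student))
    print("student, narrow policy:  ", is_authorized(narrow_policy, student))
    print("professor, narrow policy:", is_authorized(narrow_policy, professor))
```

The point of the design is that the data owner writes the policy once, in terms of attributes, and never maintains per-user accounts. The requester's home organization is responsible for asserting who the requester is.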

There is perhaps no technology ecosystem better than the R&E community at fostering collaboration within and between disciplines. Scientists, technologists, and other leaders work together to develop enabling technology and advance research. By developing solutions to networking problems, working on best practices within domains or adapting approaches from another discipline, key human connections and advanced technology solutions enable new levels of collaboration and discovery.

The Ciena experience is illustrative. “We use a general method that I call explore-qualify-transition,” says Wilson. “It’s much of what I do globally in terms of our collaborations. Through Internet2, we are able to gain perspectives from end users, Big Data scientists, and a handful of people actually working on the next generation of computer technology. All of those help influence our product directions and in turn we’re able to help them accomplish scientific goals.”

LIFE SCIENCE COLLABORATION

In one instance, Ciena collaborated with researchers at UCSD who are leaders in ultra high definition visualization techniques. They were having trouble achieving both high resolution and low latency, and the latency problem was causing a loss of information. Working with Ciena, the researchers developed a solution for the project, and the technology was later incorporated into a Ciena product.

“We were able to enhance one of our interface cards specifically for this ultra high resolution videography or imagery. Furthermore because some of this information was confidential and they wanted to protect it in novel ways, we were able to augment the interface with encryption,” says Wilson.

Another example is a pilot program conducted last year between Indiana University and the University of Delaware. It was a genomics project, and the collaborators were faced with the challenge of moving large data sets across the network. Working together and with Cisco and Fusion IO (now SanDisk), they solved the problem, and the solution has been turned into a use case for transferring genomics data across Internet2, particularly with regard to the large file size.

Of course, Internet2 network maintenance and improvement are ongoing efforts in continual pursuit of disruptive innovation. Ciena is currently working with Internet2 to research new network control mechanisms. “Networks are becoming increasingly dynamic and users want to be able to more creatively commandeer things and gear network capacity to their particular requirements. Internet2 and Ciena have developed a test-bed for this purpose,” says Wilson.

The Internet2 community and network also have deep links into high-performance science through their association with organizations such as XSEDE (the Extreme Science and Engineering Discovery Environment, supported by NSF). In fact, Internet2’s connections to HPC and deep science organizations can potentially provide members with access to resources otherwise out of their reach.

IT expertise within the life sciences is in high demand, so the ability to access and collaborate within the Internet2 community on networking issues is a powerful benefit not easily replicated elsewhere. Also, Internet2 has a regular schedule of member meetings, often including specific domain end-user gatherings to discuss science, best practices, and technology development.

CONCLUSION

Big Data and collaboration have become an essential reality in life sciences. Indeed, it’s not an overstatement to say the initial sequencing of the human genome was as much a tour de force of HPC and global collaboration as of new sequencing technology. Stitching together all those millions of sequenced snippets, generated at separate sequencing centers around the world, would not have otherwise been possible.

Today, the amount of life science data being generated weekly dwarfs the datasets generated by the entire 10-year Human Genome Project. Moving and analyzing today’s flood of data requires very high speed and high reliability networks, powerful HPC assets, and ubiquitous collaboration. There is no other way.

The data intensive nature of life science activities will only grow. Squeezed by macroeconomic pressures and thin drug pipelines, drug developers and the biomedical research and healthcare communities are all under pressure to seek cost-effective ways to meet their technology needs. The Internet2 technology ecosystem provides cost-effective, flexible and resilient connectivity solutions to meet the requirements of research collaboration, today and beyond.

ABOUT INTERNET2

Founded in 1996, Internet2 is a not-for-profit community of U.S. and international leaders in research, academia, industry and government who create and collaborate via innovative technologies. Its mission is to accelerate research discovery, advance national and global education, and improve the delivery of public services. Internet2 comprises: 252 U.S. universities; 82 leading corporations; 68 affiliate members, including government agencies; 41 regional and state education networks.

Internet2 worked in partnership with Ciena to deploy a nationwide 100G network delivering a total capacity of 8.8 terabits. The Internet2 Network, powered by Ciena’s packet-optical solutions, enables scientists and researchers across the U.S. and globally to collaborate without limits, supporting the next generation of discoveries and innovations.


ENDNOTES:

[1] The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), two of the 27 Institutes and Centers of the National Institutes of Health, U.S. Department of Health and Human Services.

[2] https://cghub.ucsc.edu/summary_stats.html; http://cancergenome.nih.gov/abouttcga

[3] Presented on a panel at Leverage Big Data conference, March 2015; http://www.leveragebigdata.com

[4] http://www.internet2.edu/about-us/


Network Offering

DELIVERING UNINHIBITED PERFORMANCE

Built by and for the research and education community, the Internet2 Network offers 8.8 Terabits of capacity and 100 gigabit Ethernet technology across its entire footprint. Offering uninhibited performance on a deeply programmable platform, the Internet2 Network is propelling research and education forward.

The Internet2 Network also provides transparent operations and is continually being evaluated and optimized by the community to meet the unique and evolving needs of high-speed research and collaboration.

Here’s a snapshot of Internet2 networking resources:

ADVANCED LAYER 3 SERVICES

Advanced Layer 3 Services deliver a specialized network solution for the research and education community. Designed specifically for that community’s unique needs and optimized for the most demanding, high-performance, peak-plus-potential network traffic, this service delivers a highly reliable, globally interconnected fabric.

ADVANCED LAYER 2 SERVICES

Advanced Layer 2 Services provide scalable and flexible global access to an open exchange network where members can support the most demanding data-intensive science or production applications by building short or long-term Layer 2 circuits between endpoints on the Internet2 Network and beyond.

ADVANCED LAYER 1 SERVICES

Advanced Layer 1 Services provide highly specialized and cost-effective optical networking services to build a custom, high-capacity network on the most advanced research collaboration platform in the world.

GLOBAL SERVICES

As global collaborations grow in number, complexity and importance, MAN LAN and other international exchange points allow member institutions to bring traffic effectively into the Internet2 Network and to its connectors.

CUSTOM NETWORKS

Internet2 has extensive expertise in developing network capacity planning strategies for our members, resulting in custom network solutions that integrate the right blend of Internet2 Layer 1, Layer 2 and Layer 3 services.

Optimizing Your Network Investment

Internet2 has the performance monitoring tools to help network engineers quickly identify service problems across national and international networks. The Internet2 community helped lead the collaborative effort to develop one of the industry’s most popular tools for end-to-end monitoring and troubleshooting of network resources, perfSONAR, which is currently deployed in more than 100 locations around the world.

DEEPFIELD ANALYTICS SERVICE

This new cloud intelligence solution allows Internet2 members to track, model and visualize their use of the Internet2 Network. Both predefined and custom reports are available to help inform network, traffic and cloud decisions, and provide information for grant applications, statutory reporting requirements, and a variety of other needs.

PERFORMANCE ASSURANCE

Building on Internet2’s tradition of transparent operations, the Performance Assurance Service gives members visibility into network service performance and availability metrics, providing a framework for operational monitoring of network services, including hard and soft failures of the network, high- and low-level services, circuit provisioning and NET+ cloud solution access.

PERFORMANCE TOOLS

These dependable tools enable the installation, configuration, and management of performance monitoring software, available as both source code and binary packages in the RPM format. (A single, easy-to-install package is used to configure your CentOS 6 based Linux system to access software downloads.)