
Cambridge Healthtech Media Group

BONUS EDITION:

Data Management and the Cloud

www.bio-itworld.com


Subscriptions: Address inquiries to Bio•IT World, 250 First Avenue, Suite 300, Needham, MA 02494, 888-999-6288, or e-mail [email protected].

Reprints: Copyright © 2013 by Bio•IT World. All rights reserved. Reproduction of material printed in Bio•IT World is forbidden without written permission. For reprints and/or copyright permission, please contact Jay Mulhern, (781) 972-1359, [email protected].

This index is provided as an additional service. The publisher does not assume any liability for errors or omissions.

Bonus Edition: Data Management and the Cloud

3 Inaugural Gathering of Lab IT Forum Wins Big Pharma Interest

5 BitSpeed Pushes Software Solutions for High-Speed Data Transfer

7 Cycle Computing CTO James Cuff on Clouds, On-Demand Computing and Package Holidays

10 Courtagen Leverages Level 3 to Provide Direct Access to Amazon Cloud

12 NetApp Eyes Opportunities in Health Care Data Storage

Follow us on Twitter, LinkedIn, Facebook, Google Plus, YouTube, and Xing.


Editorial Director
Allison Proffitt (617) 233-8280 [email protected]

Account Manager, Media
Jay Mulhern (781) [email protected]

Lead Generation Account Manager, Companies A-K
Katelin Fitzgerald (781) [email protected]

Lead Generation Account Manager, Companies L-Z
Tim McLucas (781) [email protected]

Corporate Marketing Communications Director
Lisa Scimemi (781) [email protected]

Marketing Assistant
Lisa Hecht (781) [email protected]

Contributing Editors
Deborah Janssen, John Russell, Ann Neuer

Cambridge Healthtech Institute President
Phillips Kuhl

Contact [email protected]

250 First Avenue, Suite 300, Needham, MA 02494


Inaugural Gathering of Lab IT Forum Wins Big Pharma Interest
By Kevin Davies | March 6, 2013

The chief architects of a fledgling coalition of IT firms, consultancies and biopharma representatives declared their first meeting last week a promising success.

The two-day gathering—at AstraZeneca’s research center in Waltham—was organized by Mike Santimaw (head of specialist computing at AstraZeneca), Kevin Granfield (director of R&D IT support services at Biogen Idec), Jay Paghdal (head of regional service delivery at the Novartis Institutes for BioMedical Research), and Merck’s Alec Anuka, with support from Tom Arneman (president of Ceiba Solutions, a Boston-based IT managed services, products and information analytics provider).

In the absence of a catchier name, the group is calling itself the Lab IT Forum.

Other pharma companies represented in the group of some 25 representatives included Pfizer, Johnson & Johnson, Sanofi, and Alkermes.

In addition to Ceiba Solutions, the IT community was represented by executives from Dell, Intel, Thermo Scientific and Microsoft. Representatives from Broad Institute, Harvard Medical School, and Cognizant were also in attendance.

The vision of the group is to build a “peer-to-peer, pre-competitive network,” said Santimaw. “We do a good job with our customers, but we do what they want, not what they need.” The goal, he said, was to learn and deploy best practices to help R&D colleagues “do more of what they do best: science, quality, and manufacturing.”

“We’re all competitors,” he says, “but it doesn’t mean we can’t share best practices… the vision is to make that easy… It’s about helping customers reach strategic goals through delivery of ‘value-add’ IT services… It’s about doing the right things right, not once but all the time.”

“This forum has great potential,” adds Granfield. “IT professionals and key vendors are collaborating to enhance the scientists’ experience in the lab today—and we have the right people in the room to drive innovation for the lab of the future.”

“This is very focused on helping scientists do more science at an operational level,” says Arneman. “It’s about moving data, improving the quality of service, virus detection, backup, etc. It’s about the operational layer within the lab.”

In that regard, the Lab IT Forum differs from the Pistoia Alliance, which focuses more on informatics, or the Allotrope Foundation, which deals with instrument standards. Meanwhile, Microsoft spun off the BioIT Alliance, founded by Don Rule in 2006, as a translational medicine standards organization almost three years ago.

New Alliance

“Over the past two years, we’ve all been suffering from common woes—scientists needing better service, managing applications, security, and so on,” Arneman recalls. After several informal discussions over the past two years about forging collaborations between industry researchers and IT groups, Santimaw finally pushed him: “When are you going to connect us?!” he asked.

“Following Mike’s lead, I facilitated connections between pharma and owners of IT support. Mike and others took over from there,” says Arneman. “It was out of their passion for the end users that this [meeting] came about.”

The first gathering of the Lab IT Forum was in Arneman’s view an experiment, but one that worked better than he expected. Opening day was about sharing concerns. Several breakout groups were convened on subjects such as lab IT support, client management, validation approaches, and lab/manufacturing security.

Organizers listed a host of areas ripe for improved information sharing, including:

• Operating systems and software upgrade management
• Data security and antivirus protection
• Operational support
• Instrument/equipment management, scheduling and utilization
• Information collection and sharing for decision making and predictive analytics
• Packaging/manufacturing best practices
• Lab design—layout, enabling devices, software

Various participants spoke of the need for more seamless, efficient cooperation between IT support staff and R&D users. One academic manager said her colleagues were “very demanding” and needed solutions fast. “Scientists just want it to work,” she said.

Another academic IT support professional said that his group is just starting to seriously examine instrument life cycle and asset management. “Our goal is getting scientists to do what they need to do. Our support stops when the instrument connects to the computer,” he said. He described an instance when a new instrument sat in its box unused for a month. “We could have got this running much faster if we’d known about it,” he said.

Pharma IT services staff also shared their perspectives. One discussed his headaches following a merger in sharing data between four global research sites. Another raised the issue of staffing models across global sites and the need for better forecasting and tracking systems, as well as more proactive data analytics.

The consensus highlight of the first day was a first-hand perspective on data processing. Liping Zhou, a scientist from NIBR, “provided an elegant, compelling description of why she needs support,” says Arneman, highlighting three major areas of frustration: difficulty in obtaining information she wants, processing it, and communicating the information produced.

On the second day, the group discussed the concept of a “lab of the future,” covering issues such as mobility, data security and the ideal laboratory layout.

The meeting also included presentations from some of the industrial strategic partners. For example, a Dell representative discussed investments in new mobile devices and WiGig (the Wireless Gigabit Alliance). Another interesting development is Intel’s acquisition of McAfee and the notion of integrating security measures at the level of the microprocessor.

“It was important that this group understands how companies like Dell, Microsoft, and Intel go into life sciences. They have a dedicated practice on life science mobility and how that can be supported,” says Arneman.

For example, the deployment of Intel tablet devices in the lab has saved Merck about $1 million per year by improving data management within a compliant environment, says Arneman. A Thermo Scientific executive discussed resources for remote instrumentation management to improve productivity. Unity Lab Services, a division of Thermo, allows scientists to focus on science by providing a menu of lab support services from instrument management to data collection.

Ceiba’s goal is to help R&D teams innovate and better utilize their information assets. The firm started by offering services, but now offers “end-to-end responsibility for IT requirements from the scientists’ perspective, including the network, PCs, software, processes, systems upgrades, etc.”

Arneman says Ceiba prides itself on reducing resolution times from weeks to about a day. “The trouble with PC software is that it can take 15-20 days to close [a technical issue]. We’re the technical experts to solve it or the concierge to get it solved. The result is scientists get their day back.”

Ceiba also offers implementation and/or support for open-source or third-party applications. For example, the company won a contract from Merck to support more than 60 Rosetta Biosoftware customers. Ceiba continues to partner with Microsoft (which acquired the Rosetta assets) to enhance those product sets, Arneman says.

Following a deal with GSK, Ceiba is also the distributor for Helium, a cross-source data reporting tool that won a 2011 Bio-IT World Best Practices award. A community edition will be available shortly.

Next Steps

One of the future objectives of the Lab IT Forum is to create a training program and certification process for help desk personnel to deliver differentiated lab support. Arneman envisions that several white papers will be published in the coming months before the group meets again in six months’ time.

The Lab IT Forum welcomes new members—membership is open.

Arneman emphasizes that it is early days and the group currently lacks structure. “It’s driven by the passion of individuals to better support science,” he says. “I don’t want to lose that. For good governance, we’ll work with any other groups in the space. We must balance passion with process.”

“This group needs a bit of advocacy within their own, between scientists and vendors,” he says. “More than one organization would like to see security embedded in an instrument. That’s where they can use these things internally and educate their own organization. It’s about letting them know: ‘Here’s why this is hard, and why we can do it better.’”


BitSpeed Pushes Software Solutions for High-Speed Data Transfer
By Kevin Davies | February 7, 2013

Imagine a piece of software that could contemporaneously write the same data file in Los Angeles while it is actually streaming off a next-generation sequencing (NGS) instrument in New York, essentially reducing the time to transport said data from hours or minutes to virtually zero.

It sounds a little far-fetched, but that is the promised performance of Concurrency, the latest software product from Los Angeles-based BitSpeed. Currently being tested in a research lab at the University of Southern California (USC), BitSpeed executives believe it will warrant a close look by many life science organizations struggling to manage big data or balking at the cost or ease of use of existing commercial or open-source solutions for data transport.

Concurrency updates BitSpeed’s Velocity software, which expedites the transfer of large data files. Although based on a different protocol, BitSpeed hopes to offer a compelling alternative to Aspera, which over the past few years has become the dominant commercial provider of data transport protocols, gaining strong traction within the life sciences community.

The BioTeam consultant Chris Dwan, who is currently working with the New York Genome Center, says the bandwidth problem addressed by companies like Aspera, BitSpeed, new tools such as EMC Isilon’s SyncIQ, and GridFTP from Globus Online, is critical. “There are a lot of underutilized 1 Gb/sec connections out there in the world,” says Dwan.

“Aspera’s done a good job,” BitSpeed co-founder Doug Davis conceded in an interview with Bio-IT World, before laying out why he thinks his software is superior in cost effectiveness, ease of configuration, features and performance.

Protocol Preferences

As reported in a Bio-IT World cover story in 2010, Aspera’s patented fasp data transfer protocol makes use of UDP (user datagram protocol), which was originally developed more than 30 years ago as a fast, lightweight means of moving data.

In Davis’ opinion, however, UDP is like “throwing mud on a wall, then picking up what falls off with a shovel, and repeating the process until all the mud is on the wall.” A transmission might be reported as successful even as packets of data are still being sent to complete the transmission, he says.

BitSpeed, by contrast, is based on TCP (transmission control protocol). “We’re the only company with accelerated TCP,” says Davis. “We can perform better, provide more security and order than UDP-based solutions.”

TCP is an ordered protocol, which Davis argues is important for data integrity. “We grab the data, mixed up in some cases, and lay the data down at the destination in the same sequence. This is important – if the data are jumbled, you might need a third software package to re-order the data.”

UDP and TCP have their respective advocates, of course, but as Cycle Computing CEO Jason Stowe points out, both also have tradeoffs and there is only so much that can be deduced by evaluating algorithms theoretically. “TCP inherently gets better reliability and is used by many protocols, including HTTP, at the cost of lower throughput and overhead,” says Stowe. He also points out that “noisy networks aren’t friendly to UDP either.”

But the only true test, says Stowe, is a benchmark with real-world data between real endpoints, ideally also including open protocols such as FDT and Tsunami.


Moving Data

BitSpeed was founded in 2008 by Davis and Allan Ignatin, who previously founded Tape Laboratories, a developer of back-up technologies and virtual tape libraries. The company developed a close relationship with Hewlett Packard (its back-up technology still exists as part of HP’s widely used NonStop series) until the company was sold in 2006.

Later, Ignatin reconnected with Davis, a former CEO of Tape Laboratories, and hatched the idea of BitSpeed. “We noticed problems in transferring data outside buildings,” says Davis. “But what did we know? We were just storage guys—we thought latency was just a necessary evil.”

Initially BitSpeed focused on local area network (LAN) optimizations, but the founders soon recognized a much bigger opportunity. Launched in 2010, Velocity gained a foothold in the video and entertainment sector as well as other verticals.

Some health care centers such as the Mayo Clinic also signed on, but the medical space wasn’t the initial focus.

Velocity is a peer-to-peer software package that does three things, says Davis: “Accelerate. Ensure. Secure.” It’s about enhancing the speed, integrity, and security of the data, he says. The product works on LANs as well as within organizations and between storage nodes. “No other solution does that,” says Davis.

The software installs within a few minutes, says Davis, and configures automatically. Because of a modular architecture, it is embeddable in other solutions. There are two licensing models—either point-to-point or multitenant. “You can put a big license in the cloud deployment or data center, and all clients are free of charge. It’s a compelling model,” says Davis.


Another potential advantage of TCP is that it is being studied extensively by standards organizations. While modest improvements are being made to UDP, according to Ignatin, “TCP has thousands of organizations working on it, all of which we can take advantage of. They’re distributed and converted to each operating system fairly transparently. So when a new congestion-control algorithm [is released], we get it automatically. We’ve added MD5 checksums on every block, so all data received are 100 percent intact.”

Stowe notes, however, that checksums are “commonly used to verify that transfers occurred without error in many systems.”
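Per-block verification of the kind both are describing is straightforward to sketch. The snippet below is illustrative only and is not BitSpeed’s code: it hashes each fixed-size block of a file with MD5 so that the source and the destination copy can be compared block by block after a transfer. The 1 MB block size is an arbitrary assumption.

```python
# Illustrative per-block checksum verification (not BitSpeed's implementation).
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks; an arbitrary choice for this sketch

def block_digests(path, block_size=BLOCK_SIZE):
    """Return the MD5 digest of every fixed-size block in the file at `path`."""
    digests = []
    with open(path, "rb") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            digests.append(hashlib.md5(block).hexdigest())
    return digests

def transfer_intact(source_path, destination_path):
    """True when every block of the destination copy matches the source."""
    return block_digests(source_path) == block_digests(destination_path)
```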

As the name suggests, one of the potential virtues of Velocity is speed of data transfer, which emerges from a complex multi-level buffering scheme that “gulps data off storage and puts it back on,” says Ignatin. “We take the connection between two points, and use all the available bandwidth. Most connections use 30-40 percent efficiency, so we get more bang for the buck. We take a single TCP connection, break it into multiple parallel streams, then re-assemble [the data] on the other end.”
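The general technique Ignatin describes, splitting one logical transfer into several parallel TCP streams and reassembling the data in order at the destination, can be sketched in a few dozen lines. The example below is a minimal illustration under stated assumptions (loopback address, four streams, 64 KB chunks), not BitSpeed’s implementation: each chunk carries an index so the receiver can restore the original order no matter which stream delivers it first.

```python
# Minimal parallel-stream transfer sketch (assumptions: loopback host/port,
# four streams, 64 KB chunks). Not BitSpeed's code.
import socket
import struct
import threading

HOST, PORT = "127.0.0.1", 9009   # assumption: any free local port
STREAMS = 4
CHUNK = 64 * 1024                # 64 KB chunks

def recv_exact(conn, n):
    """Read exactly n bytes, or return b'' if the peer closed the connection."""
    buf = b""
    while len(buf) < n:
        piece = conn.recv(n - len(buf))
        if not piece:
            return b""
        buf += piece
    return buf

def receive_all(server_sock, n_chunks, out):
    """Accept STREAMS connections, collect indexed chunks, reassemble in order."""
    parts = {}

    def handle(conn):
        with conn:
            while True:
                header = recv_exact(conn, 8)
                if not header:                      # sender closed this stream
                    return
                index, length = struct.unpack("!II", header)
                parts[index] = recv_exact(conn, length)

    workers = []
    for _ in range(STREAMS):
        conn, _addr = server_sock.accept()
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    out.append(b"".join(parts[i] for i in range(n_chunks)))

def send_parallel(data):
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(STREAMS)
    out = []
    rx = threading.Thread(target=receive_all, args=(server, len(chunks), out))
    rx.start()
    streams = [socket.create_connection((HOST, PORT)) for _ in range(STREAMS)]
    for index, chunk in enumerate(chunks):
        s = streams[index % STREAMS]                # round-robin across streams
        s.sendall(struct.pack("!II", index, len(chunk)) + chunk)
    for s in streams:
        s.close()
    rx.join()
    server.close()
    return out[0]

if __name__ == "__main__":
    payload = bytes(range(256)) * 4096              # ~1 MB test payload
    assert send_parallel(payload) == payload
    print("reassembled", len(payload), "bytes over", STREAMS, "parallel streams")
```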

“The bigger the bandwidth, the bigger the acceleration,” says Davis. In one benchmarking test, he says Velocity took 1 minute 43 seconds to move 10 gigabytes (GB) of data to four sites in New York, Tokyo, Rome and Sydney—regardless of distance.

Thus far, the only significant deployment within a life sciences organization is in the lab of neuroscientist James Knowles at the USC Keck Medical Center. (The introduction was made by Ignatin’s wife, who is a USC faculty member.) At the time of Velocity’s installation, the Knowles lab had three Illumina sequencers sending data to a Windows server and a Solaris server, writing at about 4 MB/sec. The Solaris server transfers data to the HPC computing center six miles away.

In the capable hands of system administrator Andrew Clark, Velocity has expedited the transport of about 1 terabyte of NGS data daily to the HPC computing center. What formerly crawled along at 5-7 megabytes (MB)/second was upgraded to nearly 80 MB/sec without configuration, and 112 MB/sec with configuration. Typical transport times of 20 hours were slashed to less than two.

When Clark’s team added compression, he found no benefit at first—until it became apparent that the storage I/O of the disk array in the HPC center wasn’t fast enough. “This is a pretty common result for the software,” says Davis. “Marketing geniuses that we are, it never occurred to us that we could do this.”

Following the installation of a faster disk array, transfer speeds doubled to nearly 235 MB/sec. Clark said Velocity “has proved absolutely invaluable to speeding up our data transfers.”

Active Replication

As promising as Velocity looks, BitSpeed has particularly high hopes for its latest software, Concurrency—a patent-pending technology that does active file replication. The product was unveiled in May 2012 at the National Association of Broadcasters convention.

Explains Ignatin: “Concurrency senses the beginning of a file and writes it in multiple locations at the same time. As data are created at the source, it’s being created at the destination. The destination, in turn, can be transferring it simultaneously to another location. It’s called ‘chain multi-casting’ and saves a lot of time.”

“We’ve made it virtually automatic,” Ignatin continues. “We watch those folders for creation of files that match a specific description—in name, or suffix, time, whatever. There is no limit to the number of watch folders we can handle. It’s not like a Dropbox. None of the SysAdmins at server B, C, or D have to do anything.”
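The watch-folder behavior Ignatin describes can be approximated with a short polling loop. The sketch below uses only the standard library and is not Concurrency itself; the folder paths, file pattern, and poll interval are placeholder assumptions, and a real tool would stream files while they are still being written rather than copying them after the fact.

```python
# Bare-bones "watch folder" polling sketch (standard library only; not
# Concurrency). Paths, pattern, and interval are placeholder assumptions.
import fnmatch
import shutil
import time
from pathlib import Path

WATCH_DIR = Path("/data/incoming")   # assumption: instrument output folder
DEST_DIR = Path("/mnt/replica")      # assumption: replication target
PATTERN = "*.fastq.gz"               # assumption: match on file suffix
POLL_SECONDS = 5

def watch_and_replicate():
    DEST_DIR.mkdir(parents=True, exist_ok=True)
    seen = set()
    while True:
        for path in WATCH_DIR.iterdir():
            if not path.is_file() or path.name in seen:
                continue
            if not fnmatch.fnmatch(path.name, PATTERN):
                continue
            seen.add(path.name)
            # A real tool would stream the file as it is still being written;
            # this sketch simply copies it once it appears.
            shutil.copy2(path, DEST_DIR / path.name)
            print("replicated", path.name)
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch_and_replicate()
```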

With Concurrency, Davis says, data written to the local servers are also written to a center miles away. “When the sequencers have finished, it’s already there.” In theory, hours of transport time are reduced to essentially zero. At USC, Clark has been experimenting with Concurrency, but he told Bio-IT World that the product was still being evaluated and he had no further comment.

BitSpeed has also developed faster algorithms for data compression and encryption. The compression algorithms run as the data are in flight, which in principle provides further performance advantages. A pair of encryption algorithms optimizes security, including a proprietary algorithm called ASC (Advanced Symmetric Cipher). “It’s a robust algorithm… with very little CPU usage,” says Ignatin.
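Compressing data “in flight” simply means compressing each chunk as it is read, so the work overlaps with the transfer instead of running as a separate pass. A minimal sketch with the standard zlib streaming API, assuming 64 KB read chunks, looks like this (BitSpeed’s own algorithms are proprietary and not shown here):

```python
# Streaming ("in flight") compression sketch using the standard zlib API.
import zlib

def compressed_stream(fileobj, chunk_size=64 * 1024):
    """Yield compressed chunks ready to be written to a socket or pipe."""
    compressor = zlib.compressobj(6)      # compression level 6, an arbitrary choice
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        out = compressor.compress(chunk)
        if out:
            yield out
    yield compressor.flush()              # emit any data still buffered
```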

The ability to have data encrypted in flight should prove attractive for patient and other data requiring HIPAA and other forms of compliance. “How do [users] get the big data to/from the cloud? How do they ensure it is secure?” asks Davis. It may expand use of the cloud, as a cloud provider’s security is of little use if the data aren’t secured en route, he says.

Davis says that BitSpeed’s software is attractively priced and interested parties can register online for a 15-day free trial.

But while rival protocols duke it out in the marketplace, Dwan from The BioTeam says they are still missing the bigger issue, namely “the question of making the data scientifically useful and usable. None of these tools address that question at all.”



Cycle Computing CTO James Cuff on Clouds, On-Demand Computing and Package Holidays
By Kevin Davies | February 6, 2013

The new Chief Technology Officer at Cycle Computing, James Cuff, spent the past seven years as Director of Research Computing and Chief Technology Architect for Harvard University’s Faculty of Arts and Sciences. His team worked “at the interface of science and advanced computing technologies,” providing a breadth of high-performance computing, storage and software expertise, all the while striving to manage a monstrous surge in data. Cuff previously led the construction of the Ensembl project at the Wellcome Trust Sanger Institute, before moving to the U.S., where he managed production systems at the Broad Institute, while his wife, fellow Brit Michelle Clamp, joined the lab of Broad director Eric Lander.

In his new position, Cuff aims to apply some of his insights and ideas to an even bigger canvas. Cycle has made headlines over the past 2-3 years by spinning up virtual supercomputers for academic and industry clients, as well as creating the Big Science Challenge, donating more than $10,000 in cloud compute time. CEO Jason Stowe says Cuff brings a wealth of knowledge and contacts, and could bring some managerial discipline to Cycle’s patent portfolio. He adds that Cuff will remain in the Boston/Cambridge area, which could impact Cycle’s local presence down the road. (Meanwhile Clamp, who moved to Harvard from the BioTeam last year, will fill Cuff’s shoes on an interim basis while the search for his replacement continues.)

Cuff spoke to Bio-IT World editor Kevin Davies and shared his views about big data, cloud computing, and the future of research computing.

Bio-IT World: James, before we talk about your new gig, what were your chief responsibilities during your tenure at Harvard?

Cuff: It started in the life sciences, due to the complexity of genomics data, but rapidly expanded to include earth and planetary sciences, particle physics, astrophysics, even economics and financial modeling. Simulation is an exploding field in all domains, so we had to be agile enough to help with all fields. We learned about throughput and performance in the life sciences, and were able to apply that to other areas.

What is your visceral reaction to the phrase “big data”? Did you encounter that in all areas?

It’s everywhere you look at this point. From an historical perspective, when I started at Harvard in 2006, we had 200 CPUs and a state-of-the-art, 30-Terabyte (TB) local NAS [network-attached storage] array. As I’m leaving, we’re at 25,000 processors and 10 Petabytes (PB). And that’s just a small, university-wide research computing offering.

In comparison, the breakdown of those data is exploding in all areas, even places like the museums of comparative zoology. People are taking high-quality, high-resolution images, particularly of things like the Giza Archives; there are forces at play where our artifacts may unfortunately only be the digital records of some of these areas. Everyone is collecting “big data,” but this collection phase is a prelude to a second phase—namely, once collected, trying to work out what we ought to do with it so history informs the future. “Big data” is a very hyped term, but it’s real.

The bigger question I think is one of data provenance: not only creating the data but being able to find it.


The data retention policies of the National Science Foundation and others—it’s a headache… We’ve seen this in the ENCODE Project, where Ewan Birney and his team even encapsulated their virtual machine along with their data. We’re going to see more of this—to be able to have that frozen report of the science done, as it was.

Many people think the data storage problem per se has been solved. Is that fair?

I’m inclined to agree. The component parts of storage and processing are very much solved problems—from a cent/capability measure. The amount of Terabytes I can buy per given spend or CPU horsepower I can buy is now trivial. The complexity is that we’re doing this at much larger orders of scale and magnitude.

The difficulty in the spending for smart motivated researchers and organizations is around how to orchestrate those events. If you look at a top-tier storage array, the price per Terabyte isn’t where the complexity is, it’s orchestrating Petabytes of this into a single namespace, or being able to make it act at a performance level. I can build a 4-TB storage array that will perform very differently than a 4-TB disk drive.

Are you looking at data management solutions such as open-source iRODS or commercial equivalents?

To keep religion out of the conversation here, the art of finding and annotating metadata at a massive scale is currently unsolved. One of the technology challenges I see ahead is how to accelerate those to the point where the metadata analytics is of sufficient caliber to match the robust parallel file systems that Lustre, WhamCloud, and now Intel can build. These are also non-trivial and not necessarily a solved problem either.

From the point of view of being able to find your data, or more importantly, what happened to it?—How did it get to that state?—is a bigger issue. We’re starting to see that, in order to publish, you have to have publishable provenance. What did my grad student do to these data and how reproducible are these scientific results? That’s going to be a big headache going forward.

I trust you didn’t leave Harvard because your wife just arrived. What did Cycle’s CEO Jason Stowe do to lure you over?

They were seven really exciting years. We basically built a start-up organization within a very well respected, well regarded traditional IT environment. We started listening to our customers at Harvard—the faculty, grad students, and researchers—and built what they needed.

I started having conversations with Jason about his technology a few years ago. Then the phone rang one day and he explained they were growing because they had too many customers and they want to help their customers more and more. That rang a real bell with me, because as a bootstrapped company, the customers drive what the real business is. I started to talk with his engineering talent—he had me at ‘hello’ basically…

I actually see this as a natural progression. I used to run Silicon Graphics clusters back at Oxford, doing it by myself. When a patch to the SGI came out, I would have to put that on myself. Later on, at the Wellcome Trust Sanger Institute and the Broad Institute, I was the guy between the researcher and the compute. Even more so at Harvard, in many different domains—we were the guys in between. To me, it’s the logical progression—Cycle is that layer between the massive complexity needed to orchestrate hundreds of thousands of unique computer instances, to be able to deliver on our scientific promises.

For me, Cycle is like a light bulb: If I’m a scientist walking into a lab, I want to turn a light bulb on to do my research, my chemistry, etc. I don’t care how the energy is generated and distributed. I just want to throw the switch, do my science, turn the switch off and walk away. I want utility supercomputing to get to that point—to drive both supercomputing and storage to be consumable items as line items on NSF and NIH awards. Computing should no longer be a capital item. It should be an on-demand, as-you-need-it platform.

What do you intend to bring to Cycle? Will you work more on the technical side or closer to the clients?

Cycle has amazing engineering talent—that was where they were founded in terms of building customer solutions. I want to engage our customers more deeply in science outreach and understand their grand challenge science problems. I want to bring to bear many years’ experience at being an interface between a brilliant faculty at Harvard and turn their dreams into viable computing assets.

I was talking to the [Cycle] engineers here this morning, who were showing me the deep, dark corners of the Cycle server platform. I don’t profess to understand the thousands of hours these guys put into this. I want to help set strategy, work on gaps where we can be more competitive, and that means improve our customers’ experience to the point where everybody gets more work done.

How much more can be done to push research into the cloud? Is cloud computing still underutilized?

It’s still like the early days of the electric company. Just because we have power, the distribution area to light up your house—there was a lot of energy in the early days to be able to handle fuse boxes and complexity. If I think of myself as a lone grad student in a lab and I’ve got a credit card and a happy PI willing to let me spend it, I’m not sure I’ll be all that effective with ‘the cloud’. What is it? It’s a cloud provider’s console, but I’ve got to build an operating system, I’ve got to get my algorithms ported, I’ve got to work out what interconnects are…

If you look at the QIIME source code from Rob Knight’s lab, there’s thousands upon thousands of dependencies. If you look at cloud adoption, the tinkerers are currently tinkering around the edges, but Cycle has been tinkering for seven years now. We can get them onto these resources and think of them as a utility from the electric company perspective.

It’s the same reason Beowulf clusters were on a slow ramp, but once we started to get cluster administration tools and reliable batch schedulers and Linux stopped forking every two weeks and things calmed down a bit, the top-tier providers—Dell, HP, IBM in particular—embraced cluster computing at a rate we weren’t expecting. We’re a few years away from that [in cloud], but not that far, and Cycle is definitely positioned for the next logical step.

Some experts—notably The BioTeam’s Chris Dagdigian—have said Amazon has a large if not insurmountable lead in cloud computing. Do you agree?

I love Dag dearly, but I’m not inclined to necessarily agree with him. We go where the customers are.


Today, the bulk of our customers are within AWS [Amazon Web Services]. To discount any player in this space is a dangerous game. As long as we keep following our customers and the science, I think everyone will be successful. As to the crystal ball—who wins that race? I don’t want to bet on that one!

I think of researchers like folks picking holiday destinations. They go where the weather’s warm, right? Even if the cost to get down to Florida is a bit high, it’s better than staying in Massachusetts in the winter.

Where do you see growth opportunities in your offerings for the life sciences?

To build credibility that you can leverage and use these resources at a high scale is what some of our recent press has been around. This week, we proved we have the technical capability to do 10,600 processors. But that’s not the business we’re in. We can show massive scale. I had similar conversations at Harvard—the astrophysicists would happily consume tens of millions of CPUs if they could get their paws on it. Museum collections had data challenges but didn’t need 1 million CPU-hours.

Your compute challenge is of the order of 1,000-40,000 processors, which we now glibly consider as small clusters. We’ll have difficult portability issues, security issues, compliance issues. There’s a set of things we want to do to help new customers get that work done. In the financial services areas, there are a lot of ‘just-in-time’ computing challenges of the order of the size of Sequoia or Titan or the National Center for Computational Sciences. Those big clients will always be available on a national level. There’s no way a university should be building a 20- or 30-Megawatt [machine] in a local computing facility to solve their computing challenges.

What new technologies will most impact your services in the near future?

Not to pick any particular technology, but the ability to do high-performance parallel file systems with the ability to retain some control of your metadata in remote computing environments is of considerable interest to me.

I’m also aware of the challenges of the ‘last mile’—you can build national high-speed 100-200-Gigabit/sec networking infrastructure, but if your last mile is a much slower connection, you have to be clever about dealing with the type of technology you need on premises to be able to get in and out of these amazing resources. So other than a teaser to “watch this space,” I’ve been dealing with the last mile challenge for a while—how to get people’s computing off the desktop. That’s what we’ve been doing in a university setting for a long time and I want to apply some of those lessons learned in anger here, with an amazing engineering team who can actually turn some of my dreams into reality.

How critical are technologies that facilitate the transport of big data and how do you interact with them?

The Aspera technology is amazing and those protocols work incredibly well at the national centers—if you’re Ewan Birney or the head of the NCBI and you can license those technologies centrally, where it’s one to many, where many is millions, there’s great benefit.

In terms of on-wire capability—back to the Florida analogy—we go where the weather is warm and our customers are. We’re all going to have to be smarter about how we move data around. The cheapest way is never to move it in the first place. There are techniques and ideas I have in terms of where repositories actually need to be. Does your ultimate repository need to be local? We’re going to have lots of fun there.


Courtagen Leverages Level 3 to Provide Direct Access to Amazon Cloud
By Kevin Davies | February 4, 2013

Although it didn’t require digging up any local roads in the end, a small biotech company has struck a partnership in life sciences with Level 3 Communications to create a seamless and secure data link that pipes genomic data directly from its laboratory just outside Boston to the Amazon Web Services (AWS) cloud facility in Ashburn, Northern Virginia.

“We have a dedicated EPL [Ethernet Private Line] that carries terabytes of genetic data into their servers and back again,” says Courtagen Life Sciences President and co-founder Brendan McKernan.

Although the system only went live late last year, the early results could hardly be better. “Our informatics team is thrilled,” says McKernan. “Data is flowing and we’re getting patient results in a matter of minutes. It’s seamless; it’s perfect!”

Courtagen, founded by Brendan along with his brothers Kevin (Chief Technology Officer) and Brian (CEO), is a small firm of about 25 employees with a clinical laboratory generating patient genomic data for diagnostic purposes. Brendan’s forte is the implementation of world-class manufacturing concepts in running a laboratory, ideas and strategies honed over the past 15 years at the McKernan brothers’ previous company, Agencourt, and shared with partners such as the Broad Institute’s sequencing lab.

At Courtagen’s offices in Woburn, Mass., the CLIA-certified laboratory contains half-a-dozen Illumina MiSeq sequencers, but no trace of a data center. The incoming saliva (or blood and tissue) samples, referred by a growing network of physicians, are bar-coded and given a Genomic Profiling Project (GPP) number. “Once samples are accessioned and a GPP is assigned, no one in the lab can see the Protected Health Information (PHI). PHI includes any information according to HIPAA laws that can identify a person,” says McKernan.

One of the key issues facing Courtagen today, and in the future, is how to process patient genomic data as efficiently and securely as possible. The McKernans needed a data processing approach that was both scalable—throughput is expected to grow sharply in the next 1-2 years—and yet conservative and secure, something that could comply with HIPAA regulations regarding the privacy protection of patient data.

Selecting Level 3’s network and the on-demand Amazon cloud was an obvious choice. “Amazon has the scale,” says McKernan. “Our expertise will be in interpreting scientific data to enable researchers and clinicians to make better decisions regarding patient care and drug development. We outsource everything else that’s a non-core competency. We don’t have any IT infrastructure in our facility. The data comes off the sequencers and goes right to Amazon via the Level 3 network for processing, where we utilize our ZiPhyr bioinformatics pipeline, which leverages standard industry algorithms in conjunction with our unique analysis workflows to generate results.”

“Amazon is one of the largest clouds in the world, so from a strategic standpoint, I don’t want to invest capital in something we’re not going to be number one at. The Amazon-Level 3 partnership gives us the ability to have global infrastructure that is scalable, cost effective, and extremely secure.”

How to push the data into the cloud? Until last year, Courtagen had two options, neither one ideal. One was to ship hard drives to Amazon’s facility in Virginia, but that took two days. Courtagen’s average sample-to-report cycle time is fast—just 12 days. “But adding two days for shipping is unacceptable. Our Informatics team wanted data processing in a matter of minutes,” says McKernan.

The other method was to use traditional Internet delivery through an “old, slow pipe” but delivery often stalled. “It would take days to move data up to the cloud, and if it failed for any reason, we’d have a pile-up. All the GPPs for the following week couldn’t get processed. From a scaling standpoint, we had to change,” says McKernan.

(While the data processed in the AWS cloud are de-identified, Courtagen stores and delivers patient records in a private patient portal hosted by NetSuite, an emerging ERP system, or through Courtagen’s ZiPhyr iPad application. The physician portal is managed in facilities that are both HIPAA- and SAS 70 Type II-compliant.)

On the Level

McKernan began investigating the idea of a private line—off the public Internet—to transport data to AWS. In addition to avoiding pile-ups, it should provide additional security.

McKernan turned to Level 3 Communications, owner of an international fiber-optic network, and what he calls a “carrier of carriers.” Many of the major telecommunications firms run off Level 3. “Eventually everyone hits a Level 3 gateway,” says McKernan. “From there, it goes up to the cloud.”

Level 3 is one of the few global partners of Amazon’s that has “Direct Connect” capability, allowing clients to bypass the public Internet and go directly into the AWS servers.

The challenge was not so much how to transfer the data down to Virginia, but how to transmit it the 15 miles or so from Courtagen’s offices in Woburn to Level 3’s gateway on Bent Street in Cambridge, just behind the Broad Institute. “Eric Lander [Broad Institute director] must have been thinking about this 20 years ago, that’s smart. That’s one of the gateways to the Internet!” says McKernan.

As discussions with Level 3 progressed, McKernan was contemplating signing a purchase order to dig up roads and lay some new fiber-optic cable. “It was going to take a long time and cost a fair amount of money,” says McKernan.

At the last minute, another company entered the mix, providing the pipe for “the last mile.” Sidera—one of a number of companies that work with Level 3 to provide that local transmission—already had fiber in the Courtagen office building, with the all-important DWDM (Dense Wavelength Division Multiplexing) technology for scalability. This means that for Courtagen to upgrade the network from 10-gigabit to 100GE down the road, McKernan says it will only require changing a couple of cards. “Our network is now scalable to move [data on] 2,000 patients or more,” says McKernan.

Courtagen insisted on working with Level 3 as the carrier, so in the event of any network problems, Level 3 alone would be responsible for the end-to-end solution. In this instance, Sidera reports to Level 3.

Once the data cross from the Sidera pipe to the Level 3 gateway in Cambridge—one of 350 data centers Level 3 has across the world—they travel on a private line down to Ashburn. Courtagen pays Level 3 a monthly subscription fee for a minimum data commitment.

In addition to Sidera, Level 3, and AWS, Courtagen had to work with Amazon’s hosts, the Equinix facility in Virginia, as well as Check Point (a leader in securing the Internet). “These relationships allowed us to combine fast networking technologies with the highest level of security for our employees and patient data,” says McKernan.

Although it is still early days, McKernan says his colleagues are delighted with the way the network is working. Raw genome sequence data go in; what emerges is a rich analysis of a patient’s data with variant conservation and mutation prediction scores, which in many instances is helping Courtagen’s scientists and physicians identify deleterious mutations.

McKernan says Courtagen takes advantage of Amazon’s EC2 instances for sequencing analysis, primer design, and hosting of web servers. In addition, Courtagen utilizes StarCluster to dynamically start EC2 instances and stores its sequencing data in S3 buckets. Courtagen is also beginning to migrate long-term storage to Amazon’s Glacier platform to save money, and is evaluating AWS Elastic Beanstalk to deploy custom applications.
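As a rough illustration of that kind of workflow (a hedged sketch, not Courtagen’s actual pipeline), the boto3 snippet below uploads a finished run directory to an S3 bucket and attaches a lifecycle rule that transitions older objects to Glacier. The bucket name, key prefix, and 30-day threshold are placeholder assumptions.

```python
# Illustrative S3 upload plus Glacier lifecycle rule using boto3.
# Bucket, prefix, and the 30-day threshold are placeholder assumptions.
import os
import boto3

BUCKET = "example-sequencing-data"   # placeholder bucket name
PREFIX = "runs/"                     # placeholder key prefix

s3 = boto3.client("s3")

def upload_run(local_dir):
    """Upload every file under local_dir to s3://BUCKET/PREFIX..."""
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = PREFIX + os.path.relpath(path, local_dir)
            s3.upload_file(path, BUCKET, key)

def enable_glacier_archiving(days=30):
    """Transition objects under PREFIX to Glacier after `days` days."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-old-runs",
                "Filter": {"Prefix": PREFIX},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }]
        },
    )
```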


NetApp Eyes Opportunities in Health Care Data Storage
By Kevin Davies | February 1, 2013

Whatever happened to NetApp? When Bio-IT World launched in 2002, NetApp was one of the big names in big data storage in the biotech and life sciences arena. But over the past decade, while brand names such as Isilon, EMC, BlueArc, Quantum, Panasas, DDN and many others have cashed in on the data deluge, NetApp kept at best a very low profile in the space. That is not to say that it was not in use or that the technology does not have its supporters: on the contrary, many data center managers could point to trusted NetApp installations. NetApp storage is used at Genentech and several other major biotech firms headquartered in California and beyond. For some, however, it was less of a pain to integrate their old NetApp systems than to replace them with new ones.

But there are strong signs that NetApp is turning things around. For example, the company has introduced flash-based storage solutions (such as FlashCache and SSD-based architectures) to meet extreme performance requirements. These technologies have also been integrated with NetApp’s Virtual Storage Tiering solution in order to help customers leverage flash to improve performance while still utilizing cost-effective storage.

Bio-IT World reached out to Dave Nesvisky, who joined NetApp in September 2010 as senior director of health care sales, for an update on NetApp’s new technology and rekindled interest in the health care and life sciences sector.

Bio-IT World: Dave, what’s been going on at NetApp over the past few years as the life sciences has been swamped with data?

Dave Nesvisky: There’s been significant change over the past couple of years and it continues to evolve. Health care’s obviously a very broad area and includes a lot of different segments—you’ve got providers, research, regulatory, device manufacturers, health insurance, and distributors. There’s almost no limit to what you could include in the health care segment.

When I joined NetApp a couple of years ago, NetApp had several thousand customers in health care. Customers were using our products for the kind of things that every NetApp customer uses our products for: Exchange and SharePoint and virtualized data centers, general IT, not anything specific to health care. But many of those clients, especially hospitals, clinics, providers, were very interested in solving bigger problems. They were enjoying the benefits that NetApp brings in terms of storage efficiency and total cost of ownership and operational efficiency. They said, ‘you’re solving that problem for us at a small level because the data you’re managing represents a fraction of our overall data problem. Our bigger data storage expense is around diagnostic imaging, electronic medical records. Can you help us with that?’

A couple of years ago, NetApp was not fully prepared to help our customers in that market… We did not necessarily have the skill set around the applications that health care customers were running. My first step in joining the company was to start building a team—bring in people that had come from big application companies that serve the provider market—companies like McKesson and Siemens—and brought in a former health system CIO to help us better understand the market. We’re now in a much better position to support our customers around their bigger data problems.

Last year, we pulled together the payers and providers and a select number of software vendors and created the health care vertical that I lead today. That includes all stripes of providers—for profit, not-for-profit, academic medical centers, all that falls under our domain. Pharma and biotech is largely run out of a dedicated district that’s part of our Americas group, not part of the health care group today. As I said, different companies define health care differently. We’ve defined it around payers, providers, and some ISVs… It remains to be seen what’s going to make the most sense for NetApp, whether the existing structure is good, or whether it should have an expanded definition. But that’s our definition today.

What are the shifts in medicine, the impetus driving this growth in volume? And how is NetApp meeting that demand?

Nesvisky: One element is in the basic research itself. They’re mapping more and more genomes and it’s obviously driving much greater data requirements within that industry itself. But we’re seeing effects on the rest of health care… Today medicine is delivered reactively and episodically. You get sick. You go to the doctor. They treat you. That’s a very expensive way to treat people.

The push under the Affordable Care Act and ACOs (Accountable Care Organizations) is more in preventive medicine—the promotion of wellness rather than treating sickness. If you’ve got people with asthma or diabetes or high blood pressure, it’s really about proactively getting these people into programs to maintain their wellness so that they don’t get into the health care system any deeper than they’re already in.

Where the future and the alignment is with bio-IT is predictive medicine—the opportunity to look at somebody’s genetic makeup and be able to predict with some level of accuracy whether you have the markers that indicate that in 20 years you’re likely to get these things. What can we do now? And then, in line with the pharma companies that are starting to be able to create custom-made pharmaceuticals for individuals, to treat them more effectively and target their disease more accurately. That’s where the convergence is…


What is NetApp doing in the space?

We acquired a company called Engenio from LSI a year or so ago to create a cost-effective and dense storage platform ideal for high throughput workloads, for object content repositories, for unstructured data, and for other use cases where you’ve got either high volumes or very large databases or very large object containers.

Actually, that was a part of the portfolio that we didn’t previously have. We had a broad product portfolio that could essentially do that function, but this is a platform that took it to the next level. It had very high throughput and very dense storage—obviously when you talk about very large data sets there are physical constraints to the data center before you have to expand it, so you want to be able to pack as much storage into the smallest possible space. We’ve been very successful with that E-Series product. It’s a product that we work into the space as well as it’s a very large OEM product for us.

What was it about that technology that particularly appealed to NetApp?

Nesvisky: The footprint density. It’s a very dense storage platform and it had very high throughput for use cases like full motion video where typical SAN or NAS was not built to handle that effectively. It’s finding its way into a lot of different application areas. From the health care perspective, the two most interesting things are big data analytics and also very large object content repositories in the multiple petabyte range.

In terms of the actual data that you’re supplying solutions for, what are you seeing?

Nesvisky: There may be a future application in telemedicine with video and image data. But that’s a little bit of a future state for us, not top-of-mind right now. Another emerging area is digital pathology. Today, the typical imaging modalities that you see—X-ray, CT, PET, MRI—as those modalities become more powerful and the images are more refined, they’re requiring more storage themselves. 3-D mammography was approved by the FDA last year. It uses almost ten times more storage per image than 2-D. The typical modalities are taking up a tremendous amount of storage. In digital pathology, some of these things can run into a terabyte per study, which is an incredible amount of storage.

But we also see, on the genomics side, it’s taking up a lot of space and it requires high bandwidth. We have clients who moved to NetApp because they’re getting a lot of efficiency out of a capability in our FAS product line called flexible volumes, or FlexVol. That allows a lot of researchers to be allocated a lot of storage, say several terabytes each. The administrator is really only carving up a smaller amount, but it gives the appearance to the user that they have all they need.

In a typical environment without NetApp, you would have to physically provision that amount of storage to each user. If ten researchers each needs 10 terabytes, you would physically have to provision 100 terabytes to those people, even though those guys might only be using one or two terabytes at any given time. With flexible volumes, you can tell them that they have access to ten but you know they’re not going to use that. You’re able to physically provision a lot less, which saves a lot on the storage.
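The arithmetic behind that example is easy to make explicit; the snippet below simply restates the interview’s illustrative numbers (ten researchers, 10 TB promised each, roughly 2 TB actually used) and is not a description of how FlexVol itself works.

```python
# Back-of-the-envelope thin-provisioning arithmetic; figures mirror the
# illustrative example in the interview.
researchers = 10
quota_tb_each = 10          # storage each researcher is promised
typical_use_tb_each = 2     # storage each researcher actually consumes

thick_tb = researchers * quota_tb_each        # 100 TB provisioned up front
thin_tb = researchers * typical_use_tb_each   # ~20 TB of physical capacity needed
print(f"thick provisioning: {thick_tb} TB, thin provisioning: ~{thin_tb} TB")
```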

The other part that people are finding with NetApp is it’s just easier to manage. We consistently find that our administrators can manage a lot more volumes of data, a lot larger arrays with a lot fewer people.

Are there a couple of installations in the last 12 months in your health care arena that you can point to as good examples?

Nesvisky: One that comes to mind is the Duke Institute for Genomic Sciences, which is a NetApp FAS customer. They were getting more and more grants and research projects and it was stressing their systems because they had more and more researchers on it. The way they were adding people and trying to manage things, it was just runaway data growth and they needed a new platform that was more efficient, that could work into their environment.

The two things they found with NetApp is NetApp works very well in a virtualized environment. And the way of doing it before is you’d get a grant and you’d stand up a new system so you’ve got tons and tons of really underutilized servers and storage. And this is not a unique thing to genomics… They made an architecture decision to move to NetApp in a heavily virtualized environment and it gave them several tremendous advantages. It allowed them to reduce the footprint on the floor, which enabled them to extend the life of how long they could stay in their data center—if you can compress into a smaller footprint, that means your data center’s got more room to grow over time. That was really good. With fewer physical devices running, you can run it with a much more efficient staff… They were able to continue with the current staff and handle bigger workloads efficiently. And they were getting tremendous throughput from the system. Some really good benefits from making a move to NetApp.

What’s your competitive advantage these days?

Nesvisky: There are a couple of areas. Clearly there are very successful top tier players in the space, but the features of NetApp software, the flexible volume, the ability to provision virtually a lot more storage to the users than they had to physically provision, was very efficient to them, and the ease of management compared to other solutions.

Every other vendor tends to offer a portfolio of storage solutions—a particular solution for backup, another for production. And they have families of systems so when you outgrow one of them you have to forklift upgrade to the next bigger series of equipment and it has a different operating system. And so you’ve got to do a data migration, you’ve got to literally physically remove the system that was in there, put in the new system, migrate the data, retrain the staff, all that. And that comes into account.

When people assess the long-term impact of their storage decision, NetApp runs one operating system. We have an ‘agile data infrastructure.’ This is important: our agile data infrastructure means that from a single operating environment, Data ONTAP, we can offer people non-disruptive operation, which means that literally any upgrade of systems, software, adding equipment, retiring disk out of warranty or out of service or for whatever reason, anything you need to do in the maintenance of an array is done non-disruptively. You don’t have to schedule downtime, which is a huge advantage. Nobody wants to schedule downtime!

We have non-disruptive operation. We have intelligence which means that we can put—we have different types, different performance profiles of disks, so we can put the data where it makes the most sense for the performance you need… In the agile data infrastructure you can build into the multiple tens of petabytes in single volumes. If you have to store large volumes of genomic data, you’re really never going to run out of steam.

The agile data infrastructure is something unique that no other company can offer. They all offer a family of products that require you to literally retire one, forklift it out, migrate the data. It’s an expensive, complex process that’s time consuming and costly. We eliminate that. And when people recognize what NetApp is doing, it’s absolutely revolutionary. You can literally build a storage architecture now with us with a single operating environment that goes from the smallest array, whatever you want to start with is just some very small system, to scale literally infinitely and without ever having to forklift anything out. That’s a big advantage.

What other innovative and exciting developments do you foresee?

Nesvisky: Wow, a lot of things! For me, on the health care part of the business, the future is really around analytics, whether it’s to pave the way for predictive medicine or manage populations of people. I think the Hadoop/E-Series combination is going to be very powerful.

There are a lot of companies in the space taking a lot of interest in how to go about doing analytics for health care in various areas. Some of them are taking very broad approaches, some narrow approaches. Being able to do analytics in hospitals around the outbreak of, say, sepsis; they want to track that. Sepsis is very expensive to a hospital… Analytics around predicting—is somebody likely to get that, are they showing the indications, can we treat it early before it fully evolves? That’s a big one for us.

We’re seeing more private clouds, organizations operating clouds on behalf of the rest of their organization or other organizations that they’re affiliated with. We are also working with some public cloud providers that have some great solutions out there.

Aren’t there fundamental issues with putting patient data in the public cloud?

Nesvisky: Once you explain how the system is architected, it’s really not an issue. Frankly, in a professionally managed, well-architected cloud data center, the patient information is much more secure than paper files lying around in a hospital. Once people understand how the data are encrypted at rest and in motion, and how the physical environment is secured, that really becomes a non-issue.

What challenges do you face in finding new customers?

Nesvisky: As you might imagine, health care is a fairly conservative business in terms of their decisions because they’re entrusted with protecting patients’ lives. And so, our biggest challenge is just the status quo: hey, we’ve always bought from these guys, why would we change? We just need to be in front of people.

One of my favorite quotes is from Woody Allen: “Eighty percent of success is showing up.” When we get our opportunity to present, people get very comfortable with us. We win our share of business. I think we have an absolutely superior solution for our industry… this vertical is a very new venture for NetApp. We just have to tell our story and effectively message and let them know what we have. Our biggest challenge is just really inertia.


250 First Avenue, Suite 300
Needham, MA 02494

www.bio-itworld.com

Cambridge Healthtech Media Group