2005 04 05 sri eln architecture
TRANSCRIPT
http://www.amphora-research.com/
So...
• You’re on holiday one day
• Doing your normal thing
• And then you get the call...
  • they want an ELN!
ELN architecture
• Hopefully
  • I am not going to self-destruct
  • Your project won’t be as exciting
• Your task is to
  • Deliver a state-of-the-art ELN system
  • In tight timescales
  • With limited budget
  • In the real world
  • That the users like
  • And will serve you for many years
Introduction
• About me
  • Started working with ELNs in ’96
  • President & Co-founder of Amphora
  • IT background
• First ELN was an enterprise-scale ELN for Kodak
  • Worldwide, 1,000s of users, diverse user base
  • Completely Electronic Records (no paper)
• After a long & windy road
  • New products, lots more deployments, many industries
  • Certain amount of realism about ELN implementation
  • Provide Patent Evidence Creation & Preservation Systems
  • Work with a wide variety of “ELN” systems etc.
  • Now based in the US & UK
This presentation
• You can download a copy of this presentation from our web site
Why does architecture matter?
• A good architecture can help
  • Integrate “Best of breed” tools with existing investments
  • Allow you to split the project into manageable pieces
  • Ensure you don’t get “captured” by the vendor
  • Help your system withstand the ravages of time
  • Keep your TCO down
• A bad architecture will hurt
  • Reliability, Scalability problems
  • Reduce your options going forward
  • Force you into a “Big bang” project
• Some random thoughts on architecture
ELN architecture
• Major issues
  • Diversity & Flexibility
  • Project size/Justification/ROI
  • Creating & Preserving Evidence for Patents
  • Need for long term access to ELN contents
  • Scalability
  • Web-based systems
  • How your network can help you
• Trends
  • Integration methods
  • Open Source
  • In the lab
  • Ones to watch
Diversity & Flexibility
• “Science” covers a wide variety of activity
  • Each of these is served by its own industry
  • Improvements in each area need to happen at their own pace
• Things change
  • Different techniques
  • New data types
  • Another R&D centre
  • New devices for use in the lab
• The very essence of “Research” is to change the way you work
• How do we design an ELN which can accommodate these changes?
Dealing with change
• Build on other projects & integrate
  • If it can be done within another project, then do so
  • Keeps your life simpler and more focused, clear aims
  • Those other projects can proceed according to the rhythm and needs of the specific area
• Where possible employ loose coupling between systems
  • Message passing reduces implementation complexity
  • SOAP/OLE/XML etc.
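As a rough sketch of the loose coupling described on this slide, one system can hand a result to another as a small XML message instead of calling into it directly. The element names (`eln-message`, `experiment-id`, `source-system`, `summary`) and both functions are hypothetical, invented for illustration; they do not come from any particular ELN product or from SOAP itself.

```python
import xml.etree.ElementTree as ET


def build_message(experiment_id: str, source: str, summary: str) -> bytes:
    """Serialize a result notification as a small XML message (hypothetical schema)."""
    root = ET.Element("eln-message", {"version": "1"})
    ET.SubElement(root, "experiment-id").text = experiment_id
    ET.SubElement(root, "source-system").text = source
    ET.SubElement(root, "summary").text = summary
    return ET.tostring(root, encoding="utf-8")


def read_message(payload: bytes) -> dict:
    """Parse the message on the receiving side.

    Only the elements the receiver knows about are read, so the sender can
    add new elements later without breaking this code.
    """
    root = ET.fromstring(payload)
    wanted = ("experiment-id", "source-system", "summary")
    return {tag: root.findtext(tag) for tag in wanted}


if __name__ == "__main__":
    message = build_message("EXP-0001", "hplc-system", "Run complete")
    print(read_message(message))
```

The two sides share only the message format, not code or a database, which is what lets each project move at its own pace.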
Project size/Justification/ROI
• Two approaches
  • Either attempt to justify the whole ELN in one go (“Big bang”)
  • Or Phased
• Divide the project into phases
  • Each involves a smaller investment (risk)
  • With a corresponding payoff
• Move forward at a pace that’s comfortable for the business
Phased ELNs
• Historically this was very difficult to do with ELNs
  • Record keeping
  • Integration with other systems
• Needs to be designed into the project (& product) from the start
  • Patent evidence creation/preservation system
  • Generic science-neutral platform (can often be your existing IT infrastructure)
  • Integrate/collaborate with discipline-specific software
• When you can do it, it makes a huge difference
  • Can start at a departmental level if needed
  • Asking the business to take a small risk each time
Creating & Preserving Evidence for Patents
• Specialized area with very specific (and unique) considerations
• Best done separately from science-specific ELN tools
  • Hard to reconcile requirements of science and records in one system
  • You’ll often have a number of science-focused systems, yet want only one Patent evidence system
  • Run by a small group of people who know they’ll end up in court
  • Reduce risks & discovery costs
• You can have an “Electronic” notebook for the scientist and still create a paper record
Paper or Electronic?
• The choice often comes down to
  • Comfort
  • Practicality
  • Cost
[Figure: System Cost for Paper and Electronic; horizontal axis ticks at 10, 100, 500, 1000]
Long term access to ELN content
• Partly this is a records management issue
• But there’s a heavy technical component
  • What format you store your data in
  • How you store your data
  • Metadata
• You need to make Open Data formats part of your purchasing requirements
“Good” (open) file formats
• Publicly documented
• Legally unencumbered
  • No patents, copyright concerns etc.
  • Any patents or copyright must be in the public domain
• Ideally, self documenting (XML is a good start)
• Degrade gracefully
  • If you can’t read the data, at least you can see a picture
• Based on more open, primitive formats where possible
• At least two implementations of readers, one of which is Open Source
• Widely used (W3C or IETF standards are good signs)
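To illustrate two of the properties listed on this slide, "self documenting" and "degrade gracefully", here is a minimal sketch of an XML record that carries its own units and also points to a rendered picture. The element and attribute names (`record`, `series`, `preview`, `href`) are assumptions made up for this example, not an existing standard.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def make_record(title: str, unit: str, values: Sequence[float], preview_href: str) -> str:
    """Build a small self-describing XML record (element names are hypothetical)."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = title
    # The unit travels with the numbers, so the file explains itself.
    series = ET.SubElement(root, "series", {"unit": unit})
    for value in values:
        ET.SubElement(series, "value").text = repr(float(value))
    # Pointer to a rendered picture (for example a PNG) for readers that
    # cannot interpret the data: the "degrade gracefully" fallback.
    ET.SubElement(root, "preview", {"type": "image/png", "href": preview_href})
    return ET.tostring(root, encoding="unicode")


def describe(xml_text: str) -> str:
    """What a reader with no knowledge of <series> could still report."""
    root = ET.fromstring(xml_text)
    preview = root.find("preview")
    return f"{root.findtext('title')} (picture: {preview.get('href')})"


if __name__ == "__main__":
    doc = make_record("Absorbance scan", "AU", [0.12, 0.34], "scan.png")
    print(doc)
    print(describe(doc))
```

A reader that understands `series` gets the numbers; one that does not can still show the title and the picture.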
Data formats for the long term
• Good
  • For text: Plain ASCII, Unicode, HTML, possibly RTF
  • For graphics: PNG, SVG
  • For structured data: XML
  • To preserve appearance: PDF
• Worry about
  • Storing files in databases
    • The database file format is probably undocumented
    • Store objects on the file system and use the database to point to them
  • Anything that is proprietary - there’s no excuse for it, and it dramatically increases your risk
  • Binary files generally
  • Mixing content in files (e.g. embedding XML in PDF)
  • Proprietary digital signatures
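The advice to "store objects on the file system and use the database to point to them" can be sketched roughly as follows. SQLite is used only because it ships with Python; the table layout, the function names, and the SHA-256 checksum are assumptions added for illustration, not part of the original talk.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database that points at the stored files."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )
    return db


def store_object(db: sqlite3.Connection, store_dir: Path, name: str, data: bytes) -> Path:
    """Write the bytes to an ordinary file and record only a pointer in the database."""
    digest = hashlib.sha256(data).hexdigest()
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / digest  # file name derived from the content hash
    path.write_bytes(data)
    db.execute(
        "INSERT INTO objects (name, path, sha256) VALUES (?, ?, ?)",
        (name, str(path), digest),
    )
    db.commit()
    return path


def load_object(db: sqlite3.Connection, name: str) -> bytes:
    """Follow the pointer and verify that the file is unchanged."""
    row = db.execute(
        "SELECT path, sha256 FROM objects WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    data = Path(row[0]).read_bytes()
    if hashlib.sha256(data).hexdigest() != row[1]:
        raise ValueError(f"stored file for {name!r} has changed")
    return data
```

If the database is ever lost or unreadable, the files themselves remain ordinary files in open formats that any tool can read.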
IP concerns & data formats
• Companies have always used Proprietary Data Formats as a competitive weapon
• Companies are waking up to the use of IP tools (licenses, patents, copyrights) to reinforce their control over data formats
• Just because a format is published doesn’t mean it is open
  • The Microsoft Office XML formats are a particularly bad example
  • Right now it looks positively radioactive
  • They’re being very careful what they say which indicates to me they’re planning something
  • http://www.groklaw.net/article.php?story=20050330133833843
  • (see section: 4. Dissecting Microsoft’s “Patent License”)
Standards
• There are so many to choose from!
• Two key ways of generating “Standards”
  • De Facto - dominant supplier/format
  • De Jure - committee based
• Who gets to “bless” a standard?
• What makes a “good standard”?
  • De Jure process has difficulty keeping up with the real world
  • De Facto process has risk of lock-in
• Pragmatic approach
  • Expect your suppliers to use open file formats
  • If there is an acceptable standard, use it
  • Make sure you are using the right kind of format for each purpose
Records considerations
• Not all the “Stuff” that’s generated during the research process is the same
  • Some of it needs to be kept for a long time
  • Some is only useful for the moment
  • Some will benefit anyone
  • Some is only really useful for the person who created it (using specialized tools)
• Some material is suitable for long term preservation, some isn’t
• You can go crazy getting into this in too much detail
• But you also need to make sure your tools and processes do allow you to manage the data/records you’re creating
Scalability
• Geographical space
  • In wide area networks, latency becomes the most noticeable issue
  • Over multiple timezones, acceptable “Maintenance Windows” disappear
• More data
  • Number of data items
  • Size of individual data items
• Number of users
  • Larger populations generally mean more disparate requirements
  • How many people will get upset if the system goes down
Latency
• The science-specific “Deep” systems
  • Often highly interactive
    • Lots of round trips to the server for data etc.
    • This is what makes them cool
  • You can’t beat the speed of light (and network hardware adds significant latency)
    • Therefore need to have a server close to the end user
    • Federation will give you a single overview
• “Broad” systems have different usage characteristics
  • Very much like a normal web site, latency is much less of a problem
  • Very easy to have one system for worldwide use, even for large companies
  • Building large systems is quite easy
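A back-of-the-envelope calculation shows why round trips matter for the interactive "Deep" systems. The sketch below assumes light travels through optical fibre at roughly 200,000 km/s (a common rule-of-thumb figure, about two thirds of its vacuum speed); the distances, the count of 50 round trips, and the function names are illustrative assumptions, not measurements.

```python
# Rule-of-thumb propagation speed in optical fibre; an approximation.
FIBRE_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on one round trip, ignoring routers, switches and servers."""
    return 2.0 * distance_km / FIBRE_SPEED_KM_PER_S * 1000.0


def interaction_delay_ms(distance_km: float, round_trips: int,
                         hardware_ms_per_trip: float = 0.0) -> float:
    """Total wait for one user action that needs several sequential round trips."""
    return round_trips * (min_round_trip_ms(distance_km) + hardware_ms_per_trip)


if __name__ == "__main__":
    # Illustrative distances only: a server across an ocean vs. one on site.
    for label, km in (("remote server, 5,600 km", 5600.0), ("local server, 1 km", 1.0)):
        total = interaction_delay_ms(km, round_trips=50)
        print(f"{label}: {total:.1f} ms for 50 round trips")
```

Under these assumptions a single action that needs 50 sequential round trips waits about 2.8 seconds against the remote server before any hardware delay is counted, and well under a millisecond against the local one. A "Broad" system that needs only a handful of round trips per page barely notices the difference.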
Web-based systems
• “Web based” has become a bit of a marketing tool
  • Generally thin clients offer a lower TCO
  • And hence IT like them
• In practice, most science-supporting ELN front ends will be delivered as a “thick” client
  • There’s a reason it’s called a browser
  • Wrapping an OLE object in IE is still “thick”
• However, “Ajax” systems like GMail and Google Maps show just what you can do with a web-based system
• Web based systems should expose a sensible URL interface
How your network can help you
• There’s a whole load of useful network services and Interfaces that large companies have
• Useful ones
  • Single Sign On
  • LDAP
  • Printer/Fileserver etc.
  • Security/Status monitoring etc.
• Beware of Central Digital Signature Infrastructure
  • Mixing vulnerabilities - leaves you open to accidents
  • Often not designed for long term use
ELN architecture
• Major issues
  • Diversity & Flexibility
  • Project size/Justification/ROI
  • Creating & Preserving Evidence for Patents
  • Need for long term access to ELN contents
  • Scale
  • Web-based systems
• Trends
  • Integration methods
  • Open Source
  • In the lab
  • Ones to watch
Integration methods
• RPC-like mechanisms
  • Service Oriented Architecture
  • SOAP
  • REST
• Text file passing (files, email, etc.)
• URL launching
  • Often overlooked, but very powerful
• What’s important
  • Loose-coupling
  • Open, lightweight systems
  • Consistent, stable keys
  • Stable URL (& domain) space
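URL launching with "consistent, stable keys" and a "stable URL space" might look something like the sketch below: any other system that knows a notebook key and page number can build a link to the record, and the ELN can parse it back. The host `eln.example.com`, the path layout `/notebooks/<id>/pages/<n>`, and the function names are hypothetical choices made for this example.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://eln.example.com"  # hypothetical host, for illustration only


def record_url(notebook_id: str, page: int) -> str:
    """Build a stable URL for a notebook page from its stable key."""
    return f"{BASE_URL}/notebooks/{quote(notebook_id, safe='')}/pages/{page}"


def parse_record_url(url: str) -> Tuple[str, int]:
    """Recover the notebook key and page number from a record URL."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "notebooks" or parts[2] != "pages":
        raise ValueError(f"not a record URL: {url}")
    return unquote(parts[1]), int(parts[3])


if __name__ == "__main__":
    url = record_url("NB-0042", 7)
    print(url)
    print(parse_record_url(url))
```

Another application can then "launch" the record simply by opening that URL, for instance with the standard library's `webbrowser.open(url)`, without sharing any code with the ELN.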
Open Source
• Definitely one to watch
• Not the “Free” lunch you might think, but a pragmatic business tool
• Examples
  • Linux
  • Postgres
  • JBoss, Tomcat etc.
  • Ghostscript
• Open Source is part of everyone’s infrastructure
• Make sure you can run your systems on a variety of platforms
Why?
• Good for records
  • Gives you top-to-bottom control
• Good for TCO
  • We’re finding the Open Source infrastructure easier to set up and more reliable than proprietary alternatives
• Enables a better solution
  • Transparent systems mean you can do things the original designers didn’t think of
  • This is especially important for ELNs
Data point
• This is just our experience offering people alternatives for the server portion
  • 2000 - “What’s Open Source? What’s Linux?”
  • 2001 - No way!
  • 2002 - some pilots underway, some acceptance
  • 2003 - majority of installations are Open Source infrastructure
  • 2005 - we’re wondering where Windows is
• We’re not abandoning proprietary infrastructure
  • But it is clear that Open Source is getting serious consideration
  • Seeing a migration away from proprietary infrastructure to Open Source
In the lab
• ELN use in the lab is a hard problem
• Tablets, Laptops, Palmtops etc. don’t seem to be working
• What does seem to work
  • Small form-factor PCs on the bench
  • Remote Desktop & Citrix