GETTING UNSTUCK: WORKING WITH LEGACY CODE AND DATA
Cory Foy – http://www.cornetdesign.com


Goals

• What is Legacy Code?
• How do we change Legacy Code?
• Common patterns for code bases
• Does Legacy Code have to be code, or can it be something else, like a really long bullet on a PowerPoint slide, or perhaps a database?
• Next Steps

Legacy Code

How do you define Legacy Code? Several definitions are possible:
• Code we’ve gotten from somewhere else
• Code you have to change, but don’t understand
• Demoralizing code (Big Ball of Mud)
• Code without unit tests

Legacy Code

Code that needs to have its behavior preserved

What is behavior?
• The way in which someone behaves
• The way in which a person, organism, or group responds to a specific set of conditions
• The way that a machine operates or a substance reacts under a specific set of conditions

Legacy Code

What’s the behavior of the following code?

Legacy Code

Does the following code add behavior?

Legacy Code

Now have we changed the behavior?

How do we change Legacy Code?

Why would we want to change the code? There are four reasons to change software:
• Adding a feature
• Fixing a bug
• Improving the design
• Optimizing resource usage

Each has unique attributes.

Adding a feature / Fixing a bug

Causes the following changes:
• Structure
• Functionality (adding or replacing)

We need to be able to know that the new functionality works.

We need to be able to know that the system as a whole is still functioning appropriately.

Improving the Design

Causes the following changes:
• Structure

Note that functionality is not listed above.

It is important to be able to know that all functionality works before and after the change.

Optimizing Resource Usage

Causes the following changes:
• Resource usage
• Possibly structure

Again, note that functionality is ideally not in the above list.

We need a way to make sure functionality was not changed.

We need a way to verify that the optimization goals have been met (and stay met).

Edit and Pray

• Carefully plan the changes you are going to make
• Make sure you understand the code to be modified
• Make the changes
• Run the system to make sure the change was made
• Do some additional testing to smoke test that everything seems to be functioning
• Pray you don’t get a call at 2am that the system doesn’t work anymore

Cover and Modify

• Verify that the system is working by running the tests
• Write tests to expose the behavior you want to add or change
• Write code to make the test pass
• Refactor duplication
• Wash, rinse, repeat
• Verify the system is still working by running the tests
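To make the loop concrete, here is a minimal Python `unittest` sketch of one Cover and Modify pass. The function `shipping_cost` and its pricing rules are hypothetical, invented for illustration: the first test class covers the existing behavior, the second expresses the feature being added, and the `express` branch is the code written to make that new test pass.

```python
import unittest


def shipping_cost(weight_kg, express=False):
    """Hypothetical legacy function, shown after the change.

    Existing behavior: 5.0 base charge plus 2.0 per kg.
    Added behavior (the new feature): express shipping doubles the cost.
    """
    cost = 5.0 + 2.0 * weight_kg
    if express:  # new code, written only to make the new test pass
        cost *= 2
    return cost


class ExistingBehaviorTests(unittest.TestCase):
    # "Cover": these tests pinned down the behavior before any change.
    def test_base_cost(self):
        self.assertEqual(shipping_cost(0), 5.0)

    def test_cost_per_kg(self):
        self.assertEqual(shipping_cost(3), 11.0)


class NewBehaviorTests(unittest.TestCase):
    # "Modify": written first, watched fail, then made to pass.
    def test_express_doubles_cost(self):
        self.assertEqual(shipping_cost(3, express=True), 22.0)
```

Both classes run under any `unittest` runner; the existing-behavior tests are the ones that tell you the modification did not break anything.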

Feathers’ Legacy Code Change Algorithm

Michael Feathers discusses a Legacy Code Change Algorithm in Working Effectively with Legacy Code. It has five steps:
1. Identify change points
2. Find test points
3. Break dependencies
4. Write tests
5. Make changes and refactor

Each step comes with common scenarios, and patterns for handling them.

Patterns for the Change Algorithm: Identify Change Points

This is one of the key areas where architects and architecture come into play.

If you aren’t sure where the change belongs, put it in anyway; you can refactor later (with unit test support).

Patterns for the Change Algorithm: Identify Change Points

Scenario: I don’t understand the code well enough to change it
• Notes / Sketching
• Listing Markup
  • Separate Responsibilities
  • Understand method structure
  • Extract Methods
  • Effect Sketch
• Scratch Refactoring
• Delete Unused Code

Patterns for the Change Algorithm: Identify Change Points

Scenario: My application has no structure
• Tell the story of the system
• Naked CRC (Class, Responsibility, and Collaborations)
• Conversation Scrutiny

Patterns for the Change Algorithm: Find Test Points

Where can you write tests to exercise the behavior you want to add or change?

It is important to have team standards for where unit tests should go.

Patterns for the Change Algorithm: Find Test Points

Scenario: I need to make a change. What methods should I test?
• Reason about effects (Effect Sketch)
• Reasoning Forward (TDD)
• Effect propagation
• Effect reasoning
• Effect analysis

Patterns for the Change Algorithm: Find Test Points

Scenario: I need to make many changes in one area. Do I have to break all the dependencies?
• Interception Points
• Higher-Level Interception Points
• Pinch Points (encapsulation boundary)
• Pinch Point Traps

Patterns for the Change Algorithm: Break Dependencies

This is generally the most difficult part of the process.

We usually don’t have tests to tell us whether breaking dependencies will cause problems.

Patterns for the Change Algorithm: Break Dependencies

Scenario: How do I know I’m not breaking anything?
• Hyperaware Editing
• Single-Goal Editing
• Preserve Signatures
• Lean on the Compiler
• Pair Programming (aka Real-Time Code Reviews)

Patterns for the Change Algorithm: Break Dependencies

Scenario: I can’t get this class into a test harness
• Irritating Parameters
• Hidden Dependencies
• Construction Blob
• Irritating Global Dependency
• Horrible Include Dependencies
• Onion Parameter
• Aliased Parameter
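One common way out of a Hidden Dependency is Feathers’ Parameterize Constructor technique: let callers pass in the object the constructor used to create for itself. A minimal Python sketch, assuming a hypothetical `ReportGenerator` that originally opened its own SQLite file (`reports.db` is an invented name). Here a default argument keeps the old call sites working; in Java or C# you would typically keep the old constructor and have it delegate to the new one.

```python
import sqlite3


class ReportGenerator:
    """Hypothetical class after Parameterize Constructor.

    Before the change, __init__ always called sqlite3.connect("reports.db"),
    so a test could not get the class into a harness without that file.
    """

    def __init__(self, connection=None):
        # Existing callers keep the old behavior; tests pass in a
        # connection they control.
        if connection is None:
            connection = sqlite3.connect("reports.db")
        self.conn = connection

    def order_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


# In a test: hand the class a throwaway in-memory database instead.
test_conn = sqlite3.connect(":memory:")
test_conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
test_conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,), (3,)])
generator = ReportGenerator(test_conn)
print(generator.order_count())  # 3
```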

Patterns for the Change Algorithm: Break Dependencies

Scenario: I can’t run this method in a test harness
• Hidden Methods
• “Helpful” language features
• Undetectable Side Effect
  • Sensing variables
  • Command/Query Separation
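Command/Query Separation turns an undetectable side effect into something a test can check: each method either answers a question or changes state, not both. A minimal sketch using a hypothetical `Account` class; the old combined method is kept next to the split version only for comparison.

```python
class Account:
    """Hypothetical class used to illustrate Command/Query Separation."""

    def __init__(self, balance):
        self.balance = balance
        self.audit_log = []

    # Before: one method both changed state and returned a value, so a test
    # that only checked the return value never noticed the audit entry.
    def withdraw_and_report(self, amount):
        self.balance -= amount
        self.audit_log.append(("withdraw", amount))
        return self.balance

    # After: a query that only answers a question and changes nothing...
    def balance_after(self, amount):
        return self.balance - amount

    # ...and a command that only changes state and returns nothing.
    def withdraw(self, amount):
        self.balance = self.balance_after(amount)
        self.audit_log.append(("withdraw", amount))
```

Tests can now call `balance_after` freely, and assert on `balance` and `audit_log` after calling `withdraw`.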

Patterns for the Change Algorithm: Break Dependencies

Scenario: I need to change a monster method and can’t write tests
• Introduce sensing variables
• Extract what you know
• Break out a method object
• Skeletonize Methods
• Find Sequences
• Extract to the current class first
• Extract small pieces
• Be prepared to redo extractions
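A sensing variable, in Feathers’ sense, is a field added to a class so a test can see what happens deep inside a long method. A minimal sketch, assuming a hypothetical `LineParser` whose `parse` stands in for the monster method: both input formats produce the same result, so without the flag a test cannot tell which branch ran.

```python
class LineParser:
    """Hypothetical class; parse() stands in for a monster method."""

    def __init__(self):
        # Sensing variable: lets a test see which branch ran without
        # changing what parse() returns. It can be removed once the branch
        # has been extracted into its own method and tested directly.
        self.used_legacy_format = False

    def parse(self, line):
        # ... imagine hundreds of lines of parsing here ...
        if ";" in line:
            # Legacy records (hypothetically) use ';' as the separator.
            self.used_legacy_format = True
            fields = line.split(";")
        else:
            fields = line.split(",")
        # ... and hundreds more ...
        return [field.strip() for field in fields]
```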

Patterns for the Change Algorithm: Break Dependencies

Scenario: It takes forever to make a change
• Understanding
• Lag Time
• Breaking Dependencies
• Build Dependencies

Patterns for the Change Algorithm: Write Tests

Tests may be more difficult to write than normal unit tests.

They may have to cover less-than-ideal scenarios.

Patterns for the Change Algorithm: Write Tests

Scenario: I need to make a change, but don’t know what tests to write
• Characterization Tests
• Characterizing Classes
• Targeted Testing

Writing Characterization Tests

• Write tests for the area where you’ll be making the change. Write as many as you need to understand the code.
• Then write tests for the things you need to change.
• If converting or moving functionality, write tests to verify the behavior on a case-by-case basis.
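Feathers describes the mechanics roughly as: put the code in a test harness, write an assertion you expect to fail, let the failure tell you what the code actually does, then change the test to expect that. A minimal Python `unittest` sketch against a hypothetical `legacy_format_phone` function; the expected values are simply what this code returns today, including the odd eleven-digit case.

```python
import unittest


def legacy_format_phone(raw):
    """Hypothetical legacy function we need to change but don't understand."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return "(%s) %s-%s" % (digits[:3], digits[3:6], digits[6:])
    return digits


class CharacterizeFormatPhone(unittest.TestCase):
    # Each expected value was found by first asserting a placeholder,
    # running the test, and copying in what the code actually returned.
    def test_ten_digits(self):
        self.assertEqual(legacy_format_phone("555.867.5309"), "(555) 867-5309")

    def test_eleven_digits_pass_through_unformatted(self):
        # Surprising, but it is what the code does today: pin it down.
        self.assertEqual(legacy_format_phone("1-555-867-5309"), "15558675309")

    def test_empty_input(self):
        self.assertEqual(legacy_format_phone(""), "")
```

These tests document behavior rather than judge it; if the eleven-digit case turns out to be a bug, fixing it becomes a deliberate, visible change.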

DEMO: Change Algorithm at Work

Step through a common scenario, implementing the tests as we go.

Legacy Code isn’t just Code

Most applications aren’t just simple console apps.

They deal with many dependencies:
• File systems
• Registries
• Databases
• Hardware

Legacy Code isn’t just Code

These dependencies can cause legacy problems of their own:
• Database schemas
• Existing data in the tables
• Business logic in the database
• No access to development data that mirrors production

In other words, Legacy Data.

Legacy Data

So where does this Legacy Data come from?
• Flat files
• XML documents
• RDBs
• Object DBs
• Other DBs
• Application wrappers
• Your DB
• Many, many sources

Legacy Data

Legacy data produces its own unique set of challenges:
• Data quality
• Data architecture problems
• Database design problems
• Process-related challenges

Data Quality

Common data quality problems:
• A single column is used for several purposes
• Determining the purpose of a column by the value of one or more other columns
• Inconsistent data values / formatting
• Missing data / columns
• Additional columns
• Important attributes and relationships are hidden in text fields
• Data values that stray from their field descriptions and business rules
• Various key strategies for the same type of entity
• Unrealized relationships between data records
• One attribute is stored in several fields
• Inconsistent use of special characters
• Different data types for similar columns
• Different levels of detail
• Different modes of operation
• Varying timeliness of data
• Varying default values
• Various representations

http://www.agiledata.org/essays/legacyDatabases.html#DataProblems
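Several items on this list (inconsistent values, varying default values, various representations) often show up together in a single flag column. A minimal Python sketch of a normalizing function for a hypothetical `active` flag; the accepted spellings, the treatment of `None`, and the choice to raise on unknown values are all assumptions you would replace with what your own data actually contains.

```python
# Hypothetical spellings found in a legacy "active" column.
TRUTHY = {"y", "yes", "t", "true", "1", "a"}   # "a" = "Active" in one source (assumed)
FALSY = {"n", "no", "f", "false", "0", "i", ""}  # "" was one app's default (assumed)


def normalize_active_flag(value):
    """Map a legacy flag value to True/False, or raise ValueError.

    Raising instead of guessing keeps unexpected values visible, so they can
    be routed to whatever error-handling strategy the team adopts.
    """
    if value is None:
        return False  # assumption: NULL meant "inactive" in this data
    text = str(value).strip().lower()
    if text in TRUTHY:
        return True
    if text in FALSY:
        return False
    raise ValueError(f"unrecognized active flag: {value!r}")
```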

Data Architecture Problems

Common architectural problems may include:
• Applications responsible for data cleansing (instead of the DB)
• Different database paradigms
• Different hardware platforms / storage
• Fragmented / redundant / inaccessible data sources
• Inconsistent semantics
• Inflexible architecture
• Lack of event notification
• No or inefficient security
• Varying timeliness of data sources

Design Problems

There may be key design issues with the database:
• A database encapsulation scheme exists, but it’s difficult to use
• Ineffective (or no) naming conventions
• Inadequate documentation
• Original design goals at odds with current project needs
• Inconsistent key strategy
• Design goals at odds with data storage (treating relational DBs as object DBs, etc.)

Design Problems

Example:
• An application presented custom forms to users
• Implementers could create custom forms with custom questions and validations
• Beautiful OO architecture: Forms had Groups, which had Items
• Everything was rendered dynamically and could be updated on the fly

Design Problems

Example (continued):

The Form, Group, Item, and other “objects” were all stored as individual records in one database table.

A user in the system had on average 74 forms, with an average of 30 questions per form. With a target of 20,000 users in the database, this would lead to over 50 million rows in that one table.
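A quick check of these figures, assuming the average of 30 questions is per form and that each question is stored as one Item record:

```latex
\begin{aligned}
74 \times 30 \times 20{,}000 &= 44{,}400{,}000 \ \text{Item records} \\
74 \times 20{,}000 &= 1{,}480{,}000 \ \text{Form records}
\end{aligned}
```

That is about 45.9 million rows before counting Groups and the other “objects” kept in the same table, which is presumably what takes the total past 50 million.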

We identified one stored proc as one of the main culprits. It had something like the following:

Design Problems

Example:

INSERT INTO @tmpTable
SELECT ot.myCol
FROM OtherTable ot
WHERE ot.bitMask & (144567 | 99435) = 0

This led to a full table scan in one of their most heavily used procs, significantly degrading performance (average page load time of over 7 seconds).
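The trouble with that predicate is that the bitwise expression must be computed for every row, so an index on `bitMask` cannot be used to seek. The original was T-SQL (note the `@tmpTable` table variable); the sketch below uses Python’s built-in `sqlite3` as a stand-in to show the same effect with `EXPLAIN QUERY PLAN`. Table and column names follow the slide; the index name and rows are made up, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER)")
# Hypothetical index, added to show that it does not help the bitmask predicate.
conn.execute("CREATE INDEX ix_OtherTable_bitMask ON OtherTable (bitMask)")
conn.executemany(
    "INSERT INTO OtherTable (myCol, bitMask) VALUES (?, ?)",
    [(f"row{i}", i) for i in range(1000)],
)


def plan(sql):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The slide's predicate (144567 | 99435 == 242943): the expression is
# evaluated row by row, so the plan is a scan, e.g. "SCAN OtherTable".
bitmask_plan = plan("SELECT myCol FROM OtherTable WHERE bitMask & (144567 | 99435) = 0")

# For contrast, a plain equality on the indexed column can seek the index,
# e.g. "SEARCH OtherTable USING INDEX ix_OtherTable_bitMask (bitMask=?)".
equality_plan = plan("SELECT myCol FROM OtherTable WHERE bitMask = 0")

print(bitmask_plan)
print(equality_plan)
```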

Working with Legacy Data

So how do you deal with legacy data? Strategies:
• Avoid it
• Develop Error Handling Strategy
• Work Iteratively and Incrementally
• Prefer Read-Only Legacy Access
• Encapsulate Legacy Data Access
• Introduce Data Adapters for Simple Data Access
• Introduce a staging database for complex access
• Adopt Existing Tools
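Encapsulating legacy access can be as small as one adapter class that is the only code aware of the legacy layout, so the rest of the application sees clean objects. A minimal read-only sketch in Python; the `Customer` type, the `LegacyCustomerAdapter` class, and the legacy column names (`CUST_NO`, `NM1`, `NM2`, `SGN_DT`, `STAT`) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Clean object the rest of the (hypothetical) application uses."""
    customer_id: int
    name: str
    signup_date: date
    active: bool


class LegacyCustomerAdapter:
    """Hypothetical adapter: the only code that knows the legacy row layout.

    Assumed layout (invented for illustration): CUST_NO is a zero-padded
    string, the name is split across NM1/NM2, SGN_DT is a YYYYMMDD string,
    and STAT is 'A' for active.
    """

    def to_customer(self, row):
        # Read-only: the adapter translates rows and never writes back.
        return Customer(
            customer_id=int(row["CUST_NO"]),
            # One attribute stored in several fields: rejoin it here.
            name=" ".join(p.strip() for p in (row["NM1"], row["NM2"]) if p.strip()),
            signup_date=datetime.strptime(row["SGN_DT"], "%Y%m%d").date(),
            active=row["STAT"].strip().upper() == "A",
        )


legacy_row = {"CUST_NO": "00042", "NM1": "Ada ", "NM2": "Lovelace",
              "SGN_DT": "19991231", "STAT": "A"}
print(LegacyCustomerAdapter().to_customer(legacy_row))
```

If the legacy layout changes, or the data moves to a staging database, only the adapter has to change.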

Working with Legacy Data

We couldn’t avoid the data; the proc had to be changed.

So we developed an incremental five-step plan:
1. Add an IsValidRecord column to the table
2. Update the column based on the bitmask for each row
3. Change the proc to use the column instead of the bitmask
4. Make sure all tests are still passing
5. Introduce update and insert triggers to automatically populate the column
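The five steps can be replayed end to end in a small sketch. It uses Python’s built-in `sqlite3` as a stand-in for the original database, so the “proc” is just a query function; `OtherTable`, `myCol`, `bitMask`, and `IsValidRecord` come from the slides, while the trigger names and sample rows are made up.

```python
import sqlite3

MASK = 144567 | 99435  # the combined mask from the original proc

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER)")
conn.executemany(
    "INSERT INTO OtherTable (myCol, bitMask) VALUES (?, ?)",
    [("a", 0), ("b", 1), ("c", 256), ("d", 131072)],  # made-up rows
)


def old_query():
    # The original bitmask predicate, kept so tests can compare against it.
    sql = "SELECT myCol FROM OtherTable WHERE (bitMask & ?) = 0 ORDER BY id"
    return [r[0] for r in conn.execute(sql, (MASK,))]


# Step 1: add the column.
conn.execute("ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER")

# Step 2: backfill it from the bitmask for every existing row.
conn.execute(
    "UPDATE OtherTable SET IsValidRecord = "
    "CASE WHEN (bitMask & ?) = 0 THEN 1 ELSE 0 END",
    (MASK,),
)


# Step 3: the "proc" now filters on the plain column instead.
def new_query():
    sql = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


# Step 4: old and new versions must agree.
assert new_query() == old_query() == ["a", "c"]

# Step 5: triggers keep the column populated on future writes.
# MASK is an int we control, so formatting it into the DDL is safe here.
conn.executescript(f"""
CREATE TRIGGER trg_OtherTable_insert AFTER INSERT ON OtherTable
BEGIN
    UPDATE OtherTable
    SET IsValidRecord = CASE WHEN (NEW.bitMask & {MASK}) = 0 THEN 1 ELSE 0 END
    WHERE id = NEW.id;
END;
CREATE TRIGGER trg_OtherTable_update AFTER UPDATE OF bitMask ON OtherTable
BEGIN
    UPDATE OtherTable
    SET IsValidRecord = CASE WHEN (NEW.bitMask & {MASK}) = 0 THEN 1 ELSE 0 END
    WHERE id = NEW.id;
END;
""")
```

Once the triggers exist, inserts and updates that touch `bitMask` keep `IsValidRecord` in step without any change to the code that writes the rows.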

Working with Legacy Data

Advantages:
• Required no change to application code
• We could rapidly test the application
• We could make incremental changes to see improvements

What made it work:
• A Testing/QA database with production-like data
• Regression tests to ensure functionality
• Timing tests to show performance improvement

Process Problems

Not all the issues are technical:
• Working with legacy data when you don’t have to
• Data design drives your object model
• Legacy data issues overshadow everything else
• App developers ignore legacy issues
• You choose not to refactor the legacy data sources
• Politics
• You are too focused on the data to see the software

Refactoring Databases

Databases should not be left out of the refactoring process.

“An interesting observation is that when you take a big design up front (BDUF) approach to development where your database schema is created early in the life of your project you are effectively inflicting a legacy schema on yourself. Don’t do this.”

Scott Ambler maintains a catalog of database refactorings.

How do you refactor a database?

Refactoring Databases

Implementing database refactoring in your organization:
• Start simple
• Accept that iterative and incremental development is the norm
• Accept that there is no magic solution to get you out of your existing mess
• Adopt a 100% regression testing policy
• Try it

Next Steps

Dealing with legacy code is hard:
• Integration issues
• Code issues
• Political issues

There are ways out. It is important to address pain points first.

Next Steps

So where can you go from here?
• Working Effectively with Legacy Code by Michael Feathers
• Agile Database Techniques by Scott Ambler
• Refactoring Databases by Scott Ambler and Pramod Sadalage
• http://www.agiledata.org
• NUnit, JUnit, CppUnit, CppUnitLite, dbFit, FitNesse
• http://www.cornetdesign.com