just another bughunt

Post on 18-Jun-2015


DESCRIPTION

It's not the bugs you know that kill a website. It's the ones you can't see, lurking just out of sight, that get you. Learn how Lafayette College identified the Lovecraftian code horrors beneath its feet with tools like Splunk (server log analysis), OSSEC (server-side bad behavior monitor) and Siteimprove (web page auditing tool), and then surgically eliminated the problems. Examples include PHP scripts spewing error notices into logs, undiscovered CAS authentication failures, and thumbnail generation scripts that choke on large files.

TRANSCRIPT

Just another bughunt?

Tools to improve your site without nuking it from orbit

#DPA11

Ken Newquist (@knewquist) | Charles Fulton (@mackensen)

Who we are

Ken Newquist, Director, Web Applications Development, Lafayette College

Charles Fulton, Senior Web Applications Developer, Lafayette College


Rebuild or Fix?

● Your website’s problems may seem intractable

● The temptation to nuke the bugs and start fresh is strong

● We’ve found tools that identify the problems so we can surgically eliminate them
  ○ (and find a few issues we didn’t know about in the process)

Tools

Siteimprove

● Crawls web presence
● Reports broken links and common misspellings
● Shows changes over time
● Pretty graphs!

Pretty graph!

Splunk

● Log aggregation
● Real-time monitoring
● Rich analysis
● More pretty graphs!

Another pretty graph!

Nagios

● Real-time monitoring
● Defines a base-line of system performance
● Does not detect presence of dinosaurs

Dinosaurs!

OSSEC

● Log-based intrusion detection system
● Define states of acceptable behavior
● No pretty graphs

Not a pretty graph :/

Discovering your web presence

● Define expected behavior with OSSEC & Nagios
● Test expectations with Siteimprove & Splunk
● Here be monsters

Investigations

The Lost Thumbnails

● Site: Moodle
● Tools: Splunk, OSSEC
● Outcome: Improved Apache configuration

Sky falling!

● Splunk reported ~400 HTTP 500 (Internal Server Error) responses within a few minutes

● Also showed concentrated bursts of 404 errors when viewing resources

● Concern within department that sky was falling
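A burst like the one Splunk surfaced can be approximated by bucketing access-log lines per minute. The following is a minimal Python sketch, assuming Apache's combined log format; the function name and the sample lines in the usage note are invented for illustration and do not come from the talk.

```python
import re
from collections import Counter

# Captures the bracketed timestamp and the three-digit status code
# of an Apache "combined" log line (assumed format).
LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')


def server_errors_per_minute(lines):
    """Count 5xx responses per minute, keyed by 'dd/Mon/yyyy:HH:MM'."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status").startswith("5"):
            # A timestamp looks like 01/Jan/2015:13:55:36 -0500;
            # the first 17 characters identify the minute.
            counts[match.group("ts")[:17]] += 1
    return counts
```

Feeding it a few hypothetical lines, two 500s in the same minute end up under one key, while a 404 in the next minute is ignored. Sorting the resulting counter by value gives a quick view of when the errors clustered.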


Sky not falling!

● System ran out of memory generating thumbnails from massive images; threw 500s

● Preview of missing images generated the 404s


Outcomes

● Memory limits were not reasonable
● Users do not report catastrophic errors

Comments

● Site: WordPress
● Tools: Splunk, OSSEC
● Outcome: WordPress core fixes

What Lies Beneath

● 500 errors are reserved for server issues
● WordPress has notions of its own
  ○ Double-submitted comment? 500 error
  ○ Missing a required field? 500 error
  ○ Blank comment? 500 error
● OSSEC would ban all of these for bad behavior

https://github.com/bigcompany/know-your-http

Outcomes

● Learned reasonable mistakes can yield unreasonable error codes

● Hacked core to return 200s and 400s instead

● Core is discussing what to do
  ○ https://core.trac.wordpress.org/ticket/11286
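The idea behind that change, answering user mistakes with client-error codes rather than a 500, can be sketched in a few lines. This is a hypothetical Python illustration, not the WordPress patch itself: the function, its parameters, and the choice of which 4xx code fits each case are all assumptions.

```python
from http import HTTPStatus


def comment_status(author, body, already_submitted):
    """Pick an HTTP status for a comment submission.

    Hypothetical helper for illustration only; not WordPress code.
    """
    if not author:
        return HTTPStatus.BAD_REQUEST  # missing a required field
    if not body.strip():
        return HTTPStatus.BAD_REQUEST  # blank comment
    if already_submitted:
        return HTTPStatus.CONFLICT     # double-submitted comment (assumed 409)
    return HTTPStatus.OK
```

With responses in the 4xx range, a log-based monitor can tell a careless commenter apart from a failing server, which is the distinction the 500s erased.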

Revenge of the Base Theme

● Site: WordPress
● Tools: Siteimprove
● Outcome: WordPress theme fix; Apache configuration change

March 10: the day the links broke

Nothing to see here … oh wait--

● Developer dismissed initial reports of login issues as user error

● Then Siteimprove said we had 1,800 new broken links

● A two-character change in RHEL defaults for httpd.conf broke WordPress


Lessons

● Small changes have vast consequences
● Documentation is doubleplusgood

The Incredible Shrinking Provost

● Site: Drupal
● Tools: Splunk
● Outcome: Cleaned data in ERP system

Who’s the fairest of them all?

● The directory passes the search query via a GET parameter

● Splunk told us our associate provost, “Jane Doe”, was most-searched by an order of magnitude
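Because the query travels in a GET parameter, it shows up in every logged request path, so ranking search terms is a matter of parsing and counting. A minimal Python sketch follows; the parameter name `q` and the `/directory` path used in the usage note are assumptions, since the talk does not name them.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def top_search_terms(request_paths, param="q", limit=3):
    """Rank search terms found in one GET parameter of logged request paths."""
    counts = Counter()
    for path in request_paths:
        query = parse_qs(urlsplit(path).query)
        for term in query.get(param, []):
            # Normalise case and whitespace so "Jane Doe" and "jane doe" merge.
            counts[term.strip().lower()] += 1
    return counts.most_common(limit)
```

Given paths such as `/directory?q=Jane+Doe` and `/directory?q=jane%20doe`, both count toward the same term, and a term that dominates the ranking by an order of magnitude stands out immediately.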


...we searched for “Jane Doe”...

...and the search returned...

...NOTHING!


Lessons

● “Jane A. B. Doe” !== “Jane Doe”
● Data lies
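The mismatch is a strict string comparison failing on middle initials. The fix described in the talk was cleaning the data in the ERP system; purely as an illustration of the failure mode, here is a Python sketch of a looser comparison that keys on first and last name only. Both helper names are hypothetical.

```python
def name_key(full_name):
    """Reduce a name to (first, last), lowercased, dropping middle names and initials."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ("", "")
    return (parts[0].lower(), parts[-1].lower())


def probably_same_person(a, b):
    """Loose match: the two names share a first and a last token."""
    return name_key(a) == name_key(b)
```

A comparison this loose trades one problem for another: suffixes such as "Jr." or two people sharing first and last names would produce false matches, which is one reason cleaning the source data is the sturdier fix.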

Dumpster fire

The Virtual Tour

● Site: Custom app
● Tools: Splunk
● Outcome: Fixed PHP bugs

Pretty graphs!

● 238,908 errors...in three days
● (We didn’t expect that)

Fixed it!


Outcomes

● No one cares that we fixed the Virtual Tour
  ○ (we feel better though)

Mr. Foo and Mr. Bar

● Site: WordPress
● Tools: Splunk
● Outcome: Disproved long-standing alleged bug

I swear I wasn’t there!

● Various reports over the years alleging that WordPress improperly reported another user was editing a post

● Much speculation and theorizing in absence of facts


Outcomes

● People are wrong on the Internet


The Cache That Wouldn’t Die

● Site: WordPress
● Tools: Nagios
● Outcome: Database size reduced by two-thirds

Doom at 11…

● Nagios had concerns

● MySQL ran out of disk space

● Size of WordPress DB tripled in two weeks


Pretty terminal dumps?

SELECT option_name FROM wp_190_options WHERE option_name LIKE "displayed_gallery%";
...
| displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4 |
| displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8 |
| displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe |
| displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7 |
+--------------------------------------------------------------+
722141 rows in set (2.28 sec)
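When an options table balloons like this, grouping option names by their prefix shows which family of keys is responsible. A rough Python sketch, assuming the names have already been exported to a list and that the runaway entries end in a 32-character hex digest, as the rows above do; the function name is invented.

```python
import re
from collections import Counter

# Trailing "_<32 hex characters>", matching the option names shown above.
HASH_SUFFIX = re.compile(r"_[0-9a-f]{32}$")


def count_by_prefix(option_names):
    """Strip the per-entry digest and count how many options share each prefix."""
    counts = Counter()
    for name in option_names:
        counts[HASH_SUFFIX.sub("", name)] += 1
    return counts
```

Calling `count_by_prefix(names).most_common(5)` on a full export would put a prefix like `displayed_gallery_rendering` at the top of the list if it accounts for most of the rows.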

…Salvation at Noon

● The Google Mini found something terrible lurking in club websites

● NextGEN Gallery bug caused near-endless crawl by the Mini

● Code bug meant the cache never expired


Outcomes

● NextGEN Gallery has stability issues
● Listen to Nagios
● It’s turtles all the way down

Attack of the Python Script

● Site: WordPress
● Tools: Nagios, Splunk
● Outcome: Quickly identified source of massive load event

Traffic Jam!

● Load on a server spiked at 800%

● Seemed bad
● Nagios had more concerns

Hello there!

● Splunk real-time monitoring revealed top client IPs

● We’re very popular with a misconfigured IIS Server in Oregon and its “Python-urllib/3.4” script
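The same "top clients" view can be roughed out from raw access logs by counting requests per IP and user agent. A minimal Python sketch, assuming Apache's combined log format, where the client IP is the first field and the user agent is the last quoted field; the function name and the sample lines used to exercise it are made up.

```python
import re
from collections import Counter

# First field is the client IP; the final quoted field is the user agent
# (true for the combined log format, which is assumed here).
LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def top_clients(lines, limit=5):
    """Return the busiest (ip, user_agent) pairs with their request counts."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)
```

Pairing the IP with the user agent matters: a single address hammering the site with a scripted agent such as "Python-urllib/3.4" reads very differently from a campus proxy carrying many browsers.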


Outcomes

● Banned the IP on the proxy

● Began developing rate-limiting rules for OSSEC
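Rate limiting comes down to counting each client's recent requests and refusing any beyond a threshold. The sketch below shows that logic in Python as a sliding window; it is not OSSEC rule syntax, and the class name and thresholds are illustrative assumptions only.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, client_ip, now):
        recent = self.hits[client_ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

Passing the timestamp in as `now` keeps the class easy to test against recorded log times. A live deployment would more likely enforce the limit at the proxy or in the intrusion detection system, as the talk describes.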


Alternatives

Bughunting on the cheap

W3C Link Checker
● Reports on broken links to a specified depth
● http://validator.w3.org/checklink

Google Webmaster Tools
● Details on broken links and server errors
● https://www.google.com/webmasters/tools/

More options

● Bureau of Internet Accessibility
  ○ Cheaper than Siteimprove
  ○ Broken link and accessibility reports
  ○ http://www.boia.org
● Google Analytics
  ○ Identify high-traffic broken pages
  ○ http://google.com/analytics
● vim | grep
  ○ Eyeballing your logs can’t hurt

Conclusions

Did we really fix all those errors?

Or is logging broken?

Takeaways

● Data are free
● Bugs are hard to find
● Reports are expensive
● Good reports make finding bugs easy
● You can improve your site without rebuilding it from scratch
● You will find more bugs than you can fix

Anatomy of a Redirect

● Tool: Splunk
● Forthcoming from Lafayette College
● WordPress tries to be helpful!

Join the discussion at https://core.trac.wordpress.org/ticket/16557!


Questions?

Ken Newquist ● newquisk@lafayette.edu ● @knewquist
Charles Fulton ● fultonc@lafayette.edu ● @mackensen
