All That Movies
BIFE Presentation
Agenda
• Brief Introduction
• Dashboard Showcase
• Other Interesting Discoveries
• Technical Design
• Team Work
BRIEF INTRODUCTION
Cereal Killers
Jiang Yongli Peng Cen Xia Bing
Motivation
• What we want to do is all about movies
– Analyze movies and the movie business from different perspectives
– Give suggestions for different kinds of people
Target Customers
• Movie fans
• Movie journalists
• Movie companies
• Directors and actors/actresses
• Investors
Data source
• IMDB website
• 11,000+ movies from 1898 to present
• 81,000+ actors/actresses
• 4,700+ directors
• 11,000+ movie companies
DASHBOARD SHOWCASE
OTHER INTERESTING DISCOVERIES
Sex Battle
Males’ Favorite
The Shawshank Redemption
The Godfather
Hababam sinifi
The Godfather: Part II
Inception
Pulp Fiction
The Good, the Bad and the Ugly
Tosun Pasa
The Dark Knight
Star Wars: Episode V - The Empire Strikes Back
Females’ Favorite
The Code Conspiracy
Little Ghost
The Shawshank Redemption
Inception
Toy Story 3
The Belly of the Beast
The Lord of the Rings: The Return of the King
Schindler's List
My Father and My Son
The Dark Knight
Our Parents’ Favorite
Movie Name                           Movie Year
Casablanca                           1943
The Godfather                        1972
12 Angry Men                         1957
Rear Window                          1955
North by Northwest                   1959
To Kill a Mockingbird                1962
The Treasure of the Sierra Madre     1948
The Godfather: Part II               1974
Raiders of the Lost Ark              1981
One Flew Over the Cuckoo's Nest      1976
And many more…
• Directors born in Maryland are fond of Comedy movies (36/55 movies) and have no interest in Animation movies (0/55 movies)
• Directors born in Rome love Horror movies (40/83 movies) and hate Romance movies (4/83 movies)
• …
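The ratios quoted above can be read as genre shares. A minimal sketch of that arithmetic, assuming the counts are "movies in the genre / all movies by directors born in that place"; the helper name `genre_share` is hypothetical and not part of the project:

```python
def genre_share(genre_count: int, total: int) -> float:
    """Return the genre's share of a director group's movies, in percent."""
    return round(100.0 * genre_count / total, 1)


# Counts taken from the slide text above.
maryland_comedy = genre_share(36, 55)
maryland_animation = genre_share(0, 55)
rome_horror = genre_share(40, 83)
rome_romance = genre_share(4, 83)
```

Read this way, roughly two thirds of the Maryland group's movies are comedies, while romance makes up under 5% of the Rome group's movies.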
TECHNICAL DESIGN
Work Process
ETL Processing
Design Logical Data Model
Design Data Warehouse Schema
Create Project
Create Report and Dashboard
ETL Processing
• Data scale:
– 131,000+ web pages
• Crawler:
– Simulate HTTP request
• Extraction:
– XPath + Regular Expressions
• Save to DB:
– ODBC + SQL
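The extraction and load steps above could look roughly like the sketch below. It is not the team's code: the page markup, the class names, the `MOVIE` table, and the helper names are all hypothetical. To stay self-contained it parses an inline sample page instead of issuing an HTTP request, uses the limited XPath support in Python's standard `xml.etree.ElementTree`, and substitutes an in-memory SQLite database for the ODBC connection.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one crawled movie page.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="movie">
      <h1 class="title">Example Movie (1999)</h1>
      <span class="budget">Budget: $1,500,000 (estimated)</span>
    </div>
  </body>
</html>
"""


def extract_movie(page: str) -> dict:
    """Locate fields with XPath, then clean them with regular expressions."""
    root = ET.fromstring(page)
    title_text = root.find(".//h1[@class='title']").text
    budget_text = root.find(".//span[@class='budget']").text

    # "Example Movie (1999)" -> title and four-digit year.
    title_match = re.match(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$", title_text)
    # "Budget: $1,500,000 (estimated)" -> digits with thousands separators.
    budget_match = re.search(r"\$([\d,]+)", budget_text)

    return {
        "title": title_match.group("title"),
        "year": int(title_match.group("year")),
        "budget": int(budget_match.group(1).replace(",", "")),
    }


def save_movie(conn: sqlite3.Connection, movie: dict) -> None:
    """Insert one extracted movie with a parameterized SQL statement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS MOVIE (TITLE TEXT, YEAR INTEGER, BUDGET INTEGER)"
    )
    conn.execute(
        "INSERT INTO MOVIE (TITLE, YEAR, BUDGET) VALUES (?, ?, ?)",
        (movie["title"], movie["year"], movie["budget"]),
    )
    conn.commit()


movie = extract_movie(SAMPLE_PAGE)
conn = sqlite3.connect(":memory:")  # stand-in for the ODBC connection
save_movie(conn, movie)
```

Real crawled HTML is rarely well-formed XML, so a production crawler would need a tolerant HTML parser in front of the XPath step.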
Logical Data Model
• Time Hierarchy
Year
Quarter
Month of Year
Month
Day
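As an illustration of how one release date maps onto the attributes of the time hierarchy, here is a minimal sketch. The function name, the dictionary keys, and the label formats (such as "1994 Q4") are assumptions made for this example, not the project's actual lookup values.

```python
from datetime import date


def time_attributes(d: date) -> dict:
    """Derive Year, Quarter, Month, Month of Year and Day values from a date."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        "year": d.year,
        "quarter": f"{d.year} Q{quarter}",
        "month": f"{d.year}-{d.month:02d}",  # month within a specific year
        "month_of_year": d.month,            # month regardless of year
        "day": d.isoformat(),
    }


attrs = time_attributes(date(1994, 10, 14))
```

Keeping Month of Year separate from Month lets reports compare, say, all October releases across years.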
Logical Data Model (continued)
• Geography Hierarchy
Continent
Country
State
City
Language
Movie
Logical Data Model (continued)
• Production Hierarchy
Company
Gender
Director
Birth Date
Birth Country
Movie
Genres
Performer
Data Warehouse Schema
• 16 Look Up Tables
Data Warehouse Schema (continued)
• 2 Fact Tables
MOVIE_BUSINESS
MOVIE_ID
BUDGET
BUDGET_RANGE_ID
BOX_OFFICE_US
BOX_OFFICE_WORLDWILD
BOX_OFFICE_OPENING_WEEKEND
…
RATING_FACT
MOVIE_ID
WEIGHTED_AVERAGE_RATING
AVERAGE_RATING
AVG_FEMALE_RATING
AVG_MALE_RATING
AVG_UNDER18_RATING
AVG_ABOVE45_RATING
…
Data Warehouse Schema (continued)
• 6 Relationship Tables
MOVIE_DIRECTOR
MOVIE_PERFORMER
MOVIE_GENRES
MOVIE_COUNTRY
MOVIE_LANG
MOVIE_COMPANY
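To show how a fact table and a relationship table might be combined, the sketch below joins RATING_FACT with MOVIE_GENRES to compare average male and female ratings per genre. The table names and the rating columns come from the slides. The `GENRE_ID` column, the sample rows, and the use of in-memory SQLite are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE RATING_FACT (
        MOVIE_ID INTEGER PRIMARY KEY,
        AVG_FEMALE_RATING REAL,
        AVG_MALE_RATING REAL
    );
    -- GENRE_ID is a hypothetical column name.
    CREATE TABLE MOVIE_GENRES (
        MOVIE_ID INTEGER,
        GENRE_ID INTEGER
    );
    """
)

# Made-up sample rows, not real IMDB data.
conn.executemany(
    "INSERT INTO RATING_FACT VALUES (?, ?, ?)",
    [(1, 8.0, 9.0), (2, 6.0, 7.0), (3, 9.0, 5.0)],
)
conn.executemany(
    "INSERT INTO MOVIE_GENRES VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20)],
)

# Average male and female rating per genre via the relationship table.
rows = conn.execute(
    """
    SELECT g.GENRE_ID,
           AVG(r.AVG_MALE_RATING),
           AVG(r.AVG_FEMALE_RATING)
    FROM MOVIE_GENRES AS g
    JOIN RATING_FACT AS r ON r.MOVIE_ID = g.MOVIE_ID
    GROUP BY g.GENRE_ID
    ORDER BY g.GENRE_ID
    """
).fetchall()
```

Because a movie can belong to several genres, the relationship table keeps the many-to-many link out of the fact table itself.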
Project/Report/Dashboard Design
• 25 Tables including one Data Mart table
• 21 Attributes
• 53 Facts
• 3 User Hierarchies
• 72 Metrics
• Used smart metric, level metric, evaluation order, derived metric, view filter, conditional metric, report as filter, etc. in our reports
• Widgets used: Interactive Stack Graph, Interactive Bubble Graph, Media, Data Cloud, Heat Map, Time Series Sliders, etc.
• Miscellaneous selectors
Problems We Met
• The Media widget automatically shrinks the image whenever we resize the widget.
• We set the fill color to match the border color and placed the widget in another container with the same fill color, so the shrinking is less noticeable.
Problems We Met
• We cannot use dynamic text for different attributes that share a name (e.g. Director’s Birth Date and Performer’s Birth Date), even when we qualify them as {[dataset name]}:{[object name]}.
• We show these attributes in a grid instead and use formatting tricks.
Problems We Met
• View filters on most metrics do not work in the dashboard.
• We built more sophisticated level metrics and report filters to work around the problem.
Problems We Met
• Flash mode always timed out while loading after we merged all dashboards into one.
• We split the dashboards into two.
Problems We Met
• And many more problems…
• And many more solutions…
TEAM WORK
Cooperation
• Face-to-face discussion
• Communicator
• Email
• Shared Folders
• Shared Intelligence Server
• Everyone took part in each section, more or less
Work Focuses
• Xia Bing:
– Team leader, ETL process, recommended directors and performers dashboard and related reports
• Jiang Yongli:
– Warehouse design, project building, movie business dashboard and related reports
• Peng Cen:
– Logical model design, top and bottom movies dashboard and related reports, dashboard formatting
THANKS
Do Not Imitate! We Are Professional!