AudeoSynth: Music-Driven Video Montage
Zicheng Liao
Zhejiang University
Bingchen Gong
Zhejiang University
Lechao Cheng
Zhejiang University
Yizhou Yu
University of Hong Kong
ACM SIGGRAPH 2015
The success of visual media synthesis
Video textures [2000] Animating pictures [2005] De-animating video [2012]
Progressive video loop [2013] Cinemagraphs [2012] Cliplets [2012]
Video synopsis [2008]
Image analogy [2001] Graph cut texture synth [2003] Texture synthesis [1999 & 2001] Pyramid blending [1983]
Gradient domain editing [2003] Digital photo montage [2004] stitching & panorama [2003]
*Silent* pixels
Other dimensions of human sensation are absent
- hear, touch, smell or taste
- design for 5-sense [Jinsop Lee 2013]
Add sound to the game
- why sound?
Source: www.MontblancOneSecond.com [#NOT paper result]
Content Analysis
Optimization
Video Montage
Music Driven Video Montage
Applications
Video summary and online sharing
Timelapse photography [Louie Schwartzberg 2011]
Hyperlapse videos [Joshi et al. 2015, Kopf et al. 2014]
Smartphone apps on the Apple App Store or Google Play
A challenging new task
How to formulate this task?
How to write an objective function?
How to find a solution?
How to evaluate?
How to translate the subtleties of an artistic process
into a machine algorithm?
Principle I: Synchronization
Timing and pace of visual activities should follow the music
Audio-Visual Synchresis
- Mental fusion when sound and visual occur at the same time
- A survival instinct developed since ancient times
- Footsteps synchronized with music beat, popping with drum
- Film editing, animations, dancing (“dance to the beat”).
[Michel Chion 1994]
Principle II: Cut-to-the-Beat
Montage: A language of visual expression
Timing is KING
- Music transition points
- Beginning of music bars
“Mosaic, assembling, or a juxtaposition of imagery, …
an orchestration” - Alfred Hitchcock
[Walter Murch 2001]
“to separate and punctuate an idea from what follows”
- Walter Murch
Alfred Hitchcock
Formulation
Music
Video clips
scaling factor
segment 1 segment 2 segment 3 segment 4
Music-Driven Imagery
segment 1 segment 2 segment 3 segment 4
music
videos
Energy function
pairs
synchronization
Overview
Input: music, video clips
Analysis
- Video clips: motion, frequency, dynamism
- Music: segments, note onsets, saliency
Optimization
- Pre-compute
- Energy function
- MCMC optimization
Output: rendering
Music analysis
MIDI: Musical Instrument Digital Interface
- Music industry standard protocol (1983)
- Connects instruments, sequencers and software
- Online databases (free-midi.org; 8notes.com)
- Semantic encoding language of music
[Figure: a MIDI controller. Source: http://wikipedia.org/MIDI]
MIDI format
MIDI event: TIME | EVENT ID | channel | P1 | P2
Event types:
Event               ID    P1            P2
Note off            0x8   pitch         velocity
Note on             0x9   pitch         velocity
Note aftertouch     0xA   note #        value
Controller          0xB   controller #  value
Program change      0xC   program #     channel
Channel aftertouch  0xD   value         NA
Pitch bend          0xE   value 1       value 2
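In a raw MIDI byte stream, the event ID and the channel listed above share a single status byte: the high nibble is the event ID (0x8 to 0xE) and the low nibble is the channel (0 to 15). A minimal sketch of that split follows; the function and dictionary names are illustrative and not from the talk.

```python
from typing import Tuple

# Event IDs as listed in the table above.
EVENT_NAMES = {
    0x8: "Note off",
    0x9: "Note on",
    0xA: "Note aftertouch",
    0xB: "Controller",
    0xC: "Program change",
    0xD: "Channel aftertouch",
    0xE: "Pitch bend",
}


def split_status_byte(status: int) -> Tuple[int, int]:
    """Split a MIDI channel-message status byte into (event ID, channel)."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel-message status byte")
    event_id = status >> 4   # high nibble: 0x8 .. 0xE
    channel = status & 0x0F  # low nibble: channel 0 .. 15
    return event_id, channel


if __name__ == "__main__":
    event_id, channel = split_status_byte(0x93)
    print(EVENT_NAMES[event_id], "on channel", channel)
```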
Program change event: <0xC program# channel>
Program #
01 – 08: Piano Timbres
09 – 16: Chromatic percussion
17 – 24: Organ Timbres
25 – 32: Guitar Timbres
…
105 – 112: Ethnic Timbres
113 – 128: Sound Effects (Tinkle Bell, Breath noise, Bird Tweet, etc)
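The program-number ranges above can be turned into a small lookup. This sketch encodes only the ranges shown on the slide; programs in the elided range are reported as unknown rather than guessed. The names `SLIDE_FAMILIES` and `timbre_family` are hypothetical.

```python
from typing import Optional

# (first program, last program, family) exactly as listed on the slide.
SLIDE_FAMILIES = [
    (1, 8, "Piano Timbres"),
    (9, 16, "Chromatic percussion"),
    (17, 24, "Organ Timbres"),
    (25, 32, "Guitar Timbres"),
    (105, 112, "Ethnic Timbres"),
    (113, 128, "Sound Effects"),
]


def timbre_family(program: int) -> Optional[str]:
    """Return the timbre family for a 1-based program number, or None
    when the number falls in the range the slide elides."""
    if not 1 <= program <= 128:
        raise ValueError("program number must be in 1..128")
    for first, last, name in SLIDE_FAMILIES:
        if first <= program <= last:
            return name
    return None


if __name__ == "__main__":
    for p in (1, 30, 50, 120):
        print(p, timbre_family(p))
```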
Music metadata
Clef, meter and tempo
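Meter and tempo are enough to locate the beginning of each bar, which is where cuts are placed under the cut-to-the-beat principle. A minimal sketch, assuming a constant tempo expressed in quarter notes per minute and a fixed meter; the function name and parameters are illustrative and not from the talk.

```python
from typing import List


def bar_start_times(tempo_bpm: float, beats_per_bar: int,
                    beat_unit: int, n_bars: int) -> List[float]:
    """Start time in seconds of each bar, for a constant tempo and meter.

    tempo_bpm is counted in quarter notes per minute; the meter is
    beats_per_bar / beat_unit (for example 4/4 or 6/8).
    """
    quarter_sec = 60.0 / tempo_bpm            # duration of a quarter note
    beat_sec = quarter_sec * 4.0 / beat_unit  # duration of one beat of the meter
    bar_sec = beats_per_bar * beat_sec        # duration of one bar
    return [i * bar_sec for i in range(n_bars)]


if __name__ == "__main__":
    print(bar_start_times(120, 4, 4, 3))
```

A piece with tempo changes would need the same computation applied piecewise between tempo events.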
Music segmentation
Bottom-up hierarchical segmentation
“Agglomerative image segmentation with superpixels”
Music bars as “superpixels”
Bar 1 Bar 2 Bar 3 Bar 4 Bar 5
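The bottom-up idea can be sketched as follows: start with one segment per bar and repeatedly merge the two temporally adjacent segments that are most similar. The per-bar feature vectors, the Euclidean distance, and the stopping rule (a target segment count) are assumptions made for illustration; the talk does not specify them.

```python
import numpy as np


def segment_bars(bar_features, n_segments):
    """Agglomeratively merge adjacent bars into n_segments segments.

    bar_features: (n_bars, d) array, one hypothetical feature vector per bar.
    Returns a list of (first_bar, last_bar) index pairs, inclusive.
    """
    feats = np.asarray(bar_features, dtype=float)
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Start with one segment per bar, like superpixels in image segmentation.
    segments = [(i, i) for i in range(len(feats))]
    means = [feats[i].copy() for i in range(len(feats))]
    while len(segments) > n_segments:
        # Distance between every pair of temporally adjacent segments.
        dists = [np.linalg.norm(means[i] - means[i + 1])
                 for i in range(len(segments) - 1)]
        k = int(np.argmin(dists))
        first, last = segments[k][0], segments[k + 1][1]
        # Merge the closest pair and recompute the merged segment's mean.
        segments[k:k + 2] = [(first, last)]
        means[k:k + 2] = [feats[first:last + 1].mean(axis=0)]
    return segments


if __name__ == "__main__":
    bars = [[0.0], [0.1], [5.0], [5.1], [5.2]]
    print(segment_bars(bars, 2))
```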
Music temporal saliency
For audio-visual alignment (synchronization)
8 note onset scores
pitch-peak, pitch-shift, deviated-pitch, before-a-long-interval, after-a-long-interval, start-of-a-bar, start-of-a-new-bar, start-of-a-different-bar
Convolve with Gaussian kernel
[Figure panels: saliency, MIDI]
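The last step, convolving onset scores with a Gaussian kernel, can be sketched as below. How the eight onset scores are combined into one value per onset is not stated in the talk, so the sketch takes an already combined score per onset as input; the sampling rate and kernel width are likewise assumed values.

```python
import numpy as np


def music_saliency(onset_times, onset_scores, duration, rate=100, sigma=0.05):
    """Continuous saliency curve from scored note onsets.

    onset_times: onset times in seconds; onset_scores: one combined score
    per onset (hypothetical input); rate: samples per second;
    sigma: Gaussian width in seconds. The curve should be longer than the
    kernel (about 6 * sigma * rate samples) for the output length to match.
    """
    n = int(round(duration * rate)) + 1
    impulses = np.zeros(n)
    for t, s in zip(onset_times, onset_scores):
        idx = int(round(t * rate))
        if 0 <= idx < n:
            impulses[idx] += s
    # Gaussian kernel truncated at +/- 3 sigma and normalized to unit sum.
    half = max(1, int(round(3 * sigma * rate)))
    x = np.arange(-half, half + 1) / rate
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(impulses, kernel, mode="same")


if __name__ == "__main__":
    curve = music_saliency([0.5, 1.5], [1.0, 3.0], duration=2.0)
    print(len(curve), int(np.argmax(curve)))
```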
Video analysis
Optical flow as generic visual descriptor [Liu et al. 2005]
Motion change rate (MCR)
Iterative back propagation [Yang et al. 2011]
Visual temporal saliency
Video analysis cont’d
Motion frequency
- Project motions in discretized directions
- Power spectral density (PSD) analysis over time window
- Take the frequency with largest PSD
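The three steps above can be sketched with a plain periodogram. The input here is a hypothetical per-frame mean motion vector; the talk works from optical flow, and the exact aggregation, number of directions, and window length are not given, so those are assumptions.

```python
import numpy as np


def motion_frequency(motion, fps, n_dirs=8):
    """Dominant motion frequency (Hz) within one time window.

    motion: (T, 2) array of per-frame mean motion vectors (hypothetical
    input); fps: frame rate; n_dirs: number of discretized directions.
    """
    motion = np.asarray(motion, dtype=float)
    # Discretized unit directions covering half a circle.
    angles = np.arange(n_dirs) * np.pi / n_dirs
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n_dirs, 2)
    proj = motion @ dirs.T                                     # (T, n_dirs)
    proj = proj - proj.mean(axis=0)                            # drop the DC term
    # Periodogram: an unnormalized PSD estimate per direction.
    psd = np.abs(np.fft.rfft(proj, axis=0)) ** 2
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fps)
    # Take the frequency with the largest PSD over all directions.
    f_idx, _ = np.unravel_index(int(np.argmax(psd)), psd.shape)
    return float(freqs[f_idx])


if __name__ == "__main__":
    t = np.arange(120) / 30.0
    sway = np.stack([np.sin(2 * np.pi * 2.0 * t), np.zeros_like(t)], axis=1)
    print(motion_frequency(sway, fps=30))
```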
Flow peak and dynamism
[Figure: examples with dynamism levels d = 0, d = 1, d = 2, d = 3, …]
Matching cost
- Synchronization cost
- Pace/frequency cost
Transition cost
- Pace/velocity compatibility
- # tracks/dynamism compatibility
Optimization
A combination of continuous and discrete optimization
Non-convex
Cannot do gradient descent
Two-Stage Optimization
Stage 1
music
segment 1 segment 2 segment 3 segment 4
Stage 2
MCR
scalable sliding window
frame (video timeline)
Stage I
[Figure labels: start frame, end frame, scaling factor; music timeline; music temporal saliency]
- Global alignment
- Temporal snapping
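A brute-force version of the global alignment step might look like the sketch below: slide a window of variable length over a clip's MCR curve, resample it to the music segment's length, and keep the window that best matches the music saliency. The dot-product score, the candidate scaling factors, and the function name are assumptions; temporal snapping is not covered.

```python
import numpy as np


def best_window(mcr, saliency, scales=(0.5, 1.0, 2.0)):
    """Search start frame and scaling factor for one clip and one segment.

    mcr: per-frame motion change rate of a clip; saliency: music temporal
    saliency sampled over one music segment; scales: candidate ratios of
    video frames to music samples (hypothetical values).
    Returns (score, start_frame, end_frame, scale), or None if no window fits.
    """
    mcr = np.asarray(mcr, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    n = len(saliency)
    best = None
    for scale in scales:
        width = int(round(n * scale))  # video frames covered by the window
        if width < 2 or width > len(mcr):
            continue
        for start in range(len(mcr) - width + 1):
            window = mcr[start:start + width]
            # Time-scale the window to the music segment's length.
            resampled = np.interp(np.linspace(0, width - 1, n),
                                  np.arange(width), window)
            score = float(np.dot(resampled, saliency))
            if best is None or score > best[0]:
                best = (score, start, start + width, scale)
    return best


if __name__ == "__main__":
    mcr = np.zeros(20)
    mcr[[11, 14]] = 1.0
    print(best_window(mcr, [0, 1, 0, 0, 1, 0]))
```

Because this search depends only on one clip and one music segment, its results can be computed once per pair before the second stage runs.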
Stage II
Metropolis-Hastings algorithm
- Two mutation options
  - node label update
  - two-node label swap
- Reversibility constraint
- Uniform distribution for label update
segment 1 segment 2 segment 3 segment 4
music
videos
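The two mutations can be sketched as a small Metropolis-Hastings sampler over segment labels, where each music segment (node) is labeled with a video clip index. Both proposals are symmetric, so the acceptance ratio reduces to exp(-(E' - E) / T). The target distribution exp(-E / T), the temperature, and the 50/50 choice between mutations are assumptions, and `energy` is a placeholder for the talk's energy function.

```python
import math
import random


def mh_assign(n_segments, n_videos, energy, n_iters=5000,
              temperature=1.0, seed=0):
    """Sample label vectors with Metropolis-Hastings; return the best seen.

    energy: callable mapping a list of labels to a float (placeholder for
    the actual energy function, which is not reproduced here).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_videos) for _ in range(n_segments)]
    e = energy(labels)
    best, best_e = list(labels), e
    for _ in range(n_iters):
        cand = list(labels)
        if n_segments >= 2 and rng.random() < 0.5:
            # Mutation 2: swap the labels of two nodes.
            i, j = rng.sample(range(n_segments), 2)
            cand[i], cand[j] = cand[j], cand[i]
        else:
            # Mutation 1: redraw one node's label uniformly.
            cand[rng.randrange(n_segments)] = rng.randrange(n_videos)
        cand_e = energy(cand)
        # Symmetric proposals: accept with probability min(1, exp(-dE / T)).
        if cand_e <= e or rng.random() < math.exp((e - cand_e) / temperature):
            labels, e = cand, cand_e
            if e < best_e:
                best, best_e = list(labels), e
    return best, best_e


if __name__ == "__main__":
    target = [2, 0, 3, 1]  # toy ground truth, for demonstration only

    def toy_energy(lbls):
        return sum(a != b for a, b in zip(lbls, target))

    print(mh_assign(4, 4, toy_energy, temperature=0.2))
```

Drawing the new label uniformly keeps the update proposal symmetric, which is one simple way to satisfy the reversibility constraint listed above.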
Result: wild
Input: 35 videos of wildlife scenes; Music: Exploration (excerpt)
Result: Aurora
Input: 36 aurora videos; Music: Someone Like You (excerpt)
Result: City timelapse
Input: 55 city timelapse videos; Music: Clocks (excerpt)
Comparison: sync/no-sync
Feature turned off | Feature turned on
Comparison: cut-to-the-beat
Without cut-to-the-beat | With cut-to-the-beat
User study
Experimental setup
- 5 groups: ours, ours without cut-to-the-beat, ours without sync, average user, expert user
- 6 examples (right)
- 29 participants
- random order, rate from 1 to 5
- Subpopulation analysis by questionnaire
Examples: Aurora, City timelapse, Happy birthday, Adventure, Ballet, Wild
Methods: Ours, w/out sync., w/out cut-to-beat; manual edits: Expert user, Avg user
User study results
- Average rating of different methods
- Average rating of different examples
- Fraction of best vs. worst ratings for each method
- Fraction of higher vs. lower ratings in pairs of methods
Limitations
Manual selection of music
Storyline is not preserved
The use of MIDI: bad side and good side
Future work
Replace MIDI with .wav and .mp3
Put a human in the loop
Video-based music recommendation
...
http://web.engr.illinois.edu/~liao17/montage.html
Thank You