Carnegie Mellon
Multimedia
Michael Christel
Alex Hauptmann
Rong Jin (TA)
http://www.cs.cmu.edu/~alex/mmCourse
How to get in touch with us
• Mike Christel
• http://www.cs.cmu.edu/~christel
• (412)268-7799 or x8-7799
• WeH5212
• Alex Hauptmann
• http://www.cs.cmu.edu/~alex
• (412)268-1448 or x8-1448
• WeH5124
– Office Hours by Appointment
Teaching Assistant
• Rong Jin
• Office WeH5316
• Office hours by appointment
• (412)268-4050 or x8-4050
Course Outline, Part 1 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
October 22 Intro to Multimedia
October 25 Multimedia Enabling Technologies, Macromedia Flash Intro and Demo
October 29 Sound Processing, Speech Recognition
November 1 Digital Video Creation and Transmission
November 5 Speech Synthesis
Course Outline, Part 2 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
November 8 Image Processing
November 12 Digital Music and Music Processing
November 15 Multimedia Internet Protocols, SMIL
November 19 Synthetic Interviews: A Multimedia Company (Experiences from the Field)
November 22 Programming for Interactive Multimedia (CGI Scripts/ASP)
Course Outline, Part 3 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
November 29 Content Analysis and Coding of Digital Audio and Video, Multimedia Storage and Retrieval Management.
December 3 Video Retrieval Evaluation and Testing, Multimedia Interface Design, Digital Libraries
December 6 Visual Design, Multimedia Interface Design Guidelines, Multimedia use in the future (Experience on Demand)
December 10 Multimedia as Entertainment Technology, Virtual Reality
Homeworks
• See http://www.cs.cmu.edu/~alex/mmCourse
• 9 Homeworks planned, 10 points each
• One hard homework will be worth 20 points
• No final, no midterm
• Publish homeworks on your web page - email us URL
• Space?
Today: Intro to Multimedia
Apple Knowledge Navigator Vision 1988
[Diagram: Multimedia at the center, surrounded by Audio, Images, Video, Information Retrieval, Storage Systems, Networking, Psychology, HCI, Data Compression, Natural Language Processing, CPU Power]
Definition of Multimedia
• Multi (Latin multus - numerous)
• Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
• Multiple types of information captured, stored, manipulated, transmitted, and presented.
• Specifically: Images, Video, Audio (+Speech) and Text
Definition of Multimodal
• Multi (Latin multus - numerous)
• Modal (Latin modus: manner)
• Traditionally refers to input/output formats:
• Input:
• sounds, speech (mike)
• gestures (camera, tablet)
• eye-gaze (camera),
• mouse,
• keyboard
• Output:
• sounds, speech
• video
• Pictures
• Animations
• Text
Perceived Information
• Physical Variables
• Sound is a waveform
• An image is a waveform
• light is electromagnetic radiation with different intensity in spatial coordinates
• color corresponds to wavelength
History of Multimedia I
• Analog signals to sensors
• E.g. vinyl records
• Fidelity is faithfulness to the original
• Digital representation (‘60s)
• Sampling
• Quantizing
• Coding
• codec, modem, (A/D and D/A)
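The three steps listed above (sampling, quantizing, coding) can be sketched in a few lines. This is a minimal illustration, not a production A/D model: the 1 Hz tone, the 8 Hz sample rate, and the 3-bit word size are assumptions chosen only to keep the numbers small.

```python
import math

def sample(signal, sample_rate_hz, duration_s):
    """Sampling: read the continuous signal at evenly spaced instants."""
    n = int(sample_rate_hz * duration_s)
    return [signal(i / sample_rate_hz) for i in range(n)]

def quantize(value, bits):
    """Quantizing: map a value in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))

def encode(codes, bits):
    """Coding: write each level as a fixed-width binary word."""
    return "".join(format(c, f"0{bits}b") for c in codes)

def tone(t):
    """Hypothetical 'analog' input: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)

samples = sample(tone, sample_rate_hz=8, duration_s=1.0)
codes = [quantize(s, bits=3) for s in samples]
bitstream = encode(codes, bits=3)
print(codes)
print(bitstream)
```

A D/A converter runs the same steps in reverse: it splits the bitstream into words, maps each level back to a voltage, and smooths between samples.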
Hardware Advances
• CPU
• Bus
• Network I/O
• Keyboard, Mouse
• Disk
• Mike + A/D Board
• Camera + A/D Board
• Speakers (+ D/A Board)
• Display
History of Multimedia II
• Analog controls only
• Special hardware (Displays, Scanners, FFTs)
• Integrated hardware components
• Further Integration
• Other devices
History of Multimedia III
Limiting Factors:
• Storage Limits
• CPU Speeds
• I/O Speeds
• Network Bandwidth
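A little arithmetic shows why these factors were limiting. The sketch below estimates the raw data rate of uncompressed video and how long one second of it would take to cross a slow link. The frame size (640x480), color depth (24 bits), frame rate (30 fps), and link speed (56 kbit/s) are illustrative assumptions, not figures from the slides.

```python
def video_bytes_per_second(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) video data rate in bytes per second."""
    return width * height * bits_per_pixel * fps / 8

def seconds_to_send(num_bytes, link_bits_per_second):
    """Time to push num_bytes through a link of the given speed."""
    return num_bytes * 8 / link_bits_per_second

# Assumed example values: 640x480 pixels, 24-bit color, 30 frames per second.
rate = video_bytes_per_second(640, 480, 24, 30)
print(f"{rate / 1e6:.1f} MB per second of video")

# Assumed link: a 56 kbit/s modem.
delay = seconds_to_send(rate, 56_000)
print(f"{delay / 60:.0f} minutes to transmit one second of video")
```

The same two functions can be reused with other frame sizes or link speeds to see which factor dominates.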
Why Digital?
• Universal storage, transmission format
• CD, internet
• Precision (Range of values, number of bits, floating point)
• Lossless transmission/storage
BUT:
• a sampling rate that is too low distorts information
• size requirements may be ‘large’ compared to analog
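The precision and lossless points above can be made concrete. This sketch assumes a signal range of -1.0 to 1.0 (an arbitrary choice) and shows how the number of bits sets the number of representable values, and that copying digital data reproduces it exactly.

```python
def quantization_levels(bits):
    """Number of distinct values an n-bit word can represent."""
    return 2 ** bits

def step_size(bits, low=-1.0, high=1.0):
    """Spacing between adjacent levels over the assumed range [low, high]."""
    return (high - low) / quantization_levels(bits)

for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), step_size(bits))

# Lossless storage/transmission: a copy of a copy is bit-for-bit identical.
original = bytes([0, 17, 255, 128])
second_generation = bytes(bytes(original))
print(second_generation == original)
```

Each added bit doubles the number of levels and halves the step size, which is why precision is a matter of word length rather than of the medium.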
Digitization Process
• Sampling from an analog signal
• Sampling Errors relate to signal frequencies
• Quantization Errors
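Both error types above can be demonstrated numerically. In this sketch a 7 Hz cosine sampled at 10 Hz yields the same samples as a 3 Hz cosine (aliasing, a sampling error tied to signal frequency), and rounding to a fixed step never misses by more than half a step (quantization error). The frequencies, step size, and test values are assumptions picked for illustration.

```python
import math

def sampled_cosine(freq_hz, sample_rate_hz, n):
    """Return n samples of a cosine of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]

# Sampling error: 7 Hz is above half the 10 Hz sample rate, so it aliases to 3 Hz.
high = sampled_cosine(7, 10, 10)
alias = sampled_cosine(3, 10, 10)
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(f"largest difference between the two sample sets: {max_diff:.2e}")

def quantize(x, step):
    """Round x to the nearest multiple of step."""
    return round(x / step) * step

# Quantization error: bounded by half the step size.
step = 0.25
values = [0.1, 0.3, 0.62, -0.9]
errors = [abs(x - quantize(x, step)) for x in values]
print(errors, "bound:", step / 2)
```

Raising the sample rate above twice the highest signal frequency removes the aliasing; adding bits shrinks the step and therefore the quantization error.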
Text
• ASCII, Unicode
• Formatted Text, Rich Text
• Document Formats:
– Structured: TeX, HTML
– Page Descriptions: PostScript, PDF
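The difference between ASCII and Unicode shows up as soon as text leaves the basic Latin alphabet. The sketch below uses Python's built-in string encoding; the sample words are arbitrary.

```python
plain = "Multimedia"
accented = "caf\u00e9"  # "café", written with an escape so the source stays ASCII

# ASCII: one byte per character, but only for code points below 128.
ascii_bytes = plain.encode("ascii")
print(len(plain), len(ascii_bytes))

# Unicode code points, stored here as UTF-8: the accented letter needs two bytes.
code_points = [ord(ch) for ch in accented]
utf8_bytes = accented.encode("utf-8")
print(code_points, len(utf8_bytes))

# ASCII cannot represent the accented character at all.
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("fits in ASCII:", ascii_ok)
```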
Graphics
• Objects
– circles, splines, rectangles, lines
• Editable
– resize, reshape, move, colorize
• Synthetic
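Because graphics are stored as objects rather than pixels, an edit is just a change to a few numbers. The sketch below models a single circle; the `Circle` class and the helper names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Circle:
    """A graphic object described by parameters, not by pixels."""
    cx: float
    cy: float
    radius: float
    color: str

def resize(shape, factor):
    """Scale the circle; no image data is resampled."""
    return replace(shape, radius=shape.radius * factor)

def move(shape, dx, dy):
    """Translate the circle by (dx, dy)."""
    return replace(shape, cx=shape.cx + dx, cy=shape.cy + dy)

def colorize(shape, color):
    """Change the circle's color."""
    return replace(shape, color=color)

c = Circle(cx=0.0, cy=0.0, radius=1.0, color="red")
edited = colorize(move(resize(c, 2.0), 3.0, 4.0), "blue")
print(edited)
```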
Images (Pictures)
• Fixed digitized representation
– bitmap, colors per pixel
• Editable in limited ways
– retouch, cut and paste, remap colors, filter [Photoshop tools]
– no ‘model’ of the thing
• Captured
– not just from real life, clip art, screen dump
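An image, by contrast, is only a grid of pixel values, so edits operate on pixels with no knowledge of what they depict. This sketch uses a tiny hand-made bitmap of palette indices; the data and function names are illustrative assumptions.

```python
# A 3x4 bitmap: each entry is a palette index (0 = background, 1 = foreground).
bitmap = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def remap_colors(image, mapping):
    """Replace pixel values according to mapping; unmapped values are kept."""
    return [[mapping.get(p, p) for p in row] for row in image]

def cut(image, top, left, height, width):
    """Copy a rectangular region of pixels (the 'cut' in cut and paste)."""
    return [row[left:left + width] for row in image[top:top + height]]

recolored = remap_colors(bitmap, {1: 2})
region = cut(bitmap, top=0, left=1, height=2, width=2)
print(recolored)
print(region)
```

Neither function knows that the foreground pixels form a square, which is the sense in which a bitmap has no model of the thing it shows.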
Audio
• Sounds
– hear 15 Hz to 20 kHz
– Speech is 50 Hz to 10 kHz
• Speech Recognition
– It is hard to wreck a nice beach
– Ice cream I scream
• Synthesis
– Speech
– Music
– MIDI for 128 instruments, 47 percussion sounds
– Notes, timing
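MIDI stores notes and timing rather than sound samples. As a rough sketch, a note-on event is three bytes (status byte with channel, note number, velocity), and a note number maps to a pitch through equal temperament with note 69 tuned to 440 Hz. The helper names below are hypothetical.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI note-on message (channel 0-15, note and velocity 0-127)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("value out of MIDI range")
    return bytes([0x90 | channel, note, velocity])

def note_frequency_hz(note):
    """Equal-tempered pitch of a MIDI note number, with note 69 (A4) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

message = note_on(channel=0, note=60, velocity=64)  # middle C, medium loudness
print(message.hex())
print(round(note_frequency_hz(60), 2))
```

Three bytes per note is why a MIDI file is far smaller than a recording of the same music; the synthesizer supplies the actual sound.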
Speech Recognition Issues
• Continuous vs Discrete
• Vocabulary Size
• Channel (Microphone)
• Environment (Location of mike and Speaker)
• Speaker Dependent/Speaker Independent
• Context (Language Model)
• Interactivity (Dialog Model)
Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken
[Diagram: the three knowledge sources feed into Speech Recognition]
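A language model helps choose between word sequences that sound alike, such as "recognize speech" and "wreck a nice beach". The sketch below scores both with a toy bigram model. Every probability in the table is invented for illustration; a real recognizer would estimate them from a large text corpus and combine them with acoustic scores.

```python
import math

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the start.
BIGRAM_PROB = {
    ("<s>", "recognize"): 0.001,
    ("recognize", "speech"): 0.3,
    ("<s>", "wreck"): 0.0005,
    ("wreck", "a"): 0.2,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}

def sentence_log_prob(words, floor=1e-6):
    """Sum of log bigram probabilities; unseen pairs get a small floor value."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM_PROB.get((previous, word), floor))
        previous = word
    return total

candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=sentence_log_prob)
print(" ".join(best))
```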
Speech Variations
• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
• Voice Quality: breathy, creaky, whispery, tense, lax, modal
• Context: sport, professional, interview, free conversation, man-machine dialogue
• Speaking Rate: normal, slow, fast, very fast
• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
Video
• Frames comprise the video
– Frame rate = number of frames per second (the delay between successive frames is 1 / frame rate)
– minimal change between frames
• Sequencing creates the illusion of movement
– > 16 fps is “smooth”
– Standards: 29.97 is NTSC, 25 is PAL, 60 is HDTV
– Interlacing
• Display scan rate is different
– monitor refresh rate
– 60 - 70 Hz (= 1/s)
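Frame rate and the delay between frames are reciprocals, which a short calculation makes explicit. The rates below are the ones listed on the slide; the function name is an assumption.

```python
def frame_interval_ms(fps):
    """Delay between successive frames, in milliseconds, for a given frame rate."""
    return 1000.0 / fps

for name, fps in [("NTSC", 29.97), ("PAL", 25), ("HDTV", 60)]:
    print(f"{name}: {fps} fps, {frame_interval_ms(fps):.2f} ms per frame")
```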
Captured vs. Synthetic
• Animation vs Video
• Graphics vs Pictures
• Synthesizer vs Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible
Why is Multimedia Important?
• Our society -
– captures its experience,
– records its accomplishments,
– portrays its past
– informs its masses
… in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences” (Gardner, 1993)
• Multimedia Digital libraries are an essential component of
– formal, informal, and professional learning
– distance education, telemedicine
Technology Push vs Market Pull
– Home Entertainment
– Catalog Ordering
– Multimedia Training, Education
– Videoconferencing
– Professional Video Services
– Videomail
– Speech Recognition
Hype vs. Reality
• What is feasible, under what circumstances?
• What is possible?
• What is impossible?
• What is unlikely?
Multimedia Visions
• DARPA: Dominate the Battle Space
• HP “1995”
• LSI “Flash Point”
• HP “Synergies”
Intro to Multimedia
That’s all for today