user benefits of non-linear time compression

32
User Benefits of Non-Linear Time Co User Benefits of Non-Linear Time Co mpression mpression 1 User Benefits of Non-Linear User Benefits of Non-Linear Time Compression Time Compression Liwei He & Anoop Gupta Liwei He & Anoop Gupta September 21st, 2000 September 21st, 2000 Microsoft Research Microsoft Research

Upload: twyla

Post on 21-Jan-2016

23 views

Category:

Documents


0 download

DESCRIPTION

User Benefits of Non-Linear Time Compression. Liwei He & Anoop Gupta September 21st, 2000 Microsoft Research. Overview. In comparison to text, audio-video content is much more challenging to browse Time-compression has been suggested as a key technology that can support browsing - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time CompresUser Benefits of Non-Linear Time Compressionsion

11

User Benefits of Non-Linear User Benefits of Non-Linear Time CompressionTime Compression

Liwei He & Anoop GuptaLiwei He & Anoop GuptaSeptember 21st, 2000September 21st, 2000

Microsoft ResearchMicrosoft Research

Page 2: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 2

OverviewOverview

► In comparison to text, audio-video content is much In comparison to text, audio-video content is much more challenging to browsemore challenging to browse

► Time-compression Time-compression has been suggested as a key has been suggested as a key technology that can support browsingtechnology that can support browsing

► Time compression speeds-up the playback of audio-Time compression speeds-up the playback of audio-video content without causing the pitch to changevideo content without causing the pitch to change

► Simple forms of time-compression are starting to Simple forms of time-compression are starting to appear in commercial streaming-media products appear in commercial streaming-media products from Microsoft and Real Networks. from Microsoft and Real Networks.

Page 3: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 3

Non-linear time compressionNon-linear time compression

► In this paper we explore the potential In this paper we explore the potential benefits of more recent and advanced benefits of more recent and advanced types of time compression, called types of time compression, called non-non-linear time compressionlinear time compression..

►The most advanced of these The most advanced of these algorithms exploit fine-grain structure algorithms exploit fine-grain structure of human speech (e.g., phonemes)of human speech (e.g., phonemes) to differentially speed-up segments of to differentially speed-up segments of

speech so that the overall speed-up can speech so that the overall speed-up can be higherbe higher

Page 4: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 4

OverviewOverview

►Also we explore what are the actual Also we explore what are the actual gains achieved by end-users from gains achieved by end-users from these advanced algorithmsthese advanced algorithms

►And whether the gains are worth the And whether the gains are worth the additional systems complexity.additional systems complexity.

►Categories: Categories: Time compression, Digital library, Time compression, Digital library,

Multimedia browsingMultimedia browsing

Page 5: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 5

MotivationMotivation

►Digital multimedia information on the Digital multimedia information on the Internet is growing at an increasing rateInternet is growing at an increasing rate corporations are posting their training materials corporations are posting their training materials

and talks onlineand talks online universities are putting up their videotaped universities are putting up their videotaped

courses onlinecourses online news organizations are making newscasts news organizations are making newscasts

availableavailable►While the network bandwidth is somewhat While the network bandwidth is somewhat

of a bottleneck todayof a bottleneck today The eventual bottleneck really is the limited The eventual bottleneck really is the limited

human time.human time.

Page 6: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 6

Motivation!Motivation!

It is highly desirable to have Technologies that It is highly desirable to have Technologies that let people browse audio-video quickly let people browse audio-video quickly

The impact of even a 10% increase in browsing speed can The impact of even a 10% increase in browsing speed can be largebe large

people may have different reading rates people may have different reading rates We can provide people the ability to speedup or We can provide people the ability to speedup or

slow-down audio-video content based on their slow-down audio-video content based on their preferences preferences

Also we try to focus on informational content Also we try to focus on informational content with speech (e.g., talks, lectures, and news) with speech (e.g., talks, lectures, and news) rather than entertainment content (e.g., music rather than entertainment content (e.g., music videos, soap operas), videos, soap operas),

Page 7: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 7

TechnologyTechnology

►Core technology is called Core technology is called time-compressiontime-compression►Simple forms of time-compression have been used Simple forms of time-compression have been used

before in hardware device contexts and telephone before in hardware device contexts and telephone voicemail systemsvoicemail systems

►systems today use systems today use linear time-compressionlinear time-compression speech content is uniformly time compressed,speech content is uniformly time compressed,

►e.g., every 100ms chunk of speech is shortened to e.g., every 100ms chunk of speech is shortened to 75ms. 75ms.

►users can save more than 15 minutes on a one-hour users can save more than 15 minutes on a one-hour lecture.lecture.

Page 8: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 8

non-linear time-compressionnon-linear time-compression

►we explore how much additional benefit can we explore how much additional benefit can be achieved from be achieved from non-linear time-non-linear time-compression compression techniquestechniques

►We consider two such algorithms:We consider two such algorithms: The first, simpler algorithm combines pause-The first, simpler algorithm combines pause-

removal with linear time compressionremoval with linear time compression► It first detects pauses (silence intervals) in the speechIt first detects pauses (silence intervals) in the speech► then shortens or removes the pausesthen shortens or removes the pauses►Such a procedure can remove 10-25% from normal Such a procedure can remove 10-25% from normal

speechspeech► It then performs linear time compression on the It then performs linear time compression on the

remaining speech.remaining speech.

Page 9: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 9

non-linear time-compressionnon-linear time-compression

►Algorithm 2 is much more Algorithm 2 is much more sophisticated sophisticated It tries to mimic the compression It tries to mimic the compression

strategies that people use when they talk strategies that people use when they talk fast in natural settings fast in natural settings

Also it tries to adapt the compression rate Also it tries to adapt the compression rate at a fine granularity based on low level at a fine granularity based on low level features (e.g., phonemes) of human features (e.g., phonemes) of human speech.speech.

Page 10: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 10

Core QuestionsCore Questions

► non-linear algorithms, while offering the non-linear algorithms, while offering the potential for higher speed-ups, require:potential for higher speed-ups, require: more compute (CPU) cyclesmore compute (CPU) cycles increased complexity in client-server systems for increased complexity in client-server systems for

streaming mediastreaming media may result in a jerky video portionmay result in a jerky video portion

► core questions we address:core questions we address: What are the additional benefits of the non-linear What are the additional benefits of the non-linear

algorithms over the simple linear time-algorithms over the simple linear time-compression algorithm implemented in products compression algorithm implemented in products today?today?

Page 11: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 11

Core QuestionsCore Questions

► Most people will not listen to speech at such fast Most people will not listen to speech at such fast rates. rates. We are interested in understanding people’s We are interested in understanding people’s

preference at more comfortable and sustainable preference at more comfortable and sustainable speed-up rates. speed-up rates.

if the difference at sustainable speed is large will it be if the difference at sustainable speed is large will it be worthwhile to implement these algorithms in worthwhile to implement these algorithms in products.products.

► How much better is the more sophisticated How much better is the more sophisticated algorithm over the simpler non-linear algorithm? algorithm over the simpler non-linear algorithm? magnitude of differences will again guide our magnitude of differences will again guide our

implementation strategy in products implementation strategy in products

Page 12: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 12

Linear Time Compression Linear Time Compression (Linear) (Linear)

► time-compression is applied consistently time-compression is applied consistently across the entire audio stream across the entire audio stream with a given speed-up rate, without regard to the with a given speed-up rate, without regard to the

audio information contained therein audio information contained therein The most basic technique for achieving time-The most basic technique for achieving time-

compressed speech involves taking short fixed compressed speech involves taking short fixed length speech segments (e.g., 100ms), and length speech segments (e.g., 100ms), and discarding portions of these segments (e.g., discarding portions of these segments (e.g., dropping 33ms segment to get 1.5-fold dropping 33ms segment to get 1.5-fold compression), and abutting the retained compression), and abutting the retained segments.segments.

Page 13: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 13

Linear Time Compression Linear Time Compression (Linear)(Linear)

►Discarding segments and abutting the Discarding segments and abutting the remnants remnants produces discontinuities at the interval produces discontinuities at the interval

boundaries and produces audible clicks and boundaries and produces audible clicks and other forms of signal distortion other forms of signal distortion

►To improve the quality of the output To improve the quality of the output signal:signal: a windowing function or smoothing filter–a windowing function or smoothing filter–

such as a cross fade– can be applied at the such as a cross fade– can be applied at the junctions of the abutted segments junctions of the abutted segments

Page 14: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 14

Linear Time Compression Linear Time Compression (Linear)(Linear)

►A technique called A technique called Overlap Add (OLA) Overlap Add (OLA) yields good quality:yields good quality:

Page 15: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 15

Linear Time Compression Linear Time Compression (Linear)(Linear)

►The technique used in this study is The technique used in this study is SOLA SOLA It consists of shifting the beginning of a It consists of shifting the beginning of a

new speech segment over the end of the new speech segment over the end of the preceding segment to find the point of preceding segment to find the point of highest waveform similarity. highest waveform similarity.

Once this point is found, the frames are Once this point is found, the frames are overlapped and averaged together overlapped and averaged together

SOLA provides a locally optimal match SOLA provides a locally optimal match between successive frames and mitigates between successive frames and mitigates the reverberations the reverberations

Page 16: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 16

Pause Removal plus Linear Time Pause Removal plus Linear Time Compression (PR-Lin)Compression (PR-Lin)

►Non-linear time compression is an Non-linear time compression is an improvement on linear compression:improvement on linear compression: the content of the audio stream is analyzedthe content of the audio stream is analyzed and compression rates may vary from one and compression rates may vary from one

point in time to anotherpoint in time to another

►Typically, non-linear time compression Typically, non-linear time compression involves compressing redundancies, i.e.:involves compressing redundancies, i.e.: pauses or elongated vowelspauses or elongated vowels

Page 17: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 17

Pause Removal plus Linear Time Pause Removal plus Linear Time Compression (PR-Lin)Compression (PR-Lin)

►The PR-Lin algorithm used in this paper, The PR-Lin algorithm used in this paper, first detects pauses:first detects pauses: It leaves pauses below 150ms untouched, It leaves pauses below 150ms untouched,

and shortens longer pauses to 150msand shortens longer pauses to 150ms It then applies linear time-compressionIt then applies linear time-compression variety of measures can be used for variety of measures can be used for

detecting pauses even under noisy conditionsdetecting pauses even under noisy conditions►““Energy” and “Zero crossing rate (ZCR)” is usedEnergy” and “Zero crossing rate (ZCR)” is used►Also, in order to adjust changes in the background Also, in order to adjust changes in the background

noise level, a dynamic energy threshold is usednoise level, a dynamic energy threshold is used

Page 18: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 18

Pause Removal plus Linear Time Pause Removal plus Linear Time Compression (PR-Lin)Compression (PR-Lin)

► If the energy of a frame is below the dynamic threshold and its ZCR is under the fixed threshold, the frame is categorized as a potential-pause frame, otherwise it is labeled as a speech frame. Contiguous potential-pause frames are marked as

real-pause frames when they exceed 150ms.

► Pause removal typically shortens the speech by 10-25% before linear time-compression is applied.

Page 19: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 19

Adaptive Time Compression Adaptive Time Compression (Adapt)(Adapt)

A variety of sophisticated algorithms have A variety of sophisticated algorithms have been proposed for non-linear Adpt.been proposed for non-linear Adpt.►i.e. preserving the phoneme transitions in the i.e. preserving the phoneme transitions in the

compressed audio to improve understandabilitycompressed audio to improve understandability

Audio spectrum is computed first for audio frames of Audio spectrum is computed first for audio frames of 10ms10ms

If the magnitude of the spectrum difference between If the magnitude of the spectrum difference between two successive frames is above a threshold, they are two successive frames is above a threshold, they are considered as a phoneme transition and not compressedconsidered as a phoneme transition and not compressed

►Mach1 makes further improvements and tries to Mach1 makes further improvements and tries to mimic the compression that takes place when mimic the compression that takes place when people talk fast in natural settingspeople talk fast in natural settings

Page 20: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 20

Adaptive Time Compression Adaptive Time Compression (Adapt)(Adapt)

strategies come from the linguistic studies strategies come from the linguistic studies of natural speech:of natural speech:►Pauses and silences are compressed the mostPauses and silences are compressed the most►Stressed vowels are compressed the leastStressed vowels are compressed the least►Schwas and other unstressed vowels are Schwas and other unstressed vowels are

compressed by an intermediate amountcompressed by an intermediate amount►Consonants are compressed based on the Consonants are compressed based on the

stress level of the neighboring vowelsstress level of the neighboring vowels►On average, consonants are compressed more On average, consonants are compressed more

than vowelsthan vowels

Page 21: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 21

Adaptive Time Compression Adaptive Time Compression (Adapt)(Adapt)

Mach1 estimates continuous-valued measures of Mach1 estimates continuous-valued measures of local emphasis and relative speaking rate.local emphasis and relative speaking rate.

Together, these two sequences estimate the Together, these two sequences estimate the audio tension:audio tension:► the degree to which the local speech segments resist the degree to which the local speech segments resist

changes in rate.changes in rate.

High tension regions are compressed less and High tension regions are compressed less and low-tension regions are compressed more low-tension regions are compressed more aggressively.aggressively.►Based on the audio tension, the local target compression Based on the audio tension, the local target compression

rates are computed and used to drive a standard time-rates are computed and used to drive a standard time-scale modification algorithm, such as SOLA.scale modification algorithm, such as SOLA.

Page 22: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 22

Systems Implications of Algorithms

► In deciding between these three In deciding between these three algorithms for inclusion in products, algorithms for inclusion in products, there are two considerations: there are two considerations:

1.1. what are the relative benefits (e.g. what are the relative benefits (e.g. speed-up rates) achievablespeed-up rates) achievable

2.2. what are the costs (e.g. implementation what are the costs (e.g. implementation challenges). challenges).

We explore the former in the User We explore the former in the User Study section Study section

Page 23: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 23

(a) computational complexity(a) computational complexity

The first issue is computational complexity The first issue is computational complexity or CPU requirements.or CPU requirements.

►The first two algorithms, Linear and PRLin, are The first two algorithms, Linear and PRLin, are easily executed in real-time on any Pentium-easily executed in real-time on any Pentium-class machine using only a small fraction of the class machine using only a small fraction of the CPUCPU

►The Adapt algorithm, in contrast, has 10+ times The Adapt algorithm, in contrast, has 10+ times higher CPU requirements although it can be higher CPU requirements although it can be executed in real-time on modern desktop CPUs.executed in real-time on modern desktop CPUs.

Page 24: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 24

(b) complexity of client-(b) complexity of client-serverserver

Assumption:Assumption: people will like the time compression feature to be people will like the time compression feature to be

available with streaming-media clients where they can available with streaming-media clients where they can just turn a virtual knob to adjust speed-upjust turn a virtual knob to adjust speed-up

►a key issue has to do with buffer management and flow-a key issue has to do with buffer management and flow-control between the client and server.control between the client and server.

►The Linear algorithm has the simplest requirements, The Linear algorithm has the simplest requirements, where the server simply needs to speed-up its delivery where the server simply needs to speed-up its delivery at the same rate at which time compression is at the same rate at which time compression is requested by clientrequested by client

►The nonlinear algorithms (both PR-Lin and Adapt) have The nonlinear algorithms (both PR-Lin and Adapt) have much more complex requirements due to the uneven much more complex requirements due to the uneven rate of data consumption at the clientrate of data consumption at the client

Page 25: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 25

(c) audio-video synch. quality(c) audio-video synch. quality

►With the Linear algorithm, the rendering of With the Linear algorithm, the rendering of video frames is speeded up at the same rate as video frames is speeded up at the same rate as the speed-up for speech.the speed-up for speech.

While everything happens at higher speed, the video While everything happens at higher speed, the video remains smooth and perfect lip synchronization remains smooth and perfect lip synchronization between audio and video can be maintained.between audio and video can be maintained.

►This task is much more difficult with nonlinear This task is much more difficult with nonlinear algorithms (PR-Lin and Adapt) i.e. :algorithms (PR-Lin and Adapt) i.e. :

consider removal of a 2-second pause from the audio consider removal of a 2-second pause from the audio track: track: ►Option 1: remove the video frames corresponding Option 1: remove the video frames corresponding

to those 2 secondsto those 2 seconds

Page 26: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 26

(c) audio-video synch. quality(c) audio-video synch. quality

►In this case the video will appear jerky to the end-In this case the video will appear jerky to the end-user, although we will retain lip synchronization user, although we will retain lip synchronization between audio and video for subsequent speech.between audio and video for subsequent speech.

Option-2 is to make the video transition Option-2 is to make the video transition smoother by keeping some of the video smoother by keeping some of the video frames from that 2-second interval and frames from that 2-second interval and removing some later ones:removing some later ones:►but now we will loose the lip synchronization for but now we will loose the lip synchronization for

subsequent speech. There is no perfect solution.subsequent speech. There is no perfect solution. bottom line is that non-linear algorithms add significant bottom line is that non-linear algorithms add significant

complexity to the implementer’s taskcomplexity to the implementer’s task

Page 27: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 27

User Study Goals

► Highest intelligible speedHighest intelligible speed What is the highest speed-up factor at which the user still What is the highest speed-up factor at which the user still

understands the majority of the content?understands the majority of the content?► ComprehensionComprehension

Given the same fixed speed-up factor for all algorithms, Given the same fixed speed-up factor for all algorithms, what is a user’s relative comprehension?what is a user’s relative comprehension?

► Subjective preferenceSubjective preference When given the same audio clip compressed using two When given the same audio clip compressed using two

different techniques at the same speed-up factor, which different techniques at the same speed-up factor, which one does a user prefer? one does a user prefer?

► Sustainable speedSustainable speed What is the speed-up factor that end-users will settle on What is the speed-up factor that end-users will settle on

when listening to long pieces of content (e.g., a lecture), when listening to long pieces of content (e.g., a lecture), still assuming some time pressure?still assuming some time pressure?

Page 28: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 28

Experimental Method

►24 people participated their study 24 people participated their study ►variety of background variety of background

from professionals in local firms to retirees from professionals in local firms to retirees to homemakers to homemakers

►All of them had some computer All of them had some computer experience experience

►The listener study was Web based The listener study was Web based ►All the instructions were presented to All the instructions were presented to

the subjects via web pages the subjects via web pages

Page 29: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 29

Experimental Method

► The study consisted of four tasks:The study consisted of four tasks: Highest Intelligible Speed Task Highest Intelligible Speed Task

►find the fastest speed at which the audio was still find the fastest speed at which the audio was still intelligible intelligible

Comprehension Task Comprehension Task ► four multiple-choice questions about the conversation four multiple-choice questions about the conversation

Subjective Preference Task Subjective Preference Task ►The subjects were instructed to compare 6 pairs of clips The subjects were instructed to compare 6 pairs of clips

time time

Sustainable Speed Task Sustainable Speed Task ►asked them to imagine that they were in a hurry, but asked them to imagine that they were in a hurry, but

still wanted to listen to the clips still wanted to listen to the clips

Page 30: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 30

Listener Study Results

►Highest Intelligible SpeedHighest Intelligible Speed non-linear algorithms do significantly better than non-linear algorithms do significantly better than

LinearLinear► Comprehension TaskComprehension Task

Adapt to do best, followed by PR-Lin and Linear, Adapt to do best, followed by PR-Lin and Linear, and the comprehension differences to increase at and the comprehension differences to increase at the higher speed-up factor the higher speed-up factor

► Preference TaskPreference Task there is slight but non-significant preference for there is slight but non-significant preference for

Adapt over PR-Lin Adapt over PR-Lin ► Sustainable SpeedSustainable Speed

There is no significant difference between Adapt There is no significant difference between Adapt and PR-Lin and PR-Lin

Page 31: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 31

Concluding RemarksConcluding Remarks

► Results show that for speed-up factors most Results show that for speed-up factors most likely to be used by people, the more likely to be used by people, the more sophisticated non-linear time compression sophisticated non-linear time compression algorithms do not offer a significant algorithms do not offer a significant advantage. advantage.

► Given the substantial implementation Given the substantial implementation complexity associated with these algorithms complexity associated with these algorithms in client-server streaming-media systems, we in client-server streaming-media systems, we may not see them adopted in the near future may not see them adopted in the near future

Page 32: User Benefits of Non-Linear Time Compression

User Benefits of Non-Linear Time Compression 32

Concluding RemarksConcluding Remarks

►Based on a preliminary studyBased on a preliminary study the problem is not that the benefits are the problem is not that the benefits are

small because the sophisticated small because the sophisticated algorithms are not very good. algorithms are not very good.

In fact, end-users cannot distinguish In fact, end-users cannot distinguish between these algorithms speeding-up between these algorithms speeding-up speech and a human speaking faster. speech and a human speaking faster.

Thus delivering significantly larger time-Thus delivering significantly larger time-compression benefits to end-users compression benefits to end-users remains an open challenge for remains an open challenge for researchers.researchers.