a video display processing platform for future tv concepts

12
A video display processing platform for future TV concepts Citation for published version (APA): With, de, P. H. N., Jaspers, E., Meerbergen, van, J., Timmer, A. H., & Strik, M. T. J. (1999). A video display processing platform for future TV concepts. IEEE Transactions on Consumer Electronics, 45(4), 1230-1240. https://doi.org/10.1109/30.809213 DOI: 10.1109/30.809213 Document status and date: Published: 01/01/1999 Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please check the document version of this publication: • A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal. If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement: www.tue.nl/taverne Take down policy If you believe that this document breaches copyright please contact us at: [email protected] providing details and we will investigate your claim. Download date: 28. Jan. 2022

Upload: others

Post on 28-Jan-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

A video display processing platform for future TV concepts

Citation for published version (APA):With, de, P. H. N., Jaspers, E., Meerbergen, van, J., Timmer, A. H., & Strik, M. T. J. (1999). A video displayprocessing platform for future TV concepts. IEEE Transactions on Consumer Electronics, 45(4), 1230-1240.https://doi.org/10.1109/30.809213

DOI:10.1109/30.809213

Document status and date:Published: 01/01/1999

Document Version:Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can beimportant differences between the submitted version and the official published version of record. Peopleinterested in the research are advised to contact the author for the final version of the publication, or visit theDOI to the publisher's website.• The final author version and the galley proof are versions of the publication after peer review.• The final published version features the final layout of the paper including the volume, issue and pagenumbers.Link to publication

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, pleasefollow below link for the End User Agreement:www.tue.nl/taverne

Take down policyIf you believe that this document breaches copyright please contact us at:[email protected] details and we will investigate your claim.

Download date: 28. Jan. 2022

A VIDEO DISPLAY PROCESSING PLATFORM FOR FUTURE TV CONCEPTS

Peter H. N. de With', Egberl G. T. Jaspers', Jef L. vati Meerbcrgeti2,', Adwin H. Timmer' and Marino T. J. Strik'

'University of Mannheim. Fac. Computer Engineering, 60131 Mannheim, Germany 'Philips Research Labs., Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands 3Eindhoven University of Technology, Den Dolech 2, 5600 MB Eindhoven (NL).

A I N r t , o - - 'This paper 1ircsc111s it geiieric niulti-pmrcasor iirchi- tcrture for video ~iroccssinp,~, fcvtnring an array of proclini i t~i i t~i~c aliplieiilios-specilic coprocessors ; ~ n d a ~~rugrammai i lc swilclc-lmr o,msianie;rtioII nctwork. Kcy ndvnslages sf the pl;itfomi are that the order 01 l imclims cui IN c l ~ m g c r l and tliiit virlco tnsks caii be pro- gmnnmed at system acid Rtnc1ion;tl levcl itsing tasks of any length, by applying dyiwinic rhta-fluw. 'lhis ~imtliilai roiircpl ih cnst-cffwlivc and suitill,le f ir n large rnnge of qil~liciatioiis and featurcs i n llie vidco tloriiain. The .*ysleni ~ a i i lie well applied in coinl~inatiOn witli micro- coiilrollcrs OI metliii proccs\ois and i t i s ~ w d for iinplcn~ciiling :in cx- Iieriiacnlid TV ~~lnlfonn.

K',?,,,~~~'lioo,ii,~-ViIlel, ~ n w c s i s l : iircliifcCtiirC, HWISW EO-dcbigii, pro- grainrn;iblc hilrtl~arc, dynamic di t to flew, multi-window, multi- proccswr, p l . i i l i c I prorcssing, switcli ~t ia l i ix , iinilicd ~nemory.

I . IN'IIWIIIJ(:TION

HE VLSI coinplexity and the corrcsp~inding siiftwiirc T rnr irnpleirrenling iiiultii i icdiii architectures iircluding advaiiced video processing require a n incrcasingly large de- sigil elfori. Chips Sor ciinsuincr-clcctriiiiics ~xoclucts arc currently being designed coiitainiirg LIP to 10 Mi l l ion gates, having ii ciinrpuliiig power i n the iirilcr iif 10 (;OPS (Giga Operations Per Secnnd) I 11. I%irtlicrinurc, the ci irres~~oiid- i i ig cmbcddcd siiftw;irc iipproaclies I MBytc Stir TVs. The arcliitcctiirc i i S ciinvcntiiinal ' [V sets lias bccn intriiduccd iiinrc tl i i irr Iialf ii century agii. Sincc that time, the tiinctiiin- alily I l lat was oSScrcd to the c i i ~ t ~ ~ n c r hiis been iiicrcased sig- nilic;intly, withiiut rcconsidering the arcliitccturc of the sys- tcin. I:catiirc extension hiixes I h e been addcd t i i upgrade Ihc c~iiivciitionnl system to top-high-cnd T V sets currently av;iilahlc iii the ciinsuiiier market. l ' ig. I shows iiii exmi- ple til' ia c~~ivcnti i i iairl top-high-end T V architecturc. Whcir standiirdiziitiiirr iif the electronic televisiiin was carried 0111 121, the architecture ;ilretidy sIIowcd the traditional stiindard video path. It consisted of the deinoduliitiis. decoder and the picture ciintrol hlock with a scp;iralc audiii path. Siiicc Ilien, an iidJitioiiial l'clctext path was iiitroduccd. 50-til- 1110 I~Iz cii~ivcrsion, I'AI .plus decoding, iin additional Picture in I'icturc (l'il'), dual-screen, new si)und standards and niiirc

I A p i q w is bcing prcswlcd sit ICCR99 by E. Jaspers t ial . . cntilled ''Chip Sci 1)n Video I l i s p l i q 01 Mullimcdia i n l m x ~ l i o d ' , dcscrihing it liml m- iiiimtioii ol l l ic nwI~ilcclul:. Onc of llie ICs. cnmiicicially aviiiliililc tis

SAASXiO. is explained in m01: delai l i n it [piqicv by 0. Sroinlnll el <(I. en ti^ llcti "TCP: A Nenl Ckxcmlion lor 'TV Contlol Pcoccssmg" (Digesl 'lbclia.

~~ ~

ICCt99. Publ. No. 99Cll36277. June 1999).

Pig. I. Thc HW aicliilcclwc o l a coiivciitiniinl Ihigli-end TV sct

rcccntly, digital lra~is~nissii in st;indards b;ascd on MI'EG All thcsc new features arc tin cxterision ,if tlic ciii~veiitioii- ill iiiipleiiieiitatioti, rcsulting i n ii suboptimal structure. Ue- ciiiise each function uses its own mcniory device ;and has a dedicated commuiiicalion patli within the ilnpleiiicntiitioii, tlic siiIutioii becoiilcs very ciist inetticient and r i g id 'Cliese restrictiiios arc iii coiill ict with nccurring Lreiids in TV appli- cations. Rcquireincnts SIN the design of future 1'V systeins should be iis fiillows. . Will1 tlic i i i tr i idi ictin~i nfdigi la l TV and tlic increasing tise ni' digital image etiliiiticcinciit, 1'V i s hcciiiniog very cli- verse in tcrins o1'Ib1ilurcs, ~cqui r i i ig II flexiblc and ICCOII-

figllrahlc systcm. . TV i s ioflueiiced by 1'C technology with respect to coiia- puting tcchiiiques, SW architectures and applicatioiis. Siricc rnetnory devices continuously I~ccoiaie larger and

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

cheaper, i t is ilcsiralile tti hiivc oiic uii i l i ir i i i sliarctl back-

New features arid stiii idards fr(iiii intcriiiitional hodics liiivc to he rciilizcil in a short tiiiic i'raiiic iiiid requirt: iiii cx- tctisildc and scalable systeiii. Parts n l tlic iic

processing arc l rwci i , wl ieIca~ iii iitlier parts ~~Ii igrai i i ini i - tiility i s ellowed or cvcii exploitcd in~ciisivcly. New ;ipplications i n 21 1 V sct iiccd a cnst-cficiciil iliiplc- ~iiciitiitiiiii, siricr it i s ii highly cniiipctitivc Iiiiirkct.

Sonic attcinpt in the Iiiist wcrc iiiiitlc tir deal with tlicsc rc- qiiirciiiciits. l:oll-pni~rarniiinlilc miiltiinctliii siilulinns like tlic Multiincdia Video I'roccssiir (MVI') I3 I ciiii be vcry piiwcrliil, hiit iirc ratlicr cx~iciisivc 1111. ~~IISLIIIIC~ ~ i r ~ i c l u c t s Iiccause they arc iiiorc gciicral tliaii tlic 'I'V qiplicatiiiii do- iiiniii dcsircs. 1:iiIl ~iriigramrniihlc svluti~iiis tliiit arc iiinrc aliplicatiiiii specific like tlic Clieiilis processor iiiodule 141 a i d the so-ciillcd Videti Signal I'roccssiir [SI , ciiiitain video- spccilic qieratiiins bul lack prnccssitig power. H(it l i s i i l ~ i - t io i ls iirc b;isctl oii tine-grain l i i i ictioiial ~ i i i i t s i n ii rcc1111- liguriililc c ~ i i i i ~ ~ ~ i i ~ i i c ~ i t i ~ i ~ i iictwork. Since the l'iiiictiniid re- sniirccs iirc rclativcly limited, tl ic niiioiiiit 01 parallclisin i s riot high eiiiiiigli. A merit trciiil i \ ki mi: Very I llillg liistriic- tion W(iIil (Vl.lW) processors to I x m t processing Iinwcr 161 171. Ilowcvcr, t l i is i s still not sullicicnt tv covcr all process- i i ig rcqiiirciiiciils. Siinply iiicrcnsiiig tlic i i i i r i~ I~cr nl' l i i i ic- tiniiiil iiiiits woolcl iiicrciisc tlic pxsc i i t schctluliiig pri iblc~i i f i r siicli so l i i t io i is .

Mnrc piiwcrfiil iiiililciiicnt;itions arc pnssiiblc by ailalitiiig tlic systciii iiiorc closely io tlic hrgct qiplicati i ir clniiiiiiii. As ii result, the priiccssiiig pcrlorniaiicc ai111 cl'ficicncy ciiii be iiicrciiscd by innking iisc i,l'thc liinction-lcvcl piirallclisiii t h t is exprcsscd in n rlcsigncrs s i g d Ilow-graph (SIG) o l tlic applicatioii. Tlic grain of the i ipcr~i l io i is within ii I l i i ic- ticiiiiil unit i s iiicrciiscd ti) llic lcvcl o lc i i i i ip lc lc 1'V iunctiiiiis l i ke ii sctilcr, iioisc rcductioii, shiiqiiicss cnlii~iiceinciit. licld riitc co~ivcIsioii, etc.

g~"ulll1 Incmllry.

11. !'I<(II3l,IiM S'I'AI'I~,MliNT

A. ~ ~ J n l ~ ~ l ~ l ~ l l ~ ~ l l l r i ~ <<ffiJtl

I .ct 11s iinw i i i i a l yx why the ;iibreirictitiiiiicd IprocessnI' sys- tems were iin1 rlircctly adriptcd iii cur~ciil 1'V systems, iil-

tlinugh i t scciiis Llinl the rcquircd appIii:atiniis could Ire i n - pleineiitcd on ii Sully priigl-;irnin;ible iiiultimcdi;i priiccsso~. Oiic o l the qiicstioiis to he iiiiswercd is IIVW i i i i ic l i cniii- puling power sliiiiild tie rciilizcd. 1 ' ~ this rcwvii, ii i i i i t i i -

bcr iil typical 'I'V si~iiel~proccssii ig l ~ i i i c t i i i i i s wcrc stiitl- ied nod ciiiiipoLatioriii1 i i i i t l iiicinnry rcquircniciils were eva- iiiitcd. 'lkhlc I shows the intrinsic priiccssirig piiwcr o l scvcr~iI ' I V fiiiictions at staiihrd d c l i i i i l i ~ ~ i ~ (SI)) rcsnlutioii. With respect to iiperatiim counting, i iddit i~i i l i , iiiiilliplicii- tiniis, siiiiiplc read niid writcs, ctc., are coiisidcrcd iis single o~icriitiiiiis. 'L'lic fiiiirtli cnlui i i i i s tllC allloUllt ( I t IIlcIll- i iry or caclic rcipirccl. Hcrc it i iiiiecl tliat i i ~ f i i r i ~ i a l i n ~ ~ caii hc retrieved nr stored Iiy single rc;icls iiiid writcs (which

is iipliiiiislic). 1 h r a iiiiriiial T V applic~itiiiii will1 51)- 100 I Is coii\~crsinii, l'ictiirc~iii-l'ictorc (l'il'), iinise I e d ~ c t i ~ i i and aspcct-l-;itiii ciiiivcrsinii, tlic iiiiioiiiit iif ~[icl.iitiiiiis iilready exceeds 0 (;Ol'S (Gigs vperatiiiiis liar sccoi~d). This is lint rcadily i i i i~ileii icntcd on ii gciicrnl-purpose I i rwxs i i r cost- cl'lcctively. Conscqucntly. tlic iisc n l a~)~il ici it ioii-specil ic co~ i~occss~ i r s ti1 iiicrciise piiriillclisiii iiiid Iociil cniiiputiog power, iii ciimhiniition with gciicral-purpiisc 1iriicc~siirg i s iiiiiivoidalilc to keep systcm c(ntr IOW.

/ I . Melilot) c,lll.sirler-ntio,ls

1lcc;iosc niciiiirry liinctioris dctcriniiic a signiliciiiit piirt of the syslci i i costs, tlic iicccss iiiiil the iiiiiniiii~ iil i i ic i i ioq arc iinportant i ipti i i i imtii i i i ~iiii'ainclcrs in tlic systcin dcsigii. I'or the wile rci isoi i , tlic ilistributcrl iiicinnry as shmvii iii I'ig. I i s not acccptalilc iii future prndiic~s. 'lh crciitc ii ciist- elliciciit si i lut i i in, 011c uiiilorin Iinckgrniiiiil ~i ici i i i i ry i s iiscd, hiisetl nii ii standard ~i l '~-t l ic-sl iclSI~AM. 'lb nviiiil cxccssivc ineinriry iiccess, the ~ir~iccssors 11iivc s i i i i i l l Inciil c i ic l~c inen- urics, C.C. a vcrtica-filter coprocessor inay Iiavc l i i ic iiiciiiii- irics Irrcnlly. Tlic caching V S liclil iiicniorics 111- e.g. MC- 100 I l z cniiversinn, dynainic i i i i ihe rcductiim or coiiili filtering i s tnii cxpciisivc for oil-chili i i i t cg r~ i l i~ i i~ 11s cirrhedded nicmi,ry (I field = 4 Ml~il). Iiistc;id, tlicsc I'iinctiuns i i rc iiilcgratcd in l l ic largc unified mciiiirry i n Llie heckgrn~inrl. l l~ iwcvcr , ttic consuiiiptiiiii ol' iilrc;idy sciircc hiiiidwiiltli to th is ineinnry s l m i l d be i i in i i i tnrcd l~iirtlicriiiurc, the iircliilecturc s l i i i u l i l cii;ihlc direct c~i i i i i i ioi i ic i i t i~i i i bclwccn p r ~ e e ~ s v r ~ witl i i iui necessary ~ C C C S S to external ~iici i iory. ' l ' l i i s epliroach l i i i i i l s Llic inciii(iry iicccss to liiiicliniiiil usage. Gciicsiilly, i in ly sys- lciiis that iisc liirgc packets tn cniiiiiiiiiiiciitc data (e.g. video licld/liamcs) i i i r Iiigli-tlirnuglipiit coiiiiiiiiiiiciiti0111 hctwccii iiiultililc prncessvrs, need cxtrciiic ius t c~i i i i i i ioi i icnt i~i i i inet- wiirks with ii liigli bandwidth and ii co~isidcriililc n i ~ ~ n i i ~ ~ t 0 1 additioiiiil mcmiiry. 17xtiinples 01 si icl i systcins iirc givcii iii I 61 alld [ X I .

Altliiiogli the iiieniory r c i ~ ~ i i r c ~ ~ i c i i t s 01 tlic dcsircd systcm iirc optimized, typical inciirory sizes wnuld still lic 4-16 M-

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

Byte with a bandwidth in the order OS 3.50 MBytels. For si icl i Ineiiiory devices. the synchronous DRAM (SDKAM) represent tlie inainstreatii inarket, offcring the I~iwcst price per Byte. However, for high-bandwidth applications, the newcr but morc cxpcnsivc ~1ouhlc-datii-r;ite DRAM (DDR DRAM) and Direct Rarnhus (Direct RDRAM) devices arc an alternative. However, for an eflicient data transfcr, t h e devices should be acccsscd fur bursts iifdata instead of sin- gle data words, due to the overhead i n tlie addressing. Sincc inoltirnedia processing stores mostly ncighbouring data (if the actual sainple i n memory (spatial locality), inoltimcdia inemory cacliing can be used to reducc tlic address over- head. When performing signal processing o n vidcu sari-

plcs, it is dcsired to liave ii high peek bandwidtli with low latency. Therefore, the low-density static RAM (SRAM) is used at the on-chip cache level.

111. ANALYSIS OF ' I v APPLICAI'IONS

In this section, tlie'I'V applications are further analyzed with respect to the timing requirements i n order to motivate thc design of thc new chip architecture. Prior to discussing the dctiiils, the cnnccpts of graphs and tasks are introduced, as they will be applied extensively i n thc system analysis.

A task is a basic TV function, a set of closely coupled cal- culations, wliicli are executed on a stream of data samples (a vidco signal). The set of TV functions is well defined for the high-end TV system. Tasks arc weoklj' prograniiiiable, that is, they caii be sct to differclit video quality levels with different resource requirements, hut tlic nature of tlie func- tion itself catniot be modified. For example, a filter can bc programmcd to a siiialler leiigtli resulting in a lower quality, thcrchy reducing local memory usage. 'I'liis meniory caii be re-used for other filter tasks.

A gruph consists of a set of TV functions (and thus tasks) which sliould be executed in ptirallcl. It represents an ap- plication iii tlie 1'V domain, c.g. a main video signal with a PIP iii the upper corner.

The first step in coming to a new architecture is to clas- sify the different tasks into tliree groups: control, sop r a d tinif and hurd rml-t ime tasks. Examples of control tasks

(CT) are user interactions sucli as chaiinel switching and user menu generation, a modem function, etc. Ciintrol tasks are related to the interaction of the systeiii with the eiivi- r~inmcnt. Tlic sccond group contains soft real-time tasks (Xl). Kcy aspect is tliat sucli tasks have a "soft" rcal-time ile;idline, meming tliat a deadlinc ciin be missed and is not critical for the system perfomiaiicc. An example is tlie de- coding oca Teletcxt page.

Hard rcal-time tasks (HRT) sliould not miss their dead- line, since this would dircctly lead to system inalS~inction- ing. Typical Irard rcal-time tasks are most video process- ing operations, sincc output pictures have to be displaycd at predetermined time intervals (field rate). I?xainples arc hor- izontal arid vertical sample-rate conversion, sharpness CII- I~anccmcnt, noise reduction, etc.

Hard and soft real-tiine tasks c m be represented in il

sigriiil Row-graph (SFG) as sliown in Fig. 2, which por- trays two input video streams (HRT) and onc 'I'eletext page (SRT). The results are displaycd in three difbrenl video windows on tlie screen. 'I 'k user can manipulate several fcatures (sile, position, s11;ipc) of tlie smaller windows.

'Ilic hard and soft rcal-time tasks tliat require a liirgc con- putati~iiial effort are inapped onto iiiiirc dedicated proces- sor hiirdwiirc to increase the aiiiount (if parallelism. This approach was adopted to cotne to ii cost-effective system design, enabling a large programmable set of applications with a single chip. In [9], it was discusscd that a fully pro- graminable architectitre for ii broad range of TV applica- tions is far t ~ i cxpcnsive for consitiner tclevision. Although tlie aforementioned applicatioa-specific (co)processors are weakly progriiinmable and are able to process various iasks, tlic type of task is fixed, e.g. a sampling-rate converter can pcrCorm aspcct-ratio conversion, scaling, geometric correc- tions or data compression, but caiinnt do noise reduction or sharpness enhancement. The system rcquircs an architec- ture wliicli provides multi-tasking capabilities, becausc scv- era1 diffcrcnt tasks have to be proccsscd simultaneously by the same coprocessor.

Let us now summarize the results of the analysis of the set of TV functions and the computiitioiial estimations from

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

tlie previoiis sectinti and f ind iniplicatiniis f i x tlie architec- tural clcsigii. . 'l'lic required computing power i s in the ordcriif 10 (;OI'S

(sec 'lhhlc 1). . If all the coniinuiiiciition hctwccn tasks in ii typical flow- graph are accumulated, several GHytclsec rcquircd hand- width i s obtained. Each liill-color video cliannel requires 32 MUytc/s bandwidth (at 50-Hz Mil rate, sec T;ihle 1). . The ciinstraitits for hard real-time streams iiiiist be met under a11 cnoditinns. This requires wnrst-case design o l tlic architecture in r this part cif the system. . Otic of the disadvantages of current rigid 'I'V arcliitccturcs

'ing functions canniit he iised in di l - Serent orders (sec Scctiuii I). Such ii feature would ciiablc an nptim;il T V quality setting iiiidcr a l l conditions witli- in tlic limits of available rcsnurccs. 'Ilie new architecture sliould oSlcr i l l i s Ilexibilily, i.e. tlic 1e;iture to cnnstruct diffcrcnt flow-grapli~ with Llic sallie set of TV fi~nctioiis. Analysis liss sliown tliat aboiil 100 ilii'lerent graphs exis t for cuiiiinon high-end ' IV applicalioiis. . Tlic resulting quality n l the pc rh r i i ~cd video tasks shonld he sc;ilahlc to adapt tu the amount of available Iiardware rcsnurccs (cniiiputing pnwcr and biindwidth). 'The qiiality depends tliereSnre on the clinscii flow-graph. 1:nr CXHIII- ple, if only onc video strciini i s being proccsscd, ill1 video functions ciin he switched lo tlie highest quality level, re- quiring e.g. tlic hcst liltcrs nr t i i us t advanced processing (ci~mputati i~nii l ly intci c for each task). If two video strcaiiis hevc to h e prii .cd, the siiinc video task can ap- pear twice in the How-graph, s o that ciicl i 'IV functiiin i s executed ill nicdiiii i i quality. 'l ' l i is might hc acccptalilc, e.g. ii Pil' npplication docs not require tlie sairic video quality as an application with one niiii i i video pliiiic. More precisely, the vidcu streams are processed at the clock rate (64 MFIz), hut can liiivc f110r different pixcl nitcs: 16, 32, 48 and 64 MIIz. Uiicli cuprnccssnr ciin process a number nf streams in pardlcl, given hy tlic followiiig expression:

wlicre n l h represeiits tlic nuniber of streams with 16-MIIz pixel rate, 1132 stands for the number of strciiins with 32- MI I z pixcl rate, and so on. . The syslcti i slrould be capable 01 processing a i i i inirnuiii of two rea-lime input video streams, bccausc our aim i s a molti-windnw 'IV system. 'The clocks i i l the streams iirc riot rcliitcd (iisynchronous). a s l l icy originale l m i i differ- eiit sources.

111) to this pnint, signal prnccssinng and its rcquircincnts were discussed prininl-ily. I.et us now conccntl-atc on tlic iiiciriory wliicli i s associated with the signal prnccssing. As w, 'is , c f ' . ~scusscd in subscctiun II.B, t l ic objective i s to iisc unc

single unified iricinory. Suiiirnari/.irig tlic i i iosl ii i i l inrtanl rcasoiis for this:

. memory r e - u s ~ : i f fiinctioiis arc switched nfS, mcmnry can

. CO.SU: off-the-sliclf large-scale nicniory cniiipiincnts give

An cxperiinental system iiiip1cmciit;itiiiii i s based (in ii

memory bank OS 96-MHz Synchronous DRAMS. Fur 32- bit words, tliis gives a tlicorctic;il biindwidth up to 384 MHytels, wliicli i s lower tliaii tlic nii ixi i i i i i i i i reqiiircnieiits ~iientioncd in Scctinn 11. Ihercfnre we make a disliiictioti iii the ciiii i i i i i i i i icatinii requirements, similarly to the clistinc- tion in tasks. Some cornniunicatii~n channels dii n o t nccd iicccss to external irieniory and caii he kept on chip. For otli- er fiinctions, especially tlinse requiring complete field iiiciii- nrics, access to cxIcrnii1 inetiiory i s inhcrcnt to tlic functinn. sucli a s temporal nnisc reductioii. Anuthcr cxainplc i s t l ic synclirriii izatii~ii of two independent video struiiiis, rcquir- ing a vidcii frame memory (the i i i ix task in k;ig. 2). Since meniory biindwidth i s a scarce paranielcr, cxtcrniil iiiciiio-

'cs arc indicated in the signal Ilow-gr;ipli: l l iey are niodclcd i is extra inputs nnd outpiits to the grapli. Portlier- mire, tlic gr;iplis are cut at points wlicrc lramc syncliriinizn- t imi takes place. This leads to the graplis tis shown in Pig. 3.

Ily culting tlic Ilow-graph into pieces, we liavc implicitly introduced aii ;vlditiiinal cunccpt. ciillcd suhgniplis. A sub- Xmph i s a se1 1ilcliiscly ciinncctcd tasks. wlierc al l comiiiii- nication between tlic tasks belonging ti) the siiine subgraph takes place via on-clrip cotiirirunicetii)ri. Inputs aiid outputs oi subgraplis are either regular inputl i iutpi~t streiinis or iic- C C S S ~ S tn cxteriiiil iiieinory. Coininunication between twu different subgraplis always takes place via cxteriial memo- ry.

he assigned to otlicr tasks;

Inw system cnsts.

Mem ~ p e i n

Video Input 1 c Mem

.. . Video

Input 2: .i + Mem

Mem , f l e m

IV. AIKHITECTLJI~E I)ESION

The application analysis of Yeclilrn III iind thc cnmputatinn- a1 requirements of Section 11 are starting points for desigo- ing the ;ircliitectiire. We Ii;ivc discusscd sets 1 1 1 tmks being cnmbiocd into graphs, and the a~sociated Incmiiry cnniinu- nication. Consequently, tlic architecture cniitiiiiis principal iiiodiiles and tlicir cumpnncnts, m d a s SLICII, it has ii hicr-

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

archiciil striictiire. 1.et LIS therefore start at thc top c i i the hierarchy.

A 7 h p l e v e l orrhitecturr

Since tlie prnperties iifciintrnl and the signal processing arc totally different, wc wi l l dclinc three different subsystem: a cotitrol subsystem, a signal-processing subsystem iind a iiieniory subsysteiri. 'The control suhsystetii i s rcspotisiblc for the cxccutiiin u t a l l control-oriented tasks nrid some of tlie soft real-time tasks. The signal-processing subsystc~n executes a l l hard real-time lasks and soiiie of tlre soSt r e a - tiiiic tasks. Tlic memory subsystem performs the memory finetioms for hotli the systcin cnntriil and signal processing.

The nature of tlic processing is tiitally differetit. l ' lre cum triil subsystc~ii i s based on ntv. I t l i i is to deal with uti-

prcdictalilc cvciits coming Sriiiir tlic c ~ ~ v i r o i i ~ i i c ~ ~ t . Tlierc- lorc, i t sends rundorn requests tu the mctnory subsysteiii. Tlic signal-proccssing subsystem i s cliaracteriml by tlic pc- riodicity OF tlic operatioits. It sends periodic reqocsts tu tlic mcniory sulisystcm. The top-lcvcl i i r ~ l r i t ~ ~ t i ~ r e i s visual iml in Fig. 4.

l'lic princip;il dcsign step at the tiip-level coneelitrates on tlic arbitrxtiun bctwccn the two typcs nf rcqucsts. l 'he dif- ticulty i s that difScrent aspccts liave to b e optiinizcd. lo thc ciisc (if riiiidniii requests, tlie larcnc,' s I i o ~ i l d be iiiinimized. 1 1 1 tlie case of periodic requests, tlie fhrotigh~irrt cif the sys- tem lias to tic guaranteed, because a11 hard sca-time tasks arc mapped onto t l ie signal-proccssing subsystem wliicli i s pcriiiilic. Thus, in the ciisc of periodic requests we want t i i dcsign for the tlirougliput that sliould be ohtaincd. The latency i s less relevant, sincc i t can bc hidden i i suflicient huffcring i s I irnvidcd

Summarizing, air nrhitration scl iemc i s nccdcd that en- siircs the throughput for tlic periodic rcqucsts, while iiiini- mizing the latency for the raiidoin requests. For our system, we used an xbitratinn sclicnic friim I IO], that providcs a

low latency and high-priority ~ C C C S S tor the inicrocontrollcr and cnsiircs the required lianclwirlth fn r the vidco signal pro- cessors.

11. Si~rl"l-pr~cc.ssing .Srrh.v.y.st~rn

The tasks fo be peIforincd for a complete SIG i n ii TV set ea11 he classilied glolially info two groups: tasks rcqiiir- ing a similiir typc of signal processing iind tasks which iirc both different lrom li incti~inal and signal-pr~iccssiiig pnint of view. lixainplcs cif the first group arc Iiorizaiital com- pression for I'iP and aspect-ratio coiivcrsiun. Thc required typc [if signal processing i s sample-rate conversion in both cases. Examples of the second group iirc nnise rcductiirn and sliarpncss ciiliancemeiit wliicli arc essentially diSSeren1 i n i i ios t algorithmic aspects. Using tlic previous d i ~ t i n c t i ~ t i , a numbcr US choices ea11 be inadc for thc new architcctiirc. . After Suiictioiial decoiiipiisitinri a task i s mapped on a scp-

ariitc proccssor if tlicy iirc different and i s miqqiccl on tlic siiiiie processor otlicrwisc. . Sincc tlic tasks arc kniiwo in a[lvancc at dcsign tiiiic, each pruccssiir can 1~ optitnizccl fnr ii particii1;ir job. . 'lb i i l low the [napping ofditTcrcnt graphs, a reconfigoralilc coniinunication network i s added. This nicatis tlial signal- processing Siiiictioiis ciin be iiscd in various orders iind c m be progratir~iicd.

The rcsult of the arcliitcctural choices i s that ii hctcrogc- tieous multiprocessor i~rcli i tccti~re i s olitaincd, which i s cii- pablc o f true task-lcvcl parallelisin. The arcliitecture i s shown in Fig. 5 . ' l ' l i is w i l l now be discunscd i n i i iure dc- tail.

First, i t ciiii be iinticcd that all ~~roccssors are surround- cil by I'IFO buflcrs at their inputs and outpiits. 'l'lie aim i s to separate signal processing from corninotiication, so that

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

tlic activity of tlic priiccssors is de-coupled f rm i tlic coni- niiiniciitiini nctwiirk. Second, the choice lor l :I lU huntrs i s inotiviiteil. They have tiecn i i i l i q~ (ed liceituse the iirrows iii the SI'(;s (called cd,qes) represent atrciiming (video) sig- i ia ls, i.e. iiiciisuiiil i lc pliysiciil qiiiiiitiiics siimplctl at discrete points i n liinc iind Iiiiiary c~icodcd. ' l l ic only identiliceliiiii ofthcdillerent sai i iples in the slreiitn i s given Iiy tlic incler cif tlic siiinplcs. S;implcs iirc pii iducctl i i d y oiicc iiiid ciinniit he lost IIII the ciimmuiiic;itiiin chaiiiicls. Fin s t r e a m with Lhc aloreiiientioned fxtures, scpiiI;itiiiii 01 cotiiiiiiiiiicatiiiti alid piiiccssiiig ciin be well l ie performed with 1 W O s . 'lliird, tlic architcctiirc i s hiisctl on ii d y i i m i c diitii- l low i d c l ( l l l ) ly) ,

iiistciitl (if syiichronoos diitii l l i iw. 'l'lic rcascnls arc as Col- lows.

. Arialyzing the tasks in an SIC, sonic vidco signals cover tlic fit11 display, wli i lc oilicss c(ivcr ii only ii

p r t lit' tlic scrccii, sucli i i s ii Pil'. In iiiiiiiy stiigcs, thr: aiiioiin( of coiiilni[iiiiotis i s priipiwtiiiniil t i i Ihc anioiiiit or samples proccsscd. 'l'licrciiirc, ii surpliis 01 cwiI iL i ic pow- er can be uscd 111 ixikc ii priiccssor iivailablc for iiltcriiii- tive tiisks. Such task switching i s iiiorc dillicult with skilic

. 'She held Iilnnking, i.e. the iuii-iictivc vcriical pait ( 1 1 a viilco sigiiiil, ciiii Iic uscd fur soft rciil-iiiiic tasks. Tlic

I struini i s in tlic blanking or riot i s a riiri-iitiic dccisioii, sincc input strciiiiis iiio asyncliroiiiius will1 rcspcc~ to each <rtlicr. l ' l i i s re-iisc i n so11 rciil-time

111oc1e1. . A ~t rc i in - l i i i~cd computing tiio(1cl i s i i l ici i simpler to i n - plcmcnt iii ciimparisoii with ii skitic synclironiios systciii, bccnosc i t cim pcrliirtn riiti-time scllcduling b;iseil oii I i ici i l aviti1;ibility of data. . l'hc concept i s lictter scalable wit11 rcspecl to tlic iidditioii or rciiiiwal iif processors. Ii i s oxpcctcrl tliiit priicessiirs wi l l hcciiiiic increasingly dytianiic. A variablc-lciigtli tlccodcr, wliicli 1inidii

data-deticndciit iiiiiiilier iif tokens.

syllchloninls systcms.

y . k . ' . <is s i s easily itnplcmcntccl with ii dynaiitic strciiin based

I.et LIS IKIW discuss brictly how i l ie sigtids wil l fliiw i n tlic 1)1)1' Inoilcl. ' T h c ciiiiiiiiiiiiicalirrii Iichiiviour i s siicli tliat tlic sigtiiil chiiii i icls ciin l ie blockcd l ~ ~ c i i l l y (bliickiiig scmiiiilic), according t i i tlic 1)I)l' 11iode1. A Iiical processor is stiippcd wlieii the initput l'I1:Os are tiill or i l ie input 1'11'0s iirc c i i i p ty, l l i c decisioii i s pc i f i i r~ i icd 1oc;illy iii tlie shells surrorttid- ing tl ic Iiiiiccssors. I i irheriniirc. the iiut1iitt lW0s inust he Iiliickcd when the iiiput F11:Os iirc full. This i s doiie l iy iinplcmcntiiig ii Ipv nctwiirk. So far, resource sliariiig, tliiit is, thc rc-use ot'a pr~~ccssor fin iinotlicr lask was only incnlioncd but i i i i t yet Laken into account. For tlie miidcl, a onc-Lo-onc ciirrespiindc~icc hctwccii tasks in tlic graph ancl processors i n the arcliitecturc l i i is heeii as:suincd. In the liil- Iiiwiiig suliscctiiin, rcsonrcc shiiriiig wi l l he disciisscd.

Ilesourcc sharing i s clesirablc in ordcr Lo ohlain ii Ilex-

ible iiiitl ti sciilablc ;ircliitcctorc. Tlircc ditfcrcnt l o r i ns lit' rcsoLirce shiiriiig itre clisliiiguishcd. First, dit'krcnt tasks ciin l ie executed on tlic siiinc processor. 'L'his leads 10 the proces- sor miidel 11isi:usscd hcliiw. Second, Fll~Os cim lie shiircd, hut this i s i iv~ i i~ lcd, bccaiisc ileiidlr~ch miiy i i cc~ i r in l l ic i i i d

ti~iriiccsscir systcni. 'I'liircl, resoiivxx in the ciiiiiiiiiiiiiciitiuii network can Iic sliiired. T h i s iiptiiiii is iilso discusscd below.

B. I Processin i i iodcl

l'lic siiii ie liinction (e.g. lioiiziintiil wqi l ing-ratc conver- sion, HS) can ' i l i l icx riiiisc tliaii oncc ill iin WG. I;or cost rcasmis, tliesc convcrsioIis wi l l all he executed on the satlie MSI<C Iit'ocessor. 1 Iowcvcr, tlicrc arc soinc liinitatiiins that wi l l be cxpliiilicd iiow.

' iir Inoilcl portrayed by Fig. 6 l i i i s twii iiiput piirts l(p) mil two iiutpilt O(p) ports. ' l l i i s c d d 111- cxaiii- plc l ie a teiiipimil mise reduction coprocessor that ~ i c c d s iiii

input vidco sign;il and tlic filtcrcd rcsult fii im tlic tctiipii- ral I(iqi :is inputs iind h a s ii vidco iiutliiit a i d ii hackward cliiiiiticl t i i n i c w n y :is initputs. I(acli i n l i i~ t ani1 ciotpui port i s crrnticctcd to frrul- I ~ I l ~ O s , 1;ibclcd 1-4. When ii iask i s started, one 0 1 iiioic FIFOs iiic iissigi~cil t i i this [;ish at uic11 input and outpit pori. V i e iiuiribcr c i f assigned lT1'0s ti1 ii

task is dcpcnilcnt ori the required ;imiiunt ol Ir;tndwidth and i s cxpliiiiicd iii iitorc dctiiil iii the i iext suhscclioii. As long ;is tlic task i s x t ivc , tlic ;issigiicd F11'Os iirc only process- ing diitii ixiircspiindiiig tii this (ask. Associiiteil witl i each task, a Iiiciil iiicinory state cxists, callcd ,siut(> s p c e . 'IIILIS, iiccortling t i i tlic depicted ~nixlcl, twcr dil'lerent state spices ciiti cmcrpc, wlierc ciicli state sliiicc i s assigncd 10 a particit- liir function. As :I rcsult, this inode1 ciiii start up to iiiiixiiiial twii tasks i n piirallcl (the cxperiiiieiitiil chip discussed in [ O j ciin pcrfiirtn oiic to l l i icc piirallcl [asks, ilcpendcnt on the coprocessor).

11.2 (:otiiiiiutiicatiiiii nc tw i ik

l l i c task 1 1 1 tlic ciiminoiiicatioii network is tii provide sufji- ciciit handwidth tin tlic &ita strciitiis hctwccn tllc (nitput atid the input I'Il~Os. Viir cvciy coti~iect i~i i i i n the SI;O, a piitli i s

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

created i n the programmable conncction nctwiirk in I'ig. 5 fiir transport nf tlic data The path is created using circuit switching.

The network is a so-cnlleil TST nctwork with space ancl time switches. The reason to build such a network is to guar- ;uitcc nnn-hliicking cnn~~cctiiii~s hetween output FIFOs l it ' prnccssi~rs and input kWOs of succeeding proccssors, with a predetermined ;nnount of bandwidtli for each conncctinn. Fig. 7 shows a spacc switcli at the left-hand side making a conncctiiin bctwccn ' 1 2 and b n and iinother conncctinn he- tween uq and 172. At the right-hand side of Pig. I, the same connections me shuwn using ii time (T-)switch. At the in- put, ii inultiplcxcr is added and a dennilliplcxcr at tlic out- put. lising lour tiiiic slots, the total bandwidth cqoals tlic swii of bandwidths of tlic individunl cliiinnels. This Icxls to the TST network a s shown in Fig. R . A s an exaniplc, twii piths thrnugli the network iirc inilicatcd. 1i;lch pnlli is pro- gratnmed by putting the ~ n r r c c t codc for the three different parts of the network. A tahlo with four different liinc slots is used. I'or example, the x connectio~i is prngrammed in phase two via the correct code at the piisitions labeled with an ,r. 'l'his way, bandwidth is alliicatcd cnrscsponding to otic phase. The coiinectioti labeled y liiis twice tlie batid- width of oiic cha11nc1, because it is prograinnied during two phascs. ?'he programming flexibility eiisiires that sufficient bandwidth can be priividcd for particular stages or signals i n tlic SIT;.

R.3 Intcractiiin bctwccn controller and processni

Also interactive ciimmunicatio~i is necessary to adapt tlic TV lunctinn t i i the ciintcnt nf tlic video signal, e.g. the noisc-rcductiiiii proccssnr nieiisures ii significant noise in- crease, comniiinicatcs th is prnperty to the CPU, which i s suhsequcntly progr;tmmcd by the CPU to perrorln ~nnr: noise reduction For this type of event-driven connnuiiic:i- tinn, one interrupt (irq) line to the CPU is uscd. The signal- processing subsysteiii contains ii large range of interrupts tri

cover ;ill possible cvenls. H;ah interrupt is represented hy one bit in a meniory- napped interrupt register. The CPU receives iin intcrrupt i l one of tlicsc hits indicates iiii "irq" by iiiciins of a hardware OK function. Subscqocntly, the CPU inay read the interrupt rcgistcrs via (lie intcrrupt scr- vice routine i n nrdcr to identify which cvcnt has occurred. The "irq" is then acknowlrdgcd Ry resetting the c~irrcspond- ing irq-bit in the nicmory. 'l'liis general cominunicatioo pro- tiicol is visualizcd in l'ig. 9. 'l'lle interrupt mechanisni is al- so w e d when priicessing gencratcs errn~~cnus behaviour or when signals arc not i n accordance with specified video lor- mats. Since all interrupts generated by tlic sigiial~processiiig suhsystcm operate a1 vidcc ficld rate and can bc disabled individually, the interrupt rate to the ~iiicrnciintrollcr sys- tciii c m he kept sufficicntly low. 'l'llis nptimimtiiin is ad- vantageous, becausc interrupts incrense tlic c~immunication overhead (ciintcxt switching). Adaptation of thc proccssing is suflicicnt at field trequency, sincc it I)rovidcs sufficient rcspons-time of tlic system. This a l s i i givcs ample time fiir the Illicrncnntroller system to pcrfnrm other tasks, such a s e.g. modem coiiiiiii~iiic~iti~i~i.

V. MI-;MOI<Y

A. Subsysteni corninunication

As was mcntioiied i n section 11, a cost-effective solution re- quires m e onifrirni background incinory that is hiised on i~

standard off-the-shelf RAM. 'Thc RAM can bc accessed by both tlic CPU and the coproccssiirs. Aii attractive property of the systcin architecture is that tlic control-subsystcni arid tlic signal-processing subsystem ciin be used its st;iod-alnile devices, because thcy provide there nwn control for opcr2i- tion. Holh systcms are Savorably ciinihined intn ii piiwerful

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

memory map

3 read i iq bi ls l o

4. coninwiiicate w i l l i lile corrcspollrling COplnCeSSO,

Fig. 9. Coiiiiiiiiiiic;iiiuiI ~ ~ O I O C O I liii iiitciiiclivL: Ipruccnsing

full-featured systcni by ii ciinnectiiin vi i i tlie meinory l ius, a s depicted in Fig. IO. ‘ lhis enables cominunicatioii between tlie CPU and the signal-processing suhsystctn by means of memory-iiiopped 1/0. One unitied memory inap i s applied in which all paranleter and operation-iiiode registers of tlic copriicessiirs are located. ‘ l l ic CPU caii access all the pa- raiiictcr registers t i i progrm tlic copriiccssor settings and derive information from the processors tuicl/iir the signals heiiig prnccssetl.

I ll I

Fig IO. Sland-;ilonc sys~ciiis :SKI a ciiinliirictl syslcni with oailicil ~tiamory.

Sincc i n tlic architecture proposal a single nicniory i s uscd, bandwidth limitations arc likely tii appcar. Somc mea- sures have liccn taken to relax this so-ciillcd vno-Neumanii hottlcncck [ I 11. As aii ex;iinple, tlic frequciicy 111‘ the m e n - ory hus is twicc the rrcqutncy of the CPU, whereas the bus widtlr is equal to the word widtli of tlic CX’U (32 bits). liur- thcrmore, coprocessors c m be priigr~iniiiied sucli tliiit iiietii-

ory access cai i l ie reduced ;it the cost nf sonic quality. l l i e s e aspcc~s arc discussed i n the fnllowing subsection.

B. Memory ~ ~ ! . S O U I % C . S v(!r.?u.v qurility

New appliciitions Sur tlic system arc mainly limited by the computing powcI of the rcsoiirccs, tlic handwitltli nS tlie cnproccssiirs, and tlic bandwidth of oil-chip ant1 extcriii i l memory. Ciinscquently, special attentioil shoiild he paid to Ilic architectural dcsigii u l the coiiiiiiuiiicalioii nctwiirk. Ah

was iiiciitioiicd i n Scctioii I I . H , commiiiiicatioo h r vidco signals vii i the niemiiry shniild be limitcd to large iiiemory fiinctions only, e.g. tlie tick1 tlelays for tciiiporal noise re- duction and field-rate cn~iversioii. Additiiinal key issues i n tlic design of tlic communic;ition wcliitecturc h a t 1i;ivc a di- rect i inptct nn tlic iiicniory resnurccs are listed hcliiw.

1,ocal FIFO buffers, wliicl i arc rclativcly small, Iiicateil at the input and mitput stage n S all cnpro CIIRIIIC date cxchaiigc bctwccn cii1)roccssors witliiiut access to the extcroal b;ickgriiu~id memory. This signiticiintly rc- duces the 1i;iiidwidtli requircmcnls of tlic cxteriiiil iiicmiiry.

Mixing or ,juggling cif video streams i s ilcsigncd sucli that i t requires a inii i i i i i i ini iiiiiouiit cif iiiciiiory handwirltli. In tlic btickglound inciiiory, two lield hlocks (for iiitcrliiced video) arc allociitcd to construct tlic coinposed video trtiine. ‘lhcsc iiiemnry blocks are l i l lcd with the ndd and cvcii liclds of the picture, except fin the pixcl positiiins wlicrc aiiotlicr nvcrla[~piag viden window wi l l be [insitioiied. This unused mcinory iircii in tlic memnry i s iiscd by a sccoiid viden path to write tlic ovcrlapping video window. Cnnscquently. the total iiiiiouiit cif diiki stnrcd i s cquil to the data 0 1 OIIC cniii- pletc picture instcad OS tlic suiii of iiidividual pictures. Sinii- larly, the total rcquircd haiidwidtli appriixiniatcly ct~uals the handwidtli for writing uiic complete stream instead lit’ tlic s u n of the bardwicltlis nfthc individu;il vidcii slrcuns.

Bandwidth equnlization i s used tu dccrcac tlic rcquircd peak bandwidth. 0qu:iliZiition in meinory rcadlwritc tasks is nhtained by spreading the diila transfer o i iiii active video line over the tiiiic cif a ciiinplctc video line including tlic I i~irizoiital blanking. Common videii signals ciintain I S Yo

hiirizoiital line hlankiiig, Icadiiig to ii siiiiilar rcduction of tlie peak haiidwiiltti.

An additional system aspcct besides tlic aforementiooed key issucs, i s tlic desire tn realize .scolnble processing power, in order to enable tlic exchange nf bmdwidth with picIurc quality. Somc examples are given below.

Vertical scaling at high quality rcquircs that tlie intcrlaccd input video i s converted to pIiigrcssive video prior to sain- pli i ig rate convcrsioii. As a result, a iiccess;iry I~iickground mcinory tunctioii i s to write intcrlaccd video ficlds and to read the odd and cvcii vidco lines prn~ressivcly. Subsc- qucntly, de-interlacing sliould he applied prior to the scal- ing. Aitcr scaling the progressive frames, they arc intcr-

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

laced again, by removing the ndd or cvcn lines belore tlie final result i s co~ii~nuniciited to the rest of the system. 'I'hc tiieriiory requircinciits for such ii vertical scaling task iirc significant. At l e a f nne l i e l d meinory i s necessary and ii

memory bandwidth of three tiiiics the ba~idwidth oC one s- ingle intcrlnccd video struii i i i s requireil tn write intcrlaced and to read progressive video. To save some memory rc- sniirccs, i t i s possiblc to sciile the interlaced fields, thus to perSorin intr+tield processing instead of inter-field Iiriicess- ing. This would ~Iccrciisc the picture quality at the gain oS Incmnry resourccs.

Graphics generation i n the Iiackgriiund Iiiemory rcqitires ii field or li'ainc mcmory, clcpcnding oil tlic desired qiiali- ~ y . Wlieri a tield incmiiry is iiscd :ind tlie content is read lor hnth odd and even fields, the a lnnun t of Incinnry i s reduccd at the cost of'soiiil: loss in verticiil rcsolulioii. Since synthct- iciilly geiierated grnphics inay contain high spatial Srcqucn- cics, tlie use ofa frame meinory may result in annoying line Ilickcr when the tncmory i s ilisplaycd in intcrliiccd Iiiodc. Mnreovcr, i t i s iilsn expeiisivc iii teriiis o f mcmory size. For 50-60 Hz iiiterlaccd videii, it field ineiniiry i s iiiosl attractive, whereas fur field rates higher tliaii 70 Hz, ii fraiiie inemiiry could be used fiir high-rcsolutiiin gr;iphics.

VI. CONCLUSIONS & AI ' I ' I , ICAII~INS

l h i s piper Ixcsctits ii video proccssiirig architecture with high parallclism, based or a p r u g ~ i ~ i ~ ~ i i i h l c array 01 copro- ccssnrs a n d ii c~~in~nui i ica l ion nctwiirk. l'oiiction-speci~c Iii~rdwarc coprocessors pcrfiirm pixel-Iiaseil processing alid

are programm;lhle at functional iind system IcvcI. A switch Iiia(rix enables p a ~ d l c l proccssitig 0 1 tasks and ciiahles high coiiiiiiiiiiiciitioti bmdwidtl i. Sincc tlic matrix i s pro- greiiiiiiablc. tlic order of signnl Iprocessing tasks can he moditicd (prngriminiahlc signal flow-graph) to realize new 'SV qiplications. lndiviilual coprocessor proccssiing pow- er and pel-lnrniancc ciin be aihptcd iind oplimizcil for each TV applic;ition tlirw-graph. Tlic ;Irchitectorc iises an cxter- tiiil hackgrinincl nieniory which conibines dl large mcmory fuiictions. l 'h is iiietnory i s shared with a inicriiciintroller. Hach coprocessor h a s sotiic IOCBI video memory, i n cor- respondcncc with tl ic Ibnctinil carried out, to avnid severe oicmiiry-handwidllli prnlilems.

A novelty nf thc iircliitccturc is the LISC of ilynaiiiic data- flow for videii processing, i n coinbination with applicnlion- specific proccssors. In the cnncept, priicessors pcrform ill-

depeiident priicessiiig and data streams :ire excliaiigcd via sinal l local buffers. 'Sliesc buffers can start and stop tlie processors ilepci~dcnt on wlictlicr a huffer i s full or empty while additional eOiiiiiiiiniCiitii)ii rules avoid deadlock prob- lems. The advniitugc of this syslcni i s that video processing liinctions ciiii hc I~~i~idlcil iis tasks cif any length. Thus video framcs of various s i x s can he applied in the system witliiiut

Inw-level reprogrntniiiing nf the hardwiire. 1Ca task is ternii- natcd, proccssiir Iiardwarc tiecoiiics awilablc for iiltcriialivc tasks.

'She proposed systein ciin Iic applied i n a large set of a[)- pliciltion domains cont;iioirig componciit video processing. For Iclc\,ision, the architecturc lms been implcnicntcd in two expcrimeiital ch ip , ii tnicriiciiiitrollcr iiiid a coproccs- s o 1 iirray. 'I'hc iiiicrocontrollcr pcrli irms set ciintriil, 'lblctcxt mid graphics, iuodein m d low-spcctl data ciirnmoiiic;itiiin. The coprocessor array conlaiiis functional units IbI hnri- mnial ;tiid vertical scaling, sliarpiicss etihmicemeiit, iniitiiin- adaptive temporal noise reduction, gq i l i i cs hleiiding, mix- ing o l video slrciims atid 100-HZ upconversion. Due to the Hexihility 0 1 the coprocessors iiiid tlic ~irogratnmitbility of the signal tlow-graph, tlic qiplicatiiin riuigc not nnly covers tl ic intrinsic integrated fiinctions [ 121. Init i i l s i i i tspec l -do cnnvcrsioiis, Pip, a niosiiic scrceiis, pixel-based 2D graphics lor PCl ike applications, etc. In fact, the chip set enables all tlicse Itatores in a molti-window TV cnviroiinient with high quality. i\s an cxample, Fig. I I shows the rcsultiiig displayed picture 01 the tlic 1'V qiplicatiiiii v i s u a l i d by the sigiial iliiw-graph in l'ig. 2.

Fig. I I . Fxiu~plc o l n ti i i i l l i-wiri i lw iipl~liciilimi.

The niodulai~ity [if the architecture concept iillows l l iat oihcr advaiiccd fiiiicfioiis can he added easily in the form of new applicatiiin-specific processiirs, e.g. MPEG dccoil- ing and 3.1) graphics rcndcring. I h r consumer systems, i t i s pnssible that tlie arcliitcctiire can he used for a numhcr ol' ycars to ciinie, due to i t s cost-eficicncy. As tcchnolri- gy procccds ;mil tlic deinand for flcxiliility increases (e.g. MPEG-4, MP'HG-7), the low-cniiiplcxity and irregular pni- ccssing fiiiictions will be impleincntcd i n siittwarc on atl- vaiiced iiiicroci)iitriillers or iiicdin processors, s o that corrc- spoiiding coprocessors wil l disappear. Coiiibiiiing the prii- posed platfbrin with s ~ i c h generic proccssors i s attractive, hecausc a hardw;irc-soflwarc trade-off ciin bc iiiiiilc. At the si i t i ie time, i t i s cxpcctcd that new ciinil)iit;itiiinaIly expen- sivc fiinctions, requiring specific Iiarilwarc support, can he introduced .

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

I'ctcr H.N. de With grtidu-

i i ig froin the Uiiivcrsity o f ate11 in clcctrii:al en&' ~IIIccr-

Ibchoology iii liiiidhovcii. I n 1002, l ie rcccivcil the 1'11.1). degree Sso1n the u- niversity O S 'l'cchiiology I k l f t , ' l l i c Netherlaiids, fix l i is work on vidco hit-rate reduct i~ t i for recording all- plications. Ilc joined I'hi- lips I<csenrch Laboratories

JcS van Mccrhcrgeii recci- veil tlic 1ilcctric;il Hiigiiiecr- i i ig 1)cgIcc and the l'li.l), ilcgrcc l m n the Kalliolic- kc lliiiversitcil I .ciivcii,

Ilclgium. in 1975 and 1980, rcspcclivcly. 11, 1979 lie ,joiiicd tlic Pliililis I<csetircll L;ibol.atiirics iii I3iidli(ivcii, tlic Ncllierlancls. He was ciil:iigcd iii tlic design o S MOS tlixital circuits, 1 k -

With i s a sciiiw rncii i l icr 11i tlic IElIli, iicmbcr 01' tllr pro- g l i ~ i ~ i cuinmittcc o i the IliIiU CI 1s awl I m r d iiiclnber OS the Ueiiclux wiirkiiig p u p for. IiiSoriiiatiiin i ~ t d Coinmunica- tioii theory.

Ugberl laspcrs was born iii Ni,imcgeii, 'l'hc Ncthcrl;inds. iii 1969. lie grailiiatcd iii

clcctricel cngineel-i iig Srom ll ic Venlii I'iilytcclinical Col- lege in 1992and subsc~~uciit- ly, lie .ioincd Philips Re-

iiiid gciiclal-liiiriiiisc digital sigiial proccssors. I n IO85 hc started worhiiig iiii apl i l ical im driven higli-level syiithcsis. lnitiiilly this work was target- ctl towiirtls iiudiii and tclccniii USI' appliciitims. Later ~lic ;ipplic;ition doiiiaiii sliiRcd towmls high-tliroiigliput q ip l i - c;itiiins (I'tiidcii) wtiicli received the ticst pqicr i i w d ill

Llic I997 IIIM'l'C conference. H i s ciiirciit iiitcrcsts are in system level design mctli~ids, hctcriigeiieuus iriuItilrIoccss~r systcins iind rccoiitigulal~lc iircliitccLures. .IcS van Mccrber- gcn i s ii Philips Kcsearch b " c w i i i id Associ;iic IXditiii. of "llesigii Autiiiii;ilion for Einbeildcil Systems". He i s also ii

part-time profemir at tlic liiiirlliiivcii Uiiivcrsity o l l i tch i io l -

ogy.

lioveii. For one yair, Iic wiirkcd oii video coiii~iIcs- sioii fin digil;il HIYI'V i~e-

cording. In 1903, lie coil- tinucd h i s educatioii at the liindlioven University of

'I'cchniiliigy, Ssom which lie grailu;itcd iii clccti ical eiiginccr- iiig io 1990. In l l i e saiiic year, hc .ioiricd Philips Research L:iboratorics Uiiidlioveii, wlicrc he lieconic ti iiiciiilier OS the TV Systems Departmciit. Iic is currently involved iii the re- suirch of progr;lmiiiiiIiIc iircliitccturi:s and their i i i ipleti ic~i- talioii for 1 V aiid coinpulcr systeins.

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.

1240

Adwin H. Timnicr was horn i n Apeldoorn, Thc Ncther- lands, i n 1966. Hc rcccived tlic electrical cngineeriiig a n d 1'Ii.D. dcgrccs froni the I'indhoven Univcrsity of Technology, The Nether- lands, in 1990 and 1996, respcctivcly. 111 1995, hc joined tlic Philips Research Laboratnrics, Eindhovcn. In 1998, hc was a visit- ing IC architcct with tlic Philips

Scmicoiiductors WSG husiness line, Mountain View, CA. His cusrcnts intercsts are in IC architectures for high- performance signal processing applications. system-level dcsign dcsigii methods, hardwarclsoftwarc cmk~igi i , and compilation tecliniqucs for cmbedded DSPs.

Merino Strik graduated i n electrical engineering from tlic University of Technol- ogy in Eindhoven in 1992. I n 1994 he completed tlic post graduation designcrs coussc of the Univzrsity of Tectinology in I?indhovcn. Thcsc two ycars he worked on high level synthcsis and code geiicration for appli- cation domain specific pro- cessors. He joincd Philips Rcscarch in 1995 in thedig- "

ita1 VLSI group. Now he is involved i n IC design tnelhod- tilogy with a focus 011 digital systcm on ii chip sealizatiiin.

Authorized licensed use limited to: Eindhoven University of Technology. Downloaded on July 07,2010 at 10:28:41 UTC from IEEE Xplore. Restrictions apply.