hardware accelerated video streaming with v4l2 on i.mx6q

55
Hardware accelerated video streaming with V4L2 on i.MX6Q 06/24/2015 Gabriel Huau Embedded software engineer

Upload: adeneo-embedded

Post on 17-Aug-2015

443 views

Category:

Technology


42 download

TRANSCRIPT

Page 1: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware accelerated videostreaming with V4L2

on i.MX6Q

06/24/2015

Gabriel Huau

Embedded software engineer

Page 2: Hardware accelerated video streaming with v4l2 on i.MX6Q

SESSION OVERVIEW

1. Introduction

2. Simple V4L2 application

3. V4L2 application using CSC

4. V4L2 application using OpenGL

5. V4L2 application using OpenGL and vendor specific features

6. V4L2 application using OpenGL and vendor specific features

(zero copy)

7. Conclusion

Page 3: Hardware accelerated video streaming with v4l2 on i.MX6Q

ABOUT THE PRESENTER

• Embedded Software Engineer at Adeneo Embedded

(Bellevue, WA)

I Linux / Android

♦ BSP Adaptation

♦ Driver Development

♦ System Integration

I Former U-Boot maintainer of the Mini2440

I Contributing to a Home Automation project: Domotab

Page 4: Hardware accelerated video streaming with v4l2 on i.MX6Q

Introduction

Page 5: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Introduction

WHAT'S V4L2?

• Video For Linux version 2

• An API to access video ICs devices (/dev/videoX)

• Standard System Call: write()/read()

• A framework to regroup common code for every driver

• Handle audio, brightness, contrast, ...

5

Page 6: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Introduction

SET YOUR GOALS

• Resolution: HD, full HD, VGA, ...

• Frame rate to achieve: does it matter?

• Image processing: rotation, scaling, post processing

effects, ...

• Hardware availability:

I CPU performances

I GPU

I Image Processing IP (IPU, DISPC, ...)

6

Page 7: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Introduction

WHY ARE WE HERE?

• Covering the development of a full V4L2 application

• Explaining advantages and drawbacks

• Showing real customer solutions

7

Page 8: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Introduction

HARDWARE SELECTION

• Freescale i.MX6Q

SabreLite

• Popular platform

• Geared towards

multimedia

8

Page 9: Hardware accelerated video streaming with v4l2 on i.MX6Q

Simple V4L2 application

Page 10: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

MEMORY MANAGEMENT

Different way to handle video capture buffers:

• V4L2_MMAP: memory mapping => allocated by the kernel

• V4L2_USERPTR: user memory => allocated the user

application

Only MMAP will be covered in this presentation.

Support

Drivers don’t necessarily support both methods

10

Page 11: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

11

Page 12: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Query capabilities:

1 ioctl(fd, VIDIOC_QUERYCAP, &cap);2

3 if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE))4 exit(EXIT_FAILURE);5

6 if (!(cap.capabilities & V4L2_CAP_STREAMING))7 exit(EXIT_FAILURE);

Support

All camera does not necessarily support both Streaming/Video

Capture

12

Page 13: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Reset cropping area:

1 ioctl(fd, VIDIOC_CROPCAP, &cropcap);2

3 crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;4 crop.c = cropcap.defrect;5 ioctl(fd, VIDIOC_S_CROP, &crop);

The area to capture/view needs to be defined

13

Page 14: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Set video format:

1 fmt.fmt.pix.width = WIDTH;2 fmt.fmt.pix.height = HEIGHT;3 fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;4 fmt.fmt.pix.field = V4L2_FIELD_ANY;5 ioctl(fd, VIDIOC_S_FMT, &fmt);

Support

VIDIOC_ENUM_FRAMESIZES should be used to enumerate

supported resolution

14

Page 15: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Request buffers:

1 req.count = 4;2 req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;3 req.memory = V4L2_MEMORY_MMAP;4 ioctl(v4l2_fd, VIDIOC_REQBUFS, &req);

4 capture buffers need to be allocated to store video frame from

the camera

15

Page 16: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Query buffers:

1 for (n_buffers = 0; n_buffers < req.count; n_buffers++) {2 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;3 buf.memory = V4L2_MEMORY_MMAP;4 buf.index = n_buffers;5

6 ioctl(v4l2_fd, VIDIOC_QUERYBUF, &buf);7 buffers[n_buffers].length = buf.length;8 buffers[n_buffers].start = mmap(NULL, buf.length,9 PROT_READ | PROT_WRITE, MAP_SHARED,10 v4l2_fd, buf.m.offset);11 }

Memory information such as size/adresses need to be retrieved

and stored in the User Application

16

Page 17: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Start capturing frames:

1 for (i = 0; i < n_buffers; ++i) {2 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;3 buf.memory = V4L2_MEMORY_MMAP;4 buf.index = i;5

6 ioctl(v4l2_fd, VIDIOC_QBUF, &buf);7 }8

9 type = V4L2_BUF_TYPE_VIDEO_CAPTURE;10 ioctl(v4l2_fd, VIDIOC_STREAMON, &type);

Capture buffers need to be queued to be filled by the V4L2

driver

17

Page 18: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

ARCHITECTURE

Rendering loop:

1 /* Dequeue */2 ioctl(v4l2_fd, VIDIOC_DQBUF, &buf);3

4 /* Conversion from NV12 to RGB */5 frame = convert_nv12_to_rgb(buffers[buf.index].start);6 display(frame);7

8 /* Queue buffer for next frame */9 ioctl(v4l2_fd, VIDIOC_QBUF, &buf);

Framebuffer pixel format is RGB

YUV clarification

YUV: different formats exist (Planar/Semi Planar); in this exam-

ple we are using NV12.

18

Page 19: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

DEMONSTRATION

19

Page 20: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Simple V4L2 application

CONCLUSION

Drawbacks:

• Poor performances

• Cannot do any geometric transformation

(rotation/scaling/translation)

Advantages:

• Easy to implement

20

Page 21: Hardware accelerated video streaming with v4l2 on i.MX6Q

V4L2 application using CSC

Page 22: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 application using CSC

ARCHITECTURE

What we had:

• Conversion is really slow by the

CPU

• memcpy can be overkill

22

Page 23: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 application using CSC

ARCHITECTURE

What could be done:

• Do the conversion by the

hardware (IPU)

23

Page 24: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 application using CSC

CONCLUSION

Drawbacks:

• Limited to small resolution generally (1024x1024 on this

SoC)

Advantages:

• Hardware support for fast conversion compared to CPU

• Supports geometric transformations (Rotation/Scaling)

24

Page 25: Hardware accelerated video streaming with v4l2 on i.MX6Q

V4L2 application using OpenGL

Page 26: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we had:

• IPU is resolution limited (needs

splitting into different tasks)

26

Page 27: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we are going to do:

• Using GPU with OpenGL

• Do the conversion on the GPU

via shaders

27

Page 28: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

TEXTURES

Generate two textures for planar Y and UV:

1 glGenTextures (2, textures);

Texture pixel format

No ’standard’ support in OpenGL for YUV texture

Image data: YYYYYY UVUVUVUVUV...

28

Page 29: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

RENDERING LOOP

1 /* Dequeue */2 in = ...3

4 glActiveTexture(GL_TEXTURE0);5 /* Y planar */6 glBindTexture(GL_TEXTURE_2D, textures[0]);7 glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE,8 width, height, 0, GL_LUMINANCE,9 GL_UNSIGNED_BYTE, in);10

11 ...12

13 /* Queue */14 ...

• GL_LUMINANCE has a size of 8 bits => Y

• Bind it to our first texture

29

Page 30: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

RENDERING LOOP

1 /* Dequeue */2 in = ...3

4 glActiveTexture(GL_TEXTURE1);5 /* UV planar */6 in += (width*height);7 glBindTexture(GL_TEXTURE_2D, textures[1]);8 glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_ALPHA,9 width/2, height/2, 0, GL_LUMINANCE_ALPHA,10 GL_UNSIGNED_BYTE, in);11

12 /* Queue */13 ...

• GL_LUMINANCE_ALPHA has a size of 16 bits => UV

• Bind it to our second texture

• Shaders don’t need anything else

30

Page 31: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

Vertex shader:

1 void main(void) {2 opos = texpos;3 gl_Position = vec4(position, 1.0);4 }

opos is the texture position => send it to the

Fragment Shader for color conversion.

31

Page 32: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

Fragment shader:

1 void main(void) {2 yuv.x=texture2D(Ytex, opos).r;3 yuv.yz=texture2D(UVtex, opos).ra;4 yuv += offset;5 r = dot(yuv, rcoeff);6 g = dot(yuv, gcoeff);7 b = dot(yuv, bcoeff);8 gl_FragColor=vec4(r,g,b,1);9 }

• texture2D(Ytex, opos).r =>

GL_LUMINANCE

• texture2D(Ytex, opos).ra =>

GL_LUMINANCE_ALPHA

• Do the conversion using the GPU

32

Page 33: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

• Copy V4L2 buffer to OpenGL textures

• Vertex Shader: draw a quad => the view

• Fragment Shader: convert and fill fill the quad/triangles =>

the video

• Display the frame

33

Page 34: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

DEMONSTRATION

34

Page 35: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

CONCLUSION

Drawbacks:

• Need some OpenGL skills

Advantages:

• Good performance

• Handle geometric transformation

(rotation/scaling/translation)

• CPU load is low

• Generic solution compared to the IPU

35

Page 36: Hardware accelerated video streaming with v4l2 on i.MX6Q

V4L2 application using OpenGL and

vendor specific features

Page 37: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we had:

37

Page 38: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we are going to do:

• Handle YUV OpenGL

Texture directly => no

shader conversion

anymore

38

Page 39: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

RENDERING LOOP

1 /* Get a GPU pointer */2 glTexDirectVIV (GL_TEXTURE_2D, V4L2_WIDTH, V4L2_HEIGHT,

GL_VIV_NV12, &pTexel);3

4 /* Dequeue */5 ...6

7 glBindTexture(GL_TEXTURE_2D, textures[0]);8 memmove(pTexel, buffers[buf.index].start, width * height * 3/2);9 glTexDirectInvalidateVIV(GL_TEXTURE_2D);10

11 /* Queue */12 ...

39

Page 40: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

Vertex shader:

1 void main(void) {2 opos = texpos;3 gl_Position = vec4(position, 1.0);4 }

40

Page 41: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

Fragment shader:

1 void main(void) {2 yuv=texture2D(YUVtex, opos);3 gl_FragColor=vec4(yuv,1);4 }

41

Page 42: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

• Copy V4L2 buffer to OpenGL textures

• Vertex Shader: draw a quad => the view

• Fragment Shader: fill the quad => the video

• Display the frame

42

Page 43: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

DEMONSTRATION

43

Page 44: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

CONCLUSION

Drawbacks:

• Need some skills in OpenGL and GPU API

Advantages:

• Good performance

• Handle geometric transformation

(rotation/scaling/translation)

• Application is less complex => no conversion code needed

anymore

44

Page 45: Hardware accelerated video streaming with v4l2 on i.MX6Q

V4L2 application using OpenGL and

vendor specific features (zero copy)

Page 46: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we had:

46

Page 47: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

What we are going to do:

• Remove the

memcpy/memmove by

using DMA

47

Page 48: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

RENDERING LOOP

1 /* Dequeue */2 ...3

4 glBindTexture (GL_TEXTURE_2D, textures[0]);5 /* Physical and Virtual addresses */6 glTexDirectVIVMap(GL_TEXTURE_2D, width, height, GL_VIV_NV12, &

buffers[buf.index].start, &(buffers[buf.index].offset));7 glTexDirectInvalidateVIV(GL_TEXTURE_2D);8

9 /* Queue */10 ...

Physical address can be retrieve during the query buffer step.

48

Page 49: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

ARCHITECTURE

• Copy V4L2 buffer to OpenGL textures by using the DMA

• Vertex Shader: draw a quad => the view

• Fragment Shader: fill the quad => the video

• Display the frame

49

Page 50: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

DEMONSTRATION

50

Page 51: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 V4L2 OpenGL

CONCLUSION

Drawbacks:

• Need some OpenGL skills and GPU API

Advantages:

• No more memory copy (memcpy/memmove)

• Good performance

• Handle geometric transformation

(rotation/scaling/translation)

• Application is less complex => no conversion code needed

anymore

51

Page 52: Hardware accelerated video streaming with v4l2 on i.MX6Q

Conclusion

Page 53: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Conclusion

CONCLUSION

• Highly hardware dependent

• Other solutions: NEON, parallel computing, distributed

systems, ...

53

Page 54: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Conclusion

QUESTIONS?

54

Page 55: Hardware accelerated video streaming with v4l2 on i.MX6Q

Hardware V4l2 Conclusion

REFERENCES

• Fourcc: http://www.fourcc.org/

• Kernel Documentation: https://www.kernel.org/

v4l2-framework.txt

• Freescale GPU VDK

55