Page 1: Exploring Internet of Thing s Processing on Currie with ...mercury.pr.erau.edu/~siewerts/extra/papers/Curie... · Exploring Internet of Thing s Processing on Currie with Quark Sam

Exploring Internet of Things Processing on Curie with Quark

Sam Siewert

Abstract

The Intel Curie System-on-Chip module includes an Intel x86 Quark and a Synopsys ARC sensor co-processor, yet it is small enough to be wearable and is designed to be built into clothing, gear and the emerging world of the Internet of Things (IoT). Key features that make this tiny but mighty SoC ideal for makers, inventors, and researchers working on IoT sensor networks and wearables are not just the core features, but the built-in 6-axis accelerometer and gyroscope as well as co-processors for digital signal processing (DSP). Likewise, the Curie Quark is supported by Zephyr, a Linux Foundation project to support micro kernel applications running on IoT SoCs, cross developed on Linux, Windows or Mac OS-X. Based on experience, the Curie Quark can be used in Arduino mode for a quick start and rapid prototyping by makers, but can also be re-flashed with Zephyr for the most efficient and unconstrained use of sensors, DSP co-processors and the very capable core processor. This paper summarizes my experience getting to know Curie Quark, first in Arduino mode, then with Zephyr, and finally by coding and porting favorite pattern matching, DSP, and core processor benchmarks used for audio and image processing. While many compelling DSP and pattern analysis applications run on large scale SoCs such as FPGA and GP-GPU chips drawing 10 Watts or more, what’s fascinating about Curie Quark is imagining what can be done with the most processing power per gram, using the least energy possible, to build the world’s smallest intelligent and deeply embedded applications.

Introduction to Curie / Quark and Software Development

Along with my student research team at Embry Riddle Aeronautical University and University of Colorado, my research is focused on how to do more with less power in sensor networks and with smart cameras. A primary interest is how much can be done per Watt and per gram when networked with low-energy wireless links like BLE (Bluetooth Low Energy), which is found on the Curie Quark. Scale up is interesting, but only when combined with scale down so that the Internet of Things can connect to mobiles and the Cloud. Curie Quark has been designed for wearable applications, such as tracking sports enthusiasts with sensor networks built into gear and clothing. As an education and research team, we can imagine interesting uses for small mobile robots, closer to insect size, with coin-cell power and BLE connections to uplink data in real-time. While the idea of tiny sensor networks has been around since the introduction of Berkeley Motes, most of these smart sensors have very low compute capability, even on a per gram and per Watt basis, and are often not easy to program. So, when presented the opportunity to test Curie Quark, I wanted to focus on significant pattern matching algorithms like clustering for segmentation and 2D transforms used for image processing (imagine a fly-eye on our Curie Quark with an 80x60 photometer) as well as 1D transforms for inertial sensors and audio. For DSP, likewise, Curie is put to the test with 2D convolutions (used to sharpen images), the Fast Fourier Transform (for audio analysis and sensor frequency domain analysis), and finite impulse response filters. Finally, for core functions, I chose to port custom error correction code memory encoding and a simple prime number hunter to test the mettle of the Curie Quark.

Based on past experience with software-defined smart cameras, I felt that it’s important that all tests be published and available for replication, so the reader can learn about Curie Quark and draw their own conclusions about the performance. To test Curie Quark well for wearable and deeply embedded uses, the focus is not just on how fast each benchmark runs, but also on measuring efficiency. So, for example, metrics such as samples processed per second per Watt should be considered, as well as the capabilities of the Curie Quark compared to other options using per gram metrics. My research is in fact focused on efficient intelligent image processing and use of sensors in the Arctic running on fuel cells, which is demanding in terms of Watts, grams and capability [1]. Rather than summarizing those metrics here, the programs to build, flash, run and observe are provided.
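The efficiency metrics named above reduce to simple ratios. The following is a minimal sketch in C of how a reader might compute them from their own measurements; the struct, field names, and function names are hypothetical and are not part of the benchmark code provided with this paper.

```c
#include <assert.h>

/* Hypothetical record of one benchmark run; all names are illustrative. */
typedef struct {
    double samples;  /* samples (e.g. pixels) processed in the run */
    double seconds;  /* elapsed time for the run */
    double watts;    /* average power drawn during the run */
    double grams;    /* mass of the module under test */
} bench_run_t;

/* Throughput normalized by power: samples / second / Watt. */
double samples_per_sec_per_watt(const bench_run_t *r)
{
    assert(r->seconds > 0.0 && r->watts > 0.0);
    return r->samples / r->seconds / r->watts;
}

/* Throughput normalized by mass: samples / second / gram. */
double samples_per_sec_per_gram(const bench_run_t *r)
{
    assert(r->seconds > 0.0 && r->grams > 0.0);
    return r->samples / r->seconds / r->grams;
}
```

Power and mass must be measured separately (for example with a bench meter and a scale); the functions only normalize a measured throughput.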

Fast Start with Arduino

For the impatient, the Curie Quark can be brought up and interactively programmed just like any Arduino board. The Arduino 101 form factor of the Curie Quark allows for USB interaction using Arduino tools. Just add the Arduino 101 with Boards Manager as shown in Figure 1. I had never used Arduino before, but found this to be a great way to get going fast to test functionality and explore before digging into Zephyr to write custom C and C++ for a bare metal micro kernel application.

Figure 1: Using Curie Quark as an Arduino 101 for Learning Curve Rapid Ascent

The simple serial hello counter application I wrote is shown in Figure 2. What’s nice about the Arduino mode is that the user can learn while making use of Curie Quark right away, graduating to more custom Zephyr applications after mastering Arduino features. Shields for Arduino from Adafruit and others can be tested in Arduino mode as well before writing custom applications and device interfaces in Zephyr. I have been writing embedded C/C++ for OS-less, micro kernel, RTOS and embedded Linux platforms for more than 25 years, so I decided to jump into Zephyr sooner rather than later, bringing over my favorite real-time pattern, DSP, and core processor tests. Wind River and Intel also have future plans to support Rocket on the Curie Quark, so makers, researchers and inventors will have a range of platform development choices including Arduino, Zephyr micro kernel applications, and RTOS applications. This means getting a Curie Quark application or prototype out the door will be rapid and tailored to a range of uses without lots of porting headaches. For educators in embedded systems programs, and even in high schools, this means the Curie Quark is a sound investment for both early and advanced learning, something that not all IoT SoCs offer.

Figure 2: Using Curie Quark as an Arduino 101 for Learning Curve Rapid Ascent

Along with build and download of these basic Arduino mode sketches, it is also possible to compile most C programs for testing in Arduino mode using the avr-gcc and avrdude upload tool from the Arduino IDE [15, 16, 17, 18]. Either way, you’ll want a Linux box (I recommend Ubuntu 14.04 LTS or equivalent) or a Linux VM (Virtual Machine) installation on a Type-2 Hypervisor such as Oracle’s Virtual Box. The native Linux development environment will be the fastest and easiest to use with the USB interface to the Arduino, but Oracle Virtual Box is a reasonable alternative so you can run Linux and develop on Linux using your existing Windows or Mac OS-X machine. I have a number of Linux development getting started How-To’s that you can find on my Embry Riddle website [2]. The Virtual Box boot of Ubuntu Linux is shown in Figure 3. To get beyond sketch mode with Arduino, it is possible to cross-compile code for the Arduino 101. This is what I found posted on April 21 by Martino Facchin for Curie C/C++ compilation in Arduino mode [20]. It might be an interesting test case, since makers may want to work in Arduino mode as well as with cross development using Zephyr, and it’s useful to go beyond Arduino sketches to build more custom code. However, I decided to just move on to build Zephyr nanokernels for the ARC and microkernels for the Quark. I found Zephyr well documented and easy to learn, and the nanokernel and microkernel resources along with Linux cross development tools make native application development fast and easy, yet powerful.

Figure 3: Using Curie Quark as an Arduino 101 on a Linux VM

The main issue with trying to use Virtual Box Linux with Zephyr is proper configuration of USB ports so that you can use the serial over USB and the JTAG. Rather than going deep on how to do this through the virtual driver interfaces, I’ll assume that the reader (like me) would prefer to just work with a native Linux installation for cross development. I installed Linux Ubuntu 14.04 LTS (Long Term Support) using a USB drive on my existing Lenovo T450 Windows 7 laptop (splitting the drive in half to dual boot) as described on numerous Ubuntu help blogs [22]. The more I played with Arduino 101 in Arduino sketch mode, the more I wanted to jump into Zephyr. So, let’s move on to the custom benchmark tests (please do download, try and send me your comments on what you find). My education and research group at Embry Riddle, CU Boulder and U. of Alaska believes in replication and that the only good benchmark is one you can run, modify, and build into an application. All Arduino 101 quick start tests were done on the same Lenovo T450 dual boot Windows 7 and Ubuntu, Intel Core i5 laptop. A dual boot laptop is a very nice thing to have for your Curie / Quark development. Or, perhaps you can just repurpose an old laptop that’s collecting dust, or pick one up on eBay.

Embedded Software Cross Development

The Zephyr SDK and tools simplify native code development and download. The benchmarks described here, which drive the Curie Quark with Zephyr kernels, were tested first on Linux (Ubuntu 14.04 LTS) as Linux applications and then built into a Zephyr microkernel application and downloaded to Curie Quark using Flyswatter2. Flyswatter2 is a JTAG (Joint Test Action Group) USB tool that can flash any image to the Curie SoC (System-on-Chip) for the Quark or the ARC processor. All work presented here used a simple Linux laptop running Ubuntu 14.04 LTS with an Intel Core i5 quad-core CPU (that also boots Windows). For those unfamiliar with Ubuntu Linux, try it out on Windows with Virtual Box using the How-To’s found on my Embry Riddle web site [2]. While Zephyr code can be developed on Windows, Linux or Mac OS-X, it is a Linux Foundation collaboration project, so Linux is probably the best development choice. The Linux Foundation has specific documentation for how to flash the Zephyr kernel on Arduino 101 (Curie/Quark) [19] that I followed. To flash the image, I used the Flyswatter2 JTAG in-circuit debugger [21]. To learn how to build and flash example Zephyr applications, follow these steps:

1. Once you have installed Ubuntu Linux 14.04 LTS on your laptop, do a “sudo apt-get update” to make sure it’s all current. Follow the https://www.zephyrproject.org/doc/getting_started/installation_linux.html instructions to install the Zephyr SDK on your Linux system.

2. Clone the Zephyr source code on Linux with “git clone https://gerrit.zephyrproject.org/r/zephyr zephyr-project” [23].

3. Set up the Flyswatter2 [21] and make sure it works by verifying expected plug-in detection with dmesg [19] which is detailed well by the Linux Foundation for the Arduino 101.

4. Add a .zephyrrc file in your Linux home directory and in it, place “export ZEPHYR_SDK_INSTALL_DIR=/opt/zephyr-sdk”, assuming you installed the SDK in the default location like I did. Also, add “export ZEPHYR_GCC_VARIANT=zephyr” to the same file on the next line.

5. In the cloned zephyr-project, source the setup with “source zephyr-env.sh”, which should in turn source your home directory .zephyrrc and should define the SDK installation directory for you.

6. Change directory with “cd $ZEPHYR_BASE”.

7. Make a backup of the current ROM on the Arduino 101 (Curie / Quark) with “./boards/arduino_101/support/arduino_101_backup.sh”, which takes a while, but is also a good test of your Flyswatter2 setup in Linux and should create files A101_OS.bin and A101_BOOT.bin (see Figure YY).

8. Now, in $ZEPHYR_BASE/boards/arduino_101/support, do “./arduino_101_load.sh rom”, which flashes the Zephyr boot loader (see Figure ZZ).

9. Just to make sure all of your environment is defined correctly, do “source zephyr-env.sh” again and then “printenv | grep ZEPHYR” and you should see all three key environment variables set (see Figure AAA).

10. Now, in $ZEPHYR_BASE/samples/hello_world/nanokernel do “make pristine && make BOARD=arduino_101_sss ARCH=arc” to make sure you have a binary image built. It should build without error (if you have issues, see Zephyr Getting Started for more help).

11. Now, flash the ARC kernel with “make BOARD=arduino_101_sss flash” in the same directory where you did the build above and when you see “Done flashing”, you have now flashed the demonstration hello_world image to the ARC processor.

12. Now build the x86 kernel with “make pristine && make BOARD=arduino_101 ARCH=x86”.

13. After a successful build, then “make BOARD=arduino_101 flash”.

14. Finally, using a USB to TTL serial cable, verify the x86 and ARC kernel output for the hello_world example. The Linux Foundation Zephyr page recommends a couple of cables including this one from Adafruit that I used.

15. If for some reason you don’t yet have an Arduino 101, a Flyswatter, or the serial cables to verify output, you can build applications for QEMU (the Quick Emulator), for example with “make BOARD=qemu_x86 qemu” instead of step 12, and QEMU will execute the Zephyr kernel and example right away in emulation.

Note that the Synopsys ARC co-processor is intended to run a nanokernel image to transform sensor input, which, in turn, can be further processed by the x86 Quark. As such, the Quark waits on the ARC co-processor to boot and run, so for the purpose of this paper, I just flashed “hello world” so it would always boot, but I did not use it otherwise. It would have value for sensor I/O and pre-processing for the Quark, so I plan to investigate it more in future work.

Building and Flashing Example Benchmark Zephyr Code

Now that we’ve gone through build and flash of Linux Foundation Zephyr examples, let’s try building the examples I’ve created for this paper. For building a custom application, you’ll want to refer to the Linux Foundation Zephyr “Application Development Primer”. To build and flash the Zephyr benchmark applications follow these steps:

1. The simplest way to add a new microkernel Zephyr application is to create a new directory for your project in $ZEPHYR_BASE/samples, so for example I created $ZEPHYR_BASE/samples/Quark.

2. In the Quark directory, I have the pattern-match directory I used to test my code in Linux; in that directory, I created dct2_quark and copied everything from samples/hello_world/microkernel into it.

3. I then adapted the code in the subdirectory src/main.c to be my dct2.c test program.

4. To test it, I first tried it out with QEMU, building with “make BOARD=qemu_x86 qemu” and ran it to make sure it builds and runs on the emulator.

5. Now for the x86 kernel, build it with “make pristine && make BOARD=arduino_101 ARCH=x86”.

6. Make sure you have your Flyswatter JTAG connected as shown in Figure 4 noting the pin position of the red wire on the ribbon cable. After a successful build, then “make BOARD=arduino_101 flash”.

7. Test using the serial cable connected as shown in Figure 5 with the Adafruit FTDI cable green wire on the RX terminal, white wire on the TX terminal and black on Ground terminal (the power is floating if you power the board via USB, which I did).

This is how I created Quark versions of all the benchmarks I originally developed on Linux and tested there for comparison. Start with something simple just to get the process down. You should, for example, flash the samples/hello_world nanokernel on both the Synopsys ARC sensor processor and the x86. Note that you must have something flashed on the ARC sensor processor or your Zephyr will hang, so just flash hello_world to it and forget about it if you don’t have specific sensor pre-processing you want it to do for you. The nanokernel Zephyr is smaller and well adapted to ARC sensor processing. I recommend the microkernel build for the x86 Quark so you have a few more kernel features and libraries to help you write your application.

Figure 4: Flyswatter JTAG Cabling and Configuration

The Flyswatter is a JTAG debugger with a USB interface to your Linux laptop that can be used with the Zephyr tools to flash and single-step debug the processor. As with most embedded systems, with Zephyr you can cross-compile the nanokernel or microkernel on your host and then download and flash an image to run.

Figure 5: Adafruit Serial Cable Connection to Curie / Quark

The Adafruit FTDI cable provides a USB connection to any Windows, Mac OS-X or Linux system where you can run a terminal emulator program. I chose to run PuTTY on my Windows machine, but I could have used minicom, for example, on Linux. Serial setup can be found here - https://www.adafruit.com/products/954 . If you set up your serial and use PuTTY on Windows to communicate with your Curie / Quark as I did, you’ll find the Adafruit examples for setup quite useful for their FTDI cable; however, they do not yet have a Curie / Quark specific example, so I captured the cabling details shown in Figure 5. The only other trick is to note the COM port on Windows that is assigned to the FTDI chip once you download the Prolific driver. You can find this in Windows by using the System Device Manager in Control Panel as shown in Figure 6.

Figure 6: Verifying COM port Used by Adafruit FTDI Cable and Prolific driver

Notice that in my case, the Prolific driver assigned COM6 to my cable, but your assignment may be different. Use this in your PuTTY setup for serial.

Performance and Benchmarking

The Curie Quark was tested with 3 main categories of benchmarks that are of interest to IoT sensor networks and wearable computing. First, we explore pattern matching and pattern compression. Second, we look at DSP transforms (from time series to frequency), convolutions (edge sharpening of small images), and filtering (with finite impulse response). Third, we exercise the core processor with error correction code encoding and a prime number hunter. Let your imagination run wild: this could be the basis of some augmented reality application like a heads-up display on ski goggles, a smart sensor network built into clothing or just that Internet toaster we’ve all been waiting on. The key is small, low power, low mass, but very capable in terms of processing sensor data, and able to link data to other mobile devices or sensors in a network to get data to a user or the Cloud.

Pattern Analysis

I originally developed the benchmark examples that follow for Linux-based courses in operating systems and embedded Linux digital media and computer vision taught at University of Colorado; they were easily adapted for build and test with Zephyr. Pattern analysis and recognition is typically applied to 1 dimensional time series data (or frequency transformed time series), but is also often applied to 2 dimensional area scans, that is, images. While an SoC as small as Curie Quark has modest I/O bandwidth compared to larger SoCs with PCI-Express, it can host shields from Adafruit and other Arduino suppliers that include audio ADC (Analog to Digital Converter) decoders and DAC (Digital to Analog Converter) encoders for 1D audio signals [web 10]. Likewise, Adafruit has a TTL (Transistor-Transistor Logic) level 5 volt serial JPEG (Joint Photographic Experts Group) camera, which is just the type of fly eye photometer I have really wanted to work with along with bigger photometers I already use. Inspired by both types of sensors and the built-in 6-axis inertial sensors in Curie Quark, the goal is to test real-time transforms that could be used to feed machine learning and advanced wearable human computer interfaces (stereo ski goggles). First, a 2D DCT (Discrete Cosine Transform) was tested, which is the basis for JPEG and MPEG (Moving Picture Experts Group) image compression. Digital cinema and JPEG2000 use wavelets instead, but both approaches are useful for dealing with 2D real-time data. The DCT almost seems magical, but is based upon the simple idea of constructing arbitrary waveforms by adding many simpler ones together, as shown in Figure 7.

Figure 7: Approximating Complex Waveforms with Simpler, A Basis for Compression

Note that in Figure 7, the left image shows simple adding of 2 cosine functions to produce a more interesting composite function (the basic principle behind the DCT). In the right image, we see approximation of a scan-line intensity waveform by waveform composition and the much smaller magnitude and higher frequency error function (shown in green). Figure 8 shows application of the DCT to tiles in a 2D spatial transform. For those not familiar with this, it often appears magical, but in fact we are just recovering the original data that was transformed by using an inverse DCT, as found in our example benchmark, which is to a large degree the basis of I-frame MPEG compression.
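For reference, the forward and inverse transforms discussed above can be written out for an N x N tile. This is the commonly used orthonormal 2D DCT-II form (N = 8 gives the JPEG tile size); it is stated here as an assumption, since the dct2 benchmark may use a different scaling convention. Here f(x, y) is the pixel intensity and F(u, v) is the transformed coefficient.

```latex
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right]

f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right]

C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

With this scaling the inverse recovers f(x, y) up to floating point rounding, which is the round trip that the DCT followed by the inverse DCT exercises.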

Figure 8: Compressing Data – A thumbnail of a Local Cactus

Just looking at the DCT transformed image, we can see that compression methods applied to the transformed image, such as Huffman and run-length encoding, will reduce the data stored for the image. For Curie / Quark, given that the dct2_quark code is located in your zephyr-project directory under samples (as a new sample you’re working on), just use the following commands to build, test, flash and run your code:

1. make pristine; make BOARD=qemu_x86 qemu, to test with the QEMU emulator

2. make pristine; make BOARD=arduino_101 ARCH=x86

3. make BOARD=arduino_101 flash

When you flash, the board should reset and you should see the following DCT, followed by an inverse DCT test run 10 times as shown in Figure 9. The DCT is typically a transform done by MPEG encoders and decoders, which are dedicated ASIC (Application Specific Integrated Circuit) co-processors, but this is a good test of the capabilities of Quark. The 2D DCT can be valuable for custom compression methods and another 2D image compression technique, wavelet, is also of value for the fly eye concept.

Figure 9: Output from Discrete Cosine Transform and the Inverse on Quark

Clearly, if we can store basic waveform parameters rather than the wave data itself, this is much more efficient. The DCT itself can be inverted, so nothing is lost by the transform alone, but we can also truncate the transformed coefficients to form lossy compression of data. Like all waveform transformations used for compression, the point of the DCT is to take small tiles from images and to compress them by removing high frequency detail that is less important to viewers (lossy compression) as shown in Figure 8. Decoding of MPEG and JPEG can be done with a dedicated co-processor, but the ability to apply this transform is a great test of the Curie Quark. Transformation of 1D time series and 2D spatial data is fundamental to interesting IoT applications, especially those that can be aware of the environment and interface naturally to their human hosts. Similarly, pattern analysis that can segment a time series or spatial map (image) is also a basic need for analysis of continuous sensors, audio, and vision. To test the Curie Quark, I implemented K-means clustering, which is an NP-hard problem (in other words, no polynomial time algorithm is known that finds the optimal clustering, so in practice it is solved approximately by a heuristic). Figure 10 shows an example of K-means clustering of a color image that can be done using OpenCV (not used on Curie Quark).

Figure 10: Example of the value of K-means Clustering – Separation of Clouds, Sky, Birds and building (from http://opencvpython.blogspot.com/2012/12/k-means-clustering-2-working-with-scipy.html )

K-means can be attacked using Lloyd’s algorithm, which runs in time on the order of the product of n vectors, k clusters, d dimensions for each vector and i iterations, O(nkdi), and which has been implemented in MATLAB and in C based upon this approach [6], [7]. Or, put simply, it’s a tough workload whose cost grows with the product of four factors and that must run at a rate of say 30 Hz for segmentation of continuous intelligent vision. Keep in mind that a bumble bee can do this, but many processors can’t, and certainly not at the energy and mass efficiency of an insect. Perhaps Curie Quark is getting closer! The K-means clustering (or Lloyd’s) algorithm is very data intensive, so I took the small 40x30 sunset image and decimated it by two, for a 20x15 sub-sampled image with output as shown in Figure 11. My only goal was to cluster the sky and the water. The code in kmeans_quark was built as before and runs in reasonable time, but any higher resolution would be difficult on the Quark given memory limitations. Methods such as Sobel or Canny edge detection might be more productive on Quark for segmentation.
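To make the O(nkdi) cost concrete, here is a minimal sketch of Lloyd's algorithm in C for scalar gray-level samples (so d = 1). It is an illustration written for this discussion, not the kmeans_quark source; the function name kmeans_1d and its parameters are hypothetical, and the caller is assumed to supply the initial centroids.

```c
#include <assert.h>
#include <stddef.h>

/* Lloyd's algorithm on n gray-level samples with k clusters.
 * centroid[] holds the caller's initial guesses and is updated in place;
 * label[] receives the cluster index chosen for each sample.
 * Returns the number of centroid updates performed. */
int kmeans_1d(const unsigned char *data, size_t n,
              double *centroid, size_t k,
              unsigned char *label, int max_iter)
{
    assert(n > 0 && k > 0 && k <= 255 && max_iter > 0);
    int iter;
    for (iter = 0; iter < max_iter; iter++) {
        int changed = 0;
        /* Assignment step: nearest centroid by squared distance, O(n*k). */
        for (size_t i = 0; i < n; i++) {
            size_t best = 0;
            double best_d = -1.0;
            for (size_t c = 0; c < k; c++) {
                double diff = (double)data[i] - centroid[c];
                double d = diff * diff;
                if (best_d < 0.0 || d < best_d) {
                    best_d = d;
                    best = c;
                }
            }
            /* label[] is uninitialized on the first pass, so always store. */
            if (iter == 0 || label[i] != best) {
                label[i] = (unsigned char)best;
                changed = 1;
            }
        }
        if (!changed)
            break;  /* converged: no sample moved to another cluster */
        /* Update step: move each centroid to the mean of its members. */
        for (size_t c = 0; c < k; c++) {
            double sum = 0.0;
            size_t count = 0;
            for (size_t i = 0; i < n; i++) {
                if (label[i] == c) {
                    sum += data[i];
                    count++;
                }
            }
            if (count > 0)
                centroid[c] = sum / (double)count;
        }
    }
    return iter;
}
```

With k = 2 and initial centroids at the dark and bright ends of the gray scale, this would split a gray map into two groups, along the lines of the sky and water goal described above. Extending it to color vectors adds the inner loop over d dimensions.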

Figure 11: Quark Microkernel output from K-means Clustering Benchmark

Digital Signal Processing

The second set of benchmarks developed for Zephyr testing were either developed by myself or available as open source and readily adapted to testing on Linux and with Zephyr. I have a number of Linux application benchmarks including the Fast Fourier Transform, which is fundamental to DSP. The FFT is a fast algorithm for computing the DFT (Discrete Fourier Transform), the sampled counterpart of the general Fourier transform, and is one of the most versatile methods to analyze a time series in terms of the frequency components in a signal. I have developed an FFT using the Cooley-Tukey method, an order n log(n) transform which can also be found on Rosetta Code in C, C++, Ada and many other programming language implementations. One application for the FFT is to take audio, music even, and to analyze the frequency components to recognize tones and even musical notes, often described with MIDI (Musical Instrument Digital Interface). The FFT, along with filters like FIR (Finite Impulse Response), is most useful for audio and other single dimension, single channel signals rather than spatial data, which is more my interest. Two dimensional transforms are more processing intensive, so I decided to follow my interests and really push the Quark for DSP. I have not yet ported my FFT code to the Quark, but plan to do so. In the meantime, I decided to port code I wrote myself to apply a 2D PSF (Point Spread Function) convolution used for image enhancement as shown in Figure 12.
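For readers new to the FIR filter mentioned above, a direct-form implementation is only a few lines of C. This is a minimal sketch under stated assumptions, not one of the benchmarks from this paper: the function name fir_filter is hypothetical, samples before the start of the input are treated as zero, and the coefficients are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: y[i] = sum over t of h[t] * x[i - t].
 * Samples before x[0] are treated as zero. x and y must not alias. */
void fir_filter(const float *x, float *y, size_t n,
                const float *h, size_t taps)
{
    assert(x != y && taps > 0);
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        /* Stop at t == i so that x[i - t] never reads before x[0]. */
        for (size_t t = 0; t < taps && t <= i; t++)
            acc += h[t] * x[i - t];
        y[i] = acc;
    }
}
```

Four taps of 0.25 each give a moving average, a simple low-pass filter that could smooth a single inertial sensor channel. The work is n times taps multiply-accumulates, which is small next to the 2D workloads described in this section.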

Figure 12: Use of a PSF Convolution to Sharpen an Image

To complete our 2D DSP analysis of Quark, I have included the image convolution, a point spread function, which computes new values of image pixels (X, Y data maps) based on neighboring values. This simple convolution can be used to sharpen, blur, emboss, and more [3]. It’s a classic 2D DSP algorithm. All of the Linux pattern and DSP benchmarks (2D DCT, K-means and PSF) can be built on Linux just by typing “make” in the directory where you download them to compare to the Quark cross-compiled versions. For the Curie Quark, we must use the Zephyr microkernel tools as described earlier. For Curie / Quark, given that the sharpen_quark code is located in your zephyr-project directory under samples (as a new sample you’re working on), just use the following commands to build, test, flash and run:

1. make pristine; make BOARD=qemu_x86 qemu, to test with the QEMU emulator

2. make pristine; make BOARD=arduino_101 ARCH=x86

3. make BOARD=arduino_101 flash

For example, I have provided a use case of the Intel SIMD (SSE) instructions in an earlier paper as shown by the sharpened sunset picture in Figure 12 [4]. In a 40x30 graymap format, a nice fly eye size, the image looks like the thumbnail and hex data shown in Figure 13.

Figure 13: Fly-Eye Sunset Image at 40x30 4:3 Aspect Ratio

The sharpen_quark example was built to test this 40x30 (aspect ratio 4:3) fly-eye image, which might be of value to an IoT sensor used in, say, robotics, with a PSF (Point Spread Function) to detect basic targets much like an insect. In my test using the microkernel at high priority, this results in a 5 Hz sharpening rate as seen below in Figure 14, with the final image dumped as hex data in case you want to capture it.

Figure 14: Fly-Eye Sunset Image PSF Running on Quark
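
To make the neighborhood computation concrete, here is a minimal sketch of a 3x3 sharpening convolution over a 40x30 graymap. It is not the sharpen_quark source. The kernel weights, the integer arithmetic, the pass-through handling of border pixels, and the names psf_convolve and sharpen_psf are all assumptions chosen for illustration.

```c
#include <stdint.h>

#define IMG_W 40
#define IMG_H 30

/* Assumed sharpening kernel; the weights sum to 1 so flat regions are unchanged.
 * It is symmetric, so convolution and correlation give the same result. */
static const int sharpen_psf[3][3] = {
    { 0, -1,  0},
    {-1,  5, -1},
    { 0, -1,  0}
};

/* Apply the 3x3 kernel to a row-major IMG_W x IMG_H 8-bit image.
 * Border pixels have incomplete neighborhoods and are copied through. */
void psf_convolve(const uint8_t *in, uint8_t *out)
{
    for (int y = 0; y < IMG_H; y++) {
        for (int x = 0; x < IMG_W; x++) {
            int acc = 0;

            if (y == 0 || y == IMG_H - 1 || x == 0 || x == IMG_W - 1) {
                out[y * IMG_W + x] = in[y * IMG_W + x];
                continue;
            }

            /* Weighted sum of the pixel and its eight neighbors. */
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    acc += sharpen_psf[ky + 1][kx + 1] *
                           in[(y + ky) * IMG_W + (x + kx)];
                }
            }

            /* Saturate to the 8-bit pixel range. */
            if (acc < 0) acc = 0;
            if (acc > 255) acc = 255;
            out[y * IMG_W + x] = (uint8_t)acc;
        }
    }
}
```

Each interior pixel costs nine multiply-accumulates, so a 40x30 frame needs roughly ten thousand of them per pass. Separate input and output buffers are required because the result at each pixel depends on unmodified neighbors.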

CPU, Memory and I/O Throughput

The final set of benchmarks developed for Zephyr testing are ones I use in teaching to test concurrent thread (or task) scaling, to compare built-in programming language concurrency and data structures against library versions, and generally to see how much processing can be done. The first simple benchmark is the Sieve of Eratosthenes, which is not the most mathematically advanced prime number finder, but it is the original algorithm, is easy to understand and well documented, and is a good test of a processor. The sieve can be implemented as threaded, sequential or recursive code, and it has formed a basis for finding the large primes and semi-primes used, for example, in cryptography. Given that the sieve code, sieve_quark, is located in your zephyr-project directory under samples (as a new sample you’re working on), just use the following commands to build, test, flash and run your code:

1. make pristine; make BOARD=qemu_x86 qemu, to test with the QEMU emulator

2. make pristine; make BOARD=arduino_101 ARCH=x86

3. make BOARD=arduino_101 flash

The algorithm is well documented on Wikipedia [https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes]. The Linux versions include C/C++ single thread iterating, recursive, threaded, Python and Java. For the Curie / Quark, I adapted the simple C version and ran it to find the primes between 0 and 1000 as shown in Figure 15.

Figure 15: Prime number search with Sieve of Eratosthenes on Curie / Quark
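
For readers who have not seen the algorithm written out, the following is a minimal sequential sketch of the sieve over the same 0 to 1000 range. It is not the sieve_quark source; the names sieve_primes and SIEVE_MAX and the one-byte-per-number flag array are assumptions made for clarity rather than memory efficiency.

```c
#include <stdint.h>
#include <string.h>

#define SIEVE_MAX 1000

/* Mark composites in is_composite[0..SIEVE_MAX] and return the number of
 * primes found. The caller provides the SIEVE_MAX + 1 byte flag array. */
int sieve_primes(uint8_t *is_composite)
{
    int count = 0;

    memset(is_composite, 0, SIEVE_MAX + 1);
    is_composite[0] = 1;  /* 0 and 1 are not prime */
    is_composite[1] = 1;

    /* Only primes up to sqrt(SIEVE_MAX) need to strike out multiples. */
    for (int p = 2; p * p <= SIEVE_MAX; p++) {
        if (is_composite[p]) continue;
        /* Smaller multiples of p were already struck by smaller primes. */
        for (int m = p * p; m <= SIEVE_MAX; m += p) {
            is_composite[m] = 1;
        }
    }

    for (int n = 2; n <= SIEVE_MAX; n++) {
        if (!is_composite[n]) count++;
    }
    return count;
}
```

On a memory-constrained target, packing the flags one bit per number would cut the array to an eighth of this size at the cost of some shifting and masking.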

Most embedded systems, especially those in mission critical applications where failure is not an option, must detect and correct errors in data. The errors can be introduced by cosmic rays (always worth trying to blame a software bug on cosmic rays), by EMI (Electromagnetic Interference) or by solar wind particle radiation (at altitude and in space). I coded a simple SECDED (Single Error Correction, Double Error Detection) example in C for a memory emulation interface to test encode, decode and recovery rates on the Curie Quark. While many of us have perhaps studied Hamming codes in school, they can be confusing, so here’s a brief primer, and for more detail, please see my textbook on real-time systems with Linux and RTOS [5]. First, for the normal case, where there are no bit-flips for bits at rest in memory, Figure 16 shows the basic encode and decode for a Hamming SECDED interface to memory or flash.

Figure 16: Hamming SECDED with No Errors

While the IoT is mostly fun, non-critical applications, there could be a need for mission critical sensor networks, even wearable ones, for emergency responders or extreme athletes (think base jumpers and wing suits). As such, various codes such as simple XOR, Hamming, less simple Reed-Solomon and other methods such as low-density parity-check codes are useful for detection and correction of errors in real-time for memory, for wireless links, and anywhere that bits are at rest or in flight. To get an idea of how Hamming SECDED encoding can correct a single bit error discovered on data access (a memory read), Figure 17 shows a bit flip in data bit 2, and how the syndrome (check bits) encodes the location of the bit that flipped.

Figure 17: Hamming SECDED with Single Bit Flip Error
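
The syndrome idea can be sketched in a few lines of C. This is not the secded_quark source: it assumes an 8-bit data word stored in a 13-bit code word, with check bits at positions 1, 2, 4 and 8, data bits at the remaining positions up to 12, and the overall parity bit at position 0. The bit layout in Figures 16 and 17 may differ, and all names here are hypothetical.

```c
#include <stdint.h>

enum secded_status { SECDED_OK, SECDED_CORRECTED, SECDED_UNCORRECTABLE };

/* Assumed layout: code word positions that carry data bits 0..7. */
static const uint8_t data_pos[8] = {3, 5, 6, 7, 9, 10, 11, 12};

/* XOR of the positions (1..12) of all set bits; zero for a valid word. */
static unsigned syndrome_of(uint16_t cw)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 12; pos++) {
        if (cw & (1u << pos)) s ^= pos;
    }
    return s;
}

/* Parity over all 13 bits; zero when the number of ones is even. */
static unsigned parity_of(uint16_t cw)
{
    unsigned p = 0;
    for (unsigned pos = 0; pos <= 12; pos++) {
        p ^= (cw >> pos) & 1u;
    }
    return p;
}

uint16_t secded_encode(uint8_t data)
{
    uint16_t cw = 0;

    for (int i = 0; i < 8; i++) {
        if (data & (1u << i)) cw |= (uint16_t)(1u << data_pos[i]);
    }
    /* Set check bits 1, 2, 4, 8 so that the syndrome becomes zero. */
    unsigned s = syndrome_of(cw);
    for (unsigned b = 0; b < 4; b++) {
        if (s & (1u << b)) cw |= (uint16_t)(1u << (1u << b));
    }
    /* Overall parity bit at position 0 makes the total parity even. */
    cw |= (uint16_t)parity_of(cw);
    return cw;
}

enum secded_status secded_decode(uint16_t cw, uint8_t *data)
{
    unsigned s = syndrome_of(cw);
    unsigned p = parity_of(cw);
    enum secded_status st = SECDED_OK;

    /* Non-zero syndrome with even overall parity: two bits flipped. */
    if (s != 0 && p == 0) return SECDED_UNCORRECTABLE;

    if (p == 1) {
        /* Odd overall parity: one flip, and the syndrome is its position
         * (0 means the overall parity bit itself). */
        if (s > 12) return SECDED_UNCORRECTABLE;
        cw ^= (uint16_t)(1u << s);
        st = SECDED_CORRECTED;
    }

    *data = 0;
    for (int i = 0; i < 8; i++) {
        if (cw & (1u << data_pos[i])) *data |= (uint8_t)(1u << i);
    }
    return st;
}
```

A memory read would call secded_decode on the stored word and, when it reports SECDED_CORRECTED, write the re-encoded value back so that a later second upset in the same word does not turn into an uncorrectable double error.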

To better illustrate bit-error correction, I created an Excel spreadsheet model for the Hamming SECDED that can be downloaded from my Embry Riddle website. The complete set of Hamming SECDED examples includes more bit-flip cases where not only data bits are flipped, but also the parity bits within the encoded word and the overall parity bit [5]. The extended Hamming code presented here is a SECDED code, which can recover from any single bit-flip anywhere in the stored word and can detect any double bit-flip. Triple bit-flips or more cannot be reliably detected or recovered, but are highly unlikely. Most often, bit-flips are due to an SEU (Single Event Upset), caused by cosmic rays and trapped particle radiation, which very often affects only a single bit at a time. Double bit errors most often occur when single bit errors go unnoticed and, over time, a second SEU just happens to impact the same word. If, for example, a Curie / Quark solution were to be used on a cube satellite or for a high altitude application, this is when memory protection is most critical. As before, the code in secded_quark can be built the same way as previous examples, first testing with the QEMU emulator, then flashing to the Quark, which should produce the output shown below in Figure 18.

Figure 18: Hamming SECDED Encode/Decode Testing on Quark

Both the Sieve code and SECDED can be built and run on Linux for comparison to the Quark microkernel versions.

Bluetooth Low Energy

Given all the great IoT and sensor network applications one can build with Curie Quark, it only makes sense to build a network of these devices into smart clothing (wearable computing), or to use them in sensor networks placed in-situ for monitoring (e.g. buildings and bridges for safety) or for protecting valuable items (for security). To do this, the Curie Quarks need to talk to each other, uplink data, and receive commands from mobile devices, dashboards, or even the Cloud via BLE to Ethernet bridges and routers. To start learning more about BLE, see zephyr-project/samples/Bluetooth and try building the beacon example and flashing it to test. Take a look also at zephyr-project/drivers/Bluetooth. Building, flashing and testing more of the samples provided for the arduino_101 board is the best way to learn. Note that many of the board specific examples will not run on the QEMU emulator and will only run on the Curie / Quark board, especially those that involve specific I/O hardware, use the x86 FPU (Floating Point Unit), or use specific ARC sensor co-processor features.

I plan to explore the Curie drivers and the ARC co-processor more in the future, since the I/O and I/O co-processing are a unique feature of Curie. Many of the features available in Arduino mode are now being developed in Zephyr and, in the future, may also be available with Wind River’s Rocket RTOS (Real-Time Operating System). Getting to know Curie and Quark with microkernel applications is a good start, but there’s plenty more to learn from Intel, the Linux Foundation and developer papers, where the goal is to share tips, tricks, technology and code.

Next Steps

We invite readers to download the code shared here, run it on your favorite Linux devices, and build and run it with Zephyr on the Curie Quark. Keep in mind that while performance will be higher on a Core i5 or i7, the capability of the Curie Quark normalized in terms of mass (grams) and Watts consumed per unit of performance will likely show advantageous power, size and mass efficiency. The fact that this tiny SoC can do any DSP, pattern analysis and high throughput CPU and memory transactions is pretty amazing. Computing is nowhere near the efficiency and intelligence of biological systems, and I mean even insects, which are smart, light and energy efficient. However, Curie Quark is a step in the right direction for sensor networks and the Internet of Things. Please download the code from http://mercury.pr.erau.edu/~siewerts/extra/code/Zephyr-IoT-examples/

Web Resources

1) http://inside.mines.edu/fs_home/dhale/jtk/bench/index.html

2) http://www.openpr.org.cn/

3) http://www.bdti.com/InsideDSP/2011/05/18/JeffBierImpulseResponse

4) http://www.fftw.org/

5) https://www.cs.virginia.edu/stream/

6) http://www.eembc.org/

7) https://inet.haw-hamburg.de/teaching/ss-2012/master-projects/Projekt2-Perrey.pdf

8) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4327007/

9) https://www.adafruit.com/products/1788 – Adafruit MP3 shield

10) https://www.adafruit.com/products/94 – Adafruit DAC/Audio shield

11) https://www.adafruit.com/products/802 - Adafruit low-resolution 1.8” Color TFT shield

12) https://www.adafruit.com/products/13886 – Adafruit TTL Serial JPEG camera

13) https://www.adafruit.com/products/2829 - Adafruit Bluefruit LE

14) https://www.adafruit.com/products/2995 - Adafruit Feather Bluefruit LE

15) http://www.java2s.com/Open-Source/Android_Free_Code/Hardware/energy/index.htm

16) http://blog.onlycoin.com/posts/2013/10/3/coin-arduino-ble-dev-kit

17) http://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/xeon-5500-hyperscan-brief.pdf

18) https://downloadcenter.intel.com/download/25832 - firmware download

19) http://www.prolific.com.tw/US/ShowProduct.aspx?p_id=225&pcid=41 – Prolific PL2303 Windows Driver Download

20) https://www.zephyrproject.org/doc/kernel/microkernel/microkernel_tasks.html

Formal References

[1] S. Siewert, V. Angoth, R. Krishnamurthy, K. Mani, K. Mock, S. B. Singh, S. Srivistava, C. Wagner, R. Claus, M. Demi Vis, “Software Defined Multi-Spectral Imaging for Arctic Sensor Networks”, SPIE Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXII, Baltimore, Maryland, April 2016.

[2] Linux Development Getting Started, Sam Siewert - http://mercury.pr.erau.edu/~siewerts/extra/documents/Linux/

[3] Engineer’s DSP Handbook - http://www.dspguide.com/

[4] https://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms

[5] S. Siewert, J. Pratt, Real-Time Embedded Components and Systems Using Linux and RTOS, 2nd Edition, Mercury Learning and Information, Dulles Virginia, December 2015, ISBN 978-1-942270-04-1.

[6] Seber, George AF. Multivariate observations. Vol. 252. John Wiley & Sons, 2009.

[7] Faber, Vance. "Clustering and the continuous k-means algorithm." Los Alamos Science 22.138144.21 (1994).

[8] Bergmann, Gábor, et al. "A benchmark evaluation of incremental pattern matching in graph transformation." Graph Transformations. Springer Berlin Heidelberg, 2008. 396-410.

[9] Fykse, Egil. "Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems." (2013).

[10] Wind, Intel, “Rocket – Supported Platforms”, 2015.

[11] Gomez, Carles, Joaquim Oller, and Josep Paradells. "Overview and evaluation of bluetooth low energy: An emerging low-power wireless technology." Sensors 12.9 (2012): 11734-11753.

[12] Ishii, Yasuo, Mary Inaba, and Kei Hiraki. "Access map pattern matching for high performance data cache prefetch." Journal of Instruction-Level Parallelism 13 (2011): 1-24.

[13] Zivojnovic, Vojin, et al. "DSPstone: A DSP-oriented benchmarking methodology." Proceedings of the International Conference on Signal Processing Applications and Technology. 1994.

[14] Perrey, Heiner. "Performance Analysis of Bluetooth Low Energy with Merkle’s Puzzle." Ausarbeitung Masterkurs, Projekt 2 (2012).

[15] https://balau82.wordpress.com/2011/03/29/programming-arduino-uno-in-pure-c/

[16] https://www.ashleymills.com/node/327

[17] https://www.arduino.cc/en/Main/Software

[18] https://www.arduino.cc/en/Guide/Linux

[19] https://www.zephyrproject.org/doc/board/arduino_101.html

[20] https://blog.arduino.cc/2016/04/21/intel-releases-the-arduino-101-firmware-source-code/

[21] http://www.tincantools.com/wiki/Flyswatter2

[22] http://www.alphr.com/operating-systems/1000061/how-to-install-ubuntu-run-linux-on-your-laptop-or-pc

[23] https://www.zephyrproject.org/doc/getting_started/getting_started.html

[24] http://www.engineering.com/IOT/ArticleID/11530/How-Linuxs-IoT-Zephyr-Operating-System-Works.aspx

[25] https://github.com/wind-river-rocket/rckt-newlibm-app/tree/master/application/src