EcoXiP Industry Pubs & In-the-News
IoT device designs are relying upon Execute-in-Place (XiP) system architecture to
achieve lower controller cost and higher performance.
However, commodity memory stands in the way of achieving the cost and
performance targets of an XiP implementation.
Adesto EcoXiP is specifically designed for XiP, with a high performance protocol
that enables blazingly fast 266 MB/s performance at half the power consumption
and lower total cost.
Adesto Technologies Corporation 3600 Peterson Way | Santa Clara, California USA 95054 | Phone: 408-400-0578 FAX: 408-400-0721
www.adestotech.com
Table of Contents:
EcoXiP Overview
EcoXiP Datasheet (requires registration)
EcoXiP e-Bulletin Channel Announcement
EcoXiP EVK Product Sheet
The Linley Processor Report:
Adesto Execute-in-Place
Article: Embedded Computing Design
Is your Quad Device Choking your System Performance? EcoXiP can help
http://www.embedded-computing.com/iot/is-your-quad-device-choking-your-system-performance
Article: Embedded Computing Design
Selecting the Optimal Flash for your Embedded Application
http://www.embedded-computing.com/guest-blogs/selecting-the-optimal-flash-device-for-your-embedded-application
In-the-News: Adesto and STMicroelectronics Collaborate
In-the-News: EcoXiP Supports New JEDEC Standards
White Paper:
Crossover to Memory Expansion with Adesto EcoXiP and NXP's i.MX RT Crossover Processors
EcoXiP Evaluation Kits Now Available
Adesto EcoXiP
Best Performance and Best Power for XiP Applications
High-speed xSPI-Octal memory designed for Execute-In-Place
System designers are leveraging Execute-in-Place (XiP) as the transformative system architecture that
will deliver higher-performance IoT and edge devices.
Adesto EcoXiP xSPI-Octal memory is specifically designed with the right architecture and features to
meet the requirements of high-speed, low-power, instant-on XiP applications.
Now you can get going even faster with your XiP design by requesting your free EcoXiP product
samples or by ordering the EcoXiP evaluation kit.
EcoXiP Delivers:
Performance
Blazingly fast Octal interface (up to 266Mbytes / sec)
Optimized for microcontroller cache controllers
Power
Up to 50% lower power than other high-speed Octal devices
25% better power efficiency than other Quad devices for same Mbytes / sec
Price
Cheaper than other high-speed Octal devices
Lower solution cost than other Quad devices for same Mbytes / sec
EcoXiP is the only memory that fully supports these JEDEC specs:
xSPI standard JESD251 and 251-1
The new Reset Signaling Protocol JESD252
The latest version of SFDP JESD216D
Evaluation Kits now available
EcoXiP EVKs with NXP's i.MX RT1050 can be ordered directly from Adesto or through an Adesto
distributor.
Third party reference design boards using EcoXiP are also available. Contact Embedded Artists at
www.embeddedartists.com .
Learn more and get samples
EcoXiP™
Evaluation Kit ATXPxxx-EVK
NXP i.MX RT1050 Evaluation Kit (EVK) featuring Adesto EcoXiP Execute-In-Place memory devices
The ATXPxxx-EVK-iMXRT1050 evaluation kit features the NXP i.MX RT1050 crossover controller alongside the
Adesto EcoXiP 32-Mbit or 128-Mbit xSPI memory, designed to support NXP's implementation of the Arm® Cortex®-M7
core for high-speed Execute-In-Place (XiP) applications.
Kit Contents
• MIMXRT1050-EVK board
• USB cable (Micro B)
• Adesto EcoXiP Part No: See Product Offerings table below
• Display (optional): RK043FN02H-CT 4.3 inch LCD 480 x 272 pixels with capacitive touch
• User Guide (downloadable) Document # AN106 link: https://www.adestotech.com/products/ecoxip/
• Tool support: MCUXpresso, IAR, Keil, and MDK
Product Offerings
Ordering Code              Adesto EcoXiP Part Number   Display Included
ATXP032-EVK02-iMXRT1050    32 Mb EcoXiP ATXP032        Yes
ATXP128-EVK02-iMXRT1050    128 Mb EcoXiP ATXP128       Yes
ATXP032-EVK01-iMXRT1050    32 Mb EcoXiP ATXP032        No
ATXP128-EVK01-iMXRT1050    128 Mb EcoXiP ATXP128       No
Corporate Office
California | USA
Adesto Headquarters
3600 Peterson Way
Santa Clara, 95054
Phone: (+1) 408.400.0578
Email: contact@adestotech.com
© 2019 Adesto Technologies. All rights reserved
Adesto, the Adesto logo, CBRAM and DataFlash are trademarks or registered trademarks of Adesto Technologies Corporation in the United States and other countries. Other company, product, and service
names may be trademarks or service marks of others. Adesto products are covered by one or more patents listed at http://www.adestotech.com/patents.
Disclaimer: Adesto Technologies Corporation (“Adesto”) makes no warranties of any kind, other than those expressly set forth in Adesto’s Terms and Conditions of Sale at http://www.adestotech.com/terms-conditions. Adesto assumes no responsibility or obligations for any errors which may appear in this document, reserves the right to change devices or specifications herein at any time
without notice, and does not make any commitment to update the information contained herein. No licenses to patents or other intellectual property of Adesto are granted by Adesto herewith or in connection with the sale of Adesto products, expressly or by implication. Adesto’s products are not authorized for use in medical applications (including, but not limited to, life support systems and other medical
equipment), weapons, military use, avionics, satellites, nuclear applications, or other high risk applications (e.g., applications that, if they fail, can be reasonably expected to result in personal injury or
death) or automotive applications, without the express prior written consent of Adesto.
EcoXiP-EVK–02/2019
NON-VOLATILE MEMORY | OCTAL FLASH
EcoXiP ATXP Series High-performance low-power octal flash
Octal interface optimized for execute-in-place
Best Performance, Best Power
Performance: CoreMark® test on NXP’s i.MX RT1050 with 8 instruction cache invalidations every ms to simulate task switching and interrupt handling.
Efficiency: CoreMark® score / power consumption
Best Performance: blazingly fast eXecute-in-Place (XiP)
As microcontrollers push the performance envelope and use cutting edge technologies, the cost to integrate internal flash quickly
becomes prohibitive. With its blazingly fast performance and low power consumption, EcoXiP allows even time critical software to
be executed directly out of non-volatile memory, reducing boot time and system cost.
Best Power: high-efficiency low-power design
For all battery powered designs, power consumption is critical. Often this means sacrificing performance. EcoXiP’s intelligent
power management helps you simplify your design without the need for compromises. Special power saving modes and high
efficiency read operations make EcoXiP perfect for any battery operated application.
Read While Write (RWW): simplified OTA updates
Updating program code can be a tedious task. Once the update is downloaded, the system has
to pause while it performs the update before finally returning to normal operation. Read
While Write (RWW) allows an update to be programmed in the background without any
impact on the user. Once it is ready, the user can be prompted and the upgrade happens
immediately. No wait, no fuss, a better user experience and a simple design.
Applications
• Instant on modules
• Industrial IoT
• Building automation
• Wearables
• Consumer devices
• Smart appliances
• Medical devices
• OTA intensive applications
• Network modules
• Audio subsystems
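Density   Part Number   Speed
32Mbit    ATXP032       133MHz
32Mbit    ATXP032R      133MHz
64Mbit    ATXP064       133MHz
64Mbit    ATXP064R      133MHz
128Mbit   ATXP128       133MHz
128Mbit   ATXP128R      133MHz
Feature columns: Quad, QPI DDR, OPI DDR, RWW (seven of the eight cells per density are marked •; the unmarked cell is not identifiable in this copy)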
Performance for the real world
Today’s Internet of Things (IoT), smart devices, and embedded processors demand high performance and instant-on capabilities while keeping power consumption to a minimum. eXecute in Place (XiP) technology is well suited to meet these needs. Adesto’s EcoXiP takes this to the next level.
Specifically designed to work with cache controllers, EcoXiP dramatically reduces latency for cache misses. Unlike other octal flash solutions that sacrifice power consumption for high data rates, EcoXiP maintains low power operation by utilizing Adesto’s proprietary technology.
EcoXiP offers the perfect solution for memory expansion in systems that don’t
have enough embedded flash or SRAM. With its high performance, even
time-critical code can execute directly out of flash, eliminating the need to add
expensive and power hungry external DRAM.
[Block diagram: interface control and logic with pins SCK, SI (I/O0), SO (I/O1), WP (I/O2), I/O3 through I/O7, DS, and RESET; control and protection logic; I/O buffers and latches; X-decoder, Y-decoder, and Y-gating; flash memory array.]
Technical Specifications
• eXecute in Place (XiP)
- Instant-on capability
- Lower system cost
• Reduced cache latency
- Critical word first
- Zero latency for additional cache lines
• Up to 266MBytes / sec
- Octal DDR xSPI interface
- Full JESD251, JESD252, and JESD216D compatibility
• Low power consumption / high efficiency
- Low read current
- Variable strength I/O
- Deep sleep / ultra-deep sleep modes
• Read While Write (RWW)
• Flexible erase and program architecture
- Block erase: 4, 32, and 64KBytes
- Byte / page program (1-256 bytes)
- Suspend / resume, erase and program operations
• Hardware and software write protection
• 256 byte OTP security register
• 100K erase / program cycles
• 20 years data retention
• Single 1.8V supply
• Industrial temp range: -40°C to +85°C
• Pb / Halide-free / RoHS compliant
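The block-erase sizes listed under "Flexible erase and program architecture" (4, 32, and 64 KBytes) can be illustrated with a short planning sketch. It picks the largest erase block that fits at each step. The alignment rule (each erase starts on a multiple of its own size) and the function name `plan_erase` are assumptions for illustration, not taken from the datasheet.

```python
# Illustrative sketch: choose block-erase operations to cover an address range.
# Block sizes (4, 32, 64 KBytes) come from the feature list; the alignment
# rule and the function name are assumptions, not datasheet facts.

KB = 1024
ERASE_SIZES = (64 * KB, 32 * KB, 4 * KB)  # largest first


def plan_erase(start: int, length: int) -> list[tuple[int, int]]:
    """Return (address, block_size) pairs that erase [start, start + length).

    Assumes each erase must start on a multiple of its own block size and
    that the range begins and ends on 4-KByte boundaries.
    """
    smallest = ERASE_SIZES[-1]
    if length < 0 or start % smallest or length % smallest:
        raise ValueError("range must be aligned to the 4-KByte block size")

    plan = []
    addr, end = start, start + length
    while addr < end:
        # Pick the largest block that is aligned here and fits in what is left.
        size = next(s for s in ERASE_SIZES if addr % s == 0 and addr + s <= end)
        plan.append((addr, size))
        addr += size
    return plan


if __name__ == "__main__":
    for address, size in plan_erase(0x7000, 100 * KB):
        print(f"erase {size // KB:>2} KBytes at 0x{address:06X}")
```

In this example a 100-KByte range starting at 0x7000 is covered by one 4-KByte, one 32-KByte, and one 64-KByte erase instead of twenty-five 4-KByte erases.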
3600 Peterson Way, Santa Clara, CA 95054 USA | Phone: +1 (408) 400-0578 | www.adestotech.com | e-mail: info@adestotech.com
© Adesto Technologies 2019 all rights reserved PBEXrev1-0219
© The Linley Group • Microprocessor Report October 2016
ADESTO EXECUTES IN PLACE
New EcoXIP Memory Simplifies IoT Design
By Linley Gwennap (October 10, 2016)
Adesto is a small memory supplier with big plans. After introducing an innovative nonvolatile memory earlier this year, it has taken standard NOR flash and added a new high-speed interface designed specifically for streaming instructions, a technique that designers call execute in place (XIP). The new EcoXIP product is now sampling in 32Mb (4MB) capacity, with production expected in 1Q17. Additional capacity options will follow.
Many small systems employ a microcontroller with embedded flash memory that holds the application code. When these devices add a radio for IoT capability, they require larger storage to hold the wireless protocol and IP stack as well as security software. MCUs normally top out at 1MB of internal flash, but IP-based IoT devices often need more code space, requiring an external flash device.
Commodity flash chips connect through a low-speed SPI, requiring the MCU to copy the code into a large internal SRAM to maintain reasonable performance. Many systems execute directly from the external flash (XIP); they are not only slower but may require a second flash chip to support over-the-air (OTA) code updates. In an XIP design, even writing data (such as log information) to flash can be challenging.
Announced at the recent Linley Processor Conference, EcoXIP solves these problems by enabling simultaneous read and write transactions. It can deliver instructions at a sustained rate of 156MB/s (266MB/s peak), which is fast enough for most MCUs and better than other XIP memories. Because it uses a modified SPI to improve performance, however, the new memory works only with compatible MCUs. At the conference, Adesto CTO Gideon Intrater disclosed that NXP, a leading MCU supplier, will support the EcoXIP interface in future MCU products.
Price and Availability
Adesto is currently sampling a 32Mb EcoXIP product to lead customers; it expects to enter production in 1Q17. The company withheld pricing. To download a free copy of the Adesto presentation from the Linley Processor Conference, access www.linleygroup.com/processor-conference. For more information on EcoXIP, access www.adestotech.com.
Accelerating the Bus
To more efficiently implement XIP, Adesto redesigned the basic SPI protocol to better handle the typical access patterns. SPI is designed for random accesses; it returns the requested cache line (e.g., 16 bytes), then waits for the next request. This approach works well for data storage, but instruction fetches tend to be sequential as the CPU proceeds through a block of code. Therefore, EcoXIP continues to provide sequential bytes until it receives a new command. It calls this approach “command fusing.”
This approach can double the bus throughput, as Figure 1 shows. Using an octal (8-bit) SPI, a transaction typically requires 1 bus cycle for the command, 2 cycles for the address, about 14 cycles to wait for the response, and 8 cycles to transmit 16 bytes of double-clocked data (DDR). Fetching the next 16 bytes requires another 25 bus cycles, or 50 for the two transactions. Using EcoXIP, the first 16 bytes take the same number of cycles, but data then continues to flow, delivering four cache lines in 49 cycles.
[Figure 1 timing rows. Octal SPI at 133MHz: C/A, Wait, Line N, C/A, Wait, Line N+1. EcoXIP at 133MHz: C/A, Wait, Line N, Line N+1, Line N+2, Line N+3.]
Figure 1. EcoXIP bus timing. C/A=command/address. By chaining data responses using “command fusing,” the Adesto design can deliver twice as many cache lines in the same number of bus cycles. (Source: Adesto)
The benefit of Adesto’s approach comes when the CPU executes a sequential set of instructions. For a 32-bit MCU, each line holds four instructions, and a branch occurs about every seven instructions; about half of all branches are taken. The company estimates that the average number of line fetches per instruction-cache miss is 3.84, or nearly 14 instructions (the first cache line may have fewer than four useful instructions if the target is in the middle of the line). Using this average, Adesto calculates the sustained throughput of its 133MHz EcoXIP at 156MB/s and the average latency at just 57ns. When the CPU reaches a taken branch, it sends a new request to the EcoXIP, which then begins transmitting data from the new address.
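The cycle counts quoted for Figure 1 can be checked with a few lines of arithmetic. The sketch below uses only the article's numbers (1 command cycle, 2 address cycles, about 14 wait cycles, 16-byte lines on an 8-bit DDR bus); the function names are illustrative. It is a simplified model: it reproduces the 50-cycle and 49-cycle figures and the 266MB/s peak, but it does not attempt to derive Adesto's sustained 156MB/s or 57ns numbers, which rest on the company's own averaging.

```python
# Back-of-the-envelope model of the bus timing described for Figure 1.
# Cycle counts are the article's; function names are illustrative, and the
# wait time is the article's approximate figure ("about 14 cycles").

CMD_CYCLES = 1        # command
ADDR_CYCLES = 2       # address
WAIT_CYCLES = 14      # approximate wait before the first data arrives
LINE_BYTES = 16       # instruction-cache line size assumed in the article
BYTES_PER_CYCLE = 2   # octal (8-bit) bus, double data rate

LINE_CYCLES = LINE_BYTES // BYTES_PER_CYCLE  # 8 cycles per cache line


def cycles_separate(lines: int) -> int:
    """Standard octal SPI: every cache line is its own transaction."""
    return lines * (CMD_CYCLES + ADDR_CYCLES + WAIT_CYCLES + LINE_CYCLES)


def cycles_fused(lines: int) -> int:
    """Command fusing: one command, then sequential lines keep streaming."""
    return CMD_CYCLES + ADDR_CYCLES + WAIT_CYCLES + lines * LINE_CYCLES


def peak_mbytes_per_s(clock_mhz: int) -> int:
    """Peak transfer rate once data is flowing."""
    return clock_mhz * BYTES_PER_CYCLE


if __name__ == "__main__":
    print(cycles_separate(2))      # 50: two lines as separate transactions
    print(cycles_fused(4))         # 49: four lines in one fused burst
    print(peak_mbytes_per_s(133))  # 266: peak MB/s at 133MHz
```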
Most flash chips have a quad SPI to reduce pin count and cost. These chips generate as little as 58MB/s of sustainable throughput at an 80MHz bus speed. More-expensive parts offer an octal SPI and operate at up to 200MHz, but even they fall well behind the 133MHz EcoXIP in throughput and latency for XIP applications, as Figure 2 shows.
Adesto plans to increase the EcoXIP bus speed to 200MHz in order to boost this performance further. In addition to modifying the protocol, the EcoXIP interface has an extra data strobe signal, which simplifies the implementation of designs that operate at speeds above 80MHz. Current high-speed designs require a dynamic delay line to synchronize the DDR transfers, but the strobe allows the MCU to capture data using a simple fixed delay.
Two Banks, No Waiting
Flash memory retains data even when a device is powered down, but this feat requires a complex and time-consuming write operation. For NOR flash, this operation involves applying a high voltage (above 5V) to the cell for a period of roughly 1ms. During this period, the MCU cannot fetch instructions from the flash chip. If the flash is solely for code execution (XIP), this situation won’t arise, as no writing is necessary. But many systems use flash to store data, such as configuration parameters and event logs. OTA code updates also require writing data to the flash.
Of course, the system can simply stall for 1ms each
time it writes to flash, but that delay hampers performance.
Another option is to load a small amount of code into the
MCU’s internal memory before starting an OTA update,
but if anything unusual occurs (such as an interrupt), the
rest of the code will be unavailable. Thus, many designs
include two flash chips, so one can be read while the other
is written, but this approach adds cost.
EcoXIP separates its internal flash memory into two banks. Doing so allows the MCU to read from one bank while writing to the other. Designers can adjust the boundary between the banks to split the memory 50/50 or put as little as one-eighth in one bank. The former approach enables OTA updates to store a complete set of code without overwriting the original code; the latter is good for systems that just need a small amount of data memory.
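The bank-sizing trade-off described above can be made concrete with a small calculation. The 50/50 and one-eighth splits come from the article; the function names, the assumption that any fraction between one-eighth and one-half is selectable, and the example image size are hypothetical and used only for illustration.

```python
# Sketch of the dual-bank sizing trade-off. The 1/8 and 1/2 limits come from
# the article; the boundary granularity, function names, and example image
# size are assumptions for illustration only.
from fractions import Fraction

MBIT = 1024 * 1024 // 8  # bytes per megabit


def split_banks(capacity_bytes: int, small_fraction: Fraction) -> tuple[int, int]:
    """Return (large_bank_bytes, small_bank_bytes) for a given split."""
    if not Fraction(1, 8) <= small_fraction <= Fraction(1, 2):
        raise ValueError("small bank must hold between 1/8 and 1/2 of the device")
    small = int(capacity_bytes * small_fraction)
    return capacity_bytes - small, small


def ota_image_fits(capacity_bytes: int, image_bytes: int) -> bool:
    """With a 50/50 split, a new image must fit in one bank while the MCU
    keeps executing the old image from the other bank."""
    _, spare_bank = split_banks(capacity_bytes, Fraction(1, 2))
    return image_bytes <= spare_bank


if __name__ == "__main__":
    device = 32 * MBIT  # 32Mb (4MB) part
    print(split_banks(device, Fraction(1, 2)))  # code bank + OTA staging bank
    print(split_banks(device, Fraction(1, 8)))  # code bank + small data bank
    print(ota_image_fits(device, 1_500_000))    # hypothetical 1.5MB image
```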
As a further enhancement, EcoXIP implements an
automatic power down after a write. Most other flash chips
require the MCU to stay awake during the 1ms write so it
can power down the flash once the write completes. With
EcoXIP, the MCU can “fire and forget,” starting the write
and immediately going into a low-power mode while the
flash chip finishes the write and then puts itself to sleep.
The Adesto chip provides a variety of sleep modes that
trade off power savings against wake-up time.
A Zippier XIP
Adesto offers two product lines. One is a unique nonvolatile memory, called conductive-bridge RAM, that is CMOS compatible (see MPR 2/22/16, “Adesto Targets IoT Using CBRAM”). It also acquired a family of standard NOR-flash chips from Atmel in 2012. CBRAM is a lower-power alternative, but NOR flash remains less expensive for storing large amounts of boot code. EcoXIP builds on these standard products, adding a custom interface that improves performance.
[Figure 2 chart: bars for EcoXIP 133MHz, Octal XIP 200MHz, Octal XIP 133MHz, Quad XIP 80MHz, and Quad SPI 80MHz; axes are throughput (MB/s) and latency (ns).]
Figure 2. Adesto EcoXIP performance. All numbers are for XIP operation and assume 16-byte instruction-cache lines and an average of 3.84 line fetches per instruction-cache miss. (Source: Adesto)
Other vendors also offer fast flash memories using custom interfaces. For example, Cypress (formerly Spansion) offers the proprietary HyperBus interface, which can deliver 333MB/s using a 166MHz DDR octal interface that supports arbitrarily long bursts. The Macronix OctaFlash and Micron XTRMFlash have similar capabilities at speeds of up to 200MHz using modified SPI protocols. But all of these parts are designed for fast boot in systems that copy the code into RAM for execution, so they are available only in sizes of 128Mb (16MB) and larger. These systems employ higher-performance processors instead of microcontrollers and often run complex operating systems.
Adesto began with a similar concept but optimized it for XIP applications. Products such as XTRMFlash are designed for long predetermined bursts, whereas EcoXIP allows the CPU to inject a new target address into the burst at any time. Furthermore, Adesto targets applications with more than 1MB of code but less than 16MB, a range that encompasses many MCU-based IoT clients that use a basic real-time OS or no OS at all. The company’s dual-bank design is a unique capability that can reduce cost in systems that would otherwise require two separate flash chips.
For MCU-based systems, EcoXIP is less expensive than a large on-die flash memory, since an embedded-flash process adds cost compared with a flash-optimized process. Using XIP eliminates the need for a large and costly on-die SRAM; in fact, EcoXIP can couple with an inexpensive MCU that has minimal on-die memory. The dual-bank chip is also less expensive than two separate flash chips of half the capacity, in part because of package-cost savings. EcoXIP’s unique capabilities should help Adesto gain a foothold in the IoT market. ♦
To subscribe to Microprocessor Report, access www.linleygroup.com/mpr or phone us at 408-270-3772.
In the News
PRESS RELEASE
Adesto’s EcoXiP™ enables ultra-low-power, low-latency XiP system operation on STMicroelectronics’ new STM32L4+ MCUs
Thursday, February 22, 2018.
Combined solution lets designers create sensor-rich IoT devices with
long battery life
SANTA CLARA, CA – February 22, 2018 – Adesto Technologies
(NASDAQ: IOTS), a leading provider of application-specific, ultra-low
power non-volatile memory (NVM) products, announced its collaboration
with STMicroelectronics to enable ultra-low-power, low-latency eXecute-
in-Place (XiP) system operation on ST’s STM32L4+ microcontrollers
(MCUs) through Adesto’s EcoXiP™ system-accelerating NVM. The
combination of STM32L4+ MCUs and EcoXiP enables designers to
create IoT devices requiring numerous sensors and other advanced
capabilities while ensuring long battery life.
STM32L4+ series MCUs build on ST’s popular STM32L4 series MCUs by
increasing performance, adding more embedded memory, and
delivering richer graphics and connectivity features, while maintaining
ultra-low power consumption. The STM32L4+ is the first STM32-family
architecture to offer two Octal SPI ports, which support NOR Flash
including XiP operation.
For many applications, the embedded memory in an MCU is not
sufficient, and external XiP memory provides the natural solution. Built
on an innovative memory and protocol architecture, EcoXiP sets a new
standard for performance, cost and power for devices requiring a XiP
architecture. It delivers high system performance, optimized latency and
throughput, concurrent read/write capability, enhanced security, and the
best standby power for a wide range of connected products including IoT
edge devices, wearables, connected and wireless embedded systems,
medical monitors and POS controllers.
“By pairing ST’s new ultra-low power STM32L4+ MCUs with EcoXiP,
designers can architect high-performance, low-power XiP systems for
more energy-efficient, lower-cost products,” said Gideon Intrater, Adesto
CTO. “Not every application needs a XiP solution, but for those that can
benefit from it, there is no better solution in the market than EcoXiP.”
Demonstration at Embedded World
Adesto will demonstrate EcoXiP running with an STM32L4+ device at
Embedded World, being held February 27 – March 1 in Nuremberg,
Germany. Visit Adesto in Hall 4A, Booth 259. Contact
info@adestotech.com to arrange a personal demonstration.
Availability
Samples of Adesto’s EcoXiP non-volatile system-accelerating memory
are available now. EcoXiP is available in optimized densities from 32Mb
to 128Mb. For more information, see:
https://www.adestotech.com/products/ecoxip.
About Adesto Technologies
Adesto Technologies (NASDAQ:IOTS) is a leading provider of application-
specific, ultra-low power non-volatile memory products. The company
has designed and built a portfolio of innovative products with intelligent
features to conserve energy and enhance performance including Fusion
Serial Flash, DataFlash® and products based on Conductive Bridging
RAM (CBRAM®) technology. CBRAM® is a breakthrough technology
platform that enables 100 times less energy consumption than today’s
memory technologies without sacrificing speed and performance. Adesto
is focused on delivering differentiated solutions and helping its
customers usher in the era of the Internet of Things. See:
www.adestotech.com.
In the News
PRESS RELEASE
Adesto’s EcoXiP Supports New JEDEC Standards to Pave the Way for a New Era of Smart Devices and Edge Processing
Tuesday, November 13, 2018.
ELECTRONICA, MUNICH, GERMANY – November 13, 2018 – Adesto
Technologies (NASDAQ: IOTS), a leading provider of innovative
application-specific semiconductors for the IoT era, announced it has
shipped the first serial NOR flash devices supporting the new xSPI
standard (JESD251 and JESD251-1), Serial Flash Reset Signaling Protocol
(JESD252) and the latest version of the SFDP standard (JESD216C).
Adesto’s EcoXiP™ eXecute-in-Place (XiP) non-volatile memory (NVM) fully
supports these specifications, which were recently released by
microelectronics standards body JEDEC. These standards make it easier
for system designers to reap the benefits of EcoXiP in their designs and
deliver smarter, more efficient and user-friendly devices.
Today, many emerging Internet of Things (IoT) and high-end
microcontroller (MCU) designs need more program memory and data
processing storage than can be implemented economically on-chip
using embedded flash or SRAM. The new standards make it simpler for
the industry to adopt NVM devices that use the Octal Serial Peripheral
Interface (SPI), such as Adesto’s EcoXiP, which delivers the higher
performance and storage space needed. EcoXiP eliminates the need for
expensive on-chip embedded flash, and it hits the sweet spot for power,
system cost and performance, with significantly lower power consumption
compared to other Octal devices and dramatically higher performance
versus Quad SPI devices.
“NXP architected crossover processors with no on-board flash and
provided an Octal interface to optimize off-chip NVM performance. This
allows NXP to deliver a class of microcontrollers that boost processing
performance and increase power efficiency at a very competitive price
point,” said Joe Yu, GM of Low-Power MPUs at NXP Semiconductors.
“Ultimately this helps designers add more features to their products and
improve the consumer experience. A low-power external memory device,
such as Adesto’s EcoXIP, is a complementary device for our i.MX RT
series.”
The new xSPI (expanded SPI) standard establishes hardware guidelines
to enable designers to easily add high-throughput Octal and Quad
devices to their systems. The Serial Flash Reset Signaling Protocol
defines a way to reset flash devices without a need for a dedicated reset
pin. The SFDP (Serial Flash Discoverable Parameter) standard provides a
consistent method of describing the functional and feature capabilities of
serial flash devices in a common set of internal parameter tables. With it,
OEMs can speed firmware development and time-to-market. The latest
revision of the SFDP specification adds support for Octal SPI.
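To illustrate what "a common set of internal parameter tables" means in practice, the sketch below decodes the 8-byte header that begins an SFDP table. It assumes the header layout published in JESD216 (a 4-byte "SFDP" signature, then minor revision, major revision, and a zero-based count of parameter headers). How the bytes are read from the device is left out, the function and class names are hypothetical, and the sample bytes are made up for the example.

```python
# Minimal sketch of decoding the 8-byte SFDP header. The layout is assumed
# from JESD216; reading the bytes from the flash device is not shown, and
# the sample bytes below are invented for illustration.
import struct
from typing import NamedTuple

SFDP_SIGNATURE = b"SFDP"


class SfdpHeader(NamedTuple):
    major: int
    minor: int
    parameter_headers: int  # number of parameter headers that follow


def parse_sfdp_header(raw: bytes) -> SfdpHeader:
    """Validate the signature and extract revision and header count."""
    if len(raw) < 8:
        raise ValueError("SFDP header needs at least 8 bytes")
    signature, minor, major, nph, _last_byte = struct.unpack_from("<4sBBBB", raw)
    if signature != SFDP_SIGNATURE:
        raise ValueError("SFDP signature not found")
    # The count field is zero-based: 0 means one parameter header.
    return SfdpHeader(major=major, minor=minor, parameter_headers=nph + 1)


if __name__ == "__main__":
    sample = b"SFDP" + bytes([0x08, 0x01, 0x01, 0xFF])  # made-up example bytes
    print(parse_sfdp_header(sample))
```

Firmware that reads these tables at start-up can configure itself for the attached flash instead of hard-coding device parameters, which is the time-to-market benefit the release describes.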
“Adesto delivers solutions that ignite innovation for next generation IoT
devices. It was important that we help drive the development of these
standards, and we are delighted to be the first company to ship serial
flash devices with full support,” said Gideon Intrater, Adesto’s CTO. “This
is the first time that there is a robust set of standards that defines ways
for serial NOR flash to communicate, making it possible for companies to
easily integrate the latest technology and increase the performance of
their designs.”
Demonstration at Electronica
At Electronica 2018, being held November 13 – 16, 2018, Adesto will
demonstrate its EcoXiP NVM integrated with the NXP i.MX RT1050
crossover processor via the Octal xSPI interface, in compliance with the
new standards. Visit Adesto’s booth: Hall C3, Booth 121 at the Messe
München exhibition center. For more information, contact
info@adestotech.com.
CROSSOVER TO MEMORY EXPANSION
WITH ADESTO ECOXiP AND NXP’S
i.MX RT CROSSOVER PROCESSORS
Donnie Garcia, NXP Semiconductor: Solutions Architect
Eyal Barzilay, Adesto Technologies: System and Software
INTRODUCTION
With 8.4 billion connected “things” having shipped in 2017, the internet of tomorrow is clearly upon us. We
have entered a new age of human to machine interactions where technology is guiding many aspects of
our lives. For a variety of end devices such as wearables, home monitoring nodes and industrial controllers,
the capabilities of the embedded processor play a vital role in addressing the insatiable demand for a
higher order of functionality. This has led to industry focus on machine learning enabled by vision and audio
processing to bring the computation needed to make decisions at the edge node. These capabilities require
elevated levels of processing performance and memory space for MCUs. The push for processing has led
to a new breed of semiconductor device which does not fit into a traditional definition of a microcontroller.
The ‘Crossover Processor’ integrates attributes of a microprocessor such as higher CPU speeds, multimedia
interfaces and expandable memory into a microcontroller form factor built for cost effectiveness and fastest
development time. This new crossover processor class of device provides embedded developers the ability to
solve many problems in today’s fast-moving technology markets.
Collaboration between semiconductor manufacturers and memory vendors plays a vital role in ensuring that
the embedded systems that are brought to market achieve performance and usability goals. This is
accomplished by closing the gap between the typical embedded flash device and the crossover MCU with
external memory. Using external memory, crossover processors can support massive amounts
of software and data memory space while keeping the look and feel of a traditional
embedded flash microcontroller. Together, the right serial flash memory and a capable processor
address the challenges of performance, security, power consumption and development experience.
For the processor, considering eXecute-in-place (XiP) from the start of the semiconductor chip design brings
together a microarchitecture that is built for memory expansion. For serial flash, there are advancements in
the interface protocol, low energy read of memory, and read while write programming capabilities to address
these challenges. This paper will provide an overview of how performance and usability are addressed for
systems depending on external memory. The following sections will explore how the Adesto EcoXiP serial
flash and the i.MX RT1050 crossover processor pair together to provide the embedded platform needed to
conquer the challenges of future embedded designs.
TABLE OF CONTENTS
Crossover to Memory Expansion with Adesto’s EcoXiP and NXP’s i.MX Crossover Processors
Introduction
Overview of Serial NOR Flash and eXecute in Place (XiP)
Microcontroller Memory Architectures
How XiP is Achieved
FlexSPI Memory Controller
Adesto EcoXiP: Advanced Serial Flash
Application Use Cases
i.MX RT: Advanced Processor Architecture
Understanding XIP Performance
Throttling Test Case
Instrumenting Test Case
Examining Example Applications
Development and Debug with XiP
Conclusions
Resources
OVERVIEW OF SERIAL NOR FLASH AND EXECUTE IN PLACE
Serial NOR flash comes in the form of integrated circuits (ICs) with a range of memory size and physical
interface options. These memory devices typically operate at 1.8V or 3.3V, support 100 thousand write erase
cycles, and can easily be placed on printed circuit boards. The serial flash IC allows embedded systems to
easily introduce a non-volatile memory (NVM) with various packages ranging from the basic 8-pin to very
small chip scale. There are many use cases for applying serial NOR flash to a system. Persistent data logging
is one example of a common application use case which benefits from this technology. Another important
use is storing and executing software for the ever growing embedded applications.
eXecute in Place, or XiP, is a capability that allows a processor to execute code directly from external
flash memory. Many embedded applications require connectivity stacks, audio processing, and vision. The
amount of executable code for these functions has grown to substantial sizes. When considering these
application requirements together for one embedded system, the capability of XiP with external flash is an
essential enabler as it allows nearly limitless data space for the embedded system. In the semiconductor
industry, thousands of capable microcontrollers already integrate the type of memory controller
needed to support XiP capability from serial NOR flash.
Microcontroller Memory Architectures
For embedded processing, there are several common memory architectures as shown in Figure 1. Starting
from the left, for most microcontrollers, internal non-volatile memory provides the execution space for
the software. Here the NVM is all provided internal to the chip. There are advantages due to the system
integration, but a limitation with regards to scalability. If the system needs more memory than what is
provided internal to the processor, then external memory must be added. Often, external memory (such as
EEPROM) is needed to store persistent data for other uses in the system as shown in the diagram.
The second architecture, in the middle, is a copy-to-execute architecture. This means that the code is stored
in external flash but copied to internal RAM at startup and then executed. In this case external NVM is used
in conjunction with execute memory in RAM. This architecture will be limited by the size of the internal
SRAM memory. If the size of code is larger than internal SRAM, software must bring in portions of code
as needed by the application. This copy to execute has penalties with regards to copy time and software
complexity. Large internal SRAM size could have a significant impact on cost. Alternatively, if external DRAM
is used, system cost can be reduced because of the low cost per bit for DRAM versus internal SRAM.
When using DRAM there are challenges with regards to power consumption. This is due to the volatile
nature of the DRAM memory and the need for self-refresh for low power states of DRAM. Even if the code
fits into SRAM, a low-power system would probably require shutting down the SRAM during sleep mode.
This means that a copy to SRAM would be necessary on each transition from sleep to active mode. In other
words, the system will be slow to wake up.
Figure 1: Memory architecture diagrams (left to right: Internal NVM, Copy to Execute, External Execute in Place (XIP))
Furthest to the right, the XiP architecture depends on the external memory for the execution of code. This
memory architecture has advantages with regards to scalability. Designers do not have to face issues with
over buying for a larger memory size to protect against software growth. The choice of external memory
can be made for what is needed for the embedded design. This ensures that every penny spent on the
processor components in the system goes towards relevant features for the end product. This architecture
reduces both risk and design cycle times as the XiP system architecture can be scaled with only a change
to the serial NOR flash in the bill of materials for the circuit boards. In addition, XiP brings an advantage in
terms of power and fast wakeup from sleep mode.
Still, there are challenges when using this architecture. In the coming sections, we will discuss how these
challenges are being mitigated by intelligent designs incorporated for both the processor and the serial
flash.
How XIP is achieved
Central to the support of XiP is the integration with a smart SPI (Serial Peripheral Interface) host controller on
the processor. Akin to a standard SPI, these host controller peripherals support a synchronous serial protocol
that depends on data and clock signals. For example, Figure 2 shows the most basic SPI read where an
opcode and address are sent to a slave device via Serial In (SI), and data is returned to the master device via
Serial Out (SO).
Figure 2: Example SPI data transfer
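The basic read described above can be sketched in a few lines of Python. This is a minimal illustration, not a driver: it assumes the conventional single-line read opcode 0x03 used by many SPI NOR devices and a 24-bit address, and the function names are hypothetical.

```python
READ_OPCODE = 0x03  # conventional single-line SPI NOR read opcode (assumed)


def build_read_frame(address: int) -> bytes:
    """Bytes the host shifts out on SI: opcode, then a 24-bit address, MSB first."""
    if not 0 <= address < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    return bytes([READ_OPCODE]) + address.to_bytes(3, "big")


def transfer_clocks(frame: bytes, data_bytes: int) -> int:
    """Clock cycles for the whole transfer on one data line at single data rate."""
    return (len(frame) + data_bytes) * 8


frame = build_read_frame(0x012345)
print(frame.hex())                 # 03012345
print(transfer_clocks(frame, 4))   # 64
```

Half of the 64 clocks in this small example are spent on the opcode and address, which is the overhead that the wider and faster interfaces discussed later aim to reduce.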
In addition to operating as traditional SPI, in order to better support the XiP use case, these enhanced
peripherals also operate as system memory controllers. They can take internal bus transfers generated in
the chip and translate them into the right serial commands needed to interact with the external memory. In
this way, data transfers from the external memory are accelerated by hardware. The instructions and data
residing in external serial NOR flash are directly fed into the CPU pipeline or other chip peripherals based on
memory transfers occurring inside the microarchitecture of the chip.
FlexSPI Memory Controller
One such memory controller is the FlexSPI, NXP’s latest generation of serial flash memory
controller. The block diagram in Figure 3 represents the FlexSPI which is integrated on the i.MX RT
crossover processors. The 64-bit AHB bus is the interface to the system bus which will come from a CPU or
other on-chip masters such as an LCD controller. The IPS BUS is a separate interface which allows software
to directly send commands to the NOR flash device by way of the FlexSPI register model. This interface is
also used for the initialization and configuration of the external serial flash as it can be used to initiate the
process of sending commands.
The capabilities of the i.MX FlexSPI memory controller enhance XiP. In the diagram, just to the right of the
AHB_CTL block, both transmit (TX) and receive (RX) buffering are shown. This buffering is used for
prefetching data when reading the external memory to improve latency and overall compute performance
for the XiP operation.
Figure 3: FlexSPI Block Diagram (blocks shown include ARB_CTL, AHB_CTL, IP_CTL, SEQ_CTL and IO_CTL, with the 64-bit AHB bus and 32-bit IPS bus on the system side and the SPI bus on the flash side)
Shown in the diagram on the right side is the sequence control block. The sequence control block is a
large look-up table which holds preset instructions for different serial flash operations such as read, erase
and program. This block is what links accesses from the 64-bit AHB bus to the read command sequence
which is sent to the external serial flash. Not every flash will have the same command set or I/O interface.
The sequence control engine is programmable for adjusting the SPI transfers based on the command set
defined by the serial flash. This allows processors like the i.MX RT to interface to a broad range of external
flash types and capabilities. This flexibility allows the crossover processor to utilize flash attributes that play
an important role in supporting the most capable XiP embedded systems.
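The idea of a programmable look-up table can be sketched as follows. This is a conceptual model only and does not reflect the FlexSPI register layout; the `Step` type, the table keys, the opcodes (conventional SPI NOR values) and the dummy-clock count are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    phase: str    # "CMD", "ADDR", "DUMMY", "READ" or "WRITE"
    lanes: int    # number of data lines used during this phase
    operand: int  # opcode, address width in bits, or dummy clocks (0 if unused)


# Hypothetical table contents; a real table is filled from the flash datasheet.
SEQUENCE_LUT = {
    "read": (Step("CMD", 1, 0x03), Step("ADDR", 1, 24), Step("READ", 1, 0)),
    "program": (Step("CMD", 1, 0x02), Step("ADDR", 1, 24), Step("WRITE", 1, 0)),
    "erase_4k": (Step("CMD", 1, 0x20), Step("ADDR", 1, 24)),
}


def expand(operation: str, address: int) -> list:
    """Turn one table entry into the ordered bus phases for a given address."""
    phases = []
    for step in SEQUENCE_LUT[operation]:
        if step.phase == "CMD":
            phases.append(f"CMD 0x{step.operand:02X} x{step.lanes}")
        elif step.phase == "ADDR":
            width = step.operand // 4  # hex digits needed for the address
            phases.append(f"ADDR 0x{address:0{width}X} x{step.lanes}")
        else:
            phases.append(f"{step.phase} x{step.lanes}")
    return phases


print(expand("read", 0x1000))  # ['CMD 0x03 x1', 'ADDR 0x001000 x1', 'READ x1']

# Reprogramming the entry adapts the same bus access to a different flash.
SEQUENCE_LUT["read"] = (Step("CMD", 8, 0x0B), Step("ADDR", 8, 24),
                        Step("DUMMY", 8, 8), Step("READ", 8, 0))
print(expand("read", 0x1000))
```

The point of the sketch is that the memory-mapped access itself never changes; only the table entry that describes how to reach the flash does.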
ADESTO ECOXIP: ADVANCED SERIAL FLASH
Serial flash is not only for storing code and data but also for executing code directly from flash (Execute-
in-Place or XiP). Advancements in serial flash technology have made it possible for newer serial flash to be
used in systems with high performance requirements. These advancements allow serial flash devices such
as Adesto EcoXiP to respond quickly to read requests from the host MCU and deliver instructions and data
with low latency and high throughput.
One advancement is the multi-line SPI interface. Traditionally, communication with a serial device was (as
the name suggests) serial. Data would be transferred over a single line at a time. For more capable devices,
communication is parallel, and data is transferred over up to eight data lines as shown in the Octal-SPI
transfer diagram in Figure 4. Adesto’s EcoXiP devices are equipped with JEDEC’s latest Octal SPI protocol
(xSPI), making the communication close to 8x faster than a single wire serial flash.
Figure 4: Example Octal-SPI Data Transfer
Supplementing the Octal interface, serial flash can feature double data rate (DDR). This capability is more
common in high-speed DRAMs. With DDR, data bits are sampled on both the rising and falling edges of
the serial clock. Since it takes only half a clock cycle to send out a data bit, this feature has the potential to
double the throughput from the external memory. In addition, modern serial flash devices deliver high clock
speeds north of 100MHz. This is achievable due to a data strobe signal driven by the flash during the data
phase of a read.
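The combined effect of bus width, DDR and clock speed on raw throughput can be estimated with simple arithmetic. The sketch below gives the peak rate during the data phase only, ignoring command, address and dummy cycles; the two configurations are the ones used for the measurements later in this paper, and the function name is hypothetical.

```python
def peak_read_mb_per_s(clock_mhz: float, data_lines: int, ddr: bool) -> float:
    """Peak data-phase throughput in MB/s (1 MB = 10**6 bytes)."""
    bits_per_clock = data_lines * (2 if ddr else 1)
    return clock_mhz * bits_per_clock / 8


print(peak_read_mb_per_s(102, 4, ddr=False))  # 51.0  (Quad-SDR at 102 MHz)
print(peak_read_mb_per_s(131, 8, ddr=True))   # 262.0 (Octal-DDR at 131 MHz)
```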
To address latency, Adesto EcoXiP supports features to reduce the overhead of the command interface.
Latency is the time from when there is a request for data until the time that the data is available to the
requestor. EcoXiP supports special read commands such as Read Array to allow faster access to data by
reducing the number of clocks needed for subsequent reads of data. As shown in Figure 5, the Read Array
command with Octal SPI and DDR reduces the number of clock cycles needed for passing the command
and address data. An 8-bit command and 24-bit address are passed with only 3 clocks. Then subsequent
accesses to sequential data are available. All of these serial flash features (read array command, DDR, fast
clock speeds and Octal SPI) work to support the XiP use case.
Figure 5: Read Array Command
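The 3-clock figure can be reproduced with a small calculation. The sketch assumes that the command and the address are separate phases and that each phase occupies a whole number of clocks; that framing is an assumption used for illustration, not a statement of the EcoXiP protocol timing.

```python
import math


def phase_clocks(bits: int, data_lines: int, ddr: bool) -> int:
    """Clocks needed to move `bits` over the bus, rounded up to a whole clock."""
    bits_per_clock = data_lines * (2 if ddr else 1)
    return math.ceil(bits / bits_per_clock)


def command_address_clocks(data_lines: int, ddr: bool,
                           cmd_bits: int = 8, addr_bits: int = 24) -> int:
    return (phase_clocks(cmd_bits, data_lines, ddr)
            + phase_clocks(addr_bits, data_lines, ddr))


print(command_address_clocks(1, ddr=False))  # 32 clocks on single-line SPI
print(command_address_clocks(8, ddr=True))   # 3 clocks on Octal SPI with DDR
```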
Application use cases
Beyond addressing the performance of eXecute in Place operation, there are other unique features in
EcoXiP to support application use cases. EcoXiP’s concurrent read-write, also known as read-while-write
or RWW, allows the host processor to continue reading from a partition of the flash memory array while
modifying data on another part. As an example, periodic logging of data which involves erase and program
operations to the serial flash does not put the XiP program on hold. With the RWW feature, instruction and
data fetching during programming continues as usual in a different partition of the flash. This scheme allows
read operations from one bank while the device is busy programming or erasing another bank. The serial
flash device can be configured into two banks: Bank A and Bank B. The border between the banks can be
set with a granularity of 1/8th of the full flash array size. Read commands to one bank can be done while a
write is in progress in the other bank.
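The bank rule can be expressed as a simple address check. The sketch below assumes that Bank A occupies the lower addresses and that the border is chosen in eighths of the array; both the layout and the function names are illustrative assumptions rather than details taken from the datasheet.

```python
def bank_of(address: int, array_bytes: int, border_eighths: int) -> str:
    """Return "A" or "B" for an address, with the border at border_eighths/8."""
    if not 1 <= border_eighths <= 7:
        raise ValueError("border must leave both banks non-empty")
    if not 0 <= address < array_bytes:
        raise ValueError("address outside the flash array")
    border = array_bytes * border_eighths // 8
    return "A" if address < border else "B"


def read_allowed_during_write(read_addr: int, write_addr: int,
                              array_bytes: int, border_eighths: int) -> bool:
    """A read can proceed only if it targets the bank that is not being written."""
    return (bank_of(read_addr, array_bytes, border_eighths)
            != bank_of(write_addr, array_bytes, border_eighths))


ARRAY = 4 * 1024 * 1024  # 32 Mbit device
# Code in the lower 6/8 of the array, data log in the upper 2/8.
print(read_allowed_during_write(0x001000, 0x380000, ARRAY, 6))  # True
print(read_allowed_during_write(0x001000, 0x200000, ARRAY, 6))  # False
```

Placing the XiP code image in one bank and the data log in the other is what keeps instruction fetches flowing during a program or erase.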
The XiP architecture also provides advantages to systems which leverage power-down modes to save energy.
Unlike execute-from-RAM scenarios, wake up from very low-power modes is much faster. There is no need
to copy from a non-volatile memory device into the SRAM execution memory. The system can be set to start
executing immediately from external flash. The flash standby power consumption is significantly lower than
DRAM systems due to NOR flash memory technology.
In general, the serial flash leakage of Adesto’s memory devices is so low that there is no need to turn the
flash completely off. Devices like EcoXiP offer deep power-down and ultra-deep power-down modes which
result in extremely low power consumption with only a small impact on wake-up time. As shown in Table
1, supply current can be as low as 200 nA. The end energy consumption (current over time) is
significantly lower than what would be required to copy the code into RAM for DRAM based architectures
which may require self-refresh.
Parameter                                   EcoXiP Specifications
Densities                                   32 Mbit (4 MByte), 64 Mbit (8 MByte), and 128 Mbit (16 MByte)
Interface                                   Quad/Octal, SDR/DDR
Read Bandwidth (max)                        133 MB/s
Power Supply                                1.7V to 1.95V
Max. Operating Frequency                    133 MHz
Temperature Range (Ta)                      -40 °C to 85 °C
Temperature Range (Tj)                      -40 °C to 105 °C
Supply Current (Ultra Deep Power Down)      200 nA
Supply Current (Deep Power Down)            4 µA
Supply Current (Standby)                    35 µA
1.8V Supply Current (Octal DDR)             35 mA
1.8V Supply Current (Program/Erase)         15 mA
Table 1: Adesto EcoXiP Specifications
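To see how the power modes in Table 1 affect a duty-cycled system, the average flash supply current can be estimated as a weighted sum. The sketch below uses the currents from Table 1 with a hypothetical 1% active duty cycle, and it ignores wake-up time and transition energy, so it is a rough comparison rather than a power budget.

```python
ACTIVE_READ_UA = 35_000.0  # 35 mA Octal DDR read current, in microamps
SLEEP_MODES_UA = {
    "standby": 35.0,
    "deep power-down": 4.0,
    "ultra-deep power-down": 0.2,
}


def average_current_ua(active_fraction: float, sleep_ua: float,
                       active_ua: float = ACTIVE_READ_UA) -> float:
    """Time-weighted average current for a system that is active part of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_ua + (1.0 - active_fraction) * sleep_ua


for mode, sleep_ua in SLEEP_MODES_UA.items():
    # 1% active duty cycle is an assumed value for illustration
    print(f"{mode}: {average_current_ua(0.01, sleep_ua):.3f} uA average")
```

At low duty cycles the sleep current becomes a larger share of the total, which is where the deeper power-down modes matter most.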
When not in power-down mode, EcoXiP offers competitive power consumption for active mode while
reading from memory and sending data to the host processor. The savings can be as much as half compared
to similar Octal SPI devices in the market. For 133MHz Octal SPI reads, the Adesto EcoXiP read current is
typically 35mA.
Flash devices offer security features as well. For example, EcoXiP contains a specialized OTP (One-Time
Programmable) security register that can be used for purposes such as unique device serialization, system-
level Electronic Serial Number (ESN) storage, locked key storage, etc. This register can be programmed
but not erased, so only a one-direction transition is possible for each bit. In addition, this register can be
permanently locked.
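The one-direction behavior can be modeled in a few lines. The sketch assumes the usual NOR flash convention that an unprogrammed bit reads as 1 and programming can only clear it to 0; the class name and register size are hypothetical.

```python
class OtpRegister:
    """Toy model of a one-time-programmable register with a permanent lock."""

    def __init__(self, size_bytes: int):
        self._cells = bytearray([0xFF]) * size_bytes  # assumed unprogrammed state
        self._locked = False

    def program(self, offset: int, data: bytes) -> None:
        if self._locked:
            raise PermissionError("register is permanently locked")
        if offset < 0 or offset + len(data) > len(self._cells):
            raise ValueError("write outside the register")
        for i, value in enumerate(data):
            self._cells[offset + i] &= value  # bits can go 1 -> 0, never back

    def lock(self) -> None:
        self._locked = True  # no unlock method: the lock is permanent

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._cells[offset:offset + length])


reg = OtpRegister(4)
reg.program(0, b"\x0f")
reg.program(0, b"\xf3")      # cannot restore the upper bits cleared earlier
print(reg.read(0, 1).hex())  # 03
reg.lock()
```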
Flash devices are supported by the embedded development ecosystem in different ways. EcoXiP provides
flash-loader plug-ins for various embedded tool chains. The flash loader is engaged by the integrated
development environment once it detects that a program’s binary image, or part of it, falls into the flash
memory address range. It will initialize the flash and erase and program memory regions on-demand
as requested by the host tool. In this context, it’s worth mentioning a new feature called Serial Flash
Discoverable Parameters (SFDP), which provides useful information about the flash in a standardized way. This
allows the host to automatically discover flash attributes and set up the interface accordingly. In theory,
one could develop a universal flash loader which would work on all serial flash devices. An update of SFDP
to support the new Octal-SPI (xSPI) standard has recently been ratified by JEDEC committee JC42.
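As an illustration of how a host might begin SFDP discovery, the sketch below parses the first eight bytes of the SFDP table. The field layout follows the JEDEC JESD216 header as commonly documented (signature, minor and major revision, zero-based parameter header count) and should be checked against the standard; the sample bytes are made up and do not come from an EcoXiP device.

```python
import struct

SFDP_SIGNATURE = b"SFDP"


def parse_sfdp_header(raw: bytes) -> dict:
    """Decode the 8-byte SFDP header returned by a flash device."""
    if len(raw) < 8:
        raise ValueError("SFDP header needs at least 8 bytes")
    signature, minor, major, nph, _last = struct.unpack_from("<4sBBBB", raw)
    if signature != SFDP_SIGNATURE:
        raise ValueError("SFDP signature not found")
    return {
        "major": major,
        "minor": minor,
        "parameter_headers": nph + 1,  # the stored count is zero-based
    }


sample = b"SFDP" + bytes([0x06, 0x01, 0x01, 0xFF])  # synthetic example bytes
print(parse_sfdp_header(sample))
# {'major': 1, 'minor': 6, 'parameter_headers': 2}
```

A flash loader built on this idea would go on to read each parameter header and configure the interface from the tables it points to.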
I.MX RT: ADVANCED PROCESSOR ARCHITECTURE
Contributing to the support of the external serial flash in embedded systems are the advanced processor
architectures which are now available. For example, the i.MX RT crossover processor is built with the highest-
performance Arm® Cortex-M® processor, the Arm Cortex-M7. This CPU can execute up to two instructions
every clock cycle and supports 6-stage pipelining, improving computational ability versus other CPUs in
its class. The high-performance CPU ensures that even though slower memory accesses may stall the CPU,
the high compute power is delivered when data is made available. In addition to the CPU, the internal bus
system associated with this class of processor is the same as what has previously been used for higher-end
controllers built with Arm Cortex-A family of devices.
The diagram in Figure 6 represents the architectural details of the i.MX RT 1050 crossover processor. With
regards to cache, the i.MX RT integrates 32KB for the instruction and 32KB for the data caches. This is the
largest size in the market and reduces the CPU’s sensitivity to any delays imposed by slower memories. For
the Tightly Coupled Memory (TCM), the i.MX RT has a FlexRAM block of memory. This intelligent RAM
memory controller allows customization of the TCM up to the largest sizes available on the chip. The user
can select the maximum size, or repurpose the FlexRAM to work as on-chip SRAM to be shared with other
chip peripherals. Having a large TCM allows software architects to choose this memory option for the
portions of their code which need the absolute maximum performance. Software placed in the TCM will
achieve the lowest latency access times, producing the highest performance.
Figure 6: i.MX RT Architecture Diagram (600 MHz Arm Cortex-M7 with 32 KB I-Cache and 32 KB D-Cache; FlexRAM providing ITCM, DTCM and on-chip SRAM; AXI and AHB interconnects linking masters such as LCD, USB, PXP, DCP, camera, eDMA and ENET to slaves including SEMC and FlexSPI)
With regard to the use of the 64-bit AXI on the i.MX RT, there is a broad range of AXI masters which
are integrated onto the chip. The AXI bus is a split-transaction protocol and supports multiple outstanding
transfers. Some specific peripherals to highlight which are relevant to emerging application trends are the
camera interface and cryptographic accelerator (Data Co-Processor-DCP). These components differentiate
the i.MX RT in the market and align with the need for image processing capabilities and security. The FlexSPI
controller allows for these other masters to make use of the receive buffer. This allows the data stored in the
external flash to be quickly accessed as with the case of displaying graphics on a screen.
Finally, most relevant to the computational capabilities of the i.MX RT with external flash is the processor
speed. Reaching 600MHz allows the i.MX RT to be throttled up for the most intensive calculations. Once
data is available to the processor, it is processed at the CPU speed. With all of these capabilities working
together, the end result is a processor using XiP that can achieve high performance and is expandable to
a nearly limitless memory footprint. Figure 7 details how the CPU and the FlexSPI work together to reduce
stalling the flow of application code. Starting at stage 0, the figure represents the case of a full miss of the
target data and subsequent prefetching done by the FlexSPI. The stages show how the levels of cache and
buffers have to be missed to stall the CPU.
0: CPU fetches a target address for instruction or data
1: CPU cache (32 KB I$, 32 KB D$) is checked for target address
2: Cache miss leads to bus access at target address
3: Bus access seeks data from FlexSPI buffer (1 KB prefetch)
4: Prefetch buffer miss leads to FlexSPI read sequence for pre-initialized read command to external flash
5: High performance serial flash reduces access latency with high speeds, double transfer rates and up to 8 data lines
6: The prefetch from the FlexSPI accelerates all subsequent reads; even a full miss, with no cached data, will be accelerated by prefetching
Figure 7: XIP Memory Access Stages
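The staged lookup can be mimicked with a toy model that reports where each fetch is satisfied. It is deliberately simplified: the cache never evicts, the prefetch buffer is treated as filled instantly with 1 KB starting at the missed cache line, and the class name is hypothetical. The 32-byte cache line and 1 KB buffer sizes are the values quoted in this paper.

```python
CACHE_LINE = 32        # bytes per Cortex-M7 cache line
PREFETCH_BYTES = 1024  # FlexSPI receive buffer size on the i.MX RT1050


class XipPath:
    def __init__(self):
        self.cached_lines = set()
        self.buffer_base = None  # start address of the data held in the buffer

    def fetch(self, address: int) -> str:
        line = address // CACHE_LINE
        if line in self.cached_lines:            # stage 1: CPU cache check
            return "cache hit"
        self.cached_lines.add(line)              # the line is filled on any miss
        if (self.buffer_base is not None and     # stage 3: FlexSPI buffer check
                self.buffer_base <= address < self.buffer_base + PREFETCH_BYTES):
            return "prefetch buffer hit"
        self.buffer_base = line * CACHE_LINE     # stage 4: read from external flash
        return "flash read"


path = XipPath()
for addr in (0x000, 0x004, 0x020, 0x400):
    print(hex(addr), path.fetch(addr))
# 0x0 flash read
# 0x4 cache hit
# 0x20 prefetch buffer hit
# 0x400 flash read
```

Only the last outcome exposes the CPU to the full latency of the serial flash, which is why both levels have to miss before execution stalls.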
UNDERSTANDING XIP PERFORMANCE
As detailed in the previous sections, the technology associated with the processor and the external NOR
Flash memory is built to obscure the latencies involved with using XiP. This presents a challenge with
regards to fully understanding the performance impact for this architecture. For example, with the FlexSPI
receive buffer, each read access made to the serial flash can range from one cache line (32 bytes for the
Arm Cortex-M7) up to the full size of the receive buffer. The receive buffer is 1KB for the i.MX RT1050
processor. The maximum size of the read transaction to completely fill the receive buffer is preset as part of
the configuration of the FlexSPI.
Due to the receive buffer, smaller code loops, such as an iterative mathematical calculation or a case
statement, stop depending on additional data once a few cache lines have been pulled from external memory.
At this point, the processor will be executing from buffered data. The receive data
continues to be drawn from the serial flash to fill the buffer size that has been preset. Because of this,
traditional methods of monitoring memory accesses as an indication of performance do not apply. High
performance is achieved even with high access rates to the external memory. Performance cannot be
directly correlated to the amount of external memory accesses made by the system.
In addition, many standard industry benchmarks are relatively small programs. These programs often fit in
the caches integrated on the processor. As such, they don’t represent full scale applications which push
memory size boundaries. Thus, in order to understand the expected performance levels for XiP, various
methods have to be applied. These are divided into the following three cases: throttling, instrumenting and
evaluating example application code.
Throttling Test Case
The throttling test case simulates a scenario where a change in program execution would result in processor
accesses which are all outside the CPU cached data. For throttling test cases, the industry standard
benchmark EEMBC CoreMark® is used. This benchmark is first placed in zero latency TCM to produce the
ideal case CoreMark score. This is the control measurement. Then, the benchmark is run in external serial
flash while periodically invalidating the instruction cache at set intervals. This method has the advantage of
relating to a standard benchmark (CoreMark). The generated results can be compared to the many
publicly posted results that are hosted by EEMBC.
The drawbacks to this method are that for typical application code, such drastic changes to program flow
would rarely lead to a scenario where all of the CPU instruction cache would be invalidated. Estimating the
rate at which the cache should be invalidated is challenging. Regardless of these limitations, this test case
provides insight into how the technology enables high performance with XIP. The results show that with
feature rich serial NOR flash devices such as the Adesto EcoXiP set for Octal SPI and double data rate,
performance is only slightly affected by the CPU cache invalidation events.
Figure 8 shows measurements taken with various cache invalidation rates (1ms, 500us, 250us and 125us).
There are two different serial flash conditions: the orange line represents a single data rate, 4 I/O serial
flash, and the blue line represents the Adesto EcoXiP set for Octal SPI and DDR. The chart shows the
performance advantage of high performance serial flash like Adesto EcoXiP versus slower, lower pin count
flash. Considering the 1ms invalidation rate, there is just over a 3% impact to the CoreMark benchmark.
The 1ms condition is a relevant test case as the typical RTOS tick rate is set to 1ms. Even lower-performing
serial flash devices represented by the orange line have a minimal impact at this rate, delivering 88% of the
CoreMark score versus the ideal case. When considering more extreme cases where CPU cache invalidation
occurs 8 thousand times per second for example, the higher-performance technology delivers nearly 83% of
the performance compared to the ideal case.
EEMBC CoreMark Throttling (CoreMark score)
I$ invalidate interval    Quad-SDR 102 MHz    Octal-DDR 131 MHz
Ideal case                2950                2950
1 ms                      2599                2857
500 us                    2241                2770
250 us                    1868                2616
125 us                    1490                2445
Figure 8: Throttling CPU Cache Results
For the case of invalidating the CPU cache every 125 microseconds, the end result still achieves a 2,445
CoreMark score. This is significantly higher than many other processors in the market.
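The percentages quoted above follow directly from the Figure 8 scores. The short calculation below reproduces them; the pairing of the intermediate scores with the 500 us and 250 us rates is inferred from their ordering in the chart data, so treat those two rows as an assumption.

```python
IDEAL = 2950  # CoreMark score with the benchmark in zero-latency TCM

SCORES = {
    "1 ms":   {"Quad-SDR 102 MHz": 2599, "Octal-DDR 131 MHz": 2857},
    "500 us": {"Quad-SDR 102 MHz": 2241, "Octal-DDR 131 MHz": 2770},
    "250 us": {"Quad-SDR 102 MHz": 1868, "Octal-DDR 131 MHz": 2616},
    "125 us": {"Quad-SDR 102 MHz": 1490, "Octal-DDR 131 MHz": 2445},
}


def percent_of_ideal(score: int, ideal: int = IDEAL) -> float:
    return round(100 * score / ideal, 1)


for interval, by_flash in SCORES.items():
    for flash, score in by_flash.items():
        print(f"{interval:>6}  {flash:<18} {percent_of_ideal(score)}%")

print(percent_of_ideal(2857))  # 96.8, the "just over 3%" impact at 1 ms
print(percent_of_ideal(2599))  # 88.1
print(percent_of_ideal(2445))  # 82.9
```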
Instrumenting Test Case
In order to evaluate performance without using a drastic cache invalidation, code can be instrumented in
a way to allow cache misses to occur more naturally. For the instrumenting test case, a large block of
code is placed in sequential address space which is larger than the size of the CPU cache. So when there
is a cache miss, it is due to a more natural software execution scenario. This method involves creating a
number of smaller loops which can be set to execute a variable number of times (n). These smaller loops
are concatenated together to create a sequential code block that is larger than the CPU cache. When the
smaller loops are executed more frequently, by setting larger values of n, there are more cache hits. When
the smaller loops are executed less frequently, then there are more cache misses.
Figure 9 is a graphical representation of this method. For the purpose of creating a measurement to
evaluate, Fibonacci calculations were used. As shown in the diagram, the processing of each block always
requires one pass of the Fibonacci calculation loop leading to cache misses for that pass. When the CPU first
reaches a Fibonacci block, the first iteration incurs cache misses, but all subsequent passes will be executed
from cached data. For the case of n = 10, the first Fibonacci calculation is a miss and the subsequent 9
Fibonacci calculations are cache hits. For the case of n = 30, the first Fibonacci calculation is a miss and the
subsequent 29 Fibonacci calculations are cache hits.
Figure 9: Instrumented Code (blocks Fibonacci 1 through Fibonacci 1024, each executed n times, laid out sequentially so that the code exceeds the I-Cache size)
Measurements were taken for 10, 20 and 30 iterations of the Fibonacci calculations. Measurements of the
total number of Fibonacci calculations are taken with different memory space location and different types of
serial flash. Higher performance is represented by a higher number of Fibonacci calculations. As shown in
Figure 10, at 30 iterations, the impact to the number of Fibonacci calculations is just over 15% reduction.
Fibonacci Comparison (number of Fibonacci calculations)
Iterations per loop    Quad-SDR 102 MHz    Octal-DDR 131 MHz    RAM
30                     19448               22529                26610
20                     16807               20768                26357
10                     6629                13142                25628
Figure 10: Results of Instrumented Code
As the cache miss rate is increased, the data shows that having high-performance serial flash leads to less
impact than using standard serial flash. This is comparing the orange bar to the gray bar results. Though
this method allows precise control over the cache miss rate, it does not fully represent standard application
code. The cache miss rate on standard application code can vary broadly depending on use case.
Examining example applications
To overcome the limitations of instrumented code and throttling cache, running an example application in
different target memory scenarios offers additional proof points to the performance when using XiP. This test
scenario is easily accomplished because XiP is enabled through the MCUXpresso Integrated Development
Environment (IDE). MCUXpresso IDE projects can be created to place software into the zero-latency TCM
memory. After performing measurements, the same software can be placed in the external serial
NOR flash space and measured again. There are many example projects to choose from in the software
development kits (SDKs) offered by NXP. The entire process with the measured results is detailed in a step-
by-step lab guide (see the resources section). This guide allows developers the opportunity
to explore these methods themselves. The examinations can be done with the provided SDK application
examples or with the final application software created by the developer.
For the case demonstrated by the lab guide, Arm Mbed TLS benchmarking of the Elliptic Curve Digital
Signature Algorithm (ECDSA) was performed. The results show that with CPU cache enabled, for this specific
benchmark there is no measured difference between executing from ITCM and executing from external flash. Whether
executing from the best case memory, the TCM, or executing from external serial flash with XiP, the ECDSA
benchmark application shows the same results.
For a different case, when using MCUXpresso compiler optimizations set for performance, the measured
ECDSA throughput is less than 6% lower for the XIP case than for the TCM case. Changing
compiler settings changes the generated machine code so that it is much more compact. The end result is
approximately a 4x improvement for the ECDSA calculations. As the code becomes more optimized, the
throughput provided by the external serial flash begins to affect the measured performance, leading to the
slight impact when using XiP.
DEVELOPMENT AND DEBUG WITH XIP
As demonstrated by the lab guide, other experiments for XiP can be performed with the enablement
provided by MCUXpresso. For example, the speed of the external memory can be varied by changing
definitions inside the project. The MCUXpresso platform provides the tools needed to quickly examine this
and other scenarios, allowing the developer to fully leverage the benefits of the expandable XiP architecture.
For downloading and debugging application software, the MCUXpresso IDE is preset to allow a seamless
connection to the serial flash components placed on the i.MX RT Evaluation Kit (EVK). When a debug
session is initiated by the user, the flash loader scripts are automatically used by the debug tool. In addition
to the development tools, the off-the-shelf configuration of the i.MX RT EVK hardware has both a high-
performance 8-wire SPI flash as well as a 4-wire SPI flash. With both of these serial flash options placed on the board,
the user can choose the right attributes for their end design.
Figure 11: Selecting Adesto Serial Flash
When importing SDK projects into MCUXpresso, the choice of the serial flash hardware is made based
on the memory settings in the memory configuration editor. The lab guide provides the detailed steps to
choose the Adesto flash during the import as highlighted in Figure 11. With a special edition of the i.MX RT
EVK that has the Adesto EcoXiP placed on the board, nearly all of the SDK examples can be run and debugged
with the Adesto external flash. The operation of the enablement tools with the crossover processor is just as
it would be for a traditional microcontroller which contains embedded flash.
CONCLUSIONS
External memory for an embedded processor offers a scalable platform aligning to the challenges of
today’s embedded systems. When using external serial flash memory, success can be achieved with the
right processor and memory technology. Modern Arm CPUs integrate cache that greatly enhances the use
of external memory. In addition, processor designs are architected to use execute in place with memory
controllers, such as the FlexSPI memory controller which provides buffering and prefetch. Coupling this
with the enhanced capabilities offered by serial NOR flash addresses cost, power, performance and security
challenges. Furthermore, the infrastructure provided by tools such as MCUXpresso allows developers the
ability to get from concept to deployment quickly and efficiently.
RESOURCES
The following table includes links to resources which support developer investigation into using XIP.
Resource                      Description
Processor summary page        The i.MX RT1050 family summary page provides links to chip documents (Data Sheet and Reference Manual)
Hardware evaluation kit       The i.MX RT EVK provides a platform for embedded development. Multiple boot interfaces are supported
Software SDK                  The MCUXpresso SDK is the software enablement which provides drivers and middleware for the i.MX RT
Arm Cortex-M7 Whitepaper      Detailed description of the Arm Cortex-M7 CPU
MCUXpresso IDE training       Training material to understand the MCUXpresso Integrated Development Environment features
Using XIP Lab Guide           This is the lab guide mentioned in this paper which provides the detailed steps for experimenting with XIP
CONTRIBUTOR
Wim Rouwet
Systems and Architecture Engineer
www.nxp.com
NXP and the NXP logo are trademarks of NXP B.V. All other product or service names are the property of their respective owners. The
Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks
licensed by Power.org. Arm is a registered trademark of Arm Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.
© 2018 NXP B.V.
Document Number: NXPADESTOWP REV 0
Release Date: September 2018