Wednesday, June 11, 2025

Benchmarking TensorFlow and TensorFlow Lite on Raspberry Pi 5



All the way back in 2019 I spent a lot of time looking at machine learning on the edge. Over the course of about six months I published more than a dozen articles on benchmarking the then-new generation of machine learning accelerator hardware that was only just starting to appear on the market, and gave a series of talks around the findings.

A lot has changed in the intervening years, but after getting a recent nudge I returned to my benchmark code and, after fixing some of the inevitable bit rot, ran it on the new Raspberry Pi 5.

Headline results from benchmarking

Running the benchmarks on the new Raspberry Pi 5 we see significant improvements in inferencing speed, with full TensorFlow models running almost ×5 faster than they did on the Raspberry Pi 4. We see a similar increase in inferencing speed when using TensorFlow Lite, with models again running almost ×5 faster than on the Raspberry Pi 4.

However, perhaps the more impressive result is that, while inferencing on Coral accelerator hardware is still faster than using full TensorFlow models on the Raspberry Pi 5, the new Raspberry Pi 5 offers comparable performance to the Coral TPU when using TensorFlow Lite, showing essentially the same inferencing speeds.

ℹ️ Information As with our earlier results for the Raspberry Pi 4, we used active cooling with the Raspberry Pi 5 to keep the CPU temperature stable and prevent thermal throttling of the CPU during inferencing.

The conclusion is that custom accelerator hardware may no longer be needed for some inferencing tasks at the edge, as inferencing directly on the Raspberry Pi 5 CPU, with no GPU acceleration, is now on a par with the performance of the Coral TPU.

ℹ️ Information The Coral hardware uses quantization in the same way TensorFlow Lite does to reduce the size of models. However, to use a TensorFlow Lite model with Edge TPU hardware there are a few extra steps involved. First you need to convert your TensorFlow model to the optimized FlatBuffer format used by TensorFlow Lite to represent graphs. Then you additionally need to compile your TensorFlow Lite model for compatibility with the Edge TPU using Google's compiler.
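
To give a feel for those extra steps, here is a rough sketch of the two-stage pipeline. The SavedModel path and the calibration data below are illustrative placeholders, not the exact models or commands used for these benchmarks:

# Sketch: convert a TensorFlow SavedModel to a fully int8-quantized
# TensorFlow Lite FlatBuffer, ready for the Edge TPU compiler.
# The model path and calibration data are placeholders.
import tensorflow as tf

def representative_dataset():
    # Yield a handful of typical inputs so the converter can calibrate
    # its quantization ranges; real calibration data would come from
    # the training or test set.
    for _ in range(100):
        yield [tf.random.uniform((1, 300, 300, 3))]

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model.tflite", "wb") as f:
    f.write(converter.convert())

# The Edge TPU step then happens outside Python, using Google's compiler:
#   edgetpu_compiler model.tflite   ->   produces model_edgetpu.tflite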

Conclusion

Inferencing speeds with TensorFlow and TensorFlow Lite on the Raspberry Pi 5 are significantly improved over the Raspberry Pi 4. Moreover, the Raspberry Pi 5 now offers comparable performance to the Coral TPU.

Part I: Benchmarking

A more in-depth analysis of the results

In our original benchmarks we saw that the two dedicated boards, the Coral Dev Board from Google and the Jetson Nano Developer Kit from NVIDIA, were the best performing of our surveyed platforms. Of these two boards the Coral Dev Board ran significantly faster, with inferencing times around ×4 shorter than the Jetson Nano for the same machine learning model.

However, at the time the benchmarking results made me wonder whether we had gone ahead and started to optimize things in hardware just a little too soon.

The significantly faster inferencing times we saw then from models that made use of quantization, and the dominance of the Coral platform, which also relied on quantization to increase its performance, suggested that we should still be exploring software approaches before continuing to optimize accelerator hardware any further.

These results from benchmarking on the Raspberry Pi 5 seem to bear my original doubts out. It has taken four years for general-purpose CPUs to catch up with what was then best-in-class accelerator silicon. While newer NPU hardware is now available, and yes, I will be looking at that when I can, the Raspberry Pi 5 is now performant enough to keep up with inferencing on real-time video, and performs on a par with the Coral TPU.

While a new generation of accelerator hardware is now starting to become available, which may prove more performant, the Coral TPU is still seen as "best in class" and is currently in widespread use despite a lack of support from Google for their accelerator platform. These results suggest that for many use cases Coral hardware could be replaced, for a significant cost saving, by a Raspberry Pi 5 without any performance degradation.

Summary

Given the lack of support from Google for the pycoral library, where updates seem to have stopped in 2021 and the library no longer works with modern Python distributions, together with the difficulty of getting Coral TPU hardware to work with modern operating systems, the significant reduction in inferencing times we see on the new Raspberry Pi 5 is very welcome.

Part II: Methodology

About the benchmarking code

Benchmarking was done using TensorFlow or, for the hardware-accelerated platforms that do not support TensorFlow, their native framework, using the same models used on the other platforms converted to the appropriate native framework.

For the Coral Edge TPU-based hardware we used TensorFlow Lite, and for Intel's Movidius-based hardware we used their OpenVINO toolkit. Benchmarks were carried out twice on the NVIDIA Jetson Nano, first using vanilla TensorFlow models, and a second time using those models after optimization with NVIDIA's TensorFlow with TensorRT library.

Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both trained on the Common Objects in Context (COCO) dataset. A 3888×2916 pixel test image was used containing two recognizable objects in the frame, a banana 🍌 and an apple 🍎. The image was resized down to 300×300 pixels before being presented to the model, and each model was run 10,000 times before an average inferencing time was taken.

ℹ️ Information The first inferencing run, which can take up to ten times longer due to loading overheads, is discarded from the calculation of the average inferencing time.
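
To make the procedure concrete, the measurement loop looks roughly like the sketch below. It is simplified, and run_inference is a hypothetical stand-in for the framework-specific model call in the actual benchmark scripts:

# Minimal sketch of the measurement loop used by the benchmarks.
import time
import numpy as np
import cv2

# Load the test image and resize it to the model's 300x300 input size.
image = cv2.imread("fruit.jpg")
image = cv2.resize(image, (300, 300))
input_tensor = np.expand_dims(image, axis=0)  # add a batch dimension

timings = []
for i in range(10000):
    start = time.monotonic()
    run_inference(input_tensor)  # hypothetical stand-in for the model call
    elapsed = time.monotonic() - start
    if i == 0:
        continue  # the first (warm-up) run is discarded
    timings.append(elapsed)

print(f"average inference time: {np.mean(timings) * 1000:.2f} ms")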

While in the intervening years other benchmark frameworks have emerged that are arguably more rigorous, the benchmarks presented here are intended to reflect real-world performance. Many of the newer benchmarks measure the time to complete only the inferencing stage. While that is a much cleaner (and shorter) operation than the timings measured here, which include set-up time, most people aren't really interested in just the time it takes between passing a tensor to the model and getting a result. Instead, they want end-to-end timings.

One of the things these benchmarks do not do is optimization. They take an image, pass it to a model, and measure the result. The code is simple, and what it measures is close to the performance an average developer doing the same task might get, rather than an expert machine learning researcher who understands the complexities and limitations of the models, and how to adapt them to individual platforms and situations.

Setting up your Raspberry Pi

Go ahead and download the latest release of Raspberry Pi OS and set up your Raspberry Pi. Unless you're using wired networking, or have a display and keyboard attached to the Raspberry Pi, at a minimum you'll need to put the Raspberry Pi onto your wireless network and enable SSH.

Once you've set up your Raspberry Pi go ahead and power it on, then open up a Terminal window on your laptop and SSH into the Raspberry Pi.

ssh **@*********pi.local

Once you've logged in you can install TensorFlow and TensorFlow Lite.

⚠️ Warning Starting with Raspberry Pi OS Bookworm, packages installed via pip must be installed into a Python virtual environment. A virtual environment is a container where you can safely install third-party modules so they won't interfere with your system Python.

Installing TensorFlow on Raspberry Pi 5

Installing TensorFlow on the Raspberry Pi is much more complicated than it used to be, as there is no longer an official package available. Fortunately there is still an unofficial distribution, which at least means we don't have to resort to building and installing from source.

sudo apt install -y libhdf5-dev unzip pkg-config python3-pip cmake make git python-is-python3 wget patchelf
python -m venv --system-site-packages ~/.python-tf
source ~/.python-tf/bin/activate
pip install numpy==1.26.2
pip install keras_applications==1.0.8 --no-deps
pip install keras_preprocessing==1.1.2 --no-deps
pip install h5py==3.10.0
pip install pybind11==2.9.2
pip install packaging
pip install protobuf==3.20.3
pip install six wheel mock gdown
pip install opencv-python
TFVER=2.15.0.post1
PYVER=311
ARCH=`python -c 'import platform; print(platform.machine())'`
pip install --no-cache-dir https://github.com/PINTO0309/Tensorflow-bin/releases/download/v${TFVER}/tensorflow-${TFVER}-cp${PYVER}-none-linux_${ARCH}.whl
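
With the virtual environment still active, a quick sanity check (a suggested extra, not part of the original instructions) will confirm the wheel installed correctly:

# Run inside the ~/.python-tf virtual environment to verify the install.
import tensorflow as tf
print(tf.__version__)  # should report the 2.15 release installed above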

Installing TensorFlow Lite on Raspberry Pi 5

There is still an official TensorFlow Lite runtime package available for Raspberry Pi, so installation is much more straightforward than for full TensorFlow, where that option is no longer available.

python -m venv --system-site-packages ~/.python-tflite
source ~/.python-tflite/bin/activate
pip install opencv-python
pip install tflite-runtime
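
As before, a quick import check (again, a suggested extra) will confirm the runtime is available:

# Run inside the ~/.python-tflite virtual environment to verify the install.
from tflite_runtime.interpreter import Interpreter
print("tflite-runtime is available")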

Running the benchmarks

The benchmark_tf.py script is used to run TensorFlow benchmarks on Linux (including Raspberry Pi) and macOS. This script can also be used, with a TensorFlow installation that includes GPU support, on NVIDIA Jetson hardware.

source ~/.python-tf/bin/activate
./benchmark_tf.py --model PATH_TO_MODEL_FILE --label PATH_TO_LABEL_FILE --input INPUT_IMAGE --output LABELLED_OUTPUT_IMAGE --runs 10000

For example, on a Raspberry Pi, benchmarking with the MobileNet v2 model for 10,000 inference runs, the invocation would be:

./benchmark_tf.py --model ssd_mobilenet_v2/tf_for_linux_and_macos/frozen_inference_graph.pb --label ssd_mobilenet_v2/tf_for_linux_and_macos/coco_labels.txt --input fruit.jpg --output output.jpg --runs 10000

This will output an output.jpg image with the two objects (the banana and the apple) labelled.

The benchmark_tf_lite.py script is used to run TensorFlow Lite benchmarks on Linux (including Raspberry Pi) and macOS.

source ~/.python-tflite/bin/activate
./benchmark_tf_lite.py --model PATH_TO_MODEL_FILE --label PATH_TO_LABEL_FILE --input INPUT_IMAGE --output LABELLED_OUTPUT_IMAGE --runs 10000

⚠️ Warning Models passed to TensorFlow Lite must be quantized. To do so, the model must be converted to the TensorFlow Lite format.
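
For reference, a basic post-training quantization conversion might look something like the sketch below. Conversion needs the full TensorFlow package, so it would normally be done on a development machine rather than on the Pi, and the model path here is a placeholder; the GitHub repository already includes converted models:

# Sketch: dynamic-range post-training quantization of a SavedModel
# into the TensorFlow Lite FlatBuffer format. The path is a placeholder.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

with open("model.tflite", "wb") as f:
    f.write(converter.convert())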

Getting the benchmark code

The benchmark code is now available on GitHub. The repository includes all the resources needed to reproduce the benchmarking results, including models, code for all the tested platforms, and the test imagery used. There is also an ongoing discussion about how to improve the benchmark so that it can more easily be run on new hardware.

