

# VDAT-2020

## ReARM: A Reconfigurable Approximate Rounding-Based Multiplier for Image Processing

**Rajat Bhattacharjya**<sup>1</sup>, Alish Kanani<sup>2</sup>, and Neeraj Goel<sup>3</sup>

rajat.iiitg@gmail.com, kanani.1@iitj.ac.in, neeraj@iitrpr.ac.in

<sup>1</sup>Dept. of Electronics and Communication Engineering, Indian Institute of Information Technology Guwahati.

<sup>2</sup>Dept. of Electrical Engineering, Indian Institute of Technology Jodhpur.

<sup>3</sup>Dept. of Computer Science and Engineering, Indian Institute of Technology Ropar.

#### **Motivation**

Accurate computation is the key to hardware design.

In some domains, **approximations** in basic computations have little impact on the overall accuracy of the application.





## Outline

- Introduction
- Background
- Proposed Methodology
- Experimentation and Results
- Conclusion and Future Work

## Introduction

Approximation can be applied at various levels, including software and hardware.

Approximate circuits are much faster, smaller, and more power-efficient than their exact counterparts.

The multiplier is a basic building block in many error-resilient algorithms.

Approximate multipliers can substantially reduce area and delay at the cost of some accuracy.

We propose a divide-and-conquer algorithm combined with rounding, with configurable accuracy so that the desired accuracy can be selected at execution time.

Because its accuracy is adaptive, our multiplier can be used in a variety of error-resilient algorithms.

We demonstrate one such application in image processing, with minimal loss of accuracy.

## Background

#### Some basic questions to ask:

- Why make use of rounding?
- Why a divide and conquer approach?
- How is it specifically related to image processing?
- Overall benefit?

# **Proposed Methodology**

## **Proposed Design**

#### **DIVIDE AND CONQUER**

- A is divided into  $A_H$  and  $A_L$
- B is divided into  $B_H$  and  $B_L$

Multiply part by part:  $A_H B_H$,  $A_H B_L$,  $A_L B_H$,  $A_L B_L$

Shift and add:  $(A_H B_H \ll N) + ((A_H B_L + A_L B_H) \ll N/2) + A_L B_L$, where $N$ is the operand bit width
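
The decomposition above can be checked with a short Python sketch. It assumes unsigned `n`-bit operands split into `n/2`-bit halves; the function names `split` and `multiply_by_parts` are illustrative and do not come from the ReARM design.

```python
def split(x: int, n: int) -> tuple[int, int]:
    """Split an n-bit unsigned integer into its high and low n/2-bit halves."""
    half = n // 2
    return x >> half, x & ((1 << half) - 1)


def multiply_by_parts(a: int, b: int, n: int = 8) -> int:
    """Recombine the four partial products with shifts and adds."""
    half = n // 2
    a_h, a_l = split(a, n)
    b_h, b_l = split(b, n)
    return ((a_h * b_h) << n) + ((a_h * b_l + a_l * b_h) << half) + a_l * b_l


# With exact partial products the result equals the ordinary product.
print(multiply_by_parts(0xB7, 0x5C) == 0xB7 * 0x5C)
```

With exact partial products the recombination loses nothing, so any error in the final result comes only from how each block is multiplied.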



### **Multiplication of One Block**

**General Equation:**  $A \times B = A_R \times B + B_R \times A - A_R \times B_R + (A_R - A) \times (B_R - B)$

**Approximate Equation:**  $A \times B \approx A_R \times B + B_R \times A - A_R \times B_R$

$A_R$ and $B_R$ are the powers of 2 closest to $A$ and $B$, respectively.

This expansion is applied to all four partial products, i.e.,  $A_H B_H$,  $A_H B_L$,  $A_L B_H$,  $A_L B_L$.
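
A minimal Python sketch of the rounding-based approximation follows. It assumes unsigned operands, rounds ties (such as 3 or 6) up to the larger power of 2, and approximates all four blocks; how ReARM selects its accuracy levels is not modelled here. The names `closest_power_of_2`, `approx_block`, and `approx_multiply` are illustrative only.

```python
def closest_power_of_2(x: int) -> int:
    """Return the power of 2 nearest to x (ties round up; 0 maps to 0)."""
    if x == 0:
        return 0
    low = 1 << (x.bit_length() - 1)  # largest power of 2 <= x
    high = low << 1
    return low if (x - low) < (high - x) else high


def approx_block(a: int, b: int) -> int:
    """A*B ~= A_R*B + B_R*A - A_R*B_R, dropping the (A_R - A)*(B_R - B) term.

    Every product here has a power-of-2 factor, so hardware needs only shifts.
    """
    a_r, b_r = closest_power_of_2(a), closest_power_of_2(b)
    return a_r * b + b_r * a - a_r * b_r


def approx_multiply(a: int, b: int, n: int = 8) -> int:
    """Apply the approximation to all four half-width partial products."""
    half = n // 2
    mask = (1 << half) - 1
    a_h, a_l = a >> half, a & mask
    b_h, b_l = b >> half, b & mask
    return ((approx_block(a_h, b_h) << n)
            + ((approx_block(a_h, b_l) + approx_block(a_l, b_h)) << half)
            + approx_block(a_l, b_l))


print(approx_block(5, 6))  # 32 (exact product is 30)
```

The dropped term is zero whenever either operand is already a power of 2, so those blocks are multiplied exactly.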



## Closest Power of 2 Pointer (CP2P)

$$y_{[n]} = a_{[n-1]} \cdot a_{[n-2]}$$
$$y_{[i]} = a_{[i-1]} \cdot a_{[i-2]} \prod_{j=i}^{n-1} \overline{a_{[j]}} + a_{[i]} \cdot \overline{a_{[i-1]}} \prod_{j=i+1}^{n-1} \overline{a_{[j]}}, \quad 2 \le i \le n-1$$
$$y_{[1]} = a_{[1]} \cdot \overline{a_{[0]}} \prod_{j=2}^{n-1} \overline{a_{[j]}}$$
$$y_{[0]} = a_{[0]} \prod_{j=1}^{n-1} \overline{a_{[j]}}$$
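
The CP2P logic can be modelled bit by bit in Python, as a sketch rather than the authors' Verilog. It assumes an unsigned `n`-bit input with `n >= 2`, and it reads the first term of $y_{[i]}$ as $a_{[i-1]} \cdot a_{[i-2]}$ (round up when bit $i$ and all higher bits are 0 and the two bits below are 1), which is the reading that keeps the output one-hot. The name `cp2p` and the helper `all_clear` are hypothetical.

```python
def cp2p(a: int, n: int) -> int:
    """One-hot (n+1)-bit pointer to the power of 2 closest to the n-bit value a."""
    bit = [(a >> k) & 1 for k in range(n)]  # bit[k] = a[k]
    inv = [1 - v for v in bit]              # inv[k] = NOT a[k]

    def all_clear(lo: int) -> int:
        """AND of NOT a[j] for j = lo .. n-1 (1 for an empty range)."""
        return int(all(inv[j] for j in range(lo, n)))

    y = [0] * (n + 1)
    y[n] = bit[n - 1] & bit[n - 2]
    for i in range(2, n):
        round_up = bit[i - 1] & bit[i - 2] & all_clear(i)
        round_down = bit[i] & inv[i - 1] & all_clear(i + 1)
        y[i] = round_up | round_down
    y[1] = bit[1] & inv[0] & all_clear(2)
    y[0] = bit[0] & all_clear(1)
    return sum(v << k for k, v in enumerate(y))


print(cp2p(5, 8), cp2p(6, 8), cp2p(255, 8))  # 4 8 256
```

Under this reading, inputs of the form $3 \times 2^k$ are equidistant from two powers of 2 and round up to the larger one, and an input of 0 produces an all-zero pointer.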

## Generality of the Algorithm



## **Experimentation and Results**

## Error Analysis

- P: Accurate product
- P<sub>app</sub>: Approximate product
- D: Maximum error distance
- L: Number of inaccurate results
- N: Number of test cases

$$ER(\%) = \frac{L}{N} \times 100\%$$
$$MRED = \frac{1}{N} \sum_{N} \frac{|P - P_{app}|}{P}$$
$$NED = \frac{1}{N} \sum_{N} \frac{|P - P_{app}|}{D}$$
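
As a rough illustration, the three metrics can be computed from paired lists of exact and approximate products. This sketch takes the error distance as the absolute difference, takes `D` as the largest error distance observed in the test set, and skips cases with `P = 0` in the MRED sum; these choices are assumptions, and the function name `error_metrics` is hypothetical.

```python
def error_metrics(exact, approx):
    """Return (ER in %, MRED, NED) for paired exact and approximate products."""
    n = len(exact)
    dist = [abs(p - q) for p, q in zip(exact, approx)]
    # ER: share of test cases whose result is not exact.
    er = 100.0 * sum(d != 0 for d in dist) / n
    # MRED: mean relative error distance (P = 0 cases skipped, an assumption).
    mred = sum(d / p for d, p in zip(dist, exact) if p != 0) / n
    # NED: mean error distance normalised by the maximum observed distance.
    d_max = max(dist)
    ned = (sum(dist) / n) / d_max if d_max else 0.0
    return er, mred, ned


# Toy data for illustration only, not measurements from any multiplier.
er, mred, ned = error_metrics([6, 12, 20], [6, 8, 16])
print(round(er, 2), round(mred, 4), round(ned, 4))
```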

| Multiplier Type | Bit Width | ER (%) | MRED   | NED      |
|-----------------|-----------|--------|--------|----------|
| ILM             | 8         | 94.04  | 0.0282 | 0.011232 |
| RoBA            | 8         | 94.04  | 0.0282 | 0.011232 |
| ALM             | 8         | 99.99  | 0.51   | 0.186    |
| ReARM           | 8         | 81.5   | 0.024  | 0.0996   |
| ILM             | 16        | 99.96  | 0.0288 | 0.11164  |
| RoBA            | 16        | 99.96  | 0.0288 | 0.11164  |
| ALM             | 16        | 100    | 0.52   | 0.19     |
| ReARM           | 16        | 99.7   | 0.0280 | 0.1102   |

#### Error vs Accuracy Level



#### Hardware Implementation

All multipliers were described in **Verilog HDL**.

Area, power, and delay figures were obtained using **Synopsys Design Compiler** with the **SAED 90 nm cell library**.

| Multiplier Type         | Delay (ns) | Area (µm²) | Power (µW) |
|-------------------------|------------|------------|------------|
| Conventional Multiplier | 4.30       | 24974.61   | 2.25e+03   |
| Vedic Multiplier        | 4.27       | 6369.43    | 1.62e+03   |
| ILM                     | 6.05       | 3000.59    | 800.31     |
| RoBA                    | 5.92       | 729.92     | 2664.66    |
| ALM                     | 1.92       | 672.41     | 66.75      |
| ReARM                   | 5.11       | 5228.45    | 1.58e+03   |
## **Image Processing Application: JPEG Compression**

#### JPEG image compression

(a) Lena (256×256) original image;

(b) Exact multiplication;

(c) **ReARM**, PSNR = 36.2496 dB & SSIM = 0.9751;

(d) **ILM**, PSNR = 34.8836 dB & SSIM = 0.9738;

(e) **RoBA**, PSNR = 34.8836 dB & SSIM = 0.9738;

(f) **ALM**, PSNR = 27.9769 dB & SSIM = 0.8395





## **Conclusion and Future Work**

#### **Main Contributions:**

A reconfigurable approximate multiplier that combines divide and conquer with rounding, offering several levels of accuracy, up to and including accurate multiplication.

It is aimed mainly at image processing applications, so the focus is on better accuracy in 8-bit configurations. JPEG compression results highlight the effectiveness of ReARM.

In the future, we will investigate techniques to support floating-point operations as well.

# Thank You