



# An FPGA-based Sound Field Renderer for High-Precision Sound Field Auralization

Yiyu Tan<sup>†</sup>, Xin Lu<sup>†</sup>, Guanghui Liu<sup>\*</sup>, Peng Chen<sup>§</sup>, Truong Thao Nguyen<sup>§</sup>, and Yusuke Tanimura<sup>§</sup>

<sup>†</sup> Department of Systems Innovation Engineering, Iwate University, Japan

<sup>\*</sup> Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, USA

<sup>§</sup> National Institute of Advanced Industrial Science and Technology, Japan

## (i) Introduction

- Sound field auralization is computation-intensive and memory-intensive. Existing solutions include FPGA-based direct hardware implementations and software simulations on general-purpose processors and GPUs<sup>[1]</sup>.
- An accurate room impulse response is critical for precise auralization results. Wave-based methods provide high accuracy but require much higher computational capability because spatial grids are oversampled to suppress dispersion errors. In this research, an FPGA-based accelerator is developed to speed up the generation of room impulse responses in a sound field renderer.

## (ii) System Design and Implementation



Figure 3. Computing unit.

#### **System Implementation**

- The system is designed using OpenCL and implemented on a DE10-Pro FPGA card. Spatial and temporal blocking are applied to reduce the required external memory bandwidth and to reuse data, respectively.
- The sound field renderer consists of the data input module, the computation engine, and the data output module. The computation engine contains 16 PEs (processing elements) that compute the sound pressures of the grids in a spatial block at 16 consecutive time steps.
- High-speed, high-bandwidth on-chip memories are employed to implement sliding-window-based data buffering, which reduces the required memory bandwidth and the data access overhead between the rendering engine and the on-board external memory.
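The sliding-window buffering mentioned above can be illustrated with a small software model. The sketch below is written in plain C rather than the renderer's OpenCL, and it is an illustration under stated assumptions, not the authors' kernel: the toy grid size (`NX`, `NY`, `NZ`), the function name `neighbour_sums`, and the window layout are all hypothetical. It streams a 3D grid once, in raster order, through a window that spans two planes plus one element, so all six neighbours of the centre point are available in the window and each grid value is read from the input array exactly once.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy dimensions, chosen only to keep the example small. */
#define NX 4
#define NY 4
#define NZ 4
#define PLANE (NX * NY)        /* points in one z-plane */
#define TOTAL (PLANE * NZ)     /* points in the whole grid */
#define WIN   (2 * PLANE + 1)  /* one plane before and one after the centre */

/* Streams `in` once through a shift-register window and writes the sum of the
 * six neighbours of every strictly interior point to `out`.
 * Returns the number of reads from `in` (the stand-in for external memory). */
static int neighbour_sums(const float *in, float *out)
{
    float win[WIN];
    int reads = 0;

    assert(in != NULL && out != NULL);
    memset(win, 0, sizeof win);

    for (int t = 0; t < TOTAL; ++t) {
        /* Shift the window by one element and append the next input value. */
        for (int s = 0; s < WIN - 1; ++s)
            win[s] = win[s + 1];
        win[WIN - 1] = in[t];
        ++reads;

        /* The centre of the window lags the newest element by one plane. */
        int c = t - PLANE;
        if (c < 0)
            continue;
        int i = c % NX, j = (c / NX) % NY, k = c / PLANE;
        if (i == 0 || i == NX - 1 || j == 0 || j == NY - 1 ||
            k == 0 || k == NZ - 1)
            continue;  /* border points are skipped in this sketch */

        out[c] = win[PLANE - 1]  + win[PLANE + 1]    /* i-1, i+1 */
               + win[PLANE - NX] + win[PLANE + NX]   /* j-1, j+1 */
               + win[0]          + win[2 * PLANE];   /* k-1, k+1 */
    }
    return reads;
}
```

In this software model the shift is an explicit loop; on an FPGA a window of this kind would typically be held in on-chip memory and shifted in a single clock cycle. How the renderer actually organizes its buffers is not detailed here.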

$$
P^{n}(i,j,k) = D1 \times \left[ P^{n-1}(i-1,j,k) + P^{n-1}(i+1,j,k) + P^{n-1}(i,j-1,k) + P^{n-1}(i,j+1,k) + P^{n-1}(i,j,k-1) + P^{n-1}(i,j,k+1) + 2P^{n-1}(i,j,k) \right] - D2 \times P^{n-2}(i,j,k)
$$

Table 1: Parameters

| Grid Position | D1                    | D2                 |
|---------------|-----------------------|--------------------|
| General       | 1/4                   | 1                  |
| Interior      | $\frac{R+1}{2 (R+3)}$ | $\frac{3R+1}{R+3}$ |
| Edge          | $\frac{R+1}{8}$       | $R$                |
| Corner        | $\frac{R+1}{2 (5-R)}$ | $\frac{5R-1}{5-R}$ |

#### **System Design**

The explicit compact FDTD rendering algorithm is applied to compute the room impulse response, and a 7-point stencil scheme is adopted.

D1 and D2 are derived from the reflective boundary conditions. They are chosen according to the position of the grid.
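The position-dependent choice of D1 and D2 can be sketched as a small lookup function. This is a minimal illustration in C that transcribes the rows of Table 1. The names `grid_pos_t`, `coeff_t`, and `select_coeff` are hypothetical, and `R` is simply the symbol used in Table 1, whose physical definition is not restated here.

```c
#include <assert.h>

/* Hypothetical names for the four grid positions listed in Table 1. */
typedef enum { GRID_GENERAL, GRID_INTERIOR, GRID_EDGE, GRID_CORNER } grid_pos_t;

typedef struct {
    float d1;
    float d2;
} coeff_t;

/* Returns D1 and D2 for a grid position, transcribing Table 1.
 * R is the parameter that appears in Table 1. */
static coeff_t select_coeff(grid_pos_t pos, float R)
{
    coeff_t c;

    switch (pos) {
    case GRID_INTERIOR:
        assert(R != -3.0f);  /* avoid division by zero */
        c.d1 = (R + 1.0f) / (2.0f * (R + 3.0f));
        c.d2 = (3.0f * R + 1.0f) / (R + 3.0f);
        break;
    case GRID_EDGE:
        c.d1 = (R + 1.0f) / 8.0f;
        c.d2 = R;
        break;
    case GRID_CORNER:
        assert(R != 5.0f);   /* avoid division by zero */
        c.d1 = (R + 1.0f) / (2.0f * (5.0f - R));
        c.d2 = (5.0f * R - 1.0f) / (5.0f - R);
        break;
    case GRID_GENERAL:
    default:
        c.d1 = 0.25f;
        c.d2 = 1.0f;
        break;
    }
    return c;
}
```

A quick consistency check on the table: for R = 1, the interior, edge, and corner rows all evaluate to D1 = 1/4 and D2 = 1, the same values as the general row.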

Updating the sound pressure of a grid requires six additions, one subtraction, one shift operation, and two multiplications.
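The operation count can be checked against a direct transcription of the update equation. The C sketch below is an illustration only: the function names `idx` and `update_point` and the flat array layout are assumptions, not the renderer's OpenCL code. Here `p1` holds the pressures at time step n-1 and `p2` those at time step n-2.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into a flat nx * ny * nz array (assumed layout: i fastest). */
static size_t idx(int i, int j, int k, int nx, int ny)
{
    return ((size_t)k * (size_t)ny + (size_t)j) * (size_t)nx + (size_t)i;
}

/* Computes P^n(i,j,k) from P^{n-1} (p1) and P^{n-2} (p2) for a point that is
 * not on the border of the array, using coefficients d1 and d2. */
static float update_point(const float *p1, const float *p2,
                          int i, int j, int k, int nx, int ny,
                          float d1, float d2)
{
    assert(i > 0 && i < nx - 1 && j > 0 && j < ny - 1 && k > 0);

    float centre = p1[idx(i, j, k, nx, ny)];

    /* Six neighbours: five additions. */
    float sum = p1[idx(i - 1, j, k, nx, ny)] + p1[idx(i + 1, j, k, nx, ny)]
              + p1[idx(i, j - 1, k, nx, ny)] + p1[idx(i, j + 1, k, nx, ny)]
              + p1[idx(i, j, k - 1, nx, ny)] + p1[idx(i, j, k + 1, nx, ny)];

    /* Doubling the centre value (the "shift") and the sixth addition. */
    sum += 2.0f * centre;

    /* Two multiplications and one subtraction. */
    return d1 * sum - d2 * p2[idx(i, j, k, nx, ny)];
}
```

With the general coefficients D1 = 1/4 and D2 = 1, a uniform field of value 1 at both previous time steps yields 1/4 × 8 - 1 = 1, so a constant field stays constant, which is a simple sanity check on the transcription.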


## (iii) Performance Evaluation

#### **Evaluation environment**

|                      | FPGA                             | Software Simulation   |
|----------------------|----------------------------------|-----------------------|
| Computing unit       | Stratix 10 SX (1SX280HU2F50E1VG) | Intel Xeon Gold 6212U |
| Cores                | 5760 DSP blocks                  | 24 cores              |
| Clock frequency      | 357 MHz                          | 2.4 GHz               |
| External memory      | 8 GB DDR4-2400                   | 512 GB DDR4-2933      |
| OS                   | CentOS 7.2                       | CentOS 7.2            |
| Programming language | OpenCL                           | C                     |
| Compiler             | Intel FPGA SDK for OpenCL 19.1   | gcc 4.8.5             |

Sound space: 16 m × 8 m × 8 m; Time steps: 32; Block size: 128 × 128

**Data:** single-precision floating point; **# of PEs:** 16

#### **Hardware resource utilization**

**Table 1: Hardware resource utilization**

| Logic utilization | DSP blocks | RAM blocks | Clock frequency |
|-------------------|------------|------------|-----------------|
| 269,159 (29%)     | 342 (6%)   | 1785 (15%) | 357 MHz         |

#### **Rendering time**

Table 2: Rendering time per time step (s)

| FPGA   | Software Simulation |
|--------|---------------------|
| 0.0486 | 0.5363              |
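The rendering times in Table 2 imply the following speedup of the FPGA system over the software simulation. The second line is a rough extrapolation that assumes the per-step time holds uniformly over the 32 time steps of the evaluation:

$$
\text{speedup} = \frac{0.5363\ \text{s}}{0.0486\ \text{s}} \approx 11.0
$$

$$
32 \times 0.0486\ \text{s} \approx 1.56\ \text{s} \quad \text{(FPGA)}, \qquad 32 \times 0.5363\ \text{s} \approx 17.16\ \text{s} \quad \text{(software)}
$$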

## (iv) Acknowledgements

The authors thank Intel for donating the DE10-Pro FPGA board and the software tools through its University Program. This work was supported by JSPS KAKENHI Grant Numbers JP19K12092 and JP22K12123.

#### **Reference:**

[1] Y. Tan, T. Imamura, and M. Kondo, "FPGA-based acceleration of FDTD sound field rendering", Journal of the Audio Engineering Society, Vol. 69, No. 7/8, pp. 542-556, 2021.

High Performance Computing in the Asia-Pacific Region (HPCAsia) Feb. 27- Mar. 2, 2023, Singapore.