MRT Lattice Boltzmann Method on multiple Graphics Processing Units with halo sharing over PCI-e for non-contiguous memory
Luleå University of Technology, Department of Engineering Sciences and Mathematics, Fluid and Experimental Mechanics.
ORCID iD: 0000-0002-9707-5396
2022 (English)
In: Proceedings of the Eleventh International Conference on Engineering Computational Technology / [ed] B.H.V. Topping; P. Iványi, Civil-Comp Press, 2022, article id 8.5
Conference paper, Published paper (Refereed)
Abstract [en]

The Lattice Boltzmann Method (LBM) has been shown to be well suited for implementation on Graphics Processing Units (GPUs). Compared with CPU implementations, GPU implementations reduce computational time by as much as two orders of magnitude. This difference arises because LBM computations are both explicit and local, so the method, like most other cellular automata methods, can make full use of the GPU's capabilities. Although a GPU delivers significantly higher performance than a CPU in terms of floating-point operations per second (FLOPS), it has two significant drawbacks. First, the complexity of the calculations is limited by the relative simplicity of the GPU core design compared to a CPU. Second, the memory of a GPU is usually limited in comparison, ranging from a few GB up to approximately 100 GB for high-end enterprise cards. Because the LBM is well suited to execution on GPUs, the first point need not be considered. The second point, however, becomes a limitation when larger or more highly resolved computational domains are of interest. This can be remedied by distributing the computations across several GPUs executing in parallel. The GPUs share values in overlapping regions, called halo values, that need to be transferred each time step. If the memory is contiguous, each transfer can be executed as a single memory transfer call that utilizes the PCI-e lanes efficiently. If it is not, support exists for copying so-called strided memory, which has a constant offset between values, either single strided (2D) or double strided (3D). In practice, these functions result in poor PCI-e lane utilization. To remedy this, a method is proposed in which the halo values are calculated and packed into a contiguous memory buffer that is then communicated between the GPUs via the PCI-e lanes.
It is shown that the method introduces some additional overhead compared to single-GPU execution but maintains a reasonable 70% of the performance of the single-GPU case.
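
The packing idea described in the abstract can be illustrated with a minimal CPU-side sketch. This is not the paper's implementation: plain Python lists stand in for GPU memory, a single scalar per node stands in for the LBM distribution functions, and the names `flat_index`, `pack_x_plane`, `unpack_x_plane` and `exchange_halos` are hypothetical. It assumes a flat array with x varying fastest and one ghost layer at each end in x, so a plane at fixed x is strided in memory and must be gathered before it can be sent as one contiguous block.

```python
def flat_index(x, y, z, nx, ny):
    """Index into a flat array with x varying fastest (assumed layout)."""
    return x + nx * (y + ny * z)


def pack_x_plane(field, x, nx, ny, nz):
    """Gather the strided plane at fixed x into a new contiguous list."""
    return [field[flat_index(x, y, z, nx, ny)]
            for z in range(nz) for y in range(ny)]


def unpack_x_plane(field, buffer, x, nx, ny, nz):
    """Scatter a contiguous buffer back into the strided plane at fixed x."""
    values = iter(buffer)
    for z in range(nz):
        for y in range(ny):
            field[flat_index(x, y, z, nx, ny)] = next(values)


def exchange_halos(left, right, nx, ny, nz):
    """Swap halo planes between two neighbouring subdomains.

    Each subdomain has ghost layers at x = 0 and x = nx - 1. The packed
    lists stand in for the contiguous buffers that would be sent over
    PCI-e in a single transfer call.
    """
    to_right = pack_x_plane(left, nx - 2, nx, ny, nz)   # left's last interior layer
    to_left = pack_x_plane(right, 1, nx, ny, nz)        # right's first interior layer
    unpack_x_plane(right, to_right, 0, nx, ny, nz)      # fill right's ghost layer
    unpack_x_plane(left, to_left, nx - 1, nx, ny, nz)   # fill left's ghost layer


# Small demonstration with distinguishable values in each subdomain.
nx, ny, nz = 4, 3, 2
left = [float(i) for i in range(nx * ny * nz)]
right = [1000.0 + i for i in range(nx * ny * nz)]
exchange_halos(left, right, nx, ny, nz)
```

In this layout consecutive values of a fixed-x plane lie `nx` elements apart, which is the kind of strided access the abstract contrasts with a single contiguous transfer. On real hardware the gather and scatter steps would run as GPU kernels, and only the packed buffer would cross the PCI-e bus.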

Place, publisher, year, edition, pages
Civil-Comp Press, 2022. article id 8.5
Series
Civil-Comp Conferences, ISSN 2753-3239
Keywords [en]
lattice Boltzmann method, GPU, multi-GPU programming
National Category
Fluid Mechanics
Research subject
Fluid Mechanics
Identifiers
URN: urn:nbn:se:ltu:diva-93180
DOI: 10.4203/ccc.2.8.5
OAI: oai:DiVA.org:ltu-93180
DiVA, id: diva2:1698053
Conference
The Eleventh International Conference on Engineering Computational Technology (ECT2022), Montpellier, France, August 23-25, 2022
Funder
Swedish Research Council, 2017-04390
Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2025-02-09
Bibliographically approved
In thesis
1. Transitional flow in ordered porous media
2022 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Porous media, here defined as any permeable structure that allows a fluid to flow through it, are relevant to a multitude of engineering applications and natural processes. Observed macroscopic properties of porous media, such as mixing, heat transfer and apparent permeability, are affected by the flow and especially by the type of flow, or flow region. The flow regions are characterized by the ratio of convective to viscous forces, called the Reynolds number (Re). Of these regions, the transition from inertial laminar flow to fully turbulent flow is the least understood. In contrast to flows in straight pipes, the onsets of inertial and unsteady phenomena in porous beds do not coincide, and the transition region stretches over orders of magnitude in Re for most porous beds. In porous media this domain is characterized by temporally long-lived and spatially large-scale flow structures that interact in unpredictable ways, leading to dramatic shifts in the behavior of the macroscopic properties. To improve the understanding of this transitional domain, ordered materials, which reduce geometrically induced flow complexities, are studied with both numerical and experimental methods.
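
For reference, the Reynolds number mentioned above is conventionally written as follows. The symbols are generic assumptions rather than the thesis's own notation: $U$ is a characteristic velocity, $L$ a characteristic length, $\rho$ the fluid density, $\mu$ the dynamic viscosity and $\nu = \mu / \rho$ the kinematic viscosity. Which velocity and length scales the thesis uses for its porous beds is not stated in this record.

```latex
\mathrm{Re} = \frac{\rho U L}{\mu} = \frac{U L}{\nu}
```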

In Paper A, two types of ordered porous media with the same porosity but different tortuosity are investigated using tomographic Particle Image Velocimetry and pressure measurements. Varying Re gives an almost complete overview from the onset of inertial effects up to the start of the turbulent region. Two pore-scale phenomena were identified in the complex flow patterns that appeared. The first is a steady inertial effect, initially assumed to be caused by wall effects. In Paper D it was, however, discovered that the phenomenon materializes independently of wall effects; instead, it is a specific case of a more general inertial transition occurring in a wide range of porous media. The second pore-scale effect is a form of inertial core symmetry break-up that occurs in low-tortuosity porous media. This symmetry break-up is correlated with a sharp increase in the average pressure drop. The second flow structure was reproduced using numerical methods in Paper B, forming the basis of a more comprehensive discussion of how these structures affect the use of periodic conditions when modelling porous media.

The possibility of using high-performance Graphics Processing Unit (GPU) implementations of the Lattice Boltzmann Method (LBM) for simulating thermal turbulent flows in porous media is investigated in Paper C. It is concluded that GPU LBM implementations provide fast, efficient and accurate simulations of thermal turbulent flows in porous media, as well as of a wide range of other flows. Furthermore, Paper E presents a multiple-GPU implementation of a hydrodynamic LBM model.

Place, publisher, year, edition, pages
Luleå tekniska universitet, 2022
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords
Porous media, Lattice Boltzmann Method, GPU Programming, Tomographic PIV, Laser Doppler Velocimetry
National Category
Fluid Mechanics; Computational Mathematics
Research subject
Fluid Mechanics
Identifiers
urn:nbn:se:ltu:diva-93183 (URN)
978-91-8048-149-6 (ISBN)
978-91-8048-150-2 (ISBN)
Public defence
2022-11-18, E632, Laboratorievägen 14, Luleå, 09:00 (English)
Funder
Swedish Research Council, 2017-04390
Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2025-02-09
Bibliographically approved

Authority records

Forslund, Tobias O.M.
