|INSIGHT: The Need for Speed--Optimizing for CFD Performance
Good predictive accuracy makes CFD possible for commercial design, but fast turnaround makes it practical. In this month's issue of The Flow, we sit down with Bob Ni and ADS product manager Michael Ni to discuss tips on how to optimize your CFD runs for performance when you're in a design crunch. Prior to founding ADS, Bob spent nearly 30 years at Pratt & Whitney leading turbomachinery CFD in support of compressor and turbine design.
FLOW: For those of us who don't have the luxury of growing our clusters, what are your suggestions for optimizing CFD performance?
BOB: Obviously the answer to this question is extremely case and cluster dependent, but as a general guideline I'd suggest the following approach for typical 3-D steady and unsteady analysis:
- Conduct an initial mesh refinement study
- Evaluate the default partitions for balance
- Balance loads as needed
- Distribute loads evenly across as many physical machines as possible
FLOW: Let's go through each of these steps in a bit more detail. Why conduct a mesh refinement study?
BOB: Apart from being good practice for demonstrating mesh independence of results, it also helps the user find the right balance of accuracy and turnaround time. By identifying the lowest-density mesh that properly reveals the flow insights for your case, you can avoid element "overkill" that bogs down your runs with little impact on results.
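The refinement study Bob describes can be sketched as a simple loop: re-run the case at increasing mesh density until a key output quantity stops changing beyond a tolerance, then use the coarsest mesh that already matched. This is only an illustrative sketch; `run_case` is a hypothetical stand-in for whatever solver invocation your CFD system provides, and the tolerance and output quantity are case dependent.

```python
def refinement_study(densities, run_case, tol=1e-3):
    """Return the coarsest mesh density whose key result (e.g., adiabatic
    efficiency) is effectively mesh independent, to within tol."""
    prev_density, prev_result = None, None
    for density in sorted(densities):
        result = run_case(density)  # hypothetical solver call
        if prev_result is not None and abs(result - prev_result) < tol:
            return prev_density     # the coarser mesh already matched
        prev_density, prev_result = density, result
    return prev_density             # no convergence; consider finer meshes
```

In practice you would compare a quantity of engineering interest (efficiency, pressure ratio, loss coefficient) rather than a single scalar, but the stopping logic is the same.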
FLOW: Got it. How do you evaluate partitions for balance?
BOB: As you know, when setting up a case for parallel execution, a mesh is partitioned into blocks that can be executed in parallel. This is usually carried out by the CFD software. For optimum performance, these partitions should be relatively equal in size; one simple way to gauge this is to look at the mesh element counts for each partition. Here at ADS we partition by row for steady runs and by passage for unsteady runs. By looking at the mesh element counts generated by the system during mesh generation, you can quickly assess the relative balance of your partitions.
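One quick way to turn those per-partition element counts into a single balance figure is the ratio of the largest partition to the mean, since the largest partition paces the whole parallel run. This helper is a sketch, not part of any CFD system:

```python
def partition_imbalance(element_counts):
    """Ratio of the largest partition to the mean partition size.
    1.0 means perfectly balanced; larger values mean the biggest
    partition is the bottleneck for the parallel run."""
    mean = sum(element_counts) / len(element_counts)
    return max(element_counts) / mean
```

An imbalance near 1.0 suggests the default partitioning is fine; a value well above it suggests the biggest block should be split further.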
FLOW: And what can be done if the partitions don't appear to be balanced?
MIKE: Most CFD vendors will give you ways to further partition a case to more finely control balance. For example, let's say you want to conduct 3-D steady analysis on a 1.5 stage turbine. By default, the CFD system partitions the case into three blocks--one for Vane 1, one for Blade 1 and one for Vane 2. If the mesh element counts are 100K/100K/100K respectively, the partitions are balanced so nothing needs to be done. But let's say instead that the mesh element count is 200K for Blade 1. Now you've got a bottleneck since the blade has twice the mesh element count of either vane. To balance the load you can further partition the blade into O mesh and H mesh blocks, for example. So the result is now four balanced partitions of 100K: Vane 1, Blade 1 (O), Blade 1 (H) and Vane 2.
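Mike's 1.5-stage turbine example can be sketched as a small rebalancing pass: any partition well above the mean is split in two, here standing in for the O-mesh/H-mesh breakup. The 50/50 split and the 1.4x threshold are purely illustrative; real splits follow the mesh topology, not an even halving.

```python
def split_oversized(partitions, threshold=1.4):
    """Split any partition larger than threshold * mean into two blocks,
    labeled (O) and (H) to mirror an O/H mesh breakup. Illustrative only."""
    mean = sum(partitions.values()) / len(partitions)
    balanced = {}
    for name, count in partitions.items():
        if count > threshold * mean:
            half = count // 2
            balanced[f"{name} (O)"] = half          # assumed even split
            balanced[f"{name} (H)"] = count - half
        else:
            balanced[name] = count
    return balanced

# Mike's example: Blade 1 has twice the element count of either vane.
turbine = {"Vane 1": 100_000, "Blade 1": 200_000, "Vane 2": 100_000}
```

Applying `split_oversized(turbine)` yields four partitions of 100K each, matching the balanced result described above.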
FLOW: Is this also possible for centrifugal compressor design?
BOB: Yes, in addition to O-H mesh breakup, centrifugal compressor designers can partition an impeller with splitters. Consider a classic single stage centrifugal compressor with impeller and diffuser. If the impeller consisted of a main blade plus two splitter blades, you could further partition the case by decomposing the impeller mesh into three blocks, one for the main blade and one for each splitter.
FLOW: So once a case has been partitioned and balanced, it has to be assigned to processes in the cluster for parallel execution. Tell us more about "round-robin" distribution.
MIKE: I'll take this one, Bob. As a general guideline, we recommend that you spread the case evenly across as many physical machines as possible. So for a case with 16 partitions, if you have 16 physical machines we'd actually recommend assigning one partition to each physical machine rather than loading up on four quad-core machines. Though cases and clusters vary widely, we suggest this approach because it's a simple yet effective way to mitigate some of the limitations associated with multi-core, multi-CPU clusters.
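The round-robin distribution Mike recommends can be sketched in a few lines: deal partitions out to physical machines one at a time, so the load only doubles up on a machine once every machine already has one partition. The names here are hypothetical, not an ADS API:

```python
def round_robin(partitions, machines):
    """Deal partitions out to physical machines in round-robin order,
    so no machine holds a second partition until all hold a first."""
    assignment = {machine: [] for machine in machines}
    for i, partition in enumerate(partitions):
        assignment[machines[i % len(machines)]].append(partition)
    return assignment
```

For the 16-partition, 16-machine case above, each machine receives exactly one partition; with only four machines, each would receive four.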
FLOW: What do you mean by limitations?
MIKE: Multi-core machines are pervasive today and offer excellent price-performance, but as we all know, a quad-core machine will not deliver 4x the performance of a comparable single-core machine. Multi-core machines share resources, including memory, cache and network interconnects, so an overload on any of these fronts can significantly degrade performance. This becomes particularly apparent with larger cases where communication can account for over half of total execution time and where it's critical to have as many dedicated network interfaces as possible. Spreading the workload evenly across as many machines as possible offers a simple yet effective way to circumvent multi-core cluster performance limitations.
FLOW: Thank you, gentlemen.
BOB, MIKE: You're welcome.
TECHTIPS: Partitioning a Single Airfoil Row with OH Mesh Breakup
Learn how to partition a single airfoil mesh for improved parallel performance using OH mesh breakup. <more>
TECHTIPS: Partitioning a Radial Impeller with Splitters
Learn how to partition a radial impeller with splitters for improved parallel performance. <more>
TECHTIPS: Examining Film Cooling Angles Using ParaView
When using Code Leo to model the effects of individual film cooling holes on a turbine blade or vane, this handy utility helps you verify proper definition of flow angles. <more>
Welcome to The Flow
Welcome to The Flow, a newsletter for monthly insights on turbomachinery CFD published by AeroDynamic Solutions, Inc.
Each month we'll spotlight a topic of interest, discuss a case study and/or provide useful pointers about how to get the most out of the ADS CFD system.