28 #define z_swap(_a_, _b_, _t_) do { (_t_) = (_a_); (_a_) = (_b_); (_b_) = (_t_); } while (0) 35 static inline void sort_node_indices_sort_bubble(INT n, INT *keys)
39 for (i = 0; i < n; ++i)
42 while (j > 0 && keys[2 * j + 0] < keys[2 * (j - 1) + 0])
44 z_swap(keys[2 * j + 0], keys[2 * (j - 1) + 0], ti);
45 z_swap(keys[2 * j + 1], keys[2 * (j - 1) + 1], ti);
56 static inline void sort_node_indices_radix_count(INT n, INT *keys, INT shift, INT mask, INT *counts)
60 for (i = 0; i < n; ++i)
62 k = (keys[2 * i + 0] >> shift) & mask;
72 static inline void sort_node_indices_radix_rearrange(INT n, INT *keys_in, INT *keys_out, INT shift, INT mask, INT *displs)
76 for (i = 0; i < n; ++i)
78 k = (keys_in[2 * i + 0] >> shift) & mask;
79 keys_out[2 * displs[k] + 0] = keys_in[2 * i + 0];
80 keys_out[2 * displs[k] + 1] = keys_in[2 * i + 1];
86 #define radix_n (1 << rwidth) 93 void Y(sort_node_indices_radix_lsdf)(INT n, INT *keys0, INT *keys1, INT rhigh)
95 const INT radix_mask = radix_n - 1;
96 const INT rhigh_in = rhigh;
100 omp_get_max_threads();
105 INT *from, *to, *tmp;
110 INT tid = 0, tnum = 1;
112 STACK_MALLOC(INT*, lcounts, (
size_t)(tmax * radix_n) *
sizeof(INT));
120 #pragma omp parallel private(tid, tnum, i, l, h) 122 tid = omp_get_thread_num();
123 tnum = omp_get_num_threads();
126 for (i = 0; i < radix_n; ++i) lcounts[tid * radix_n + i] = 0;
128 l = (tid * n) / tnum;
129 h = ((tid + 1) * n) / tnum;
131 sort_node_indices_radix_count(h - l, from + (2 * l), rhigh_in - rhigh, radix_mask, &lcounts[tid * radix_n]);
137 for (i = 0; i < radix_n; ++i)
139 for (l = 0; l < tmax; ++l) lcounts[l * radix_n + i] = (k += lcounts[l * radix_n + i]) - lcounts[l * radix_n + i];
143 #pragma omp parallel private(tid, tnum, i, l, h) 145 tid = omp_get_thread_num();
146 tnum = omp_get_num_threads();
149 l = (tid * n) / tnum;
150 h = ((tid + 1) * n) / tnum;
152 sort_node_indices_radix_rearrange(h - l, from + (2 * l), to, rhigh_in - rhigh, radix_mask, &lcounts[tid * radix_n]);
166 if (to == keys0) memcpy(to, from, (
size_t)(n) * 2 *
sizeof(INT));
176 void Y(sort_node_indices_radix_msdf)(INT n, INT *keys0, INT *keys1, INT rhigh)
178 const INT radix_mask = radix_n - 1;
182 omp_get_max_threads();
190 INT counts[radix_n], displs[radix_n];
192 INT tid = 0, tnum = 1;
194 STACK_MALLOC(INT*, lcounts, (
size_t)(tmax * radix_n) *
sizeof(INT));
199 #pragma omp parallel private(tid, tnum, i, l, h) 201 tid = omp_get_thread_num();
202 tnum = omp_get_num_threads();
205 for (i = 0; i < radix_n; ++i) lcounts[tid * radix_n + i] = 0;
207 l = (tid * n) / tnum;
208 h = ((tid + 1) * n) / tnum;
210 sort_node_indices_radix_count(h - l, keys0 + (2 * l), rhigh + 1, radix_mask, &lcounts[tid * radix_n]);
216 for (i = 0; i < radix_n; ++i)
218 for (l = 0; l < tmax; ++l) lcounts[l * radix_n + i] = (k += lcounts[l * radix_n + i]) - lcounts[l * radix_n + i];
220 displs[i] = lcounts[0 * radix_n + i];
221 if (i > 0) counts[i - 1] = displs[i] - displs[i - 1];
223 counts[radix_n - 1] = n - displs[radix_n - 1];
226 #pragma omp parallel private(tid, tnum, i, l, h) 228 tid = omp_get_thread_num();
229 tnum = omp_get_num_threads();
232 l = (tid * n) / tnum;
233 h = ((tid + 1) * n) / tnum;
235 sort_node_indices_radix_rearrange(h - l, keys0 + (2 * l), keys1, rhigh + 1, radix_mask, &lcounts[tid * radix_n]);
240 memcpy(keys0, keys1, (
size_t)(n) * 2 *
sizeof(INT));
244 for (i = 0; i < radix_n; ++i)
249 Y(sort_node_indices_radix_msdf)(counts[i], keys0 + 2 * displs[i], keys1 + 2 * displs[i], rhigh);
251 sort_node_indices_sort_bubble(counts[i], keys0 + 2 * displs[i]);