Right now I'm trying to run very large for loops for a task, roughly 8e+12 iterations. I tried C++11 threading, but it does not seem to be as fast as required. I am using a system with 8 GB RAM, an i5 CPU, and an Intel HD Graphics 4000 card. Would OpenMP be better, or do I have to get an NVIDIA GPU and use CUDA for this task? My code is below:
void thread_function(pcl::PointCloud<pcl::PointXYZRGB>::ConstPtr cloudB, vector<int> v, int p0) {
    for (size_t p1 = 0; p1 < v.size() && ros::ok(); ++p1) {
        // Distance p0-p1, scaled by 1000 and truncated to int
        int p0p1 = sqrt( pow(cloudB->points[v[p1]].x - cloudB->points[v[p0]].x, 2)
                       + pow(cloudB->points[v[p1]].y - cloudB->points[v[p0]].y, 2)
                       + pow(cloudB->points[v[p1]].z - cloudB->points[v[p0]].z, 2) ) * 1000;
        if (p0p1 > 10) {
            for (size_t p2 = 0; p2 < v.size() && ros::ok(); ++p2) {
                // Distances p0-p2 and p1-p2, scaled the same way
                int p0p2 = sqrt( pow(cloudB->points[v[p2]].x - cloudB->points[v[p0]].x, 2)
                               + pow(cloudB->points[v[p2]].y - cloudB->points[v[p0]].y, 2)
                               + pow(cloudB->points[v[p2]].z - cloudB->points[v[p0]].z, 2) ) * 1000;
                int p1p2 = sqrt( pow(cloudB->points[v[p2]].x - cloudB->points[v[p1]].x, 2)
                               + pow(cloudB->points[v[p2]].y - cloudB->points[v[p1]].y, 2)
                               + pow(cloudB->points[v[p2]].z - cloudB->points[v[p1]].z, 2) ) * 1000;
                if (p0p2 > 10 && p1p2 > 10) {
                }
            }
        }
    }
    x[p0] = 3;
    cout << "ended thread=" << p0 << endl;
}
This task is essential for my algorithm to complete. I need a suggestion for how to make these loops run very fast. In the code above, thread_function is the main function where I currently put the for loops. Is there any way to increase its performance?
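
For reference, the OpenMP route mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested replacement: Point3 is a hypothetical stand-in for pcl::PointXYZRGB so the snippet compiles without PCL, count_far_triples is a made-up name, counting the matching pairs stands in for the work inside the empty if block, and min_dist is a non-negative threshold in the same units as the coordinates (the code above instead scales by 1000 and truncates to int, so results at the boundary may differ). It compares squared distances to avoid sqrt and pow, and filters the points that are far enough from p0 once, before the pair loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for pcl::PointXYZRGB (assumption, not the PCL type).
struct Point3 {
    float x, y, z;
};

// Squared Euclidean distance: avoids both sqrt and pow.
inline float squared_distance(const Point3& a, const Point3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Counts ordered pairs (p1, p2) for which p0, p1 and p2 are pairwise
// farther apart than min_dist. Counting is a placeholder for the real
// per-triple work. Assumes min_dist >= 0.
std::uint64_t count_far_triples(const std::vector<Point3>& pts,
                                std::size_t p0, float min_dist) {
    assert(p0 < pts.size());
    const float min_sq = min_dist * min_dist;  // compare squared values

    // Filter once: only points far enough from p0 can serve as p1 or p2.
    std::vector<Point3> far_pts;
    for (const Point3& p : pts) {
        if (squared_distance(p, pts[p0]) > min_sq) far_pts.push_back(p);
    }

    const std::int64_t m = static_cast<std::int64_t>(far_pts.size());
    std::uint64_t count = 0;

    // Each thread handles a slice of the outer loop and keeps a private
    // counter that OpenMP sums at the end. Without OpenMP enabled, the
    // pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+ : count) schedule(static)
    for (std::int64_t i = 0; i < m; ++i) {
        for (std::int64_t j = 0; j < m; ++j) {
            if (squared_distance(far_pts[i], far_pts[j]) > min_sq) ++count;
        }
    }
    return count;
}
```

With GCC or Clang this would be built with -fopenmp. Note that the sketch only removes constant-factor overhead and spreads the work over the CPU cores; the number of pairs examined per p0 is still quadratic in the number of remaining points.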

This topic was modified 1 year ago by kane.