So, I went through the code and plotted the distribution, and this
answered all my questions:
1) Changing the multiplier from 1e-5 to 1e-500 makes no difference
because both values are so small that the threshold
(mean + std_mul_ * stddev) ends up at essentially the mean either way.
The data is modeled as a gaussian distribution, but the threshold is
only applied to the right tail. If you think about it, this makes
sense: you don't want to treat as outliers the points whose mean
distance to their neighbors is below the average of the distribution,
since those points are more densely packed and therefore unlikely to
be artifacts of your acquisition process.
2) That being said, I realized that setting my threshold with
std_dev_mul \approx 0 removes neither half of the data nor all of the
sparse outliers, which is what I would expect if the distances really
followed a gaussian distribution, since the threshold would then cut
its area in half. But when you think about it, this also makes sense:
if your acquisition is done properly, you should have more inliers
than outliers, so the distribution is probably bimodal. When you fit a
gaussian to it, the mean lands between the two modes, probably biased
a little to the left because the inlier mode has more mass. Despite
this poor model, the thresholding does the trick. One thing to note is
that if you want to be more aggressive with your pruning (i.e., make
sure you have no outliers even if you throw away many of the inliers),
you CAN set std_dev_mul to a negative value, which puts the threshold
below the average. To help me do this, I added two lines to
statistical_outlier_removal.hpp after line 98:
PCL_WARN("MEAN + MUL*STDDEV: %f + %f * %f \n",mean, std_mul_, stddev );
PCL_WARN("THRESHOLD: %f\n", distance_threshold );
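To make the behaviour above concrete, here is a small stand-alone
sketch of the thresholding. It is not the PCL code: it only assumes
the formula from line 95 (threshold = mean + std_mul_ * stddev), a
sample standard deviation, and a one-sided cut that drops points whose
mean neighbor distance is above the threshold. The helper names
distanceThreshold and countInliers are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PCL): computes mean + std_mul * stddev
// over the per-point mean distances to the nearest neighbors.
double distanceThreshold(const std::vector<double>& mean_dists, double std_mul)
{
  assert(mean_dists.size() > 1);
  double sum = 0.0, sq_sum = 0.0;
  for (double d : mean_dists)
  {
    sum += d;
    sq_sum += d * d;
  }
  const double n = static_cast<double>(mean_dists.size());
  const double mean = sum / n;
  // Sample variance; this choice is an assumption of the sketch.
  const double variance = (sq_sum - sum * sum / n) / (n - 1.0);
  const double stddev = std::sqrt(variance > 0.0 ? variance : 0.0);
  return mean + std_mul * stddev;
}

// Hypothetical helper: counts the points that survive the one-sided cut,
// i.e. those whose mean neighbor distance is not above the threshold.
std::size_t countInliers(const std::vector<double>& mean_dists, double std_mul)
{
  const double threshold = distanceThreshold(mean_dists, std_mul);
  std::size_t kept = 0;
  for (double d : mean_dists)
    if (d <= threshold)
      ++kept;
  return kept;
}
```

With a toy set of distances such as {1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 5.0,
6.0}, multipliers of 1e-5 and 1e-200 keep exactly the same points,
because both thresholds sit practically on the mean, while a negative
multiplier such as -0.5 moves the threshold into the inlier mode and
removes more points.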
In conclusion:
- The module is working as it should, but maybe the docs should state
that std_dev_mul can be negative. Adding the two lines of code I
mentioned above might have some educational value and give some
insight into the nature of the data being processed, so it might be a
handy addition. Changing the variables to double precision might also
be a good idea for cases where there is a wide range of depths, but
I'm guessing that would also mean changing
std::vector<float> nn_dists (mean_k_);
so I didn't get around to doing it.
- Regarding visualizing the data in order to select the threshold
appropriately, I was thinking that an interesting alternative to
plotting the histogram would be to encode the distance in the color
information, so you can see which points you are cutting out for each
threshold.
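As a rough sketch of the color-encoding idea, the function below maps
a point's mean neighbor distance onto a blue-to-red ramp, so densely
packed points come out blue and sparse ones red. Everything here is an
assumption for illustration: the Rgb struct and distanceToColor are
made-up names, and writing the result into the point cloud's rgb field
is left out.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical color type for this sketch (not a PCL type).
struct Rgb
{
  std::uint8_t r, g, b;
};

// Hypothetical helper: linearly maps dist from [min_dist, max_dist] onto a
// blue (small distance) to red (large distance) ramp. Values outside the
// range are clamped; a degenerate range maps everything to blue.
Rgb distanceToColor(double dist, double min_dist, double max_dist)
{
  double t = 0.0;
  if (max_dist > min_dist)
  {
    t = (dist - min_dist) / (max_dist - min_dist);
    t = std::max(0.0, std::min(1.0, t));
  }
  const std::uint8_t level = static_cast<std::uint8_t>(t * 255.0 + 0.5);
  return Rgb{level, 0, static_cast<std::uint8_t>(255 - level)};
}
```

Using the smallest and largest distance in the cloud as min_dist and
max_dist would show the whole distribution at once; a narrower range
around a candidate threshold would make the cut itself easier to see.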
One other note: the outlier removal tutorial does not preserve the RGB
information of the original point cloud, but changing PointXYZ to
PointXYZRGB gives me a problem at runtime:
statistical_removal:
/usr/local/include/pcl-1.1/pcl/ros/conversions.h:91: void
pcl::detail::FieldMapper<PointT>::operator()() [with Tag =
pcl::fields::rgb, PointT = pcl::PointXYZRGB]: Assertion `Data::value
== field.datatype' failed.
Is this the right way to do it?
Thanks!
Ricardo
On Wed, Jul 27, 2011 at 16:17, Radu B. Rusu <
rusu@willowgarage.com> wrote:
> Ricardo,
>
> There are two versions of the filter: one working on PointCloud<T> with an
> implementation in the HPP file, and one for PointCloud2 with an
> implementation in the CPP.
>
> Looking forward to hearing your conclusions!
>
> Cheers,
> Radu.
> --
> Point Cloud Library (PCL) - http://pointclouds.org
>
> On 07/27/2011 03:51 PM, Ricardo Silveira Cabral wrote:
>>
>> Hi Radu!
>> I was actually changing the .cpp file, so I was wondering why no
>> change was coming through.
>> I think now I should be able to output the points, plot the
>> distribution and see what's wrong with this.
>> I'll try the double change as you mentioned, and keep you posted about
>> the results.
>>
>> Thanks,
>> Ricardo
>>
>> On Wed, Jul 27, 2011 at 12:23, Radu B. Rusu<rusu@willowgarage.com> wrote:
>>>
>>> Ricardo,
>>>
>>>
>>> On 07/27/2011 09:58 AM, Ricardo Cabral wrote:
>>>>
>>>> Good morning everyone,
>>>>
>>>> Two quick questions:
>>>>
>>>> 1) does the std_dev multiplier have a maximum precision to it?
>>>
>>> It shouldn't have. It's basically a double.
>>>
>>>> 2) is there a way to plot or output the mean, std_dev and distances
>>>> of all points so as to view the distribution and appropriately
>>>> select the parameters?
>>>
>>> You can try feeding them into some form of a histogram and then use
>>> HistogramVisualizer or any other standard tools like gnuplot. We can make
>>> the visualization of things like this way more awesome in PCL if we had a
>>> bit more input (and time :D).
>>>
>>>> I'm asking this because I'm running the provided tutorial and the
>>>> results don't seem to change when I change the multiplier from 1e-3
>>>> to 1e-200, and I doubt the distribution is that highly peaked: in
>>>> both cases I only get a reduction from about 222407 to 164043
>>>> points.
>>>
>>> Interesting. Line 95 from statistical_outlier_removal.hpp shows:
>>>
>>> 95 double distance_threshold = mean + std_mul_ * stddev;
>>>
>>> And then this is what we use to filter data afterwards.
>>>
>>> Though I did notice that the distances vector is of type float (!).
>>> So maybe that's where the error occurs!
>>>
>>> Can you please try to change the distances vector to double and see
>>> if that helps you? Alternatively, I can provide a patch, or you can
>>> provide a small example that we can try to fix.
>>>
>>> Thanks a lot Ricardo!
>>>
>>> Cheers,
>>> Radu.
>>>
>>>
>
>
_______________________________________________
PCL-users@code.ros.org /
http://pointclouds.org
https://code.ros.org/mailman/listinfo/pcl-users