Cpp

Image processing is a computational task that lends itself very well to GPU compute scenarios. In many cases the most commonly used algorithms are inherently massively parallel, with each pixel in the image being processed independently of the others. As a result, image processing toolkits have been early adopters of the new GPGPU programming model. Many of these mass-market toolkits, however, are more accurately described as image manipulation packages that offer "image-in, image-out" capabilities; in other words, each operation takes an input image and produces a manipulated output image. In contrast, the goal of an image processing workflow is usually the portrayal or extraction of analytical information, which is determined only after several processing steps. These workflows are commonly employed in scientific and technical industries like medical imaging. For the last two years, Core Optical Inc. has been building an image processing toolkit for the .NET framework called PrecisionImage.NET. Internally it centers around two separate execution branches, one targeting multicore CPUs and the other targeting GPU execution. While the CPU branch is a fully-managed CLS-compliant implementation leaning heavily on the .NET framework's excellent built-in thread pool, the GPU branch is implemented using Microsoft's brand new C++ AMP compiler. We had two requirements when choosing the GPGPU tool we would use for that branch of our toolkit. First, the generated code needed to be vendor agnostic so that a decision to use our SDK wouldn't overly restrict our customers' choice of graphics hardware. The current minimum platform for C++ AMP is DirectX 11, a version that will soon be ubiquitous among modern GPUs from Intel, Nvidia and AMD. Second, since we focus on the Microsoft developer stack, we needed something that would play nicely with the .NET framework.
Obviously C++ AMP is the best bet in this regard, since it's produced by Microsoft. For a v1 product we've found C++ AMP to be both solid and easy to program against. Although Microsoft doesn't produce an official managed wrapper, accessing AMP in .NET was a straightforward matter of P/Invoking from our existing C# code base. To keep the surface area between the two to a minimum, we stuck with our managed code for the CPU fallback and condensed the various operations of the SDK into hundreds of compact AMP kernels compiled in combinations of single/double precision and 32/64-bit implementations. In almost all cases we found the simpler untiled model readily met our speedup goals. When this wasn't the case, we were able to produce a tiled version that met our performance objectives with minimal drama. To demonstrate the performance of the GPU branch we decided to compare the speed of two operations running on a 6-core CPU (multithreaded managed code) versus the C++ AMP version running on two different GPUs from Nvidia. The first operation was chosen as an ideal case for GPU implementation and consisted of a 2D convolution with a large kernel, implemented using AMP's simple untiled model. The second was chosen for its unsuitability to GPU processing and was implemented using the tiled model. Even when including the overhead of marshalling arguments from managed to native code, and memory copying to/from the GPU, we saw huge gains (60x) in the first test case. Perhaps more surprising were the gains achieved in the second, less suitable, test case (up to 7x), an indication of the quality of the AMP compiler. Based on our experience to date, if you are a developer considering using AMP from a managed code base we can recommend it without reservation. Currently, one aspect of C++ AMP imposes a performance limitation (acknowledged by Microsoft) for our particular use cases: redundant memory copying between CPU and GPU.
This is partly imposed by hardware and partly by software. Since our SDK is designed to allow the easy as
about 1 hour ago
libcds is a C++ template library of lock-free containers and safe memory reclamation (SMR) algorithms. It contains implementations of well-known SMR algos such as Hazard Pointers, Pass-the-Buck and user-space RCU, and a lot of lock-free and fine-grained lock-based intrusive and non-intrusive containers (stack, queue,
about 22 hours ago
Hi, Nana C++ Library provides GUI programming in Standard C++ style, and also includes C++11 features. I would like to hear your feedback. [link] This library welcomes your suggestions, contributions and bug fixes. Regards! Jinhao
3 days ago
I would like to continue our discussion with a particularly nasty case in which the result is not well defined.
5 days ago
This has been annoying me for a while: template <typename T> class Vec { public: Vec(size_t len) : len_(len) { data_ = new T[len]; } ~Vec() { delete data_; } Vec(const Vec &) = delete; void operator = (const Vec &) = delete; size_t size() const { return len_; } T &operator [] (size_t pos) { return data_[pos]; }
5 days ago
[I can't figure out how to post this both here and comp.std.c++, so I have to multi-post later. Unless a moderator here can do it.] { My experience suggests it's best to avoid cross-posting to moderated groups -mod/we } This is based on a recent Stack Overflow post at: [link]. As a commenter to my
5 days ago
Hi group, Possibly OT, but comp.software.patterns seems to be just a spam NG and I am exclusively interested in C++ at this time, so here goes { Not at all; your question is relevant to the C++ community -mod/we } I wonder if you could recommend some books or online resources where I would find exercises for each of the specific design patterns.
6 days ago
Consider a small unit test case struct A { virtual void func(){} A& foo() { A *obj = reinterpret_cast<A*>(0xdeadbeef); return *obj; //1 } }; int main() { A obj = obj.foo(); At line 1 is it implementation defined/unspecified that the dereference would not happen as we are returning by reference and the program
7 days ago
I have a problem with the sequence of inheritance: // header1.h class Base { public: Base() {} virtual ~Base() {} virtual void foo() = 0; }; // header2.h #include "header1.h" class Unrelated1 {}; class Unrelated2 {}; class Derived1 : public Base, public Unrelated1, public Unrelated2 {
7 days ago
In the C++ Standard - before C++11 - any attempt to initialize a variable inside the body of a class would fail at compilation. I am sure that there is / was a very good reason for this, but can't understand why it is so. I thought that maybe it was a restriction that was imposed due to some
8 days ago