James Pierechod: Solid State memory

Found this great article on the development of solid state drives in the VFX industry and the impact they will have on specific areas of production.

Thanks to Joe from http://blog.renderstream.com

In any workstation, render server or cluster, you will encounter bottlenecks. A hardware bottleneck is any hardware component that limits or prevents another component from operating as fast as it should. In this article we talk with Vincent Brisebois of Fusion-io ( http://www.fusionio.com ) about why using solid state flash memory in CG production can drastically improve your studio’s performance. We would also like to introduce our GODBOX line of compositing workstations, which take advantage of PCI-e solid state flash memory.

From a hardware standpoint, ever-faster CPUs, as they continue to keep pace with Moore’s Law, are becoming slightly less significant to the CG industry. Granted, our industry will be relying on the CPU for years to come, but we’re starting to see things change. With the recent accomplishments in GPU rendering, you may have had to familiarize yourself with new terms like “CUDA cores” or “TFLOPS”. If you aren’t familiar with the terms “IO” or “IOPS”, you owe it to yourself to read on.

IOPS (input/output operations per second) is a common benchmark for hard disks, solid state drives and other computer storage devices. It measures the number of read and write operations a device can complete each second. Storage IO, often shortened simply to “IO”, is currently one of the biggest bottlenecks in CG production.
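IOPS on its own doesn’t tell you how much data actually moves; sustained throughput is roughly the IOPS figure multiplied by the size of each request. A minimal Python sketch of that conversion (the function name and the 4 KiB request size are illustrative choices, not from the article):

```python
# Sketch: convert an IOPS figure into throughput.
# Throughput (bytes/s) = operations per second x bytes moved per operation.


def throughput_bytes_per_s(iops: float, io_size_bytes: int) -> float:
    """Approximate sustained throughput for a given IOPS rate and request size."""
    return iops * io_size_bytes


if __name__ == "__main__":
    # 10,000 small 4 KiB reads per second is only about 41 MB/s.
    print(throughput_bytes_per_s(10_000, 4096) / 1_000_000, "MB/s")
```

This is why both numbers matter for compositing: large frame reads are bounded by bandwidth, while many small scattered reads are bounded by IOPS.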

Compositing is a large part of CG/VFX production, and this process can benefit tremendously from compositing workstations equipped with solid state memory. Let’s ask Vincent of Fusion-io why this is...

RS: Can you explain to our readers why it is important for a compositing workstation to be able to handle a large number of reads and writes?

VB: Compositing encompasses many tasks, ranging from simple color grading to assembling multiple render passes, integrating CG elements or replacing green screens. In all cases, you have to be able to play back your footage in real time, and often play back multiple pieces of footage simultaneously (layers are an example of this). Traditionally, you would use your RAM as a buffer to cache the data, so you would tend to buy a lot of RAM to fit longer sequences, or work with lower-resolution proxies.
As an example, let’s say you have a 20-layer composition made up of different elements split into passes (diffuse pass, reflection pass, ambient occlusion, etc.). When you change frames, the application has to read from your drive all the frames needed to make up that image. When you create effects, the temporary render has to be put in a cache, which is usually your RAM, but if you exceed the amount available you “spill out” to your scratch disk, which can be incredibly slow if it’s a spinning disk or a single SSD. The more throughput you get from your storage, the faster the data can be processed by the application and the more interactive your work becomes.
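To put the 20-layer example into numbers, here is a rough Python sketch of the read bandwidth such a comp needs for real-time playback. It assumes uncompressed 10-bit RGB DPX frames (4 bytes per pixel), a 2048x1556 2K frame and 24 fps; none of these figures come from the interview.

```python
# Hypothetical sketch: estimate the sustained read bandwidth a multi-layer
# composite needs for real-time playback. Frame sizes assume uncompressed
# 10-bit RGB DPX, which packs each pixel into 4 bytes.

BYTES_PER_PIXEL = 4  # assumption: 10-bit RGB DPX, three channels packed in 32 bits


def frame_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def required_read_mb_per_s(layers: int, width: int, height: int, fps: float) -> float:
    """Every layer must be read once per displayed frame."""
    total_bytes_per_second = layers * frame_bytes(width, height) * fps
    return total_bytes_per_second / 1_000_000  # decimal megabytes, as drive vendors quote


if __name__ == "__main__":
    # A 20-layer 2K (2048x1556) comp at 24 fps
    print(f"{required_read_mb_per_s(20, 2048, 1556, 24):.0f} MB/s")
```

Under these assumptions the comp needs roughly 6.1 GB/s of reads, before counting any writes to the effects cache, which is why most of it ends up being served from RAM caches or proxies.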

RS: Can you describe the data flow process in a traditional compositing workstation (using rotational hard drives in a RAID) and outline the bottlenecks associated with such a system?

VB: There are two paths for the data to flow: you may have the data locally (freelance) or it may be on a server (studio). If it’s on a server, your read speed will be limited by your network speed and the speed of the array the data is hosted on. Depending on the situation, it may be preferable to copy the data locally (provided you have fast local storage). Most compositing applications have built-in disk caching tools that copy the source data to a location of your choosing, sometimes unchanged but most often slightly processed, chopped up and in RAW format for easier use by the application. When using these tools, the source matters less, since it only comes into play on the initial import, but your local cache speed becomes all the more important.
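A small Python sketch of why the local cache dominates once it exists: it compares the total time spent reading a shot over several scrubbing passes with and without a local cache. The shot size, Gigabit Ethernet line rate and cache speed are assumed values, and the model assumes writing into the cache is not itself a bottleneck on the first pass.

```python
# Hypothetical sketch: total time spent reading a shot when scrubbing it
# several times, with and without a local disk cache. Assumes writes into
# the cache are not the bottleneck on the first pass.
from typing import Optional


def total_read_seconds(shot_bytes: float, passes: int,
                       source_bytes_per_s: float,
                       cache_bytes_per_s: Optional[float] = None) -> float:
    if passes < 1:
        raise ValueError("passes must be at least 1")
    if cache_bytes_per_s is None:
        # No cache: every pass streams from the source (server or local array).
        return passes * shot_bytes / source_bytes_per_s
    # With a cache: the first pass pulls from the source, later passes hit the cache.
    first_pass = shot_bytes / source_bytes_per_s
    later_passes = (passes - 1) * shot_bytes / cache_bytes_per_s
    return first_pass + later_passes


if __name__ == "__main__":
    shot = 3e9     # assumed ~3 GB shot
    gige = 125e6   # 1 Gb/s Ethernet line rate, ignoring protocol overhead
    flash = 600e6  # assumed fast local cache
    print(total_read_seconds(shot, 10, gige))
    print(total_read_seconds(shot, 10, gige, flash))
```

The network cost is paid once on import; every later pass runs at local cache speed, which matches the point that cache speed matters more than source speed once caching is in use.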

With fast IO on your source material (either through a fast network server or a fast local disk cache), the CPU comes into play. The workstation is a delicate ecosystem: without proper IO your CPU will be waiting, without enough CPU your GPU will wait, without a proper GPU you won’t be able to display fast enough, and so on. When you scrub through your timeline, the storage device has to feed the requested frames for all the layers to the CPU, which processes the image and effects, caches the result in RAM (or on the scratch disk) and then plays it back for you.
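The “delicate ecosystem” idea can be sketched as a chain of stages where sustained playback runs no faster than the slowest link. This is a simplified Python model; the per-stage frame rates are made-up numbers, not benchmarks.

```python
# Illustrative sketch of the "delicate ecosystem": when storage feeds the CPU,
# which feeds the GPU, sustained playback can only run as fast as the slowest
# stage. The capacities below are made-up numbers, not benchmarks.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    name: str
    frames_per_second: float  # hypothetical capacity of this stage on its own


def bottleneck(stages: List[Stage]) -> Tuple[str, float]:
    """Return the limiting stage and the pipeline's sustained frame rate."""
    slowest = min(stages, key=lambda s: s.frames_per_second)
    return slowest.name, slowest.frames_per_second


if __name__ == "__main__":
    workstation = [Stage("storage", 9.0), Stage("cpu", 30.0), Stage("gpu", 60.0)]
    print(bottleneck(workstation))
    # Upgrade storage and the limit moves on to the CPU.
    workstation[0] = Stage("storage", 100.0)
    print(bottleneck(workstation))
```

Fixing one stage only moves the limit to the next slowest one, which is the pattern described later in the interview as IO, then CPU, then GPU become the bottleneck in turn.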

With only RAM as a fast cache, you have to store both the source and the finished results in it, so you end up with only seconds of real-time playback (especially with stereoscopic HD or 2K). If you have very fast, low-latency local storage (like a Fusion-io card), you in a sense extend your RAM by the space on the card, since the application will run on the Fusion-io device when it runs out of RAM. Using Fusion-io cards, I only have 6GB of RAM in my stereoscopic compositing system, as the cards give me performance close to that of RAM.
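A quick Python sketch of the “only seconds of playback” point: how much real-time stereo 2K fits in a cache of a given size. It assumes 10-bit DPX frames (4 bytes per pixel) and that the cache holds nothing but source frames; the 80 GB flash capacity is a hypothetical figure for comparison.

```python
# Sketch: how many seconds of real-time playback fit in a cache.
# Assumes uncompressed 10-bit RGB DPX frames (4 bytes per pixel) and that the
# whole cache holds frames, with nothing else competing for it.


def seconds_of_playback(cache_bytes: float, frame_bytes: float,
                        fps: float, streams: int = 1) -> float:
    """streams = number of images needed per displayed frame (2 for stereo)."""
    return cache_bytes / (frame_bytes * fps * streams)


if __name__ == "__main__":
    frame_2k = 2048 * 1556 * 4  # one 2K frame, ~12.7 MB
    ram = 6 * 1024**3           # the 6GB of RAM mentioned above
    flash = 80 * 10**9          # hypothetical 80 GB flash cache
    print(f"RAM:   {seconds_of_playback(ram, frame_2k, 24, streams=2):.1f} s")
    print(f"Flash: {seconds_of_playback(flash, frame_2k, 24, streams=2):.1f} s")
```

With these assumptions 6GB holds roughly ten seconds of stereo 2K source frames, and less once the cache also has to hold finished results.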

Data flow with traditional drives in a RAID (lower bandwidth, more hurdles)

Data flow with PCI-e solid state memory (larger bandwidth, fewer hurdles)

RS: In a recent interview you gave to CG Channel, you mentioned that before you started using your solid state memory cards, you were never able to fully utilize your Quadro FX card. Can you elaborate on that? How do GPUs and Fusion-io cards relate to each other, and when does it make sense to buy another GPU or set up SLI?
VB: This is an interesting case. Before the Fusion-io cards, I never had enough data throughput to saturate my CPU and GPU. Even with a four-drive SATA RAID, you only have four read heads (one per disk), so in a 20-layer comp you can’t feed 20 frames simultaneously to the CPU. With SSD technology you can feed hundreds of images simultaneously, allowing you to easily “feed” the CPU. With the CPU now at 100% utilization, you can quickly process the images and render a result, which then goes to the GPU for display. With a four-drive RAID, I could realistically work on 1080p footage pretty well, but stereoscopic 2K was impossible and 4K was just a pipe dream.
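The read-head argument can be framed with Little’s law (throughput = requests in flight / time per request). The Python sketch below is a simplified model, not Vincent’s analysis: it assumes each disk serves one whole-frame read at a time, with an assumed 8 ms seek and 100 MB/s per disk, and 10-bit DPX 2K frames.

```python
# Sketch based on Little's law (throughput = requests in flight / time per request).
# Simplification: each disk serves one whole-frame read at a time, so a 4-drive
# RAID has at most 4 frame reads in flight. Seek time and per-disk bandwidth
# are assumed values.


def frames_read_per_second(channels: int, latency_s: float,
                           frame_bytes: float, bytes_per_s_per_channel: float) -> float:
    """Frames/s when each channel handles one whole-frame read at a time."""
    service_time = latency_s + frame_bytes / bytes_per_s_per_channel
    return channels / service_time


def comp_frames_per_second(frames_read_per_s: float, layers: int) -> float:
    """Each displayed comp frame needs one source frame per layer."""
    return frames_read_per_s / layers


if __name__ == "__main__":
    frame_2k = 2048 * 1556 * 4  # 10-bit DPX, ~12.7 MB
    disks = frames_read_per_second(4, 0.008, frame_2k, 100e6)  # assumed 8 ms, 100 MB/s per disk
    print(f"4-disk RAID: {comp_frames_per_second(disks, 20):.2f} comp frames/s")
```

Raising the number of requests in flight or cutting the time per request both raise throughput, which is the advantage Vincent attributes to flash: many parallel reads at very low latency.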
Then I came across the Fusion-io card. A single ioXtreme (the workstation card) delivers over 600MB/s, and with two in RAID 0 I have over 1.2GB/s of read throughput. With that much performance, I can now easily work on multilayer 2K or stereoscopic 2K content; in fact, I can even work on uncompressed 4K footage interactively. With IO taken care of, the bottleneck becomes the CPU, and with a good dual-socket Xeon system the bottleneck then becomes the GPU. A single NVIDIA Quadro FX 5800 can handle 2K, even stereoscopic 2K, but once you get into uncompressed 4K footage you can start to benefit from multiple GPUs. A QuadroPlex, for instance, allows you to drive a large 4K monitor and play back at full resolution. Applications have to be built for multiple GPUs to really benefit, but where I find I need multiple GPUs is for splitting the work: one GPU runs the software GUI so I can interact with the tools and footage, while the second outputs two 2K streams to a RealD theater screen (or a Planar 3D monitor), for instance.
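Two cards in RAID 0 roughly double read throughput because striping spreads consecutive chunks of data across the members. A minimal Python sketch of round-robin striping (the function names are illustrative; real controllers and software RAID add details such as configurable stripe sizes):

```python
# Sketch of RAID 0 striping: consecutive stripe units are spread round-robin
# over the member devices, so a large sequential read keeps every device busy.
from collections import Counter
from typing import Tuple


def stripe_location(stripe_unit: int, devices: int) -> Tuple[int, int]:
    """Map a logical stripe unit to (device index, stripe unit on that device)."""
    return stripe_unit % devices, stripe_unit // devices


def ideal_raid0_mb_per_s(per_device_mb_s: float, devices: int) -> float:
    """Best case: reads scale with member count (real arrays fall somewhat short)."""
    return per_device_mb_s * devices


if __name__ == "__main__":
    # Reading eight consecutive stripe units from a two-card array.
    hits = Counter(stripe_location(u, 2)[0] for u in range(8))
    print(dict(hits))  # each card serves half the units
    print(ideal_raid0_mb_per_s(600, 2), "MB/s")
```

The ideal figure for two 600MB/s cards is 1200MB/s, in line with the “over 1.2GB/s” quoted above; RAID 0 has no redundancy, so losing either card loses the array.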
I would never have dreamt of connecting my workstation to a theater projector before, but now that I can actually work interactively it becomes an option. These days multiple GPUs are mostly for multi-monitor support or CUDA/OpenCL programming; in my experience, very few applications benefit from SLI.

RS: Are there any other hardware considerations one should take into account that can improve the overall performance of a compositing system equipped with ioDrive cards? It would seem to make sense that a large main memory is no longer necessary, but that may not be the case. How about the CPU? Any special considerations there?
VB: As I mentioned earlier, it’s a delicate balance and really depends on your tasks and the applications you use. In general, if you can fit your file in RAM, you should do so. For instance, with 3D content creation the file has to load into RAM and the GPU, so skimping on RAM could limit the size of the files you can work with or render. That being said, with compositing and editing it’s a different story: you will never be able to fit your entire shot in RAM (unless it’s very short or you have a huge amount of RAM). So in these situations, where you will be streaming or swapping to disk, having a very fast storage device to pull from and write to does remove the need for tons of RAM.
The general rule of thumb is 2GB per core, so for a quad-core system you should go with 8GB (since DIMMs now go in threes on triple-channel platforms, you would go with 6 or 12GB). Then, if you use an application that has a notion of disk cache, media cache or scratch disk, you would assign it to the Fusion-io drive. Since the Fusion-io drives have very low latency and extreme throughput, applications can swap to “disk” with little difference from RAM. If you are working in 2K or stereo 2K, a good professional GPU should do the trick, and your limiting factor will become the CPU. Since most compositing rendering is done on the CPU, a large blur using motion vectors, for example, could hold you up. If the application you use is well multi-threaded, a second CPU is a wise investment.
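The 2GB-per-core rule and the “DIMMs in threes” constraint can be combined in a small Python helper. This is a sketch under assumptions: one DIMM per channel on a triple-channel platform, and a hypothetical list of available DIMM sizes.

```python
# Sketch of the "2GB per core" rule of thumb on a triple-channel platform,
# assuming one DIMM per channel and the DIMM sizes listed (hypothetical
# choices for illustration).
from typing import Optional, Sequence, Tuple


def ram_options(cores: int, gb_per_core: int = 2, channels: int = 3,
                dimm_sizes_gb: Sequence[int] = (1, 2, 4, 8)
                ) -> Tuple[Optional[int], Optional[int]]:
    """Return the nearest balanced configurations at or below and at or above the target."""
    target = cores * gb_per_core
    configs = sorted({channels * size for size in dimm_sizes_gb})
    below = max((c for c in configs if c <= target), default=None)
    above = min((c for c in configs if c >= target), default=None)
    return below, above


if __name__ == "__main__":
    print(ram_options(4))  # quad core, 8GB target -> (6, 12)
```

For a quad-core machine this reproduces the 6 or 12GB choice from the interview; with the Fusion-io drive acting as the cache, the lower option becomes more reasonable.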
You can find out where your bottleneck is by using Autodesk Composite (which comes free with 3ds Max, Maya and Softimage). Simply hover your mouse over a player and press CTRL+ALT+SHIFT+S, and you will see a screen overlay that shows CPU, hard-drive and GPU metrics as you work. This really helps you figure out which part of your workflow creates the bottleneck, and thus which components you should upgrade for more performance.

RS: Aside from compositing software, have you explored any other uses for these cards in the CG production pipeline? Are there other software tools that can take advantage of them?
VB: There are actually many tasks that benefit from higher throughput. Review and playback of footage is a clear winner, with applications like Tweak RV and Iridas FrameCycler taking full advantage of our speed. Another common task is camera tracking. Trackers and processors are now very efficient, and you can easily process multiple frames per second if you have the throughput to handle it, making this a task that also accelerates greatly with a Fusion-io card. With 3D content creation, we have a limited impact due to the nature of the applications. You mostly benefit from fast loads and saves, but there is one aspect that benefits immensely: particle system playback. With large particle simulations, you have to read cache files from your source drive, which contain positional information for millions of particles. By hosting your particle cache on the Fusion-io drive, you can massively accelerate playback and allow real-time scrubbing of your scene.
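A rough Python estimate of the read rate behind real-time particle scrubbing. It assumes each frame’s cache stores only positions as three 32-bit floats per particle; the particle count is an example, and real caches usually carry more attributes, which raises the figure.

```python
# Sketch: read bandwidth needed to scrub a particle cache in real time.
# Assumes each frame's cache stores only positions as three 32-bit floats
# per particle; real caches often store more attributes (velocity, ID, ...).


def particle_cache_mb_per_s(particles: int, fps: float,
                            floats_per_particle: int = 3,
                            bytes_per_float: int = 4) -> float:
    bytes_per_frame = particles * floats_per_particle * bytes_per_float
    return bytes_per_frame * fps / 1_000_000


if __name__ == "__main__":
    # Five million particles at 24 fps
    print(particle_cache_mb_per_s(5_000_000, 24), "MB/s")  # -> 1440.0 MB/s
```

Even this positions-only case needs well over 1GB/s, which is beyond a typical spinning-disk RAID and helps explain why particle playback gains so much from fast storage.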
There is also a huge play on the server back end. Fusion-io products are primarily used to accelerate databases and to enable server consolidation. In fact, Fusion-io drives are used by over half of the Fortune 100 companies specifically to accelerate their data access. With such a huge performance boost on the server side, Fusion-io cards are ideal for render farm manager stations (uploading the render data to hundreds of nodes simultaneously). We also have clients using our cards to accelerate their internal messaging systems and asset management databases. Lastly, we have clients using our large-capacity ioDrive Duo as a “glorified USB drive”: they copy film content onto the card, travel across the world with it or ship it, and then play back the footage within seconds of receiving the drive (plug it in and press play). Benefiting from ECC error correction and Flashback technology, the Fusion-io drives are more reliable than any other currently used storage device.
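A back-of-the-envelope Python sketch of the render farm manager case. It assumes every node pulls its own copy of the scene and that the manager’s output is capped by the slower of its storage and its network link; the scene size, node count and speeds are all assumed values.

```python
# Sketch: time for a render farm manager to send a scene to every node,
# assuming each node pulls its own copy and the manager's output is capped by
# whichever is slower, its storage or its network link. Numbers are assumed.


def distribution_seconds(scene_gb: float, nodes: int,
                         storage_mb_s: float, network_mb_s: float) -> float:
    total_mb = scene_gb * 1000 * nodes
    return total_mb / min(storage_mb_s, network_mb_s)


if __name__ == "__main__":
    # 2 GB scene to 200 nodes over an assumed 10 Gb/s (~1250 MB/s) uplink
    print(distribution_seconds(2, 200, storage_mb_s=150, network_mb_s=1250))   # spinning disks
    print(distribution_seconds(2, 200, storage_mb_s=1200, network_mb_s=1250))  # flash
```

Once storage is no longer the slower side, the network link becomes the limit, so the gain depends on the manager having a fast enough connection to the farm.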
Film productions have started using our products on set as local storage for daily footage. They transfer data from their camera storage (CF cards, hard drives or ExpressCard modules) to a portable Fusion-io-powered storage device hooked up to their capture system (previously they commonly used USB or FireWire drives). They leave the set with the Fusion-io card(s) and head back to the studio with the data on an ioDrive Duo, where they can immediately start working on the data while it’s being copied to the main storage array at over 900MB/s. This makes the turnaround time for dailies and review clips dramatically faster.
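To give a sense of scale for dailies, here is a simple Python copy-time comparison. The 900MB/s figure comes from the interview; the 30MB/s USB 2.0 rate and the 500GB daily volume are assumptions for illustration.

```python
# Sketch: copy time for a day's footage at different sustained speeds.
# The 900MB/s figure comes from the interview; the 30MB/s figure is an
# assumed real-world rate for an external USB 2.0 drive.


def transfer_minutes(gigabytes: float, mb_per_s: float) -> float:
    return gigabytes * 1000 / mb_per_s / 60


if __name__ == "__main__":
    day = 500  # assumed GB of footage shot in a day
    for label, speed in (("USB 2.0 drive", 30), ("Fusion-io", 900)):
        print(f"{label}: {transfer_minutes(day, speed):.0f} min")
```

Under these assumptions the copy drops from several hours to around ten minutes, and because work can start while the copy runs, the wait before anyone can review footage shrinks further.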
IO is currently one of the biggest bottlenecks in production. Fusion-io is tackling these issues with not only incredible hardware, but also development partnerships with software vendors helping them optimize their products to better harness our low latency and massive throughput.

RenderStream is pleased to announce our GODBOX line of compositing workstations, which take advantage of these amazing cards. For more information and a quote, please call 512-850-4098 or email us at info@renderstream.com

Tuesday 2 November 2010

Solid State memory - compositing
