Terry,
Encoding does take some time. Depending on the method and settings it may run at a 1:1 ratio, so a 2-hour data file would take about 2 hours to encode in at least one of the 6 formats. The fastest may be on the order of 30-45 minutes. So doing all 6 streams would take about 9 hours. If one machine is doing this, that is 9 hours times 3,000 productions, or about 27,000 hours. Given that there are about 8,760 hours in a year, that would take over 3 years. This is not feasible.
Having multiple servers would cut the time proportionally; i.e., if the project to get caught up would take 3 years, then 3 servers would make that 1 year.
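Here is that back-of-envelope math as a quick Python sketch (the 9-hour batch time and 3,000-production backlog are the figures from above; the rest is plain arithmetic):

    # Rough capacity estimate for working through the encoding backlog.
    HOURS_PER_PRODUCTION = 9      # all 6 formats, per the times listed below
    BACKLOG = 3000                # productions waiting to be encoded
    HOURS_PER_YEAR = 8760

    total_hours = HOURS_PER_PRODUCTION * BACKLOG   # 27,000 hours
    for servers in (1, 3, 6):
        years = total_hours / HOURS_PER_YEAR / servers
        print(f"{servers} server(s): {years:.1f} years")  # 3.1, 1.0, 0.5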
Generating on the fly is not feasible. You would be generating, or regenerating, a 2-hour data file per request rather than having the image ready for transfer. The transfer itself may take an hour to burn a disc, and a disc holds more than one of these files; in other words, 3 or 4 or more 2-hour encodings go onto one disc that takes an hour to burn (maybe up to 8 hours of play time). I believe over 1,000 are done per week; I may be wrong on this. Many files will be online, but not nearly ALL of them.
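To put a number on the on-the-fly idea, here is a sketch assuming my ~1,000-requests-per-week guess holds and each request means regenerating a roughly 2-hour file at a 1:1 encode ratio:

    # Machine time eaten by regenerating files per request.
    REQUESTS_PER_WEEK = 1000         # my rough guess; I may be wrong on this
    HOURS_PER_REQUEST = 2            # ~1:1 encode ratio on a 2-hour file
    MACHINE_HOURS_PER_WEEK = 24 * 7  # 168 hours

    machines = REQUESTS_PER_WEEK * HOURS_PER_REQUEST / MACHINE_HOURS_PER_WEEK
    print(f"{machines:.0f} machines flat out, just to serve requests")  # ~12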
FORMATS
Well, it appears that there is a need for the formats or they would not be asking for them.
SERVER REDUNDANCY
With one server it may be simpler, but the point of failure for the entire production is ONE machine, and that ONE machine is limited in how much it can produce.
Let's put this another way. We have a mountain to level. We have a pickup truck and a backhoe. That backhoe can fill that truck with one scoop. Now that truck is overloaded and has to keep its speed under 40 mph or risk blowing tires or brake failure. It takes hours for it to go to a location and unload. Meanwhile, everything is waiting for that truck to come back for another load. If you had 10 trucks timed so that they could all be in motion at the same time, then the backhoe crew is not wasting time. If one truck fails, it is not critical; the rest may continue while that one is repaired. Now if you lose the backhoe, well...
My idea is to keep one production per rendering server. Each rendering server produces each of the versions for that production. Each version takes a different time to complete, so having, say, 6 servers each dedicated to one version may be nice from a software-install and function standpoint, but it throws the machines out of sync with each other. What I mean is, if the machine completion times are:
Machine / Completion time
A) 2:00
B) 1:50
C) 1:40
D) 1:25
E) 1:10
F) 0:45
You can see that by the time you have done even 10 productions, the slowest machine is more than 11 hours behind the fastest. So in less than one day of running you're 11 or more hours out of sync. For a project that is taking 3 years or more, this gap is huge.
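A quick sketch of that drift, using the completion times above:

    # How far six format-dedicated servers drift apart over 10 productions.
    times_hr = {"A": 2.0, "B": 1 + 50/60, "C": 1 + 40/60,
                "D": 1 + 25/60, "E": 1 + 10/60, "F": 45/60}
    productions = 10
    gap = (max(times_hr.values()) - min(times_hr.values())) * productions
    print(f"gap after {productions} productions: {gap:.1f} hours")  # 12.5 hours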
It would be much easier to mirror the servers. I don't mean a RAID system; I mean several systems, 6 for example, all with the same hardware and software images, each taking one production as a batch stream through all the versions. When completed, it burns a master Blu-ray and updates a master network file server. The Blu-ray may be shipped offsite for storage. All versions of that production are on that one disc. The NFS would allow in-house clients to pull and burn whatever is required.
A concern I have is the throughput of the disk I/O. Most systems are designed for burst rates: a search read, a file update, or a copy. That is not what we are talking about in a production mode, where almost 100% of the time is spent writing to disk. From day one, disk I/O has been slower than computing power, and I am concerned that the bottleneck will be disk I/O. Building one super computer would, I fear, make this issue even worse, especially if the machine is running concurrent VMs that are all writing. The more files open, the more the heads have to travel to update each file, the more fragmented the files become on the fly, and the slower the writes.
With my idea of one rendering server per production, the I/O is sequential and has 100% of the resources in each batch step of the various versions. When completed, it would do one batch copy to the NFS. This keeps fragmentation as low as possible, keeps head travel to a minimum, and helps reliability by not beating the drives to death...
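To make that concrete, here is a rough sketch of the batch loop each mirrored rendering server would run. The "encode" command, the format names, and the paths are placeholders, not the real tool:

    import shutil, subprocess
    from pathlib import Path

    SCRATCH = Path("/scratch")       # fast local disk; sequential writes only
    NFS = Path("/mnt/masters")       # master network file server (placeholder)
    FORMATS = ["A", "B", "C", "D", "E", "F"]

    def render_production(source: Path) -> None:
        workdir = SCRATCH / source.stem
        workdir.mkdir(parents=True, exist_ok=True)
        # One version at a time: a single open output file means one
        # sequential write stream, minimal head travel, low fragmentation.
        for fmt in FORMATS:
            out = workdir / f"{source.stem}.{fmt}"
            subprocess.run(["encode", "--format", fmt,   # placeholder encoder
                            str(source), str(out)], check=True)
        # Master Blu-ray burn would happen here, from workdir.
        # Then one bulk copy to the NFS, only after the whole batch is done.
        shutil.copytree(workdir, NFS / source.stem, dirs_exist_ok=True)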
MAINTENANCE
Since these rendering servers are not on the net and do one thing over and over, the software should not need to be "updated". Part of the virtual machine idea is to have the "same machine" on each real machine, so the issue is how to move data, not how to update the machine. IF the hardware can run multiple VMs, great! But my thought is that one server will be running one VM at best.
SOFTWARE
I'm not sure the software currently used will even do the batch stream at this point, so I'm not sure about the hands-free production efforts. YET...
I will check into the SAN servers... thanks.
THANKS as always for all the thoughts and comments!!!!!