In the next series of posts, I will discuss the issues we have dealt with in putting together a complete storage solution of our own to handle our genotype and SNP data as well as NGS BAM files.  A lot of considerations go into building a data repository, including hardware, planning, personnel, location, and tool development and integration.  I’ll start by discussing some of the basics behind the hardware decisions we’re wrestling with.

Hardware is one of the toughest and most influential decisions you will make when putting together a complete data solution.  It can be extremely expensive, choosing well takes a lot of time and effort, and the choice can make or break your system’s performance and functionality.

At the very least, you’ll need a database server to host your data.  There are dozens, if not hundreds, of companies out there that offer these.  Dell, Oracle, IBM, HP, and many more all have their own solutions to offer.  The important thing is to consider what the hardware will be used for.

A storage area network (SAN) connects your servers to shared disk arrays, which are essentially high-tech shelving for specialized hard drives.  As with a desktop computer, you can plug in more drives to get additional space.  The drives come chiefly in two varieties: high performance, designed for rapid access to data, and high capacity, designed for greater storage per disk.  In addition to a SAN, you need a database or file server.  This device is the power behind your system.  A good server will have plenty of RAM and a fast processor so that it can rapidly search through the SAN to find the data you are looking for.
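
To make the performance-versus-capacity trade-off a little more concrete, here is a rough sizing sketch in Python.  Everything in it is an assumption for illustration only: the function name `drives_needed`, the 30 TB data set, the 4 TB and 1 TB drive sizes, and the 75% usable fraction (a stand-in for space lost to redundancy and file system overhead) are hypothetical and do not describe our setup or any vendor’s product.

```python
import math


def drives_needed(total_tb: float, drive_tb: float, usable_fraction: float = 0.75) -> int:
    """Estimate how many drives are needed to hold total_tb of data.

    usable_fraction is an assumed share of each drive's raw capacity that is
    actually available for data after redundancy and file system overhead.
    """
    if total_tb < 0 or drive_tb <= 0 or not (0 < usable_fraction <= 1):
        raise ValueError("sizes must be positive and usable_fraction in (0, 1]")
    usable_per_drive_tb = drive_tb * usable_fraction
    # Round up: a partially filled drive still has to be bought.
    return math.ceil(total_tb / usable_per_drive_tb)


if __name__ == "__main__":
    data_tb = 30  # hypothetical total size of BAM and genotype data
    print("high-capacity 4 TB drives:", drives_needed(data_tb, 4))
    print("high-performance 1 TB drives:", drives_needed(data_tb, 1))
```

With these made-up numbers, the same data set needs four times as many of the smaller high-performance drives, which is the kind of difference that shows up quickly in the number of drive bays and the budget.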

These two pieces of technology are where you’ll start.  However, there are other things to consider adding as well.  If you plan on having people develop and test modifications to tools and programs that run on a server (for example, Galaxy), you might want to get a second server for testing purposes.  Generally, a second server is only necessary when you are developing new software or making frequent changes to the server and uptime is important.  If you plan on having a web site front-end, a web or application server might be needed to take some of the load off the database server.  These servers are generally cheaper and less powerful, and they are dedicated to hosting websites and other programs that interact with the database without needing to reside on the same machine.