For us technology geeks, who code and play around with software solutions, the hype around certain topics is sometimes hard to understand. The past few years have spawned certain topics that come with a cloak of mystery for marketing-oriented non-techies. Let me dive into a few and give you a technical perspective:
Big Data: People might think of a big mountain of files with important information, and that picture is OK. But what “big data” originally meant is this: there is too much data to cross-reference and statistically analyze on the fly.
Take the mouse movements on large sites, for example. If you record every mouse movement of every visitor to a large site like Amazon.com, you end up with a gazillion pointer coordinates over time. There is a lot to analyze behind these mouse movements, and the results could enable the site to improve its navigation and ultimately earn higher revenue by giving users a better site.
Today the term “big data” just refers to large data sets that have become hard to handle quickly with existing database solutions like Oracle, MySQL or IBM’s DB2.
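To make the mouse movement example a bit more concrete, here is a minimal sketch of the kind of analysis involved: counting pointer samples per screen region to find the "hot" areas of a page. The cell size, the function name `heatmap` and the sample coordinates are all made up for illustration. A real site would run something like this over billions of records spread across many machines, which is exactly where it becomes a big data problem.

```python
from collections import Counter

CELL = 100  # hypothetical grid cell size in pixels


def heatmap(events, cell=CELL):
    """Count pointer samples per grid cell.

    `events` is an iterable of (x, y) pixel coordinates.
    """
    counts = Counter()
    for x, y in events:
        # Integer division maps a pixel coordinate to its grid cell
        counts[(x // cell, y // cell)] += 1
    return counts


# Made-up sample data, for illustration only
sample = [(10, 20), (50, 90), (120, 30), (130, 45), (140, 60), (910, 400)]
hot = heatmap(sample)
print(hot.most_common(1))  # the busiest cell and its sample count
```

On six samples this is trivial. The logic stays the same at any scale; what changes is that the data no longer fits on one machine or into one query.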
Which brings us to the next buzzword…
NoSQL databases like CouchDB or MongoDB store massive numbers of documents and return them quickly on request. Unlike a relational database such as Oracle or MySQL, they leave out built-in relational functionality for the sake of speedy delivery of data (see “map reduce”). This can mean significantly more work when you develop your solution. That is one of the reasons I prefer to plan the result of my project first and then “go shopping” for solutions I might want to use in the development process.
There is no reason NOT to combine SQL, NoSQL and indexing solutions like SphinxSearch if that reduces your time to market significantly. Seen that way, “Not Only SQL” starts to make more sense, doesn’t it?
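To illustrate the extra work, here is a rough sketch in plain Python of the map reduce idea and of a "join" done by hand. It does not use the CouchDB or MongoDB APIs. The documents, field names and helper functions (`map_order`, `reduce_totals`, `map_reduce`) are hypothetical and only mimic what a document store might hold.

```python
from collections import defaultdict

# Hypothetical documents, shaped like what a document store might hold
orders = [
    {"_id": 1, "customer_id": "c1", "total": 30.0},
    {"_id": 2, "customer_id": "c2", "total": 12.5},
    {"_id": 3, "customer_id": "c1", "total": 7.5},
]
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Linus"}}


def map_order(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["customer_id"], doc["total"]


def reduce_totals(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group the mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


totals = map_reduce(orders, map_order, reduce_totals)

# The "join" a relational database would do for you, written by hand
report = {customers[cid]["name"]: total for cid, total in totals.items()}
print(report)
```

In SQL the same report would be a single query with a JOIN and a GROUP BY. Here the grouping and the lookup of customer names are your own code, which is the trade-off described above.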
The last buzzword is a bit older and has gained much more meaning in our daily work, especially since consumer products like Dropbox or GDrive entered our lives:
Most of the time, when developers talk about “the cloud” they just mean that the application is running on one or more virtual computers (yes, like VMware or Xen) and you cannot immediately determine which physical server you are currently using. In that sense, two computers can be a very tiny cloud.
What I do not like about the phrase is the marketing-loaded illusion of stability or safety. If you combine a bunch of servers (let’s say 100), things can go terribly wrong. If the overlying logic is unable to create instant backups of virtual machines, replicate data quickly and so on, the precious cloud is neither safer nor more stable than a well set up single-server system with a good backup strategy. Today the reason to go “into the cloud” with your application is scalability and speed.
Running the large databases mentioned before, working on statistics and analytics in the back, creating dynamic web pages individually for each of your visitors and ultimately delivering the pages quickly to a random place on the globe are the fuel for these buzzwords. These tasks give them a meaning and the right to exist. But these solutions are just tools for you to use to achieve your goal.
Nobody ever created a great product by using Big Data with a NoSQL database in the cloud – it is always a great idea that creates the product.
Your product needs to meet a need, and the software and infrastructure in the back are just the way to make the product possible.