According to a Forbes article, "Why Big Data Matters": "Terabytes, Petabytes, Exabytes. Who can keep track? These strange terms have just begun to enter the business lexicon, but the hype surrounding them has reached a fever pitch. We have undoubtedly entered the age of big data." Exactly: we have entered that age. As with any new technology, there is a lot of confusion surrounding big data, and there are endless debates about what is and isn’t big data. So let us first clear that up.
Big Data can be defined and interpreted in many different ways, which is why in the Big Data Introduction post I defined it in terms of its volume, velocity, and variety attributes. One thing to keep in mind is that Big Data solutions are not a replacement for our existing warehouse solutions. There are some key principles to keep in mind when considering whether to use Big Data technologies:
- Big Data solutions work very well not only for structured data but also for semi-structured and unstructured data.
- Big Data solutions work best when all, or almost all, of the data is analyzed, rather than just a sample of it.
- Big Data solutions are ideal for iterative and exploratory analysis, when there are no predetermined business measures on the data.
Social Media
Perhaps the most talked-about Big Data usage pattern is social media and customer sentiment. We can use Big Data to figure out what customers are saying about an organization and what they are saying about its competitors. Moreover, the organization can use this newly found insight to figure out how this sentiment impacts the decisions it is making and the way it engages with customers. More specifically, it can determine how sentiment is impacting sales, the effectiveness of a marketing campaign, reviews of a certain product, and so on.
Log Analytics
Log analytics is a common use case for any Big Data project. All the logs and trace data generated by the operation of IT solutions are called Data Exhaust. Organizations have a lot of Data Exhaust, and it's pretty much a pollutant if it's just left around for a couple of hours or days in case of emergency and then simply purged. The reason? Data Exhaust has concentrated value, and IT companies need to figure out a way to store it and extract value from it. Some of the value derived from Data Exhaust is obvious and has already been transformed into value-added clickstream data that records every gesture, click, and movement made on a website.
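To make the idea of turning Data Exhaust into clickstream data a little more concrete, here is a minimal sketch in Python. The log lines, the regular expression, and the function names (`parse_events`, `clickstream_by_visitor`) are all assumptions made up for illustration; real logs vary in layout, and a real Big Data platform would run this kind of extraction over far more data than fits in one string.

```python
import re
from collections import Counter

# Hypothetical web-server log lines, invented for this example.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2024:10:00:01 +0000] "GET /home HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:10:00:02 +0000] "GET /products HTTP/1.1" 200 1024
10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 1024
10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "POST /cart HTTP/1.1" 302 0
this line is noise and does not match the pattern
"""

# Assumed line layout: ip, two ignored fields, [timestamp], "METHOD path protocol", status, size.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_events(text):
    """Yield one dict per log line that matches the pattern; skip lines that do not."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()


def clickstream_by_visitor(events):
    """Group requested paths by client IP, in the order they appear in the log."""
    journeys = {}
    for event in events:
        journeys.setdefault(event["ip"], []).append(event["path"])
    return journeys


events = list(parse_events(SAMPLE_LOG))
page_hits = Counter(event["path"] for event in events)  # hits per page
journeys = clickstream_by_visitor(events)                # path sequence per visitor
```

Here `page_hits` counts requests per page and `journeys` holds each visitor's sequence of pages, which is roughly the kind of derived value the paragraph above has in mind. The noisy line is simply skipped rather than treated as an error.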
Fraud Detection
By using a Big Data platform, it's possible to stop fraud. Several challenges in the fraud detection pattern are directly attributable to relying solely on conventional technologies. The most common and recurring theme we will see across all Big Data patterns is limits on what can be stored, as well as on the compute resources available to process it. Without Big Data technologies, these factors limit what can be modeled. Less data equals constrained modeling.
Weather Forecasting
The philosophy of Big Data is that insights can be drawn from a large volume of ‘dirty’ (or ‘noisy’) data, rather than relying only on a small number of precise observations. One good example of the success of the ‘Big Data’ approach can be seen in Google’s Flu Trends, which used Google searches to track the spread of flu outbreaks worldwide. Despite the inevitable noise, the sheer volume of Google search data meant that flu outbreaks could be identified and tracked in near real-time. It is also important to remember that Big Data, when used on its own, can only provide probabilistic insights based on correlation. The true benefit of Big Data is that it drives correlative insights, which are achieved by comparing independent datasets. It is this that buttresses the Big Data philosophy of ‘more data is better data’: you do not necessarily know what use the data you are collecting will have until you can investigate it and compare it with other datasets.
The ‘Big Data’ approach has already begun to be incorporated into weather nowcasting, and the Flu Trends example provides an excellent analogy for where it can initially prove most useful.
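As a rough illustration of what a correlative insight between two independent datasets looks like, the sketch below computes a Pearson correlation coefficient in plain Python. The two series are synthetic numbers invented for this example; they are not Flu Trends data or any real measurements, and the names `search_volume` and `reported_cases` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Synthetic weekly figures, made up purely for illustration.
search_volume = [120, 150, 200, 310, 280, 190]   # e.g. count of symptom-related searches
reported_cases = [10, 14, 19, 33, 29, 18]        # e.g. cases from an independent source

r = pearson(search_volume, reported_cases)
```

A value of `r` close to 1 says the two series rise and fall together. As noted above, that is a probabilistic, correlation-based insight and says nothing on its own about cause.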
A Few Things to Remember
When it comes to solving information management challenges using Big Data technologies, there are a few things we should know. Data bound for the analytic warehouse has to be cleansed and documented before it is placed in the warehouse, which has a strict schema. A Big Data solution, on the other hand, not only works on data that is not suited to a traditional warehouse environment but also doesn't impose the strictness that a traditional warehouse does before data is put into it.
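The contrast above is often described as schema-on-write versus schema-on-read (terms used here for convenience, not taken from this post). The following Python sketch shows the difference on three toy JSON records; the records, the `REQUIRED` schema, and the function names are all hypothetical and stand in for what a warehouse loader and a Big Data store would each do.

```python
import json

# Hypothetical raw records, invented for this example.
RAW_RECORDS = [
    '{"user": "alice", "amount": 12.5, "currency": "USD"}',
    '{"user": "bob", "amount": "n/a"}',
    '{"user": "carol", "note": "free-text comment, no amount"}',
]

# Assumed strict warehouse schema: field name -> accepted Python type(s).
REQUIRED = {"user": str, "amount": (int, float), "currency": str}


def load_strict(lines):
    """Warehouse-style load: accept a record only if every required field is present and well typed."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        ok = all(isinstance(record.get(key), types) for key, types in REQUIRED.items())
        (accepted if ok else rejected).append(record)
    return accepted, rejected


def load_raw(lines):
    """Big-Data-style load: keep every record exactly as it arrived."""
    return [json.loads(line) for line in lines]


def total_amount(records):
    """Apply structure at query time, skipping records without a numeric amount."""
    return sum(
        r["amount"] for r in records if isinstance(r.get("amount"), (int, float))
    )


accepted, rejected = load_strict(RAW_RECORDS)
raw = load_raw(RAW_RECORDS)
```

With the strict loader, two of the three records never reach the store. With the raw loader, all three are kept, and a query such as `total_amount(raw)` decides at read time which fields it can use, while the free-text note stays available for later exploration.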
Conclusion
We can preserve the fidelity of data and gain access to massive volumes of information for exploration and finding insights. It's important to understand that traditional database technologies are important and are, in fact, a relevant part of an overall analytic solution. Traditional database technologies become even more vital when used together with a Big Data platform. Broadly, it can be concluded that there is a class of problems that doesn't belong in traditional database technologies (at least at the initial stage). There is also another kind of data that we are not sure about putting in the warehouse, maybe because we don't know whether it's rich in value, whether it's structured, or whether it's too big. Sometimes we can't find out the value per byte of data before investing effort and money. At the end of the day, organizations want to know whether data is worth saving and has a high value per byte before investing in it.
So what we really need to know about big data is this: it represents a fundamental shift in how we do things. In effect, big data opens the door to a strategy where we no longer try to be “right” based on controlled research and small samples, but rather become less wrong over time as real-world information floods in.