Closed 4 years ago.
I have gone through a lot of articles, but I don't seem to get a perfectly clear answer on what exactly big data is. One page said "any data which is too big for your usage is big data, i.e. 100 MB is considered big data for your mailbox but not for your hard disk". Another article said "big data is usually more than 1 TB, with differing volume / variety / velocity, and cannot be stored on a single system", and that such data should be stored in a NoSQL database, with Hadoop used to transform it.
Further, I have been working on a solution and was wondering if I could classify it as big data. Details of the solution below:
- Millions of raw data records, usually 500+ GB of data.
- SQL database as the back end, with SSIS packages / SQL queries to cleanse and process the data into a meaningful form.
- Visualization using Spotfire.

Any help would be much appreciated. Thank you!
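The cleanse/process step above can be sketched roughly in plain Python (a hypothetical stand-in for the actual SSIS/SQL logic; file layout and function names are illustrative only):

```python
import csv

def cleanse_rows(rows):
    """Trim whitespace from every field and drop rows that are entirely blank.

    A minimal, hypothetical stand-in for one cleansing step; a real
    pipeline would push this logic into SSIS transforms or set-based SQL.
    """
    for row in rows:
        cleaned = [field.strip() for field in row]
        if any(cleaned):  # skip rows with no content at all
            yield cleaned

def process_file(src_path, dst_path):
    # Stream the file row by row, so a 500+ GB extract never has to
    # fit in memory -- "big" here is relative to a single machine.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in cleanse_rows(csv.reader(src)):
            writer.writerow(row)
```

The point of streaming rather than loading everything at once is that the same code works whether the extract is 500 MB or 500 GB; only the wall-clock time changes.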
Big data is simply an assortment of data so huge and complex that it becomes very tedious to capture, store, process, retrieve, and analyze.
From an ibmbigdatahub article and an edureka article:
Big data can be defined in terms of four Vs.
Volume: The main characteristic that makes data "big" is the sheer volume. It could amount to hundreds of terabytes or even petabytes of information. For instance, 15 terabytes of Facebook posts or 400 billion annual medical records could mean Big Data!
Velocity: Velocity means the rate at which data flows into companies. Big data requires fast processing, and the time factor plays a crucial role in several organizations. For instance, processing 2 million records at the share market or evaluating the results of millions of students who applied for competitive exams could mean Big Data!
Variety: Big Data may not belong to a specific format. It could be in any form, such as structured or unstructured data, text, images, audio, video, log files, emails, simulations, 3D models, etc.
Veracity: Veracity refers to the uncertainty of the available data. Data can get messy and may be difficult to trust. With many forms of big data, quality and accuracy are difficult to control.
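As a rough illustration of why volume and velocity force scalable architectures, here is a toy map/reduce-style word count in plain Python (a conceptual sketch of the processing model Hadoop implements, not actual Hadoop code; the chunk data is made up):

```python
from collections import Counter

def map_phase(chunk):
    # Each mapper independently counts words in its own chunk, so
    # chunks can live on, and be processed by, different machines.
    return Counter(chunk.split())

def reduce_phase(partial_counts):
    # The reducer merges the per-chunk counts into one global result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

# A huge corpus would be split across many nodes; here, two tiny "chunks".
chunks = ["big data big", "data velocity data"]
counts = reduce_phase(map_phase(c) for c in chunks)
```

Because each `map_phase` call depends only on its own chunk, adding machines lets you process more volume in the same time, which is the scaling property single-machine processing lacks.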
Big data is:
When a big boss believes this is a big opportunity because data is the new oil and gold, and gets a big pile of money to throw out a window and flush down the bowels. And then your data warehouses and silos turn into a data lake, and the data lake full of synergy into a data swamp full of bit rot, where the big vision hits the reality that not everything that shines is gold. And then the gates of doom open and there it comes, the big bubble that is about to burst. The bridge over the trough of disillusionment is small, and thou shalt not pass, but tumble into the big abyss where all useless data go, no matter how eagerly it was collected and mapped and reduced without plan or objective. Bingo!
The Big Data Definitions & Taxonomies Subgroup of the NIST Big Data Public Working Group released a volume on definitions: NIST Big Data Interoperability Framework: Volume 1, Definitions.
Quotes:
Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets. Characteristics of Big Data that force new architectures are:
- Volume (i.e., the size of the dataset);
- Variety (i.e., data from multiple repositories, domains, or types);
- Velocity (i.e., rate of flow); and
- Variability (i.e., the change in other characteristics).
These characteristics—volume, variety, velocity, and variability—are known colloquially as the 'Vs' of Big Data.
and:
Big Data consists of extensive datasets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.