The 5v Big Data Characteristics and 3v, 4v, 6v, 7v, 8v, 10v, 17v, 42v

The 5v of big data are five characteristics that indicate a data set qualifies as big data. Big data was first described by only three characteristics, known as the 3v of big data. This later grew into 4v and then 5v, and some sources go further and list 6, 7, 8, 10, 17 or even 42 characteristics. To keep things simple, we will first discuss what the characteristics of big data mean.

Characteristics of big data

5v Big Data

What are the characteristics of big data? They are the properties or features that indicate a data set can be categorized as big data. The sections below explain them in detail.

3v Big Data 

What is 3v in big data? The 3v of big data are its three original characteristics: volume, variety and velocity.

The following is an explanation of the 3 characteristics of big data:

1. Volume

What is volume in big data? In 3v big data, volume is the quantity of data: both the amount generated by many transactions and the amount stored.

What are some examples of this data? It can take the form of user history logs such as browser history, transaction records in e-commerce, ID card or Indonesian resident data, banking customer data and much more.

Big data is usually measured in terabytes (1 TB = 1,000 gigabytes) and petabytes (1 PB = 1,000,000 gigabytes). For example, according to a Facebook publication on top open data problems, Facebook generates 400 petabytes per day, or 400,000,000 gigabytes per day. Data of this size is clearly categorized as big data.
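The unit conversion above can be checked with a few lines of Python. This is only a sketch that assumes decimal (SI) units, where 1 PB = 1,000,000 GB; the function name is made up for illustration, and the 400 PB figure is simply the one quoted in this article.

```python
# Decimal (SI) storage units, expressed in gigabytes.
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


daily_pb = 400  # figure quoted in the article, not independently verified
daily_gb = petabytes_to_gigabytes(daily_pb)
print(f"{daily_pb} PB/day = {daily_gb:,} GB/day")
```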

2. Variety

What is variety in 3v big data? Variety refers to variation in the type and nature of the data: whether it is structured, semi-structured or unstructured.

  • Structured data

Structured data is data whose elements can be accessed through keys (primary keys, foreign keys and other relational keys) for analysis, or data stored in a fixed format, for example data in a relational (SQL) database.

  • Semi-structured data

Semi-structured data is information that is not stored in a relational database but follows a pattern or is organized neatly enough to be analyzed more easily. With a little processing, it can be stored in a relational database. Examples are XML and CSV files, which are often used to export data to databases.

  • Unstructured data

Unstructured data is information that is not well organized by nature, or that has no predefined data model. Examples are image files, sound, video, PDF files, log files and others.
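To make the three kinds of data more concrete, here is a small Python sketch that uses only the standard library. The table name, column names and sample values are invented for illustration. It stores one structured row in an in-memory SQLite database, parses a semi-structured XML record and loads it into the same table, and keeps a few raw bytes as a stand-in for unstructured data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows in a relational table with a primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (1, 'Ana', 'Jakarta')")
structured_row = conn.execute(
    "SELECT name, city FROM customers WHERE id = 1"
).fetchone()

# Semi-structured: an XML record that follows a pattern but lives outside the database.
xml_text = "<customer id='2'><name>Budi</name><city>Bandung</city></customer>"
element = ET.fromstring(xml_text)
semi_structured_row = (element.findtext("name"), element.findtext("city"))

# With a little processing, the XML record fits into the relational table.
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (int(element.get("id")), *semi_structured_row),
)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Unstructured: raw bytes with no data model (here, the start of a PNG file signature).
unstructured_blob = bytes([0x89, 0x50, 0x4E, 0x47])

print(structured_row, semi_structured_row, row_count, len(unstructured_blob))
```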

3. Velocity

What is Velocity in big data?

Velocity in 3v big data means the speed at which data is generated, accessed and processed. A big data platform and its analytics software must be able to process large amounts of data as quickly as possible whenever a request arrives. One example of velocity is the Google search engine, which, according to available data, must process an average of 40,000 searches every second.
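To get a feel for that rate, the per-second figure can be scaled up to a full day. The sketch below simply takes the 40,000 searches per second quoted in this article as given and assumes the average holds around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

searches_per_second = 40_000  # average quoted in the article
searches_per_day = searches_per_second * SECONDS_PER_DAY

print(f"{searches_per_day:,} searches per day")
```

At that average, the total comes to roughly 3.5 billion searches per day.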

4v Big Data

What is 4v big data? The 4v of big data are volume, variety, velocity and veracity. Compared with 3v, 4v big data adds veracity.

4. Veracity 

What is veracity in big data? Veracity in 4v big data means the truthfulness, reliability, quality and availability of the data. In other words, the data can be trusted to be true, reliable and of good quality, and it can be accessed properly.

An example of veracity in business: the larger the data, the harder it usually is to maintain its accuracy, especially for data that changes over time, such as customer information, company partner data and family data. This kind of data is certain to change within a few months or a year.

For example, customers change their cellphone numbers, email addresses or home addresses. In family data, a family that had no children last year may now have a child, or a family member may have died.

So why does company data have to be accurate? Because this data is used for big data analytics, and inaccurate data produces incorrect analysis. Large companies usually have staff who keep data up to date, and updating customer data is one of their tasks.

In this activity, the company contacts its customers and asks them to update details such as age, residential address, cellphone number, email or social media accounts if anything has changed. That way the company’s analysis is accurate, and its marketing campaigns, such as informational SMS messages and product offer emails, reach customers properly.
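One simple way to decide which customers to contact is to flag records that have not been verified for a long time. The sketch below is only an illustration: the `last_verified` field, the one-year threshold and the sample records are all assumptions, not part of any real system.

```python
from datetime import date, timedelta


def find_stale_records(records, today, max_age_days=365):
    """Return records whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [record for record in records if record["last_verified"] < cutoff]


# Hypothetical customer records.
customers = [
    {"name": "Ana", "phone": "0811-000-001", "last_verified": date(2023, 1, 10)},
    {"name": "Budi", "phone": "0811-000-002", "last_verified": date(2024, 5, 2)},
]

stale = find_stale_records(customers, today=date(2024, 6, 1))
print([record["name"] for record in stale])
```

Records returned by `find_stale_records` are the ones a data update campaign would contact first.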

5v Big Data

What are the 5v in big data? The 5v of big data are volume, variety, velocity, veracity and value. Compared with 4v, 5v big data adds one more characteristic: value.

5. Value

Value is the culmination of the 5v of big data and the most important characteristic in business analysis. It refers to how valuable the data is, which depends both on the content of the data and on the skills of the data analysts who work with it. With the right data and the right processing, big data can produce very valuable information for making decisions.

One example of value in 5v big data is the information that can be produced in the One Data Indonesia (Satu Data Indonesia) use case. With One Data Indonesia, the government can retrieve data from various ministries and agencies, for example for the Indonesian food security program.

In this food security example, the government can analyze Indonesia’s food security using data from the ministries of agriculture and trade and related agencies to see food production capacity, food stocks and Indonesia’s food needs.

With this data and analysis, the government can predict when Indonesia will experience a shortage of food stocks. To prevent food shortages, it can then launch programs to increase food production capacity and arrange food imports to meet domestic needs.
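The kind of prediction described here can be sketched as a simple running balance of stock, production and demand. The function below and all of its numbers are hypothetical; a real food security analysis would use actual ministry data and a far richer model.

```python
def first_shortage_month(opening_stock, monthly_production, monthly_demand):
    """Return the first month (1-based) in which stock drops below zero, or None."""
    stock = opening_stock
    months = zip(monthly_production, monthly_demand)
    for month, (produced, needed) in enumerate(months, start=1):
        stock += produced - needed
        if stock < 0:
            return month
    return None


# Hypothetical figures in thousand tonnes.
production = [2500, 2000, 1500, 1000]
demand = [2600, 2600, 2600, 2600]

print(first_shortage_month(1500, production, demand))
```

With these made-up figures, the stock first runs out in month 3, which is the point where extra production or imports would be needed.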

For more detail, see the article on the One Data Indonesia use case.

Beyond 3v, 4v and 5v big data, there are other versions with 6, 7, 8, 10 or even more characteristics. Some sources list the same number of characteristics but with different contents, which makes it rather difficult to judge their validity. Still, to broaden the picture, this article discusses these characteristics as well.

6v big data 

6v big data adds one more characteristic: variability.

6. Variability

Variability in big data refers to the variables in use, which affect how far and how fast the structure of the data changes and how often the meaning or form of a company’s data changes.

For example, a company offers a novel subscription service with several options:

  • Digital novel (app version): IDR 50,000 per month
  • Printed novel: IDR 100,000
  • Internet subscription from a provider: IDR 50,000

Based on these options, the company creates a questionnaire and simulates the responses:

If customers are asked to choose only one of the three options above, the question looks unreasonable, because most customers would probably want two of them, the printed novel and the internet subscription, since that combination benefits them most. But if customers are asked to choose between the printed novel subscription and the internet subscription, they will of course choose the internet subscription.

The simulation shows that the wording of the questions and the rules of the questionnaire change how people respond, and therefore change the results. In technical terms, if you change the variables, the big data model changes too.

7v big data

7v big data adds visualization.

7. Visualization

Visualization is how we present large and complex data using charts, graphs or other visual forms, so that readers can understand the data more easily than they could from several Excel files or documents full of numbers and formulas.
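Even a very small chart shows the idea. The sketch below draws a text-only bar chart with the Python standard library; the function name and the sales figures are invented for illustration, and in practice a plotting library or dashboard tool would be used instead.

```python
def text_bar_chart(values, width=20):
    """Render a dict of label -> number as lines of '#' bars scaled to width."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<10} {bar} {value}")
    return lines


# Hypothetical monthly sales figures.
monthly_sales = {"January": 120, "February": 300, "March": 210}
chart = text_bar_chart(monthly_sales)
print("\n".join(chart))
```

The bars make it obvious at a glance that February is the strongest month, which is harder to see in a column of raw numbers.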

8v big data 

The characteristics of 8v big data are covered by the 17v of big data below.

10v big data

The characteristics of 10v big data are also covered by the 17v of big data below.

17V Big Data

According to an article titled “17 V big Data” in the International Research Journal of Engineering and Technology (IRJET), big data has 17 characteristics. The table below lists them, together with one additional entry, complexity:

| No | Big Data Characteristics | Elucidation | Description |
| --- | --- | --- | --- |
| 1 | Volume | Size of Data | Quantity of collected and stored data. Data size is in TB |
| 2 | Velocity | Speed of Data | The transfer rate of data between source and destination |
| 3 | Value | Importance of Data | It represents the business value to be derived from big data |
| 4 | Variety | Type of Data | Different types of data like pictures, videos and audio arrive at the receiving end |
| 5 | Veracity | Data Quality | Analysis of captured data is virtually worthless if the data is not accurate |
| 6 | Validity | Data Authenticity | Correctness or accuracy of data used to extract results in the form of information |
| 7 | Volatility | Duration of Usefulness | Big data volatility means how long the stored data remains useful |
| 8 | Visualization | Data Act / Data Process | It is a process of representing abstract data |
| 9 | Virality | Spreading Speed | It is defined as the rate at which the data is broadcast or spread by a user and received by different users for their use |
| 10 | Viscosity | Lag of Event | It is the time difference between the event occurring and the event being described |
| 11 | Variability | Data Differentiation | Data arrives constantly from different sources, and variability is how efficiently noisy data is differentiated from important data |
| 12 | Venue | Different Platform | Various types of data arrive from different sources via different platforms like personnel systems and private and public cloud |
| 13 | Vocabulary | Data Terminology | Data terminology like data models and data structures |
| 14 | Vagueness | Indistinctness of existence in a Data | Vagueness concerns the reality in information that suggested little or no thought about what each might convey |
| 15 | Verbosity | The redundancy | The redundancy of the information available at different sources. Data can be classified into good data and bad data; good data comes from secured, relevant, complete and trustworthy sources |
| 16 | Voluntariness | The willful availability | The willful availability of big data to be used according to the context |
| 17 | Versatility | Flexible ability | The ability of big data to be flexible enough to be used differently for different contexts |
| 18 | Complexity | Correlation of Data | Data comes from different sources, and it is necessary to figure out the changes, whether small or large, in data with respect to the previously arrived data so that information can be obtained quickly |

The same table, as restated in this article’s original Indonesian translation:

| No | Characteristics of Big Data | Brief Description | Description |
| --- | --- | --- | --- |
| 1 | Volume | Size of Data | Quantity of data collected and stored. The amount of data is measured in TB |
| 2 | Velocity | Speed of Data | Transfer speed between data source and destination |
| 3 | Value | Importance of Data | The business value generated by big data |
| 4 | Variety | Type of Data | Data types such as images, video and audio |
| 5 | Veracity | Data Quality | The accuracy of the captured data; the results of the analysis are worthless if the data is not accurate |
| 6 | Validity | Data Authenticity | Validity, truth and accuracy of the data used to produce information |
| 7 | Volatility | Duration of Usefulness | How long the data is stored |
| 8 | Visualization | Data Act / Data Process | How to visualize or present data |
| 9 | Virality | Spreading Speed | How fast the data is spread by a user and how fast it is received by other users for use |
| 10 | Viscosity | Lag of Event | The time difference between when an event occurs and when the data for that event is generated |
| 11 | Variability | Data Differentiation | Data comes constantly from different sources |
| 12 | Venue | Different Platform | Different types of data come from different sources through different platforms, such as customer data on the company’s internal website and data from external platforms such as Google Analytics |
| 13 | Vocabulary | Data Terminology | Data terminology such as data models and data structures |
| 14 | Vagueness | Indistinctness of existence in a Data | The lack of clarity between one piece of data and another |
| 15 | Verbosity | The redundancy | Redundancy of information available from various sources |
| 16 | Voluntariness | The willful availability | Willful availability of big data to be used according to the context |
| 17 | Versatility | Flexible ability | The ability of big data to adapt flexibly to be used for various contexts |
| 18 | Complexity | Correlation of Data | Correlation between one piece of data and other data so that information can be found more quickly |

42v of Big Data 

According to Elder Research, there are 42 v of big data:

  1. Vagueness: The meaning of found data is often very unclear, regardless of how much data is available.
  2. Validity: Rigor in analysis (e.g., Target Shuffling) is essential for valid predictions.
  3. Valor: In the face of big data, we must gamely tackle the big problems.
  4. Value: Data science continues to provide ever-increasing value for users as more data becomes available and new techniques are developed.
  5. Vane: Data science can aid decision making by pointing in the correct direction.
  6. Vanilla: Even the simplest models, constructed with rigor, can provide value.
  7. Vantage: Big data allows us a privileged view of complex systems.
  8. Variability: Data science often models variable data sources. Models deployed into production can encounter especially wild data.
  9. Variety: In data science, we work with many data formats (flat files, relational databases, graph networks) and varying levels of data completeness.
  10. Varifocal: Big data and data science together allow us to see both the forest and the trees.
  11. Varmint: As big data gets bigger, so can software bugs!
  12. Varnish: How end-users interact with our work matters, and polish counts.
  13. Vastness: With the advent of the Internet of Things (IoT), the “bigness” of big data is accelerating.
  14. Vaticination: Predictive analytics provides the ability to forecast. (Of course, these forecasts can be more or less accurate depending on rigor and the complexity of the problem. The future is pesky and never conforms to our March Madness brackets.)
  15. Vault: With many data science applications based on large and often sensitive data sets, data security is increasingly important.
  16. Veer: With the rise of agile data science, we should be able to navigate the customer’s needs and change directions quickly when called upon.
  17. Veil: Data science provides the capability to peer behind the curtain and examine the effects of latent variables in the data.
  18. Velocity: Not only is the volume of data ever increasing, but the rate of data generation (from the Internet of Things, social media, etc.) is increasing as well.
  19. Venue: Data science work takes place in different locations and under different arrangements: Locally, on customer workstations, and in the cloud.
  20. Veracity: Reproducibility is essential for accurate analysis.
  21. Verdict: As an increasing number of people are affected by models’ decisions, Veracity and Validity become ever more important.
  22. Versed: Data scientists often need to know a little about a great many things: mathematics, statistics, programming, databases, etc.
  23. Version Control: You’re using it, right?
  24. Vet: Data science allows us to vet our assumptions, augmenting intuition with evidence.
  25. Vexed: Some of the excitement around data science is based on its potential to shed light on large, complicated problems.
  26. Viability: It is difficult to build robust models, and it’s harder still to build systems that will be viable in production.
  27. Vibrant: A thriving data science community is vital, and it provides insights, ideas, and support in all of our endeavors.
  28. Victual: Big data — the food that fuels data science.
  29. Viral: How does data spread among other users and applications?
  30. Virtuosity: If data scientists need to know a little about many things, we should also grow to know a lot about one thing.
  31. Viscosity: Related to Velocity; how difficult is the data to work with?
  32. Visibility: Data science provides visibility into complex big data problems.
  33. Visualization: Often the only way customers interact with models.
  34. Vivify: Data science has the potential to animate all manner of decision making and business processes, from marketing to fraud detection.
  35. Vocabulary: Data science provides a vocabulary for addressing a variety of problems. Different modeling approaches tackle different problem domains, and different validation techniques harden these approaches in different applications.
  36. Vogue: “Machine Learning” becomes “Artificial Intelligence”, which becomes…?
  37. Voice: Data science provides the ability to speak with knowledge (though not all knowledge, of course) on a diverse range of topics.
  38. Volatility: Especially in production systems, one has to prepare for data volatility. Data that should “never” be missing suddenly disappears, numbers suddenly contain characters!
  39. Volume: More people use data-collecting devices as more devices become internet-enabled. The volume of data is increasing at a staggering rate.
  40. Voodoo: Data science and big data aren’t voodoo, but how can we convince potential customers of data science’s value to deliver results with real-world impact?
  41. Voyage: May we always keep learning as we tackle the problems that data science provides.
  42. Vulpine: Nate Silver would like you to be a fox, please.

That concludes the discussion of the 5v big data characteristics and the 3v, 4v, 6v, 7v, 8v, 10v, 17v and 42v versions.
