{"id":16339,"date":"2024-07-09T09:28:03","date_gmt":"2024-07-09T02:28:03","guid":{"rendered":"https:\/\/fpt-is.com\/en\/?post_type=goc_nhin_so&#038;p=16339"},"modified":"2024-07-17T16:48:24","modified_gmt":"2024-07-17T09:48:24","slug":"dark-data-unveiling-the-darkness","status":"publish","type":"goc_nhin_so","link":"https:\/\/fpt-is.com\/en\/insights\/dark-data-unveiling-the-darkness\/","title":{"rendered":"Dark Data &#8211; Unveiling the Darkness"},"content":{"rendered":"<h2><strong>1. Have you heard of data that is \u201cdark\u201d?<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Imagine a photography lover meticulously capturing every sunrise, interesting street corner, and delicious meal he encounters. His camera roll overflows with thousands of images, but most remain unedited, unorganized, and unseen. Buried within this digital archive could be stunning portfolio pieces or heartwarming memories, but the photographer is overwhelmed by the sheer volume. This scenario exemplifies the phenomenon of &#8220;picture hoarding&#8221;.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Now imagine you meticulously track your expenses using a budgeting app, but never analyze the years of old receipts tucked away in a drawer. Buried within those forgotten purchases could be hidden patterns: a tendency to overspend at certain restaurants, or a category that consistently exceeds your budget. This untapped trove of information is a personal example of &#8220;data hoarding&#8221;.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Hoarding habits like these are, on a small scale, exactly how \u201cdark data\u201d accumulates.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Dark data also lurks within organizations, stemming from a vast collection of information gathered during regular business activities but left unused. Businesses typically collect dark data alongside data of more immediate value to the company. 
Sometimes a company collects specific data expecting to use it in the future but never does. Sometimes data is collected simply because it can be, even though there\u2019s no real use for it.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Dark data may be any or all of the following: outdated, incomplete, incompatible, redundant, or irrelevant. To most companies, dark data has little or no perceived value. In many instances, the company doesn\u2019t even know it exists. Yet it holds hidden potential for improved operations, product development, and customer satisfaction, waiting to be unlocked.<\/span><\/p>\n<h3><strong>What is Dark Data?<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">Gartner defines dark data as information assets collected, processed, and stored during regular business activities but not used for further analysis. Think of it as digital clutter accumulating in the background, hindering your ability to see the bigger picture.<\/span><\/p>\n<p><span style=\"font-weight: 400\">While the term &#8220;dark data&#8221; might sound ominous, it&#8217;s simply information waiting to be harnessed. Just as a cluttered attic might hold forgotten treasures, dark data can hold valuable insights. Dark data is a friend, not a foe. It is also a common occurrence in our information age: the sheer volume of data we generate daily, coupled with a lack of robust data management strategies, almost guarantees its existence.<\/span><\/p>\n<p><span style=\"font-weight: 400\">A staggering amount of data qualifies as dark data. Research suggests that more than half, and potentially 75% or more, of a company&#8217;s information remains unused. 
That&#8217;s a significant portion of valuable insights gathering dust!<\/span><\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16345 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d1-1719107036.png\" alt=\"D1 1719107036\" width=\"425\" height=\"212\" \/><\/p>\n<p style=\"text-align: center\"><span style=\"font-size: 10pt\"><em><span style=\"font-weight: 400\">Dark data at a glance<\/span><\/em><\/span><\/p>\n<h3><strong>Examples of Dark Data<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">If it\u2019s dark, it\u2019s not easily discovered. So where should you look for dark data, and what are the telltale signs? Here are some examples:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Structured data:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Sensor data: Manufacturing plants and logistics companies use an array of sensors to monitor everything from temperature fluctuations to machine performance. This data, while neatly organized, remains dark if it is not analyzed to identify potential equipment failures or to optimize production processes.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Server log files: Every click, search, and page view on your company website is recorded in server logs. Without analyzing these logs, you may miss opportunities to optimize the user experience.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Others: electronic bank statements, medical records\u2026<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><b>Semi-structured data:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Customer surveys: Businesses conduct customer surveys to gather feedback. 
Unless the responses are properly categorized and analyzed, for example with sentiment analysis tools, this feedback remains dark.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Customer service call transcripts: Call centers hold a wealth of semi-structured information. Customer frustrations, product feedback, and feature requests are all embedded within call transcripts.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Others: HTML code, invoices, graphs, tables, and XML documents\u2026<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><b>Unstructured data:<\/b>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Machine log files: Complex machinery generates vast amounts of log data. Without proper tools to parse and analyze this unstructured data, it remains a cryptic record of machine activity, offering no insight into potential maintenance issues or areas for performance improvement.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Social media mentions: Brand mentions, customer reviews, and information about competitors can be gleaned from social media platforms. Turning this unstructured data into actionable insights requires tools such as sentiment analysis.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Others: email correspondence, PDFs, text documents, call center recordings, chat logs, and surveillance video footage\u2026<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">While the sheer volume of untapped potential within dark data is evident, its existence isn&#8217;t a recent discovery. 
To understand how we arrived at this point, let&#8217;s look at the timeline below.<\/span><\/p>\n<h3><span style=\"font-weight: 400\"><strong>Dark Data: A Timeline of Discovery<\/strong> <\/span><\/h3>\n<ul>\n<li style=\"font-weight: 400\"><b>2012:<\/b><span style=\"font-weight: 400\"> The term &#8220;dark data&#8221; emerges, highlighting the challenge of stored data with unknown value.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>2013:<\/b><span style=\"font-weight: 400\"> Gartner refines the concept and explores analysis methods.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>2015:<\/b><span style=\"font-weight: 400\"> IBM highlights how much sensor data goes unused in the age of IoT.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>2016:<\/b><span style=\"font-weight: 400\"> A study shows that a vast amount of data remains hidden from key decision-makers.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>2017:<\/b><span style=\"font-weight: 400\"> Major acquisitions signal efforts to unlock dark data&#8217;s potential.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>2018:<\/b><span style=\"font-weight: 400\"> The definition expands to encompass hidden data beyond traditional sources.<\/span><\/li>\n<\/ul>\n<h2><strong>2. The cost of darkness<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">In today&#8217;s data-driven world, failing to use all available information can be a significant disadvantage. Dark data, the vast amount of unanalyzed information collected by organizations, carries both a hidden cost and untapped potential. Here&#8217;s why you should be concerned about it:<\/span><\/p>\n<h3><strong>Financial Burden<\/strong><\/h3>\n<ul>\n<li style=\"font-weight: 400\"><b>Storage Costs:<\/b><span style=\"font-weight: 400\"> Storing unused data requires physical or digital infrastructure, so expenses rise as data volume grows. A Veritas study found that, on average, 52% of the data organizations store is dark, which means a comparable share of storage spending goes to data of unknown value. 
For large organizations, this translates into millions of dollars spent storing information with no current value. Your company may well be devoting around half of its storage budget to data it doesn\u2019t use.<\/span><\/li>\n<\/ul>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16344 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d2-1719107034.png\" alt=\"D2 1719107034\" width=\"425\" height=\"352\" \/><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400;font-size: 10pt\"><em>Veritas Study<\/em> <\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Regulatory Compliance:<\/b><span style=\"font-weight: 400\"> Data privacy laws apply to personal data wherever it is stored, including in dark data, so neglected data can still lead to fines for non-compliance.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Inefficiencies:<\/b><span style=\"font-weight: 400\"> Large data sets cluttered with dark data slow down retrieval and analysis, reducing productivity and increasing labor costs.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Security Risks:<\/b><span style=\"font-weight: 400\"> Dark data can be a security liability, increasing the risk of breaches and data loss.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">A 2019 study reported that companies such as Netflix spend millions of dollars storing data on AWS, a significant portion of which may be dark data. Data breaches can also be extremely costly, as the Equifax case showed (a settlement worth up to $1.38 billion).<\/span><\/p>\n<h3><strong>Missed Opportunities<\/strong><\/h3>\n<ul>\n<li style=\"font-weight: 400\"><b>Limited Data Analysis:<\/b><span style=\"font-weight: 400\"> Analytics tools produce the best results when they have access to complete data, so leaving data dark shrinks the pool of information available for analysis. 
A 2015 IBM report highlights that 60% of dark data loses value rapidly after it is generated, so the longer it sits unanalyzed, the less it is worth.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Unexploited Potential:<\/b><span style=\"font-weight: 400\"> Untapped dark data holds valuable customer, business, and operational insights. It can reveal crucial information about customer behavior, network security patterns, and investment trends. Competitors that leverage dark data can gain an edge, costing revenue or market share to those that don&#8217;t.<\/span><\/li>\n<\/ul>\n<h3><strong>Security Concerns<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">Unsecured dark data can be exploited by attackers seeking insight into an organization&#8217;s operations or internal documents. This can lead to data leaks or regulatory fines if proper data inventory and access controls are not in place. Information integrity is also vital: businesses must verify the source and quality of any data used for analysis.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Fortunately, advances in technology and analytics offer solutions for handling dark data. These techniques enable large-scale, cost-effective, automated analysis, minimizing the resources needed to unlock its value. With the right strategies, organizations can turn dark data from a hidden cost into a competitive advantage. The next section discusses how to harness the power of dark data.<\/span><\/p>\n<h2><strong>3. Harnessing the power of dark data<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Between 2022 and 2023 alone, the data lake market surged, and its value is projected to exceed $34 billion by 2030. However, the initial promise of data lakes, that simply having all your data in one place would unlock insights, hasn&#8217;t always materialized. 
Much of this data remains unstructured and unused, turning the data lake into a dark data swamp. Organizations are recognizing the need for a more sophisticated approach to data management. The following three-step roadmap proposes a way to tackle this challenge and shed light on dark data.<\/span><\/p>\n<p><strong>Step 1: Laying the Foundation<\/strong><\/p>\n<p><span style=\"font-weight: 400\">Our journey begins with establishing a solid foundation. This first step focuses on two key areas:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Data Assessment: <\/b><span style=\"font-weight: 400\">Move beyond the data lake by conducting a thorough data assessment, and look past traditional sources such as ERP and Point-of-Sale (POS) systems. Server logs, social media interactions, and sensor data can all be goldmines of dark data. A useful guiding principle is that &#8220;all dark data should be traceable to a source.&#8221; Data audits play a crucial role here, revealing sources such as customer transactions, system logs, or data streams from Internet of Things (IoT) devices.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Data Governance:<\/b><span style=\"font-weight: 400\"> Building a strong data culture within your organization starts with proper governance. This involves assigning clear data ownership, setting access control protocols, and defining data retention policies based on compliance requirements and business value. Tools such as IBM Watson Knowledge Catalog are prominent candidates for executing large-scale data governance across a corporation.<\/span><\/li>\n<\/ul>\n<p><strong>Step 2: Adopting Tools for Transformation<\/strong><\/p>\n<p><span style=\"font-weight: 400\">Once you&#8217;ve identified your dark data and established good governance practices, it&#8217;s time to equip yourself with the right tools. 
There are three key areas for transformation:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Data Classification:<\/b><span style=\"font-weight: 400\"> Classify data based on business needs and compliance requirements, prioritizing the most valuable information for further exploration. Tools such as IBM Watson Knowledge Catalog, with its Automated Discovery (AD) and Quick Scan (QS) features, can help you understand the purpose and potential usefulness of your dark data. Quick Scan is very fast and is built for shallow analysis of millions of data elements.<\/span><\/li>\n<\/ul>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16343 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d3-1719107032.png\" alt=\"D3 1719107032\" width=\"425\" height=\"210\" \/><\/p>\n<p><span style=\"font-weight: 400\">Other use cases call for a deep investigation of a smaller set of data elements that an enterprise defines as critical to its business. Automated Discovery provides the features needed for this in-depth analysis of critical data elements.<\/span><\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16342 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d4-1719107030.png\" alt=\"D4 1719107030\" width=\"423\" height=\"234\" \/><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Data Extraction:<\/b><span style=\"font-weight: 400\"> Unlocking the secrets within your dark data requires specialized tools. Here are a few options to consider: DeepDive (an open-source system developed at Stanford University), Amazon Textract from Amazon Web Services (AWS), or Dark Vision (a technology demonstrator that uses IBM Watson services to extract dark data from videos). 
These tools can extract valuable information from various formats such as text, images, and even video.<\/span><\/li>\n<\/ul>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16341 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d5-1719107028.png\" alt=\"D5 1719107028\" width=\"424\" height=\"192\" \/><\/p>\n<p style=\"text-align: center\"><em><span style=\"font-weight: 400\"><span style=\"font-size: 10pt\">Using Amazon Textract to extract data from images and PDFs<\/span> <\/span><\/em><\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-16340 aligncenter\" src=\"https:\/\/cdn.fpt-is.com\/en\/sites\/3\/d6-1719107026.png\" alt=\"D6 1719107026\" width=\"425\" height=\"282\" \/><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400;font-size: 10pt\"><em>How Dark Vision processes videos to discover what&#8217;s inside them<\/em> <\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Data Visualization:<\/b><span style=\"font-weight: 400\"> Implement tools that let you see the bigger picture by bringing data from all sources, including dark data, onto a single platform. This helps surface trends and insights that might not be apparent in the raw data.<\/span><\/li>\n<\/ul>\n<p><strong>Step 3: Embracing the Future<\/strong><\/p>\n<p><span style=\"font-weight: 400\">The final step of the journey focuses on long-term strategies for maximizing the value of your dark data:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Cloud Storage:<\/b><span style=\"font-weight: 400\"> Consider migrating data storage to the cloud for improved accessibility, scalability, and real-time data processing. 
Cloud platforms such as Google Cloud Platform (GCP), with services like Cloud Vision API, Document AI, AutoML, and the Cloud Natural Language API, offer capabilities well suited to processing dark data.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>AI and Machine Learning Adoption:<\/b><span style=\"font-weight: 400\"> Invest in AI and machine learning tools such as Snorkel (an open-source framework developed at Stanford University) and Azure Cognitive Services from Microsoft (with capabilities such as Computer Vision, Form Recognizer, and Text Analytics). These tools can help process and analyze dark data at scale, identifying patterns, anomalies, and potential business insights. Additionally, Intelligent Document Processing (IDP) solutions that combine Robotic Process Automation (RPA) and AI can be instrumental in extracting valuable information from various document formats.<\/span><\/li>\n<\/ul>\n<h3><strong>Some Additional Considerations<\/strong><\/h3>\n<ul>\n<li style=\"font-weight: 400\"><b>Security:<\/b><span style=\"font-weight: 400\"> Ensure all data, including dark data, is properly secured to mitigate cybersecurity risks. Apply strong encryption to your data, both on in-house servers and in cloud storage.<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Compliance:<\/b><span style=\"font-weight: 400\"> Stay updated on data privacy regulations and ensure your dark data management practices are compliant. The recent implementation of Vietnam&#8217;s Decree 13 on Personal Data Protection (effective July 2023) adds urgency to investigating dark data. The decree grants individuals the right to access and erase their personal information. Honoring these rights may require organizations to dig into their dark data repositories to locate and manage that personal data. 
Failing to do so could mean non-compliance with Decree 13, potentially resulting in fines or reputational damage. This underscores the growing importance of proactively classifying and understanding dark data to keep pace with evolving data privacy regulations.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Like dark matter in physics, dark data represents a vast amount of unseen information with hidden potential. By understanding what dark data is and how it accumulates, businesses can take steps to manage it more effectively. This can involve implementing data governance strategies, cleaning up and organizing information, and investing in tools to analyze different data formats. Shedding light on dark data can unlock valuable insights and empower businesses to make better decisions, improve customer experiences, and optimize operations.<\/span><\/p>\n<table style=\"border-collapse: collapse;width: 100%\">\n<tbody>\n<tr>\n<td style=\"width: 100%\"><p><strong>Exclusive article by FPT IS Expert<\/strong><\/p>\n<p><em>Author Tran Minh Chau &#8211; <span style=\"font-family: inherit;font-size: inherit\">Data Scientist Lead, FPT IS<\/span><\/em><\/p>\n<div class=\"html-div xe8uvvx xdj266r x11i5rnm xat24cr x1mh8g0r xexx8yu x4uap5 x18d9i69 xkhd6sd x1h91t0o xkh2ocl x78zum5 xdt5ytf x13a6bvl x193iq5w x1iyjqo2 x1eb86dx\" 
role=\"presentation\"><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"author":3,"featured_media":16345,"parent":0,"template":"","nang_luc":[828,790,805],"danh_muc_goc_nhin_so":[528,789],"dich_vu":[537,540,858,859],"linh_vuc":[856,519],"platform":[],"san_pham":[],"the_goc_nhin_so":[],"class_list":["post-16339","goc_nhin_so","type-goc_nhin_so","status-publish","has-post-thumbnail","hentry","nang_luc-digital-transformation","nang_luc-experts-sharing","nang_luc-technology","danh_muc_goc_nhin_so-digital-transformation","danh_muc_goc_nhin_so-expert-sharing","dich_vu-insights-data-ai","dich_vu-data-center","dich_vu-private-sector-news","dich_vu-public-sector-news","linh_vuc-enterprises","linh_vuc-government"],"acf":[],"_links":{"self":[{"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/goc_nhin_so\/16339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/goc_nhin_so"}],"about":[{"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/types\/goc_nhin_so"}],"author":[{"embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/users\/3"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/media\/16345"}],"wp:attachment":[{"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/media?parent=16339"}],"wp:term":[{"taxonomy":"nang_luc","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/nang_luc?post=16339"},{"taxonomy":"danh_muc_goc_nhin_so","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/danh_muc_goc_nhin_so?post=16339"},{"taxonomy":"dich_vu","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/dich_vu?post=16339"},{"taxonomy":"linh_vuc","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/linh_vuc?post=16339"},{"taxonomy":"platform","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/platform?post=16339"},{"taxonomy":"san_pham","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-
json\/wp\/v2\/san_pham?post=16339"},{"taxonomy":"the_goc_nhin_so","embeddable":true,"href":"https:\/\/fpt-is.com\/en\/wp-json\/wp\/v2\/the_goc_nhin_so?post=16339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}