On This Page:
Find definitions for the most popular words & phrases in the field of analytics. Explore job responsibilities in data-focused fields. Dig into types of data, including structured, unstructured, and semi-structured data. Learn more about analytics processes (e.g. prescriptive analytics). Compare data analytics & data science terminology with business analytics & analysis concerns. Or get up to speed with big data tools, programming languages, and analytics software.
Data analysts are tech-driven interpreters who collect, analyze, and transform data in order to make well-informed decisions. They are experts in the “Big 4” analytics processes (descriptive, diagnostic, predictive & prescriptive), as well as data cleaning, data mining, and data visualization. They use analytics to test hypotheses, uncover patterns & trends, deliver predictions, create strategies, and come up with practical insights. At a higher level, their work may dovetail with the tasks of a data scientist.
Business Analytics Practitioners
Business analytics practitioners are corporate problem-solvers. They are dedicated to using analytics processes to improve business decisions, develop products & services, run more efficient campaigns, trim costs, or implement IT systems. They predict & prescribe, as well as describe & diagnose. They may also be adept in areas such as Agile, Econometrics, Business Intelligence (BI), Operations Research (OR), Requirements Engineering, and/or Management Information Systems (MIS).
Data scientists are inventors who specialize in creating new ways to harness and analyze data. They are well-versed in advanced analytics techniques, big data applications, predictive modeling, Artificial Intelligence (AI), and Machine Learning (ML). They build their own algorithms & predictive models, develop their own data visualization tools & dashboards, write their own programs, and automate analytics tasks. Data scientists often start their professional careers as data analysts.
Data engineers build the systems that are needed to collect, store, and analyze big data—they are infrastructure specialists. They develop algorithms, create data platforms, design data models, construct data warehouses, and build, test & maintain database pipeline architectures. Since this is a mid-level position, data engineers may begin their careers as Business Intelligence (BI) analysts, software engineers, database administrators, and the like.
Data architects create the “blueprint” for an organization’s enterprise data framework—they are designers & visionaries. They translate business needs into technical requirements, design complex technical architectures, work closely with data engineers to construct secure & usable systems, and define how data will be stored, accessed & managed. Like architects who work on brick & mortar buildings, data architects are senior-level experts.
Types of Data
Big data usually refers to structured, semi-structured & unstructured data sets that are too large to be handled by standard data processing applications & software. The explosion of data in the 21st century means that big data experts have to deal with 4 key challenges:
- Volume: With big data, you’re looking at terabytes and petabytes of raw data.
- Variety: Mobile devices, the Internet of Things (IoT), video, audio, photos, sensors—data sources are now everywhere.
- Velocity: The speed at which real-time data are generated continues to accelerate, and systems must keep pace.
- Veracity: Not all data sets are reliable. Big data experts must assess the truthfulness & value of data sources and consider inherent bias.
Qualitative data are descriptive and unstructured. They describe certain attributes, qualities, and characteristics, but—unlike quantitative data—they can be difficult to measure & analyze with precision. Think of adjectives and adverbs (e.g. soft, quirky, happy, quickly, etc.).
Examples of qualitative data include individual responses to an open-ended questionnaire, notes taken during a focus group, transcripts of audio & video recordings, and descriptions of colors, sounds, textures, tastes or smells. Qualitative data can be analyzed by grouping these ideas into categories and themes.
Quantitative data are data sets that can be counted & measured—they are typically structured and quantifiable. A shorthand way to think of quantitative data is to ask the questions: How many? How often? or How much?
Examples of quantitative data include revenue numbers, income numbers, test scores, ages, customer ratings, website visits, distance measurements, and more. They can be gathered through techniques like scientific experiments, market reports, headcounts, surveys, and polls. And they can be evaluated through statistical analysis.
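As a small sketch of what “evaluated through statistical analysis” can look like in practice, here is a Python example using the standard-library statistics module. The website-visit numbers are made up for illustration.

```python
import statistics

# Hypothetical quantitative data: daily website visits over one week
visits = [1200, 1350, 1100, 1475, 1600, 980, 1250]

mean_visits = statistics.mean(visits)      # "how much, on average?"
median_visits = statistics.median(visits)  # middle value, robust to outliers
spread = statistics.stdev(visits)          # standard deviation: how much variation?

print(f"mean={mean_visits:.1f}, median={median_visits}, stdev={spread:.1f}")
```

Even these three summary numbers answer the "How many? How often? How much?" questions that define quantitative data.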
Raw data refers to quantitative and qualitative data that have been collected but not yet processed, organized, cleaned, or mined. You can have transcripts, survey responses, product prices, performance data, and sales numbers, but until those raw data points are analyzed, they don’t tell you much.
Semi-structured data fit somewhere between structured data and unstructured data. These data can’t be sorted into rows & columns within databases, but they still contain tags, metadata, and markers that can be used to help organize them into groups and hierarchies. Think of emails—the text is likely to be unstructured, but the names, times, dates, and category folders are structured.
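The email example above can be shown concretely with Python's built-in email module. The message content is invented, but it illustrates the split: headers are tagged metadata you can query by name, while the body is free text.

```python
from email import message_from_string

# A raw email: the headers are structured metadata, the body is free text
raw = """From: ana@example.com
To: team@example.com
Subject: Q3 report
Date: Mon, 02 Oct 2023 09:15:00 +0000

Hi all, the Q3 numbers look great. Details to follow."""

msg = message_from_string(raw)

# Structured parts: addressable by key, easy to sort and filter
print(msg["From"], msg["Subject"])

# Unstructured part: plain text that needs further analysis
print(msg.get_payload())
```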
Structured data have been standardized & organized to fit into a pre-defined data model. Structured data are stored in data warehouses and live in relational databases. They can be organized into spreadsheets (e.g. Excel) with discrete fields for each data point and rows & columns for easy sorting. This is where you’ll often find quantitative data.
Unstructured data are not organized into any pre-defined model or architecture. Examples of unstructured data include audio files, video files, memos, chats, messages, PDFs, and images. Unstructured data are stored in data lakes and live in NoSQL databases. In the past, it was difficult to analyze unstructured data, but tools like Hadoop are making the job much easier.
Analytics Processes: The Big 4
Descriptive analytics usually refers to the process of analyzing historical data in order to identify patterns, trends, and relationships. It answers the question: What happened?
Data analysts might use descriptive analytics to monitor web traffic; analyze month-over-month sales growth; identify the most popular products; or track their organization’s progress towards a goal (e.g. improving wait times).
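The month-over-month sales-growth example is a one-liner's worth of arithmetic. A minimal Python sketch, with made-up monthly figures:

```python
# Hypothetical monthly sales figures for a descriptive analytics report
sales = {"Jan": 42000, "Feb": 46200, "Mar": 43900}

# Month-over-month growth: (current - previous) / previous, as a percentage
months = list(sales)
for prev, curr in zip(months, months[1:]):
    growth = (sales[curr] - sales[prev]) / sales[prev] * 100
    print(f"{prev} -> {curr}: {growth:+.1f}%")
```

The output describes what happened (sales rose 10% in February, then slipped 5% in March); figuring out why is the job of diagnostic analytics.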
Diagnostic analytics goes one step beyond descriptive analytics—it’s a process that’s concerned with understanding the root causes of trends & anomalies. It answers the question: Why did this happen?
Data analysts might use diagnostic analytics to dig into web traffic data & discover what accounted for a sudden spike in numbers; investigate months with strong sales to see if certain products or campaigns were driving purchases; or run diagnostics to identify the causes of a technology issue.
Predictive analytics is a process that’s concerned with probability—forecasting outcomes & trends, providing insights into individual cases, and predicting the likelihood of events. It answers the question: What could happen in the future?
Data analysts might use predictive analytics to suggest new products that could appeal to customers based on their past behavior & purchases; predict which accounts are likely to default; or discover where cybercriminals are likely to infect IT systems.
Prescriptive analytics is a process that’s focused on decision-making—using a data-based approach to choose the best course of action from a range of options. It answers the question: What should we do next?
Data analysts might use prescriptive analytics to help start-up investors assess complex risks and decide where to invest; make decisions on when to evacuate residents from a town at risk of flooding; or raise the price of airline tickets automatically in order to adjust to increased demand.
Data Analytics & Data Science Terms
Artificial Intelligence (AI)
The field of AI is focused on developing smart machines & computer systems that can perform many of the same functions as human intelligence. Think of traits like:
- Machine Learning (ML)
- Natural Language Processing (NLP)
- Social intelligence (e.g. understanding human emotions)
- Perception (e.g. speech & facial recognition)
- Motion (e.g. robotics, autonomous vehicles, etc.).
AI is ubiquitous in modern society. It’s used in marketing (e.g. personalized advertising), healthcare (e.g. diagnostic aids), finance (e.g. robo-advisors), social media (e.g. customized feeds), e-commerce (e.g. product recommendations)—anywhere there is technology, there is usually AI.
Data cleaning is the process of identifying corrupt, irrelevant, and/or inaccurate data within a data set and then either removing or fixing the data. It can also be referred to as Data Cleansing or Data Scrubbing.
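As a sketch of the three cases the definition mentions (corrupt, irrelevant/duplicate, and incomplete data), here is a small Python example over invented survey records:

```python
# Hypothetical raw survey records: some are corrupt, duplicated, or incomplete
raw_records = [
    {"id": 1, "age": 34, "rating": 4},
    {"id": 2, "age": -5, "rating": 5},     # corrupt: impossible age
    {"id": 3, "age": 28, "rating": None},  # incomplete: missing rating
    {"id": 1, "age": 34, "rating": 4},     # duplicate of record 1
    {"id": 4, "age": 51, "rating": 3},
]

def clean(records):
    seen, result = set(), []
    for r in records:
        if r["id"] in seen:              # drop duplicates
            continue
        if not (0 < r["age"] < 120):     # drop corrupt values
            continue
        if r["rating"] is None:          # drop incomplete rows
            continue
        seen.add(r["id"])
        result.append(r)
    return result

print(clean(raw_records))  # only records 1 and 4 survive
```

Real cleaning pipelines often fix rather than drop (e.g. imputing a missing rating), but the filtering logic above is the core idea.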
Data governance is a set of standards for managing data within an organization, from its initial acquisition to its final disposal. A data governance framework covers rules, processes, roles, legal compliance, and usage. It ensures that data sets are secure and private, as well as available, trustworthy, and accurate.
Data mining refers to the process of exploring large data sets, extracting or “mining” those sets for usable data, and analyzing data to identify meaningful patterns and trends. It can also be known as Knowledge Discovery in Data (KDD).
Data visualization is the visual representation of data & information. Basic forms of data visualization include bar graphs, pie charts, and simple maps. But analytics experts usually use sophisticated tools & software (e.g. Tableau, Power BI, etc.) to create data visualizations that showcase patterns & trends.
Machine Learning (ML)
ML is an application of Artificial Intelligence that is focused on helping systems & machines “learn” from experience and improve their performance without being explicitly programmed to do so. To do this, machines draw on sample data—widely known as “training data.” In some cases, they even develop their own algorithms.
ML is used in all kinds of settings (e.g. identifying cybersecurity vulnerabilities, detecting credit card fraud, suggesting products, etc.), but it does come with limitations, including issues of inherent bias and misclassification.
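To make "learning from training data without being explicitly programmed" concrete, here is a toy nearest-neighbor classifier in Python. The customer-behavior numbers are invented; note that no buy/no-buy rule is ever written down — the labels come entirely from the training examples.

```python
# Training data: (hours_active, pages_viewed) -> did the customer buy?
training_data = [
    ((1.0, 3), "no"), ((0.5, 1), "no"), ((4.0, 12), "yes"),
    ((3.5, 9), "yes"), ((0.8, 2), "no"), ((5.0, 15), "yes"),
]

def predict(point):
    """1-nearest-neighbor: label a new point like its closest training example."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    nearest = min(training_data, key=lambda ex: dist(ex[0], point))
    return nearest[1]

print(predict((4.2, 11)))  # lands near the "yes" examples
print(predict((0.6, 2)))   # lands near the "no" examples
```

The misclassification limitation mentioned above is easy to see here: a point halfway between the clusters gets whichever label happens to be marginally closer.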
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field that combines computer programming and AI techniques to build tools & machines that can interpret and respond to human language. It’s used to analyze large quantities of text & voice data in order to generate insights, answer human questions, and improve customer experiences. You’ll see NLP at work within digital assistants, email filters, language translation, automated phone calls, search engine results, and more.
Predictive modeling is a statistical approach that uses algorithms to analyze data and predict & forecast outcomes. Examples of predictive models include decision trees, linear regression, and neural networks. In conjunction with Machine Learning and data mining, predictive modeling is frequently used in predictive analytics.
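Linear regression, the simplest of the models named above, fits in a few lines of Python. This is a sketch with made-up ad-spend data, using the ordinary least-squares formulas directly:

```python
# Hypothetical history: ad spend (in $1k) vs. units sold
x = [1, 2, 3, 4, 5]
y = [12, 19, 31, 42, 48]

# Ordinary least squares for the line y = slope * x + intercept
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
den = sum((xi - mean_x) ** 2 for xi in x)
slope = num / den
intercept = mean_y - slope * mean_x

# Forecast: what could happen at a $6k spend?
forecast = slope * 6 + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}; forecast at x=6: {forecast:.1f}")
```

In practice analysts reach for a library (scikit-learn, statsmodels, R) rather than hand-rolled formulas, but the fitted line is the same.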
Real-time analytics is the process of using, analyzing, and assessing data as soon as it enters a system. You might use real-time analytics to track incoming orders, adjust online marketing campaigns, detect fraud at a point-of-sale, monitor someone’s health, and more.
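A minimal sketch of the idea: process each value the moment it arrives, rather than waiting to analyze a batch. The event stream and alert threshold below are invented for illustration.

```python
# Process each event as soon as it "enters the system"
def monitor(stream, threshold=100.0):
    total = 0.0
    for i, value in enumerate(stream, start=1):
        total += value
        running_avg = total / i
        if value > threshold:          # react immediately, not in a nightly batch
            yield (i, value, "ALERT")
        else:
            yield (i, value, f"avg={running_avg:.1f}")

incoming = [42.0, 55.0, 130.0, 48.0]   # e.g. transaction amounts at a point-of-sale
for event in monitor(incoming):
    print(event)
```

A production system would read from a message queue or stream processor (e.g. Kafka, Spark Streaming) instead of a Python list, but the per-event logic is the same.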
A relational database stores data in tables made up of rows & columns. Tables that share common data points can be linked to each other through “keys,” creating relationships between them. With a single query, you can then build a data set from one or more related tables. Examples of relational databases include MySQL, Oracle, and Microsoft SQL Server.
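Here is the keys-and-relationships idea in miniature, using Python's built-in sqlite3 module (SQLite is itself a relational database). The customer and order data are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = con.cursor()

# Two tables linked by a key: customers.id <-> orders.customer_id
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER, total REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 99.50), (11, 1, 20.00), (12, 2, 45.25)])

# One query builds a data set from both related tables via the key
rows = cur.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ana', 119.5), ('Ben', 45.25)]
con.close()
```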
Business Analytics & Analysis Terms
Agile began its life in software development—it referred to a set of best practices in project management that emphasized flexibility, constant & non-hierarchical collaboration, and continuous improvements. Gradually, the idea of Agile mindsets, management practices & methodologies spread into other sectors. It has now become a core part of business analysis practice.
Business Intelligence (BI)
Business Intelligence (BI) combines the latest business strategies & analytics technologies to help professionals make informed, data-driven decisions about their organization. In contrast to the predictive & prescriptive approaches used in business analytics, the field of BI usually focuses on existing processes. Think of:
- Descriptive analytics
- Diagnostic analytics
- Data mining
- Data visualization
- Data storytelling
- Best practices
BI experts ask questions like: “What can we learn from what happened?” and “How can we improve on what we have?” BI platforms, tools & software are widely available in the marketplace. Examples include custom dashboards, scorecards, Key Performance Indicator (KPI) reports, ad hoc queries, automated alerts, and more.
Econometrics is the use of statistical & mathematical models to develop economic theories, test existing hypotheses & policies, and create economic forecasts.
Elicitation techniques are used by business analysts to gather information & requirements from a client. They can take the form of interviews, brainstorm sessions, focus groups, workplace shadowing/observation, surveys & questionnaires, document analyses, and more.
Management Information Systems (MIS)
Management Information Systems (MIS) has two definitions:
- MIS is a field that studies interactions between people, technology, and organizations.
- MIS can also refer to information systems that are used to collect, process & store data within a business or organization.
Examples of the second definition include Enterprise Resource Planning (ERP) systems, inventory control systems, sales & marketing systems, Transaction Processing Systems (TPS), HR management systems, and more. Through the development and management of these systems, MIS experts are devoted to making a business more efficient & profitable.
Operations Management (OM)
Operations Management (OM) is focused on optimizing business processes that deal with the creation, production, and delivery of goods & services. Some people refer to it as finding the best ways to transform inputs into outputs.
OM experts assess—and improve—inventory management, supply chain management, transport networks, sustainability policies, staffing policies, product quality, service operations, and more. OM is often about managerial decision-making and has much more of a people-focused/strategic feel than Operations Research (OR).
Operations Research (OR)
Operations Research (OR) is a field that focuses on using advanced scientific methods, mathematical principles & sophisticated analytical techniques to solve operational problems and make informed decisions. It is sometimes referred to as Management Science.
OR can encompass everything from simulations and stochastic models to game theory and financial engineering. These complex approaches are used to improve supply chain management, set optimal prices, plan projects, manage risk, optimize networks, automate processes, and more. OR analysts help Operations Managers make the right decisions.
Requirements Engineering is a process often used in systems engineering & software engineering to manage requirements for a client’s project during the design phase. This process includes 1) defining & understanding a client’s requirements through feasibility studies & elicitation techniques; 2) identifying & analyzing requirements for the project; 3) developing & verifying those requirements; and 4) documenting & maintaining requirements even after the project is launched.
A programming language is a set of commands and notations that provides instructions for computers to follow. If a programming language is like a recipe, the computer is like a chef who obeys the recipe. Programming languages can be used to help build websites & applications, create software programs, develop games & apps, and perform complex analytical tasks.
Python is an open-source programming language that is widely used in the fields of data analytics, business analytics, and data science. If you’re interested in big data, data mining, Machine Learning, and Artificial Intelligence, you will need to know Python. That’s why it’s an integral part of the curriculum in most analytics degree programs.
R is an open-source programming language that is used for statistical computing and data analysis. It’s one of the most commonly used languages in data mining. Commercial rivals for R include SAS, SPSS, and Stata.
Structured Query Language (SQL)
SQL is a domain-specific programming language used in relational database management. It can be used to store, extract, organize, manage, and manipulate data within databases.
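The store / extract / manipulate verbs in the definition map directly onto SQL statements. A quick sketch via Python's sqlite3 module, with an invented products table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price REAL)")      # define
cur.execute("INSERT INTO products VALUES ('A100', 9.99), ('B200', 24.50)")   # store
cur.execute("UPDATE products SET price = 19.99 WHERE sku = 'B200'")          # manipulate
row = cur.execute(
    "SELECT price FROM products WHERE sku = 'B200'").fetchone()              # extract
print(row)  # (19.99,)
con.close()
```

The same CREATE / INSERT / UPDATE / SELECT statements work, with minor dialect differences, across MySQL, Oracle, Microsoft SQL Server, and other relational databases.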
Popular Analytics Software
Created by the SAS Institute, SAS is a statistical software suite that can be used for advanced analytics, business intelligence, data management, and predictive analytics tasks. It can access & retrieve large quantities of data, manage & manipulate the data, perform statistical analyses on the data, build predictive models, and generate reports.
Big Data Tools
Apache Hadoop is an open-source software framework and parallel data processing engine. It distributes its processing tasks across a Hadoop cluster—a collection of computers or “nodes” that are networked together and able to perform parallel computations. Through these clusters, Hadoop can store and analyze large amounts of structured, semi-structured & unstructured data sets.
Apache Spark is an open-source data processing framework / analytics engine that can quickly tackle processing tasks on extremely large data sets. It’s used for SQL processing, real-time processing, stream processing, and Machine Learning. Because it processes data in Random Access Memory (RAM), Spark can run certain workloads up to 100 times faster than Hadoop’s disk-based MapReduce.
In contrast to relational databases, which use relational tables, NoSQL databases store data in a variety of forms, including documents, graphs, and key-value stores. They’re particularly useful in storing unstructured data and capable of handling large, unrelated & rapidly changing data sources.
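A toy illustration of the document / key-value idea, using a plain Python dict as a stand-in for a NoSQL store: values are looked up by key, and records in the same store need not share a schema.

```python
# A toy key-value/document store: each "document" is free-form, no fixed schema
store = {}

store["user:1"] = {"name": "Ana", "tags": ["premium"]}
store["user:2"] = {"name": "Ben", "signup": "2023-10-02",
                   "cart": [{"sku": "A100", "qty": 2}]}
store["log:567"] = "2023-10-02T09:15 login from 10.0.0.8"  # even plain text fits

# Lookup is by key; documents in the same store can have different shapes
print(store["user:2"]["cart"][0]["sku"])
```

Real NoSQL systems (e.g. MongoDB for documents, Redis for key-value pairs) add persistence, indexing, and distribution on top of this basic schema-free model.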