Data Fundamentals

Uplifting our data capability

In today’s digital age, data has become one of the most valuable assets for organisations across all industries. It holds the key to understanding trends, making informed decisions, and driving innovation. However, to harness the full potential of data, it is essential to have a solid understanding of its fundamentals.

At the University of New England (UNE), we recognise the importance of data literacy and offer a comprehensive exploration of data fundamentals to equip our staff with the knowledge and skills needed to succeed in a data-driven world.

What is Data?

Data refers to the raw, unprocessed facts, figures, and observations that are collected from various sources. It can be structured, meaning it is organised into a predefined format such as the rows and columns of a spreadsheet or database table, or unstructured, such as free-form documents, emails, images, audio, and video. Understanding the different types and formats of data is crucial for effective data management and analysis.

Data Types

Understanding the Building Blocks of Information

In the world of data, understanding the different types of data is essential for effective analysis, storage, and interpretation. At UNE, we offer a comprehensive exploration of data types to equip our staff with a solid foundation in data management and analysis. Let’s delve into the fundamental data types that serve as the building blocks of information.

Numeric data represents numerical values and is often used for quantitative analysis. It can be further classified into two main types:

1. Integer: The integer data type represents whole numbers with no fractional part. It is commonly used for counting or labelling purposes. Examples include student IDs, years, or quantities of items.

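2. Floating-Point: The floating-point data type represents numbers with a fractional part, stored with limited binary precision. It is suitable for measurements and other continuous quantities. Examples include temperature readings, GPA scores, or distances. Monetary values are usually better stored in a fixed-point decimal type, because binary floating point cannot represent amounts such as 0.10 exactly and small rounding errors can accumulate in financial calculations.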
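The rounding issue with floating point is easy to demonstrate. The following is a minimal Python sketch (the amounts are illustrative) comparing a binary float with Python's standard `decimal` module.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
float_total = 0.1 + 0.2
print(float_total)          # 0.30000000000000004
print(float_total == 0.3)   # False

# Decimal stores base-10 digits exactly when built from strings,
# which is why it is a common choice for monetary amounts.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total)                     # 0.30
print(decimal_total == Decimal("0.30"))  # True
```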

Textual (string) data represents sequences of characters, such as letters, digits, symbols, and spaces. It is used to represent names, descriptions, addresses, and other textual information. Understanding how text is encoded (for example, as UTF-8) and formatted is crucial to ensure proper handling and analysis.
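To show why encoding matters, this short Python sketch (the name is an illustrative example) encodes text as UTF-8 and then decodes the same bytes with the wrong encoding, which produces garbled text.

```python
# "Zo\u00eb" is the name "Zoë", written with an escape so the exact character is unambiguous.
name = "Zo\u00eb"
utf8_bytes = name.encode("utf-8")

print(len(name))        # 3 characters
print(len(utf8_bytes))  # 4 bytes: "ë" needs two bytes in UTF-8
print(utf8_bytes.decode("utf-8") == name)  # True: same encoding, clean round trip

# Decoding with the wrong encoding silently produces garbled text ("mojibake").
print(utf8_bytes.decode("latin-1"))  # ZoÃ«
```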

Date and time data types are used to represent specific points in time or durations. They are essential for temporal analysis, scheduling, and time-based calculations. Date types capture calendar dates, time types capture a time of day, and combined date-time (timestamp) types capture both, ideally together with a time zone. Durations, such as the length of an assignment extension, are usually represented by a separate interval type.
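A minimal sketch using Python's standard `datetime` module shows each of these types side by side. The variable names and dates are hypothetical examples, not real UNE dates.

```python
from datetime import date, datetime, time, timedelta, timezone

census_date = date(2024, 3, 28)   # a calendar date (hypothetical)
lecture_start = time(9, 30)       # a time of day
submitted_at = datetime(2024, 3, 28, 17, 45, tzinfo=timezone.utc)  # date and time, with a time zone
extension = timedelta(days=7)     # a duration

# Adding a duration to a timestamp yields another timestamp.
new_due = submitted_at + extension
print(new_due.isoformat())                     # 2024-04-04T17:45:00+00:00

# Subtracting two dates yields a duration.
print((date(2024, 4, 4) - census_date).days)   # 7
```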

The Boolean data type represents logical values, typically expressed as “true” or “false”. It is used for logical comparisons, conditional operations, and yes/no flags, and it underpins the decision-making logic in queries, rules, and programs.

Categorical data represents discrete values that belong to specific categories or groups. It is often used to classify and organise data based on distinct characteristics or attributes. Categorical data can be further divided into two subtypes:

  • Nominal: Nominal data represents categories without any inherent order or hierarchy. Examples include gender, colours, or categories of products.
  • Ordinal: Ordinal data represents categories with a specific order or ranking. It captures information with relative values or levels. Examples include survey ratings, educational levels, or customer satisfaction levels.
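The practical difference between nominal and ordinal data is whether order means anything. In this Python sketch, the colour categories and the satisfaction scale are hypothetical examples. Nominal values are modelled as an `Enum` (equality only), and ordinal values as an ordered list so that sorting follows the scale rather than the alphabet.

```python
from enum import Enum

# Nominal: categories with no inherent order; only equality comparisons make sense.
class Colour(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

# Ordinal: categories with a meaningful order, kept as an ordered list (hypothetical scale).
SATISFACTION_LEVELS = ["Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied"]

def satisfaction_rank(level: str) -> int:
    """Return the position of a level on the ordinal scale (0 = lowest)."""
    return SATISFACTION_LEVELS.index(level)

responses = ["Satisfied", "Neutral", "Very satisfied", "Dissatisfied"]
# Sorting by rank respects the scale's order rather than alphabetical order.
ordered = sorted(responses, key=satisfaction_rank)
print(ordered)  # ['Dissatisfied', 'Neutral', 'Satisfied', 'Very satisfied']
print(Colour("green") is Colour.GREEN)  # True
```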

Spatial data represents geographic or spatial information, such as coordinates, boundaries, or maps. It is used to analyse and visualise data in a spatial context, enabling spatial relationships and patterns to be identified. Spatial data types are crucial for applications like geographic information systems (GIS), urban planning, or environmental analysis.
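A common first calculation on coordinate data is the distance between two points. This is a minimal Python sketch of the haversine formula, assuming a spherical Earth with a mean radius of 6371 km; GIS software uses more precise ellipsoidal models.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; this sketch treats the Earth as a sphere

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of latitude along a meridian is roughly 111 km.
print(round(haversine_km(0, 0, 1, 0), 1))  # 111.2
```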

Understanding the different data types is vital for data management, analysis, and interpretation. Each data type has its own characteristics, considerations, and appropriate analytical techniques.

Data Sources

Unleashing the Potential of Information

In the digital age, data has become a valuable resource that organisations rely on to drive innovation, make informed decisions, and gain a competitive edge. Understanding the various data sources is crucial for effectively harnessing the power of information. At UNE, we recognise the importance of data literacy and offer comprehensive education on data sources to equip our staff with the knowledge and skills necessary to navigate the vast landscape of data acquisition.

Internal Data Sources

Internal data sources refer to data generated and collected within an organisation. These sources include:

Transactional Systems

Operational systems like customer relationship management (CRM), enterprise resource planning (ERP), or point-of-sale (POS) systems capture transactional data related to sales, inventory, and student interactions.

Employee Records

Human resources systems store data related to employee profiles, performance evaluations, attendance, and payroll.

Operational Logs

Logs generated by IT systems, machinery, or equipment provide valuable information for monitoring performance, troubleshooting issues, and optimising operations.

Student Interactions

Data collected from student support systems, call centres, or online platforms can offer insights into student preferences, behaviour, and satisfaction.

External Data Sources

External data sources encompass information that is acquired from outside the organisation. These sources include:

Publicly Available Data

Government agencies, research institutions, and international organisations publish a wealth of data that can be utilised for various purposes, such as demographics, economic indicators, health statistics, or environmental data.

Social Media Feeds

Social media platforms generate vast amounts of data that provide real-time insights into customer sentiments, trends, and online interactions.

Third-Party Data Providers

Data vendors and data aggregators offer specialised datasets, such as market research data, industry benchmarks, or consumer behaviour data, which can enhance an organisation's understanding of its target audience or industry.

Open Data Initiatives

Many organisations and communities release datasets under open data initiatives, enabling public access to information and fostering collaboration and innovation.

Research Data

Research data is collected through scientific experiments, surveys, observations, or studies. Universities, research institutions, and academic journals are primary sources of research data. This data can be valuable for conducting studies, validating hypotheses, or advancing knowledge in specific fields.

Legacy Systems and Archives

Legacy systems and archives may hold valuable historical data that organisations can leverage for trend analysis, historical comparisons, or compliance purposes. These sources often require special considerations for data extraction and integration.

Sensor Data and Internet of Things (IoT)

The proliferation of sensors and IoT devices generates vast amounts of real-time data. These devices, embedded in various environments like smart cities, manufacturing processes, or environmental monitoring systems, capture data on temperature, humidity, location, energy consumption, and more. Sensor data provides insights into operations, facilitates predictive maintenance, and enables data-driven decision-making.

Syndicated Data

Syndicated data refers to data that is collected and shared by market research firms or industry-specific organisations. This data provides standardised information on market trends, consumer behaviour, product performance, or industry benchmarks, enabling organisations to make data-driven decisions and gain competitive advantages.

Data Structures

Organising Information for Efficiency and Accessibility

In the realm of data management and analysis, having a solid understanding of data structures is essential. Data structures serve as the foundation for organising and storing information in a way that enables efficient processing, retrieval, and manipulation.

Arrays

Arrays are one of the simplest and most fundamental data structures. They consist of a collection of elements of the same type, organised in a contiguous memory block. Arrays provide fast access to elements through indexing, making them ideal for situations where direct access to elements is required. They are widely used for tasks such as sorting, searching, and mathematical computations.
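Python's built-in list stores references rather than raw values, so this sketch uses the standard `array` module for a typed, contiguous array. It shows constant-time indexing and a binary search on a sorted array. The enrolment figures are hypothetical.

```python
from array import array
from bisect import bisect_left

# A typed, contiguous array of signed integers ("i" type code); values are hypothetical.
enrolments = array("i", [120, 45, 300, 88, 210])
print(enrolments[2])  # 300: direct access by index takes constant time

# Sorting once enables binary search, which needs only O(log n) comparisons per lookup.
sorted_enrolments = array("i", sorted(enrolments))

def contains(sorted_values: array, target: int) -> bool:
    """Binary search for target in an array sorted in ascending order."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target

print(contains(sorted_enrolments, 88))   # True
print(contains(sorted_enrolments, 100))  # False
```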

Linked Lists

Linked lists are dynamic data structures composed of nodes that contain both data and a reference to the next node. Unlike arrays, linked lists can insert or delete an element without shifting the others: once you hold a reference to the neighbouring node, the operation takes constant time, although reaching a given position still requires walking the list from the head. Linked lists are particularly useful when the size of the data set is unknown or constantly changing.
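A minimal singly linked list in Python makes the trade-off concrete. Inserting or removing after a known node just rewires one reference, while reading the whole list means following the chain from the head. The class and method names are illustrative, not from any library.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    value: int
    next: Optional["Node"] = None  # reference to the following node, or None at the end

class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def push_front(self, value: int) -> Node:
        """Insert at the head in O(1)."""
        self.head = Node(value, self.head)
        return self.head

    def insert_after(self, node: Node, value: int) -> Node:
        """Insert after a known node in O(1); no other elements move."""
        node.next = Node(value, node.next)
        return node.next

    def remove_after(self, node: Node) -> None:
        """Unlink the node that follows `node` in O(1)."""
        if node.next is not None:
            node.next = node.next.next

    def to_list(self) -> List[int]:
        """Walk the chain from the head, which takes O(n)."""
        values, current = [], self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values

items = LinkedList()
first = items.push_front(1)
items.insert_after(first, 3)
items.insert_after(first, 2)
print(items.to_list())  # [1, 2, 3]
items.remove_after(first)
print(items.to_list())  # [1, 3]
```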

Stacks

Stacks are a Last-In-First-Out (LIFO) data structure: the most recently added item is the first one removed. Elements can only be added to or removed from the top of the stack. Stacks are commonly used for managing function calls, expression evaluation, and undo/redo operations.
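Checking that brackets in an expression are correctly nested is a classic stack application. In this Python sketch, a plain list serves as the stack: `append()` pushes onto the top and `pop()` removes from the top.

```python
def is_balanced(expression: str) -> bool:
    """Return True if every bracket in the expression is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []  # holds opening brackets that have not been closed yet
    for ch in expression:
        if ch in "([{":
            stack.append(ch)  # push an opening bracket
        elif ch in pairs:
            # The closing bracket must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # any leftover openers were never closed

print(is_balanced("(a + b) * [c - {d / e}]"))  # True
print(is_balanced("(a + b]"))                  # False
```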

Queues

Queues are a First-In-First-Out (FIFO) data structure, where elements are added at the end and removed from the front. Queues are utilised in scenarios such as task scheduling, job processing, and breadth-first search algorithms.
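A simple job queue can be sketched in Python with `collections.deque`, which adds at the back and removes from the front in constant time (removing from the front of a plain list with `pop(0)` is O(n)). The job names are hypothetical.

```python
from collections import deque

jobs = deque()
jobs.append("generate enrolment report")  # enqueue at the back (hypothetical jobs)
jobs.append("refresh dashboard")
jobs.append("send summary email")

processed = []
while jobs:
    processed.append(jobs.popleft())  # dequeue from the front: first in, first out

print(processed)
# ['generate enrolment report', 'refresh dashboard', 'send summary email']
```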

Trees

Trees are hierarchical data structures composed of nodes connected by edges. A single root node sits at the top, and every other node has exactly one parent and may have child nodes of its own. Search trees, such as balanced binary search trees, keep their contents ordered and support efficient searching and sorted traversal. Trees are used in various applications, including file systems, database indexing, and decision-making processes.
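A binary search tree is one common kind of tree. This unbalanced Python sketch is illustrative only (production systems typically use self-balancing variants or B-trees). It shows insertion, a search that follows one branch per level, and an in-order traversal that returns keys in sorted order.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TreeNode:
    key: int
    left: Optional["TreeNode"] = None   # subtree of smaller keys
    right: Optional["TreeNode"] = None  # subtree of larger keys

def insert(root: Optional[TreeNode], key: int) -> TreeNode:
    """Insert a key: smaller keys go left, larger keys go right, duplicates are ignored."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root: Optional[TreeNode], key: int) -> bool:
    """Follow one branch per level, so a lookup costs O(height of the tree)."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root: Optional[TreeNode]) -> List[int]:
    """Left subtree, node, right subtree: yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(search(root, 60), search(root, 65))  # True False
print(in_order(root))  # [20, 30, 40, 50, 60, 70, 80]
```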

Graphs

Graphs are a collection of nodes connected by edges, where each edge represents a relationship or connection between nodes. Graphs are versatile data structures used in social networks, transportation systems, and network analysis. They enable the representation and analysis of complex relationships and dependencies.
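Graphs are often stored as adjacency lists, where each node maps to its neighbours. This Python sketch builds a small undirected graph of hypothetical campus locations and uses breadth-first search, which relies on a queue, to find the fewest edges from one node to every other reachable node.

```python
from collections import deque
from typing import Dict, List

# An undirected graph as an adjacency list (hypothetical locations).
graph = {
    "Library": ["Cafe", "Lab"],
    "Cafe": ["Library", "Gym"],
    "Lab": ["Library", "Gym"],
    "Gym": ["Cafe", "Lab", "Oval"],
    "Oval": ["Gym"],
}

def hops_from(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: the fewest edges from start to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:  # first visit is via a shortest path
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

print(hops_from(graph, "Library"))
# {'Library': 0, 'Cafe': 1, 'Lab': 1, 'Gym': 2, 'Oval': 3}
```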

Hash Tables

Hash tables, also known as hash maps, are data structures that use a hash function to map keys to corresponding values. Hash tables provide fast retrieval and insertion of key-value pairs, typically in constant time on average, making them efficient for tasks like data lookup and dictionary implementations.
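Python's built-in `dict` is already a hash table, so the class below is purely illustrative. It is a minimal sketch of separate chaining, where keys that hash to the same bucket share a small list. Unlike real implementations, it never resizes, so buckets would grow long if many keys were added. The unit codes are hypothetical.

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining (illustrative; no resizing)."""

    def __init__(self, capacity: int = 8) -> None:
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        # The hash function maps a key to one of the bucket indices.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the value for an existing key
                return
        bucket.append((key, value))       # colliding keys share the bucket list

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

table = ChainedHashTable()
table.put("ABC123", "Data Fundamentals")  # hypothetical unit codes
table.put("XYZ789", "Statistics I")
table.put("ABC123", "Data Fundamentals (revised)")
print(table.get("ABC123"))  # Data Fundamentals (revised)
print(table.get("NOPE"))    # None
```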

Heaps

A heap is a tree-based structure, most commonly a complete binary tree (a binary heap), that satisfies the heap property: each node's value is either greater than or equal to (max-heap) or less than or equal to (min-heap) the values of its children. Heaps are commonly used for priority queue implementations and sorting algorithms like heap sort.
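Python's standard `heapq` module maintains a binary min-heap inside an ordinary list. This sketch uses it as a priority queue (the task names are hypothetical) and as the basis of a simple heap sort. A max-heap can be simulated by negating the priorities.

```python
import heapq

# A min-heap of (priority, task) pairs: tasks[0] is always the smallest priority number.
tasks = []
heapq.heappush(tasks, (2, "update timetable"))
heapq.heappush(tasks, (1, "fix login outage"))
heapq.heappush(tasks, (3, "archive old logs"))

print(heapq.heappop(tasks))  # (1, 'fix login outage'): lowest number = highest priority

def heap_sort(values):
    """Heap sort: heapify in O(n), then pop the minimum n times, O(n log n) overall."""
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```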

Understanding and choosing the appropriate data structure is essential for optimising data storage, retrieval, and manipulation operations.

Data Transformation

Unleashing the Power of Data Manipulation

In the realm of data analysis and management, data transformation plays a critical role in unlocking the full potential of information. It involves converting, reformatting, and manipulating data to make it more suitable for analysis, integration, or presentation. At UNE, we recognise the importance of data transformation and of the techniques needed to harness the power of data manipulation.

Cleaning and Filtering

Data cleaning involves identifying and rectifying errors, inconsistencies, and missing values in the dataset. This step ensures data accuracy and reliability before further analysis. Filtering involves selectively removing or retaining specific data records or variables based on predefined criteria, allowing analysts to focus on relevant subsets of data.
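The following Python sketch illustrates cleaning and then filtering a handful of hypothetical student records. It assumes that dropping rows with missing or invalid scores is acceptable. In practice, imputing or correcting values may be preferable, depending on the analysis.

```python
# Hypothetical raw records with typical quality problems.
raw_records = [
    {"student_id": "1001", "name": " Alice Smith ", "score": "82"},
    {"student_id": "1002", "name": "bob jones", "score": ""},      # missing score
    {"student_id": "1001", "name": "Alice Smith", "score": "82"},  # duplicate record
    {"student_id": "1003", "name": "Chen Wei", "score": "abc"},    # invalid score
    {"student_id": "1004", "name": "Dana Lee", "score": "47"},
]

def clean(records):
    """Trim whitespace, standardise name case, convert scores, drop duplicates and unusable rows."""
    seen_ids = set()
    cleaned = []
    for row in records:
        student_id = row["student_id"].strip()
        if student_id in seen_ids:
            continue  # skip a duplicate of a record already kept
        try:
            score = int(row["score"])
        except ValueError:
            continue  # missing or non-numeric score: this sketch drops the row
        seen_ids.add(student_id)
        cleaned.append({"student_id": student_id, "name": row["name"].strip().title(), "score": score})
    return cleaned

cleaned = clean(raw_records)
passed = [r for r in cleaned if r["score"] >= 50]  # filtering on a predefined criterion
print([r["name"] for r in cleaned])  # ['Alice Smith', 'Dana Lee']
print([r["name"] for r in passed])   # ['Alice Smith']
```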

Data Integration and Aggregation

Data integration involves combining data from multiple sources into a single, unified dataset. It requires resolving inconsistencies, merging duplicate records, and aligning data formats to create a comprehensive view. Aggregation involves summarising and condensing data to a higher level, such as computing averages, totals, or other statistical measures for a particular group or time period.
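This Python sketch covers both steps on two small hypothetical sources that identify students in different formats. Integration aligns the key format and joins the sources. Aggregation then computes an average mark per unit.

```python
from collections import defaultdict

# Two hypothetical sources that share a student ID but format it differently.
enrolments = [  # e.g. from a student system
    {"id": "S001", "unit": "DATA101"},
    {"id": "S002", "unit": "DATA101"},
    {"id": "S003", "unit": "STAT102"},
]
results = [  # e.g. from a learning platform, with IDs in lower case
    {"student": "s001", "mark": 78},
    {"student": "s002", "mark": 64},
    {"student": "s003", "mark": 91},
]

# Integration: align the key format, then join the two sources on it.
marks_by_id = {r["student"].upper(): r["mark"] for r in results}
merged = [{**e, "mark": marks_by_id.get(e["id"])} for e in enrolments]

# Aggregation: average mark per unit (rows without a mark are skipped).
marks_by_unit = defaultdict(list)
for row in merged:
    if row["mark"] is not None:
        marks_by_unit[row["unit"]].append(row["mark"])
average_by_unit = {unit: sum(m) / len(m) for unit, m in marks_by_unit.items()}
print(average_by_unit)  # {'DATA101': 71.0, 'STAT102': 91.0}
```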

Data Encoding and Formatting

Data encoding involves converting data from one format or representation to another. This transformation ensures compatibility and consistency across different systems or applications. It includes tasks such as converting dates to a standardised format, encoding categorical variables, or transforming data into suitable units of measurement.
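Two of the tasks mentioned above are sketched here in Python. The first standardises dates recorded in several formats to ISO 8601 (`YYYY-MM-DD`). The second one-hot encodes a categorical column. The input values and list of formats are hypothetical. The sketch assumes day-first dates, as is usual in Australia, and an English/C locale for month abbreviations such as "Mar".

```python
from datetime import datetime

# Hypothetical dates recorded differently by different source systems.
raw_dates = ["28/03/2024", "2024-03-29", "30 Mar 2024"]
KNOWN_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"]  # assumed formats, day-first

def to_iso(text: str) -> str:
    """Try each known format and return the date in ISO 8601 form (YYYY-MM-DD)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")

print([to_iso(d) for d in raw_dates])  # ['2024-03-28', '2024-03-29', '2024-03-30']

# One-hot encoding: one categorical column becomes a 0/1 indicator per category.
study_modes = ["online", "on-campus", "online"]
categories = sorted(set(study_modes))  # ['on-campus', 'online']
one_hot = [[int(mode == c) for c in categories] for mode in study_modes]
print(one_hot)  # [[0, 1], [1, 0], [0, 1]]
```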

Data Normalisation

Data normalisation is the process of organising and structuring data in a standardised form to minimise redundancy and dependency issues. It ensures that data is stored efficiently and prevents data anomalies. Normalisation techniques include breaking data into separate tables, defining relationships, and establishing primary and foreign keys.
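Python's built-in `sqlite3` module provides a self-contained way to illustrate these ideas. In this sketch, the table and column names are hypothetical. Each unit title is stored once and referenced through a foreign key, so it is not repeated on every enrolment row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled

# Each unit's title lives in one place; enrolments refer to it by key.
conn.executescript("""
CREATE TABLE units (
    unit_code TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrolments (
    student_id TEXT NOT NULL,
    unit_code  TEXT NOT NULL REFERENCES units(unit_code),
    PRIMARY KEY (student_id, unit_code)
);
""")
conn.execute("INSERT INTO units VALUES ('DATA101', 'Data Fundamentals')")
conn.executemany("INSERT INTO enrolments VALUES (?, ?)", [("S001", "DATA101"), ("S002", "DATA101")])

# Renaming a unit is a single-row update, so enrolment rows cannot disagree about its title.
conn.execute("UPDATE units SET title = 'Foundations of Data' WHERE unit_code = 'DATA101'")
rows = conn.execute(
    "SELECT e.student_id, u.title FROM enrolments AS e "
    "JOIN units AS u ON u.unit_code = e.unit_code ORDER BY e.student_id"
).fetchall()
print(rows)  # [('S001', 'Foundations of Data'), ('S002', 'Foundations of Data')]

# The foreign key rejects an enrolment in a unit that does not exist.
try:
    conn.execute("INSERT INTO enrolments VALUES ('S003', 'NOPE999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```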

Data Derivation and Calculation

Data derivation involves creating new variables or metrics derived from existing data. It enables analysts to compute additional insights or perform complex calculations that aid decision-making. Derived variables can include ratios, percentages, growth rates, or any other transformations that provide meaningful information.
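A derived growth-rate metric is a typical example. This Python sketch computes year-on-year growth as (current − previous) / previous, expressed as a percentage, from hypothetical yearly counts.

```python
# Hypothetical yearly enrolment counts.
enrolments_by_year = {2021: 1200, 2022: 1320, 2023: 1254}

def growth_rates(series: dict) -> dict:
    """Year-on-year growth in percent: 100 * (current - previous) / previous."""
    years = sorted(series)
    return {
        year: round(100 * (series[year] - series[prev]) / series[prev], 1)
        for prev, year in zip(years, years[1:])
    }

print(growth_rates(enrolments_by_year))  # {2022: 10.0, 2023: -5.0}
```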

Data Discretisation and Binning

Data discretisation involves grouping continuous data into discrete intervals or categories. It simplifies complex data distributions and facilitates analysis by reducing the number of unique values. Binning refers to the process of assigning data points to predefined intervals, enabling analysts to identify patterns, trends, or outlier values more easily.
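Binning can be expressed as a list of cut points plus a label for each interval. This Python sketch uses `bisect_right` to place each mark into a band. The cut points and labels are hypothetical and illustrative, not UNE's official grading scheme.

```python
from bisect import bisect_right

# Hypothetical bands: below 50 is Fail, 50 up to (not including) 65 is Pass, and so on.
CUT_POINTS = [50, 65, 75, 85]
LABELS = ["Fail", "Pass", "Credit", "Distinction", "High Distinction"]

def grade_band(mark: float) -> str:
    """Assign a continuous mark to a discrete bin using the cut points."""
    return LABELS[bisect_right(CUT_POINTS, mark)]

marks = [42, 50, 64.5, 75, 93]
print([grade_band(m) for m in marks])
# ['Fail', 'Pass', 'Pass', 'Distinction', 'High Distinction']
```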

Data Transformation for Machine Learning

In the context of machine learning, data transformation is crucial for preparing data to be suitable for training predictive models. It involves tasks such as feature scaling, handling missing values, one-hot encoding categorical variables, and handling skewness or outliers to ensure optimal model performance.
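Here is a minimal Python sketch of two of these steps on a hypothetical numeric feature: mean imputation for a missing value, followed by standardisation (z-scores). In a real pipeline, the fill value, mean, and standard deviation would be computed on the training data only and then reused for new data, to avoid leaking information from the test set.

```python
import statistics

# A hypothetical numeric feature with one missing value (None).
hours_studied = [4.0, None, 10.0, 6.0, 5.0]

# 1. Handle missing values: replace None with the mean of the observed values.
observed = [x for x in hours_studied if x is not None]
fill_value = statistics.mean(observed)  # 6.25
imputed = [fill_value if x is None else x for x in hours_studied]

# 2. Feature scaling: standardise to mean 0 and (population) standard deviation 1.
mu = statistics.mean(imputed)
sigma = statistics.pstdev(imputed)
scaled = [(x - mu) / sigma for x in imputed]

print(imputed)  # [4.0, 6.25, 10.0, 6.0, 5.0]
print([round(z, 2) for z in scaled])
```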

Data Visualisation and Presentation

Data transformation also plays a role in presenting data in a visually appealing and understandable format. Transforming raw data into charts, graphs, or interactive visualisations enables analysts to communicate insights effectively and facilitates decision-making at various levels within an organisation.

Mastering the art of data transformation empowers analysts to unlock the full potential of data, uncover hidden insights, and make informed decisions.

Data Analysis

Unveiling Insights for Informed Decision-Making

In today’s data-driven world, organisations rely on data analysis to derive meaningful insights, make informed decisions, and gain a competitive edge.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis involves examining and visualising data to gain a deeper understanding of its characteristics, patterns, and relationships. It includes tasks such as data profiling, summary statistics, data visualisation, and identification of outliers or missing values. EDA helps analysts generate hypotheses, discover trends, and identify potential relationships before proceeding to more advanced analyses.
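A first profiling pass often counts missing values, checks the range, and flags unusual points. This Python sketch applies these steps to a hypothetical column. It uses the common 1.5 × IQR rule to flag possible outliers. Flagged points are candidates for investigation, not automatically errors.

```python
import statistics

# A hypothetical column of daily logins, with one missing value and a suspicious spike.
daily_logins = [310, 295, 305, None, 320, 298, 1500, 312, 301]

values = [v for v in daily_logins if v is not None]
missing = len(daily_logins) - len(values)

# Interquartile-range (IQR) rule: flag points far outside the middle 50% of the data.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(f"n={len(values)}, missing={missing}, min={min(values)}, max={max(values)}")
print("possible outliers:", outliers)  # possible outliers: [1500]
```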

Descriptive Statistics

Descriptive statistics provides a summary of the main features of a dataset, such as measures of central tendency (mean, median, mode), dispersion (range, standard deviation), and distribution (skewness, kurtosis). Descriptive statistics help analysts understand the basic characteristics and properties of the data, enabling them to communicate its key features effectively.
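Python's standard `statistics` module covers the common measures of central tendency and dispersion. It does not include skewness or kurtosis, which typically come from libraries such as SciPy. The marks below are hypothetical.

```python
import statistics

# Hypothetical assessment marks.
marks = [55, 62, 68, 68, 71, 74, 80, 93]

summary = {
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    "range": max(marks) - min(marks),
    "std_dev": statistics.stdev(marks),  # sample standard deviation (n - 1 denominator)
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```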

Inferential Statistics

Inferential statistics allows analysts to draw conclusions and make predictions about a population based on a sample of data. It involves hypothesis testing, confidence intervals, and regression analysis. Inferential statistics helps analysts make inferences and generalise findings from a sample to a larger population, enabling data-driven decision-making.
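A confidence interval for a population mean is a typical inferential calculation. This Python sketch uses a hypothetical sample of 30 values and a normal approximation (z ≈ 1.96 for 95% confidence). For small samples, a t-distribution critical value, for example from `scipy.stats.t.ppf`, gives a slightly wider and more appropriate interval.

```python
import math
import statistics

# A hypothetical random sample of weekly study hours from a larger student population.
sample = [6.5, 8.0, 5.5, 9.0, 7.5, 6.0, 8.5, 7.0, 10.0, 6.5,
          7.5, 8.0, 5.0, 9.5, 7.0, 6.5, 8.0, 7.5, 6.0, 9.0,
          7.0, 8.5, 6.5, 7.5, 8.0, 5.5, 9.0, 7.0, 6.5, 8.0]

n = len(sample)
mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Normal approximation: the 97.5th percentile of the standard normal is about 1.96.
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = mean - z * std_error, mean + z * std_error
print(f"mean = {mean:.2f} hours, 95% CI approx. ({lower:.2f}, {upper:.2f})")
```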

Data Mining and Machine Learning

Data mining and machine learning techniques involve the use of algorithms and statistical models to discover patterns, relationships, or predictive insights within the data. These techniques include clustering, classification, regression, and association rule mining. By applying advanced analytical methods, analysts can uncover hidden patterns, make predictions, and automate decision-making processes.
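Clustering is one of the techniques listed above. This Python sketch implements basic k-means (Lloyd's algorithm) on one-dimensional hypothetical session lengths. It assigns each point to its nearest centroid, then moves each centroid to the mean of its cluster. Real work would normally use a library implementation such as scikit-learn's `KMeans`.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Cluster 1-D values into k groups with Lloyd's algorithm (basic k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return sorted(centroids)

# Hypothetical minutes spent online per session, showing two usage patterns.
session_minutes = [5, 7, 6, 8, 4, 45, 50, 48, 52, 47]
print(kmeans_1d(session_minutes, k=2))  # [6.0, 48.4]
```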

Time Series Analysis

Time series analysis focuses on data collected over time, aiming to uncover patterns, trends, and seasonality. It involves techniques such as forecasting, trend analysis, and decomposition. Time series analysis is particularly useful for predicting future values, understanding temporal dependencies, and making data-driven decisions based on historical patterns.
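A trailing moving average is one of the simplest trend-analysis tools. This Python sketch smooths hypothetical monthly ticket counts and uses the latest smoothed value as a deliberately naive forecast. Handling seasonality or decomposing a series requires more specialised methods.

```python
# Hypothetical monthly help-desk ticket counts.
tickets = [120, 135, 128, 150, 162, 158, 171, 180]

def moving_average(series, window):
    """Trailing moving average: each value is the mean of the last `window` observations."""
    return [sum(series[i - window + 1 : i + 1]) / window for i in range(window - 1, len(series))]

trend = moving_average(tickets, window=3)
print([round(x, 1) for x in trend])
# [127.7, 137.7, 146.7, 156.7, 163.7, 169.7]

# A naive forecast for next month: the latest moving-average value.
print(round(trend[-1], 1))  # 169.7
```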

Statistical Modelling

Statistical modelling encompasses the application of statistical techniques to build mathematical models that describe and explain relationships within the data. It involves linear regression, logistic regression, time series models, and more. Statistical modelling helps analysts understand the relationships between variables, perform hypothesis testing, and make predictions based on the model's parameters.
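Simple linear regression is the most basic of these models. This Python sketch fits y = intercept + slope × x by ordinary least squares on hypothetical study-hours data, then uses the fitted parameters to make a prediction. In Python 3.10 and later, `statistics.linear_regression` performs the same fit.

```python
import statistics

# Hypothetical data: hours of study (x) and exam mark (y).
hours = [2, 4, 5, 7, 9]
marks = [52, 60, 63, 71, 79]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(hours, marks)
print(f"mark = {intercept:.2f} + {slope:.2f} * hours")
print(round(intercept + slope * 6, 1))  # 67.3, the predicted mark for 6 hours of study
```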

Data Visualisation

Data visualisation is a powerful tool for presenting data in a visual format, enabling analysts to communicate complex information effectively. It involves the use of charts, graphs, maps, and interactive visualisations to represent patterns, trends, and relationships within the data. Data visualisation enhances data comprehension, facilitates storytelling, and aids in decision-making.

Big Data Analytics

With the exponential growth of data, big data analytics focuses on extracting insights from large and complex datasets. It involves techniques such as distributed computing, parallel processing, and scalable algorithms to process and analyse massive volumes of data. Big data analytics enables organisations to uncover valuable insights and make data-driven decisions in real time.
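Many big data tools follow a map-reduce pattern: split the data into chunks, process each chunk independently, then merge the partial results. This small-scale Python sketch uses hypothetical log lines. It uses a thread pool only to show the structure. In CPython, threads do not speed up CPU-bound work, and frameworks such as Apache Spark or Hadoop run the same pattern across many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical log lines; at real scale these would be files spread across many machines.
log_lines = [
    "login ok", "login failed", "download ok",
    "login ok", "upload failed", "login ok",
]

def map_chunk(lines):
    """Map step: count event types in one chunk, independently of the others."""
    return Counter(line.split()[0] for line in lines)

def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]

# Run the map step over chunks concurrently, then reduce by merging the partial counts.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_chunk, chunks(log_lines, 2)))
totals = reduce(lambda a, b: a + b, partial_counts, Counter())
print(totals.most_common())  # [('login', 4), ('download', 1), ('upload', 1)]
```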

By mastering the art of data analysis, individuals can unlock the true potential of data and drive informed decision-making.

More Data Essentials

Data Governance
Data Governance serves as the robust foundation upon which data’s value is realised, and ensures that it is regarded as a strategic asset and managed responsibly throughout its entire lifecycle.
Data Analytics
Let’s unravel the transformative potential of data analytics in driving innovation, improving efficiency, and shaping a more data-savvy UNE. Data Analytics involves the use of advanced techniques and tools to examine, interpret, and visualise data in meaningful ways.
Data Literacy Program
Data Literacy is about developing a critical and inquisitive mindset, fostering curiosity, and being able to discern between credible data sources and misleading information. Let’s embark on a journey to improve UNE’s data literacy skills.
Data Strategy
At UNE, our Data Strategy is a comprehensive roadmap that outlines how we will mobilise the power of data to make informed decisions, drive innovation, and maximise the impact of our initiatives across the university community.
Get In Touch
Let’s embark on this data-driven journey together! If you have any questions or need further information, please reach out to the DII Project team at DIIP@une.edu.au