Data & Statistics

Library resources focused on data, analysis, interpretation, and statistics. Discover open access data sources.

Data? Datum? Statistics?

According to SAGE Research Methods:

Research is used to find things out and produce evidence/results, with a focus on purposeful documentation, discovery, interpretation, and development of methods/systems for the advancement of knowledge. Rules ensure findings do not depend upon opinions; transparency and rigor are important. Research is used to reaffirm the results of previous work, solve new or existing problems, support existing theories, develop new theories, and expand on previous work. Research may replicate elements of prior projects, or the project as a whole, and may be used to test the validity of instruments, procedures, and experiments. [David Byrne, Project Planner]

More specifically, research methods are the "systematic tools used to find, collect, analyze, and interpret information."

In: The A-Z of Social Research

"Provides the tools whereby understanding is created. Emphasis is on the broad approach rather than just techniques for data gathering and analysis." Guidelines and conventions give the researcher a structure of enquiry and a set of rules of inference (drawing conclusions from evidence). A researcher uses methods to organize ideas and evidence, and at the same time has a language and a format for communicating results. Includes: conceptualization, theorizing, making abstractions, using techniques, assembling information, and analyzing information.


In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

"Data refer to pieces of information -- data is the plural form of datum. This distinction is most prominent in scientific or academic writing, whereas in other forms of writing, data can be singular or plural. Technology enhancements, particularly computer storage capacity, processor speed, and computer portability, have significantly increased the amount of data available to researchers."

In: The SAGE Encyclopedia of Communication Research Methods

". . .observations, evidence, information, or empirical materials that can be interpreted in numeric and nonnumerical forms. Data create the bridge between the content and method providing firsthand evidence or observation. Data cannot be utilized to both develop and confirm a theory because new data are necessary for each condition. Numeric research (quantitative) relies on the measurement or assignment of numbers to mark the characteristics of data. Quantitative data, or numerical evidence, are often collected through self-report or other-report questionnaires and surveys. However, these data can also be from structured interviews, observations, tests, and inventories. Qualitative research investigates interpretative experiences through research questions (not hypotheses). Qualitative data is nonnumerical evidence. These data are guided by the researchers’ intentions and participants’ understanding."


Data Set

In: Dictionary of Statistics & Methodology

"A collection of related data items, such as the answers given by respondents to all the questions on a survey."

Data Point

In: Dictionary of Statistics & Methodology

"An individual piece of data; a datum. Often, the point at which two values intersect on a graph."
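The two dictionary definitions above (a collection of related data items, and an individual piece of data) can be made concrete with a small sketch. The survey, its field names (`respondent`, `age`, `hours_online`), and all values below are hypothetical and are not drawn from the cited dictionary.

```python
# Hypothetical survey data set: each dict holds one respondent's
# answers to all the questions on the survey.
data_set = [
    {"respondent": 1, "age": 34, "hours_online": 2.5},
    {"respondent": 2, "age": 27, "hours_online": 4.0},
    {"respondent": 3, "age": 45, "hours_online": 1.0},
]

# A data point (datum): one individual piece of data in the set.
datum = data_set[1]["age"]

# On a graph, a data point is where two values intersect,
# here each respondent's (age, hours_online) pair.
points = [(row["age"], row["hours_online"]) for row in data_set]

print("Datum:", datum)
print("Graph points:", points)
```

In this sketch the whole list is the data set, while each single value or plotted pair is a data point.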

Data Mining

In: Encyclopedia of Measurement and Statistics

"The process of discovering useful patterns in very large databases. It uses methods from statistics, machine learning, and database management to restructure and analyze data to extract knowledge or information from the data. Data mining is also known as knowledge discovery in databases. Data mining differs from traditional statistical analysis in a number of ways, including amount and type of data used and the goals of the analysis."

Data Archives

In: The A-Z of Social Research

"At its simplest level, a data archive is a ‘library’ of datasets, which makes previously-collected data available for use by other researchers – secondary analysis. Hakim (1982) outlines three main functions of a data archive: The preservation and storage of data; The dissemination of data; The development of methods and procedures to stimulate the widest use of data."

Data Archive

In: The SAGE Encyclopedia of Qualitative Research Methods

"A data archive is a resource center that acquires, stores, and disseminates data for secondary analysis for research, learning, and teaching. The prime function of such archives is to ensure long-term preservation and future usability of the data they hold. Data archiving is a method of conserving expensive resources and ensuring that their research potential is fully exploited. Unless preserved and documented for further research, data that have often been collected at significant expense may later exist in only a small number of reports that analyze only a fraction of the research potential of the data. In the case of digital archives, within a short space of time the data files are likely to be lost or become obsolete as technology evolves."

Data Visualization Methods

In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

Data "is generated by means of measurements of various processes taking place in the physical world or created by computer applications such as simulations."

Data visualization "refers to the process of transforming data. . .to pictures. (1) to help users understand their data better and easier and (2) to let them discover unknown facts about the underlying phenomena from which data are derived."

Scalability = "Large amounts of data can be compactly presented, therefore easing the user’s burden of separating important aspects from details and also reducing the time and effort required to study a given process."

Simplicity = "Nontechnical users can be shielded from complex aspects related to data acquisition and processing, so they can focus on those high-level aspects that the data capture."

Communication = "Different types of users having different backgrounds are enabled to communicate and learn about a data-intensive problem based on the same (visual) medium."

Discovery = "The visual depiction of data enables finding complex patterns that one is not aware of and which are hard to find when using solely traditional data analysis methods."

Within SAGE Research Methods' Project Planner, David Byrne reviews many foundational definitions.


"All data have to be interpreted" so there are many tools and processes available.

"Quantitative data are numerical. The numbers are attached to cases and they describe variable attributes of the cases. We make these data by measuring. Micro-data refer to the original data which describe specific cases. Aggregate data refer to the averages computed from micro-data which describe some larger set to which the original cases belong. Cross-sectional data refer to data collected at one specific point in time. Longitudinal data are collected for the same cases at multiple points in time. Two basic strategies: Exploration -- look at the data to see what they are telling you. You examine the data for patterns, signs of association, and examples of differences. Explanation -- hypothesis or set of hypotheses, which you develop in advance of any engagement with the particular dataset on which you are going to test them."
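The distinctions Byrne draws among micro-data, aggregate data, cross-sectional data, and longitudinal data can be illustrated with a minimal sketch. The records, the field names (`case`, `year`, `score`), and the helper functions are assumptions made up for illustration; they do not come from the Project Planner.

```python
from statistics import mean

# Hypothetical micro-data: the original records describing specific cases.
micro_data = [
    {"case": "A", "year": 2021, "score": 70},
    {"case": "B", "year": 2021, "score": 80},
    {"case": "A", "year": 2022, "score": 74},
    {"case": "B", "year": 2022, "score": 86},
]


def aggregate_by_year(records):
    """Aggregate data: averages computed from micro-data for a larger set."""
    by_year = {}
    for record in records:
        by_year.setdefault(record["year"], []).append(record["score"])
    return {year: mean(scores) for year, scores in by_year.items()}


def cross_section(records, year):
    """Cross-sectional data: all cases at one specific point in time."""
    return [r for r in records if r["year"] == year]


def longitudinal(records, case):
    """Longitudinal data: the same case at multiple points in time."""
    matches = [r for r in records if r["case"] == case]
    return sorted(matches, key=lambda r: r["year"])


print(aggregate_by_year(micro_data))
print(cross_section(micro_data, 2021))
print(longitudinal(micro_data, "A"))
```

The same four records serve all three views; what changes is whether they are averaged, sliced by time point, or followed case by case.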

"Qualitative data refer to all the forms of data which are not quantitative. We cannot use mathematical techniques to describe and analyze this sort of data." Sources: documents of all kinds; transcripts/notes from interviews and focus groups; ethnographic observation notes; audio recordings; video recordings; images; visual information.

Secondary Data

In: The SAGE Encyclopedia of Communication Research Methods

Primary Data is "directly obtained from first-hand sources by means of questionnaire, observation, focus group, or in-depth interviews."

Secondary Data is "data collected by someone other than the user. Refers to data that have already been collected for some other purpose. Literature reviews account for many varieties of classification for secondary data, including those that seek to distinguish between raw data and compiled data."

Raw Data = "there has been little if any processing. Includes data from organizations’ databases, websites, or newspapers, among other sources."

Compiled Data = "there has been some form of selection or summarizing. Refers to government publications, books, journals, industry statistics, and reports, among other sources."

Surveys = a third type of secondary data, a category that falls between the other two types. "Examples include census data; continuous and regular surveys (e.g., government family spending, labor market trends, employee attitude surveys); and ad hoc surveys (i.e., those non-regular-basis surveys made by various organizations)."

How to use Mixed Methods Research?: Understanding the Basic Mixed Methods Designs
In: Mixed Methods Research: A Guide to the Field

"Mixed methods design. . .researchers mix quantitative and qualitative methods in specific ways to address a research purpose. Different rationales call for different ways of integrating quantitative and qualitative methods in a mixed methods study. . . underlying logic that reflects how quantitative and qualitative data are collected and analyzed within a specific mixed methods design."

Multimethod Research
In: Dictionary of Statistics & Methodology

"Another term for mixed-method research, that is, research that combines two or more methods of design, measurement, or analysis. Often a distinction is drawn between multi- and mixed-method research with mixed methods referring to joining methods that cross the quantitative-qualitative divide. Multimethod research is then used either to describe the joining of two or more quantitative methods or as a generic term for any combination of research methods. Whatever the label, using more than one method can be important for enabling the researcher to reduce biases likely to be associated with a single method."


Statistics

In: Encyclopedia of Evaluation

"Mathematical techniques used to describe and draw inferences from quantitative data. Statistics are commonly divided into two branches: descriptive and inferential. Descriptive statistics are used to describe, summarize, and represent more concisely a set of data. Inferential statistics involve procedures for drawing inferences that go beyond the data set: conventionally, inferences about a large group (i.e., a population) based on observations of a smaller sample." 
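The two branches named above can be contrasted in a short sketch. The sample values are hypothetical, and the interval uses a normal approximation chosen here for simplicity; it is an assumption of this example, not a method prescribed by the Encyclopedia of Evaluation. For a sample this small, a t-based interval would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements drawn from a larger population.
sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.2, 5.1]


def describe(values):
    """Descriptive statistics: summarize the data set itself."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def mean_confidence_interval(values, confidence=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Uses a normal approximation (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(values) / sqrt(len(values))
    center = mean(values)
    return (center - margin, center + margin)


print(describe(sample))
print(mean_confidence_interval(sample))
```

`describe` says something only about the eight observed values, while `mean_confidence_interval` goes beyond the data set to make a hedged statement about the population the sample came from.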


Sampling

In: Encyclopedia of Evaluation

"The process of selecting units for study that will be representative of a population so that one can make generalizations about that population. A key distinction in sampling is between the theoretical population and the accessible population. Samples can be selected randomly (probability), purposively, or accidentally (convenience). How sampling is done depends on the purpose of the evaluation."
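The three ways of selecting a sample mentioned above (random, purposive, convenience) can be sketched as follows. The accessible population, the `role` field, and the function names are hypothetical and exist only for illustration.

```python
import random

# Hypothetical accessible population: 40 units, every fourth one a teacher.
accessible_population = [
    {"id": i, "role": "teacher" if i % 4 == 0 else "student"}
    for i in range(1, 41)
]


def probability_sample(population, k, seed=0):
    """Random (probability) sampling: every unit has an equal chance."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, k)


def purposive_sample(population, criterion):
    """Purposive sampling: keep only units chosen for a stated reason."""
    return [unit for unit in population if criterion(unit)]


def convenience_sample(population, k):
    """Accidental (convenience) sampling: take whichever units come first."""
    return population[:k]


print(probability_sample(accessible_population, 5))
print(purposive_sample(accessible_population, lambda u: u["role"] == "teacher"))
print(convenience_sample(accessible_population, 3))
```

Only the probability sample supports the usual statistical generalizations to the population; the other two trade that property for relevance or ease of access, which is why the choice depends on the purpose of the evaluation.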