Recovery of Resistor Networks

One driving focus in network theory is to understand the inner workings of a network—be it an ecological network, a computer network, or the network of neurons in a human brain—given only incomplete data about the way matter or information flows from one part of the network to another. 

For example, a resistor network is a simple mathematical structure (a graph whose edges are weighted by positive real numbers) that models the flow of current through a network of idealized resistors. To what extent can the individual resistances (the edge weights) in a resistor network be recovered if one knows only some of the effective resistances between certain pairs of nodes? Computing these effective resistances is a simple problem in linear algebra, related to computing the effective transition matrix for a random walk on a graph, but the resulting effective resistances depend on the original edge weights in a highly nonlinear fashion.
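
As a concrete aside (a minimal sketch of the linear algebra, not part of the project itself): one standard way to compute an effective resistance is via the Moore-Penrose pseudoinverse of the network’s weighted Laplacian, built from the conductances (reciprocal resistances). The tiny three-node network below, with made-up resistances, illustrates the computation in R.

```r
# Effective resistance between nodes i and j, via the pseudoinverse of the
# weighted graph Laplacian.  `conductance` is a symmetric matrix whose (a, b)
# entry is 1/resistance of the edge between nodes a and b (0 if no edge).
library(MASS)  # for ginv(), the Moore-Penrose pseudoinverse

effective_resistance <- function(conductance, i, j) {
  L <- diag(rowSums(conductance)) - conductance   # weighted Laplacian
  Lplus <- ginv(L)
  e <- rep(0, nrow(L)); e[i] <- 1; e[j] <- -1
  drop(t(e) %*% Lplus %*% e)
}

# Toy network: nodes 1-2-3 in a path, joined by a 1-ohm and a 2-ohm resistor.
C <- matrix(0, 3, 3)
C[1, 2] <- C[2, 1] <- 1      # conductance of the 1-ohm edge
C[2, 3] <- C[3, 2] <- 1 / 2  # conductance of the 2-ohm edge
effective_resistance(C, 1, 3)  # resistors in series add: returns 3
```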

Nevertheless, it is a theorem that if the network is embedded in a disc (a circular planar network), then every resistance can be recovered from the effective resistances between the nodes on the boundary circle if and only if the network’s associated “medial graph” has no “lenses.” But even if there are lenses in a circular planar network’s medial graph, some edges’ resistances may still be recoverable from the boundary data, and we should be able to give a range of possible values for the others. One goal of this project might be to prove the conjecture that, if we allow resistance 0 to represent contracting an edge of the network, and resistance ∞ to represent deleting that edge, then the possible resistances for any edge in a circular planar network always form a closed subinterval of [0, ∞], and to give an algorithm for computing that subinterval.

Quadratics with newly reducible third iterate:  how rare is the golden ratio?

Description can be found here.

Weighted Voting

Suppose I own 3/7 of the stock in a particular company and each of two other investors owns 2/7. How are we to make decisions? The first thing that occurs to most folks is weighted voting: I get three votes and each of the others gets two. If you think about it for just a minute, this solution is less than satisfactory. Any coalition of two people can pass or defeat any measure; that is, I don’t have any more influence, or power, than either of the others. Perhaps it would help if we insisted that five votes are needed to pass or defeat a measure? A minute or two of thought should convince you that I now have entirely too much power (e.g., the other two, with 4/7 of the ownership, cannot pass anything without my approval). In a way, there really is no good solution to this problem. There are fundamental problems with weighted voting; nevertheless, people persist in using it. We will analyze some of the ways that mathematicians and political scientists have proposed to measure voting power. We will discover that all known measures of this kind suffer from flaws. We will ask ourselves what, if anything, can be salvaged from this distressing state of affairs.
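
One measure of the kind we will study is the Banzhaf power index, which credits a voter for every winning coalition that would fail without that voter. The brute-force R sketch below is my own illustration (not part of the project description); it works through the 3/7-and-2/7 example under both quotas.

```r
# Banzhaf power index by brute force: a voter gets one "swing" for each winning
# coalition in which that voter is critical (removing the voter makes it lose).
banzhaf <- function(weights, quota) {
  n <- length(weights)
  swings <- rep(0, n)
  for (size in 1:n) {
    combos <- combn(n, size)
    for (k in seq_len(ncol(combos))) {
      coalition <- combos[, k]
      total <- sum(weights[coalition])
      if (total >= quota) {
        critical <- coalition[total - weights[coalition] < quota]
        swings[critical] <- swings[critical] + 1
      }
    }
  }
  swings / sum(swings)  # normalize so the powers sum to 1
}

banzhaf(c(3, 2, 2), quota = 4)  # 1/3, 1/3, 1/3: my three votes buy no extra power
banzhaf(c(3, 2, 2), quota = 5)  # 0.6, 0.2, 0.2: now I dominate the other two
```

With a simple majority (quota 4) all three of us come out with power 1/3, and with quota 5 I get three times the power of either of the others, exactly the imbalance described above.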

Directed Reading in Algebraic Geometry, Homological Algebra, or Another Algebraic Subject:  You Can Help Pick the Topic!

This is an opportunity for two people to delve deeply into material beyond the standard undergraduate curriculum, and to get substantial (weekly) experience in presenting that material at the blackboard.  The only prerequisite is Math 342; although further experience in abstract algebra and related subjects might be helpful, I don’t expect that anything will come up that has been taught in Math 352 recently (in particular, we’ll have occasion to use neither Galois theory nor representation theory).

In algebraic geometry, techniques from abstract algebra are used to describe and investigate “varieties”: sets that are defined by polynomial equations (in several variables).  The interplay between algebraic and geometric points of view makes this a very rich topic that has expanded enormously in the last hundred years or so; while it is traditionally considered “pure” mathematics, the more computational parts have applications in such areas as robotics and computer-aided design. The first major goal of the reading would almost certainly be Hilbert’s “Nullstellensatz” (theorem of the zeros), which establishes a correspondence between varieties and certain ideals in polynomial rings. After that there is a variety (pun intended) of possibilities, depending on the background, interests, and tolerance for abstraction of the participants.
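
To give a flavor of where the reading is headed, here is one standard way to state the strong form of the Nullstellensatz (my notation, not tied to any particular textbook we might use), over an algebraically closed field k:

```latex
% Strong Nullstellensatz: for any ideal J of the polynomial ring k[x_1, ..., x_n]
% over an algebraically closed field k,
\[
  I\bigl(V(J)\bigr) \;=\; \sqrt{J}
  \;=\; \{\, f \in k[x_1,\dots,x_n] : f^m \in J \text{ for some } m \ge 1 \,\},
\]
% where V(J) is the set of common zeros of J and I(S) is the ideal of all
% polynomials vanishing on a set S.  In particular, V and I give an
% inclusion-reversing bijection between varieties in k^n and radical ideals.
```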

Homological algebra grew out of investigations in algebraic topology and has important applications there, but it has become a separate, highly abstract area of mathematics.  The mathematical “language” of category theory, which it uses, is pervasive in much of modern mathematics.  (Some results which require only this general framework and thus pop up in all sorts of different contexts are affectionately known as “(general) abstract nonsense” – you can find a Wikipedia article with this title!)  The reading would probably cover substantial sections of MacLane’s classic book Homology.

Statistical Learning

Statistical learning focuses on harnessing the patterns present in past data to make predictions about future observations, and in our data-rich society we frequently encounter this class of problems. Your credit card company uses your financial information along with your past purchases to predict whether or not a purchase made with your card is fraudulent. Zillow combines past sale prices, mortgage records, prior tax assessments, physical features, and location to predict the market value of every house in the country. Political campaigns use demographics to segment (i.e. cluster) registered voters into more homogeneous groups and tailor a campaign strategy to reach each group. This comps will explore statistical learning techniques. We will consider both supervised learning techniques, where the outcome we wish to predict is known for the observed data, and unsupervised techniques, where the outcome is unknown. Our statistical toolkit will be expanded to include discriminant analysis, tree-based methods, support vector machines, principal component analysis, and clustering methods.
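
As a small taste of both modes, here is a hypothetical sketch in R on simulated data (using the rpart package for a classification tree; the actual datasets and methods in the comps may well differ):

```r
library(rpart)  # classification and regression trees

set.seed(1)
n  <- 400
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 0, "yes", "no"))
dat   <- data.frame(y, x1, x2)
train <- sample(n, 300)

# Supervised: the outcome y is observed, so we can fit a classifier and test it.
tree <- rpart(y ~ x1 + x2, data = dat[train, ], method = "class")
pred <- predict(tree, newdata = dat[-train, ], type = "class")
mean(pred == dat$y[-train])  # test-set accuracy

# Unsupervised: ignore y entirely; reduce dimension, then cluster.
pcs <- prcomp(dat[, c("x1", "x2")], scale. = TRUE)$x  # principal component scores
km  <- kmeans(pcs, centers = 2, nstart = 20)
table(km$cluster)  # cluster sizes
```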

Multiple applications will be explored during this comps. One is to apply supervised classification techniques to distributional assessment: we will attempt to automate the detection of a plot of observed data (e.g. a Q-Q plot) hidden in a field of plots generated under a hypothesized distribution (e.g. the normal distribution), and compare the accuracy of this approach to standard distributional assessments as well as to human observation.
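
The “field of plots” idea can be sketched in a few lines of R (toy data only; in the project the detection step would be automated by a classifier rather than left to the reader’s eye):

```r
# A lineup of Q-Q plots: one panel shows the observed data, the other eight show
# data simulated from the hypothesized (normal) distribution.
set.seed(2)
observed <- rexp(100)   # made-up "observed" data, deliberately non-normal
slot <- sample(9, 1)    # hide the observed data in a random panel

par(mfrow = c(3, 3))
for (k in 1:9) {
  x <- if (k == slot) observed else rnorm(100)
  qqnorm(x, main = paste("panel", k)); qqline(x)
}
slot  # which panel held the observed data?
```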

Another look at the Shadows of the Cantor Set

Description can be found here.

Statistics of Climate Extremes

The potential impacts of climate change can strongly depend on the changing characteristics of extreme weather events (heatwaves, floods, etc.). In part for this reason, many people in the climate community are interested in characterizing how extreme events are projected to change in the future. Extreme events are by definition rare, so projected changes in extreme events are currently less well understood than are changes in mean climate.

This comps will be an introduction to the statistical methods used to analyze and model extreme values, with applications to the Earth’s climate. The perspective required for modeling extreme events is somewhat different from the one behind the methods you have learned for modeling mean behavior (as in regression analysis). We will mainly focus on methods using the generalized extreme value (GEV) distribution, whose role in extreme value analysis is similar to the normal distribution’s role in modeling means. The emphasis will be on data analysis and uncertainty quantification using the GEV distribution, and on model diagnostics.
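
For reference, the GEV distribution function has a location parameter μ, a scale parameter σ > 0, and a shape parameter ξ:

```latex
% Generalized extreme value (GEV) distribution function:
\[
  G(z) \;=\; \exp\!\left\{ -\Bigl[\, 1 + \xi \,\frac{z - \mu}{\sigma} \Bigr]^{-1/\xi} \right\},
  \qquad \text{defined for } z \text{ with } 1 + \xi \,\frac{z - \mu}{\sigma} > 0,
\]
% with the shape parameter xi controlling the tail; the limit xi -> 0 gives the
% Gumbel case G(z) = exp{ -exp( -(z - mu)/sigma ) }.
```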

The data we will analyze will be from a large, initial condition ensemble of climate model simulations. A climate model is a complex computer model simulating the Earth’s atmosphere and oceans under a prescribed climate scenario. An “initial condition ensemble” is essentially a whole bunch of independent simulations from one such model, used to study the influence of internal climate variability on projections. We’ll make use of the data both to understand how the statistical theory works in practice and also to investigate projections of changes in the behavior of climate extremes.

Symbolic Dynamics and Tilings

Description can be found here.

Sampling designs for highly clustered populations

The zebra mussel is a freshwater species of mussel that is native to areas in southern Russia. These mussels first arrived in North America in the 1980s, reaching the Great Lakes by hitching a ride on transoceanic ships. Once established in Lake Superior waters, zebra mussels were then transported to smaller Minnesota lakes by recreational boats. To effectively combat the spread of these invasive and destructive mussels in Minnesota lakes, scientists need to understand how their populations grow and spread over time once introduced into a previously uninfested lake. The goal of this project is to study how well various sampling methods do at quantifying mussel abundance and distribution in infested lakes. This project will begin with a review of design-based sampling theory. This theory differs from your typical 275 problem because our measured quantities are fixed values (not realizations from a random process) but who we sample is random. So randomness is induced by the choice of sampling design, not because, say, our variable is “normally distributed.” After reviewing some basic sampling designs (simple random sample without replacement, stratified sample, etc.), we will focus on designs like line transect sampling and adaptive sampling that are often used in natural resource applications. In particular, adaptive designs, which modify “who” is sampled during the collection of data, are often used when populations (like mussels) are clustered in small geographic regions. We will then consider how to apply these designs to sampling mussels in Minnesota lakes. Using realistic mussel density surfaces, we will run simulation studies in R to understand how effective competing designs are at estimating mussel abundance. Effectiveness will be measured by considering quantities like estimator SE, confidence interval coverage, and bias.
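
To fix ideas, here is a stripped-down version of the kind of simulation study we will run (written in R, with a made-up clustered population rather than a realistic mussel density surface):

```r
# Design-based evaluation of a simple random sample (SRS) without replacement.
# The quadrat counts y are FIXED; only which quadrats we visit is random.
set.seed(3)
N <- 1000
y <- rep(0, N)
hotspots <- sample(N, 50)
y[hotspots] <- rpois(50, lambda = 40)  # a few dense patches, mostly empty lake bottom
true_total <- sum(y)

n <- 100; reps <- 5000
est <- se <- numeric(reps)
for (r in 1:reps) {
  s <- sample(N, n)                    # SRS without replacement
  est[r] <- N * mean(y[s])             # expansion estimator of total abundance
  se[r]  <- sqrt(N^2 * (1 - n / N) * var(y[s]) / n)  # with finite population correction
}
covered <- mean(est - 1.96 * se <= true_total & true_total <= est + 1.96 * se)

mean(est) - true_total  # bias (near 0: the estimator is design-unbiased)
sd(est)                 # SE of the estimator across repeated samples
covered                 # 95% CI coverage; compare to the nominal 0.95
```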

Geometry and Gerrymandering

Every state is broken up into congressional districts, each of which elects a single member to the US House of Representatives.  As populations grow and shift, these districts must be redrawn to comply with state and federal law.  In most states this redistricting is performed by the state legislature.  Partisan legislators may choose to redraw district lines in strange ways in order to favor a particular political party, an act known as Gerrymandering.  Gerrymandering is generally considered to be a bad thing: it can reduce electoral competition, reduce voter turnout, and arguably lead to more extreme partisanship. For this comps project, we will look at the geometry of redistricting and examine ways in which mathematics can tell us the extent to which a congressional district might be Gerrymandered.  One such method is a compactness measure, meant to tell us how much the shape of a district might deviate from an ideal shape.  For example, one might measure how “uncircular” a district is (the Polsby-Popper ratio and the Reock ratio), how “unconvex” a district is (the convex hull ratio), or how difficult it is to fit straight lines inside a district (a measure called bizarreness, related to convexity).  We will use these measures to assess Gerrymandering of actual congressional districts, discuss the mathematical reasonableness of these measures, and invent some of our own measures to capture Gerrymandering.  We will also investigate how different algorithmic strategies for redistricting might affect elections.
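
Two of the measures mentioned above can be computed directly from a district’s boundary coordinates; the R sketch below uses a made-up L-shaped “district” purely for illustration.

```r
# Compactness measures for a polygonal district given its boundary coordinates.
shoelace_area <- function(x, y) {
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}
perimeter <- function(x, y) {
  sum(sqrt(diff(c(x, x[1]))^2 + diff(c(y, y[1]))^2))
}
# Polsby-Popper ratio: 4*pi*Area / Perimeter^2 (equals 1 for a circle).
polsby_popper <- function(x, y) 4 * pi * shoelace_area(x, y) / perimeter(x, y)^2
# Convex hull ratio: district area / area of its convex hull (equals 1 if convex).
convex_hull_ratio <- function(x, y) {
  h <- chull(x, y)  # indices of the convex hull vertices
  shoelace_area(x, y) / shoelace_area(x[h], y[h])
}

# A hypothetical, L-shaped "district" (in arbitrary map units).
px <- c(0, 10, 10, 1, 1, 0)
py <- c(0,  0,  1, 1, 10, 10)
polsby_popper(px, py)      # about 0.15: far from circular
convex_hull_ratio(px, py)  # about 0.32: far from convex
```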