Alright before we start, I’m gonna call the Conditional Gradient Algorithm as Frank-Wolfe Algorithm (FW), attributing to the authors of the first ever paper on these methods viz., Marguerite Frank and Philip Wolfe in 1956!  The simplex method (one of the top 10 algorithms of the 20th century, I’d say that simplex gets the first place for me) was introduced by Dantzig (I strongly encourage to read the history in the wiki page, it’s amazing!) in 1947 for solving linear optimization (LO) problems. A LO algorithm solves the following question: Let’s say we are given with the following two types of data viz., 1. A linear function $f$ in say $\mathbb{R}^n$; 2. Legal set: A (sub)set of points $C\subseteq \mathbb{R}^n$ that can described just using linear equations or inequalities. Now how can I pick the point in the legal set $C$ where $f$ achieves the minimum value (or maximum value, it doesn’t matter in this case only, in general this is not true).

# Bare bones of Matrix Completion – III (4 mins read)

This will  hopefully the last post in the matrix completion problems. But since the MBI inequality is widely used, I decided to squeeze in one more application of Community Detection problem in this as well (yay!) in the future post. So first the matrix completion. problem, we will see how a simple algorithm can yield exact results. To that end, since the problem as stated is ill posed, we will first make some assumptions on the matrix that we are completing, namely, the low rank assumption (we will discuss the case where the matrix we are completing is square).

# Bare bones of Matrix Completion – I (7 mins read)

This last three weeks I was in Utah for a summer math school or conference whichever you prefer that was organized by the Institute of Advanced Studies (IAS), Princeton, New Jersey. Basically we had three weeks of lectures, problem sessions, endless fun discussions day in day out with free food, free stay – the resort was amazing, we stayed at the Zermatt resort (if you’re using headphones you might wanna turn the volume down a bit). Anyway, the point of this post is lectures given by Roman Vershynin that I sat through on Random Matrices and applications (basically this is a scribe). Specifically, about the famous problem called as matrix completion. Later (hopefully soon) I’ll write about another topic that definitely got my attention called as Differential Privacy.

# Why are vector spaces so good

In the basics we saw that vector spaces (v.s.) are defined over a field. Field is the keyword. Recall that examples of field include real numbers ‘r’, complex numbers ‘c’, rational numbers ‘q’. Let’s take the last example of q. q consists of numbers of the form a/b where a and b are integers like -2, 3, 5, 10 etc.. Examples of numbers that are not rational include square root of 2, pi, e etc.. v.s. of dimension n are sets of n – tuple of numbers where each number in the tuple comes from the underlying field. The question is, why do we need a field, that is, why can’t we just define it over the set of integers ‘z’. Recall also that z shares all properties with q except that not all elements have a multiplicative inverse i.e., there is no number for 21 in z so that the product of both of these numbers equal 1 but if we ask for an element in q, then we have 1/21 * 21 = 1. But that’s ok because nothing is stopping me from defining n – tuples from z. For example let’s look at z^2 which consists of numbers of the form (a,b) where a and b are integers. Well, the world didn’t come to an end anyway, in fact, we can see that if we take two 2- tuples of this form and add them coordinate-wise, we get a number of the same form with different integers. Secondly, if we multiply (coordinate-wise) a 2-tuple with an integer we still end with a different 2-tuple where each coordinate is again an integer, so what’s the problem?

If you noticed, a significant of aspect this blog is digression all over so I thought it is only natural for me to write about it. It turns out that one of the main reasons why math is ‘hard’ for those who feel that it is hard is that they learn math the hard way which is why it is hard. That doesn’t obviously sound like a thing, but the truth is that it is as good as any other reason. Let’s take an example.

# Continuous time zones, yes please

Here we will see how the phenomenon of time zones be made very ‘easy’, that is, for non-frequent travelers it becomes a pain every time they have to figure out how long the travel duration is even though most airlines provide the information in the ticket. I personally have tried computing the duration of actual travel in my mind just using the local time of departure and local time of arrival. Why would someone do this? well, why not? it’s just a good mental exercise, the calculations are often interesting when there are more than 1 or 2 stops or worse if we miss the connecting flights. In the latter case we have no other option but to compute ourselves or well ask someone to compute it for us. So, what makes the timezones computations kinda hard or interesting or complex? From the mathematical point, it is because of the fact that the time zones are not continuous. Let me try to motivate what I mean. One reason why timezones can be confusing  is because that the places where times change are often arbitrary in the sense that they are chosen by a specific set of people, so not most people know the reason why India just has 1 timezone or why US has 6 timezones. Well the question here is not why we need multiple timezones but why those specific numbers. One might try to make a crude relation that the number of timezones for a country increase with the size of the country in terms of the area of the land. Well this might be a very bad assumption because the area even if we assume countries to be rectangle is length  times breadth (product of difference between extreme points of latitudes and longitudes ) but only breadth (longitudes) contributes to the timezones. A smarter option would be to just use longitudes, well that might be bad too, because of daylight savings (see the timezones of US when the daylight savings are on, it’s ridiculous!). Anyway, a mathematician would easily see that the way timezones work is completely arbitrary with the current setup. How do we fix this? Here we go, I’ll explain the process since I figured that, that is the best way to explain it a clean way.

# Some basics

OK, this post is supposed to be a gentle introduction to some of the words or terminologies used in math. What is a ‘set’? A set is a collection of distinct objects, eg: {apple, orange}, {football, basketball} etc.. As we can see, a set is written with ‘{‘, ‘}’ brackets, the examples are read as ‘a set of (or consisting of) apple and orange’ and so on. Individual entities of a set are called as ‘elements’ simply because they make up the set that we are interested in. As we can see, we can’t do much just with a set, well it’s not true cause lots of interesting objects are almost always defined using a set, but anyway what we really care about is ‘relations’ and ‘mappings’.

# Linearity – I

The word ‘linear’ is used in so many different contexts that it kind of loses its importance. After a lot of contemplation, it turns out that this word as simple as it may sound, has far reaching consequences. The point of this post is to exactly make this clear and precise (maybe not precise). There are lots of ways to describe linearity. Someone who has taken a calculus course in high school or college will say that things are nice when everything is linear, for example linear equations or straight lines have a constant slope, oops we need to define what ‘slope’ is. Let’s say we would like to buy sugar. The cost of a kilogram of sugar is \$10 and to simplify things, we pay a tax of \$5 how much ever the quantity is. If we want to buy 10 kilograms of sugar, we need \$105 i.e., 10 times 10 plus 5(meh).  A linear equation is a thing that can be written in this form i.e., c = ax + b where a and b can be any arbitrary numbers, x is the amount of sugar and c is the total cost to buy x kilograms of sugar. The number a is often called as the slope since it decides how steep the function, that is, if a is a big positive number then we can say that the buying sugar will be expensive and so on, well to be honest it also depends on b but let’s say that we always have \$b. We note at this point that this also allows a to be zero (now what does that mean in our example? I’ll leave that for you). But the situation is bad when a is zero, why? this requires some amount of reasoning and some machinery or terminology to address this issue.

# First blood drawn

This is due for a long time. I’ll be writing about stuff that I learn everyday and things I have learnt, mostly math and computational aspects of math. I’m not sure about the audience for my writing but we’ll see how it goes. I should mention that this will mostly be non-rigorous and should be fairly accessible for all but at times my nerd nature will be shown. This is because I often stumble how nice looking things show some weird nature and can easily become incomprehensible when beaten with rigorousness or when approached more formally. For those who’re still curious, the specific topics include applications of pure math (algebra and topology) in Machine Learning problems though not strictly. I’m also a big fan of numerical optimization and math programming, so you can expect that to be covered as well. I also intend to post some ideas and problems that I intend to think about seriously so I hope we all learn together.