I don’t think about Earth often. I don’t think about how it works, or why the sky is blue. However, when I do think about Earth, I must face the fact that I simply know very little about it! Recently, I started a project that forces me to know quite a bit about Earth. I’m quite voracious in my thirst for knowledge, but for this project I started from scratch. I still haven’t finished the project, but while I’ve been working on it, a tangential concern struck me. I spent over a month reading research papers and attempting to understand how to model the sun’s motion and the effect of its radiation on the planet. I could have spent one week if I had known where to find the information about the derivation and origin of the formulas. Search needs to improve.

How can search be improved? Searching for a topic and searching for a formula are almost one and the same. The difference is that with a formula you have context. Standard topic search was typically good enough to get me to a point where some websites might contain references to the information I was seeking. From there, like a good sleuth, I followed the breadcrumbs until I could find the magic phrase that was simply a line in a paper.

Context should allow you to generalize formulas and pattern match against them. What you’re matching against is a finite set of math symbols; using mathematical rules and properties, you then find another set of symbols that matches. If the matching symbols are within some distance of each other (caret spacing isn’t enough) and otherwise match, you have found a reference to the formula. From there you might use natural language processing to figure out whether the text in the website or PDF is attempting to explain the formula, use the formula, or derive the formula. Then you can classify the documents by how they use the information.
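To make the idea of generalizing a formula a bit more concrete, here is a minimal sketch in Python. It only handles one kind of generalization, renaming variables in order of first appearance, so that c^2 = x^2 + y^2 and a^2 = b^2 + c^2 reduce to the same pattern. The names `tokenize`, `canonical` and `find_formula` are made up for this illustration, and a real system would also need to apply algebraic rules (commutativity, rearranged terms, different notations), which this does not attempt.

```python
import re

# Assumption: a formula is plain ASCII text such as "c^2 = x^2 + y^2".
# A token is a run of letters (a variable name), a run of digits,
# or any other single non-space character (an operator or punctuation).
TOKEN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(text):
    """Split text into variable names, numbers and single symbols."""
    return TOKEN.findall(text)


def canonical(tokens):
    """Rename variables to v0, v1, ... in order of first appearance.

    Two formulas that differ only in their variable names end up
    with the same canonical tuple.
    """
    names = {}
    out = []
    for tok in tokens:
        if tok.isascii() and tok.isalpha():
            if tok not in names:
                names[tok] = "v" + str(len(names))
            out.append(names[tok])
        else:
            out.append(tok)
    return tuple(out)


def find_formula(query, document):
    """Return every token window in document that matches query
    up to a renaming of variables (hypothetical helper)."""
    pattern = canonical(tokenize(query))
    doc_tokens = tokenize(document)
    width = len(pattern)
    hits = []
    for start in range(len(doc_tokens) - width + 1):
        window = doc_tokens[start:start + width]
        if canonical(window) == pattern:
            hits.append(" ".join(window))
    return hits


if __name__ == "__main__":
    page = "For a right triangle the relation a^2 = b^2 + c^2 holds."
    print(find_formula("c^2 = x^2 + y^2", page))
```

The sliding window is a crude stand-in for the "within some distance" idea: it requires the symbols to be adjacent, whereas a more forgiving matcher would tolerate words or line breaks in between.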

The killer application may be in education. You can imagine typing “explain Pythagoras’s theorem c^2 = x^2 + y^2” into a search engine, which would then reference and link to the proper article. Google already attempts to do some of this, but I don’t believe there is a strong enough weight on formula and meaning. One step further and you save researchers, and people in general, a heck of a lot of time.

Search needs to improve.