Stuff I Built in no particular order
Embedded Checkout Bread Finance
Merchants hate it when a service provider takes customers away from their page to perform a transaction. At Bread, we provided a straightforward API that made it easy to integrate any storefront experience with Bread Financing. The button can look like our default button, but it is completely customizable for those who prefer a white-labeled solution. screenshot
Go ReactJS PostgreSQL
Disaster Alerts in Google Search / Android Google
Cellular providers have been required by law to (finally) start pushing out timely public safety alerts to phones. These alerts are usually in all caps and accompanied by an obscenely scary noise. Using the data that the Public Alerts team was already ingesting, we wanted to supplant (or at least supplement) these alerts with something less panic-inducing and more information-forward. Our mission was to replace "FLOOD WARNING" with a map and detailed (decapitalized) information such as time, duration, depth, likelihood... all the stuff that matters when you need it most.
My particular contribution to this project was to take the alert text from sources like the National Weather Service, extract the most relevant parts, and recombine them into a useful and informative snippet. For example, on a tornado warning I'd focus more on directional and temporal aspects than on rainfall amounts. This was done by tokenizing the text and weighting certain n-grams based on the alert type. I also worked on the UI for both the Web and Android.
These alerts were originally intended only for Maps on the Android platform, but once they were demoed we were asked to put them in Google Web Search, iOS Maps, and Google Now.
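To make the n-gram weighting idea concrete, here is a minimal sketch in Python. It is not the production code: the weight table, the sentence splitting, and the function names (`score_sentence`, `build_snippet`) are all assumptions made up for illustration. It scores each sentence by the weighted unigrams and bigrams it contains for a given alert type, then keeps the top sentences in their original order.

```python
import re

# Hypothetical weights per alert type (unigrams and bigrams); not real data.
NGRAM_WEIGHTS = {
    "tornado": {"moving": 2.0, "mph": 2.0, "until": 1.5, "moving northeast": 3.0},
    "flood": {"inches": 2.0, "rainfall": 2.0, "until": 1.5, "flood stage": 3.0},
}


def ngrams(tokens, n):
    """Return the list of n-grams (as space-joined strings) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score_sentence(sentence, weights):
    """Sum the weights of every unigram and bigram found in the sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return sum(weights.get(gram, 0.0) for gram in tokens + ngrams(tokens, 2))


def build_snippet(alert_text, alert_type, max_sentences=2):
    """Keep the highest-scoring sentences, in original order, decapitalized."""
    weights = NGRAM_WEIGHTS[alert_type]
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", alert_text) if s.strip()]
    ranked = sorted(sentences, key=lambda s: score_sentence(s, weights), reverse=True)
    keep = set(ranked[:max_sentences])
    # str.capitalize() is a crude stand-in for real decapitalization:
    # it also lowercases proper nouns.
    return " ".join(s.capitalize() for s in sentences if s in keep)


ALERT = (
    "THE NATIONAL WEATHER SERVICE HAS ISSUED A TORNADO WARNING. "
    "A TORNADO WAS SPOTTED MOVING NORTHEAST AT 30 MPH. "
    "HEAVY RAINFALL OF 2 INCHES IS POSSIBLE. "
    "THE WARNING IS IN EFFECT UNTIL 5 PM."
)

print(build_snippet(ALERT, "tornado"))
print(build_snippet(ALERT, "flood"))
```

The same alert text yields a direction-and-time snippet for the tornado weights and a rainfall-and-time snippet for the flood weights, which is the behavior described above.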
screenshot
Go Python C++ Javascript
Anti-Fraud Engine Bread Finance
Not only do fraudulent transactions cost the company money, but in many cases a failure to properly identify the individual to whom you are writing a loan can cause some serious legal stickiness. This fraud engine collected all sorts of data throughout the checkout process, like IP address, device ID, geolocation, and velocity. In addition to that, we also ran verification checks on the addresses provided and the mobile number, and checked a ton of fraud databases.
All of these discrete pieces of data were collected throughout the process, and then just before you were able to complete a checkout the fraud engine would generate signals out of all the data and run them against a custom-built rule set. The decisioning was completely concurrent, and so as fast as its slowest rule. Given that the most complicated rule can really only be a string comparison, we were able to evaluate over a thousand rules on a huge amount of data in under 200ms. It was pretty sweet.
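As a rough illustration of "signals in, rules evaluated concurrently, decision out", here is a small Python sketch. The real engine was written in Go; the signal names, the three rules, and the `evaluate` function here are hypothetical and only show the shape of the idea.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical signals derived from data gathered during checkout.
SAMPLE_SIGNALS = {
    "ip_country": "US",
    "billing_country": "US",
    "device_seen_before": False,
    "orders_last_hour": 7,
}

# Each rule is a (name, predicate) pair. A predicate returns True when the
# rule flags the transaction. Rules are independent of one another, which
# is what makes concurrent evaluation safe.
RULES = [
    ("country_mismatch", lambda s: s["ip_country"] != s["billing_country"]),
    ("new_device", lambda s: not s["device_seen_before"]),
    ("high_velocity", lambda s: s["orders_last_hour"] > 5),
]


def evaluate(rules, signals, max_workers=8):
    """Run every rule against the signals in a thread pool.

    Returns the sorted names of the rules that fired.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda rule: (rule[0], rule[1](signals)), rules)
        return sorted(name for name, fired in results if fired)


print(evaluate(RULES, SAMPLE_SIGNALS))
```

Python threads will not give the parallel speedup that goroutines do for CPU-bound rules, so treat this as a sketch of the structure rather than of the performance.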
Go
Cross Portal Search Admeld
Between Users, Ad Network Accounts, Creatives, Orders and a whole mess of other things... finding a particular piece of information quickly was a trial for internal Admeld users. Cross Portal Search (affectionately known as 'Corporal Smurf' after a post-dentist talk with the CTO) was a data-type agnostic search engine to find whatever you were looking for.
This search engine indexed all the important metadata about every piece of data in our system to allow quick recall on virtually anything you wanted to search for. It was built in Java using Lucene, although if Elasticsearch had been around at the time it would have made my job a lot easier. This is a nice segue into the next item...
Java Lucene MySQL
Inventory Search isocket
isocket's core mission is to make it easier for Publishers and Advertisers to find each other and do business directly. Central to that goal is the ability to find inventory available on a particular site by virtually any number of vectors, such as comScore demographics, ad placement, price, IAB category... even weather - although no one has ever used that. screenshot
Node.js Javascript Elasticsearch Java Thrift
Content Categorizer / Search Monitor110
As User Generated Content came on to the web, Monitor110 ingested content into its multi-stage processing service. Using a homegrown Bayesian categorization engine, articles were tagged with relevant stock tickers or categories. Daily re-training of the engine actually refined the queries such that, after a period of time, we would see articles popping up in a topic that did not contain a single one of the training terms we had put in.
The best example of this was during the time of the first iPhone. Rumors were circulating that Apple was releasing a phone and the internet chatter had dubbed it the 'Jesus Phone'. Within a few days, articles mentioning 'Jesus Phone' and no other Apple-esque terms started appearing in the AAPL ticker. We all ran away terrified as the now-sentient system threatened to blog mean things about us.
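The 'Jesus Phone' effect falls out of re-training a Bayesian classifier on its own output. The sketch below is a textbook multinomial naive Bayes with add-one smoothing, not Monitor110's engine; the class name, the seed documents, and the `OIL` category are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Return the log-probability score of the text under each label."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(docs / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            result[label] = score
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("apple mac ipod", "AAPL")   # hand-written seed terms
nb.train("oil crude barrel", "OIL")  # hypothetical second category

# Before re-training, the nickname alone carries no signal either way.
before = nb.scores("jesus phone launch")

# Daily re-training: an article that mentions a seed term is classified,
# then fed back in as training data along with all of its other words.
article = "apple rumor jesus phone"
nb.train(article, nb.classify(article))

# Now the nickname is enough on its own.
after = nb.scores("jesus phone launch")
print(nb.classify("jesus phone launch"))
```

Feeding predictions back in as training data is also how such a system can drift, so a real pipeline would likely want a confidence threshold before re-training on an article.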
Java Lucene ActiveMQ MySQL
RTB Analytics Admeld
Insight into how a Publisher was making money off Admeld's RTB platform was essential to selling the concept of RTB, as well as providing any shred of market intelligence in a typically black-boxed world.
The RTB Analytics platform leveraged our already existing (and ever growing) Hadoop-based data aggregation system. By slicing different types of data into pivotable chunks, we were able to present clear and dynamic visualizations of how a Publisher was making money. (Screenshot hopefully coming).
Ruby Sinatra jQuery Hadoop Vertica Javascript
Web Automation Admeld
Web Automation grew out of two essential needs:
- Providing Publishers all the data they needed in one place.
- Insight into what the Ad Networks were doing, to enable our Ad Servers to make more informed decisions.
We built a system that would log in to various ad network reporting systems (very few had APIs) and scrape out relevant impression and payment data. This data would be used alongside our own reporting information to uncover discrepancies. As an example:
We would request to serve an Ad from both Ad Network A and Ad Network B. Ad Network A would tell us they paid $1 per impression, while Ad Network B would tell us they only paid $0.90. We would serve A. After scraping numbers from their reporting system and comparing them against our own numbers, we would see that Ad Network A was underreporting impressions - either for nefarious reasons or just lack of technical sophistication. We would know that we served 100, but they would report 80. This meant that we were getting paid only $80 for 100 impressions, while serving from Ad Network B (who reported more accurately) would have netted us $90. The Ad Server would then work off our own internally derived CPM and favor the true higher price. Ad Network names have been removed to protect the innocent.
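The arithmetic in that example can be written down in a few lines of Python. The function name and data layout are made up for illustration, and Network B reporting all 100 impressions is an assumption (the text only says it reported more accurately).

```python
def effective_rate(quoted_rate, served, reported):
    """Revenue actually received per impression we served.

    The network pays its quoted rate only on the impressions it reports.
    """
    return quoted_rate * reported / served


# Numbers from the example above; B's reported count is assumed.
NETWORKS = {
    "A": {"quoted": 1.00, "served": 100, "reported": 80},
    "B": {"quoted": 0.90, "served": 100, "reported": 100},
}

rates = {
    name: effective_rate(n["quoted"], n["served"], n["reported"])
    for name, n in NETWORKS.items()
}

# Pick the network with the best internally derived rate, not the best quote.
best = max(rates, key=rates.get)
print(rates, best)
```

Network A quotes the higher price, but its effective rate is $0.80 per served impression against B's $0.90, so B wins.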
The system was built in a time before really cool stuff like Agouti existed, so we rolled our own solution. A service would fire up individually addressable instances of Firefox, which had ports open to allow us to inject our own Javascript into the page. Every page the browser visited would then contain our own admeld.js, which had methods we could call from the main Java scraping service (e.g. pullTableColumns would exist as a Java method which would then call the corresponding Javascript function through the JSSH port).
With this framework in place, I built a runner that would use Groovy scripts that were custom built for each reporting platform. User credentials were injected in, and the service would return simple JSON results from the scrape. We were essentially API-fying a site and normalizing its data to fit into our system in a clear and IAB standards-compliant way. It was easily one of our biggest wins, and often cited as a key reason Google acquired Admeld.
Firefox Java Groovy MySQL Javascript
Bezerk and Smoosh Admeld
Why "map" when you can Bezerk? Why "reduce" when you can Smoosh? In order for Admeld's ad servers to make intelligent decisions, we needed to know what they were doing.
Ad Server clusters in multiple co-locations dumped billions of lines of log data. More often than not, a single request would span multiple servers. Small (local) Hadoop clusters would grab 15-minute increments of log data from all the servers, and then combine these lines together into a coherent RequestChain object which would represent only the relevant (non-duplicated) data points about that particular request. These newly smooshed objects would then stream into the central Hadoop cluster for some serious crunching. At any given time, we were no more than 15 minutes off real-time with our reporting. This was huge, as most Publishers were accustomed to only getting this data every few days, or sometimes only once a month. Combined with the Web Automation system, we were able to have a complete view of what was going on in what was quickly becoming a very large ecosystem.
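In miniature, the group-then-collapse step looks something like the Python sketch below. The real thing ran as Hadoop jobs in Java; the log line layout, the field names, and the dictionary standing in for a RequestChain are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical log lines: (request_id, server, field, value).
LOG_LINES = [
    ("req-1", "ad-01", "publisher", "pub-42"),
    ("req-1", "ad-02", "publisher", "pub-42"),  # same fact seen on a second server
    ("req-1", "ad-02", "winning_network", "A"),
    ("req-2", "ad-01", "publisher", "pub-7"),
]


def bezerk(lines):
    """'Map' step: group every log line under its request ID."""
    grouped = defaultdict(list)
    for request_id, server, field, value in lines:
        grouped[request_id].append((server, field, value))
    return grouped


def smoosh(grouped):
    """'Reduce' step: collapse each group into one de-duplicated record."""
    chains = {}
    for request_id, entries in grouped.items():
        chain = {"servers": sorted({server for server, _, _ in entries})}
        for _, field, value in entries:
            chain.setdefault(field, value)  # first value wins, repeats are dropped
        chains[request_id] = chain
    return chains


chains = smoosh(bezerk(LOG_LINES))
print(chains["req-1"])
```

Each request ends up as a single record listing the servers it touched and one value per field, which is what gets shipped on for the heavier crunching.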
Hadoop Java MySQL
Power Outage/Gas Supply Maps During Hurricane Sandy Google
During Hurricane Sandy, the NYC branch of the Google Crisis Response team had a rare opportunity to operate in an area where we were actively responding as well. Since we were able to work closely with the NYC Department of Emergency Management and the NYC Mayor's Office, we were able to get access to power outage data we were normally never allowed to use and plot it on our Crisis Map.
There was a more interesting data point we were able to capture as well. There were a lot of software engineers who no longer had to go to work for the week and were organizing on IRC and Twitter as "Hurricane Hackers" to try to help out in any way they could. I was able to engage that community, and Google provided resources to help them. They in turn were able to hack together a gas supply data set that I was then able to plot on the Crisis Map. While probably not the most difficult thing I've worked on, it's the only thing I've ever built that I saw on CNN while I was coding it.
Go Node.js Javascript Appengine
Magicbus isocket
Evented io / pubsub backend. TODO: FILL ME IN
C++ Thrift Java Node.js Javascript RabbitMQ Redis Go
Firemeld / Firemeld for Chrome Admeld
Firefox Chrome Javascript jQuery
Dogfort Side Project
Anti-social networking. As my circle of eng friends from Admeld got scattered to the diaspora, we could no longer all communicate via a shared Skype session or on corporate GTalk. Also, most chat platforms were pretty weak on showing images and video. We tried our own private sub-reddit and a number of other things, and finally Dogfort was born. It had a few essential features:
- Invite Only
- One-click image/video sharing from anywhere
- Group chat
- Abundant shaming for reposting an image
Built under cover of darkness, and usually while a little drunk, Dogfort has been through many iterations. Direct contact with Dogfort users provides a clean and efficient feedback loop for incorporating new features. Today's version has a voting system, comments, and a Chrome extension to allow for quick posting. Built using AngularJS on the frontend and Node.js on the back, Dogfort also gives me an awesome playground to try out new technologies before advocating for them in my paying gig. screenshot
AngularJS Javascript Node.js MongoDB Heroku Express Firebase
This Resume Of Sorts Side Project
Not sure how it's going yet.
HTML Bootstrap
Where I Built It in a rather particular order
Bread Finance Principal Engineer / Architect / Quartermaster
New York NY Aug 2014 -> Present
Bread makes it easy to pay over time for the things you love. link
iSocket (acquired by Rubicon Project) Senior Software Engineer
Burlingame CA, New York NY Mar 2013 -> Aug 2014
iSocket simplifies the direct sales process. Their technology automates the manual steps like order execution and monitoring, helping premium publishers and advertisers do more business directly. link
Google Software Engineer V
Mountain View CA, New York NY Dec 2011 -> March 2013
Google.org Crisis Response Team
When disaster strikes, people turn to the internet for information. We help ensure the right information is there in these times of need by building tools to collect and share emergency information, and by supporting first responders in using technology to help improve and save lives. link screenshot
Google Public Alerts
Being prepared is just as important as knowing what to do in a crisis. Public Alerts provide a warning before disasters cause damage, so searching for "Hurricane info" in Miami when there is an alert for the East coast would trigger a Public Alert and give you time to prepare. link screenshot
Admeld (Acquired by Google) Senior Software Engineer
New York NY Aug 2008 -> Dec 2011
Since 2008, Admeld has led the industry in helping premium publishers maximize their ad revenue and simplify their operations. Admeld pioneered the private ad exchange and built technology that made it easy for publishers to identify new opportunities and control how every impression is sold. link
Monitor110 Lead Engineer
New York NY Mar 2005 -> Jul 2008
Monitor110 gathered information from over 130 million sources of various types, ranked by financial market knowledge through a proprietary algorithm that took 50 factors into account. Users could choose between top sources preselected for their market sector and subscribe to sources of their own. Static sites could be monitored for changes with good granularity. Premium subscription and other deep web sources, blogs, forums, news and regulatory filings were among the sources included. The end results were delivered through a proprietary RSS reader with email, IM and SMS alerts as appropriate. article post-mortem screenshot
Nat'l Reading Styles Institute Consultant, Software Engineer
New York NY Oct 2001 -> Mar 2005
NRSI is a research-based educational organization dedicated to improving literacy. link (I swear I didn't build this terrible site)
SUSS MicroTec Senior Software Engineer / Electrical Engineer
Ste Jeorie FRANCE, Vonnigen GERMANY, Waterbury VT Aug 2000 -> Oct 2001
SUSS MicroTec is a leading supplier of process equipment for microstructuring in the semiconductor industry and related markets. Their portfolio covers a comprehensive range of products and solutions for backend lithography, wafer bonding and photomask processing, complemented by micro-optical components. link
International Business Machines Software Engineer
Binghamton NY, Essex Junction VT Sep 1999 -> Oct 2001
IBM Microelectronics delivers application-optimized semiconductor technologies designed to take performance, integration and power efficiency to the next level in solutions spanning mobile and wired, from consumer products to robots that do really well on Jeopardy. link