Inputs & Outputs (Troy Davis, Seattle)

Anatomy of a Malware Ad on NYTimes.com

On Saturday evening, Avast displayed a malware warning as I loaded a nytimes.com article.  After some digging, here's the malware I found.

Ad Delivery

nytimes.com article pages include an ad placement with the HTML DOM ID adxBigAd.  From loading a few articles, they seem to rotate between a banner and an iframe. On this article, a 300x250 iframe was inlining this URL: tradenton.com slash ?id=21610438 (note: I don't recommend visiting it, and URLs are not linked where possible). A comment gave the campaign ID as Vonage01_1163613_nyt12, though it was obviously unrelated to Vonage.  tradenton.com was registered Sept. 2, 2009, so it may have had a previous owner.

Injection

tradenton.com serves a 15-line HTML snippet containing this JavaScript:

As anyone who has looked at phishing links knows, this is nasty on a couple of levels. It's eval()'ing escaped code, which is almost never needed to serve an ad. Note that the variable action_URL is defined but never used. After unescaping the code, this is what's being run:

What's served by harlingens.com slash includes02.js?

Aha! The eval'ed JavaScript is requesting a second JavaScript file, which hits action_URL:
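As an aside, if you want to look at escaped code like this without running it, decode it as plain text instead of handing it to eval(). Here's a minimal Python sketch that mimics JavaScript's unescape() for %XX and %uXXXX sequences. The sample string is made up for illustration and is not the payload from this ad.

```python
import re

# Matches the two forms JavaScript's unescape() understands: %uXXXX and %XX.
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text: str) -> str:
    """Decode escape()-style sequences as text; nothing is executed."""
    def _decode(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE_RE.sub(_decode, text)


# Hypothetical sample, NOT the code served by the ad.
sample = "%64%6F%63%75%6D%65%6E%74%2E%77%72%69%74%65"
print(js_unescape(sample))  # document.write
```

Printing the decoded text to a terminal is a lot safer than pasting it into a browser console.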

Malware

Now we're talking. Requesting that action_URL on sex-and-the-city.cn actually serves an HTTP 302 Redirect to protection-check07.com slash 1/?sess=%3DGQx3jzwMi02MyZpcD0yMDguNzUuNTcuMTIxJnRpbWU9MTI1NjgwMI0MaQ%3DN. And we hit pay dirt. It's a fake page for a non-existent antivirus app, which is actually malware. Titled "My computer Online Scan", this page displays this JS alert:


 

[Image: popup from malware advertised on nytimes.com]

It then resizes the browser window into a full-screen, application-style layout, as if it had become a virus scanner. Some highlights from the static content and JS on this page:

Dont close this window, if your want you PC to be protected.

353 trojans

You need to remove this threat as soon as possible!

Scan procedures finished.

431 Probably harmfull items was found!

 

Here's a screen shot:


[Image: screenshot of web page of malware advertised on nytimes.com]


Here's the full HTML source in a gist viewer. As usual, these phishers haven't sprung for spelling or grammar checkers.

The page also uses IP-based geocoding by inlining its own iframe called geoip.php, which has city-level granularity (though it was off by 1,000 miles for me). The "Full System Cleanup" link goes to /download.php?id=2006-63 on the same server, which serves a file called Scanner-b4ba2_2006-63.exe.

That redirects to /download/Scanner-b4ba2_2006-63.exe, a static file with the checksum 6c5b5669151337ca51ec45b1f5785d02. Running strings on this 167 KB program (too small for any real virus scanner) shows it requesting administrator privileges, though I haven't done detailed forensics.
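If you want to repeat this kind of quick look without the Unix tools handy, both steps fit in a few lines of Python. This is only a sketch: I'm assuming the checksum above is MD5 (it's 32 hex digits), and printable_strings is a rough stand-in for the strings utility, not a reimplementation of it. The function names are mine.

```python
import hashlib
import re


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Runs of printable ASCII at least min_len long, like a basic `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def inspect_file(path: str) -> None:
    """Print the checksum and any strings that mention privileges.

    Only reads the file; it never executes it.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    print(md5_hex(data))
    for text in printable_strings(data):
        if "administrator" in text.lower():
            print(text)
```

Call inspect_file() on a copy of the sample inside a throwaway VM, not on your everyday machine.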

Notes

As of Sept. 12, 2009, tradenton.com and harlingens.com resolved to 212.117.166.69. sex-and-the-city.cn resolved to 94.102.48.29. protection-check07.com had 3 A records: 91.212.107.5, 94.102.51.26, 88.198.107.25. Also, I changed indentation and spacing for readability, so checksums on gist may not match source files.

Filed under  //   geeky  

Auction Lifecycle for Data Geeks

I collected stats from my first sell-side eBay transaction: number of watchers, number of bids, and current item price.  Here's what I found:

[Chart: Auction Watchers, Bids, and Price Over Time]

Specifics: eBay, 7 day duration, start $0.01, no reserve, $167.50 sale, digital camera, July 2009

30-second Summary

  • 30-50% of prospects are trolling for a bargain that probably doesn't exist
  • May be an untapped strategy in bidding at 70% of sale price to "stake a claim"
  • Last-minute emotions didn't affect the price much
  • 7 days may not be long enough for all prospects to discover the item

What's noteworthy?

  • Ratio of watchers:bids (casual interest:purchase intent).  For most buyers, this auction was a 2-stage process: find & watch, wait a couple days, then monitor & bid.  Visible interest accumulates for 2.5 days, peaking at 11 watchers per 1 bid, then steadily drops to 1.5:1 at sale.

As the end approaches, there are far fewer lookie-loos relative to participants - most have either bid or left (un-watched) - and the ratio sinks. Nobody bids upfront, probably from past experience: each person knows nobody else will bid, so they just watch too.

[Chart: Ratio of Auction Watchers, Bids, and Price]

  • Ratio of price:bids (strength & frequency of purchase intent).  Because bidding is idle in days 2 and 3, this ratio starts and stays high ($17 of item price per 1 bid).  Then bids trend smaller for 2 days, dropping to an average of $7 per bid.  Just like a regular auction, the price increase per new bid is smaller later in the auction, though the absolute price increase is large thanks to lots of bids.

Only in the last hour does the dollars per bid spike again, and only in relative terms - small change that happens very quickly.  That's the sniper, who must overshoot because there's only one chance to snipe.  A larger spike here would indicate a desperate sniper.

  • Ratio of price:watchers (strength & frequency of casual interest).  This stays pretty constant throughout the auction until it rises in the last 2 days.  I interpret that as bargain hunters bailing (un-watching) while bidding heats up.
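For anyone who wants to compute the same three ratios for their own listing, here's a minimal Python sketch. The Snapshot fields and the sample numbers are hypothetical (chosen to land near the ratios quoted above), not the raw data behind the charts.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    day: float     # days since the listing started
    watchers: int  # current number of watchers
    bids: int      # cumulative number of bids
    price: float   # current public-facing price, in dollars


def ratios(snap: Snapshot) -> dict:
    """Return the three ratios, or None where the denominator is zero."""
    def div(numerator, denominator):
        return round(numerator / denominator, 2) if denominator else None

    return {
        "watchers_per_bid": div(snap.watchers, snap.bids),
        "price_per_bid": div(snap.price, snap.bids),
        "price_per_watcher": div(snap.price, snap.watchers),
    }


# Hypothetical snapshots, for illustration only.
samples = [
    Snapshot(day=0.5, watchers=3, bids=0, price=0.01),
    Snapshot(day=2.5, watchers=11, bids=1, price=17.00),
    Snapshot(day=7.0, watchers=30, bids=20, price=167.50),
]
for snap in samples:
    print(snap.day, ratios(snap))
```

Returning None before the first bid avoids a divide-by-zero and keeps the early part of a chart honest: there is no watchers:bids ratio until somebody bids.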

Analysis

  • The number of watchers grows linearly for the first part of the auction, then bargain hunters start to realize they're not going to get a steal.  They adapt: they un-watch the item before it ends.   Watchers started dropping about 12 hours before the auction ended, when the price was at about 60% of final sale price.   This shows that some buyers still think it's a good use of time to hunt for undervalued items.  That's a challenge with sub-$200 commodity gadgets.
  • Assuming our sample size of 1 was representative, one could fit a line to the price ramp and estimate the ending price of a halfway complete auction.  This actually might not be far off, at least for auctions matching the same specifics (duration, reserve, category, etc.)
  • 3 bids arrived in the last 40 seconds, yet those were the only bids in the last few hours.  I checked the history of automatic bids (where eBay rebids up to your max allowed amount) vs. user-entered ones: only 2 people (winner and 1 other bidder) were online at 11 AM when the auction ended.  Only one of them bid until the last 10 seconds.  Unlike live auctions, last-minute emotions didn't affect the price much.

This reinforces that online auctions aren't impulse-driven, at least for smaller items that can't draw enough live users to start a bidding war.  Someone may get caught up in their own re-bidding midway through the auction, and there's an adrenaline rush for those few who actively bid when it ends, but in this auction there wasn't any going-once gavel pounding.
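The "fit a line to the price ramp" idea from the Analysis list can be sketched with ordinary least squares. This is only an illustration: the (day, price) points are invented, and a straight line is my assumption about the shape of the ramp, not something one auction can prove.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def projected_price(days, prices, end_day=7.0):
    """Extend the fitted line out to the auction's end day."""
    slope, intercept = fit_line(days, prices)
    return slope * end_day + intercept


# Hypothetical observations from the middle of a 7-day auction.
days = [3.0, 3.5, 4.0, 4.5]
prices = [20.0, 32.0, 44.0, 56.0]
print(round(projected_price(days, prices), 2))
```

Only feed it points from after bidding actually starts; the idle first days would flatten the slope and drag the estimate down.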

Conclusions

  • Although the winner asked me a question 2-3 days in, their first and only bid was 7 seconds before the end.  This was destined to be sniped from the very start.
  • No one jumps in and makes a semi-serious bid (say, over 60% of final price) before they need to.  The serious prospects believe that by placing a real bid, they'll increase the baseline and consequently the sale price.  In a sense, we're telling ourselves that we might get a steal by not bidding, and that other people's bid increases are based on the current price more than their willingness to pay.

That may be true when many buyers are already watching an item.  However, I could see making a bid on day 1 that's 70% of final value, and thereby decreasing the number of people who bother to watch it.  Basically, you'd claim the item for yourself, like bluffing in poker.  Assuming the higher first bid translated into fewer watchers, I could see that keeping the final price lower.

  • Prospective bidders may have still been discovering the item 7 days in.  The number of watchers never really flatlined.  While the total number of watchers dropped at the very end, we can't tell whether those folks un-watching were partially offset by new watchers.  Based on how quickly the watchers dropped in the final 12 hours, I'm guessing the short flat-lining was driven by the auction ending.

Would more people watch an 8 or 9 day auction?  Probably.   Whether one of those watchers would be willing to pay more is a much harder question.

Ideas

  • Because eBay doesn't provide an in-depth event history, we don't know when the 2 ending bidders watched the item, nor how they found it (search? what keywords? browse? which category?).  A deep "event log" with timestamps, referral reason, and username could let sellers write custom auction strategy management tools - innovate on selling strategy rather than logistics or sourcing.

I could see charging for that post-transaction visibility, since pro sellers are the only ones likely to refine listing strategy, and presumably they're receiving higher sale prices.  Bigger challenge: Prospective buyers would need to acknowledge a warning that their watch interest would be made available to the seller, whether or not they actually bid.

  • About 1 in 3 watchers explicitly un-watched this item in the few days after it ended, rather than letting eBay roll it off their watch list.  This may mean that eBay could do a better job of post-auction cleanup or item segmentation, or that 1 in 3 people are really pedantic.
  • There's still room for an auction format which ends 10 minutes after the final bid.  Such a hybrid of "buy it now" and traditional auctions might actually speed the auction up - nobody will be motivated by end time, and that could front-load the bidding.  Wouldn't take much to be more front-loaded than this auction was.
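The "ends 10 minutes after the final bid" rule from the last idea is simple enough to sketch. In this hypothetical Python version I'm assuming that an auction with no bids just stays open, which the idea above doesn't specify; the function names are mine too.

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(minutes=10)


def closing_time(bid_times, quiet_period=QUIET_PERIOD):
    """End time under the rule, or None if nobody has bid yet (assumption)."""
    if not bid_times:
        return None
    return max(bid_times) + quiet_period


def is_open(now, bid_times, quiet_period=QUIET_PERIOD):
    """True while a new bid would still be accepted."""
    end = closing_time(bid_times, quiet_period)
    return end is None or now < end


bids = [datetime(2009, 7, 1, 10, 50), datetime(2009, 7, 1, 10, 58)]
print(closing_time(bids))
```

Every new bid pushes the close out by another quiet period, so there is no fixed end time for a sniper to aim at.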

Note: Bid amount is public-facing current bid, and includes automatic bid. By default, eBay post-auction bid history does not include automatic bids.

Filed under  //   geeky   product management  

GMail Wishlist

GMail's already-marvelous interface need not deter our constructive "backseat engineering." Here's my wishlist.

1. Endless popup warnings when replying to Trashed threads. Compose a reply to a deleted (not archived) thread. A warning dialog box pops up and steals focus every 30-60 seconds to say "This thread is in the trash." Yep, just like it was a minute ago. I'm usually typing when the dialog box grabs focus, and with keyboard shortcuts enabled, my next few compose keystrokes teleport me halfway through the UI.

Wish: Show the dialog box once when I create the reply, or show a simple message in the compose window instead of taking focus. Then let me type.

2. Awkward handling of seldom-used labels. I have a few labels that are only attached to 10-20 threads, and many old labels that won't ever get attached to another thread. The labeled threads would be hard to reconstitute through search.

While I can now hide them in the inbox view (yay) and IMAP, and can search for labels when attaching one to a message, it still feels awkward to see "50 more" labels. Some of my label names were chosen to influence sort order ("lists/blah").

Wish: A place to stash archived labels. I'd be happy with an Archive flag on labels, which would display them in the folder list under a single expandable link, or a "Show Archived" radio button. Existing Manage Labels options (show/hide, Show in IMAP) seem like client app-specific workarounds for not having Archive; show/hide is really "Show in Web." Real hierarchical labels (nested "groups of labels") are more than I need.

3. Compose and Contacts links behave differently than labels. Since the Compose and Contacts links are to JavaScript targets, I can't Ctrl+click to Compose in a new tab, or right-click and "Open in New Window" on Contacts. I leave the main GMail inbox tab open all the time, and I'm usually already looking at a thread when I want Compose or Contacts. I end up using the Inbox link to open a new tab and navigate from there.

Wish: I think I'd accept a bit more latency for being able to use Compose and Contacts links like I do label links.

4. Can't auto-merge contacts (and until recently, not even a "Merge these 2 contacts"). Also, after manually merging contacts, I'm sent back to the top of the contact list (rather than the previous scroll position), so it's painful to do with lots of contacts.

Wish: "Auto-Merge All," then show me a long list of affected contacts to skim/correct. Show 50 or 100 merged contacts per page, not 1. Stopgap: scroll to merged contact after a manual merge.

5. "All Contacts" has a better memory than I do. GMail adds anyone I email to my "All Contacts" set, yet when composing, I sometimes remember the name of a company/domain name but not an individual.

Wish: In the To field, search contacts' email addresses (domain names, specifically). This already happens when no first or last names are defined. When find-as-you-type doesn't match any contacts, consider waiting a couple seconds then showing "Search Contacts for 'blah'." That way I don't have to delete my partial address, click "To" for the contact picker, and re-type my query.

Filed under  //   geeky   product management  

A Handful of Mac OS X Tips

Some things I've learned over the past few weeks:

  • I love one-finger tap to click. In the week I went without it, I never got used to the extra trackpad resistance at the edges. Also turned tracking speed way up.
  • Avoid carpal tunnel: Alias Cmd+# to switch to a specific Space. If you used WindowMaker or Fluxbox, it's similar. I also have a hot corner (lower left) show all Spaces.
  • There's a big enough difference between free and paid apps to justify paying for software again, and bundle sales happen all the time. See MacUpdate Promo and MacHeist.
  • Get Safari AdBlock, which is based on the superb AdBlock Plus.
  • Change Cmd+Space alias to call Quicksilver instead of Spotlight (and never use Spotlight again). Everyone's first piece of advice is to install QS, and Apple should integrate it into the default installation.
  • Gmail Notifier is awesome. Mailplane might be better, but the icons, new mail count, and easy Calendar access in Gmail Notifier solve my problem.
  • Campfire+Propane.app is worth its weight in gold.
  • GarageBand was made for cross-country flights.
  • Required: NeoOffice, Adium, Fluid.app (and Fluid apps for a handful of sites you care about), Skitch, cdto (open Terminal in current directory from Finder), Dropbox, Transmission

Filed under  //   geeky  

Short, fast micro-whois

90% of my whois queries are to check whether a domain name is registered, where I don't need any details. Here's a Bash function to check domain availability; type one character, get one character.

Problem: I want a responsive checker that I can start in an instant, with almost no typing. It should be omnipresent: accessible from as many desktop windows as possible, without task switching. Rather than showing details, output should be short so I can see my query history evolve. And since most domains are registered (and I want to get past them), response time matters for registered domains more than available ones.

Solution: I have this bash function in my .bashrc (update: two versions by request -- shell script and Ruby). You can download micro-whois here.

 

and voila, zero-effort domain name availability:

$ d yort.com

1

$ d ihopethisonedomainisnotregistered.com

0

 

Registered

$ time d yort.com

1

real 0m0.260s

 

$ time whois -n yort.com

.. [ 60 lines ]

real 0m0.782s

 

Unregistered

$ time d ihopethisonedomainisnotregistered.com

real 0m0.649s

 

$ time whois -n ihopethisonedomainisnotregistered.com

real 0m0.657s
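For anyone who'd rather not use Bash, here's a rough Python sketch of the same one-character idea. It is not the Bash function described above. The server (whois.verisign-grs.com, which only covers .com and .net) and the "No match for" marker are assumptions about that registry's output, so other TLDs will need different values.

```python
import socket

WHOIS_HOST = "whois.verisign-grs.com"  # assumption: registry server for .com/.net
WHOIS_PORT = 43                         # standard WHOIS port
NOT_FOUND_MARKER = "no match for"       # assumption: this server's "unregistered" text


def is_registered(whois_response: str) -> bool:
    """True unless the response carries the not-found marker.

    Caveat: an empty or rate-limited response also counts as registered.
    """
    return NOT_FOUND_MARKER not in whois_response.lower()


def query_whois(domain: str, host: str = WHOIS_HOST, timeout: float = 5.0) -> str:
    """Send one WHOIS query and return the raw response text."""
    with socket.create_connection((host, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def d(domain: str) -> int:
    """1 if the domain looks registered, 0 if it looks available."""
    return 1 if is_registered(query_whois(domain)) else 0
```

Calling d("yort.com") from a Python prompt should behave like the shell version. Keeping the parsing in is_registered() separate from the network call makes it easy to swap in another registry's marker.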

Filed under  //   geeky   product management  

Temporary code never is

Somewhere it's the Third Rule of Software Product Development: never implement a feature poorly because you expect it to be thrown away. Sure, rewriting is expensive, and seeing a permutation of "spend time with no product gain" is thoroughly demotivating. But there are 3 better reasons:

  1. You'll never actually get back to doing it. It will take up permanent residence in the back of your head.
  2. By necessity, other features immediately start hanging off of it. When you do find a free day to blow on rewriting something (among the 5 things you meant to circle back to), it's too late to redo it the right way: that 4-hour task is now a 3-day refactor.
  3. Humans can't switch from "write disposable crap" to "write well-thought-out, maintainable, valuable code." The work process is different. Having replaced planning time with dive-in-and-code-code-code, Dumpster-ready features become the norm.

Oh, and it doesn't take that much more time.

Filed under  //   geeky   product management  

Self-signed IMAP SSL certs on iPhone

I'm sure somewhere on the planet, there's a second person with an iPhone who doesn't use GMail, and perhaps that person uses their own self-signed SSL certificate for IMAPS.

When adding a new Account, iPhone setup will pop up a dialog to confirm the shady self-signed certificate.  Even with that acknowledged, it will try to connect and eventually time out.  The warning doesn't mention (nor prompt to install) the root certificate, which will make the cogs turn.  Put the CA root cert on any Web server (.crt extension and application/x-x509-ca-cert MIME type), then hit the URL in Safari from the phone.

You'll see an Install Profile dialog like the one in the Enterprise Deployment Guide.

No need for the Configuration Utility or Enterprise kit.  The deployment guide says you can also attach the cert to an email, then open that message on the phone. Apple, clicking through the IMAP cert alert should make that cert trusted, or at least warn why it won't work until the root cert is added (and how).
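If you don't have a Web server handy for the "put the cert on any Web server" step, Python's built-in one will do for a few minutes. This is a minimal sketch: the handler name and port are arbitrary choices of mine, and it serves whatever is in the current directory, so run it from a folder containing only the CA cert.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CertHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .crt files with the CA cert MIME type."""

    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".crt": "application/x-x509-ca-cert",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory until interrupted with Ctrl+C."""
    HTTPServer(("", port), CertHandler).serve_forever()
```

Call serve(), then open http://your-machine:8000/your-ca.crt in Safari on the phone (both names are placeholders).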

Update: I'm no longer the last person on the planet not using GMail.

Filed under  //   geeky   networks  

SIX tips every peer should know

A gaggle of Seattle network operators converged in one room last week, for the annual Seattle Internet Exchange (SIX) members meeting.  As a janitor and board director, I was already preparing a handout, so I used the back side for six SIX tips.

How TCP sliding windows and the bandwidth*delay product work

Given the desired throughput of a TCP connection and its round-trip time, the [bandwidth*delay product](http://en.wikipedia.org/wiki/Bandwidth-delay_product) is the minimum [TCP window size](http://www.rhyshaden.com/tcp.htm) that each endpoint (host) must support in order to transfer at that throughput. Trying to saturate a 100 Mbps cross-country MPLS VPN with a single HTTP transfer?

BDP = bandwidth * RTT

100 Mb/sec * 70 ms = 12,500,000 bytes/sec * 0.070 sec = 875,000 bytes

Each endpoint's OS TCP window size must be at least 875 KB, or it will be the bottleneck.
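The same arithmetic as a small Python sketch, plus the reverse question: what's the best a single flow can do with a given window? The function names are mine, and the 65,535-byte window in the second call is just an illustrative value (the largest window possible without TCP window scaling).

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth*delay product in bytes: bits/sec * sec, divided by 8."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound for one TCP flow: at most one full window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


print(bdp_bytes(100_000_000, 70))            # 875000
print(max_throughput_bps(65_535, 70) / 1e6)  # Mbps ceiling with a 65,535-byte window
```

On the same 70 ms path, a host stuck with a 65,535-byte window tops out below 8 Mbps no matter how fast the circuit is.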

Enlightenment through iperf

iperf beautifully solves two problems.  First, how to generate a fixed amount of traffic and measure packet loss at that rate.  "Send 30 Mbps of UDP to this IP, regardless of TCP congestion control, packet loss, or anything else."  Second, how to simulate a TCP flow given different parameters (bitrate, window size).

See whole packet payload, not just headers, with tcpdump

Most folks have used tcpdump to show packet headers.  Even handier for diagnosing protocol-related problems is to print the full payload.  To disable reverse DNS lookups, sniff eth0, print the payload, and sample whole packet: tcpdump -n -i eth0 -X -s 0

Find a LAN IP's port without tracing cables

Using another system or router on the same segment, generate traffic to the IP in question.  Then get its MAC address: arp -a or show arp.  Finally, find the port where that MAC was learned.  In IOS: show mac-address-table address 00ff.dead.beef
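One small snag: arp -a on most hosts prints MACs with colons (sometimes dropping leading zeros), while IOS wants the dotted form shown above. Here's a hypothetical Python helper for the conversion; the function name is mine.

```python
import re


def to_cisco_mac(mac: str) -> str:
    """Convert a MAC like 0:ff:de:ad:be:ef to IOS style, 00ff.dead.beef."""
    parts = re.split(r"[:\-]", mac.strip())
    if len(parts) == 6:
        # Colon or dash form; pad octets that lost a leading zero.
        digits = "".join(part.zfill(2) for part in parts)
    else:
        # Already dotted or bare hex; drop any separators.
        digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    digits = digits.lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


print(to_cisco_mac("0:ff:de:ad:be:ef"))  # 00ff.dead.beef
```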

5 minute samples are almost meaningless for many traffic profiles

While measuring usage counters every 5 minutes works wonderfully for calculating total traffic passed, in many cases it leads to wildly inaccurate throughput estimates.  5 minute averages round off most useful granularity; when a circuit's 5 minute average usage is over 50%, it may already be a bottleneck at microsecond granularity, depending on number of endpoints, traffic profile, serialization time, and other characteristics.  Try polling 1 device on a 1 minute or 30 second sample.
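To see how much a 5 minute average can hide, here's a small Python sketch with made-up traffic: a 100 Mbps link that runs at line rate for 30 seconds and at 45 Mbps for the rest of the 5 minutes. The numbers are hypothetical and picked only to make the point.

```python
def window_averages(per_second_bps, window_s):
    """Average throughput for each complete window of window_s seconds."""
    return [
        sum(per_second_bps[i:i + window_s]) / window_s
        for i in range(0, len(per_second_bps) - window_s + 1, window_s)
    ]


LINK_BPS = 100_000_000

# Hypothetical profile: 30 s saturated, then 270 s at 45 Mbps.
traffic = [LINK_BPS] * 30 + [45_000_000] * 270

five_min = window_averages(traffic, 300)
thirty_sec = window_averages(traffic, 30)

print([100 * bps / LINK_BPS for bps in five_min])   # one sample, about 50%
print(max(100 * bps / LINK_BPS for bps in thirty_sec))  # the saturated window
```

The 5 minute graph shows a half-empty circuit, while the 30 second samples catch the window where it was completely full. Real bursts are often far shorter than 30 seconds, so even this understates the problem.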

Filtering ICMP is not a security requirement, and in fact doesn't improve security.

It's just really annoying. If you must filter, allow the types  required for basic operation, like TTL Exceeded and Host/Port Unreachable.

Filed under  //   geeky   networks  

Open-source public school assignment algorithms

Seattle's school assignment system runs on a 1979-era VAX that will cost $2MM+ to replace ("'Dinosaur' computer stalls Seattle schools plans").

Every Seattleite has heard a horror story about 45 minute busing caused by the School Assignment Process, and parents sued the district to eliminate race as a tie-breaker (eventually winning in a US Supreme Court decision).

The system's so broken that there's nothing to lose.   Here's how open-sourcing the software could help, as sent to Seattle School District. Minor modifications for readability:

 

Date: Tue, 26 Feb 2008 09:42:36 -0800

To: schoolboard@seattleschools.org

Hi, I read today's article in the Times about SSD's aging VAX, and it brought up a novel idea.  There's been considerable work in the voting world to create "Open Source" [1] software.  That field wants to:

  • increase transparency
  • improve the underlying process
  • reduce fears of (and controversy from) tainted results
  • share knowledge with interested parties
  • engage outside (and otherwise-adversarial) entities
  • try new systems, technologies, ideas (without paying for it)
  • decrease or share burden of maintenance costs
  • not appear insular

Why not open source the school selection software?  What could that do?

  • position SSD as a thoughtful, extremely well-intentioned leader
  • reduce fears and questions by pointing to the "real McCoy"
  • collaborate with other districts on selection strategy and implementation
  • let outsiders take a stab at improving it, or simply playing with different results
  • shine more smart eyes on the problem
  • let other districts see what a large district does, and maybe roll their own criteria into it

There's nothing proprietary about school selection; on the contrary, just like voting, the goal is the best, most transparent, most practical result above all else.

It would put SSD at the forefront of school selection, not to mention technology and execution savvy. Others have done this for similar reasons, with similar results:

  • Netscape/AOL, in open-sourcing Firefox (now in use by 15-20% of Internet users).
  • Netflix, in opening their movie selection algorithms (and creating a prize for improving them).

This is also happening all over the education field:

  • FlexBooks, open source textbooks; dozens of industry luminaries create best-of-breed textbooks.
  • Moodle, open source curriculum/course management in use at over 35,000 sites to teach 14,000,000 students.
  • MIT OpenCourseWare, exactly what it sounds like: MIT's courses, online, free.

The list goes on.  Most of the same reasons and benefits apply to an open-source school selection application and algorithms.

[1]: what is open source?  Software programs whose "source code" is available for review, analysis, use, or modification as other interested parties see fit.

Filed under  //   geeky   seattle