Meet a zulily Developer: John

Each month, we’ll talk with one of our developers and learn about a day-in-the-life of a zulily engineer.

Who are you, and what do you do at zulily?

I’m John, a tech lead on the SHIPS* team.

*The name of my team has changed numerous times during my tenure at zulily, and actually is about to change again. Other names for the team I am on have been: Supply Chain, FMS, PFOAM, SCS, “those folks that deal with shipping stuff to Mom”…

When did you join zulily?

I started in June of 2012, so it has been 2+ years.

What was it like in the early days?  Tell us a crazy story.

  • On my first day, I vividly remember Dan Ward coming up to me and introducing himself. He was wearing a neon orange shirt, white pants, a neon orange belt and neon orange shoe-laces. I remember thinking to myself, “This dude is really friendly, but that is a lot of neon orange!” 🙂
  • Later in the morning of my first day at zulily, I remember hearing “Good morning!!!” <CLAP>, <CLAP>, <CLAP>, <CLAP> over and over again. Of course this was Tatiana leading a conga line of folks who were telling everyone “Good Morning!!!” and giving them a high-five.
  • For lunch on my first day, I went to Pecos BBQ Pit in SODO and ordered a pulled pork sandwich with the “hot” BBQ sauce. I like spicy food, but not ghost chili peppers pureed with the tears of Satan…
  • Later in that first week, zulily announced that they were going to be the first company to integrate with SAP in 90 days (most companies take 18-24 months to do the same amount of work). My team did a lot of the heavy lifting on this aggressive project, and we pulled it off. We even built a LEGO Galactic Empire Super Star Destroyer during the process. 🙂
  • A year later zulily had another aggressive project where I got to travel to London with Dan Ward and Neil Harris to deploy SAP into the UK portion of the business. Again we managed to pull off this aggressive project in “zulily time”. I also came away with a serious love for Brown Sauce, Bacon Butties, and Neil Harris’ ability to function at a very high level sans sleep.

What is different now?

zulily still moves very fast and is very aggressive. What is different now is the number of folks helping with the work, and the impact of that work, which has been magnified by at least three orders of magnitude. I still cannot wrap my head around the growth.

What’s a typical day like for you?

I get into the office around 7am, before most folks arrive, grab some coffee and look at my calendar to see how many meetings I have. I then pound out some code or documentation until about 9am, when the meetings start. Typically I will have 1-2 phone screens or on-site interviews a day and 1-2 meetings with sister and cousin teams regarding system integrations. In between meetings I try to write a line or two of code and, hopefully, remember to have some lunch. I do my best to catch the 5:15pm water taxi to West Seattle, where I live. I have dinner with my kids and wife, put my kids to bed and then, if I have any energy left, write some more code before I head to bed. Rinse, repeat…

What gets you excited about working at zulily?

In a word, impact. It is very rare that one gets to work at a place where the requirement is to scale systems by orders of magnitude in hopes of keeping up with the demands of the business. I would categorize working in zulily tech as “extreme engineering” with very high highs and very low lows. It is thrilling to be able to triage, debug and resurrect a system that is cratering, or deploy subtle changes to systems that almost immediately start generating more revenue and see it happen on a pretty Splunk graph.

In another word, trust. There are not many places where an engineer would be allowed to have the impact described above without backbreaking amounts of process and oversight.

Seattle Scalability Meetup @ zulily: Google, Hortonworks, zulily

We are looking forward to meeting everyone attending the scalability meetup at our office. It is going to be a great event with a good overview of how zulily leverages big data and a deep dive into Google BigQuery and Apache Optiq in Hive.

Agenda

Topic: Building zulily’s Data Platform using Hadoop and Google BigQuery

Speakers: Sudhir Hasbe, Director of big data, data services and BI at zulily (https://www.linkedin.com/in/shasbe), and Paul Newson (https://www.linkedin.com/profile/view?id=971812).

Abstract: zulily, with 4.1 million customers and projected 2014 revenues of over 1 billion dollars, is one of the largest e-commerce companies in the U.S. “Data-driven decision making” is part of our DNA. Growth in the business has triggered exponential growth in data, which required us to redesign our data platform. The zulily data platform is the backbone for all analytics and reporting, along with being the backbone of our data service APIs consumed by various teams in the organization. This session provides a technical deep dive into our data platform and shares key learnings, including our decision to build a Hadoop cluster in the cloud.

Topic: Delivering personalization and recommendations using Hadoop in cloud

Speakers: Steve Reed is a principal engineer at zulily, the author of dropship, and former Geek of the Week. Dylan Carney is a senior software engineer at zulily. They both work on personalization, recommendations and improving your shopping experience.

Abstract: Working on personalization and recommendations at zulily, we have come to lean heavily on on-premise Hadoop clusters to get real work done. Hadoop is a robust and fascinating system, with a myriad of knobs to turn and settings to tune. Knowing the ins and outs of obscure Hadoop properties is crucial for the health and performance of your Hadoop cluster. (To wit: How big is your fsimage? Is your secondary namenode daemon running? Did you know it’s not really a secondary namenode at all?)

But what if it didn’t have to be this way? Google Compute Engine (GCE) and other cloud platforms promise easier, faster and more maintainable Hadoop installations. Join us as we describe what we learned from our years of Hadoop use, and give an overview of what we’ve been able to adapt, learn and unlearn while moving to GCE.

Topic: Apache Optiq in Hive

Speaker: Julian Hyde, Principal, Hortonworks

Abstract: Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive introduces cost-based optimization for the first time, based on the Optiq framework. Optiq’s lead developer Julian Hyde shows the improvements that CBO is bringing to Hive. For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive.

Our format is flexible: we usually have 2 speakers who each talk for ~30 minutes and then take Q+A plus discussion (about 45 minutes per talk), and we finish by 8:45.

There will be beer afterwards, of course!

After-beer Location:

Paddy Coyne’s: http://www.paddycoynes.com/

Doors open 30 minutes ahead of show-time. 

Optimizing memory consumption of Radix Trees in Java

On the Relevancy team at zulily, we are often required to load a large number of large strings into memory. This often causes memory issues. After looking at multiple ways to reduce memory pressure, we settled on Radix Trees to store these strings. Radix Trees provide very fast prefix searching and are great for auto-complete services and similar uses. This post focuses entirely on memory consumption.

What Is A Radix Tree?

Radix Trees take sequences of data and organize them in a tree structure. Strings with common prefixes end up sharing nodes toward the top of this structure, which is how the memory savings are realized. Consider the following example, where we store “antidisestablishmentarian” and “antidisestablishmentarianism” in a Radix Tree:

+- antidisestablishmentarian (node 1)
                           +- ism (node 2)

Two strings, totaling 53 characters, can be stored as two nodes in a tree. The first node stores the common prefix (25 characters) between it and its children. The second stores the rest (3 characters). In terms of character data stored, the Radix Tree stores the same information in approximately 53% of the space (not counting the additional overhead introduced by the tree structure itself).

If you add the string “antibacterial” to the tree, you need to break apart node 1 and shuffle things around. You end with:

+- anti                             (node 3)
      |- disestablishmentarian      (node 4)
      |                      +- ism (node 2)
      +- bacterial                  (node 5)
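
The node-splitting step described above can be sketched in a few lines of Java. This is only an illustration, not zulily's implementation: the class and method names (RadixTreeSketch, insert, contains, storedCharacters) are made up for this example, the nodes hold plain String labels rather than byte arrays, and nothing here is thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative radix tree: insert and lookup only, not thread-safe. */
class RadixTreeSketch {

    /** A node stores the label of the edge leading to it, plus its children. */
    private static final class Node {
        String label;
        boolean terminal; // true when a stored string ends at this node
        final List<Node> children = new ArrayList<>();

        Node(String label, boolean terminal) {
            this.label = label;
            this.terminal = terminal;
        }
    }

    private final Node root = new Node("", false);

    void insert(String key) {
        Node node = root;
        String rest = key;
        while (!rest.isEmpty()) {
            Node next = null;
            int shared = 0;
            // Sibling labels never start with the same character,
            // so at most one child can share a prefix with rest.
            for (Node child : node.children) {
                shared = commonPrefixLength(child.label, rest);
                if (shared > 0) { next = child; break; }
            }
            if (next == null) { // nothing in common: add a new leaf
                node.children.add(new Node(rest, true));
                return;
            }
            if (shared < next.label.length()) {
                split(next, shared); // break the node at the shared prefix
            }
            rest = rest.substring(shared);
            node = next;
        }
        node.terminal = true;
    }

    /** Keeps the first 'at' characters in node and pushes the remainder into a new child. */
    private static void split(Node node, int at) {
        Node tail = new Node(node.label.substring(at), node.terminal);
        tail.children.addAll(node.children);
        node.children.clear();
        node.children.add(tail);
        node.label = node.label.substring(0, at);
        node.terminal = false;
    }

    boolean contains(String key) {
        Node node = root;
        String rest = key;
        while (!rest.isEmpty()) {
            Node next = null;
            for (Node child : node.children) {
                if (rest.startsWith(child.label)) { next = child; break; }
            }
            if (next == null) return false;
            rest = rest.substring(next.label.length());
            node = next;
        }
        return node.terminal;
    }

    /** Total number of characters held in node labels, ignoring all object overhead. */
    int storedCharacters() { return count(root); }

    private static int count(Node node) {
        int total = node.label.length();
        for (Node child : node.children) total += count(child);
        return total;
    }

    private static int commonPrefixLength(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        RadixTreeSketch tree = new RadixTreeSketch();
        tree.insert("antidisestablishmentarian");
        tree.insert("antidisestablishmentarianism");
        System.out.println(tree.storedCharacters()); // 28 of the 53 input characters
        tree.insert("antibacterial");                // splits the first node at "anti"
        System.out.println(tree.storedCharacters()); // 37 of the 66 input characters
        System.out.println(tree.contains("anti"));   // false: "anti" is only a shared prefix
    }
}
```

The character counts only cover label data. Each node also carries object headers, a child list and references, which is the tree overhead mentioned earlier.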


Real-World Performance

We run a lot of software in the JVM, where memory performance can be tricky to measure. In order to validate our Radix Tree implementation and measure the impact, I pumped a bunch of pseudo-realistic data into various collections and captured memory snapshots with YourKit Java Profiler.

Input Data

It didn’t take long to hack together some real-looking data in Ruby with Faker. I created four input files of approximately 1,000,000 strings that included a random selection of 12-digit numbers, bitcoin addresses, email addresses and ISBNs.

sreed:src/ $ head zulily-oss/radix-tree/12-digit-numbers.txt
141273396879
414492487489
353513537462
511391464467
633249176834
347155664352
632411507158
752672544343
483117282483
211673267195

sreed:src/ $ head zulily-oss/radix-tree/bitcoins.txt
1Mp85mezCtBXZDVHGSTn3NYZuriwRMmW6D
1N8ziuitNLmSnaXy2psYpLcXvugHw1Yc5s
18DnruBzLHmnVHQhDghoa6eDt6sDkfuWKr
1A3sRfAnP89HE4RgNQARa3kCq4xFEF9eev
12WR4DrsR4mM8gDHZCuqXe2h37VUSUPSNu
1PRmYuevwZXZamBEgANzLXe2SjFneGDsXp
1EpjPwt8Ap47XA6HwJhCTxUZRDH11GKWuQ
1P8MAgobhLw4FYcFHbw7a8t2FvQZg8K597
15xhiiLdkin8zi6S5KL9DkDDQyvLb1pjjT
1NPEZeEjgGu5TYdz5d3kxjVfLwxAZ2fK6f

sreed:src/ $ head zulily-oss/radix-tree/emails.txt
jakayla.hoppe@krajcikpollich.info
abbey.goodwin@tromp.org
laney.dach@walkerlubowitz.biz
rosanna_towne@marks.name
sherwood@oberbrunnerauer.name
mohamed_rice@champlin.com
margaret_kirlin@greenfeldercasper.net
vince@funk.net
leora_ohara@hackett.biz
audra.hermann@bauch.org

sreed:src/ $ head zulily-oss/radix-tree/isbns.txt
216962073-7
640524955-7
955360834-5
429656067-0
605437693-4
204030847-4
037410069-1
239193083-6
182539755-4
034988227-4

Measuring Memory with YourKit

YourKit provides a measurement of “retained size” in its memory snapshots which is helpful when trying to understand how your code is impacting the heap. What isn’t necessarily intuitive about it, though, is what objects it excludes from this “retained size” measurement. Their documentation is very helpful here: only object references that are exclusively held by the object you’re measuring will be included. Instead of telling you “this is how much memory usage your object imposes on the VM,” retained size instead tells you “this is how much memory the VM would be able to garbage-collect if it were gone.” This is a subtle, but very real, difference if you wish to optimize memory consumption.

Thus, my memory testing needed to ensure that each collection held complete copies of the objects I wished to measure. In this case, each string key needed to be duplicated (I decided to intern and share every value I stored in order to measure only the memory gains from different key storage techniques).

// Results in shared reference, and inaccurate measurement
map1.put(key, value);
map2.put(key, value);

// Results in shared char[] reference, and better but
// still inaccurate measurement
map1.put(new String(key), value);
map2.put(new String(key), value);

// Results in complete copy of keys, and accurate measurement
map1.put(new String(key.toCharArray()), value);
map2.put(new String(key.toCharArray()), value);

Collections Tested

I tested our own Radix Tree implementation, ConcurrentRadixTree from https://code.google.com/p/concurrent-trees/, a string array, Guava’s ImmutableMap and Java’s HashMap, TreeMap, Hashtable and LinkedHashMap. Each collection stored the same values for each key.

Both zulily’s Radix Tree and the ConcurrentRadixTree from concurrent-trees were configured to store string data as UTF-8-encoded byte arrays.

ConcurrentRadixTree was included simply to ensure that our own version (to be open-sourced soon) was worth the effort. The others were measured simply to highlight the benefits of Radix Tree storage for different input types. Each collection has its own merits and in most ways they are all superior to the Radix Tree for storage (put/get performance, concurrency and other features).

Results

First of all, Guava’s ImmutableMap is pretty good. It stored the same key and value data as java.util.HashMap in 92-95% of the space. The Radix Tree breaks keys into byte array sequences and stores them in a tree structure based on common prefixes. This resulted in a best case of 62% the size of the ImmutableMap for bitcoin addresses (strings which have many common prefixes) and a worst case of 88% for random 12-digit numbers. We see that the memory used by this data structure is largely dependent on the type of data put into it. Large strings with many large common prefixes are stored very efficiently in a narrow tree structure. Unique strings create a lot of branches in the underlying tree, making it very wide and adding a lot of overhead.

Converting Java Strings to byte arrays accounts for most of the memory savings, but not all. Byte array storage was anywhere from 90% (bitcoin addresses) to 99% (ISBNs) in the tests I ran.
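
To make the byte array conversion concrete, here is a minimal sketch of encoding a key as UTF-8 and comparing raw character payloads. The class and helper names (Utf8KeySketch, encode, decode) are hypothetical and not taken from our tree. The 2-bytes-per-character figure assumes a JVM where String is backed by a char[]; Java 9 and later can store Latin-1-only strings more compactly, so the gap may be smaller there.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helpers for storing string keys as UTF-8 byte arrays. */
class Utf8KeySketch {

    /** Encodes a key as UTF-8 bytes, the form a byte-based tree would store. */
    static byte[] encode(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes stored bytes back into a String when a caller needs one. */
    static String decode(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "1Mp85mezCtBXZDVHGSTn3NYZuriwRMmW6D";
        byte[] stored = encode(key);

        // Payload only: object headers, array headers and padding are ignored.
        int charArrayPayload = key.length() * Character.BYTES; // 2 bytes per char
        int utf8Payload = stored.length;                       // 1 byte per ASCII char

        System.out.println("char[] payload: " + charArrayPayload + " bytes");
        System.out.println("UTF-8 payload:  " + utf8Payload + " bytes");
        System.out.println("round trip ok:  " + decode(stored).equals(key));
    }
}
```

All four input types in these tests are ASCII-only, so each character encodes to exactly one UTF-8 byte. Keys containing other characters would take two or more bytes each.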

For us, storing byte-encoded representations of string data in a radix tree allowed us to reclaim valuable memory in our services. However, it wasn’t until we had validated the implementation accurately, with realistic data and trustworthy tools, that we rested easy knowing we had accomplished what we set out to do.

welcome to the zulily engineering blog!

It has been just over four and a half years now since Darrell and Mark (our two co-founders) came up with the original idea for zulily. And from the beginning we’ve focused on building software to power a new way of shopping online. We call it discovery-based shopping.

Here at zulily our tech team is at the core of the business and involved in the entire life-cycle of both our vendors and our customers. Whether it’s building internal tools for our merchandizing and studio teams, launching new features on our vendor portal or vendor data exchange, or delivering a new personalized experience on mobile or the site, we are always focused on challenging ourselves to build world-class solutions which exceed expectations.

We are a build shop and big supporters of the open source community. We believe in the power of the community and feel we have an obligation to give back to the projects that have helped us get to where we are today. As we continue our transition from a small, frenetic start-up, expect to see us become more active in the community.

At our core we have 10 values we try to live by on a daily basis. These have served us well over the past 4+ years as we’ve tried new things and experienced major wins… and a number of “well, that was a bad idea” moments.

  1. “No” is not in our vocabulary: we strive to find creative solutions and the path to “yeah, we’ll give it a go”.
  2. We believe in speed of innovation and taking agile development to the extreme.
  3. We embrace a customer-centric view to delivering technology solutions: always start with the customer.
  4. Mistakes are expected and encouraged: we learn from them and move on.
  5. We empower our engineers to solve business problems and tailor our process accordingly.
  6. Engineers write production code and own it from start to finish.
  7. We are defensive in nature: we assume things will break and plan for it.
  8. We believe in “just-in-time” software with an eye towards capacity and scalability.
  9. We value full transparency and continuous communication.
  10. We strive to find the simple solution in anything we do.

In the end we’re all about building an amazing team, passionate about building awesome software and technology solutions. We love to move fast and take risks. And we’re big believers in the idea of continuous improvement.

Thanks for taking a few minutes out of your busy day to read our tech blog.   We hope you enjoy it!

Luke