
Archive for the ‘Programming’ Category

The Hugging Will Continue Until Morale Improves

December 4th, 2015

I saw in today’s news that Apple open sourced their Swift language. One of the most influential companies in the world explicitly adopting an open source model – that’s great! I’m a believer. One of the big reasons we founded Discourse was to build an open source solution that anyone, anywhere could use and safely build upon.

It’s not that Unix won — just that closed source lost. Big time.

— Jeff Atwood (@codinghorror) July 1, 2015

People were also encouraged that Apple was so refreshingly open about the whole effort, involving the larger community in the process. They even hired from the community, which is something I always encourage companies to do.

Also, not many people were, shall we say … fans … of Objective C as a language. There was a lot of community interest in having another viable modern language to write iOS apps in, and to Apple’s credit, they produced Swift, and even promised to open source it by the end of the year. And they delivered, in a deliberate, thoughtful way. (Did I mention that they use CommonMark? That’s kind of awesome, too.)

One of my heroes, Miguel de Icaza, happens to have lots of life experience in open sourcing things that were not exactly open source to start with. He applauded the move, and even made a small change to his Mono project in tribute:

When Swift was open sourced today, I saw they had a Code of Conduct. We had to follow suit, Mono has adopted it: https://t.co/hVO3KL1Dn5

— Miguel de Icaza (@migueldeicaza) December 4, 2015

Which I also thought was kinda cool.

It surprises me that anyone could ever object to the mere presence of a code of conduct at, say, a conference. But even the conference organizers object, sometimes.

  • A weak Code of Conduct is a placebo label saying a conference is safe, without actually ensuring it’s safe.

  • Absence of a Code of Conduct does not mean that the organizers will provide an unsafe conference.

  • Creating safety is not the same as creating a feeling of safety.

  • Things organizers can do to make events safer: Restructure parties to reduce unsafe intoxication-induced behavior; work with speakers in advance to minimize potentially offensive material; and provide very attentive, mindful customer service consistently through the attendee experience.

  • Creating a safe conference is more expensive than just publishing a Code of Conduct to the event, but has a better chance of making the event safe.

  • Safe conferences are the outcome of a deliberate design effort.

I have to say, I don’t understand this at all. Even if you do believe these things, why would you say them out loud? What possible positive outcome could result from you saying them? It’s a textbook case of honesty not always being the best policy. If this is all you’ve got, just say nothing, or wave people off with platitudes, like politicians do. And if you’re Jared Spool, notable and famous within your field, it’s even worse – what does this say to everyone else working in your field?

Mr. Spool’s central premise is this:

Creating safety is not the same as creating a feeling of safety.

Which, actually … isn’t true, and runs counter to everything I know about empathy. If you’ve ever watched It’s Not About the Nail, you’ll understand that a feeling of safety is, in fact, what many people are looking for. It’s not the whole story by any means, but it’s a very important starting point. An anchor.

People understand you cannot possibly protect them from every single possible negative outcome at a conference. That’s absurd. But they also want to hear you stand up for them, and say out loud that, yes, these are the things we believe in. This is what we know to be true. We will look out for each other.

I also had a direct flashback to Deborah Tannen’s groundbreaking You Just Don’t Understand, in which you learn that men are all about fixing the problem, so much so that they rush headlong into any remotely plausible solution, without stopping along the way to actually listen and appreciate the depth of the problem, which maybe … can’t really even be fixed?

If women are often frustrated because men do not respond to their troubles by offering matching troubles, men are often frustrated because women do… he feels she is trying to take something away from him by denying the uniqueness of his experience… [And,] if women resent men’s tendency to offer solutions to problems, men complain about women’s refusal to take action to solve the problems they complain about.

Since many men see themselves as problem solvers, a complaint or a trouble is a challenge…. Trying to solve a problem or fix a trouble focuses on the message level. But for most women who habitually report problems at work or in friendships, the message is not the main point… [as] trouble talk is intended to reinforce rapport by sending the metamessage “We’re the same; you’re not alone.”

Women are frustrated when they not only don’t get this reinforcement but, quite the opposite, feel distanced by the advice, which seems to send the metamessage “We’re not the same. You have the problems; I have the solutions.”

Having children really underscored this point for me. The quickest way to turn a child’s frustration into a screaming, explosive tantrum is to try to fix their problem for them. This is such a hard thing for engineers to wrap their heads around, particularly male engineers, because we are all about fixing the problems. That’s what we do, right? That’s why we exist? We fix problems?

I once wrote this in reply to an Imgur discussion topic about navigating an “emotionally charged situation”:

Oh, you want a master class in dealing with emotionally charged situations? Well, why didn’t you just say so?

Have kids. Within a few years you will learn to be an expert in dealing with this kind of stuff, because what nobody tells you about having kids is that for the first ~5 years, they are constantly. freaking. the. f**k. out.

46 Reasons My Three Year Old Might Be Freaking Out

If this seems weird to you, or like some kind of made up exaggerated hilarious absurd brand of humor, oh trust me. It’s not. Real talk. This is actually how it is.

In their defense, it’s not their fault: they’ve never felt fear, anger, hunger, jealousy, love, or any of the dozen other incredibly complex emotions you and I deal with on a daily basis. So they learn. But along the way, there will be many many many manymanymanymany freakouts. And guess who’s there to help them navigate said freakouts?

You are.

What works is surprisingly simple:

  • Be there.
  • Listen.
  • Empathize, hug, and echo back to them. Don’t try to solve their problems! DO NOT DO IT! Paradoxically, this only makes it way worse if you do. Let them work through the problem on their own. They always will – and knowing someone trusts you enough to figure out your own problems is a major psychological boost.

You gotta lick your rats, man.

(protip: this works identically on adults and kids. Turns out most so-called adults aren’t fully grown up. Who knew?)

I guess my point is that rats aren’t so different from people. We all want the same thing. Comfort from someone who can tell us that the world is safe, the world is not out to get you, that bad things can (and might) happen to you but you’ll still be OK because we will help you. We’re all in this thing together, you’re a human being much like myself and we love you.

That’s why a visible, public code of conduct is a good idea, not only at an in-person conference, but also on a software project like Swift, or Mono. But programmers being programmers – because they spend all day every day mired in the crazy world of infinitely recursive rules from their OS, from their programming language, from their APIs, from their tools – are rules lawyers par excellence. Nobody on planet Earth is better at arguing to the death over a set of completely arbitrary, made-up rules than the average programmer.

So I knew in my heart of hearts that someone – and by someone I mean a programmer – would inevitably complain about the fact that Mono had added a code of conduct. So I made a programmer joke.

@migueldeicaza I find these rules offensive and will be filing a complaint

— Jeff Atwood (@codinghorror) December 4, 2015

This is the second time in as many days that I made what I thought was an obvious joke on Twitter that was interpreted seriously.

When someone starts at Discourse, I have the talk with them. “You remember your family? Forget them. Look at me. *We* are your family now.”

— Jeff Atwood (@codinghorror) December 2, 2015

OK, maybe sometimes my Twitter jokes aren’t very good. Well, you know, that’s just, like … your opinion, man. I should probably switch from Twitter to Myspace or Ello or Google Plus or Snapchat or something.

But it bothered me that people, any people, would think I actually asked new hires to put the company above their family.* Or that I didn’t believe in a code of conduct. I guess some of that comes from having ~200k followers; once your audience gets big enough, Poe’s Law becomes inevitable?

Anyway, I wanted to say I’m sorry. And I’m particularly sorry that eevee, who wrote that awesome PHP is a Fractal of Bad Design article that I once riffed on, thought I was serious, or even worse, that my joke was in bad taste. Even though the negative article about Discourse eevee wrote did kinda hurt my feelings.

@samsaffron @JakubJirutka programmers should not have feelings that is a liability

— Jeff Atwood (@codinghorror) October 2, 2015

I know we have our differences, but if we as programmers can’t come together through our collective shared horror over PHP, the Nickelback of programming languages, then clearly I have failed.

To show that I absolutely do believe in the value of a code of conduct, even as public statements of intent that we may not completely live up to, even if we’ve never had any incidents or problems that would require formal statements – I’m also adding a code of conduct as defined by contributor-covenant.org to the Discourse project. We’re all in this open source thing together, you’re a human being very much like us, and we vow to treat you with the same respect we’d want you to treat us. This should not be controversial. It should be common. And saying so matters.

If you maintain an open source project, I strongly urge you to consider formally adopting a code of conduct, too.

@codinghorror hugs!

— Miguel de Icaza (@migueldeicaza) December 4, 2015

The hugging will continue until morale improves.

* That’s only required of co-founders


The 2016 HTPC Build

November 30th, 2015

I’ve loved many computers in my life, but my HTPC has always had a special place in my heart. It’s the only always-on workhorse computer in our house, it is utterly silent, totally reliable, sips power, and it’s at the center of our home entertainment, networking, storage, and gaming. This handy box does it all, 24/7.

I love this little machine to death; it’s always been there for me and my family. The march of improvements in my HTPC build over the years lets me look back and see how far the old beige box PC has come in the last decade since I’ve been blogging:

2005: ~$1000, 512 MB RAM, single core CPU, 80 watts idle
2008: ~$520, 2 GB RAM, dual core CPU, 45 watts idle
2011: ~$420, 4 GB RAM, dual core CPU + GPU, 22 watts idle
2013: ~$300, 8 GB RAM, dual core CPU + GPU×2, 15 watts idle
2016: ~$320, 8 GB RAM, dual core CPU + GPU×4, 10 watts idle

As expected, the per-thread performance increase from 2013’s Haswell CPU to 2016’s Skylake CPU is modest – 20 percent at best, and that might be rounding up. About all you can do is slap more cores in there, to very limited benefit in most applications. The 6100T I chose is dual-core plus hyperthreading, which I consider the sweet spot, but there are some other Skylake 6000 series variants at the same 35w power envelope which offer true quad-core, or quad-core plus hyperthreading – and, inevitably, a slightly lower base clock rate. So it goes.

The real story is how power consumption was reduced another 33 percent. Here’s what I measured with my trusty kill-a-watt:

  • 10w idle with display off
  • 11w idle with display on
  • 13w active standard Netflix (720p?) movie playback
  • 14w multiple torrents, display off
  • 15w 1080p video playback in MPC-HC x64
  • 40w Lego Batman 3 high detail 720p gameplay
  • 56w Prime95 full CPU load + Rthdribl full GPU load

These are impressive numbers, much better than I expected. Maybe part of it is the latest Windows 10 update which supports the new Speed Shift technology in Skylake. Speed Shift hands over CPU clockspeed control to the CPU itself, so it can ramp its internal clock up and down dramatically faster than the OS could. A Skylake CPU, with the right OS support, gets up to speed and back to idle faster, resulting in better performance and less overall power draw.

Skylake’s on-board HD 530 graphics is about twice as fast as the HD 4400 that it replaces. Haswell offered the first reasonable big screen gaming GPU on an Intel CPU, but only just. 720p was mostly attainable in older games with the HD 4400, but I sometimes had to drop to medium detail settings, or lower. Two generations on, with the HD 530, even recent games like GRID Autosport, Lego Jurassic Park and so on can now be played at 720p with high detail settings at consistently high framerates. It depends on the game, but a few can even be played at 1080p now with medium settings. I did have at least one saved benchmark result on the disk to compare with:

GRID 2, 1280×720, high detail defaults (frames per second)
i3-4130T, Intel HD 4400 GPU: max 32, min 21, avg 27
i3-6100T, Intel HD 530 GPU: max 50, min 32, avg 39

Skylake is a legitimate gaming system on a chip, provided you are OK with 720p. It’s tremendous fun to play Lego Batman 3 with my son.

At 720p using high detail settings, where there used to be many instances of notable slowdown, particularly in co-op, it now feels very smooth throughout. And since games are much cheaper on PC than consoles, particularly through Steam, we have access to a complete range of gaming options from new to old, from indie to mainstream – and an enormous, inexpensive back catalog.

Of course, this is still far from the performance you’d get out of a $300 video card or a $300 console. You’ll never be able to play a cutting edge, high end game like GTA V or Witcher 3 on this HTPC box. But you may not need to. Steam in-home streaming has truly come into its own in the last year. I tried streaming Batman: Arkham Knight from my beefy home office computer to the HTPC at 1080p, and I was surprised to discover just how effortless it was, with no visual artifacts or input latency that I could detect.

It’s super easy to set up – just have the Steam client running on both machines at a logged in Windows desktop (can’t be on the lock screen), and press the Stream button on any game that you don’t have installed locally. Be careful with WiFi when streaming high resolutions, obviously, but if you’re on a wired network, I found the experience is nearly identical to playing the game locally. As long as the game has native console / controller support, like Arkham Knight and Fallout 4, streaming to the big screen works great. Try it! That’s how Henry and I are going to play through Just Cause 3 this Tuesday and I can’t wait.

As before in 2013, I only upgraded the guts of the system, so the incremental cost is low.

That’s a total of $321 for this upgrade cycle, about the cost of a new Xbox One or PS4. The i3-6100T should be a bit cheaper; according to Intel it has the same list price as the i3-6100, but suffers from weak availability. The motherboard I chose is a little more expensive, too, perhaps because it includes extras like built in WiFi and M.2 support, although I’m not using either quite yet. You might be able to source a cheaper H170 motherboard than mine.

The rest of the system has not changed much since 2013:

Populate these items to taste, pick whatever drives and mini-ITX case you prefer, but definitely stick with the PicoPSU, because removing the large, traditional case power supply makes the setup both a) much more power efficient at low wattage, and b) much roomier inside the case and easier to install, upgrade, and maintain.

I also switched to Xbox One controllers, for no really good reason other than the Xbox 360 is getting more obsolete every month, and now that my beloved Rock Band 4 is available on next-gen systems, I’m trying to slowly evict the 360s from my house.

The Windows 10 wireless Xbox One adapter does have some perks. In addition to working with the newer and slightly nicer gamepads from the Xbox One, it supports an audio stream over each controller via the controller’s headset connector. But really, for the purposes of Steam gaming, any USB controller will do.

While I’ve been over the moon in love with my HTPC for years, and I liked the Xbox 360, I have been thoroughly unimpressed with my newly purchased Xbox One. Both the new and old UIs are hard to use, it’s quite slow relative to my very snappy HTPC, and it has a ton of useless features that I don’t care about, like broadcast TV support. About all the Xbox One lets you do is sometimes play next gen games at 1080p without paying $200 or $300 for a fancy video card, and let’s face it – the PS4 does that slightly better. If those same games are available on PC, you’ll have a better experience streaming them from a gaming PC to either a cheap Steam streaming box, or a generalist HTPC like this one.

The Xbox One and PS4 are effectively plain old PCs, built on:

  • Intel Atom class (aka slow) AMD 8-core x86 CPU
  • 8 GB RAM
  • AMD Radeon 77xx / 78xx GPUs
  • cheap commodity 512GB or 1TB hard drives (not SSDs)

The golden age of x86 gaming is well upon us. That’s why the future of PC gaming is looking brighter every day. We can see it coming true in the solid GPU and idle power improvements in Skylake, riding the inevitable wave of x86 becoming the dominant kind of (non-mobile, anyway) gaming for the foreseeable future.


Testing with Data

November 20th, 2015

It’s not a coincidence that this is coming off the heels of Dave Paquette’s post on GenFu and Simon Timms’ post on source control for databases in the same way it was probably not a coincidence that Hollywood released three body-swapping movies in the 1987-1988 period (four if you include Big).

I was asked recently for some advice on generating data for use with integration and UI tests. I already have some ideas but asked the rest of the Western Devs for some elucidation. My tl;dr version is the same as what I mentioned in our discussion on UI testing: it’s hard. But manageable. Probably.

The solution needs to balance a few factors:

  • Each test must start from a predictable state
  • Creating that predictable state should be as fast as possible
  • Developers should be able to figure out what is going on by reading the test

The two options we discussed both assume the first factor to be immutable. That means you either clean up after yourself when the test is finished or you wipe out the database and start from scratch with each test. Cleaning up after yourself might be faster but has more moving parts. Cleaning up might mean different things depending on which step you’re in if the test fails.

So given that we will likely re-create the database from scratch before each and every test, there are two options. My current favourite solution is a hybrid of the two.
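To make that concrete, here is a minimal sketch (my own illustration, NUnit-style; the TestDatabase helper and its methods are hypothetical stand-ins for whatever restore or generate mechanism your project uses) of the “wipe and re-create before every test” idea:

using NUnit.Framework;

// Hypothetical helper standing in for your real schema/seed mechanism
// (a .bak restore, a GenerateDatabase method, migrations, etc.).
public class TestDatabase
{
    public void DropAndRecreate() { /* drop and rebuild the schema */ }
    public void SeedLookupTables() { /* countries, currencies, categories, ... */ }
}

[TestFixture]
public class OrderCreationTests
{
    private TestDatabase _database;

    [SetUp]
    public void ResetDatabase()
    {
        // Every test starts from the same predictable state: wipe whatever
        // the previous test left behind, then seed the baseline data that
        // the majority of tests rely on.
        _database = new TestDatabase();
        _database.DropAndRecreate();
        _database.SeedLookupTables();
    }

    [Test]
    public void Can_create_an_order()
    {
        // Arrange / act / assert against a known-clean database here.
        Assert.Pass();
    }
}

The [SetUp] method is what keeps the first factor above non-negotiable: no test depends on what any other test did.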

Maintain a database of known data

In this option, you have a pre-configured database. Maybe it’s a SQL Server .bak file that you restore before each test. Maybe it’s a GenerateDatabase method that you execute. I’ve done the latter on a Google App Engine project, and it works reasonably well from an implementation perspective. We had a class for each domain aggregate and used dependency injection. So adding a new test customer to accommodate a new scenario was fairly simple. There are a number of other ways you can do it, some of which Simon touched on in his post.

We also had it set up so that we could create only the customer we needed for that particular test if we needed to. That way, we could use a step like Given I'm logged into 'Christmas Town' and it would set up only that data.

There are some drawbacks to this approach. You still need to create a new class for a new customer if you need to do something out of the ordinary. And if you need to do something only slightly out of the ordinary, there’s a strong tendency to use an existing customer and tweak its data ever so slightly to fit your test’s needs, other tests be damned. With these tests falling firmly in the long-running category, you don’t always find out the effects of this until much later.

Another drawback: it’s not obvious in the test exactly what data you need for that specific test. You can accommodate this somewhat just with a naming convention. For example, Given I'm logged into a company from India, if you’re testing how the app works with rupees. But that’s not always practical. Which leads us to the second option.

Create an API to set up the data the way you want

Here, your API contains steps to fully configure your database exactly the way you want. For example:

Given I have a company named "Christmas Town" owned by "Jack Skellington"
And I have 5 product categories
And I have 30 products
And I have a customer
...

You can probably see the major drawback already. This can become very verbose. But on the other hand, you have the advantage of seeing exactly what data is included which is helpful when debugging. If your test data is wrong, you don’t need to go mucking about in your source code to fix it. Just update the test and you’re done.

Also note the lack of specifics in the steps. Whenever possible, I like to be very vague when setting up my test data. If you have a good framework for generating test data, this isn’t hard to do. And it helps uncover issues you may not account for using hard-coded data (as anyone named D’Arcy O’Toole can probably tell you).
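As a sketch of what “very vague” data can look like in practice, here is a small example using GenFu (the library from Dave’s post; I’m assuming its A.New<T>() and A.ListOf<T>() helpers, and the Customer class below is made up for illustration):

using System.Collections.Generic;
using GenFu;

// Illustrative type only -- not from the sample project.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class VagueTestData
{
    // "I have a customer" -- the test doesn't care who, so let the generator
    // invent plausible names and emails. Randomly generated data is also how
    // you stumble onto the D'Arcy O'Toole style apostrophe bugs early.
    public static Customer ACustomer() => A.New<Customer>();

    // "I have 30 customers" -- again, no hard-coded specifics.
    public static List<Customer> SomeCustomers(int count) => A.ListOf<Customer>(count);
}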


Loading up your data with a granular API isn’t realistic, which is why I like the hybrid solution. By default, you pre-load your database with some common data, like lookup tables with lists of countries, currencies, product categories, etc. Stuff that needs to be in place for the majority of your tests.

After that, your API doesn’t need to be that granular. You can use something like Given I have a basic company, which will create the company, add an owner and maybe some products, and use that to test the process for creating an order. Under the hood, it will probably use the specific steps.

One reason I like this approach: it hides only the details you don’t care about. When you say Given I have a basic company and I change the name to "Rick's Place", that tells me, “I don’t care how the company is set up but the company name is important”. Very useful to help narrow the focus of the test when you’re reading it.
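Here is a rough sketch (my own, not code from the post) of how that coarse “basic company” step can sit on top of the granular API, reusing the names from the earlier example:

using System.Collections.Generic;

// Illustrative domain type -- not from the sample project.
public class Company
{
    public string Name { get; set; }
    public string Owner { get; set; }
    public List<string> Products { get; } = new List<string>();
}

public class TestDataApi
{
    // Granular steps: explicit about every detail.
    public Company CreateCompany(string name, string owner) =>
        new Company { Name = name, Owner = owner };

    public void AddProducts(Company company, int count)
    {
        for (var i = 1; i <= count; i++)
            company.Products.Add($"Product {i}");
    }

    // Coarse step: "Given I have a basic company".
    // Under the hood it just calls the granular steps with defaults.
    public Company CreateBasicCompany()
    {
        var company = CreateCompany("Christmas Town", "Jack Skellington");
        AddProducts(company, 5);
        return company;
    }
}

// In a test, only the interesting detail is stated explicitly:
//   var company = new TestDataApi().CreateBasicCompany();
//   company.Name = "Rick's Place";   // "... and I change the name to 'Rick's Place'"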

This approach will understandably lead to a whole bunch of different methods for creating data of various sizes and coarseness. And for that you’ll need to…

Maintain test data

Regardless of your method, maintaining your test data will require constant vigilance. In my experience, there is a tremendous urge to take shortcuts when it comes to test data. You’ll re-use a test company that doesn’t quite fit your scenario. You’ll alter your test to fit the data rather than the other way around. You’ll duplicate a data setup step because your API isn’t discoverable.

Make no mistake, maintaining test data is work. It should be treated with the same respect and care as the rest of your code. Possibly more so since the underlying code (in whatever form it takes) technically won’t be tested. Shortcuts and bad practices should not be tolerated and let go because “it’s just test data”. Fight the urge to let things slide. Call it out as soon as you see it. Refactor mercilessly once you see opportunities to do so.

Don’t be afraid to flip over a table or two to get your point across.

– Kyle the Unmaintainable


To ECC or Not To ECC

November 19th, 2015

On one of my visits to the Computer History Museum – and by the way this is an absolute must-visit place if you are ever in the San Francisco bay area – I saw an early Google server rack circa 1999 in the exhibits.

Not too fancy, right? Maybe even … a little janky? This is building a computer the Google way:

Instead of buying whatever pre-built rack-mount servers Dell, Compaq, and IBM were selling at the time, Google opted to hand-build their server infrastructure themselves. The sagging motherboards and hard drives are literally propped in place on handmade plywood platforms. The power switches are crudely mounted in front, the network cables draped along each side. The poorly routed power connectors snake their way back to generic PC power supplies in the rear.

Some people might look at these early Google servers and see an amateurish fire hazard. Not me. I see a prescient understanding of how inexpensive commodity hardware would shape today’s internet. I felt right at home when I saw this server; it’s exactly what I would have done in the same circumstances. This rack is a perfect example of the commodity x86 market D.I.Y. ethic at work: if you want it done right, and done inexpensively, you build it yourself.

This rack is now immortalized in the National Museum of American History. Urs Hölzle posted lots more juicy behind the scenes details, including the exact specifications:

  • Supermicro P6SMB motherboard
  • 256MB PC100 memory
  • Pentium II 400 CPU
  • IBM Deskstar 22GB hard drives (×2)
  • Intel 10/100 network card

When I left Stack Exchange (sorry, Stack Overflow) one of the things that excited me most was embarking on a new project using 100% open source tools. That project is, of course, Discourse.

Inspired by Google and their use of cheap, commodity x86 hardware to scale on top of the open source Linux OS, I also built our own servers. When I get stressed out, when I feel the world weighing heavy on my shoulders and I don’t know where to turn … I build servers. It’s therapeutic.

I like to give servers a little pep talk while I build them. “Who’s the best server! Who’s the fastest server!”

— Jeff Atwood (@codinghorror) November 16, 2015

Don’t judge me, man.

But more seriously, with the release of Intel’s latest Skylake architecture, it’s finally time to upgrade our 2013 era Discourse servers to the latest and greatest, something reflective of 2016 – which means building even more servers.

Discourse runs on a Ruby stack and one thing we learned early on is that Ruby demands exceptional single threaded performance, aka, a CPU running as fast as possible. Throwing umptazillion CPU cores at Ruby doesn’t buy you a whole lot other than being able to handle more requests at the same time. Which is nice, but doesn’t get you speed per se. Someone made a helpful technical video to illustrate exactly how this all works:

This is by no means exclusive to Ruby; other languages like JavaScript and Python also share this trait. And Discourse itself is a JavaScript application delivered through the browser, which exercises the mobile / laptop / desktop client CPU in a big way. Mobile devices reaching near-parity with desktops in single threaded performance is something we’re betting on in a big way with Discourse.

So, good news! Although PC performance gains have been incremental at best in the last 5 years, between Haswell and Skylake Intel managed to deliver a respectable per-thread performance bump. Since we are upgrading our servers from Ivy Bridge (very similar to the i7-3770k), the generation before Haswell, I’d expect a solid 33% performance improvement at minimum.

The catch is that the more cores they pack on a chip, the slower they all go. From Intel’s current Xeon E5 lineup:

  • E5-1680 → 8 cores, 3.2 GHz
  • E5-1650 → 6 cores, 3.5 GHz
  • E5-1630 → 4 cores, 3.7 GHz

Which brings me to the following build for our core web tiers, which optimizes for “lots of inexpensive, fast boxes”:

2013 build, $2,427 total (31w idle, 87w BurnP6 load):

  • Xeon E3-1280 V2 Ivy Bridge 3.6 GHz / 4.0 GHz quad-core ($640)
  • SuperMicro X9SCM-F-O mobo ($190)
  • 32 GB DDR3-1600 ECC ($292)
  • SC111LT-330CB 1U chassis ($200)
  • Samsung 830 512GB SSD ×2 ($1080)
  • 1U heatsink ($25)

2016 build, $2,241 total (14w idle, 81w BurnP6 load):

  • i7-6700k Skylake 4.0 GHz / 4.2 GHz quad-core ($370)
  • SuperMicro X11SSZ-QF-O mobo ($230)
  • 64 GB DDR4-2133 ($520)
  • CSE-111LT-330CB 1U chassis ($215)
  • Samsung 850 Pro 1TB SSD ×2 ($886)
  • 1U heatsink ($20)

So, about 10% cheaper than what we spent in 2013, with 2× the memory, 2× the storage (probably 50-100% faster too), and at least ~33% faster CPU. With lower power draw, to boot! Pretty good. Pretty, pretty, pretty, pretty good.

(Note that the memory bump is only possible thanks to Intel finally relaxing their iron fist of maximum allowed RAM at the low end; that’s new to the Skylake generation.)

One thing is conspicuously missing in our 2016 build: Xeons, and ECC Ram. In my defense, this isn’t intentional – we wanted the fastest per-thread performance and no Intel Xeon, either currently available or announced, goes to 4.0 GHz with Skylake. Paying half the price for a CPU with better per-thread performance than any Xeon, well, I’m not going to kid you, that’s kind of a nice perk too. So what is ECC all about?

Error-correcting code memory (ECC memory) is a type of computer data storage that can detect and correct the most common kinds of internal data corruption. ECC memory is used in most computers where data corruption cannot be tolerated under any circumstances, such as for scientific or financial computing.

Typically, ECC memory maintains a memory system immune to single-bit errors: the data that is read from each word is always the same as the data that had been written to it, even if one or more bits actually stored have been flipped to the wrong state. Most non-ECC memory cannot detect errors although some non-ECC memory with parity support allows detection but not correction.

It’s received wisdom in the sysadmin community that you always build servers with ECC RAM because, well, you build servers to be reliable, right? Why would anyone intentionally build a server that isn’t reliable? Are you crazy, man? Well, looking at that cobbled together Google 1999 server rack, which also utterly lacked any form of ECC RAM, I’m inclined to think that reliability measured by “lots of redundant boxes” is more worthwhile and easier to achieve than the platonic ideal of making every individual server bulletproof.

Being the type of guy who likes to question stuff… I began to question. Why is it that ECC is so essential anyway? If ECC was so important, so critical to the reliable function of computers, why isn’t it built in to every desktop, laptop, and smartphone in the world by now? Why is it optional? This smells awfully… enterprisey to me.

Now, before everyone stops reading and I get permanently branded as “that crazy guy who hates ECC”, I think ECC RAM is fine:

  • The cost difference between ECC and not-ECC is minimal these days.
  • The performance difference between ECC and not-ECC is minimal these days.
  • Even if ECC only protects you from rare 1% hardware error cases that you may never hit until you literally build hundreds or thousands of servers, it’s cheap insurance.

I am not anti-insurance, nor am I anti-ECC. But I do seriously question whether ECC is as operationally critical as we have been led to believe, and I think the data shows modern, non-ECC RAM is already extremely reliable.

First, let’s look at the Puget Systems reliability stats. These guys build lots of commodity x86 gamer PCs, burn them in, and ship them. They helpfully track statistics on how many parts fail either from burn-in or later in customer use. Go ahead and read through the stats.

For the last two years, CPU reliability has dramatically improved. What is interesting is that this lines up with the launch of the Intel Haswell CPUs which was when the CPU voltage regulation was moved from the motherboard to the CPU itself. At the time we theorized that this should raise CPU failure rates (since there are more components on the CPU to break) but the data shows that it has actually increased reliability instead.

Even though DDR4 is very new, reliability so far has been excellent. Where DDR3 desktop RAM had an overall failure rate in 2014 of ~0.6%, DDR4 desktop RAM had absolutely no failures.

SSD reliability has dramatically improved recently. This year Samsung and Intel SSDs only had a 0.2% overall failure rate compared to 0.8% in 2013.

Modern commodity computer parts from reputable vendors are amazingly reliable. And their trends show from 2012 onward essential PC parts have gotten more reliable, not less. (I can also vouch for the improvement in SSD reliability as we have had zero server SSD failures in 3 years across our 12 servers with 24+ drives, whereas in 2011 I was writing about the Hot/Crazy SSD Scale.) And doesn’t this make sense from a financial standpoint? How does it benefit you as a company to ship unreliable parts? That’s money right out of your pocket and the reseller’s pocket, plus time spent dealing with returns.

We had a, uh, “spirited” discussion about this internally on our private Discourse instance.

This is not a new debate by any means, but I was frustrated by the lack of data out there. In particular, I’m really questioning the difference between “soft” and “hard” memory errors:

But what is the nature of those errors? Are they soft errors – as is commonly believed – where a stray Alpha particle flips a bit? Or are they hard errors, where a bit gets stuck?

I absolutely believe that hard errors are reasonably common. RAM DIMMS can have bugs, or the chips on the DIMM can fail, or there’s a design flaw in circuitry on the DIMM that only manifests in certain corner cases or under extreme loads. I’ve seen it plenty. But a soft error where a bit of memory randomly flips?

There are two types of soft errors, chip-level soft error and system-level soft error. Chip-level soft errors occur when the radioactive atoms in the chip’s material decay and release alpha particles into the chip. Because an alpha particle contains a positive charge and kinetic energy, the particle can hit a memory cell and cause the cell to change state to a different value. The atomic reaction is so tiny that it does not damage the actual structure of the chip.

Outside of airplanes and spacecraft, I have a difficult time believing that soft errors happen with any frequency, otherwise most of the computing devices on the planet would be crashing left and right. I deeply distrust the anecdotal voodoo behind “but one of your computer’s memory bits could flip, you’d never know, and corrupted data would be written!” It’d be one thing if we observed this regularly, but I’ve been unhealthily obsessed with computers since birth and I have never found random memory corruption to be a real, actual problem on any computers I have either owned or had access to.

But who gives a damn what I think. What does the data say?

A 2007 study found that the observed soft error rate in live servers was two orders of magnitude lower than previously predicted:

Our preliminary result suggests that the memory soft error rate in two real production systems (a rack-mounted server environment and a desktop PC environment) is much lower than what the previous studies concluded. Particularly in the server environment, with high probability, the soft error rate is at least two orders of magnitude lower than those reported previously. We discuss several potential causes for this result.

A 2009 study on Google’s server farm notes that soft errors were difficult to find:

We provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode.

Yet another large scale study from 2012 discovered that RAM errors were dominated by permanent failure modes typical of hard errors:

Our study has several main findings. First, we find that approximately 70% of DRAM faults are recurring (e.g., permanent) faults, while only 30% are transient faults. Second, we find that large multi-bit faults, such as faults that affects an entire row, column, or bank, constitute over 40% of all DRAM faults. Third, we find that almost 5% of DRAM failures affect board-level circuitry such as data (DQ) or strobe (DQS) wires. Finally, we find that chipkill functionality reduced the system failure rate from DRAM faults by 36x.

In the end, we decided the non-ECC RAM risk was acceptable for every tier of service except our databases. Which is kind of a bummer since higher end Skylake Xeons got pushed back to the extra-fancy Purley platform upgrade in 2017. Regardless, we burn in every server we build with a complete run of memtest86 and overnight prime95/mprime, and you should too. There’s one whirring away through endless memory tests right behind me as I write this.

I find it very, very suspicious that ECC – if it is so critical to preventing these random, memory corrupting bit flips – has not already been built into every type of RAM that we ship in the ubiquitous computing devices all around the world as a cost of doing business. But I am by no means opposed to paying a small insurance premium for server farms, either. You’ll have to look at the data and decide for yourself. Mostly I wanted to collect all this information in one place so people who are also evaluating the cost/benefit of ECC RAM for themselves can read the studies and decide what they want to do.

Please feel free to leave comments if you have other studies to cite, or significant measured data to share.


Running a .NET app against a Postgres database in Docker

October 25th, 2015

Some days/weeks/time ago, I did a presentation at MeasureUP called “Docker For People Who Think Docker Is This Weird Linux Thing That Doesn’t Impact Me”. The slides for that presentation can be found here and the sample application here.

Using the sample app with PostgreSQL

The sample application is just a plain ol’ .NET application. It is meant to showcase different ways of doing things. One of those things is data access. You can configure the app to access the data from SQL storage, Azure table storage, or in-memory. By default, it uses the in-memory option so you can clone the app and launch it immediately just to see how it works.

Quick summary: Calgary, Alberta hosts an annual event called the Calgary Stampede. One of the highlights of the 10-ish day event is the pancake breakfast, whereby dozens/hundreds of businesses offer up pancakes to people who want to eat like the pioneers did, assuming the pioneers had pancake grills the size of an Olympic swimming pool.

The sample app gives you a way to enter these pancake breakfast events and each day, will show that day’s breakfasts on a map. There’s also a recipe section to share pancake recipes but we won’t be using that here.

To work with Docker we need to set the app up to use a data access mechanism that will work on Docker. The sample app supports Postgres so that will be our database of choice. Our first step is to get the app up and running locally with Postgres without Docker. So, assuming you have Postgres installed, find the ContainerBuilder.cs file in the PancakeProwler.Web project. In this file, comment out the following near the top of the file:

// Uncomment for InMemory Storage
builder.RegisterAssemblyTypes(typeof(Data.InMemory.Repositories.RecipeRepository).Assembly)
       .AsImplementedInterfaces()
       .SingleInstance();

And uncomment the following later on:

// Uncomment for PostgreSQL storage
builder.RegisterAssemblyTypes(typeof(PancakeProwler.Data.Postgres.IPostgresRepository).Assembly)
    .AsImplementedInterfaces().InstancePerRequest().PropertiesAutowired();

This configures the application to use Postgres. You’ll also need to do a couple more tasks:

  • Create a user in Postgres
  • Create a Pancakes database in Postgres
  • Update the Postgres connection string in the web project’s web.config to match the username and database you created

The first two steps can be accomplished with the following script in Postgres:

CREATE DATABASE "Pancakes";

CREATE USER "Matt" WITH PASSWORD 'moo';

GRANT ALL PRIVILEGES ON DATABASE "Pancakes" TO "Matt";

Save this to a file. Change the username/password if you like but be aware that the sample app has these values hard-wired into the connection string. Then execute the following from the command line:

psql -U postgres -a -f "C:\path\to\sqlfile.sql"

At this point, you can launch the application and create events that will show up on the map. If you changed the username and/or password, you’ll need to update the Postgres connection string first.

You might have noticed that you didn’t create any tables yet but the app still works. The sample is helpful in this regard because all you need is a database. If the tables aren’t there yet, they will be created the first time you launch the app.

Note: recipes rely on having a search provider configured. We won’t cover that here but I hope to come back to it in the future.

Next, we’ll switch things up so you can run this against Postgres running in a Docker container.

Switching to Docker

I’m going to give away the ending here and say that there is no magic. Literally, all we’re doing in this section is installing Postgres on another “machine” and connecting to it. The commands to execute this are just a little less click-y and more type-y.

The first step, of course, is installing Docker. At the time of writing, this means installing Docker Machine.

With Docker Machine installed, launch the Docker Quickstart Terminal and wait until you see an ASCII whale:

Docker Machine

If this is your first time running Docker, just know that a lightweight Linux virtual machine has been launched in VirtualBox on your machine. Check your Start screen and you’ll see VirtualBox if you want to investigate it but the docker-machine command will let you interact with it for many things. For example:

docker-machine ip default

This will give you the IP address of the default virtual machine, which is the one created when you first launched the Docker terminal. Make a note of this IP address and update the Postgres connection string in your web.config to point to it. You can leave the username and password the same:

<add name="Postgres" connectionString="server=192.168.99.100;user id=Matt;password=moo;database=Pancakes" providerName="Npgsql" />

Now we’re ready to launch the container:

docker run --name my-postgres -e POSTGRES_PASSWORD=moo -p 5432:5432 -d postgres

Breaking this down:

  • docker run – Runs a docker container from an image
  • --name my-postgres – The name we give the container to make it easier for us to work with. If you leave this off, Docker will assign a relatively easy-to-remember name like “floral-academy” or “crazy-einstein”. You also get a less easy-to-remember identifier which works just as well but is…less…easy-to-remember
  • -e POSTGRES_PASSWORD=moo – The -e flag passes an environment variable to the container. In this case, we’re setting the password of the default postgres user
  • -p 5432:5432 – Publishes a port from the container to the host. Postgres runs on port 5432 by default, so we publish this port so we can interact with Postgres directly from the host
  • -d – Run the container in the background. Without this, the command will sit there waiting for you to kill it manually
  • postgres – The name of the image you are creating the container from. We’re using the official postgres image from Docker Hub.

If this is the first time you’ve launched Postgres in Docker, it will take a few seconds at least, possibly even a few minutes. It’s downloading the Postgres image from Docker Hub and storing it locally. This happens only the first time for a particular image. Every subsequent postgres container you create will use this local image.

Now we have a Postgres container running. Just like with the local version, we need to create a user and a database. We can use the same script as above and a similar command:

psql -h 192.168.99.100 -U postgres -a -f "C:\path\to\sqlfile.sql"

The only difference is the addition of -h 192.168.99.100. You should use whatever IP address you got above from the docker-machine ip default command here. For me, the IP address was 192.168.99.100.
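If you want to sanity-check the container before touching the application, here is a small console snippet (mine, not part of the sample app; it assumes the Npgsql package, which the sample already uses for its Postgres support) that opens a connection using the same style of connection string as the web.config entry above:

using System;
using Npgsql;

class PostgresSmokeTest
{
    static void Main()
    {
        // Same format as the web.config connection string, pointed at the
        // Docker Machine IP you noted earlier.
        const string connectionString =
            "server=192.168.99.100;user id=Matt;password=moo;database=Pancakes";

        using (var connection = new NpgsqlConnection(connectionString))
        {
            connection.Open();

            using (var command = new NpgsqlCommand("SELECT version();", connection))
            {
                // Prints the Postgres version reported by the container.
                Console.WriteLine(command.ExecuteScalar());
            }
        }
    }
}

If this prints a version string, the container, the published port, and your credentials are all working; any failure here is a Docker or networking problem rather than an application one.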

With the database and user created, and your web.config updated, we’ll need to stop the application in Visual Studio and re-run it. The reason for this is that the application won’t recognize that we’ve changed databases, so we need to “reboot” it to trigger the process for creating the initial table structure.

Once the application has been restarted, you can now create pancake breakfast events and they will be stored in your Docker container rather than locally. You can even launch pgAdmin (the Postgres admin tool) and connect to the database in your Docker container and work with it like you would any other remote database.

Next steps

From here, where you go is up to you. The sample application can be configured to use Elastic Search for the recipes. You could start an Elastic Search container and configure the app to search against that container. The principle is the same as with Postgres. Make sure you open both ports 9200 and 9300 and update the ElasticSearchBaseUri entry in web.config. The command I used in the presentation was:

docker run --name elastic -p 9200:9200 -p 9300:9300 -d elasticsearch

I also highly recommend Nigel Poulton’s Docker Deep Dive course on Pluralsight. You’ll need access to Linux either natively or in a VM but it’s a great course.

There are also a number of posts right here on Western Devs, including an intro to Docker for OSX, tips on running Docker on Windows 10, and a summary or two on a discussion we had on it internally.

Other than that, Docker is great for experimentation. Postgres and Elastic Search are both available pre-configured in Docker on Azure. If you have access to Azure, you could spin up a Linux VM with either of them and try to use that with your application. Or look into Docker Compose and try to create a container with both.

For my part, I’m hoping to convert the sample application to ASP.NET 5 and see if I can get it running in a Windows Server Container. I’ve been saying that for a couple of months but I’m putting it on the internet in an effort to make it true.


Are you in the mood for HTTP?

October 21st, 2015

Sometimes you are just in that mood: you know you want to talk HTTP and APIs with a bunch of people who care. Recently Darrel Miller and I realized we were in that mood, and so with a little nudging from Jonathan Channon we decided now was a good time. And so, “In the Mood for HTTP” was born.

It is a new Q&A style show, where folks submit questions on all things HTTP and Darrel and I give answers. Every show is live via Google Hangouts on Air, AND it is recorded and immediately available. In terms of the content, one thing I think is really nice is that we’re getting to dive into some really deep areas of building APIs that are not well covered. For example, what level of granularity of media types should you use? Do microservices impact your API design? And much more!

We’re not always in agreement, we’re not always right. We do always have fun!

Read Darrel’s blog post, which goes into more detail on the what and why, then come join us!


Save your DNS and your SANITY when using VPN on a Mac

October 1st, 2015

There was a time when using my Mac was bliss from a DNS perspective: I never had to worry about my routing tables getting corrupted, I could always rely on hosts getting resolved, life was good! And then a combination of things happened:

  • The networking stack on OSX went downhill.
  • I joined Splunk.
  • I started using a VPN on my Mac (we use Juniper SSL VPN).
  • I started having to deal with this now recurring nightmare of my DNS suddenly failing, generally after using the VPN.

If you use a VPN on a Mac, I am sure you’ve seen it. Suddenly you type “https://github.com” in your browser, and you get a 404. “Is Github down?” you ask your co-workers. “Nope, works perfectly fine for me.” “Is HipChat down?” “Nope, I am chatting away.”

Meanwhile, your browser looks something like this:

AAARGH!

So you reboot, and then you find out that Github was up all along; the problem was that your routing tables got screwed up somehow, related to the VPN. After dealing with this constantly, you start to seriously lose your sanity! It will, of course, always happen at the most inopportune time, like when you are about to present to your execs or walk on stage.

But my friends, I have a cure. It’s one I learned from the ninjas at my office: a little bash alias that will save you AND your DNS. Drop it in your .bash_profile:

alias fixvpn="sudo route -n flush && sudo networksetup -setv4off Wi-Fi && sudo networksetup -setdhcp Wi-Fi"

Next time the DNS demons come to get you, run this baby from the shell.


Wait a few seconds, and bring up that webpage again.


Your DNS and sanity are restored!


Building a PC, Part VIII: Iterating

September 17th, 2015
iPhone single core geekbench results

The last time I seriously upgraded my PC was in 2011, because the PC is over. And in some ways, it truly is – they can slap a ton more CPU cores on a die, for sure, but the overall single core performance increase from a 2011 high end Intel CPU to today’s high end Intel CPU is … really quite modest, on the order of maybe 30% to 40%.

In that same timespan, mobile and tablet CPU performance has continued to just about double every year. Which means the forthcoming iPhone 6s will be almost 10 times faster than the iPhone 4 was.

Remember, that’s only single core CPU performance – I’m not even factoring in the move from single, to dual, to triple core as well as generally faster memory and storage. This stuff is old hat on desktop, where we’ve had mainstream dual cores for a decade now, but they are huge improvements for mobile.

When your mobile devices get 10 times faster in the span of four years, it’s hard to muster much enthusiasm for a modest 1.3× or 1.4× iterative improvement in your PC’s performance over the same time.

I’ve been slogging away at this for a while; my current PC build series spans 7 years:

The fun part of building a PC is that it’s relatively easy to swap out the guts when something compelling comes along. CPU performance improvements may be modest these days, but there are still bright spots where performance is increasing more dramatically. Mainly in graphics hardware and, in this case, storage.

The current latest-and-greatest Intel CPU is Skylake. Like Sandy Bridge in 2011, which brought us much faster 6 Gbps SSD-friendly drive connectors (although only two of them), the Skylake platform brings us another key storage improvement – the ability to connect hard drives directly to the PCI Express lanes. Which looks like this:

… and performs like this:

Now there’s the 3× performance increase we’ve been itching for! To be fair, a raw increase of 3× in drive performance doesn’t necessarily equate to a computer that boots in one third the time. But here’s why disk speed matters:

If the CPU registers are how long it takes you to fetch data from your brain, then going to disk is the equivalent of fetching data from Pluto.

What I’ve always loved about SSDs is that they attack the PC’s worst-case performance scenario, when information has to come off the slowest device inside your computer – the hard drive. SSDs reduced the variability of requests for data massively. Let’s compare L1 cache access time to minimum disk access time:

Traditional hard drive
0.9 ns → 10 ms (variability of 11,111,111×)

SSD
0.9 ns → 150 µs (variability of 166,667×)

SSDs provide a reduction in overall performance variability of 66×! And when comparing latency:

7200rpm HDD — 1800ms
SATA SSD — 4ms
PCIe SSD — 0.34ms

Even going from a fast SATA SSD to a PCI Express SSD, you’re looking at a 10x reduction in drive latency.

Here’s what you need:

These are the basics. It’s best to use the M.2 connection as a fast boot / system drive, so I scaled it back to the smaller 256 GB version. I also had a lot of trouble getting my hands on the faster i7-6700k CPU, which appears supply constrained and is currently overpriced as a result.

Even though the days of doubling (or even 1.5×-ing) CPU performance are long gone for PCs, there are still some key iterative performance milestones to hit. Like mainstream 4k displays, mainstream PCI express SSDs are an important milestone in the overall evolution of desktop computing.


Code the Town! Investing in the next generation of programmers in Austin, TX

August 27th, 2015

Austin, TX is a hotbed for technology. You can find a user group for just about any technology and purpose, meeting almost any day of the week.

And now, there is a group that intersects with giving back to the community and helping the next generation of programmers. Code the Town is a group that does just that. Clear Measure, and other companies, are sponsors of the group. The official description is:

“This is a group for anyone interested in volunteering to teach Hour of Code https://hourofcode.com/us in the Austin and surrounding area school districts. The goal is to get community volunteers to give the age appropriate Hour of Code to every student at every grade level. We want to have our own community prepare students for a technology-based workforce. We also want to build a community of professionals and students that have a passion for coding and teaching. We want to begin the Hour of Code in the high schools first. High school students would then be prepared to teach the younger students. Once this group has momentum, it will be able to form motivated teams and use software projects done for local non-profit organizations to not only reinvest in our community but also to help our youth gain experience in software engineering. Whether you are a student, parent, educator, or software professional, please join our Meet Up! This will be fun! And it will have a profound impact on the next generation.”

The long term vision is to create a sustainable community of professionals, educators, parents, and students that continually gives back to local community organizations through computers and technology while continually pulling the next generation of students into computer programming.

It all starts with some volunteers to teach students the basics of computer programming. In the 1990s, the web changed the world. Now, we have hand-held smartphones and other devices (TVs, bathroom scales, etc.) that are connected to computer systems via the internet. In the next decade, almost every machine will be connected to computer systems, and robotics will be a merging of mechanical engineering and computer science. Those who know how to write computer code will have a big advantage in a workforce where the divide between those who build and create and those who service what is created might get even bigger than it already is.

Code the Town will focus on introducing students to computer programming and then pull them together with their parents, their teachers, and willing community professionals to work on real software projects for local non-profits. In this fashion, everyone gets something. Everyone gives something, and everyone benefits. If you are interested in this vision, please come to the first meeting of Code the Town by signing up for the Meetup group.


Docker on Western Devs

August 23rd, 2015

In a month, I’ll be attempting to hound my share of glory at MeasureUP with a talk on using Docker for people who may not think it impacts them. In it, I’ll demonstrate some uses of Docker today in a .NET application. As I prepare for this talk, there’s one thing we Western Devs have forgotten to talk about. Namely, some of us are already using Docker regularly just to post on the site.

Western Devs uses Jekyll. Someone suggested it, I tried it, it worked well, decision was done. Except that it doesn’t work well on Windows. It’s not officially supported on the platform and while there’s a good guide on getting it running, we haven’t been able to do so ourselves. Some issue with a gem we’re using and Nokogiri and libxml2 and some such nonsense.

So in an effort to streamline things, Amir Barylko created a Docker image. It’s based on the Ruby base image (version 2.2). After grabbing the base image, it will:

  • Install some packages for building Ruby
  • Install the bundler gem
  • Clone the source code into the /root/jekyll folder
  • Run bundle install
  • Expose port 4000, the default port for running Jekyll

With this in place, Windows users can run the website locally without having to install Ruby, Python, or Jekyll. The command to launch the container is:

docker run -t -p 4000:4000 -v //c/path/to/code:/root/jekyll abarylko/western-devs:v1 sh -c 'bundle install && rake serve'

This will:

  • create a container based on the abarylko/western-devs:v1 image
  • export port 4000 to the host VM
  • map the path to the source code on your machine to /root/jekyll in the container
  • run bundle install && rake serve to update gems and launch Jekyll in the container

To make this work 100%, you also need to expose port 4000 in VirtualBox so that it’s visible from the VM to the host. Also, I’ve had trouble getting a container working with my local source located anywhere except C:\Users\myusername. There’s a permission issue somewhere in there where the container appears to successfully map the drive but can’t actually see the contents of the folder. This manifests itself in an error message that says Gemfile not found.

Now, Windows users can navigate to localhost:4000 and see the site running locally. Furthermore, they can add and make changes to their posts, save them, and the changes will get reflected in the browser. Eventually, that is. I’ve noticed a 10-15 second delay between the time you press Save and the time the changes actually get reflected. Haven’t determined a root cause for this yet. Maybe we just need to soup up the VM.

So far, this has been working reasonably well for us. To the point where fellow Western Dev Dylan Smith has automated the deployment of the image to Azure via a PowerShell script. That will be the subject of a separate post. Which will give me time to figure out how the thing works.
