Saturday, December 25, 2010

Learning by example

It is time for me to learn ASP.NET and AJAX.

Why ASP.NET? Because it is C# and it works well. Because there is a huge MS behind the technology. Because it is easy to get started with (there is a Visual Studio Express). Because it is simple.

So to start, I went to http://asp.net and began looking at the tutorials. They definitely help, but you cannot learn programming by reading a book (or a web site). So I fired up VS and created an ASP.NET project.

The first question is whether to use MVC or WebForms:

image

Since I do believe Scott Hanselman, and since I am a freak for lower-level, finer control, it is MVC for me.

The first steps feel fine, and the integration with VS is great: there are actually “Create a view/model/controller” commands right there! However, the problems began when I tried to play with AJAX a little bit.

I started with this tutorial and immediately ran into a problem: what should my partial page inherit from? No matter what I tried, I got either a “compilation error” or a “does not implement method” error.
In better cases, I got an “unhandled exception”.

In the end, it turned out that when you want to use

Html.RenderPartial("ListContentView", Model);



your “ListContentView” should be a partial view and not a regular view:



image



Otherwise, life is not good. :(
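For reference, in the ASPX view engine the difference shows up right at the top of the view file: a regular view is an .aspx page inheriting from ViewPage, while a partial view is an .ascx user control inheriting from ViewUserControl. A minimal sketch; the model type ListContentModel and its Items property are made-up names for illustration:

```aspx
<%-- ListContentView.ascx: a partial view. Note the .ascx extension and the
     ViewUserControl base class; ListContentModel is a hypothetical model type. --%>
<%@ Control Language="C#"
    Inherits="System.Web.Mvc.ViewUserControl<ListContentModel>" %>
<ul>
  <% foreach (var item in Model.Items) { %>
    <li><%= Html.Encode(item) %></li>
  <% } %>
</ul>
```

With a declaration like this, Html.RenderPartial("ListContentView", Model) resolves the view as a partial and the base-class errors go away.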

Sunday, November 07, 2010

Rewrite is pending

Recent news from PDC nearly caused me a heart attack: MS drops Silverlight for the web and bets on HTML5.

A few months ago, when I was choosing a platform to write my app on, Silverlight looked good:

  • C# and .Net
  • Easy to develop and debug.
  • MS is backing it, meaning that it will be widely used.
  • Can be detached from the browser and made into a standalone application.

Even worse, the detachable feature pushed me to develop a thick client, to save round trips to the server and improve performance.

 

A few months later, it does not look like such a great idea anymore. Not only do I have to rewrite the app in HTML5/JavaScript, I also have to re-architect it to make it more decoupled. On the bright side, I plan to make it a learning experience and to document the process in my blog, to practice the discipline of posting and writing.

Wednesday, September 15, 2010

Getting back

I have not worked on my project for months. :(

Now, I am back. We are at the high point of a release frenzy, but I still have an itch to scratch.

It feels good to be back…

Thursday, June 17, 2010

Why I got a Kindle DX

A few days ago I received a Kindle DX. For a long time I thought about buying one, and finally I did it. Say hello to the Kindle DX with Global Wireless. It was clear to me that I was getting the DX because of the screen size; the regular Kindle's narrow lines look annoying.

AFTER I got it, I started to think: why did I buy it? Will I start reading all my books on the Kindle? No. First of all, I like the feel of a real book better. Second, some books are unavailable on the Kindle, especially the Russian and Hebrew books that I read.

BTW, the Kindle DX does not support Unicode. Read it and weep. It is only 2010; Amazon is still not aware that there are languages other than English.

Anyway, if I'm not dropping paper books, then why get a Kindle? It was clear to me that I needed one, but why?

The first reason I thought of is that I have a bunch of PDFs and DOCs that I want to read. Currently the only options are:

  1. Print the document. Not good, as it wastes lots of paper, and it is not convenient to walk around with a bunch of pages. In addition, the next page always gets lost.
  2. Read on a computer. This means that I have to sit in front of the computer to read the document. What do you mean, I can't read lying on the sofa in the living room?
  3. Read on a laptop. It solves the mobility issue, but raises a few more:
    1. It is not as easy to read from a laptop screen as from e-ink or a book.
    2. It takes a (relatively) long time to get into reading mode.
    3. Battery life is limited.
    4. The laptop gets hot and heavy.
    5. It has to be open to read from, and it takes up space.

I've checked, and the results were as predicted: I did not read books in PDF and DOC format, apart from very rare exceptions. I hope that with the Kindle DX this will change.

Another reason is a very simple one. Getting a book from Amazon takes about 6-8 weeks using a sane delivery method and costs about $30. This means that it is a huge waste of time and money to get a single book from Amazon. On the other hand, getting a bunch of books costs a considerable amount of money.

Taken together: I rarely order books from Amazon, and when I do, I do it in batches. Now I have a Kindle. I can get a book within minutes, without paying for delivery, and cheaper than the printed edition. I have a feeling that Amazon is about to get a much larger part of my income.

“I think this is a beginning of a beautiful friendship …”

Wednesday, April 14, 2010

How good is 96% test coverage?

It is the 10th annual ribosome convention. The best ribosomes from all over the world are in the same place, taken away from their duties and day-to-day business to focus on one and only one thing:

Why THE HELL do we not get what was specified after gene translation???

It seems that the final quality of the product is pretty bad. The final thing appears to be a little "hairy". Instead of getting something like this:

 angelinajolie_face

We keep on getting stuff like this:

chimp-face-cu-16470022

Now the difference is striking, even though the total average hair length is the same.

At the previous, 9th, congress, one of the senior ribosomes suggested translation coverage tests before RNA is translated into protein. If the RNA does not pass the test, the protein is not produced. Arbitrarily, the bar was set to 51%, which is the minimal majority. The other candidates (7, 13, 21 for blackjack, and 69) did not make it.

This means that RNA arriving at the ribosome has to be checked with at least 51% coverage BEFORE the translation is done.

It is plausible to think that 51% coverage will enable us to tell "desired" from "acquired". "We need more quality!", thought the ribosomes.

However, one year later things look much less optimistic. The ribosomes have come to a sad realization: 51% is not enough. RNA coverage did not help!!! Chimps keep on being produced. Moreover, some crazy ribos reached as high as 85% coverage, but still received:

chimp-face-cu-16470022 

The ribosomes are stuck. Discussions continued until a talk given by some academic-somes. Apparently, the two species above share 96% of their DNA code!!!!

Now this must be wrong. The differences are too large!!!

How can it be????

The answer is simple: there is a huuuuuuuge difference between micro and macro. On a micro level the difference is almost unrecognizable: 4%. It cannot be caught and mapped out by 51% RNA coverage.

On a macro level the difference is huge. Not because the micro parts are different, but because they interact differently with one another.

This means that RNA-level testing and coverage have a very limited payoff. The path to a better final product goes through a better understanding of the global scope and of how things work together, much more than through lower-level testing.

This knowledge is hard to measure and does not seem to bring benefits right away. However, until we get it: same same, but different...

Tuesday, March 16, 2010

Mercurial on Windows

I've downloaded version 1.5 of Mercurial for Windows. The install was seamless, and now I've got a command line to play with.

Lucky me, the setup added “hg” to the PATH; I hate it when an installer does not do that. Since I already have a project, this is what I did:

> hg help

Help screen with many options.

> cd TestProject
> hg init
> hg status | less
'less' is not recognized as an internal or external command,
operable program or batch file.
> hg status | more
<A list of files, all marked with “?” at the beginning>
> hg add *
> hg status | more

Now all files are marked as added.

> hg commit
<Write a comment>
> hg status | more
Empty line.

I notice that there is an unrelated directory:

> hg remove BrowserTest
> hg commit
<Write a comment>

Now there is some weird zap file in my working directory. I wonder what it is…

To Be Continued …

[Edited]
The best part is that your entire repository for a given project is located in “.hg” directory!!!!

Tuesday, January 26, 2010

Communication out

I wanted to write on this topic for a long time, but a recent piece by Joel Spolsky made me do it.

Joel talks about having fewer conversations and communicating less during a software project. The point is logical: the more people involved in decision making, the longer it takes to reach a decision and the lower the project's chances of success. The proposed solution is logical as well: limit communication. Do not tell everyone what you are doing and ask their opinion. Decide in small numbers.

I am all for small numbers. Let me decide my own fate. There is one small correction though…

There is a difference between full-duplex and half-duplex communication. Computer networking distinguishes between the two; however, in day-to-day life we tend to forget about half-duplex communication, as full duplex is the default:

  • Face to face conversation is full duplex.
  • Phone is full duplex.
  • IM/email is full duplex. This one is less obvious, but the information flows both ways.
  • Even blogs/Twitter/Facebook, which once were half-duplex, are now full duplex, as everyone can leave comments.
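The duplex distinction is not just an analogy; sockets make it explicit, and you can switch a connection from full to half duplex at runtime. A minimal Python sketch (socketpair stands in for any TCP connection; the messages are made up):

```python
import socket

# Full duplex vs. half duplex on a socket pair (stands in for any TCP
# connection). shutdown(SHUT_WR) closes only our sending direction,
# turning the link half-duplex; the other direction keeps working.
a, b = socket.socketpair()

a.sendall(b"status update")    # full duplex: a -> b works
a.shutdown(socket.SHUT_WR)     # a gives up its sending side

msg = b.recv(1024)             # b still receives what was already sent
eof = b.recv(1024)             # b'' : EOF, the a -> b direction is closed

b.sendall(b"ack")              # ...but b -> a still works
reply = a.recv(1024)

print(msg, eof, reply)
a.close()
b.close()
```

The half-duplex endpoint is exactly the "communicate out" mode: it can still listen, but the channel no longer invites a reply in the other direction.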

We are so used to full-duplex communication that we tend to think of it as the only type. How is this related to software projects?

During the last project I had virtually joined another team. The team held a daily sync as the deadline approached, and it appears that I misunderstood the purpose of those meetings. I thought the meetings were there to sync team members on what they were doing, their plans, what they had done, and so on. However, the moment I said “I am working on problem A”, the immediate reaction was: "Why don’t you try this”, “You should do that”, “Try to turn this flag on”, and so on.

I wasn’t asking for help or directions! I was telling my status so they would know.

This is the major difference between full-duplex and half-duplex communication. It is good and desirable to communicate out your status, your plans, what you did, and why you are doing this and that. However, this communication is half-duplex. You are not interested in others’ opinions, unless of course you are going full speed into an abyss. People have to understand that their opinions are not required; however, they do need to know what their peers have been up to. This knowledge is essential both to prevent duplicate work and to provide a bigger picture.

Communicate out. Do not ask for permission.

Saturday, January 16, 2010

System Thinking – 2

Recently I got an email from one of my managers saying: “If the team solves K bugs this week, I will make you breakfast.” It can be food, it can be money, or it can be sex; it does not matter. The idea is: you work hard, do something that is not possible otherwise, and you get a reward.

This is a good case to apply system thinking! 

Managers sometimes think (and my guess is that this manager in particular thinks) that the dynamics look like this:

systemthinking_motivation_simple

Meaning that raising the reward/incentive will increase the amount of hard work that people are putting in.

There are a number of problems with this approach; for example, it assumes that people aren't working hard enough already. However, I will concentrate on the connection between reward and hard work.

Based on my observations, this method does not work; at least it didn't in this particular case. It does sound logical, so why doesn't it work?

More people-aware managers find out that the connection between reward and hard work is not direct, but rather looks more like this:

systemthinking_motivation_middle

Meaning that if motivation is low, increasing the reward will not really increase the amount of hard work. The reward influences motivation, which in turn influences hard work.

This view is much more realistic and sometimes appears to work. It even shows a basic problem with this approach: the connection between reward and hard work goes in both directions. Once the reward is gone, the hard work is gone as well.

Even though the above view works from time to time, it is still simplified. One sign of the simplification:

  • There are people who do not work for a reward/incentive. At least not a material one.
  • It only works temporarily. At some point people will not work hard even if the reward is substantial.

The above two observations together mean that there is another relation that is not covered by the above view.

A view closer to reality looks something like this:

systemthinking_motivation_full

Notice that there is a motivation->hard work->pride self-feeding loop: motivated people work harder and take pride in their work, which motivates them again.

The inner loop is the one targeted most of the time. It is also the easiest one to target. However, notice that the larger the “greed”, the lower the motivation. This means that increasing the reward will increase the “greed” but decrease the motivation. Sometimes it will work; sometimes, though, the reward will be increased, yet motivation will drop so much that people will not work any harder. This happens when there is no trust between developers and management.

What happened to that manager? He got promoted.

Monday, December 14, 2009

System thinking – 1

One concept that always fascinated me is the whole-parts thing. How the parts become a whole, and how you can disassemble the whole into parts.

System thinking is exactly about that. Usually, when we try to analyze a system, we break it into parts and try to understand each part. Once this is done, we hope to understand how the whole thing works together. However, there are times when breaking things into parts misleads.

This is where System Thinking comes into play (bing for it). It says that sometimes you are better off thinking holistically instead of zooming in as much as possible.

This is an example from software development.

Sometimes it seems that every fixed bug causes 5 regression bugs. In this case, the most common response is to put more checks on the code before check-in. This is how it looks graphically:

 

systemthinking_regressionBugs_small_small

What it means is that when "Regression bugs" rise, the process increases. Notice the "+" on the arrow from the bugs circle to the process circle. This is something that we decide.

The arrow from process to bugs is something that we anticipate. The arrow is marked with "-", meaning that when the process increases, we anticipate the number of regression bugs to go down. Notice that the number of "-" signs in this loop is odd, meaning this is a balancing loop; in other words, it will stabilize itself.

This is "divide and conquer" approach.

However, from experience, it does not always work out like this. If we turn to holistic system thinking, we get another picture:

systemthinking_regressionBugs_big_small

This picture introduces another feedback loop. The new loop has 2 (an even number of) "-" arrows in it. This means that it is a reinforcing loop, and it can go wild.

Clearly, this is a simplified view of the process, but even in this simplified view there are two loops that battle one another.

If the reinforcing loop wins, then introducing more process will not make the situation better. In that case, it is probably better to educate people to fight off this loop.
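The odd/even sign rule can be played with in a toy simulation. This is my own sketch; the coefficients and the loop strength are made-up numbers for illustration, not anything from the diagrams. With a weak second loop, the balancing loop wins and bugs die out; past some strength, the reinforcing loop takes over and bugs grow without bound:

```python
# Toy simulation of the two loops. All coefficients are illustrative.

def simulate(reinforcing_strength, steps=30):
    bugs, process = 10.0, 1.0
    for _ in range(steps):
        process += 0.1 * bugs                   # bugs -(+)-> process: more bugs, more checks
        bugs -= 0.5 * process                   # process -(-)-> bugs: checks catch bugs
        bugs += reinforcing_strength * process  # second loop, two "-" arrows collapsed into
                                                # one net "+": heavy process wears people
                                                # down, and worn-down people create bugs
        bugs = max(bugs, 0.0)
    return bugs

print(simulate(0.1))  # weak second loop: the balancing loop wins, bugs reach 0.0
print(simulate(0.9))  # strong second loop: the reinforcing loop runs away
```

The interesting part is that nothing in the "process" policy changed between the two runs; only the strength of the loop we usually ignore did.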

to be continued ...

Tuesday, March 31, 2009

API not changes

I have been to a Configuration Management meeting this week. The topic was pre-check-in validations. I will write about them later, but now I want to talk about a small sub-topic that was raised during the meeting: the check-in description.
For example, I check in some file, let's say README.txt. A little window, 19 inches wide, pops up where I have to fill in a few fields. Some of them are completely sensible, and some are over the top in my opinion. The practice of describing the check-in itself is totally common; once I myself defined a log format and implemented a system for enforcing long-enough comments. However, the devil is in the details.
In this setup, there are quite a few more fields than just a formatted description. For example:
  • Code reviewer. What the hell?!?! If you don't trust me enough to check in the code, why pay me money at all?
  • Unit tests: passed, failed, not applicable. Once again: if I, as a professional developer, decide that this code is good enough to be checked in, then let me decide whether unit tests should be executed or not. Do not make it a must.
  • UI changes. I can understand the need for it, but does it mean that I check in code that should have a UI but does not? Meaning that there is no way to use this code right now? It also means that I have checked in partial code, as some of it (the UI part) is missing, which is exactly what this field is about.
There were other fields, but then one of the managers wanted a new checkbox called "API change". The reasoning was: "I want to know when the API has changed, as it is important. It can break other modules."
First: if you check in code that breaks other modules, then you are doing it wrong. Checkbox or no checkbox.
Second, API changes are not dangerous. They are visible: they break the build, things stop working. The dangerous changes are those that do not change the API but change the behavior of the code. The class signature remains the same, but its logic changes. These are the dangerous changes.
So maybe there should be a checkbox: "Dangerous change". If the developer feels that the code he just checked in is dangerous, he should check it. Probably, there should also be a field explaining why this code is dangerous. Maybe another field with the name of the person who approved the check-in of dangerous code.
This idea can be taken even further. There might be a checkbox called: "There is a bug in this checkin".

Suckodrom #1: Transition to IPv6

Talking down IPv6 is like kicking an already knocked-out man in da balls. Imagine: you are one of the top professionals; you defined the new IP standard of the future. You published it to the world (in 1996!!!). Nobody paid any attention. Years passed by. Slowly it takes off. Every year it seems like it will burst forward within the next year or two. It doesn't. And then some anonymous blogger, which would be me, comes along and posts a criticism of your 12-year-old work. Luckily, nobody reads this blog, so in general, me talking down IPv6 is like kicking a knocked-out man in da balls in a dark alley where nobody can see me. Not something I will tell my parents about.
I have no problem, yet, with the IPv6 protocol per se. I think that we are unlucky to be stuck with IPv4. However, the transition to IPv6 sucks big time. Clearly, the two protocols (4 and 6) will run side by side for years and years. And then IPv4 will continue to run in data centers, disguised as IPv6, for even more years. So you would imagine that the transition is a pretty important subject. It is, and it has been under intensive research; just take a look at the IPv6 Wikipedia entry to see the number of references and RFCs. Unluckily, the transition still sucks.
  1. NAT-PT. It was designed as a device/software that translates from IPv4 to IPv6 and vice versa in a completely transparent way. In addition to NAT-PT's architectural problems, there is a performance issue: it is slow. It slows down the traffic and does not make IPv6 look good.
  2. Tunneling technologies. There are a few of those: ISATAP, 6to4, and Teredo. There is an inherent problem with all tunnels: they waste bandwidth. They make IPv6 "feel" slower than IPv4, which reduces the drive to move to the new technology.
  3. End-to-end IPSec. IPv6 hosts must support IPSec. The problem is that end-to-end authentication cannot work with NAT-PT, which means there are severe limitations on topology. If an IPv6 host wants to talk to an IPv4 host with end-to-end IPSec, the traffic cannot pass through NAT-PT. And if your traffic does pass through NAT-PT and you have to use IPSec (after all, it is a must for IPv6), then what do you do? You fall back to IPv4.
The world will move to IPv6 in time; South Korea and Japan are good examples. However, it does suck that the transition itself puts potential customers off the protocol.
This post is short. I am still getting on track with this blog thing.

Comrade Trotzky

I am a newbie in MS. As a good American company, MS appointed someone to be my buddy. This idea is not specific to MS, but rather common in the industry. The buddy's job is to teach me the company ways, answer my questions, take me to lunch, and guide me through the first days (or maybe weeks, if I am a slow learner) in the company. Managers usually choose one of the nicest people on the team to be a buddy for a newcomer.

The newcomer is in a vulnerable situation: he does not know much about the group, he has no idea what the problems of the product are, he hasn't seen the code, he has never debugged the product, his development environment is not set up, when the build fails he does not know what to do, he is not aware of all the possible tools, and he has no idea who is who on the team, so he might take advice from the dumbest person around, who was fired a week ago but still cannot find his way to the elevator.

A buddy remedies this situation. Suddenly there is someone from the "inside" who can explain the situation to the newcomer. It is much less threatening to enter a team this way, as it is like you already have a friend. It is like you are coming to a party. Great idea, this buddy thing. There are a few caveats though.
  1. The newbie and the buddy should be on the same intellectual level. Neither "smart newbie and dumb buddy" nor "dumb newbie and smart buddy" will work. It is either "dumb and dumb" or "smart and smart"; it can't be any other way.
  2. The "smart and smart" situation is problematic, as in many cases you get competition: the newbie has to prove his worth, while the buddy has to establish himself as the alpha developer, in line with his buddy position.
  3. Companies in general ignore social dynamics related to the sexes. For example, if the newbie is a "he" and the buddy is a "she", in many cases the newbie will work harder to prove himself than if the buddy were a "he". The opposite case (a "she" newbie and a "he" buddy) works much better, as the buddy will go the extra mile to impress the newbie.
So the problem is when you hire a smart "he" developer. You cannot appoint him a dumb buddy, as the buddy will spend the first week teaching the newbie how to use the coffee machine, boring the poor newbie to death. You can appoint a smart "he" or "she" buddy, but be ready for some friction, which kind of negates the idea of the buddyship.
Obviously, a manly guy like me found himself in a "smart and smart" situation. My buddy proved that he is a smart guy by brilliantly playing his part. He came to me after I had been a week at the job and said:
"Hi. I am your buddy. If you have questions you can ask me."
He never came to me again, but he was ready to answer my questions.
This is the way!!!
He showed me the real, hard day-to-day work:
Trotzky punishing two sisters
Not some pansy "eeha, we are having such a great time here, we are such nice people!" attitude. This is the way. Respect your newbies. Throw them into deep water and let them swim. If they have a question, they will come. Not everyone will survive, but those who do will become your true comrades.
Not like comrade Trotzky.

Wednesday, December 26, 2007

Serious != Professional

Compare those two code snippets:

bool operator==(const AClass& first, const AClass& second) {
    return (first.weight == second.weight) && (first.value == second.value);
}

vs.

bool operator==(const AClass& apple, const AClass& orange) {
    return (apple.weight == orange.weight) && (apple.value == orange.value);
}

The meaning is the same, but the second one actually compares oranges to apples!!! Geek humor, weeee!!!

Or consider a description of these two use cases in a design document:

The user should be able to save custom records in the system and retrieve them at any time. Records should be kept on non-volatile, persistent storage. Retrieval rates should be less than one second for the first item. If a number of items were saved together, then subsequent items within a set should be retrieved within 0.1 sec.

vs.

Our prospective customer, Moses (Let My People Go), wishes to keep 10 commandments persistent. Since his last accident with the stone scrolls, he wishes to give our system a shot. Apart from making his record persistent, he wishes to be able to retrieve any commandment in less than one second, in case he forgets one and is wondering whether he is allowed to do something. If he saved a set of commandments, then the subsequent ones should be retrieved within 0.1 sec, to keep the flow.

Granted, the second one has more words: 58 in the first version vs 86 in the second. On the other hand, it is easier to read and it keeps you awake, while the first version actually puts the reader's mind in a dormant state. I know this state: my eyes read the document, and the words pass through my brain without stopping for a cup of coffee or a pee. If the document is important, I have to read it a number of times to get the meaning to stay.

The second version, on the other hand, is so ridiculous that it is an easy read and something stays in your mind. I think the reason something stays is that it is much easier to imagine Moses trying to save 10 commandments than a "customer" wishing to save his "record". God knows what those records hold.

Another comparison:

image

vs.

image
They communicate the same message, but the second one is funnier. In a not-funny-joke way. The important thing to notice here is that the task of understanding the design/code/diagram is hard enough by itself. If the means can be made lighter, then the reader can concentrate on the task at hand.

The point I'm getting at is that things might be funny, but that does not make them content-free or unprofessional. Many people feel that they have to be serious to be taken seriously. This is not the case. One should enjoy his work and have fun doing it. This is the way to do it better than good; this is the way to do it great.

You should have fun while doing your job and the best and easiest way to have fun is to crack a joke once in a while.

Saturday, November 17, 2007

Sure sign of a scalable program

I work for a company that develops a scalable, high-performance, clustered server.
Last month I found myself struggling to explain the difference between scalable, high-performance programs and programs written "just to work".

Developing a small program is like building a tree house, while creating a huge, multi-module, gazillion-line monster is like constructing a skyscraper. The tasks are so different that it is hard to even compare them. The same goes, so I feel, for the comparison of a high-performance clustered system with a small, standalone utility. At the same time, explaining why this is the case was just like explaining why water is wet: something that you know but cannot put into words. In other words, I was feeling stupid.

This got me thinking: if I had to pick one single test of whether a program is good enough to work under pressure, what test would it be?

Since the Intel architecture now rules supreme in desktops and server rooms, let me go over a little bit of history here.
The DOS and 8086 days. Computer memory lies wide open before each running program: it starts at 0 and goes up to whatever you bought. The upper 384KB is reserved for system usage; however, your program can still access it, and crash the computer, of course. What was "memory management" in those days? The OS (DOS) doesn't really manage memory; it just gives all available memory to the currently executing program. Once program A has finished, program B is loaded into exactly the same location where A was just a moment ago. There are no restrictions on the memory programs can access. They can change DOS code or do whatever they want. Accessing others' memory is considered rude though. Kind of like peeing in a pool.
From the program's point of view, once memory is allocated, it is always there; access to it is fast and constant. How are the malloc()/free() functions implemented? Easy. All you have to do is find the beginning of system memory, and that's it: the memory from the end of the program to the start of system memory is yours to manage. The OS doesn't care what the memory manager does with it.
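That DOS-era allocator really is a few lines of code. Here is a toy model, not real DOS code: the addresses are invented, and free() is omitted, since in the simplest bump scheme nothing is reclaimed until the program exits:

```python
# Toy DOS-era memory manager: the region between the end of the loaded
# program and the start of system memory is the heap, and malloc() is
# just a bump pointer. All addresses below are made up for illustration.

PROGRAM_END  = 0x2000    # first free byte after the program image
SYSTEM_START = 0xA0000   # reserved upper memory starts here

class BumpAllocator:
    def __init__(self):
        self.next_free = PROGRAM_END

    def malloc(self, size):
        if self.next_free + size > SYSTEM_START:
            return None              # out of memory: we hit system memory
        addr = self.next_free
        self.next_free += size       # bump the pointer; no per-block bookkeeping
        return addr

heap = BumpAllocator()
a = heap.malloc(1024)
b = heap.malloc(1024)
print(hex(a), hex(b))   # 0x2000 0x2400: consecutive blocks
```

A real malloc keeps free lists so free() can recycle blocks, but the point stands: with a single program owning all memory, allocation is trivial.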

Time went by and 640KB is not enough for everybody anymore. Each program now wants lots of memory all for itself and forever. 80386 had exactly the thing: virtual memory + memory protection. Each program can now run in its own memory space. You want to zero all your memory? Go wild, nobody else will be hurt. As a matter of fact, 80286 had preliminary support for protected mode. However, due to a number of mistakes, such as inability to switch it on and off without loosing memory contents, and inability to access BIOS, rendered the mode unusable.
The 80386 provided everything needed to put each program in its own luxurious apartment, from a memory point of view:
  • 32-bit virtual memory space
  • paging
Virtual memory is simple. Each program has a memory chunk starting from 0 and going up to 2^32. Neat. There is one "problem" though: what if a computer with 4MB of RAM runs a dozen programs, each consuming 1MB of RAM? Obviously, there is not enough RAM for all of them. Who gets the memory and who doesn't?

This is where paging comes into play. Paging basically says: "I have a limited amount of physical memory, much less than I let my programs allocate. I will put some of the memory on a disk (it can be any secondary storage, but usually it is a disk) and bring it back once it is needed. If more RAM is needed, I will transfer (page out) some of the memory to the disk." This is what the OS says to itself, and the good thing is that the processor backs it up. If a program tries to access memory that is currently on disk, the OS should get a notification that something is wrong. The hardware does this, and the notification is called a "page fault" interrupt.

So, your program accesses some variable that at the time happens to be on disk. The hardware recognizes this situation and raises the special interrupt. The OS gets the interrupt and loads the requested page from the disk into memory. Your program is dormant at the moment and unaware of this process. The disk might be busy serving other requests, but your program doesn't care; it is in a better place right now, probably drooling or in Tumbolia. In the end, the page is loaded into memory and the OS continues your program from the same spot. From its point of view, the variable was accessed just like any other variable in memory.

One thing is missing from this description: how does the hardware "know" that this address is not in memory? There has to be some sort of dialog between the OS and the hardware about which addresses are in memory and which are on disk. This is a good question and will be answered later.
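To preview the answer: the dialog happens through the page table, where each virtual page has an entry with a "present" bit. Here is a toy Python model of the mechanism described above; the class, the flat dict, and the frame numbers are all illustrative (real page tables are multi-level hardware-defined structures):

```python
# Toy model of the OS/hardware dialog: a page-table entry carries a
# "present" bit. Access to a non-present page raises a page fault; the
# "OS" loads the page from disk, updates the entry, and the access is
# retried. Names, sizes, and frame numbers are made up for illustration.

PAGE_SIZE = 4096

class TinyMMU:
    def __init__(self):
        self.page_table = {}  # virtual page number -> (present, frame or disk slot)
        self.faults = 0

    def map_on_disk(self, vpn, disk_slot):
        self.page_table[vpn] = (False, disk_slot)

    def load_from_disk(self, disk_slot):
        return disk_slot + 100           # pretend we allocated a physical frame

    def access(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        present, where = self.page_table[vpn]
        if not present:                  # hardware raises the page-fault interrupt...
            self.faults += 1
            where = self.load_from_disk(where)   # ...the OS handler loads the page
            self.page_table[vpn] = (True, where)
        return where * PAGE_SIZE + offset        # translated physical address

mmu = TinyMMU()
mmu.map_on_disk(vpn=5, disk_slot=7)
mmu.access(5 * PAGE_SIZE + 42)   # first touch: page fault, page brought in
mmu.access(5 * PAGE_SIZE + 99)   # second touch: present bit set, no fault
print(mmu.faults)                # 1
```

The program only ever sees virtual addresses; whether a given access costs nanoseconds or a disk round-trip depends entirely on that present bit.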

As you can see, just accessing memory during normal execution can have unpredictable latency. You never know when the accessed variable is on disk and when it is in memory. Most programs live happily with this uncertainty: they want an abstraction of 4GB of flat memory and do not care what is going on under the hood. Take a web server, for example. It has to be as fast as possible, but is memory being paged in and out really the bottleneck here? Here are some ping results:

> ping www.yahoo.com
PING www.yahoo-ht3.akadns.net (87.248.113.14) 56(84) bytes of data.
64 bytes from f1.us.www.vip.ird.yahoo.com (87.248.113.14): icmp_seq=1 ttl=54 time=158 ms
64 bytes from f1.us.www.vip.ird.yahoo.com (87.248.113.14): icmp_seq=2 ttl=54 time=158 ms
64 bytes from f1.us.www.vip.ird.yahoo.com (87.248.113.14): icmp_seq=3 ttl=54 time=168 ms
64 bytes from f1.us.www.vip.ird.yahoo.com (87.248.113.14): icmp_seq=4 ttl=54 time=161 ms

> ping www.google.com
PING www.l.google.com (64.233.183.99) 56(84) bytes of data.
64 bytes from nf-in-f99.google.com (64.233.183.99): icmp_seq=1 ttl=244 time=94.1 ms
64 bytes from nf-in-f99.google.com (64.233.183.99): icmp_seq=2 ttl=244 time=92.6 ms
64 bytes from nf-in-f99.google.com (64.233.183.99): icmp_seq=3 ttl=244 time=97.4 ms

> ping www.cnet.com
PING c18-ssa-xw-lb.cnet.com (216.239.122.220) 56(84) bytes of data.
64 bytes from c18-ssa-xw-lb.cnet.com (216.239.122.220): icmp_seq=1 ttl=244 time=218 ms
64 bytes from c18-ssa-xw-lb.cnet.com (216.239.122.220): icmp_seq=2 ttl=244 time=207 ms
64 bytes from c18-ssa-xw-lb.cnet.com (216.239.122.220): icmp_seq=3 ttl=244 time=206 ms

And this is only the ping round-trip time. A disk access is a matter of a few milliseconds. Do you think anyone will notice if cnet access takes 230ms instead of 208ms? Moreover, the times go up further, as each page load has images, JavaScript and sometimes Flash to fetch. All this takes a lot of time, dwarfing paging latencies to a minuscule size.

BTW, did you notice that Google has consistently lower response times than Yahoo, and their page is much "lighter"? Yahoo has to remedy this situation if they want to stay in the game.

However, suppose you are writing an application that has to be blazing fast and that serves machines on a LAN, not on the Internet. Don't talk to me about bandwidth: latency means responsiveness. If your desktop-server application is slow, it does not matter why. Your computer might be obediently trying to compress a DVD-quality movie (a task you gave it by mistake, but it's too stupid to even ask you: "Are you sure you want to compress this movie? It is incompressible, you know...") or the server might be paging like mad right now. It does not matter why your application isn't responding. It is annoying as hell. LAN client-server applications have to be fast. Mind you, not all of them, but there are some.

Take, for example, a NAS server. Basically, you ask your NAS server to read or write some data from or to a disk. If your LAN round-trip time is less than 1ms, which is usual, then the time it takes for the server to read the data dominates the latency. Let's say it is about 5ms. Now suppose that, in the process of reading your data, the server software hits a page fault a couple of times, and the OS has to read its pages from disk into memory a couple of times. Then, instead of 6ms, your request latency will be around 16ms. And this is the good case, where the disk queues were empty and the OS didn't have to write out other processes' pages in order to bring the NAS server's pages in.

Once again: your application latency tripled just because two memory pages were on disk, and your server software didn't even know about it! This problem is not disk-specific: go and read Joel Spolsky's article about The Law of Leaky Abstractions.

Such server software cannot afford the standard abstraction. It has to know exactly when and where its memory is. Now it is time to talk a little bit about malloc, free and how your (C and C++) application gets its memory.

Remember the question: how does the hardware "know" that an address is not in memory? Well, here is how. The OS records allocated pages in page tables (with the TLB, the translation lookaside buffer, caching recent translations in hardware). Since almost no program uses all 4GB of memory (and if yours does, you should really check it for memory leaks), the OS negotiates with programs on the amount of memory they consume. At every point of a program's life cycle, the OS knows exactly how much memory the program uses and thus can map the appropriate number of pages. These mappings tell the hardware which pages are present and which are missing (on secondary storage). Thus, the hardware knows when to raise the "page fault" interrupt. But how does the OS negotiate with programs over the amount of used memory?

This is where malloc and free come in handy. From the OS point of view, each program has its memory allocated from zero up to some limit. Like this:

0--------------------------------------------v
================================================================
where "=" denotes the total memory that can be used by this program, "-" denotes memory currently used by the program (and mapped by the operating system), "0" is the beginning of the memory and "v" points to the largest address currently used by the program.
If no "free" was called, i.e. all program memory is in use, then another malloc will move "v" to the right. If some memory was freed ("_"):

0----____---------------_____------------v
================================================================
then the next malloc might reuse the freed memory instead of taking new memory from the OS. If new memory has to be taken from the operating system, a special system call is invoked: sbrk. In plain English it means "system break", or "move the system break", or "i need more memory! map more memory for me! NOW!!! move it move it move it move it!!!"

So, depending on the current allocation situation, a new call to malloc might be translated into:
  1. Search for a suitable memory chunk (done by malloc rather efficiently). However, sometimes you know that your program always (or almost always) allocates memory in 4KB chunks, for example because this is the chunk size you chose for moving data. In such cases, one can write a much more efficient memory allocator than the general-case malloc.
  2. Make an sbrk call, which will cause the operating system to map more pages. This is problematic, since you have already wasted some time looking for a suitable chunk; doing a system call now adds even more to the malloc latency.
This means that if you must control the latency of your program in general, and of memory allocations in particular, then you have to:
  1. "Pin" all your memory so that it won't be paged out. This is the most important step.
  2. Pre-allocate all the memory that you will ever need. This step is tricky. Don't allocate too much, as you will bring other programs to their knees, but also don't allocate too little, as it won't be enough.
  3. Write your own, customized memory allocator. Sounds like a lot of work that has already been done by other people. True, it is a lot of work. However, you will be surprised what performance gain you can get from this thing, completely unrelated to your program's domain. Sadly, I cannot put numbers here, since they are confidential, but let me say that they are in the tens of percents. And the behavior is smoother. Everybody loves smooth behavior.
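The three steps above can be sketched as a fixed-size pool allocator: pin the memory, pre-allocate one static pool up front, and hand out 4KB chunks from a free list in O(1) with no system calls. All the names (pool_alloc and friends) are mine, this is a single-threaded sketch, and a production version would need thread safety:

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>   /* mlockall */

#define CHUNK_SIZE  4096
#define POOL_CHUNKS 256

/* Each free chunk stores the pointer to the next free chunk in its own bytes,
   so the free list costs no extra memory. */
typedef union chunk {
    union chunk *next;          /* valid only while the chunk is free */
    char payload[CHUNK_SIZE];
} chunk_t;

static chunk_t pool[POOL_CHUNKS];   /* step 2: pre-allocated once, 1MB total */
static chunk_t *free_list;

void pool_init(void) {
    /* Step 1: try to pin everything in RAM. This needs a sufficient
       RLIMIT_MEMLOCK or privileges, so treat failure as a soft warning here. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        fprintf(stderr, "warning: mlockall failed, pages may still be paged out\n");

    /* Step 3: thread all chunks into a free list. */
    for (int i = 0; i < POOL_CHUNKS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_CHUNKS - 1].next = NULL;
    free_list = &pool[0];
}

void *pool_alloc(void) {
    if (!free_list)
        return NULL;            /* pool exhausted: no sbrk, no hidden latency */
    chunk_t *c = free_list;
    free_list = c->next;
    return c;
}

void pool_free(void *p) {
    chunk_t *c = p;             /* push back on the free list, LIFO */
    c->next = free_list;
    free_list = c;
}
```

Note the design choice: allocation is a pointer pop and deallocation a pointer push, so the worst case is as fast as the best case. That predictability, not raw speed, is the whole point.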
Back to the beginning. If I had to name one sign of a scalable, high-performance program, what would it be? This: if the program has its own custom-made, hand-crafted memory allocator, then it is a scalable, high-performance program, or at the very least its programmers seriously considered making one. If no thought was given to memory allocation, then the program was written "just to work". Which is not a bad thing, by the way.

Thursday, July 21, 2005

Lesson #2: Know your tools

Before you can drive a car, you have to go through a long and tedious process. This is understood; after all, nobody expects you to master a stick shift on the first try. The funny thing is that car manufacturers do not even try to make it easier for users. The last time I checked, ads were talking about faster, more beautiful and more comfortable cars. I have never heard an "our car is easy to use" ad line.
In computers the situation is different. You hear all the time "easy to use and powerful tool" or "just buy our program and all your problems will be solved". I think that we are making a mistake, and by "we" I mean software developers. Complex things have inherent complexity in them. They have some complexity limit, and if you try to simplify things beyond that limit, you will get burned.
Microsoft Word is perceived as an extremely easy-to-use program. Launch it and start typing like a crazy monkey. You want to make a table of contents? No problem, it can do it. You want to make your section titles bold? Sure. You want to change the font of all your bold text, because those are your section titles? Oops. They need to share the same style. Oh, you didn't define or use one style for your titles. You didn't know you should. Well, tough luck. Come again next time.
My point is that even with MS Word there are a lot of things you should know. They will make your life easier, save you time and make your computer time bearable. Sadly, most people are not aware of that, partially because we, computer nerds, try very hard to make these tools appear very easy to use. And sadly, we are doing a very good job at it.

The interesting thing I have found out during my work at a startup is that the same is true of many programmers. We use CVS as our version control tool. It took about half a year until people became comfortable with branches. Even now, after about 3 years of intensive CVS work, people are making beginners' mistakes. No, I am not saying they are not smart enough to master CVS; most of them would be capable of implementing CVS. They just cannot get used to it. They are too lazy to invest time into learning CVS and prefer to save time now by asking me what should be done. What these people do not understand is that by not learning CVS they are wasting their time. By not knowing what your tools can do, you are stuck with basic functionality, wasting your time on dumb, repetitive tasks. One day I saw a guy changing one particular bit of indentation, line after line, with great patience and concentration. Each change required 5 separate operations. I made him a very simple macro, replacing 5 operations with 1. I saved him a few valuable minutes with a stupid macro. I have saved people days just by showing them a flag in CVS commands. I have turned the impossible into the possible through trivial use of grep and regular expressions. None of this makes me smarter than the people I helped. But it does make me a more valuable developer, as I can perform more tasks in the same time.
If you value your job and career, learn your tools. They will make you a better developer.

Wednesday, July 20, 2005

Lesson #1: core people

The best investment one can ever make in a startup is bringing the best people on board from the very beginning. The "beginning" part is very important here. The times when the company is small and nothing is done yet are critical and formative. You can hire a bunch of fresh graduates who know nothing about software development later on, and you will be OK. It might even be necessary later, as it is hard to get a superstar singer to perform at your bar mitzvah. But hire incompetent, or just not good enough, people as the core of your startup and you are doomed.

The core people lay the foundation of the future products. The decisions they make are like planted seeds: if they are good, you will get a healthy oak; if they are bad, all you will get is a small bush. The beginning phase is very decision-intensive: which version control to use, the development platform, coding conventions, how formal the development process should be, what documents to write and how, product architecture definitions, specs, future growth and vision. While not carved in stone, these decisions are very hard to change later, if not impossible.

In addition, your core team will set high standards for the rest of the company. Once you join a company with good programmers, your natural tendency is to rise to their level and stay there. Inexperienced programmers learn a lot from the core guys. Core people feel like mentors, and they feel good about it. The company starts with a high entry bar and keeps climbing higher.
It is always easier to slide downwards than to climb upwards.

All in all, the core team might continue with the company or not. Either way, their influence on the company's spirit and direction will last much, much longer.
Invest in your core team. Make them the best of the best. You won't be sorry later.