Re: [alsa-devel] HG -> GIT migration

21 May 2008

At Wed, 21 May 2008 10:43:43 -0700 (PDT),
Linus Torvalds wrote:
...
On Wed, 21 May 2008, Takashi Iwai wrote:
...
Well, what I meant is about the fixes to the subsystem (say, ALSA) by
people outside.  Not every ALSA-bugfix patch goes upstream through
the ALSA tree.  You, Andrew and others pick up ALSA-fix patches
individually.  They will be missing in the ALSA subsystem tree.
Well, that's actually fairly rare, but when it happens, either:

if you didn't get the fix (ie you're just seeing random patches go 
in that happen to touch alsa), why should you then merge the WHOLE TREE 
with all my experimental stuff anyway? You can largely ignore it, 
knowing it's fixed, and when you ask me to pull, we'll have a good 
end result.

if you got the same fix as a patch, just apply it to your tree (ie just 
ignore what happens upstream). This happens all the time - people 
duplicate patches simply because two people apply it.

But the real issue here is that my tree sometimes gets ten THOUSAND 
commits during the merge window. Do you really want to pull those 
thousands of commits into your tree just for one or two possible ALSA 
fixes?
Indeed, that's the whole question.  My statement follows below.
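
One way to put numbers on that question before merging is to count what a merge would bring in. The following is only a sketch in a throwaway repository; the branch names (mainline, alsa), file names and commit messages are all made up for illustration and are not from the thread.

```shell
set -eu
# Scratch repository so the sketch runs anywhere; every name here is hypothetical.
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Helper: append a line to a file (creating its directory) and commit it.
commit() { mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" README
git branch -M mainline
git branch alsa                      # the subsystem branch forks here

# Mainline moves on: three unrelated commits and one that touches sound/.
commit "net change" net/core.c
commit "fs change" fs/inode.c
commit "sound fix" sound/pcm.c
commit "mm change" mm/slab.c

git checkout -q alsa
commit "alsa feature" sound/hda.c

# How many commits would merging mainline drag in, and how many touch sound/?
incoming=$(git rev-list --count alsa..mainline)
relevant=$(git rev-list --count alsa..mainline -- sound/)
echo "merge would bring in $incoming commits, $relevant of them touching sound/"
```

On a real tree the same two rev-list calls, pointed at your own branch and the upstream branch, give the ratio being argued about here.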
...
In _my_ tree, at least the people involved with asking me to pull end up 
also having (a) people test it and (b) aware that it's in my tree, so they 
work on trying to fix it. But if ALSA just merges at random times, neither 
of those two cases are true. Nobody will know about or test some random 
state that ALSA merged into its own tree.
Ask yourself (and ignore the ALSA parts - think of some totally 
*different* development area) which you think is better

developing in one area based on a stable base, with the people who do 
development in that area knowing about that area.

or develop on top of a churning sea of thousands of changes to other
sub-areas that you don't know anything about?

In other words, the reason I ask people to not do lots of merges is more 
than just "it looks confusing". It's literally a matter of "it's bad 
development practice". It causes problems. The confusing history is 
actually *real* - it's not just a "visual artifact" of looking at the 
result in gitk. The confusing history is a real phenomenon, and implies 
that people are doing development not based on some tested base.
Yeah, I've been always amazed by gitk graphs :)
...
And what if you need a fix for a fix that isn't in the ALSA
tree...?  IMO, either a rebase or a merge is better than
cherry-picks.
First off, I don't see why you even need cherry-picks in the first place. 
I think your argument is bogus, and you're making it because you want to 
get the end result, not because the argument is valid on its own.
Here, let's see what I committed to the sound subsystem since 2.6.24 
(ignoring merges):
    git log --no-merges v2.6.24.. --committer=torvalds sound/
and look over that list. Remember: this is not some short timeframe, this 
is over TWO whole merge windows, ie this is way more commits than we would 
normally _ever_ get out of sync over.
Realistically, which of those commits aren't (a) either already from you 
sent to me just as a way to get a quick fix into my tree without merging 
the whole thing or (b) stuff that can't just be in my tree and doesn't 
have to be in the ALSA tree until the next release?
Honestly, now: does *any* of those commits look like "we should merge all 
the other changes just because we need that commit _now_ in ALSA"?
I really doubt it.
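
For readers who want to try that kind of query without a kernel tree, here is a small sketch of the same filters (range since a tag, no merges, committer, path) in a throwaway repository. The tag v1.0, the committer names and the files are hypothetical stand-ins, not anything from the real history.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Subsystem Dev"
git config user.email "dev@example.invalid"

# Helper: commit message, file, optional committer name (author stays the same).
commit() {
  mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"
  GIT_COMMITTER_NAME="${3:-Subsystem Dev}" git commit -q -m "$1"
}

commit "release" sound/core.c
git tag v1.0                         # stands in for a release tag such as v2.6.24

commit "driver work" sound/hda.c
# Two commits applied directly by a different committer (hypothetical name).
commit "quick sound fix" sound/pcm.c "Mainline Maintainer"
commit "unrelated fix" net/core.c "Mainline Maintainer"

# Same shape as the command above: range, no merges, committer, path.
picked=$(git log --no-merges --format=%s --committer="Mainline Maintainer" v1.0.. -- sound/)
echo "$picked"
```

Only the commit that was both applied by the other committer and touches sound/ is listed, which is the list one would then read through and judge commit by commit.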
Don't get me wrong: I haven't suggested frequent rebases at all.
This thread began actually because an update of the present alsa.git
tree is required for applying my patches properly.
[BACKGROUND: We are trying to make alsa.git tree with multiple
committers.  And, the current git-rebase doesn't care about sign-offs
when a patch was committed by others.  But, this is another topic...]
However, I have to point out that a backport or backmerge is rare, but
it certainly does happen.
For example, assume that we now need to change the code that touches
device creation.  In your current tree, the driver core changed the
API.  So, we need that change as well.  However, picking
this particular change might not be enough if it's a part of a long
series of patches.
BTW, about the stability: we have an independent ALSA tree containing
only the subset of the kernel tree (the sound part).  On this, we
apply patches continuously without rebase or merge.  People except for
the development kernel testers usually use this tree.
...
So I'd seriously suggest submaintainers merge *AT*MOST* once a week, and 
preferably much much less often than that. There simply isn't any real 
reason to do it more often. Because it can cause problems.
That's why my suggested rule is:

merge with mainline at major releases
This is "safe". Yes, releases still have bugs, but on the other hand, 
they have much fewer problems than random git trees of the day, so they 
are a lot safer targets to merge.

merge with mainline if you know there are real conflicts that need to 
be resolved.
This isn't "safe", but it's about trying to resolve conflicts early, so 
at some point the downside of merging with a "random point" is smaller 
than the downside of delaying the merge!

but perhaps the most important rule is that things should never be 
*really* black-and-white, and in the end the really fundamental rule 
should be:

Use your own judicious good sense, and merge at other points as 
necessary, but just keep in mind that a merge is a big change.

Yes, merging with git may be technically really really trivial and take 
all of two seconds of your time, but:
(a) you *do* potentially get thousands of new commits that aren't 
    actually related to your work and that you probably don't know 
    well.
(b) others, when they look at your history, will have a harder time 
    following it.
so while I can give you a few guidelines, in the end those guidelines are 
just _examples_ of when merges can make sense. You need to understand what 
the impact of a merge is - and that while git makes merging technically 
pretty damn trivial most of the time, a merge should still be a big deal, 
and something you think about.
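
As an illustration of the first rule (merge at a release rather than at a random point), a merge can name a tag instead of the upstream branch tip. This is a sketch in a scratch repository; the tag v1.0 and the branch and file names are invented for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

commit() { mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" README
git branch -M mainline
git branch alsa
commit "stabilised change" kernel/a.c
git tag v1.0                         # hypothetical release tag
commit "merge-window churn 1" kernel/b.c
commit "merge-window churn 2" kernel/c.c

git checkout -q alsa
commit "alsa work" sound/hda.c

# Merge the tagged release, not whatever the mainline tip happens to be today.
git merge -q -m "Merge v1.0 into alsa" v1.0

# The post-release churn stays out of the subsystem branch for now.
behind=$(git rev-list --count HEAD..mainline)
echo "still $behind mainline commits not merged"
```

The development base is then a known, tested state, and the later commits arrive with the next deliberate merge.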
So the kinds of merges I *really* dislike are the ones that are basically 
"let's do a regular merge every day to keep up-to-date". That's fine if 
you don't do any development at all and "git pull" is just basically a 
"track the current development kernel for testing", but if it involves a 
merge, it means that there is something wrong in your development model.
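
The difference between pure tracking and merging into development can be made visible with a fast-forward-only merge. The sketch below uses a scratch repository with hypothetical branch names; with a real remote, "git pull --ff-only" in current git versions behaves the same way.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

commit() { mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" README
git branch -M mainline
git branch tracking                  # follows mainline, no local work
git branch devel                     # carries local development
commit "upstream change" kernel/a.c

# A pure tracking branch just fast-forwards: no merge commit is created.
git checkout -q tracking
git merge -q --ff-only mainline
tracking_merges=$(git rev-list --count --merges tracking)

# A branch with its own commits has diverged, so --ff-only refuses to merge.
git checkout -q devel
commit "local work" sound/hda.c
if git merge -q --ff-only mainline 2>/dev/null; then
  ff_refused=no
else
  ff_refused=yes
fi
echo "merge commits on tracking: $tracking_merges, ff refused on devel: $ff_refused"
```

When the fast-forward is refused, that is the point to stop and decide whether a real merge is wanted, instead of getting one every day by habit.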
Oh, this is really helpful.  Maybe it should be documented somewhere
as a reference...
...
But, my question is about the divergence between the development and
for-linus branches: how to apply patches that exist only in for-linus
tree back.
How often does it happen? And how big/important are those? I really think 
it's probably a "maybe once or twice a release cycle".
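
To see how often it really happens, "git cherry" can list the commits on one branch that have no equivalent patch on the other. A sketch with hypothetical branches named devel and for-linus in a scratch repository:

```shell
set -eu
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

commit() { mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" sound/core.c
git branch -M devel
git branch for-linus

git checkout -q for-linus
commit "urgent fix A" sound/pcm.c
commit "urgent fix B" sound/timer.c

git checkout -q devel
commit "new feature" sound/hda.c
# Suppose fix A was also applied to devel already.
git cherry-pick for-linus~1 >/dev/null

# '+' marks commits in for-linus with no equivalent patch in devel, '-' marks the rest.
git cherry -v devel for-linus
missing=$(git cherry devel for-linus | grep -c '^+')
echo "$missing patch(es) exist only in for-linus"
```

Here only "urgent fix B" is flagged, because fix A is recognised as the same patch even though its commit id differs on the two branches.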
And then, the actual answer can be different depending on the details. For 
example, there are really three things you can do:

ignore it. Is it a cleanup patch (like the sparse patches) or just 
fairly trivial stuff that doesn't matter in real life ("remove 
duplicated unlikely()" patch or the /proc fixups)
This is often the right thing to do. You _will_ merge eventually 
anyway, we know that. I'd expect merges to happen at least once in the 
development cycle, maybe twice.
Yes, the patch may touch the sound system, but do you really _care_ 
about it happening right now, or can you just wait until the next merge 
you do?

Well, there is another case to think about.  For example, core API changes
or changes of header files.  These happen pretty often, at each kernel
release, practically :)  And, the code I'm working on is for the next
kernel release.  So, it should follow these changes, too.  That is, I
need the top-most development tree.  This is another "divergence".
Or, I could postpone the changes touching these until the next kernel
release -- then the tree gets merged anyhow and patches can be applied
safely.  But, of course, it means the fix or improvement will be
delayed for one kernel release cycle.
...

cherry-pick it. Is it a small, simple patch that you want, but that 
isn't really worth pulling in all the other stuff that you simply don't 
know?
This isn't wrong. It shouldn't be *common*, but it's not wrong to have 
the same patch in two different branches. It makes sense if it is 
something you really want, but it's still not important or complex 
enough to actually merge everything else!

Hm, that's what I didn't consider seriously.  I thought cherry-picking
patches may cause merge errors easily.
...

and finally: merge. It really can be the RightThing(tm). Is it a 
biggish infrastructure change? Is it a series of several related and 
dependent commits?
In other words: is it something big enough that you'd rather merge 
everything else too (which at least has gotten tested together)? If so, 
merging is absolutely the right thing to do!
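
The second and third options can be combined over time: cherry-pick the small fix now, merge everything later. The sketch below, in a scratch repository with made-up branch and file names, checks the simple case where the patch applies identically on both sides; it is not a claim that cherry-picks never conflict once the surrounding code has diverged.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

commit() { mkdir -p "$(dirname "$2")"; echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit "base" README
git branch -M mainline
git branch alsa
commit "small fix" sound/pcm.c
fix=$(git rev-parse HEAD)
commit "big infrastructure change" drivers/base.c

git checkout -q alsa
commit "alsa work" sound/hda.c

# Option 2: take only the small fix; -x records the original commit id in the message.
git cherry-pick -x "$fix" >/dev/null

# Later, option 3: merge everything. Both sides made the same change, so no conflict.
git merge -q -m "Merge mainline into alsa" mainline
git log --oneline --graph
```

After the merge the fix appears twice in history (once per branch) but only once in the file, which is the "same patch in two different branches" situation described above.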

So merging on its own is not "wrong or evil" at all. Merging is a very 
good operation to do, but *mindless* merging is bad. That's really all 
that I'm really trying to argue against.
If you thought it through, and decided that yes, you really want to merge, 
then you should merge. I just think a lot of people merge without even 
thinking about all the other things it involves, just because git made it 
*so* easy to do.
Yeah, that's exactly what I feel now, too.  There is no crystal clear
guideline.  But common sense is the best guide...
Thanks,
Takashi