To err is human...
... but computers make it worse. We were led to this thought by some recent experiences
which were reinforced by an article by Fukui Susumu in the latest Gekkan Go World
(which, incidentally, also has a nice but poignant feature by Nakayama Noriyuki on the
end of his 25 years of visiting overseas tournaments - his treasure chest is now in
the Nihon Ki-in Hall of Fame!).
What Fukui was concerned with was big mistakes perpetrated in handing down game records.
He mentions first of all, as a matter of consuming shame for all professionals, the
now well-known story that it was an amateur who only recently pointed out that the
traditionally accepted final score in the Ear-Reddening Game between Honinbo Shusaku
and Genan Inseki was wrong. Instead of B+3 it should be B+2.
But there have been other egregious examples. Fukui specialises in collecting old games
and is the editor of the latest and highly recommended complete set of games by
Honinbo Dosaku. He mentions the game below. Dosaku is White against Yasui Chitetsu.
This is the traditional version. When White played 44, everyone thought that, because
it was by Dosaku, it was a deep and mysterious move. In fact it is garbage, as a
Black push at A will show. But what Fukui has found is that White 38 is a misprint.
It should be at A. Then 44 is fine, if unspectacular.
Fukui's next example is similar. There were writers who gushed about the inventiveness
of the young Honinbo Dochi playing White 16. In fact, as Fukui discovered, White 16
was actually 116 and 16 should have been at A.
Fukui gives another more complex example in some detail - a game between Genan Inseki
and Hattori Yusetsu - but we have seen enough to confirm our own experience, which is
that errors in game records are depressingly common. It is bad enough when errors
are perpetuated through the printed medium, as in the above cases, but in the database
age they are cause for great concern.
This is a message that does not get repeated enough, in our opinion, and so we will beg
your indulgence to state it here. It applies not just to databases but to wikis (and of
course to books - but the tempting facility and low cost of digital media make the problem
greater on the internet).
The danger comes when unscrupulous people copy someone else's data
without checking (or even having!) original sources. It is not just a question of
misappropriation of someone else's work - we all try to build on the shoulders of giants -
but of what remains in the permanent record. Our plea is simply for some pretty basic
scholarship. That should be a solemn duty. Of course, if a source can also be checked
against other sources, so much the better.
Although our strictures apply to any kind of data, we can perhaps claim
unrivalled experience of the hard graft of producing a
massive set of sgf records, so the problems we have encountered may best illustrate our
concerns and be of wider interest. Among those we have found are the following:
- Two different printed sources may differ. This is particularly a problem with games
between Japanese and Chinese players. The differences can be quite startling, even
to the extent that we have sometimes opted to enter both versions in our database
(in other cases we add our own commentary to explain the differences). Apart from
differences in moves there is the case of a Shimamura-Chen game where the
Japanese source said Jigo and the Chinese W+R.
- Having two printed sources at once at least makes it possible, though not easy, to spot
differences. But if you acquire a second source a long time after the first one, it
can be tough to spot differences (at least without inputting the whole game again
and running a compare program). This happened to us after our recent trip to the
Far East. One of the goodies acquired was a complete set of the Honinbo tournament games.
We were lucky that T. Mark, his memory seared with the work of inputting the original
version, spotted several discrepancies just by flipping through the pages. That doesn't
always solve the problem, of course. If the discrepancy is that more moves appear,
that's straightforward. If actual moves differ then the hunt is on for a third source!
- A printed source and an internet source may differ. This can happen in various
ways. A rather common one which caused problems a couple of years ago was when games
played on servers played out all the dame and fill-in moves (to facilitate counting,
presumably) when these were not played out in real life. This is less of a
problem now that even pros are expected to play out games to the very end.
- But there is a sort of reverse problem here. Internet versions of games often
show final ko fights over half-point kos played out. In printed versions they are
often omitted and encapsulated as "X connects the ko."
- Transposed moves are another bugbear. Often they don't seem to make much difference to
the game, but we do remember one pro who was furious that he was being portrayed as
making move A before move B - not in our database, thank goodness!
- Results are often wrong. Like the Shusaku example these can be very hard to
pick up. If the game record is complete it is possible, if tedious, to use computers
to check. But "B+2 (moves beyond XXX not known)" is problematical.
- A strange variation of the above, which we encountered this week with the Korean Yearbook,
was a game that a Korean internet source marked B+R, with the right number of moves for
Black to have played last, while the yearbook had one move fewer and W+R. This sort of thing is
surprisingly common in Chinese data. Obviously, having the tournament reports
comes in handy when you have funny results.
- A subtle problem with results arises when the source is marked "White wins", implying there
were more moves. Some people wrongly change this to W+R, but conversely some people
use W+ for White wins by resignation. That seems a poor choice to us, but it is not the
only thing fuzzy about sgf record-keeping.
- If you had to make a decision between a printed source and an internet source, we
suspect most people would go for the printed version. Certainly our experience suggests
that post-war Japanese printed sources are generally highly accurate - pre-war can be
rather ropey. Korean sources can be nightmarish and Chinese sources (until recently)
not much better. But whatever the reputation of the printed source, one thing it usually
has going for it is that it is an official record (Nihon Ki-in Yearbooks and so on).
- What happens when you find a move that is blatantly wrong (like White 16 above)
or when spurious moves are added at the end? In our view this is where the skill of
the editor comes into play. We may not be as good as Fukui, but we know how to edit.
- And what about game data? Your game says Yi Sang-hun - which one? Or you
have one game by Lee Sanghun and one by Yi Sang-hun. Are you aware they are the same?
Is the name even right in the first place? There are quite a few games out there
attributed to Jimmy Cha (Ch'a Min-su) that were actually played by Ch'a Su-kweon.
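Two of the checks mentioned in the list above, running a compare program over two versions of a game and seeing whether a resignation result fits the side that moved last, can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not a description of our own tools: the function names are hypothetical, the two short records are invented, and the parser assumes a single main line with no variations and the move as the first property in each node.

```python
import re

# One sgf move property such as ;B[pd] or ;W[dp]; an empty [] is a pass.
MOVE_RE = re.compile(r";\s*([BW])\[([a-t]{2}|)\]")


def main_line(sgf_text):
    """Return the moves as (colour, point) pairs, e.g. ("B", "pd").

    Assumption of this sketch: one main line, no variations, and the
    move is the first property in each node.
    """
    return MOVE_RE.findall(sgf_text)


def compare_records(moves_a, moves_b):
    """Return (kind, move_number) for the first discrepancy, else None."""
    shared = min(len(moves_a), len(moves_b))
    for i in range(shared):
        if moves_a[i] == moves_b[i]:
            continue
        # Simplest transposition: the same player's two moves swapped.
        if (i + 2 < shared
                and moves_a[i] == moves_b[i + 2]
                and moves_a[i + 2] == moves_b[i]
                and moves_a[i + 1] == moves_b[i + 1]):
            return ("transposed", i + 1)
        return ("differs", i + 1)
    if len(moves_a) != len(moves_b):
        # One version simply carries on longer than the other.
        return ("extra_moves", shared + 1)
    return None


def winner_moved_last(moves, result):
    """Heuristic for B+R / W+R only: did the winner make the last move?

    Returns None when the check does not apply.
    """
    if result not in ("B+R", "W+R") or not moves:
        return None
    return moves[-1][0] == result[0]


# Invented records: the second has one move fewer and the opposite result.
version_a = main_line("(;GM[1]RE[B+R];B[pd];W[dp];B[pp];W[dd];B[fq])")
version_b = main_line("(;GM[1]RE[W+R];B[pd];W[dp];B[pp];W[dd])")

print(compare_records(version_a, version_b))
print(winner_moved_last(version_a, "B+R"), winner_moved_last(version_b, "W+R"))
```

Here the comparison reports extra moves starting at move 5, yet both versions pass the last-move check, so a check of this kind can flag a conflict but cannot say which source is right. That still needs a tournament report or a third source.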
There are countless more ways in which a game record can go wrong. But, whether the original
source is digital or printed, once a digital version exists, the scope for transmission
of errors is then enormous. We don't profess to have perfect data, but at least we do buy the
printed sources and try very hard to check them and conform to them, and of course we
are always updating. Having a full-time, cross-checking team of two people plus a host of
occasional but eagle-eyed contributors helps quality control enormously. Isn't go
important enough to deserve that?
Sorry for the plug, but we do care. Hopefully Mr Fukui will not find too much fault with us.
It would be a shame if,
sometime in the future, someone were misled by faulty records spawned by a wilful
refusal to check the sources.
This is a page from GoGoD's New In Go.