Re: Why did the bot stop working today?

1

For not-want of an accent, the unfoggedbot was lost.

That's a pretty grave accent aigu error.


Posted by: arthegall | 09-18-07 10:57 PM
2

You could call it an acute condition.


Posted by: ben w-lfs-n | 09-18-07 11:08 PM
3

DIE-acritic!


Posted by: Stanley | 09-18-07 11:09 PM
4

Text parsing, Ben.

Wave of the future.


Posted by: Sifu Tweety | 09-18-07 11:12 PM
5

So the unfoggedbot couldn't be arsed with fancy characters.


Posted by: teofilo | 09-18-07 11:12 PM
6

HTML entities were good enough for Jesus and they're good enough for me, Sifu.


Posted by: ben w-lfs-n | 09-18-07 11:13 PM
7

6: yeah look what happened to him, though.


Posted by: Sifu Tweety | 09-18-07 11:14 PM
8

I'm just surprised that it's ticket and not billet. Do the French favor the former, or do the two have subtly different meanings?


Posted by: Otto von Bisquick | 09-18-07 11:15 PM
9

Google I feel lucky result on "Python HTML entity parser."


Posted by: Sifu Tweety | 09-18-07 11:16 PM
10

Second result.


Posted by: Sifu Tweety | 09-18-07 11:17 PM
11

What you really want is the htmlentitydefs module, if the problem is entities. However, that comment wasn't written with entities. Had it been, there would have been no problems.
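For anyone following along in Python 3: the `htmlentitydefs` module was renamed `html.entities` there, and since Python 3.4 `html.unescape` does the whole lookup in one call. A minimal sketch; `entity_to_char` is a hypothetical helper name made up for illustration.

```python
import html
from html.entities import name2codepoint

def entity_to_char(name: str) -> str:
    """Hypothetical helper: map an entity name such as 'eacute' to its character."""
    return chr(name2codepoint[name])

print(entity_to_char("eacute"))       # é
print(html.unescape("caf&eacute;"))   # café
print(html.unescape("&oelig;uvre"))   # œuvre
```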


Posted by: ben w-lfs-n | 09-18-07 11:17 PM
12

Uh, second result.


Posted by: Sifu Tweety | 09-18-07 11:17 PM
13

After all, all of the characters &, e, a, c, u, t, e, and ; (pretend those are mentioned and not used) are perfectly good ascii already, Sifu, so why would they cause problems? Do try to keep up.
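The point is easy to check: the entity spelling survives the ASCII codec, while the raw character does not. A small Python 3 sketch:

```python
entity_form = "&eacute;"   # the entity spelling: eight plain ASCII characters
raw_form = "\u00e9"        # the character itself, é (code point 233)

print(entity_form.encode("ascii"))   # b'&eacute;'

try:
    raw_form.encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)
```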


Posted by: ben w-lfs-n | 09-18-07 11:18 PM
14

Welcome to four days ago, Tweety.


Posted by: Josh | 09-18-07 11:19 PM
15

14: that was a lovely time.

I still refuse to believe that basic ASCII text parsing is beyond the scope of Ben's prodigious gifts; if it's converting to ASCII then you're going to have discrete strings, and somebody must have run across this.


Posted by: Sifu Tweety | 09-18-07 11:21 PM
16

Even if nobody has run across this problem, of course, it would be a simple matter of taking the XHTML entity documentation and parsing that, such that one ended up with a formatted list of HTML entity strings, suitable for dumping into one's bot, for parsing.

God knows Ben is way ahead of me on this one.
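The lookup table Sifu describes roughly already ships with Python: `html.entities.codepoint2name` maps code points back to HTML 4 entity names. A minimal sketch of an escaper built on it; `to_entities` is a hypothetical name, and falling back to a numeric `&#N;` reference for characters with no named entity is a design choice of this sketch, not something from the thread.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a named entity, or &#N; if none exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through (a real escaper would also handle & and <)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(to_entities("fœtus café"))  # f&oelig;tus caf&eacute;
```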


Posted by: Sifu Tweety | 09-18-07 11:25 PM
17

You mean like a codec, sifu?


Posted by: ben w-lfs-n | 09-18-07 11:25 PM
18

I suppose I do, sure.


Posted by: Sifu Tweety | 09-18-07 11:26 PM
19

Are you proposing codecs for text strings?? Almost as bad an idea as this.


Posted by: arthegall | 09-18-07 11:27 PM
20

The problem isn't that I don't know what to do with œ (that is, & o e l i g ;, no spaces); the problem is that it broke on œ (that is, LATIN SMALL LIGATURE OE).

>>> print u'\u0153'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0153' in position 0: ordinal not in range(256)
>>>
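The same failure reproduces in Python 3 when a string containing œ is forced through the latin-1 codec, which is roughly what printing to a latin-1 terminal did in the Python 2 session above. A minimal sketch; the error wording is CPython's.

```python
oe = "\u0153"  # œ, LATIN SMALL LIGATURE OE

try:
    oe.encode("latin-1")
except UnicodeEncodeError as err:
    # latin-1 maps only code points 0 to 255 onto single bytes; œ is 339
    print(err.reason, "at position", err.start)  # ordinal not in range(256) at position 0
```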

It will be interesting to see how this comment gets transmitted, containing the characters it does.


Posted by: ben w-lfs-n | 09-18-07 11:29 PM
21

19: never prevent Ben from applying more complexity to a problem. No better way to make the poor fellow lose interest.


Posted by: Sifu Tweety | 09-18-07 11:30 PM
22

And here I was thinking that the answer to the titular question was that it had become sentient and gone on strike until Ben started paying it a living wage.


Posted by: washerdreyer | 09-18-07 11:30 PM
23

And, in fact, it got busted.


Posted by: ben w-lfs-n | 09-18-07 11:30 PM
24

19: no, I'm not.


Posted by: ben w-lfs-n | 09-18-07 11:32 PM
25

And, in fact, it got busted.

That is, the bot got busted trying to transmit it. The earlier solution with é worked; œ, no dice. The latter must be more esoteric.


Posted by: ben w-lfs-n | 09-18-07 11:34 PM
26

You should probably hire the Pinkertons.


Posted by: washerdreyer | 09-18-07 11:35 PM
27

You should probably listen to Pinkerton. That was an underrated album.


Posted by: arthegall | 09-18-07 11:36 PM
28

Also, Ben: your 24 says 'no', but your 20's "Latin-1 codec" snippet says 'yes'. Maybe in 19, "proposing" should have been "using."


Posted by: arthegall | 09-18-07 11:38 PM
29

Pinkerton is possibly the least underrated album ever recorded. I have never seen a single music critic say anything negative about it. I have never heard anyone say it is anything less than 400,000% better than anything Weezer has done since then.


Posted by: Cryptic Ned | 09-18-07 11:38 PM
30

The latter must be more esoteric.

...and the former does in fact fall in [0,256).
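The distinction comes down to code points: é is 233, inside latin-1's range of 0 to 255, while œ is 339, outside it. A quick Python 3 check; `fits_latin1` is a hypothetical helper.

```python
def fits_latin1(ch: str) -> bool:
    """True if the code point falls in [0, 256), i.e. latin-1 can encode it."""
    return ord(ch) < 256

for ch in ("é", "œ"):
    print(ch, ord(ch), fits_latin1(ch))
# é 233 True
# œ 339 False
```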

28: I thought Sifu was proposing something that would decode "é" by, say, "e" (close enough!). I don't really understand the import of 19, I guess.


Posted by: ben w-lfs-n | 09-18-07 11:42 PM
31

29: All Rivers come to an end.


Posted by: Stanley | 09-18-07 11:43 PM
32

What do you mean, for instance, by "text string"?


Posted by: ben w-lfs-n | 09-18-07 11:44 PM
33

8: Ah, answered my own question via Google. In case there are any other non-Francophones out there:

billet = train or plane ticket; ticket = métro or bus ticket
Source.


Posted by: Otto von Bisquick | 09-18-07 11:47 PM
34

My east-coast privilege (which has me commenting, instead of sleeping, at 1:50am) allows me to refer to them as "text" strings instead of what I suppose to be the more proper "character strings."


Posted by: arthegall | 09-18-07 11:53 PM
35

Where a character is what, an ascii character, a utf-8 character, a utf-16 character, or what? There is no bare "character". If you mean byte strings, then your 28 is off base. 339 does not fit in a byte. And while 339 is a character according to utf-16, it's not a character according to utf-8, but rather two: 0xC5, and 0x93.
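The byte values above can be checked directly. A Python 3.8+ sketch (for `bytes.hex` with a separator): both encodings happen to use two bytes for œ, but UTF-16 treats them as one 16-bit code unit, while UTF-8 spreads the bits across a lead byte and a continuation byte.

```python
oe = "\u0153"                     # œ, code point 339 == 0x153

utf8 = oe.encode("utf-8")
utf16 = oe.encode("utf-16-be")    # big-endian, no byte-order mark

print(utf8.hex(" "))    # c5 93
print(utf16.hex(" "))   # 01 53

# UTF-8 packs code points 0x80 to 0x7FF as 110xxxxx 10xxxxxx
cp = ord(oe)
lead = 0xC0 | (cp >> 6)       # top 5 bits -> 0xC5
cont = 0x80 | (cp & 0x3F)     # low 6 bits -> 0x93
print(bytes([lead, cont]) == utf8)  # True
```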


Posted by: ben w-lfs-n | 09-18-07 11:57 PM
36

(Anyway it doesn't fit in an 8-bit byte.)


Posted by: ben w-lfs-n | 09-18-07 11:58 PM
37

What, exactly, is the difference between ascii and utf-8? Wait, don't answer that.

By "character," Ben, I probably mean something closer to "glyph." It's the thing I see on the screen, which you can encode any which way.

I'm obviously not trying to put down the idea of encoding things as other things, in general. My 19 was meant somewhat as a joke, but mainly as an excuse to give that link to a page about Binary XML (which I once heard someone suggest would lead to a world where we had "codecs for XML," which strikes me as a uniquely awful idea). That is all.


Posted by: arthegall | 09-19-07 12:06 AM
38

& now I am curious, for what do you think a codec is appropriate?


Posted by: ben w-lfs-n | 09-19-07 12:06 AM
39

38: compressed video formats, of course.


Posted by: arthegall | 09-19-07 12:07 AM
40

or audio, or still images ...


Posted by: imposter syndrome | 09-19-07 12:12 AM
41

The anagram results for "codec ben w-lfs-n" include:

blown condo feces
snob concede wolf

So there's that.


Posted by: Stanley | 09-19-07 12:21 AM
42

simple anagram:

bén w-lfs-n -> né wolfsnob


Posted by: Cryptic Ned | 09-19-07 12:31 AM
43

Ah. But I can't properly be said to be proposing latin-1, for example. Such things are already with us.

(35 is wrong on some things, btw.)

By "character," Ben, I probably mean something closer to "glyph." It's the thing I see on the screen, which you can encode any which way.

I don't really see the conceptual difference between taking the bytes in a file and representing them as letters and taking the bytes in a file and representing them as a series of images. I mean, plainly there aren't actually glyphs, or even letters, in the file, any more than there are actually files on a hard drive. All of this is added by the ~~faculty of the understanding~~ programs that interpret things for us. And if there are multiple ways of interpreting a sequence of bytes as glyphs, if you can mark that into the sequence, then you'll be able to translate them. (Of course, if some are supersets of others, and you deal with them in such a way that you don't just have sequences of bytes but also the information that here is a character, you'll get failures, like the failure the latin-1 codec had above: it only recognizes characters that are one byte long, and it's told that some two-byte sequence is a character. Well, that won't work. But it does explain why encoding the two-byte character as two one-byte utf-8 things allows it at least to be printed, albeit as nonsense: two junk characters where "œ" should be. And that at least allows us to proceed with only a minor stumble.)
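The "printed as nonsense" effect can be sketched by taking the two UTF-8 bytes for œ and decoding them with a single-byte codec, which reads each byte as its own character. This uses cp1252 (Windows Latin-1) as an assumption about what the display side used; with strict latin-1, the second byte becomes an invisible control character instead.

```python
oe_utf8 = "\u0153".encode("utf-8")     # œ as UTF-8: b'\xc5\x93'
garbled = oe_utf8.decode("cp1252")     # each byte read as its own character
print(garbled, len(garbled))           # Å“ 2

# Strict latin-1 gives 'Å' plus the C1 control character U+0093.
print(ascii(oe_utf8.decode("latin-1")))  # '\xc5\x93'
```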


Posted by: ben w-lfs-n | 09-19-07 12:37 AM
44

I wonder why the nonsense got doubled up when the bot reported 43. Something up in the database too? I wonder, but I don't really care.
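One common cause of "doubled" junk, offered only as a hypothesis and not confirmed anywhere here, is text that gets the UTF-8-read-as-cp1252 treatment twice, say once on the way into the database and once on the way out. A sketch of that hypothesis; `misread` is a hypothetical helper.

```python
def misread(text: str) -> str:
    """One round of mojibake: encode as UTF-8, then wrongly decode as cp1252."""
    return text.encode("utf-8").decode("cp1252")

once = misread("\u0153")   # œ -> 2 junk characters
twice = misread(once)      # -> 5 junk characters
print(once, twice)         # Å“ Ã…â€œ
```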


Posted by: ben w-lfs-n | 09-19-07 12:38 AM
45

Speaking as someone whose pathetic bits of python code are littered with unicode error catching sequences, most containing the word "Fuck" and its derivatives, you have my sympathy. Can't you send stuff as UTF-8 instead, though? That way there might not be a crisis when some smartarse sends a non-latin character ...


Posted by: Nworb Werdna | 09-19-07 12:58 AM
46

What I really meant, though, was: "I didn't mean to come off as pretentious, and thank you for hosting/fixing the bot, which is pretty much the greatest thing ever." I shouldn't have pushed on the whole UTF/codec thing, since obviously you're doing us all a favor by spending a long time fixing the damn thing. As a public service, no less.


Posted by: arthegall | 09-19-07 1:06 AM
47

I'm so sorry that my one delurk crashed your new toy.


Posted by: Greengage | 09-19-07 6:04 AM
48

I'm ~~so sorry~~ bursting with pride that my one delurk crashed your new toy.

Fixed.


Posted by: DS | 09-19-07 6:18 AM
49

Frog.


Posted by: John Emerson | 09-19-07 6:19 AM
50

Not Frog! Explainer of (not even ambassador for) Frogs. Big difference.


Posted by: Greengage | 09-19-07 6:52 AM
51

w-lfs-n, the last time I had to deal with a nightmarish jumble of characters -- processing a large volume of Atom and RSS feeds, many of which misidentified their character sets -- I ended up jamming everything through Tidy first. Python has a libtidy API, right?


Posted by: snarkout | 09-19-07 7:11 AM
52

"Explainer of Frogs" would be a bitchin' title to have on a business card.


Posted by: Sifu Tweety | 09-19-07 7:24 AM
53

"Frog Whisperer" could've been good, but I think the [x]-whisperer formula is getting a bit played out. Too bad.


Posted by: DS | 09-19-07 7:26 AM
54

Greengage has gone native over there and he doesn't even know it.


Posted by: John Emerson | 09-19-07 7:31 AM
55

Greengage is a she. And she's just been asked to reorder her business cards, and is now thinking of all sorts of titles she could give herself. Thanks, Sifu.


Posted by: Greengage | 09-19-07 7:53 AM
56

You're not the person behind greenshade are you, Greengage?


Posted by: ogged | 09-19-07 7:57 AM
57

I'm hoping 'Greengage' comes from Cold Comfort Farm, and you're planning to comment in a vaguely D.H. Lawrence fashion. But simply explaining frogs will be satisfactory.


Posted by: LizardBreath | 09-19-07 8:10 AM
58

I have reluctantly concluded that it will be easier to simply invade all of the other countries on earth and force them to use English than it will be to get UTF-8 support working properly in all the software that needs it. I'm not sure if this is an argument that Ann Coulter has made, but if not she should consider it.


Posted by: Tom | 09-19-07 8:22 AM
59

Greengage is a plum, too.


Posted by: A White Bear | 09-19-07 8:23 AM
60

Isn't she, though!


Posted by: Sifu Tweety | 09-19-07 8:24 AM
61

Norba: If you check back into this thread, or notice the Tre Kroner signal we've shone on a low cloud bank the last few nights, can you help us find out whatever happened to Gunhild Larking?


Posted by: I don't pay | 09-19-07 8:31 AM
62

8 and 33:
Your definition is probably more useful, but I was going to answer that a "billet" is big, with lots of information on it, and may be sort of floppy, while a "ticket" is small, cardboardy, and simple.

---Jackmormon's French-English Phenomenological Dictionary.


Posted by: Jackmormon | 09-19-07 8:38 AM
63

Oddly, there's no specific name for lady frogs. They're all just frogs.

A bullfrog is a species. There are lady bullfrogs.


Posted by: John Emerson | 09-19-07 8:43 AM
64

63 - There are, and the lady wildcats kicked their asses.


Posted by: snarkout | 09-19-07 8:51 AM
65

61: huh? Is that me?
and, in re Ben, the thing you want is, I've just remembered, something like "sillystring.encode('latin-1','xmlcharrefreplace')" which will -- if you really want to use latin-1 -- produce something legible for all the extraneous characters. Now I will go off and search for ways to get MS word smart quotes into the feed and really scramble the bugger.
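`xmlcharrefreplace` is a real error handler in Python's codec machinery: anything the target codec cannot represent becomes a decimal XML character reference, and everything else is encoded normally. A quick Python 3 check (the sample word is made up):

```python
text = "fœtus café"
encoded = text.encode("latin-1", "xmlcharrefreplace")
print(encoded)  # b'f&#339;tus caf\xe9'
```

Note that é survives as the single latin-1 byte 0xE9; only œ gets replaced.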


Posted by: Nworb Werdna | 09-19-07 9:38 AM
66

All I can find about the luscious Gunhild is that she came from Jönköping (a town of otherwise remarkable dullness) and was fourth in something or other at the 1956 Melbourne Olympics. High jump, I think -- a score of 1m 67. She hasn't died since 2002: it's hard to get newspaper searches earlier than that.


Posted by: Nworb Werdna | 09-19-07 9:52 AM
67

Sorry about the name, I confused Nworb with Wuggie Norple or something. The town is something nobody else had, thanks.


Posted by: I don't pay | 09-19-07 9:57 AM
68

Nworb is a genius. Thanks.


Posted by: ben w-lfs-n | 09-19-07 10:22 AM
69

OK. Further: a scholarly article from the university of Malmö (http://www.idrottsforum.org/articles/tolvhed/tolvhed.html) claims that she was described by a magazine at the time (Melbourne 1956) as "swinging her well-turned thighs with feminine grace over the bar" and later remembered as "the pin-up girl for the whole olympiad".

Sorry about weird charsÖ swedish keyboard layouts have punctuation where they shouldnät.

Also ++ a pdf report from, I think, the Swedish national sport board on "The sexualisation of public space in sport" (great cover, worth a click) http://www.rf.se/files/%7B7E36FB48-BC66-4966-828E-7A65BF267A27%7D.pdf

This is quite as humourless as anyone could hope. It finds a Swedish magazine that described her as a "blonde bombshell" and another which printed pictures that contrasted her favourably with a Russian shot-putter.

She is mentioned in an article on Swedish athletes on page 48 of the 1990 yearbook of her high school, but while the index is on the web, the text isn't. And now I had better do some work.


Posted by: Nworb Werdna | 09-19-07 10:23 AM
70

Oh, and if your workplace objects to nude statues with snow on them, don't click on the pdf link above.


Posted by: Nworb Werdna | 09-19-07 10:25 AM
71

This is quite as humourless as anyone could hope

Wow, that cover! Switching off Tre Kroner and turning on the Ragebunny signal now.


Posted by: I don't pay | 09-19-07 10:29 AM
72

(Sending it as utf-8 is one of the things that causes libpurple clients to display a mess of chinese characters. Perhaps Frowner, M/tch and Emerson can get some use from that, I can't.)
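Why Chinese, specifically? A plausible mechanism, assumed here rather than confirmed for libpurple: if a client reads UTF-8 (or plain ASCII) bytes as UTF-16, each pair of bytes becomes one 16-bit code point, and pairs of lowercase ASCII letters land in the CJK Unified Ideographs block (U+4E00 to U+9FFF). A Python 3 sketch:

```python
raw = "unfogged".encode("utf-8")        # 8 ASCII bytes
misread = raw.decode("utf-16-be")       # read back as four big-endian 16-bit units
print([hex(ord(c)) for c in misread])   # ['0x756e', '0x666f', '0x6767', '0x6564']
print(all(0x4E00 <= ord(c) <= 0x9FFF for c in misread))  # True
```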


Posted by: ben w-lfs-n | 09-19-07 10:30 AM
73

72: Racist.


Posted by: M/tch M/lls | 09-19-07 10:32 AM
74

Hey, and, IDP -- it's Tre Kronor. I suppose the shock of defeat prevents Canadians from noticing this detail.


Posted by: Nworb Werdna | 09-19-07 12:18 PM
75

Yeah, that's it.


Posted by: I don't pay | 09-19-07 12:19 PM
76

56, 57, 59, 60: No, I'm not greenshade, but thanks for the link. And I'm not Cold Comfort Farm either, though it was one of my earliest favorite reads, long ago before Unfogged. (I read it when I was too young to know that a brassiere -- not doing that accent again -- was just a bra, and so when the woman who collects them is talking about finding a cool three-paneled one I thought perhaps it might be a kind of dresser.) I am a plum, though, and I ripen at this time of year. You've outed me.


Posted by: Greengage | 09-19-07 12:24 PM
77

Switching off Tre Kroner and turning on the Ragebunny signal now.

Wha? Huh? I got nuthin'. Aaargh.
[nods off again.]


Posted by: mcmc | 09-19-07 12:38 PM
78

There's a good Swedish restaurant called Tre Kroner in Chicago. It used to have fabulously cute waitresses, but that was ten or twelve years ago.


Posted by: ogged | 09-19-07 12:50 PM
79

78: so they're probably still cute, but a little bit too old for you?


Posted by: Sifu Tweety | 09-19-07 12:52 PM
80

I never signed on for grappling with text encoding issues, dammit.

The hell you didn't.

Also, this is a good read, though you probably knew all that already.


Posted by: Hamilton-Lovecraft | 09-19-07 1:39 PM
81

There's a good Swedish restaurant called Tre Kroner in Chicago. It used to have fabulously cute waitresses, but that was ten or twelve years ago.

My wife's favorite; we had breakfast there, without the kids, this past Sunday. And the Gunhildicity of the staff is still apparent. But it is, really, spelled Tre Kronor as it should be.


Posted by: I don't pay | 09-19-07 1:49 PM
82

Swedish restaurant called Tre Kroner

My mom loves this place; I like it too. The cute waitress is Slovak.


Posted by: lw | 09-19-07 1:59 PM
83

80: Not knowingly, anyway.

In other news, SUCCESS. One must first decode the deliverances of the database from utf-8, then encode them as utf-16 bigendianly.
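The two-step fix can be sketched in Python 3. `fetch_comment_bytes` is a hypothetical stand-in for whatever the bot actually reads from the database, and the assumption that the IM side wants raw UTF-16-BE bytes is taken from the comment above, not verified.

```python
def fetch_comment_bytes() -> bytes:
    """Hypothetical stand-in for the bot's database read; returns UTF-8 bytes."""
    return "Fœtus!".encode("utf-8")

def to_im_payload(db_bytes: bytes) -> bytes:
    """Decode the database's UTF-8, then re-encode as big-endian UTF-16 (no BOM)."""
    text = db_bytes.decode("utf-8")   # step 1: bytes -> str
    return text.encode("utf-16-be")   # step 2: str -> UTF-16-BE bytes

payload = to_im_payload(fetch_comment_bytes())
print(payload.hex(" "))  # 00 46 01 53 00 74 00 75 00 73 00 21
```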


Posted by: ben w-lfs-n | 09-19-07 2:44 PM
84

OT: I love the terrifying skull with braces that replaced the light-giving dildo amongst the main page icons. Way better than the robot icon I sent you, ogged.


Posted by: Sifu Tweety | 09-19-07 2:46 PM
85

Test: œ.


Posted by: ben w-lfs-n | 09-19-07 2:46 PM
86

Hm, that didn't work.


Posted by: ben w-lfs-n | 09-19-07 2:46 PM
87

Gosh, that was a lot of Chinese.


Posted by: Nathan Williams | 09-19-07 2:47 PM
88

You need more bigendæ, ben.


Posted by: Sifu Tweety | 09-19-07 2:48 PM
89

I love the terrifying skull with braces

My morning: not wasted! Consider it a placeholder until someone is moved to create a better one.


Posted by: ogged | 09-19-07 2:50 PM
90

Fœtus!


Posted by: ben w-lfs-n | 09-19-07 2:56 PM
91

Huzzah!


Posted by: ben w-lfs-n | 09-19-07 2:57 PM
92

It sure does make me feel dumb that the reason for the last ten minutes of mistakes was that, even though I knew that I had to use utf-16-be, I left out the "-be" part.
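Why the "-be" matters, at least in Python: the plain `utf-16` codec writes a byte-order mark first and uses the machine's native byte order (little-endian on x86), whereas `utf-16-be` writes big-endian bytes with no BOM. A sketch; that the receiving client expects BOM-less big-endian bytes is an assumption carried over from the fix above.

```python
import sys

plain = "œ".encode("utf-16")        # BOM + native byte order
explicit = "œ".encode("utf-16-be")  # big-endian, no BOM

print(sys.byteorder, plain.hex(" "))  # on little-endian machines: little ff fe 53 01
print(explicit.hex(" "))              # 01 53
```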


Posted by: ben w-lfs-n | 09-19-07 2:58 PM
93

You've got fœtus on your breath, Sifu.


Posted by: ben w-lfs-n | 09-19-07 2:59 PM
94

I am the Ω of baby-eating.


Posted by: Sifu Tweety | 09-19-07 3:04 PM
95

I'm still not going to parse out the html entities, Sifu.


Posted by: ben w-lfs-n | 09-19-07 3:05 PM
96

The cute waitress is Slovak.

This is now a universal truth across much of the UK.

As I discovered to my cost, when I made a joke in Czech and learned that the Czech words I was using were 'baby' Czech and really embarrassing for an adult man to be using.


Posted by: nattarGcM ttaM | 09-19-07 3:06 PM
97

Ah yes, the bot icon is teh cool. Nice job.


Posted by: bitchphd | 09-19-07 3:13 PM
98

The bot icon is cool. Did ogged do that?


Posted by: ben w-lfs-n | 09-19-07 3:14 PM
99

He claims he did. Why, do you know different?


Posted by: bitchphd | 09-19-07 3:22 PM
100

I do not know anything.


Posted by: ben w-lfs-n | 09-19-07 3:24 PM
101

Best comment #100 ever.


Posted by: bitchphd | 09-19-07 3:30 PM
102

Pah.


Posted by: ben w-lfs-n | 09-19-07 3:31 PM
103

My robot icon was better.

Not better than 100, though. That's genius.


Posted by: Sifu Tweety | 09-19-07 3:50 PM
104

85 worked for me if it was that o+e thingy.


Posted by: John Emerson | 09-19-07 4:55 PM