@@ -279,19 +279,31 @@ X<-C>

The B<-C> flag controls some of the Perl Unicode features.

+ B<CAUTION:> As with the L<C<:utf8> PerlIO layer|PerlIO/:utf8>, none of
+ the features enabled by this flag or the equivalent C<PERL_UNICODE>
+ environment variable validate that input is valid UTF-8 or guarantee
+ that output is valid UTF-8. Instead, they assume input is provided in
+ Perl's internal upgraded byte encoding, and provide output in this
+ encoding, which is a superset of UTF-8 that can encode any character
+ allowed in Perl strings. (On EBCDIC systems, it is a superset of
+ UTF-EBCDIC instead.) This can result in broken Perl strings or output
+ bytes which are not valid in UTF-8. This internal encoding will be
+ referred to as C<utf8> below to differentiate it from a strict UTF-8
+ encoding format.
+
As of 5.8.1, the B<-C> can be followed either by a number or a list
of option letters. The letters, their numeric values, and effects
are as follows; listing the letters is equal to summing the numbers.

- I 1 STDIN is assumed to be in UTF-8
- O 2 STDOUT will be in UTF-8
- E 4 STDERR will be in UTF-8
+ I 1 STDIN is assumed to be in utf8
+ O 2 STDOUT will be in utf8
+ E 4 STDERR will be in utf8
S 7 I + O + E
- i 8 UTF-8 is the default PerlIO layer for input streams
- o 16 UTF-8 is the default PerlIO layer for output streams
+ i 8 :utf8 is the default PerlIO layer for input streams
+ o 16 :utf8 is the default PerlIO layer for output streams
D 24 i + o
A 32 the @ARGV elements are expected to be strings encoded
- in UTF-8
+ in utf8
L 64 normally the "IOEioA" are unconditional, the L makes
them conditional on the locale environment variables
(the LC_ALL, LC_CTYPE, and LANG, in the order of
@@ -307,22 +319,22 @@ perl.h gives W/128 as PERL_UNICODE_WIDESYSCALLS "/* for Sarathy */"
perltodo mentions Unicode in %ENV and filenames. I guess that these will be
options e and f (or F).

- For example, B<-COE> and B<-C6> will both turn on UTF-8 -ness on both
+ For example, B<-COE> and B<-C6> will both turn on utf8 -ness on both
STDOUT and STDERR. Repeating letters is just redundant, not cumulative
nor toggling.

The C<io> options mean that any subsequent open() (or similar I/O
operations) in main program scope will have the C<:utf8> PerlIO layer
- implicitly applied to them, in other words, UTF-8 is expected from any
- input stream, and UTF-8 is produced to any output stream. This is just
+ implicitly applied to them; in other words, utf8 is expected from any
+ input stream, and utf8 is written to any output stream. This is just
the default set via L<C<${^OPEN}>|perlvar/${^OPEN}>,
with explicit layers in open() and with binmode() one can
manipulate streams as usual. This has no effect on code run in modules.

B<-C> on its own (not followed by any number or option list), or the
empty string C<""> for the L</PERL_UNICODE> environment variable, has the
same effect as B<-CSDL>. In other words, the standard I/O handles and
- the default C<open()> layer are UTF-8 -fied I<but> only if the locale
+ the default C<open()> layer are utf8 -fied I<but> only if the locale
environment variables indicate a UTF-8 locale. This behaviour follows
the I<implicit> (and problematic) UTF-8 behaviour of Perl 5.8.0.
(See L<perl581delta/UTF-8 no longer default under UTF-8 locales>.)
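The section says both that listing the B<-C> letters is equal to summing their numbers and that repeating letters is redundant rather than cumulative. Because S (7) and D (24) overlap with I/O/E and i/o, that behaviour matches a bitwise OR of the listed values more closely than plain addition. The Python sketch below illustrates that arithmetic using only the letters shown in the table excerpt. The function name `c_option_value` is hypothetical, unknown letters are simply rejected, and this is not Perl's own option parser.

```python
# Numeric values of the -C option letters, taken from the table excerpt.
# Only the letters shown in this excerpt are included (an assumption of
# this sketch, not a complete list of what perl accepts).
C_FLAG_VALUES = {
    "I": 1,   # STDIN
    "O": 2,   # STDOUT
    "E": 4,   # STDERR
    "S": 7,   # I + O + E
    "i": 8,   # default PerlIO layer for input streams
    "o": 16,  # default PerlIO layer for output streams
    "D": 24,  # i + o
    "A": 32,  # @ARGV elements
    "L": 64,  # make the above conditional on the locale variables
}


def c_option_value(spec: str) -> int:
    """Return the numeric equivalent of a -C argument (hypothetical helper).

    Letters are combined with bitwise OR, so repeated or overlapping
    letters (for example "SO") are redundant rather than cumulative.
    """
    if spec == "":
        # -C on its own has the same effect as -CSDL.
        spec = "SDL"
    if spec.isdigit():
        # -C can also be followed directly by a number.
        return int(spec)
    value = 0
    for letter in spec:
        if letter not in C_FLAG_VALUES:
            # Simplification: this sketch only knows the letters above.
            raise ValueError(f"letter not handled by this sketch: {letter!r}")
        value |= C_FLAG_VALUES[letter]
    return value


if __name__ == "__main__":
    for spec in ["OE", "6", "SO", ""]:
        print(f"-C{spec}: {c_option_value(spec)}")
```

Run as a script, it prints 6 for B<-COE> and B<-C6>, 7 for B<-CSO> (the O is already part of S), and 95 for a bare B<-C>, which is the value of B<-CSDL>.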