Converting from ISO-8859-1 to UTF-8 in Perl
When I post my observations via email, Gmail converts any Swedish characters to quoted-printable ISO-8859-1. This blog, however, is in UTF-8. This is how I translate the input from the mail message.
#!/usr/bin/perl -w
use strict;
use MIME::QuotedPrint qw( decode_qp );
use Encode qw( decode encode );
# split the mail message
my ( $headers, $body );
{
    local $/ = undef;
    ( $headers, $body ) = split( "\n\n", <STDIN>, 2 );
}
# decode the quoted-printable input
$body = decode_qp( $body );
# decode to Perl's internal format
$body = decode( 'iso-8859-1', $body );
# encode to UTF-8
$body = encode( 'utf-8', $body );
print $body, "\n";
The result is piped into a second script that formats the actual posting.
Pretty basic, eh? But until you know how, it can be a bit frustrating getting this to work.
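To see the three steps at work on a single word, here is a minimal sketch. The helper name qp_latin1_to_utf8 and the sample word are made up for illustration and are not part of the script above. It assumes the input really is quoted-printable ISO-8859-1, and it prints the resulting bytes in hex so the output does not depend on your terminal's encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::QuotedPrint qw( decode_qp );
use Encode qw( decode encode );

# Hypothetical helper (not from the original script):
# quoted-printable ISO-8859-1 text in, UTF-8 encoded bytes out.
sub qp_latin1_to_utf8 {
    my ( $qp ) = @_;
    my $bytes = decode_qp( $qp );                 # raw ISO-8859-1 bytes
    my $chars = decode( 'iso-8859-1', $bytes );   # Perl character string
    return encode( 'utf-8', $chars );             # UTF-8 bytes
}

# "smörgås" as it might arrive in a quoted-printable mail body
my $utf8 = qp_latin1_to_utf8( "sm=F6rg=E5s" );

# Show each byte in hex: the single bytes F6 and E5 become two bytes each
print join( ' ', map { sprintf( '%02X', ord( $_ ) ) } split( //, $utf8 ) ), "\n";
# prints: 73 6D C3 B6 72 67 C3 A5 73
```

If you never need the intermediate character string, Encode also exports from_to( $octets, 'iso-8859-1', 'utf-8' ), which converts the bytes in place. The two-step version keeps the same shape as the script above.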
Posted at 23:21, in the comp category.