Bash script for converting all files in a directory to a different charset on a Linux box


This is a simple bash script that uses the iconv binary to convert every file in a given directory from one character encoding to another. The most common use is converting a website from a local encoding to UTF-8, for example from windows-1251 to UTF-8. It saves you from typing the same command for every single file.

Installation

Usage

/path/to/dir_iconv.sh ~/txt1 cp1251 utf8 - converts all files in the directory ~/txt1 from cp1251 (windows-1251) to utf8.

To make this easier, put the file in /bin, /usr/bin or any other directory listed in the $PATH variable. The example above can then be run like this:

dir_iconv.sh ~/txt1 cp1251 utf8

The original files are kept with an .old extension.
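
If a conversion goes wrong, the .old backups make it possible to roll back. A minimal sketch, assuming the backups still sit next to the converted files; restore_old is a hypothetical helper name, not part of the script:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): undo a conversion
# by moving every "name.old" backup in a directory back over "name".
restore_old() {
    local dir=$1 backup
    for backup in "$dir"/*.old
    do
        # When no backups exist the pattern stays unexpanded; skip it.
        [ -f "$backup" ] || continue
        mv -- "$backup" "${backup%.old}"
    done
}
```

For example, restore_old ~/txt1 would put the original files back. Rolling back before a second run matters: as written, the script's loop treats the .old files as regular files and would convert them as well.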

The code

Here's the code:

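#!/bin/bash

ICONVBIN='/usr/bin/iconv' # path to iconv binary

# All three arguments are required.
if [ $# -lt 3 ]
then
    echo "Usage: $0 dir from_charset to_charset" >&2
    exit 1
fi

# Quote every expansion so names containing spaces are handled correctly.
for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        # Keep the original as $f.old, then write the converted text to $f.
        /bin/mv "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done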
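
The script handles a single directory only. A recursive variant might look like the sketch below. It assumes bash 4.4 or newer (for mapfile -d), a find that supports -print0, and iconv on $PATH; convert_tree is a hypothetical name. Unlike the script above, it skips existing .old backups and restores the original when iconv fails.

```shell
#!/bin/bash
# Sketch of a recursive variant; convert_tree is a hypothetical name.
# Usage: convert_tree dir from_charset to_charset
convert_tree() {
    local dir=$1 from=$2 to=$3 f files
    # Snapshot the file list first, NUL-separated, so odd file names survive
    # and files created during the run are not visited again.
    mapfile -d '' -t files < <(find "$dir" -type f ! -name '*.old' -print0)
    for f in "${files[@]}"
    do
        echo "Converting $f"
        mv -- "$f" "$f.old" || continue
        if ! iconv -f "$from" -t "$to" "$f.old" > "$f"
        then
            echo "Failed to convert $f, restoring original" >&2
            mv -f -- "$f.old" "$f"
        fi
    done
}
```

Calling convert_tree /var/www cp1251 utf8 would then walk the whole tree. Taking the file list up front is a deliberate choice here: piping find straight into the loop would let the loop rename files while find is still reading the directories.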

 

Comments:

wojmichal@...pl (08-01-2008 15:30) :
You can add an extra line after
$ICONVBIN -f $2 -t $3 $f.old > $f
to remove the temporary files:
rm -f $f.old

Agustincl (14-04-2009 17:55) :
Perfect code.
Is it possible to make it recursive?

Acl.

Jakub Rulec (10-01-2011 21:11) :
Hi guys,
try this ... it's tested and working:

First step (ICONV):
find /var/www/ -name '*.php' -type f | (while IFS= read -r file; do iconv -f ISO-8859-2 -t UTF-8 "$file" > "${file%.php}.phpnew"; done)

Second step (REWRITE - MV):
find /var/www/ -name '*.phpnew' -type f | (while IFS= read -r file; do mv "$file" "${file%new}"; done)

It's just the conclusion of my research :)

Hope it helps
Jakub Rulec


This page was last modified on 2024-03-28 20:44:54