Thursday, November 11, 2010

MP3 ID3v1 tag reading in Perl and in Haskell

I am building an MP3 jukebox for my home...

I know that I am supposed to use ID3v2, but my MP3 collection (CD ripped, Amazon and Emusic) still sports ID3v1 tags, so I thought it would be a safe bet to just parse those.

I quickly wrote an ID3v1 tag parser in Perl (yes, I know CPAN has several solutions for this, but I wanted to write my own just for fun). Here is what it looks like:

use strict;
use warnings;
use Fcntl qw(:seek);

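# ID3v1 genre names, indexed by the genre byte
# (0-79 from the original list, 80-125 are the Winamp extensions)
my @genre = (
'Blues','Classic Rock','Country','Dance',
'Disco','Funk','Grunge','Hip-Hop',
'Jazz','Metal','New Age','Oldies',
'Other','Pop','R&B','Rap',
'Reggae','Rock','Techno','Industrial',
'Alternative','Ska','Death Metal','Pranks',
'Soundtrack','Euro-Techno','Ambient','Trip-Hop',
'Vocal','Jazz+Funk','Fusion','Trance',
'Classical','Instrumental','Acid','House',
'Game','Sound Clip','Gospel','Noise',
'AlternRock','Bass','Soul','Punk',
'Space','Meditative','Instrumental Pop','Instrumental Rock',
'Ethnic','Gothic','Darkwave','Techno-Industrial',
'Electronic','Pop-Folk','Eurodance','Dream',
'Southern Rock','Comedy','Cult','Gangsta',
'Top 40','Christian Rap','Pop/Funk','Jungle',
'Native American','Cabaret','New Wave','Psychadelic',
'Rave','Showtunes','Trailer','Lo-Fi',
'Tribal','Acid Punk','Acid Jazz','Polka',
'Retro','Musical','Rock & Roll','Hard Rock',
'Folk','Folk-Rock','National Folk','Swing',
'Fast Fusion','Bebob','Latin','Revival',
'Celtic','Bluegrass','Avantgarde','Gothic Rock',
'Progressive Rock','Psychedelic Rock','Symphonic Rock','Slow Rock',
'Big Band','Chorus','Easy Listening','Acoustic',
'Humour','Speech','Chanson','Opera',
'Chamber Music','Sonata','Symphony','Booty Bass',
'Primus','Porn Groove','Satire','Slow Jam',
'Club','Tango','Samba','Folklore',
'Ballad','Power Ballad','Rhythmic Soul','Freestyle',
'Duet','Punk Rock','Drum Solo','A Cappella',
'Euro-House','Dance Hall' );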

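my $id3v1;
# "TAG" marker, title, artist, album, year, comment, zero byte, track, genre
my $id3v1_tmpl = "A3 A30 A30 A30 A4 A28 C C C";

while (my $filename = <STDIN>) {
    chomp $filename;
    open my $fh, '<', $filename or next;
    binmode $fh;
    # The ID3v1 tag is the last 128 bytes of the file
    seek $fh, -128, SEEK_END and read $fh, $id3v1, 128;
    close $fh;
    my (undef,$title,$artist,$album,$year,$comment,undef,$trk,$genr) =
        unpack($id3v1_tmpl, $id3v1);
    # Genre bytes outside the table (e.g. 255) have no name
    my $genre_name = $genre[$genr] // 'Unknown';
    print "$filename|$title|$artist|$album|$year|$trk|$genre_name\n";
}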

Basically, it takes a list of MP3 filenames on stdin, opens each one, and prints a pipe-delimited summary of what it found. Here is how it is run:

$ find /home/todd/music -name "*.mp3" | perl >mp3_data.txt

Here is a line from the output (mp3_data.txt):
/home/todd/music/Charles Mingus/Ah Um/Charles Mingus_10_Pedal Point Blues.mp3|Pedal Point Blues|Charles Mingus|Ah Um|1959|10|Jazz
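
Whatever consumes mp3_data.txt later has to split each line back into its seven fields. Here is a minimal sketch of that step in Haskell, assuming no filename or tag field contains a `|` (nothing in the script guarantees that). `Track`, `splitOn` and `parseLine` are made-up names for illustration, not part of the scripts in this post:

```haskell
-- One record per line of mp3_data.txt (field order as printed by the script).
-- Hypothetical type, for illustration only.
data Track = Track
  { path    :: FilePath
  , title   :: String
  , artist  :: String
  , album   :: String
  , year    :: String
  , trackNo :: String
  , genre   :: String
  } deriving (Show, Eq)

-- Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Parse one output line; Nothing if it does not have exactly 7 fields.
parseLine :: String -> Maybe Track
parseLine line = case splitOn '|' line of
  [p, t, ar, al, y, n, g] -> Just (Track p t ar al y n g)
  _                       -> Nothing

-- Read summary lines from stdin and show how each one parsed.
main :: IO ()
main = interact (unlines . map (show . parseLine) . lines)
```

A line with a stray `|` in a title comes back as Nothing rather than as a shifted record, which is probably what a jukebox wants to know about.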

I am considering using Haskell for my jukebox, so I was curious what this would look like in Haskell. Here is my newbie Haskell implementation:

import Text.Printf
import Data.Array
import qualified Data.Char as Char
import System.Environment
import System.IO

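-- Create an array of ID3v1 genre names, indexed by the genre byte
-- (0-79 from the original list, 80-125 are the Winamp extensions)
genres = listArray (0, l-1) genres_l
         where
           genres_l = [
             "Blues","Classic Rock","Country","Dance",
             "Disco","Funk","Grunge","Hip-Hop",
             "Jazz","Metal","New Age","Oldies",
             "Other","Pop","R&B","Rap",
             "Reggae","Rock","Techno","Industrial",
             "Alternative","Ska","Death Metal","Pranks",
             "Soundtrack","Euro-Techno","Ambient","Trip-Hop",
             "Vocal","Jazz+Funk","Fusion","Trance",
             "Classical","Instrumental","Acid","House",
             "Game","Sound Clip","Gospel","Noise",
             "AlternRock","Bass","Soul","Punk",
             "Space","Meditative","Instrumental Pop","Instrumental Rock",
             "Ethnic","Gothic","Darkwave","Techno-Industrial",
             "Electronic","Pop-Folk","Eurodance","Dream",
             "Southern Rock","Comedy","Cult","Gangsta",
             "Top 40","Christian Rap","Pop/Funk","Jungle",
             "Native American","Cabaret","New Wave","Psychadelic",
             "Rave","Showtunes","Trailer","Lo-Fi",
             "Tribal","Acid Punk","Acid Jazz","Polka",
             "Retro","Musical","Rock & Roll","Hard Rock",
             "Folk","Folk-Rock","National Folk","Swing",
             "Fast Fusion","Bebob","Latin","Revival",
             "Celtic","Bluegrass","Avantgarde","Gothic Rock",
             "Progressive Rock","Psychedelic Rock","Symphonic Rock","Slow Rock",
             "Big Band","Chorus","Easy Listening","Acoustic",
             "Humour","Speech","Chanson","Opera",
             "Chamber Music","Sonata","Symphony","Booty Bass",
             "Primus","Porn Groove","Satire","Slow Jam",
             "Club","Tango","Samba","Folklore",
             "Ballad","Power Ballad","Rhythmic Soul","Freestyle",
             "Duet","Punk Rock","Drum Solo","A Cappella",
             "Euro-House","Dance Hall" ]
           l = length genres_l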

main = do

  hSetEncoding stdin latin1
  hSetEncoding stdout latin1

  fname <- getContents          -- lazily read list of files from stdin
  mapM_ print_id3v1 (lines fname)

print_id3v1 fname = do
  inh <- openBinaryFile fname ReadMode
  hSeek inh SeekFromEnd (-128)    -- the ID3v1 tag is the last 128 bytes
  dat <- hGetContents inh
  printf "%s|%s|%s|%s|%s|%d|%s\n"
       fname                   -- Filename
       (extract 3 30 dat)      -- Title
       (extract 33 30 dat)     -- Artist
       (extract 63 30 dat)     -- Album
       (extract 93 4 dat)      -- Year
       (Char.ord (dat !! 126))              -- Track (raw byte, may be 0)
       (genreName (Char.ord (dat !! 127)))  -- Genre (raw byte)
  hClose inh
  where
    -- Genre bytes outside the table (e.g. 255) have no name
    genreName n | inRange (bounds genres) n = genres ! n
                | otherwise                 = "Unknown"

-- extract and trim a range of elements from list
extract idx ln s = 
    trim0 (take ln (drop idx s))

-- Trim nulls from list
trim0 s = filter (/= '\0') s
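
For comparison, the same byte offsets can be decoded with a strict ByteString instead of a lazy String, which also makes it easy to check for the "TAG" marker before trusting the fields. This is only a sketch under a few assumptions: it uses the bytestring package that ships with GHC, `Id3v1`, `parseId3v1` and `readId3v1` are names I made up, and text fields are cut at the first NUL rather than having every NUL filtered out the way `trim0` does.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO

-- Hypothetical record holding the fields printed above.
data Id3v1 = Id3v1
  { tagTitle  :: String
  , tagArtist :: String
  , tagAlbum  :: String
  , tagYear   :: String
  , tagTrack  :: Int
  , tagGenre  :: Int   -- raw genre byte; look it up in `genres` separately
  } deriving (Show, Eq)

-- Decode a 128-byte ID3v1 block. Returns Nothing when the block has the
-- wrong length or does not start with the "TAG" marker.
parseId3v1 :: B.ByteString -> Maybe Id3v1
parseId3v1 bs
  | B.length bs /= 128          = Nothing
  | B.take 3 bs /= B.pack "TAG" = Nothing
  | otherwise = Just Id3v1
      { tagTitle  = field 3 30
      , tagArtist = field 33 30
      , tagAlbum  = field 63 30
      , tagYear   = field 93 4
      , tagTrack  = byteAt 126
      , tagGenre  = byteAt 127
      }
  where
    -- Text field: take len bytes at off and cut at the first NUL.
    field off len = B.unpack (B.takeWhile (/= '\0') (B.take len (B.drop off bs)))
    -- Single byte as a number (a zero byte is a valid value).
    byteAt i = fromEnum (B.index bs i)

-- Read the last 128 bytes of a file strictly, then parse them.
readId3v1 :: FilePath -> IO (Maybe Id3v1)
readId3v1 fname = withBinaryFile fname ReadMode $ \h -> do
  size <- hFileSize h
  if size < 128
    then return Nothing
    else do
      hSeek h SeekFromEnd (-128)
      fmap parseId3v1 (B.hGet h 128)

-- Filenames on stdin, one parsed tag (or Nothing) per line on stdout.
main :: IO ()
main = do
  names <- getContents
  mapM_ (\f -> readId3v1 f >>= print) (lines names)
```

Keeping `parseId3v1` pure means it can be tried on a hand-built 128-byte block without touching any files.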

It runs the same way. Frustratingly, it chokes on filenames containing non-ASCII characters :-(

$ find /home/todd/music/ -name "*.mp3" -print | ./mp3info >mp3_files.txt 
mp3info: /home/todd/Music/music/Chöying Drolma & Steve Tibbetts/Selwa/Chöying Drolma & Steve Tibbetts_05_Gayatri.mp3: openBinaryFile: does not exist (No such file or directory)

I am not a Haskell expert, but I didn't expect it to choke there...

EDIT: Fixed the filename problem by adding these two lines at the top of main (they are already included in the listing above):

  hSetEncoding stdin latin1
  hSetEncoding stdout latin1
