Musings of a Fondue

Subtitles Using JavaScript - Part 3

Using Excel to convert subtitle files into arrays is too laborious to be practical.

I came across regex while working through Michael Hartl’s Ruby on Rails tutorial. He covers regex in Lesson 6 when talking about password validations.

With generous use of Stack Overflow and Rubular, I managed to put a Ruby script together. It takes a subtitle file as input and returns a JavaScript array as output.

Overall, using the script requires much less effort than Excel. Only two manual search-and-replace steps are needed to finish up the array.

Here it is in action,


Workflow

I used Rubular to hash out the regex. For example,

find start of a group,
image

find line end,
image

find times,
image
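The same three patterns can also be checked locally in Ruby. This is only a rough sketch: the two-cue `SAMPLE` string and the `count_matches` helper are made up for illustration, and it assumes the subtitle file uses Windows-style `\r\n` line endings.

```ruby
# Hypothetical two-cue sample in SRT layout with \r\n line endings.
SAMPLE = "1\r\n00:00:01,000 --> 00:00:04,000\r\nHello there\r\n\r\n" \
         "2\r\n00:00:05,500 --> 00:00:07,250\r\nGoodbye\r\n"

# Hypothetical helper: how many times does the pattern match in the text?
def count_matches(text, pattern)
  text.scan(pattern).length
end

# Start of a group: whitespace at the beginning of a line,
# i.e. the blank line that separates two subtitle cues.
puts "group starts:    #{count_matches(SAMPLE, /^\s/)}"

# Line ends: the carriage return that closes each line.
puts "line ends:       #{count_matches(SAMPLE, /\r/)}"

# Times: the ' --> ' separator between start and end timestamps.
puts "time separators: #{count_matches(SAMPLE, /\s-->\s/)}"
```

With this sample, the blank separator line is the only line that begins with whitespace, so the first pattern matches once, while every line contributes one carriage return.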

Here’s the final code,


#!/usr/bin/env ruby

# **Output result as a js file**
    #http://stackoverflow.com/a/8720196 => output result as text file
    $stdout = File.new('output.js', 'w')

# **Create Array**
    # http://stackoverflow.com/a/13556720 => multiple gsub
    regexRef = {
                /\A/      =>  'var myArray = [[' , #start
                /^\s/     =>  '],['              , #inner. 
                /\z/      =>  ']]'               , #end.   
        
                /\r/      =>  ','    , #split inner elements
                /\n/      =>  ''     , #remove \n
                /,\]/     =>  ']'    , #replace instances of ',]'
                /\s-->\s/ =>  '-,-'  , #replace instances of ' --> '

                /"/       =>  "'"    , #replace double quotes in script with single quotes. Prevents breaking.
                                       #    Ex. when a character in the script is quoting something long, and the quote is
                                       #    spread over more than one line...leads to an element in the array
                                       #    that lacks closing quotes (""") thus breaking array
               }
    string = File.read('Skyfall2.srt')
    # gsub (not gsub!) is used because gsub! returns nil when a pattern has no match
    regexRef.each_pair {|f,t| string = string.gsub(f, t)}
    #p string

# **Convert time to seconds**
    # find index of all occurrences of "-,-" (returns an array)
    mid = (0..string.length - 1).find_all do |i| string[i,3] == "-,-" end 

    # use mid to identify and format times
    mid.each do |idx|

        sTime = ( (string[(idx - 12), 2].to_i * 3600) +   # hours
                  (string[(idx -  9), 2].to_i * 60  ) +   # minutes
                  (string[(idx -  6), 2].to_i       ) +   # seconds
                  (string[(idx -  3), 3].to_f / 1000) )   # milliseconds

        eTime = ( (string[(idx +  3), 2].to_i * 3600) +   # hours
                  (string[(idx +  6), 2].to_i * 60  ) +   # minutes
                  (string[(idx +  9), 2].to_i       ) +   # seconds
                  (string[(idx + 12), 3].to_f / 1000) )   # milliseconds

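        sText = sTime.round(3).to_s   # round so floating-point noise cannot lengthen the string
        eText = eTime.round(3).to_s
        string[(idx-12),12] = sText + ("~"*(12-sText.length))   # assign by position, so an identical
        string[(idx+ 3),12] = eText + ("~"*(12-eText.length))   # time elsewhere is never replaced instead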
        # substitute original time format (12 characters), with new time format (seconds)
        # ~ is used to preserve 12 characters during loop. Rationale:
            # The mid array is calculated once before loop.
            # In first loop-through original time is replaced with new time format, which is shorter in character length
            # Without ~ to preserve, then subsequent time will be incorrectly placed, as insertion point is based
            # on original string length, not new shorter one
    end
    #p string

# **Clean up array**
    regexRef = {
                #cleanup time
                /-,-/   =>   ','  , 
                /~/     =>   ''   , 
                #encapsulate each element of array with double quotes
                #   Could be done above for more succinct code. Separation
                #   in case a better method exists for encapsulating
                #   the lines of dialogue in quotes.
                #JavaScript will convert time from string to number
                /,/       =>  '","' , 
                /\]","\[/ =>  '],[' ,
                /\[/      =>  '["'  , 
                /\]/      =>  '"]'  ,
                /\["\[/   =>  '[['  ,
                /\]"\]/   =>  ']]'  ,
               }
    # gsub (not gsub!) is used because gsub! returns nil when a pattern has no match
    regexRef.each_pair {|f,t| string = string.gsub(f, t)}
    p string
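
The time conversion step can also be tried out on its own. The sketch below is not what the script above does: instead of locating each `-,-` marker and padding with `~`, it lets `gsub` with a block rewrite every timestamp in one pass. The names `SRT_TIME`, `srt_to_seconds` and `convert_times` are made up for illustration, and it assumes timestamps always have the `HH:MM:SS,mmm` shape.

```ruby
# Hypothetical pattern for one SRT timestamp, e.g. "01:02:03,456".
SRT_TIME = /(\d{2}):(\d{2}):(\d{2}),(\d{3})/

# Hypothetical helper: convert one timestamp into seconds (as a Float).
def srt_to_seconds(timestamp)
  match = SRT_TIME.match(timestamp)
  raise ArgumentError, "not an SRT timestamp: #{timestamp.inspect}" unless match

  hours, minutes, seconds, millis = match.captures.map(&:to_i)
  # Add everything up in whole milliseconds first, then divide once.
  total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
  total_ms / 1000.0
end

# Replace every timestamp found in the text. gsub builds a new string as it
# goes, so shorter replacements do not shift the position of later matches.
def convert_times(text)
  text.gsub(SRT_TIME) { |stamp| srt_to_seconds(stamp).to_s }
end

puts convert_times("00:00:01,000 --> 00:01:04,250")
# prints "1.0 --> 64.25"
```

Because the replacement happens inside `gsub`, there is no precomputed index array to keep in sync, which is the problem the `~` padding works around in the script.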

Update (April 2015):

The regex can be re-written such that no manual steps are required…
