Thread: .SRT to text
07-29-2019, 02:43 AM
AdrianG001 (Windows 10, Office 2016 for Mac)
Join Date: Mar 2018
Posts: 40

Originally Posted by Komma
Could you please help me find a script or macro for MS Word to remove timestamps from an .SRT file (I got it from YouTube), so that I end up with clean text?
You can try running this Python script to convert the .srt file to text, or use a third-party tool if you find this difficult.

"""Creates readable text file from SRT file."""
import re, sys

def is_time_stamp(l):
  if l[:2].isnumeric() and l[2] == ':':
    return True
  return False

def has_letters(line):
  if re.search('[a-zA-Z]', line):
    return True
  return False

def has_no_text(line):
  l = line.strip()
  if not len(l):
    return True
  if l.isnumeric():
    return True
  if is_time_stamp(l):
    return True
  if l[0] == '(' and l[-1] == ')':
    return True
  if not has_letters(line):
    return True
  return False

def is_lowercase_letter_or_comma(letter):
  if letter.isalpha() and letter.lower() == letter:
    return True
  if letter == ',':
    return True
  return False

def clean_up(lines):
  """
  Get rid of all non-text lines and
  try to combine text broken into multiple lines
  """
  new_lines = []
  for line in lines[1:]:
    if has_no_text(line):
      continue
    elif len(new_lines) and is_lowercase_letter_or_comma(line[0]):
      #combine with previous line
      new_lines[-1] = new_lines[-1].strip() + ' ' + line
    else:
      #append line
      new_lines.append(line)
  return new_lines

def main(args):
  """
    args[1]: file name
    args[2]: encoding. Default: utf-8.
      - If you get a lot of [?]s replacing characters,
      - you probably need to change file_encoding to 'cp1252'
  """
  file_name = args[1]
  file_encoding = 'utf-8' if len(args) < 3 else args[2]
  with open(file_name, encoding=file_encoding, errors='replace') as f:
    lines = f.readlines()
    new_lines = clean_up(lines)
  new_file_name = file_name[:-4] + '.txt'
  with open(new_file_name, 'w') as f:
    for line in new_lines:
      f.write(line)

if __name__ == '__main__':
  main(sys.argv)

"""
 * Run from command line as
 ** python <script_name>.py file_name.srt cp1252
 * Creates file_name.txt with extracted text from file_name.srt
 * Script assumes that lines beginning with lowercase letters or commas
 * are part of the previous line and lines beginning with any other character
 * are new lines. This won't always be correct.
"""
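
To see what the line-joining rule does in practice, here is a small standalone sketch. It is not the script above, just a simplified version of the same idea that runs on an in-memory sample. The names merge_subtitle_lines and TIMING and the sample text are made up for illustration.

```python
import re

# Matches an SRT cue timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r'^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->')

def merge_subtitle_lines(lines):
    """Drop cue numbers, timings and blank lines; join continuation lines."""
    merged = []
    for raw in lines:
        text = raw.strip()
        if not text or text.isnumeric() or TIMING.match(text):
            continue
        # Same heuristic as the script: a lowercase letter or a comma at the
        # start of a line is treated as a continuation of the previous line.
        first = text[0]
        if merged and (first == ',' or (first.isalpha() and first.islower())):
            merged[-1] += ' ' + text
        else:
            merged.append(text)
    return merged

sample = """1
00:00:01,000 --> 00:00:03,000
The quick brown fox
jumps over the lazy dog.

2
00:00:03,500 --> 00:00:06,000
Yesterday we met
Anna at the station.
""".splitlines()

result = merge_subtitle_lines(sample)
for line in result:
    print(line)
```

The first cue is joined into one sentence because its second line starts with a lowercase letter. The second cue stays split, since "Anna" starts with a capital. That is the "won't always be correct" case from the notes, so expect to fix a few lines by hand.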

Run your code like:
python <script_name>.py file_name.srt cp1252
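
If you are not sure which encoding your file uses, one option is to try utf-8 first and fall back to cp1252 only when decoding fails. This is just a rough sketch and not part of the script. The helper name read_text_with_fallback and the demo file are made up, and the two encodings tried are only the ones mentioned in this post.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=('utf-8', 'cp1252')):
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue
    # Last resort: replace undecodable bytes, as the script does.
    with open(path, encoding=encodings[0], errors='replace') as f:
        return f.read(), encodings[0]

# Demo: a file saved as cp1252 that is not valid UTF-8.
with tempfile.NamedTemporaryFile(suffix='.srt', delete=False) as tmp:
    tmp.write('café\n'.encode('cp1252'))
text, used = read_text_with_fallback(tmp.name)
os.remove(tmp.name)
print(repr(text), used)
```

Passing the encoding on the command line, as shown above, is still the simpler route when you already know it.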