Microsoft Office Forums .SRT to text

  #1  
05-24-2019, 03:27 AM
Komma (Novice), Windows XP, Office 2019
Join Date: May 2019
Posts: 1


Hi,
Could you please help me find a script or macro for MS Word that removes the timestamps from an .SRT file (downloaded from YouTube), so that I end up with clean text?
  #2  
05-24-2019, 05:16 AM
Moonshine (Not fond of tags), Windows 10, Office 2016
Join Date: Apr 2018
Posts: 211

Try these online options as an alternative to using Word/Excel:

Extract Text only from subtitle and remove timestamps

https://anatolt.ru/t/del-timestamp-srt.html

Convert Subtitles to plain text

https://subtitletools.com/convert-su...in-text-online
  #3  
05-24-2019, 05:22 PM
macropod (Administrator), Windows 7 64bit, Office 2010 32bit
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 19,605

Cross-posted at: https://www.office-forums.com/thread...-text.2350607/
For cross-posting etiquette, please read: http://www.excelguru.ca/content.php?184
__________________
Cheers,
Paul Edstein
[MS MVP - Word]
  #4  
07-29-2019, 02:43 AM
AdrianG001 (Banned), Windows 10, Office 2016 for Mac
Join Date: Mar 2018
Posts: 40

Quote:
Originally Posted by Komma
Hi,
Could you please help me find a script or macro for MS Word that removes the timestamps from an .SRT file (downloaded from YouTube), so that I end up with clean text?

You can try running this Python script to convert the .srt file to text, or use a third-party tool if you find this difficult.

Code:
"""
Creates a readable text file from an SRT file.
"""
import os
import re
import sys

def is_time_stamp(l):
    # SRT timing lines look like "00:01:02,500 --> 00:01:04,000".
    # The length check avoids an IndexError on very short lines.
    return len(l) > 2 and l[:2].isnumeric() and l[2] == ':'

def has_letters(line):
    # Only ASCII letters are matched, so a line written entirely
    # in a non-Latin script is treated as having no text.
    return re.search('[a-zA-Z]', line) is not None

def has_no_text(line):
    l = line.strip()
    if not l:
        return True
    if l.isnumeric():  # cue sequence number
        return True
    if is_time_stamp(l):
        return True
    if l[0] == '(' and l[-1] == ')':  # sound description, e.g. "(music)"
        return True
    if not has_letters(line):
        return True
    return False

def is_lowercase_letter_or_comma(letter):
    if letter.isalpha() and letter.lower() == letter:
        return True
    if letter == ',':
        return True
    return False

def clean_up(lines):
    """
    Get rid of all non-text lines and
    try to combine text broken into multiple lines
    """
    new_lines = []
    # Skip the first line: it holds the first cue's sequence number
    # (possibly preceded by a byte-order mark).
    for line in lines[1:]:
        if has_no_text(line):
            continue
        elif new_lines and is_lowercase_letter_or_comma(line[0]):
            # combine with previous line
            new_lines[-1] = new_lines[-1].strip() + ' ' + line
        else:
            # append line
            new_lines.append(line)
    return new_lines

def main(args):
    """
    args[1]: file name
    args[2]: encoding. Default: utf-8.
      - If you get a lot of [?]s replacing characters,
      - you probably need to change file_encoding to 'cp1252'
    """
    if len(args) < 2:
        sys.exit('Usage: python srt_to_txt.py file_name.srt [encoding]')
    file_name = args[1]
    file_encoding = 'utf-8' if len(args) < 3 else args[2]
    with open(file_name, encoding=file_encoding, errors='replace') as f:
        lines = f.readlines()
    new_lines = clean_up(lines)
    # splitext copes with extensions of any length, not just ".srt"
    new_file_name = os.path.splitext(file_name)[0] + '.txt'
    # Write UTF-8 explicitly so the output does not depend on the
    # platform's default encoding.
    with open(new_file_name, 'w', encoding='utf-8') as f:
        f.writelines(new_lines)

if __name__ == '__main__':
    main(sys.argv)

"""
NOTES
 * Run from the command line as
 ** python srt_to_txt.py file_name.srt cp1252
 * Creates file_name.txt with the text extracted from file_name.srt
 * The script assumes that lines beginning with lowercase letters or commas
   are part of the previous line and that lines beginning with any other
   character are new lines. This won't always be correct.
"""
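
The notes above warn that merging lines by their first character will not always be correct. One possible alternative, shown here only as a minimal sketch and not as part of the script above, is to split the file into cues at blank lines and keep whatever follows each timing line. The names `TIMING` and `srt_to_text` are made up for this illustration, and the sketch assumes a reasonably well-formed SRT file.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_text(srt: str) -> str:
    """Return the subtitle text of each cue, one cue per line."""
    # Drop a leading byte-order mark and normalise line endings.
    srt = srt.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Keep only the lines that follow the timing line.
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                lines = lines[i + 1:]
                break
        else:
            continue  # no timing line found: not a cue, skip it
        if lines:
            cues.append(" ".join(lines))
    return "\n".join(cues)

sample = """1
00:00:01,000 --> 00:00:04,000
Hello there,
how are you?

2
00:00:05,000 --> 00:00:07,500
2019

3
00:00:08,000 --> 00:00:09,000
(music)
"""

print(srt_to_text(sample))
# Hello there, how are you?
# 2019
# (music)
```

In this sketch a caption that is only a number (`2019`) or only a bracketed description (`(music)`) is kept, whereas the line-by-line filter in the script drops both. Whether that is desirable depends on the subtitles.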

Run the script like this:
python srt_to_txt.py file_name.srt cp1252
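
The `cp1252` argument in the command above is only needed when the file is not UTF-8. A rough sketch of how that choice could be automated follows, assuming the file is either UTF-8 or cp1252 (other encodings would need their own handling). `decode_subtitles` is a hypothetical helper, not part of the script.

```python
def decode_subtitles(raw: bytes) -> str:
    """Decode subtitle bytes, trying UTF-8 first and cp1252 second."""
    try:
        # "utf-8-sig" also removes a leading byte-order mark if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined, so replace those
        # instead of raising a second error.
        return raw.decode("cp1252", errors="replace")

# The same word encoded two ways decodes to the same string.
utf8_bytes = "caf\u00e9".encode("utf-8")
cp1252_bytes = "caf\u00e9".encode("cp1252")
print(decode_subtitles(utf8_bytes) == decode_subtitles(cp1252_bytes))  # True
```

To use it, the file would be opened in binary mode (`open(file_name, 'rb')`) and the decoded string split with `splitlines(keepends=True)` before being cleaned.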