How can I tag and selectively extract text (multiple files)?
Hi,
I have about 50 files (MS Word 2010) comprising transcriptions of interviews and meetings. I want to determine the total number of words spoken by each individual participant in each file.
All the transcription files have been formatted as follows:
Each new paragraph marks the start of the next 'turn' in the interview or meeting. It begins with a bracketed timestamp (for example, [00:47:15]), followed by the speaker’s uppercase initials (for example, LW.), followed by the text of that speaker’s turn.
Can you suggest how I might best go about grouping and/or extracting the transcribed text of each individual participant in each of the 50 files? Whether the individual's text is converted into a table or compiled into a separate file (or into a specific column of an Excel worksheet) is not important to me, just so long as I can easily count the total number of words spoken by each individual in a given file. Ideally, I would like to do this as an easily repeatable sequence of operations (filling in the appropriate string, of course, to identify each separate speaker).
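One possible approach, sketched here in Python rather than Word macros, is to treat each paragraph as a turn, read the speaker's initials from the start of the line, and add up the words per speaker. This is a minimal sketch under several assumptions. Each turn is assumed to look like "[00:47:15] LW. text", with the initials followed by a period. A "word" is any whitespace-separated token. Paragraphs without a timestamp are counted toward the previous speaker. The files are assumed to be .docx and are read with the third-party python-docx package. The folder name "transcripts" is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed turn format: "[hh:mm:ss] XY. text of the turn"
TURN_RE = re.compile(r"^\s*\[(\d{1,2}:\d{2}:\d{2})\]\s*([A-Z]+)\.\s*(.*)$")


def count_words_by_speaker(paragraphs):
    """Return a Counter mapping speaker initials to word counts.

    Words are whitespace-separated tokens (an assumption; adjust as needed).
    Paragraphs that don't start with a timestamp are treated as a
    continuation of the previous speaker's turn (also an assumption).
    """
    counts = Counter()
    speaker = None
    for para in paragraphs:
        match = TURN_RE.match(para)
        if match:
            speaker, text = match.group(2), match.group(3)
        elif speaker is not None:
            text = para
        else:
            continue  # text before the first turn (e.g. a title) is skipped
        counts[speaker] += len(text.split())
    return counts


def count_words_in_docx(path):
    """Read a .docx file using the third-party python-docx package."""
    import docx  # pip install python-docx
    document = docx.Document(path)
    return count_words_by_speaker(p.text for p in document.paragraphs)


if __name__ == "__main__":
    # Hypothetical folder holding the transcripts; adjust the path as needed.
    for path in sorted(Path("transcripts").glob("*.docx")):
        counts = count_words_in_docx(path)
        for speaker, n in sorted(counts.items()):
            # Tab-separated output can be pasted straight into Excel.
            print(f"{path.name}\t{speaker}\t{n}")
```

Because every speaker in every file is counted in a single pass, you don't need to enter each speaker's initials separately. Each output line gives the file name, the speaker and that speaker's word count.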
Thanks in advance for any guidance or suggestions you can offer. I hope I have expressed the question/problem clearly enough. If not, please don’t hesitate to seek clarification.
Best regards,
Paul Fitz
Last edited by macropod; 10-28-2014 at 04:08 PM.
Reason: email address removed for privacy