Microsoft Office Forums: How can I tag and selectively extract text (multiple files)?

PaulFitz · 10-31-2014, 10:00 AM

For F&R (Find & Replace) testing purposes, I've attached the following file:
Transcript_ELA_Meeting 06B_TDT_temp.docx
You will notice that short bits of text, enclosed in square brackets*, frequently appear within a speaker's 'turn'. These are short discourse particles uttered by other interlocutors. You'll also notice that pauses and other non-linguistic events are enclosed in single or double parentheses. I want to remove all instances of these, i.e. [*], (*) and ((*)), from the text before measuring participants' word counts. What remains should be the SID (Speaker Identifier) and the associated text.
*I suspect these square brackets may be why your F&R script didn't work as it should have.
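As an illustration of the clean-up described above, here is a minimal Python sketch (an alternative to Word's F&R, not the script discussed in this thread) that deletes [..], (..) and ((..)) spans from a turn before counting its words. It assumes the brackets are never nested and that a "word" is any whitespace-separated token, so its counts may differ slightly from Word's. The function names and the example turn are hypothetical. (In Word's own wildcard F&R, square brackets and parentheses are special characters that must be escaped with a backslash, which may be relevant to the script problem.)

```python
import re

# Order matters: the double-parenthesis form is tried before the single one,
# so "((laughs))" is removed whole rather than leaving a stray ")".
# Assumes brackets are not nested inside one another.
ANNOTATION = re.compile(r"\(\([^()]*\)\)|\([^()]*\)|\[[^\[\]]*\]")


def strip_annotations(text: str) -> str:
    """Remove [..], (..) and ((..)) spans, then tidy the leftover spaces."""
    cleaned = ANNOTATION.sub(" ", text)
    return re.sub(r"\s+", " ", cleaned).strip()


def word_count(text: str) -> int:
    """Count whitespace-separated tokens once annotations are removed."""
    return len(strip_annotations(text).split())


if __name__ == "__main__":
    # Invented example turn, not taken from the attached transcript.
    turn = "So I think [MB: mhm] we should (.) start ((laughs)) with the agenda."
    print(strip_annotations(turn))  # So I think we should start with the agenda.
    print(word_count(turn))         # 9
```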
I've also attached a draft of the table I'll use to compile the SID/word count data:
ELA_Meeting 01_TDTable.docx
I plan to migrate this data into an Excel worksheet in order to convert the raw word counts into percentages of the total word count and, ultimately, to display the data relationships in four tables/charts:
1) For each meeting, each participant's WC (word count) as a percentage of the total WC (as a table or, alternatively, as a stacked column chart).
2) For each meeting, the ratio of LW's WC to the combined WC of the other 5 team members, MB + FN + LKW + CL + CSB (as a stacked column chart).
3) For each meeting, the ratio of MB's WC to the combined WC of the other 4 team members, FN + LKW + CL + CSB (also as a stacked column chart).
4) The trajectory of each team member's WC percentage (LW, MB, FN, LKW, CL, CSB) across all 15 meetings (as a bar graph).
I'm a complete novice with Excel, so I have plenty to learn. (Any tips on converting the WCs to percentages, or helpful resources you can point me towards, will be most welcome.)
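On the Excel side, if one meeting's six word counts sat in B2:B7, a formula such as =B2/SUM(B$2:B$7), formatted as a percentage and filled down, would give task 1. The arithmetic behind tasks 1 to 3 is sketched here in Python to make each calculation explicit. The function names are hypothetical and the word counts are invented placeholders, not data from the meetings.

```python
from typing import Dict, List, Tuple

# Speaker groupings as described in the post.
OTHERS_OF_LW = ["MB", "FN", "LKW", "CL", "CSB"]
OTHERS_OF_MB = ["FN", "LKW", "CL", "CSB"]


def percentages(wc: Dict[str, int]) -> Dict[str, float]:
    """Task 1: each speaker's WC as a percentage of the meeting's total WC."""
    total = sum(wc.values())
    return {sid: 100.0 * n / total for sid, n in wc.items()}


def ratio_to_rest(wc: Dict[str, int], focus: str, others: List[str]) -> float:
    """Tasks 2 and 3: the focus speaker's WC divided by the others' combined WC."""
    return wc[focus] / sum(wc[sid] for sid in others)


def stacked_shares(wc: Dict[str, int], focus: str,
                   others: List[str]) -> Tuple[float, float]:
    """The two segments of one stacked column: the focus speaker's % and the
    others' % of their joint WC (the two always add up to 100)."""
    mine = wc[focus]
    rest = sum(wc[sid] for sid in others)
    return 100.0 * mine / (mine + rest), 100.0 * rest / (mine + rest)


if __name__ == "__main__":
    # Invented counts for one hypothetical meeting.
    meeting = {"LW": 400, "MB": 300, "FN": 100, "LKW": 100, "CL": 50, "CSB": 50}
    print(percentages(meeting))
    print(ratio_to_rest(meeting, "LW", OTHERS_OF_LW))
    print(stacked_shares(meeting, "MB", OTHERS_OF_MB))
```

For task 4, applying percentages() to each of the 15 meetings and reading off one speaker's value per meeting would give that speaker's trajectory.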
Finally, Paul, the TS (time stamps) are superfluous at this stage of my work. In my own attempts to convert the files, I used F&R to insert tabs between the TS, the speaker ID/initials and the text, then used Convert Text to Table and deleted the TS column. I pasted the resulting two columns into Excel, sorted them A to Z, selected each speaker's text, and pasted it into Word to get a WC. Voilà!
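The manual route above (F&R to tabs, Convert Text to Table, delete the TS column, sort, count) could also be scripted. This sketch assumes the transcript has been saved as plain text with one turn per line in a hypothetical "TS SID: text" layout such as "00:01:23 LW: ..."; the TURN pattern would need adjusting to match the real Transcribe Lite output. Lines without a TS are treated as continuations of the previous turn, and [..], (..) and ((..)) spans are removed before counting.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Hypothetical line layout: "00:01:23 LW: text of the turn".
TURN = re.compile(r"^(\d{1,2}:\d{2}(?::\d{2})?)\s+([A-Z]{1,4}):\s*(.*)$")
# Double parentheses first, then single, then square brackets (no nesting).
ANNOTATION = re.compile(r"\(\([^()]*\)\)|\([^()]*\)|\[[^\[\]]*\]")


def word_counts_by_speaker(lines: Iterable[str]) -> Dict[str, int]:
    """Drop the TS, strip annotations, and total the words for each SID."""
    totals: Dict[str, int] = defaultdict(int)
    speaker: Optional[str] = None
    for line in lines:
        match = TURN.match(line.strip())
        if match:
            _ts, speaker, text = match.groups()  # the TS is read but not used
        elif speaker is not None:
            text = line  # continuation of the current speaker's turn
        else:
            continue  # header text before the first turn
        totals[speaker] += len(ANNOTATION.sub(" ", text).split())
    return dict(totals)


if __name__ == "__main__":
    # Invented sample lines in the assumed layout.
    sample = [
        "00:00:05 LW: Welcome everyone [MB: hi] to the meeting.",
        "00:00:12 MB: Thanks (.) shall we start?",
        "00:00:20 LW: Yes ((laughs)) let's.",
    ]
    print(word_counts_by_speaker(sample))  # {'LW': 7, 'MB': 4}
```

To run it on a file, something like `with open("transcript.txt", encoding="utf-8") as f: print(word_counts_by_speaker(f))` would work, where the file name is a placeholder.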
To clarify, the TS were added to the transcriptions at an earlier stage by Transcribe Lite, a Chrome extension/application we've found to be a big time-saver. TS make it quick and painless to locate specific points in the audio file. I had also imagined that the TS might provide a way of calculating the TD (talk distribution) based on proportional time (as opposed to WC), once we had loaded the audio and text files into NVivo for coding, analysis, etc. Alas, I haven't yet found (or devised) a way to structure that sort of query or sorting of the data. I imagine Excel is the right tool to derive the duration of each turn from the TS, sum those durations for each Speaker ID, and then report/display the result. But how, exactly? That is another question; therein, I suppose, lies the software developer's art. In any event, WC will still reveal the patterns the PI needs to show.
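One way to turn the TS into talk time, assuming each TS marks the start of a turn and that the turn lasts until the next TS (so overlaps and silences are ignored), is to subtract consecutive TS and total the differences per SID. In Excel this would be roughly =A3-A2 in C2, filled down (TS in column A, SIDs in column B), then =SUMIF(B:B,"LW",C:C) for each speaker; the durations are fractions of a day, so a [h]:mm:ss format makes them readable. The same logic in Python, with hypothetical function names and invented example turns:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def to_seconds(ts: str) -> int:
    """Convert 'H:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def talk_time_by_speaker(turns: List[Tuple[str, str]],
                         end_ts: Optional[str] = None) -> Dict[str, int]:
    """Sum each speaker's turn durations in seconds.

    `turns` holds (TS, SID) pairs in transcript order. A turn is assumed to
    last until the next TS, so the final turn is counted only when `end_ts`
    (the end of the recording) is supplied.
    """
    starts = [to_seconds(ts) for ts, _ in turns]
    if end_ts is not None:
        starts.append(to_seconds(end_ts))
    totals: Dict[str, int] = defaultdict(int)
    for i in range(len(starts) - 1):
        totals[turns[i][1]] += starts[i + 1] - starts[i]
    return dict(totals)


def time_shares(totals: Dict[str, int]) -> Dict[str, float]:
    """Each speaker's talk time as a percentage of the total talk time."""
    grand_total = sum(totals.values())
    return {sid: 100.0 * t / grand_total for sid, t in totals.items()}


if __name__ == "__main__":
    turns = [("00:00:05", "LW"), ("00:00:12", "MB"), ("00:00:20", "LW")]
    seconds = talk_time_by_speaker(turns, end_ts="00:00:30")
    print(seconds)               # {'LW': 17, 'MB': 8}
    print(time_shares(seconds))  # {'LW': 68.0, 'MB': 32.0}
```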
Once again, Paul, I appreciate your willingness to share your knowledge and problem-solving skills (not to mention your time) in helping me with these research procedures. The 'models' that you and GMayor have offered are like beacons in the fog, and will guide my own skill development in the weeks and months ahead. Thank you.
Regards,
PF