Microsoft Office Forums

  #1  
01-05-2018, 03:54 AM
ballpoint (Windows 10, Office 2016)
Advanced Beginner
 
Join Date: Sep 2017
Posts: 42
Removing duplicate rows when identical value in a column


I am struggling with a rather complex issue, to which I can see no easy solution.

I have a data set of paragraphs in which a certain keyword is cited, together with the document each paragraph comes from. It is produced by a script (courtesy of macropod) that extracts paragraphs from documents according to a keyword, in the format:

Code:
Document | KeyWord | Paragraph

The problem is that words are often cited ambiguously, so a keyword match does not always accurately represent what is being cited: sometimes the citation is correct, but sometimes it is not. Accordingly, I have to search for both a "lowest common denominator" form and a more complete form.

For example, let's assume I am looking for a court case; let's call it "Case1212". It would generally be cited as "Court of Appeals, Case 1212", but may also be cited simply as "Case 1212". The short form, however, would also match citations of the district court case with the same number. (This is an odd jurisdiction with a terrible citation system, if you need to know!)

Consider the following example, in which both keywords return the same citing paragraph:

Code:
Document | Keyword | Paragraph
Doc 1 | Court of Appeals, Case 1212 | The Court affirms that A > B because Court of Appeals, Case 1212 says that
Doc 1 | Case 1212 | The Court affirms that A > B because Court of Appeals, Case 1212 says that

The only way around this I can see is to search for both keywords (e.g. "Court of Appeals, Case 1212" and "Case 1212") and compare the texts in the "Paragraph" column: where the more complete form matches a paragraph, it should reasonably exclude the shorter form for that same paragraph. The remaining rows could then easily be checked manually.
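
The approach described above, where a more complete keyword excludes a shorter one that matched the same paragraph, could be sketched in Python roughly as follows. This is a minimal sketch, not a finished solution: it assumes the sheet has been read into a list of rows (dicts with `Document`, `Keyword` and `Paragraph` keys), and the function name `drop_subsumed_keywords` is hypothetical. Instead of a hard-coded list of pairs, it treats a keyword as redundant when it is contained in a longer keyword that matched the identical paragraph text.

```python
from collections import defaultdict


def drop_subsumed_keywords(rows):
    """Return the rows minus any row whose keyword is contained in a longer
    keyword that matched the same paragraph text.

    Assumption: each row is a dict with 'Document', 'Keyword' and
    'Paragraph' keys (hypothetical layout of the exported sheet).
    """
    # First pass: collect every keyword that matched each paragraph text.
    keywords_by_paragraph = defaultdict(set)
    for row in rows:
        keywords_by_paragraph[row["Paragraph"]].add(row["Keyword"])

    # Second pass: keep a row unless another, strictly longer keyword
    # for the same paragraph contains its keyword.
    kept = []
    for row in rows:
        keyword = row["Keyword"]
        others = keywords_by_paragraph[row["Paragraph"]]
        subsumed = any(keyword != other and keyword in other for other in others)
        if not subsumed:
            kept.append(row)
    return kept


# Usage with the two rows from the example above.
paragraph = ("The Court affirms that A > B because "
             "Court of Appeals, Case 1212 says that")
rows = [
    {"Document": "Doc 1", "Keyword": "Court of Appeals, Case 1212", "Paragraph": paragraph},
    {"Document": "Doc 1", "Keyword": "Case 1212", "Paragraph": paragraph},
]
for row in drop_subsumed_keywords(rows):
    print(row["Keyword"])  # prints: Court of Appeals, Case 1212
```

Because the test is plain substring containment, a keyword such as "Case 12" would also count as contained in "Case 1212", so results are worth spot-checking if some keywords are prefixes of others.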

My problem is how to automatically remove the rows containing "Case 1212" in the "Keyword" column whenever another row contains "Court of Appeals, Case 1212" in that column and the two rows have the same value in the "Paragraph" column.
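
The exact rule asked for here (drop a "Case 1212" row when a "Court of Appeals, Case 1212" row has an identical Paragraph) can also be expressed with an explicit table of short and long forms. The Python sketch below assumes the sheet has been saved as CSV with the header row `Document,Keyword,Paragraph`; the names `SHORT_TO_LONG` and `remove_short_forms` are hypothetical, and the pair list would have to be filled in for each case being searched.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping: ambiguous short form -> complete form that should
# take precedence when both match the same paragraph.
SHORT_TO_LONG = {
    "Case 1212": "Court of Appeals, Case 1212",
}


def remove_short_forms(csv_in, csv_out, short_to_long=SHORT_TO_LONG):
    """Copy rows from csv_in to csv_out, skipping any row whose Keyword is a
    short form when a row with the matching long form has an identical
    Paragraph. Returns the number of rows removed."""
    reader = csv.DictReader(csv_in)
    rows = list(reader)

    # Which keywords matched each paragraph text?
    keywords_by_paragraph = defaultdict(set)
    for row in rows:
        keywords_by_paragraph[row["Paragraph"]].add(row["Keyword"])

    writer = csv.DictWriter(csv_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    removed = 0
    for row in rows:
        long_form = short_to_long.get(row["Keyword"])
        if long_form and long_form in keywords_by_paragraph[row["Paragraph"]]:
            removed += 1  # the long form covers this paragraph; drop the row
            continue
        writer.writerow(row)
    return removed


# Usage with the example rows, using in-memory "files".
sample = io.StringIO(
    "Document,Keyword,Paragraph\n"
    'Doc 1,"Court of Appeals, Case 1212",'
    '"The Court affirms that A > B because Court of Appeals, Case 1212 says that"\n'
    "Doc 1,Case 1212,"
    '"The Court affirms that A > B because Court of Appeals, Case 1212 says that"\n'
)
out = io.StringIO()
removed = remove_short_forms(sample, out)
print(removed)  # prints 1
```

For a real export, the input and output files would be opened with `newline=""`, as the Python `csv` documentation recommends; depending on how Excel saved the CSV, the `encoding` argument to `open` may also need adjusting. Grouping by paragraph text means each row is checked with a single set lookup, so 9,000 rows should not be a problem.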

Before you mention it: I have thought about highlighting duplicate paragraphs/values and removing them manually. However, we are talking about 9,000 rows. If you have any suggestions, I'll be forever grateful!
  #2  
01-05-2018, 08:54 AM
Pecoflyer (Windows 7 64bit, Office 2010 64bit)
Expert
 
Join Date: Nov 2011
Location: Brussels Belgium
Posts: 2,769

Perhaps post a desensitized sheet showing some data and the expected results?

Tags
duplicate values


All times are GMT -7.


MSOfficeForums.com is not affiliated with Microsoft