Unified Legal Party Based Sentiment Analysis Pipeline

Abstract: The rapid growth of text corpora across various domains has created both a need and an opportunity to leverage Natural Language Processing (NLP) to automate and streamline tedious manual tasks. The legal domain is one such text-rich domain, with rapidly growing text corpora and a corresponding demand for NLP applications. In the pursuit of automating the prediction of the winning party of a court case, among other uses, analysing sentiment in a party-wise manner is beneficial for legal professionals. The two main sub-tasks in this process are to identify the parties in a court case and then to analyse the respective sentiment towards each party. In this study we discuss the unification of two models, each capable of one of these tasks, into a single pipeline that performs party based sentiment analysis efficiently.


I. INTRODUCTION
Various disciplines and domains inherently rely on manual, text-dependent task-flows that are often tedious, repetitive and exhausting. Tasks of this type can potentially be automated using Natural Language Processing (NLP). Two major reasons for these problems to exist are: 1) inherent requirements for explicit human intervention, and 2) the lack of practically applicable techniques. The legal domain is one such text-rich domain, where legal professionals often find themselves doing a lot of manual work on a daily basis. By its very nature, the legal domain relies on and produces a large number of documents, which require expert knowledge and considerable time to process; some of this work is highly repetitive but at the same time cognitively demanding. These types of problems cannot be addressed by simple procedural programming; rather, they require NLP research and applications. Legal Information Extraction (IE) is a prominent research area today, since there is a demand to automate the aforementioned tasks for the efficiency and convenience of legal professionals and even the general public.

A. Legal parties
In a legal case, a party may consist of a person, a group of people, or organizations [1]. In general, court cases have two main parties. In civil cases, the party filing the complaint is known as the plaintiff and the other party is known as the defendant. In criminal cases, one party is the prosecutor, a government entity that files the case, and the other party is called the accused or the respondent. There may be other persons or entities mentioned in a court case who do not belong to either of these parties, such as witnesses; these can be considered third parties. Also, the two parties may be referred to by different, non-uniform names throughout the legal documents. Thus, identifying the parties and then assessing the impact on the court case of the facts and evidence provided by each party is not a trivial task.

B. Case Law
Case Law can be described as the use of past court case decisions as grounds to assist the decision of an ongoing court case, rather than relying on relatively abstract constitutions, statutes and regulations [2]. In a way, it provides practical examples of the applicability, extent and limitations of applying the law to similar kinds of cases. The usage and applicability of Case Law differ based on jurisdiction, the similarity or uniqueness of the court cases, and other factors. Since Case Law documents contain the summary of a court case, they can be considered ideal datasets for NLP research and applications in the legal domain.

C. Aspect Based Sentiment Analysis (ABSA)
Sentiment analysis is a prominent technique in NLP where the feeling or opinion of the text content is extracted from textual data [3]. Aspect Based Sentiment Analysis (ABSA) is a specific application of sentiment analysis, where opinion implied by the text content is evaluated and output with respect to specific aspects [4] present in or addressed by the text. In the legal domain this is very useful, since the context in the legal document is often interpreted with respect to the parties of the court case.

Ex.1 : Cao v. Commonwealth of Puerto Rico [22]
Dolores H. Cao, an elderly resident of Cupey, Puerto Rico, was removed from her home, made to undergo a psychological evaluation, and placed in a substitute home and, later, a state institution for the elderly, by the Puerto Rico Family Department (the Department)

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Sahan Jayasinghe #1, Lakith Rambukkanage 2, Ashan Silva 3, Nisansa de Silva 4, Amal Shehan Perera 5

Most of the cognitively demanding activities in the legal domain involve a party-wise evaluation of natural language statements. For example, predicting the winning party of a court case is an important use case for legal professionals, as they can use the gained knowledge to drive the court case in their favour by reinforcing the important arguments and countering the critical arguments which may be raised by the opposition. Therefore, predicting the winning party of a court case entails the evaluation of arguments in a party-wise manner.
Traditional NLP analysis of the document context as a whole would not be sufficient, since it would be an aggregate representation of the court case rather than an evaluation that takes the parties into consideration. For example, an argument supporting one party opposes the other party, and this should be captured in order to evaluate each argument when predicting the winning party of a court case. Party based sentiment analysis can be used to approach the problem of predicting the winning party, since it enables evaluating the context in a party-wise manner. In addition, sentiment is a suitable candidate as it is a representation of emotion in the context, which is a valuable feature to be used individually or in combination with other features present in the court case.
Thus far, the work of Rajapaksha et al. [14] is, by their own account, the only study to approach the problem of legal party based sentiment analysis. We identified two key bottlenecks in their solution: 1) the parties for each sentence have to be given as manual input, and 2) the input has to be given as single sentences or small paragraphs. The first makes expertise in the legal domain a requirement, which reduces the automation expected: identifying the parties in nontrivial sentences requires legal expertise and, more than that, repeatedly entering the parties for every sentence is exhausting and time-consuming manual work. The second results in less decisive value, because isolated sentences or small paragraphs carry nonuniform representations of the parties; the sentiment of sentences that belong to one court case, with a unique representation of the respective parties, would be more useful for further analysis.
The focus of our study is to address the above two concerns and reduce manual work. For this, we use automated party identification. However, it is important to note that the Named Entity of NLP does not directly map to the Legal Entity (or Party) of the legal domain. In the legal domain, the notions of entity and party are not consistent with each other: only a subset of the entities present in a document actually belong to a legal party (petitioner or defendant).
Ex.1, taken from the United States Supreme Court, illustrates how the basic Stanford Core NLP [21] Named Entity Recognition (NER) model tags entities. The location entities Cupey and Puerto Rico are tagged, but they are not legal parties. Thus, using general purpose NER to identify only the legal entities is not trivial. Therefore, we decided to use the model proposed by de Almeida et al. [20], which is specifically designed to identify parties in legal court case documents. By using the output of de Almeida et al. [20] as one of the inputs to Rajapaksha et al. [14], we propose a fully automated pipeline.
The remaining sections of this paper are organized as follows. Previous research related to our study is discussed in Section II. The methodology and implementation details of the pipeline are elaborated in Section III, and the experimental details are listed in Section IV. This journal paper is an extension of our previously published conference paper [23]. We provide a more detailed discussion of the research work, including the different approaches we tried out on the way to the final PBSA pipeline.

II. RELATED WORK

A. Party Identification in the Legal Domain
The study by Samarawickrama et al. [18] is fundamentally based on the frequency at which a given entity occurs as the subject of a sentence. In order to obtain this statistic, they have used two sub-models: 1) Named Entity Recognition and co-reference resolution to identify entities, and 2) a probability output calculated from subject frequency. A threshold is then used to identify the entities belonging to a party.
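The thresholding idea can be illustrated with a minimal sketch. It is not the implementation of [18]: it assumes that the subject of every sentence has already been resolved to an entity name, it takes the relative subject frequency as the probability, and both the function name and the threshold value are hypothetical.

```python
from collections import Counter


def party_entities_by_subject_frequency(sentence_subjects, threshold=0.2):
    """Keep entities whose share of subject occurrences meets the threshold.

    sentence_subjects holds one resolved subject entity per sentence, or
    None when the subject is not an entity. The input format and the
    default threshold are assumptions made for illustration.
    """
    counts = Counter(s for s in sentence_subjects if s is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    probabilities = {entity: n / total for entity, n in counts.items()}
    return {e: p for e, p in probabilities.items() if p >= threshold}


# Hypothetical subjects of ten sentences from one case.
subjects = [
    "Dolores H. Cao", "the Department", "Dolores H. Cao", None, "Cupey",
    "Dolores H. Cao", "the Department", "the Department",
    "Dolores H. Cao", "the Department",
]
selected = party_entities_by_subject_frequency(subjects)
print(sorted(selected))
```

In this toy input the location Cupey appears as a subject only once, so it falls below the threshold, while the two frequently mentioned entities are kept.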
October 2022
International Journal on Advances in ICT for Emerging Regions

In contrast, the study by de Almeida et al. [19] approaches the problem from a different perspective. First, an NER model is used for identifying people and organizations, whereupon a mask is applied on them. Next, co-reference resolution is used to resolve different representations of the same entity. A custom algorithm is used to replace the multi-token representations of entities. Finally, the masked sentences of the NER model are fed to a sequence-to-sequence model. The output binary sequence then discerns whether each word belongs to a party or not.
In their follow-up study, de Almeida et al. [20] have introduced a novel method to identify parties more accurately using deep learning. They have trained and evaluated a number of deep learning models, experimenting with different architectures of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells, on a data set of 1000 paragraphs created from US Supreme Court case documents published by Sugathadasa et al. [5]. The process has four steps: 1) Tokenizing, 2) Embedding, 3) Masking, and 4) the Neural Network. NER and co-reference resolution are used to identify entities in the Tokenizing step. For Embedding they have used a Word2Vec [24] model of 300 dimensions trained on Google News data 1. The Masking step uses the identified entities and brings the dimensions up to 301. Finally, the vector is passed to a Neural Network based on Bidirectional Recurrent Neural Networks (Bi-RNN), which outputs the probability of each token belonging to a legal party.

B. Party Based Sentiment Analysis
In the study by Rajapaksha et al. [12], the sentiment for each party is calculated using sentence level sentiment. To allocate the sentiment to each party they have used a simple convention. They have used the co-reference resolution of Stanford Core NLP [21] to resolve pronouns and other references to the same entity. Further, they have used the constituency parser of the same toolkit to break down the phrases for sentiment annotation.
Mudalige et al. [13] address a major drawback of NLP in the legal domain, namely the unavailability of a public dataset for Aspect (Party) Based Sentiment Analysis. As discussed in Section I-A, a legal case consists of two or more parties, where each party belongs to one of two categories: petitioner or defendant. The researchers have developed a dataset with sentiment annotation for each party mentioned in a sentence, which supports multiple sub-tasks related to ABSA. Further, in a follow-up study, Rajapaksha et al. [14] have implemented a deep learning model for detecting the sentiment towards each legal entity referenced in a given sentence, using the dataset they created [13]. The results have been compared with a number of ABSA models [25][26][27][28][29][30][31][32], all of which were outperformed by their model, with an accuracy of 0.7 and an F1 score of 0.62.

III. METHODOLOGY
The objective of this study is to define a pipeline which uses the output of the party extraction system of de Almeida et al. [20] to generate the input for the sentence-wise party based sentiment analysis system of Rajapaksha et al. [14]. The party identification model by de Almeida et al. [20] was selected because it showed above 97% accuracy for all model variations, and 90.89% precision and 91.69% recall for the best performing model, on the test data set. The PBSA model by Rajapaksha et al. [14] was selected as it outperformed a number of ABSA models at the time of reference [25][26][27][28][29][30][31][32]. Beyond their performance, these two subsystems were specifically selected to be combined because both models have been trained on United States Supreme Court cases.

1 https://code.google.com/archive/p/word2vec/
The purpose of implementing this pipeline is to eliminate the need for manual input of the legal entities referenced in a sentence to the Party-Based Sentiment Analysis model. To achieve this, we developed a solution that is more accurate than the baseline approach of using the two systems in [14,20] as is, refining it based on the observations gained along the way. The evaluation results through the evolution of our pipeline are presented in Section IV.

A. Party Extraction System
de Almeida et al. [20] have used a pre-trained Word2Vec model by Google for 300-dimensional word embeddings and implemented a procedure to add a mask value (as the 301st dimension) for Person, Organization and Location entities recognized by Stanford's Named Entity Recognition model.
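The masking step can be pictured with a small sketch. The function name, the NER tag strings and the concrete mask values (1.0 for an entity token, 0.0 otherwise) are assumptions for illustration; the source only states that a mask value is added as the 301st dimension.

```python
# NER categories that receive the mask, as named in the text; the exact
# tag strings are assumed.
MASKED_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def add_entity_mask(embedding, ner_tag, mask_on=1.0, mask_off=0.0):
    """Append the mask value as one extra dimension to a token embedding."""
    mask = mask_on if ner_tag in MASKED_TYPES else mask_off
    return list(embedding) + [mask]


# A zero vector stands in for a real 300-dimensional Word2Vec embedding.
embedding = [0.0] * 300
masked_person = add_entity_mask(embedding, "PERSON")
masked_other = add_entity_mask(embedding, "O")
print(len(masked_person), masked_person[-1], masked_other[-1])
```

Every token vector therefore has the same 301-dimensional shape, and the last component tells the network whether the annotator saw an entity at that position.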

B. Party-based Sentiment Analysis (PBSA) System
The study by Mudalige et al. [13] is based on integrating the ABSA concept into the legal domain. They point out the lack of a publicly available dataset for ABSA in the legal domain as a main issue. It should be noted that, according to Caswell et al. [33], NLP datasets that are publicly available but not tagged by experts may be of insufficient quality. With the help of law experts, Mudalige et al. [13] have created an annotated public dataset for Party-Based Sentiment Analysis, which consists of sentences taken from the US Supreme Court document data set released by Sugathadasa et al. [5].
Further, they have included sub-sentences obtained from the above sentences using the Stanford Dependency Parser. This information is given to the system along with the relevant entities and their references for training the deep learning model. For each entity, they have used three labels to represent the sentiment within the context. After their experiments, they concluded that the ensemble model implemented using BERT [35] embeddings, the Attention-based LSTM with Aspect Embedding architecture (ATAE-LSTM) [26], and Graph Convolution Networks (GCN) was the best performing model. In this study, we have used this ensemble model to obtain the sentiment outcomes for each sentence of a given text with respect to the legal entities identified by the party extraction system. As shown in Fig. 2, this model serves as the Party-based Sentiment Analysis (PBSA) System component of the pipeline described in this work.

Fig. 2. PBSA System working on a sentence from Lee v. United States [34]

In order to output party-based sentiment for each sentence in a given text, we have defined an abstract view of the proposed pipeline in Fig. 3. The basic functionality of the pipeline is to take a court case document as input and output the party-based sentiment for all the sentences in the document. First, we use the Party Extraction system to identify the party members, and pass them through an adapter as input to the Party-Based Sentiment Analysis system in the expected format. The output of the Party Extraction system after the intermediate adapter consists of the petitioner and defendant entities and their references in each sentence of the given text, ordered exactly as they appear. This is subsequently fed to the PBSA system, which outputs the party based sentiment for all the sentences.
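The data flow of this abstract view can be sketched as three composed stages. All function names here are hypothetical, and the three stubs are trivial stand-ins rather than the systems of [20] and [14]; in particular, the stub sentiment stage returns a placeholder label instead of a prediction.

```python
def run_pipeline(document, extract_parties, adapt, analyse_sentiment):
    """Chain party extraction, the adapter and sentence-wise PBSA."""
    party_output = extract_parties(document)
    sentence_inputs = adapt(document, party_output)
    return [analyse_sentiment(sentence, mentions)
            for sentence, mentions in sentence_inputs]


def stub_extract_parties(document):
    # Stand-in for the Party Extraction system: fixed party members.
    return {"petitioner": ["Cao"], "defendant": ["Department"]}


def stub_adapt(document, parties):
    # Stand-in adapter: naive sentence split, mentions kept in sentence order.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    inputs = []
    for sentence in sentences:
        words = sentence.split()
        mentions = {party: [w for w in words if w in names]
                    for party, names in parties.items()}
        inputs.append((sentence, mentions))
    return inputs


def stub_sentiment(sentence, mentions):
    # Stand-in for the PBSA system: one placeholder label per mention.
    return {m: "unscored" for names in mentions.values() for m in names}


document = "Cao sued the Department. The Department removed Cao."
results = run_pipeline(document, stub_extract_parties, stub_adapt, stub_sentiment)
print(results)
```

Passing the stages as callables keeps the adapter replaceable, which matters because two different adapters are developed in the later subsections.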

D. Baseline Model
As the starting point of implementing the pipeline, we directly used the Party Extraction system defined by de Almeida et al. [20], which consists of a deep learning model that predicts the petitioner and defendant probabilities of each token and a party extraction algorithm that uses Stanford Co-Reference to provide the names of petitioners and defendants. We used this existing party extraction algorithm to obtain the party members and input them to the latter half of the pipeline. However, we observed that, in certain scenarios, the extracted names of organizations or persons involved in a specific party deviated from the expected string. This deviation came in the form of a description trailing behind the expected string. An example of this faulty co-reference annotation is shown in Fig. 4, which uses a sentence obtained from Cao v. Commonwealth of Puerto Rico [22]. Here, the pronoun her should be identified as a co-reference of Dolores H. Cao. Instead, the extraction function of Samarawickrama et al. [18] forms an erroneous entity in which the trailing description of Dolores H. Cao is also included.

E. nuRef model: Improving Stanford Co-Reference Output
To mitigate the issue of extracting entities with a trailing description, we implemented an algorithm, described in Algorithm 1, that updates the co-referenced entities to the actual entity name.

[Algorithm 1: UpdateCoRef and FindSameNERWords]
The function UpdateCoRef defined in Algorithm 1 takes as input the list of tokens generated by the Stanford Annotator for the given text and the co-reference annotation. Since the goal of this function is to remove descriptive fields from an entity and replace the respective entity field of the CoRef object with the actual name of the entity, the execution goes through each co-reference entity in the annotation and uses the Stanford NER annotation to extract the actual words representing the entity.
The NER value of each token exists as a field named NER in the tokens list, and to combine consecutive words with the same NER value we have defined a separate function, FindSameNERWords, in Algorithm 1. It is passed the NER value to look for in consecutive words and the token list starting from the current position of the token in the CoRef entity (which is a sub-list of tokens). The function FindSameNERWords is called whenever a token is deemed to belong to one of the three categories: Person, Organization, Location. This function finds the consecutive words with the same NER value as the current token. Fig. 5 illustrates how the example of Fig. 4 is processed through Algorithm 1. Since the algorithm first finds a token which belongs to the Person category, it then searches for consecutive words with the Person NER value. As soon as it encounters a token with a different NER value, it returns the words found so far with the first NER value (in this case, Person) and forgoes further scanning through the remaining tokens of the Co-Reference entity. Then the relevant field of the CoRef object is updated by combining the returned words. In the example given in Fig. 5, the result is Dolores H. Cao.
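The behaviour described above can be sketched as follows. This is a reconstruction from the prose, not the original Algorithm 1: the dictionary layout of tokens and co-reference entities, the NER tag strings and the snake-case function names are assumptions, and entities without any Person, Organization or Location token are simply left unchanged here.

```python
ENTITY_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def find_same_ner_words(ner_value, tokens):
    """Collect the leading consecutive words of `tokens` tagged `ner_value`."""
    words = []
    for token in tokens:
        if token["ner"] != ner_value:
            break  # stop at the first token with a different NER value
        words.append(token["word"])
    return words


def update_coref(tokens, coref_entities):
    """Replace each co-reference entity text with its first named-entity run.

    Each entity is a dict with 'start' and 'end' token indices (end
    exclusive) and a 'text' field; this layout is assumed for illustration.
    """
    for entity in coref_entities:
        span = tokens[entity["start"]:entity["end"]]
        for offset, token in enumerate(span):
            if token["ner"] in ENTITY_TYPES:
                words = find_same_ner_words(token["ner"], span[offset:])
                entity["text"] = " ".join(words)
                break  # forgo scanning the rest of the entity
    return coref_entities


words_and_tags = [
    ("Dolores", "PERSON"), ("H.", "PERSON"), ("Cao", "PERSON"), (",", "O"),
    ("an", "O"), ("elderly", "O"), ("resident", "O"), ("of", "O"),
    ("Cupey", "LOCATION"), ("her", "O"),
]
tokens = [{"word": w, "ner": t} for w, t in words_and_tags]
corefs = [
    {"start": 0, "end": 9, "text": "Dolores H. Cao , an elderly resident of Cupey"},
    {"start": 9, "end": 10, "text": "her"},
]
updated = update_coref(tokens, corefs)
print([entity["text"] for entity in updated])
```

Because scanning stops at the first named-entity run, the later Location token Cupey never overrides the Person name.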
We define the nuRef model as the combination of the GRU of de Almeida et al. [20], which was trained using masked vectors generated from the original Stanford Co-Reference annotation, with the updated co-reference. When nuRef was tested, it was observed that even though the quality of the output entity strings improved, the disparity between the pre-trained GRU and the new co-reference degraded the alignment-based accuracy compared to the Baseline model discussed in Section III-D. Therefore, it was decided that the GRU model should be re-trained using the masked vectors generated with the updated Co-Reference approach.

F. nuRefGRU model: Training the GRU using the Updated Co-Reference
We followed the same approach proposed by Samarawickrama et al. [18]: the Word2Vec model was used to vectorize the tokens, and the mask value was added to the vectors, making them 301-dimensional. We used the same data set released by de Almeida et al. [20], which consists of 1000 court case paragraphs where each token is labeled as petitioner or defendant. We call this model, with the GRU retrained on the updated co-reference, the nuRefGRU model. It was observed that the nuRefGRU model outperforms not only the nuRef model but also the Baseline model. Therefore, we proceeded with the nuRefGRU model towards the implementation of the adapter that generates the input for the Party-Based Sentiment Analysis system [14].

G. Pipeline Implementation 1
The inputs for the adapter set between the party extraction system and the PBSA system are: the list of tokens, the updated co-references, and the outputs of the party extraction system. The latter consist of the list of petitioner entities and the list of defendant entities. The output of the adapter is the sentence-wise references of petitioners and defendants, which can be used directly as input for the PBSA System defined by Rajapaksha et al. [14].
In order to provide the petitioner and defendant entities in the same order they are mentioned in each sentence to the Party-based Sentiment Analysis System implemented by Rajapaksha et al. [14], Algorithm 2 populates an object which stores the sentence-wise petitioner and defendant entities (and their references, such as pronouns and surnames for Person entities and abbreviations for Organization entities) as key-value pairs. The key is the token index in the respective sentence and the value is the entity itself or the reference words.
In the last section of Algorithm 2, since the party members which are referenced only once in the text are not in the Co-Reference annotation, they are searched for in the token list and added to the object LegalEntityPositions. The sentences can also be extracted as a list using Stanford Tokenization.
At this point, all the data needed as input for the Party-based Sentiment Analysis system are ready. Subsequently, we iterate through the sentences of the text, passing each sentence and the respective mentions of petitioner and defendant entities to the PBSA system. The workflow of the combined systems with the intermediate adapter is shown in Fig. 6.
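A sketch of how such an adapter might populate the LegalEntityPositions object is given below. It is reconstructed from the description rather than from Algorithm 2 itself: the layout of tokens and co-reference chains, the function name, and the matching of party names against chain names by exact string equality are all assumptions.

```python
def build_legal_entity_positions(tokens, coref_chains, petitioners, defendants):
    """Map sentence index -> party -> {token index in sentence: mention text}."""
    parties = {"petitioner": set(petitioners), "defendant": set(defendants)}
    positions = {}

    def record(sentence, party, index, text):
        positions.setdefault(sentence, {"petitioner": {}, "defendant": {}})
        positions[sentence][party][index] = text

    covered = set()
    for chain in coref_chains:
        for party, names in parties.items():
            if chain["name"] in names:
                covered.add(chain["name"])
                for mention in chain["mentions"]:
                    record(mention["sentence"], party,
                           mention["index"], mention["text"])

    # Party members mentioned only once are absent from the co-reference
    # annotation, so they are looked up directly in the token list.
    for party, names in parties.items():
        for name in names - covered:
            words = name.split()
            for i in range(len(tokens) - len(words) + 1):
                window = tokens[i:i + len(words)]
                same_sentence = len({t["sentence"] for t in window}) == 1
                if same_sentence and [t["word"] for t in window] == words:
                    record(window[0]["sentence"], party, window[0]["index"], name)
    return positions


sentences = [["Cao", "sued", "the", "Department", "."], ["She", "won", "."]]
tokens = [{"word": w, "sentence": s, "index": i}
          for s, sentence in enumerate(sentences)
          for i, w in enumerate(sentence)]
coref_chains = [{"name": "Cao", "mentions": [
    {"sentence": 0, "index": 0, "text": "Cao"},
    {"sentence": 1, "index": 0, "text": "She"},
]}]
positions = build_legal_entity_positions(tokens, coref_chains, ["Cao"], ["Department"])
print(positions)
```

Keying each mention by its token index lets a consumer sort the keys to recover the order in which the mentions appear in a sentence.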

H. Drawbacks of Implementation 1
Implementation 1 (Section III-G) depends on both the accuracy of the Stanford Annotation and the accuracy of the deep learning model for party probability prediction. The final algorithm of the Party Extraction System defined by Samarawickrama et al. [18] for extracting the entities belonging to the petitioner and defendant parties depends on the probabilities (two values, for petitioner and defendant) predicted by the deep learning model for each token, and on the detection of Person, Organization and Location entities and their references by the Stanford Annotator.
We observed that even when the GRU model assigns a high probability to an entity in the text, the Stanford Annotator sometimes fails to recognize it as a Person, Organization or Location entity. As a result, the party extraction system fails to return entities recognized only by the GRU model. There were also court cases where the petitioner or defendant party is referred to more generally, without specifying the names of the people or organizations belonging to that party (Ex.2).

In cases similar to Ex.2, the GRU model has been able to predict with high confidence that the word "Plaintiff" is an entity of the petitioner party, but the Stanford Annotator does not recognize such words as entities. On the other hand, in situations where the GRU model has not identified a reference of an entity as a petitioner or defendant, the Stanford Annotator could recover those references and include them in the output of the adapter connecting to the PBSA system.
Since Implementation 1 (Section III-G) has both positive and negative aspects, we investigated a different approach to overcome the existing issues.

I. Pipeline Implementation 2
In this implementation, the inputs for the adapter are the petitioner and defendant probability arrays predicted by the GRU model for each token of the given text input. Using these probability arrays and the token list of the court case text, we defined the adapter to generate the input for the Party-based Sentiment Analysis system defined by Rajapaksha et al. [14].

[Algorithm 3: adapter generating the PBSA input from the GRU probability arrays]
Algorithm 3 defines the process to create a 3-dimensional list of party members and their references, where the 1st dimension represents the sentence index, the 2nd represents whether the entry is a petitioner or a defendant, and the 3rd stores the names of the entities or their references in the same order they appear in the respective sentence. This algorithm takes the list of tokens as a 1-dimensional array; to identify the tokens that start a new sentence, we need to provide the indices of the sentence-starting tokens in the tokens list. This array can be generated from the Stanford Annotation. We have incorporated a threshold parameter to classify the tokens as belonging to either the petitioner party or the defendant party. When a token has a petitioner probability greater than or equal to the threshold and a defendant probability less than the threshold, the algorithm extracts that token as a petitioner. The inverse condition extracts the token as a defendant. Other combinations are ignored for party extraction.
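The thresholding rule can be sketched as below. This is an illustration of the described behaviour, not the original Algorithm 3: the function name, the default threshold of 0.5 and the choice to keep every qualifying token as a separate mention (without merging multi-token names) are assumptions.

```python
def extract_party_mentions(tokens, sentence_starts, pet_probs, def_probs,
                           threshold=0.5):
    """Build result[sentence][party] = mentions in order of appearance.

    Party index 0 is the petitioner and 1 is the defendant.
    """
    starts = set(sentence_starts)
    result = []
    for i, token in enumerate(tokens):
        if i in starts or not result:
            result.append([[], []])  # open a new sentence entry
        is_petitioner = pet_probs[i] >= threshold
        is_defendant = def_probs[i] >= threshold
        if is_petitioner and not is_defendant:
            result[-1][0].append(token)
        elif is_defendant and not is_petitioner:
            result[-1][1].append(token)
        # Tokens at or above the threshold for both parties, or for
        # neither, are ignored.
    return result


tokens = ["Plaintiff", "sued", "the", "Department", ".", "She", "won", "."]
sentence_starts = [0, 5]
pet_probs = [0.9, 0.1, 0.0, 0.2, 0.0, 0.8, 0.1, 0.0]
def_probs = [0.1, 0.0, 0.1, 0.95, 0.0, 0.6, 0.0, 0.0]
mentions = extract_party_mentions(tokens, sentence_starts, pet_probs, def_probs)
print(mentions)
```

In this made-up input the token She exceeds the threshold for both parties and is therefore dropped, which shows the kind of ambiguous prediction that this rule does not resolve.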
Since the deep learning model is trained for probability prediction with the petitioner and defendant entities and their references labeled, the probability arrays that we provide to Algorithm 3 also have higher petitioner or defendant probabilities for tokens representing entity references. Accuracy metrics for the GRU model are presented in the Experiments and Results section. The workflow of the combined systems, along with the intermediate adapter defined in this approach, is shown in Fig. 7.
With this implementation of the pipeline, the main issue we faced in Implementation 1 (Section III-G) is mitigated, since the identification of petitioner and defendant entities along with their references depends solely on the output of the deep learning model.
In this approach, the accuracy of the Stanford NER and Co-Reference has no impact on the pipeline. However, when a reference to a party member is not predicted with a high enough probability by the deep learning model, there is no way to recover those words for inclusion in the input of the PBSA system, because, unlike in Implementation 1 (Section III-G), there is no co-reference identification.

IV. EXPERIMENTS AND RESULTS
In order to ensure accurate inputs are fed into the Party-Based Sentiment Analysis system implemented by Rajapaksha et al. [14], we need to evaluate the output of the deep learning model which predicts the petitioner and defendant probabilities for each token in the text. Initially, we used the Bi-directional Recurrent Neural Network model consisting of Gated Recurrent Units with 512 output units, trained and evaluated as the best performing model by Samarawickrama et al. [18] on the dataset of 1000 US Supreme Court case paragraphs. We compared the accuracy, precision, recall and F1 metrics for the last

The unified party identification and party based sentiment analysis pipeline considerably reduces the manual work that would be needed if these models were used separately. Since the combined pipeline outputs the sentiment for each party per sentence of a court case, it can be used effectively to create a sentiment annotated dataset from a large amount of case law document data. The final output of the unified pipeline itself may not represent an end use case, but such a dataset will serve as a starting point for further analysis of legal documents using party-wise annotated sentiment.
Predicting the winning party of a court case is one such extended usage of the pipeline, as it requires the party-wise evaluation of arguments. The derived pipeline can be used to generate a party-wise sentiment annotated court case dataset for predicting the winning party of a court case, which would be an important insight for legal professionals. Likewise, there are other use cases for which researchers can fit the pipeline into a bigger architecture according to their task.
While this pipeline reduces manual work, it is important to note that there is room for improvement in both models individually. Identifying such optimizations and fine-tuning the models will increase the overall accuracy of the pipeline. Therefore, future work for this pipeline is mainly twofold. The first direction is improving the performance of the individual models. This also includes creating an annotated dataset for evaluation, where the parties and the respective sentiment for each party are annotated for each sentence of a given paragraph or document. The second direction is using the pipeline in a bigger architecture for research or a practical use case.