mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-12-31 17:25:01 +00:00
**Summary**

Relax the table-segregation rule applied during chunking so that a `Table` element and `Text`-subtype elements can be combined into a single chunk when the chunking window allows.

**Additional Context**

Until now, `Table` elements have always been segregated during chunking, i.e. a chunk that contained a table would never contain any other element. In certain scenarios, especially when a large chunking window of, say, 2000 characters is used, this behavior can reduce retrieval effectiveness by isolating the table from its surrounding context.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: scanny <scanny@users.noreply.github.com>
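The difference between the old and the relaxed rule can be illustrated with a small, self-contained sketch. This is not the library's implementation: `Element`, `chunk`, and the `segregate_tables` flag are hypothetical names invented for this illustration, and the greedy packing with a one-character separator is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the library's class)."""
    kind: str  # e.g. "Title", "Table", "NarrativeText"
    text: str


def chunk(elements: List[Element], max_characters: int,
          segregate_tables: bool) -> List[List[Element]]:
    """Greedily pack elements into chunks of at most max_characters.

    segregate_tables=True models the old rule: a Table never shares a chunk.
    segregate_tables=False models the relaxed rule: a Table may be combined
    with neighboring elements when the window allows.
    An element longer than the window becomes a chunk of its own (this sketch
    does not split oversized elements).
    """
    chunks: List[List[Element]] = []
    current: List[Element] = []
    size = 0
    for el in elements:
        sep = 1 if current else 0  # assumed one-character separator
        too_big = size + sep + len(el.text) > max_characters
        isolated = segregate_tables and el.kind == "Table"
        prev_isolated = (segregate_tables and bool(current)
                         and current[-1].kind == "Table")
        if current and (too_big or isolated or prev_isolated):
            chunks.append(current)  # close the current chunk
            current, size, sep = [], 0, 0
        current.append(el)
        size += sep + len(el.text)
    if current:
        chunks.append(current)
    return chunks


elements = [
    Element("Title", "Data layout"),
    Element("Table", "Field Name Size Type"),
    Element("NarrativeText", "All fields are VARCHAR."),
]
old_rule = chunk(elements, max_characters=2000, segregate_tables=True)
relaxed = chunk(elements, max_characters=2000, segregate_tables=False)
print(len(old_rule), len(relaxed))
```

Under the old rule the table sits alone between two single-element chunks; under the relaxed rule all three elements fit in one 2000-character chunk, so the table keeps its surrounding context.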
18 lines
5.0 KiB
JSON
[
{
"type": "Table",
"element_id": "e6278883f688428c98cec628a00b0102",
"text": "Field Name Size Type Description Example School_Year 9 VARCHAR School year the assessment was given 2019-2020 LEA_Name VARCHAR Official Name of the School System Happy City Schools LEA_Code 3 VARCHAR 3-digit ALSDE-assigned system code 010 or 298 School_Code 6 VARCHAR 4-digit ALSDE-assigned school code 0100 or 9203 Student_Identifier 10 VARCHAR Student's ALSDE ID number -SSID ***must be 10 digits and start with \"19\" or \"20\"*** 9999999999 Student_Last_Name 35 VARCHAR Student's last name Smith Student_First_Name 35 VARCHAR Student's first name Jane Student_Date_of_Birth_Month 2 VARCHAR Student birth date month. MM 05, 11 Student_Date_of_Birth_Day 2 VARCHAR Student birth date day. DD 03, 25 Student_Date_of_Birth_Year 4 VARCHAR Student birth date Year. YYYY 2015 Reading_Teacher_Identifier 13 VARCHAR Reading Teacher's ALSDE ID/TCHNumber. The teacher who is primarily responsible for Reading instruction of the student. (These are two names for the same number). ***must be in this format 3 letters, dash, 4 numbers, dash, 4 numbers*** XXX-9999-9999, NOJ-1234-5678 Reading_Assessment_Name 15 VARCHAR Unique identifier for Reading assessment. Vendor's name for overall assessment. XXXX Reading_Administration_Mode 8 VARCHAR This field indicates if the assessment was administered in an in-person (face-to-face) or a remote learning environment. The options are: InPerson or Remote Reading_Benchmark_Period 3 VARCHAR Benchmark period during the term the assessment was administered. Summer School will be SSS. BOY, MOY or EOY (SSS for summer school) Reading_Date_Completed 10 VARCHAR This is the date on which the assessment is completed MM/DD/YYYY 43962 Reading_Extended_Time 2 VARCHAR The field will contain a \"Y\" if the student was given more than the allotted time to finish the assessment or any subtest of the assessment as defined by the vendor in a standard administration. Y",
"metadata": {
"category_depth": 1,
"page_number": 1,
"parent_id": "3ddff8c2b6c44a16be24baf72bdd78a2",
"text_as_html": "<table class=\"Table\" id=\"e6278883f688428c98cec628a00b0102\"> <thead> <tr> <th>Field Name</th><th>Size</th><th>Type</th><th>Description</th><th>Example</th></tr></thead><tbody> <tr> <td>School_Year</td><td>9</td><td>VARCHAR</td><td>School year the assessment was given</td><td>2019-2020</td></tr><tr> <td>LEA_Name</td><td></td><td>VARCHAR</td><td>Official Name of the School System</td><td>Happy City Schools</td></tr><tr> <td>LEA_Code</td><td>3</td><td>VARCHAR</td><td>3-digit ALSDE-assigned system code</td><td>010 or 298</td></tr><tr> <td>School_Code</td><td>6</td><td>VARCHAR</td><td>4-digit ALSDE-assigned school code</td><td>0100 or 9203</td></tr><tr> <td>Student_Identifier</td><td>10</td><td>VARCHAR</td><td>Student's ALSDE ID number -SSID ***must be 10 digits and start with \"19\" or \"20\"***</td><td>9999999999</td></tr><tr> <td>Student_Last_Name</td><td>35</td><td>VARCHAR</td><td>Student's last name</td><td>Smith</td></tr><tr> <td>Student_First_Name</td><td>35</td><td>VARCHAR</td><td>Student's first name</td><td>Jane</td></tr><tr> <td>Student_Date_of_Birth_Month</td><td>2</td><td>VARCHAR</td><td>Student birth date month. MM</td><td>05, 11</td></tr><tr> <td>Student_Date_of_Birth_Day</td><td>2</td><td>VARCHAR</td><td>Student birth date day. DD</td><td>03, 25</td></tr><tr> <td>Student_Date_of_Birth_Year</td><td>4</td><td>VARCHAR</td><td>Student birth date Year. YYYY</td><td>2015</td></tr><tr> <td>Reading_Teacher_Identifier</td><td>13</td><td>VARCHAR</td><td>Reading Teacher's ALSDE ID/TCHNumber. The teacher who is primarily responsible for Reading instruction of the student. (These are two names for the same number). ***must be in this format 3 letters, dash, 4 numbers, dash, 4 numbers***</td><td>XXX-9999-9999, NOJ-1234-5678</td></tr><tr> <td>Reading_Assessment_Name</td><td>15</td><td>VARCHAR</td><td>Unique identifier for Reading assessment. Vendor's name for overall assessment.</td><td>XXXX</td></tr><tr> <td>Reading_Administration_Mode</td><td>8</td><td>VARCHAR</td><td>This field indicates if the assessment was administered in an in-person (face-to-face) or a remote learning environment. The options are:</td><td>InPerson or Remote</td></tr><tr> <td>Reading_Benchmark_Period</td><td>3</td><td>VARCHAR</td><td>Benchmark period during the term the assessment was administered. Summer School will be SSS.</td><td>BOY, MOY or EOY (SSS for summer school)</td></tr><tr> <td>Reading_Date_Completed</td><td>10</td><td>VARCHAR</td><td>This is the date on which the assessment is completed MM/DD/YYYY</td><td>43962</td></tr><tr> <td>Reading_Extended_Time</td><td>2</td><td>VARCHAR</td><td>The field will contain a \"Y\" if the student was given more than the allotted time to finish the assessment or any subtest of the assessment as defined by the vendor in a standard administration.</td><td>Y</td></tr></tbody></table>",
"languages": [
"eng"
],
"filetype": "text/html"
}
}
]
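The record above stores the table twice: flattened in `text` and with row and column structure in `metadata.text_as_html`. As a rough sketch of how the structured form can be recovered using only the Python standard library, the snippet below parses an abbreviated, hand-shortened copy of the record (two columns, one body row; the `class` and `id` attributes are omitted for brevity). `CellCollector` and `sample` are hypothetical names introduced for this illustration.

```python
import json
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self.rows:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


# Abbreviated sample in the same shape as the fixture (assumption: shortened by hand).
sample = """
[{"type": "Table",
  "text": "Field Name Size LEA_Name",
  "metadata": {"text_as_html": "<table> <thead> <tr> <th>Field Name</th><th>Size</th></tr></thead><tbody> <tr> <td>LEA_Name</td><td></td></tr></tbody></table>"}}]
"""

element = json.loads(sample)[0]
collector = CellCollector()
collector.feed(element["metadata"]["text_as_html"])
header, body = collector.rows[0], collector.rows[1:]
print(header, body)
```

Note that the empty `Size` cell for `LEA_Name` survives as an empty string in the parsed rows, whereas in the flattened `text` field it simply disappears.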