BasesCGL/CLAUDE.md
yann64 f75cbebb44 Initial commit: GEDCOM export scripts and generated filiations
Includes export_lignees_to_gedcom.py (Drupal book → GEDCOM 5.5.1),
export_users_to_webtrees.py, generated GEDCOM files for 16 family
lineages, and webtrees user import SQL. Excludes basesgen.sql (966 MB)
and webtrees_temp_passwords.csv (sensitive).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 13:44:28 +02:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This repository contains `basesgen.sql`, a ~966 MB phpMyAdmin dump of the `basesgen` MySQL database for the **CGL (Centres de Généalogie du Languedoc)** — a genealogical research organization covering the Languedoc region of southern France. The dump was generated from MySQL 8.0 via phpMyAdmin 5.2.2 and contains marriage records from French civil registry archives.
There is no application source code in this repository; the deliverable is the SQL database itself.
## Working with the SQL File
**Restore the database (the target database `basesgen` must already exist, because it is named on the command line):**
```bash
mysql -u <user> -p basesgen < basesgen.sql
```
**Search schema definitions without importing (file is 966 MB — avoid full loads):**
```bash
grep -A 30 "CREATE TABLE \`<table_name>\`" basesgen.sql
```
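The same lookup can be sketched in Python when a fixed `-A 30` window is too short or too long for a given table. This is a minimal sketch, not part of the repository: the function name `extract_create_table` is hypothetical, and it assumes each `CREATE TABLE` statement starts at the beginning of a line and ends on the first line that finishes with `;`. It reads the file lazily, so the dump is never loaded whole.

```python
import re
from typing import Iterable, List


def extract_create_table(lines: Iterable[str], table: str) -> List[str]:
    """Return the lines of the CREATE TABLE statement for `table`.

    `lines` can be an open file object, so the dump is streamed line by line.
    Assumption: the statement ends on the first line that finishes with ';'.
    """
    start = re.compile(r"^CREATE TABLE (IF NOT EXISTS )?`" + re.escape(table) + r"`")
    block: List[str] = []
    for line in lines:
        # Skip everything until the statement for the requested table starts.
        if not block and not start.match(line):
            continue
        block.append(line.rstrip("\n"))
        if line.rstrip().endswith(";"):
            break
    return block


# Usage against the real dump (file encoding is an assumption):
# with open("basesgen.sql", encoding="utf-8", errors="replace") as dump:
#     print("\n".join(extract_create_table(dump, "mariage")))
```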
**Count `INSERT` statements for a table (only a rough lower bound on the row count, since a dump may pack many rows into each statement):**
```bash
grep -c "^INSERT INTO \`mariage\`" basesgen.sql
```
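To get the same figure for every table in one pass, the `grep -c` idea can be sketched in Python. The function name `count_insert_statements` is hypothetical. Like the `grep` command, it counts `INSERT` statements rather than rows, so the result understates the row count wherever a statement carries several rows.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the table name at the start of an INSERT statement line.
INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)`")


def count_insert_statements(lines: Iterable[str]) -> Counter:
    """Count INSERT statements per table while streaming the dump."""
    counts: Counter = Counter()
    for line in lines:
        match = INSERT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage against the real dump (file encoding is an assumption):
# with open("basesgen.sql", encoding="utf-8", errors="replace") as dump:
#     for table, n in count_insert_statements(dump).most_common(10):
#         print(table, n)
```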
## Database Schema
### Core Genealogical Tables
**`mariage`** — The central table. Contains individual marriage act records from civil registry (état civil). Key columns:
- `CLE_MARIAGE` — primary key (integer, sequential across batch imports)
- `JOUR_MARIAGE`, `MOIS_MARIAGE`, `ANNEE_MARIAGE` — marriage date (split fields)
- `NOM_EPOUX` / `NOM_EPOUSE` — groom/bride surnames (uppercase, `latin1_general_ci`)
- `PRENOM_EPOUX` / `PRENOM_EPOUSE` — given names
- `AGE_EPOUX` / `AGE_EPOUSE` — age at marriage
- `NOM_PERE_EPOUX`, `NOM_MERE_EPOUX` etc. — parents of each spouse
- `VEUF_EPOUX` / `VEUVE_EPOUSE` — widower/widow status
- `LIEU_ACTE` — location of the act
- `TYPE_ACTE` — type of record
- `CODE_INSEE` — INSEE code linking to `ville`
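Because the marriage date is stored in three separate columns, code that consumes `mariage` rows usually has to reassemble it. The sketch below shows one way to do that. It assumes the three fields hold numeric values (as integers or digit strings) and that `0`, `NULL` or an empty string marks an unknown part. Neither assumption is confirmed by the schema notes above, and `format_marriage_date` is a hypothetical helper.

```python
from typing import Optional, Union

Part = Union[int, str, None]


def _to_int(value: Part) -> Optional[int]:
    """Convert a raw field to a positive int, or None when unknown."""
    try:
        number = int(str(value).strip())
    except (TypeError, ValueError):
        return None
    return number if number > 0 else None


def format_marriage_date(jour: Part, mois: Part, annee: Part) -> str:
    """Build YYYY, YYYY-MM or YYYY-MM-DD, dropping unknown or invalid parts."""
    year, month, day = _to_int(annee), _to_int(mois), _to_int(jour)
    if year is None:
        return ""
    if month is None or not 1 <= month <= 12:
        return f"{year:04d}"
    if day is None or not 1 <= day <= 31:
        return f"{year:04d}-{month:02d}"
    return f"{year:04d}-{month:02d}-{day:02d}"
```

Returning a partial date instead of raising keeps records with an unknown day or month usable, which matters for old registers where only the year is legible.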
**`ville`** — Reference table of all French towns. Columns: `NOM_MAJ` (uppercase name), `CODE_INSEE` (bigint), `LATITUDE`, `LONGITUDE`.
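The link between `mariage` and `ville` through `CODE_INSEE` can be illustrated with a join. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with simplified column types and a few invented rows for illustration only. The `SELECT` statement itself uses only the column names listed above.

```python
import sqlite3

# In-memory stand-in for the real MySQL database; types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ville (NOM_MAJ TEXT, CODE_INSEE INTEGER, LATITUDE REAL, LONGITUDE REAL);
CREATE TABLE mariage (CLE_MARIAGE INTEGER PRIMARY KEY, NOM_EPOUX TEXT, NOM_EPOUSE TEXT,
                      ANNEE_MARIAGE INTEGER, CODE_INSEE INTEGER);
""")
# Invented sample rows, not taken from the dump.
conn.execute("INSERT INTO ville VALUES ('MONTPELLIER', 34172, 43.61, 3.88)")
conn.execute("INSERT INTO mariage VALUES (1, 'DURAND', 'MARTIN', 1820, 34172)")
conn.execute("INSERT INTO mariage VALUES (2, 'ROUX', 'FABRE', 1835, NULL)")

# LEFT JOIN keeps marriage records whose CODE_INSEE is missing or unmatched.
JOIN_SQL = """
SELECT m.CLE_MARIAGE, m.NOM_EPOUX, m.NOM_EPOUSE, v.NOM_MAJ, v.LATITUDE, v.LONGITUDE
FROM mariage AS m
LEFT JOIN ville AS v ON v.CODE_INSEE = m.CODE_INSEE
ORDER BY m.CLE_MARIAGE
"""
rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```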
**`utilisateur`** — Application users (genealogy researchers). Login format: `CGL[A-Z][nnn]` for members, `Gestionnaire` for admins. Stores MD5-hashed passwords (`mdp`), group (1=admin, 2=member), validity date, last IP.
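The login convention and the password hashing can be sketched as follows. This is an illustration under stated assumptions: `nnn` is read as exactly three digits, the hash is taken to be an unsalted MD5 hex digest of the raw password, and the `latin-1` encoding is a guess based on the table collations. The helper names are hypothetical. Unsalted MD5 is shown only to interpret the existing `mdp` column; it is not suitable for storing passwords in a new system.

```python
import hashlib
import re

# Assumption: "nnn" in CGL[A-Z][nnn] means exactly three digits.
MEMBER_LOGIN_RE = re.compile(r"CGL[A-Z]\d{3}")


def classify_login(login: str) -> str:
    """Classify a login as 'admin', 'member' or 'unknown'."""
    if login == "Gestionnaire":
        return "admin"
    if MEMBER_LOGIN_RE.fullmatch(login):
        return "member"
    return "unknown"


def md5_hex(password: str, encoding: str = "latin-1") -> str:
    """Unsalted MD5 hex digest, as assumed for the `mdp` column."""
    return hashlib.md5(password.encode(encoding)).hexdigest()
```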
**`departements`** — French departments from INSEE. Includes `CODE`, `NCC` (uppercase name), `CHEFLIEU` (INSEE chef-lieu code), `PRESENT` flag (whether the department's records are in the database).
**`MISEAJOUR`** — Tracks batch imports: `DATE` of import, `MIN`/`MAX` range of `CLE_MARIAGE` values added in that batch.
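Since each `MISEAJOUR` row records a key range, the batch that introduced a given marriage record can be found by a range test. The sketch below assumes the `MIN`/`MAX` bounds are inclusive and represents each batch as a `(DATE, MIN, MAX)` tuple. `find_batch` is a hypothetical helper, and the equivalent SQL in the comment is likewise an untested suggestion.

```python
from typing import Iterable, Optional, Tuple

Batch = Tuple[str, int, int]  # (DATE, MIN, MAX) as read from MISEAJOUR


def find_batch(batches: Iterable[Batch], cle_mariage: int) -> Optional[Batch]:
    """Return the first batch whose inclusive [MIN, MAX] range holds the key."""
    for batch in batches:
        _, low, high = batch
        if low <= cle_mariage <= high:
            return batch
    return None


# Possible SQL equivalent (column names need backticks in MySQL):
# SELECT `DATE` FROM MISEAJOUR WHERE %s BETWEEN `MIN` AND `MAX`;
```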
**`cgl_34_stats_req`** — Search query audit log. Columns: `DATE_REQ`, `REMOTE_ADDR`, `HTTP_USER_AGENT`, `TYPE_RECHERCHE` (search type), `NOM_EPOUX`/`NOM_EPOUSE` (search terms), `VARIATION_EPOUX`/`VARIATION_EPOUSE` (phonetic variant flags), `DATE_MIN`/`DATE_MAX` (year range), `DUREE_REQUETE` (query duration in seconds), `USERID`, `VILLE`.
### Supporting / Utility Tables
- `sauvegarde_mariage` — backup copy of `mariage`
- `MARIAGE_CP` — copy/staging table for `mariage`
- `testmariage` — test/scratch table for `mariage`
- `save_ville` / `ville_orig` — backup copies of `ville`
- `stat_ville` — per-town statistics
- `copie_MISEAJOUR` — backup of `MISEAJOUR`
- `menu_user` — application menu entries per user role
- `mois` — month name lookup table (French)
- `mariage-bad` — rejected/bad records
### Drupal CMS Tables (100+ tables, prefix `drupal_`)
The application front-end was Drupal 6. These tables manage the website (nodes, users, blocks, cache, roles, menus, etc.) and are largely independent of the genealogical data. Notable:
- `drupal_users` — Drupal user accounts (separate from `utilisateur`)
- `drupal_content_type_ville` — Drupal content nodes for towns
- `drupal_content_type_soldat` — Drupal content nodes for soldier records
## Character Encoding
Most genealogical tables use `latin1` / `latin1_general_ci`. The `departements` table uses `utf8mb3`. Name comparisons are therefore already case-insensitive; how accented characters compare depends on the collation, so a `COLLATE` override may be needed when accented and unaccented spellings must be matched or told apart.
## Search Logic (from `cgl_34_stats_req` logs)
The application supported two search modes:
- **Recherche Standard** — surname prefix search using `^NAME` pattern (regex anchored at start)
- **Recherche Evoluée** — advanced search with phonetic variant expansion (`VARIATION_EPOUX`/`VARIATION_EPOUSE` flags)
Year range filtering uses `DATE_MIN`/`DATE_MAX` against `ANNEE_MARIAGE`.
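The semantics of the standard search can be sketched client-side in Python. This is a reconstruction from the log description above, not the application's code: `standard_search` is a hypothetical name, the choice of `NOM_EPOUX` as the default column is an assumption, the year bounds are treated as inclusive, and the phonetic expansion of the advanced mode is not covered.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def standard_search(
    rows: Iterable[Row],
    name: str,
    column: str = "NOM_EPOUX",
    date_min: Optional[int] = None,
    date_max: Optional[int] = None,
) -> List[Row]:
    """Surname prefix search (like the `^NAME` pattern) with a year filter."""
    # Anchored at the start; the name is escaped so it is matched literally.
    pattern = re.compile("^" + re.escape(name), re.IGNORECASE)
    hits: List[Row] = []
    for row in rows:
        surname = str(row.get(column) or "")
        year = row.get("ANNEE_MARIAGE")
        if not pattern.match(surname):
            continue
        # Rows without a year are excluded whenever a bound is requested.
        if date_min is not None and (year is None or year < date_min):
            continue
        if date_max is not None and (year is None or year > date_max):
            continue
        hits.append(row)
    return hits
```

For example, searching for `DURAN` returns both `DURAND` and `DURANTON`, which is the behaviour expected of a prefix search.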